Commit graph

3751 commits

Author SHA1 Message Date
Al Viro
177abc348a [EBTABLES]: Clean ebt_get_udc_positions() up.
Check for valid_hooks is redundant (newinfo->hook_entry[i] will
be NULL if bit i is not set).  Kill it, kill unused arguments.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:32:01 -08:00
Al Viro
0e795531c5 [EBTABLES]: Switch ebt_check_entry_size_and_hooks() to use of newinfo->hook_entry[]
kill unused arguments

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:32:01 -08:00
Al Viro
1f072c96fd [EBTABLES]: translate_table(): switch direct uses of repl->hook_info to newinfo
Since newinfo->hook_table[] already has been set up, we can switch to using
it instead of repl->{hook_info,valid_hooks}.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:32:00 -08:00
Al Viro
e4fd77deac [EBTABLES]: Move more stuff into ebt_verify_pointers().
Take intialization of ->hook_entry[...], ->entries_size and ->nentries
over there, pull the check for empty chains into the end of that sucker.

Now it's self-contained, so we can move it up in the very beginning of
translate_table() *and* we can rely on ->hook_entry[] being properly
transliterated after it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:59 -08:00
Al Viro
70fe9af47e [EBTABLES]: Pull the loop doing __ebt_verify_pointers() into a separate function.
It's easier to expand the iterator here *and* we'll be able to move all
uses of ebt_replace from translate_table() into this one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:57 -08:00
Al Viro
22b440bf9e [EBTABLES]: Split ebt_check_entry_size_and_hooks
Split ebt_check_entry_size_and_hooks() in two parts - one that does
sanity checks on pointers (basically, checks that we can safely
use iterator from now on) and the rest of it (looking into details
of entry).

The loop applying ebt_check_entry_size_and_hooks() is split in two.

Populating newinfo->hook_entry[] is done in the first part.

Unused arguments killed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:56 -08:00
Al Viro
14197d5447 [EBTABLES]: Prevent wraparounds in checks for entry components' sizes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:56 -08:00
Al Viro
98a0824a0f [EBTABLES]: Deal with the worst-case behaviour in loop checks.
No need to revisit a chain we'd already finished with during
the check for current hook.  It's either instant loop (which
we'd just detected) or a duplicate work.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:55 -08:00
Al Viro
40642f95f5 [EBTABLES]: Verify that ebt_entries have zero ->distinguisher.
We need that for iterator to work; existing check had been too weak.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:54 -08:00
Al Viro
bb2ef25c2c [EBTABLES]: Fix wraparounds in ebt_entries verification.
We need to verify that
	a) we are not too close to the end of buffer to dereference
	b) next entry we'll be checking won't be _before_ our

While we are at it, don't subtract unrelated pointers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:53 -08:00
Andrew Morton
b6332e6cf9 [TCP]: Fix warnings with TCP_MD5SIG disabled.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:52 -08:00
Adrian Bunk
f5b99bcddd [NET]: Possible cleanups.
This patch contains the following possible cleanups:
- make the following needlessly global functions statis:
  - ipv4/tcp.c: __tcp_alloc_md5sig_pool()
  - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup()
  - ipv4/udplite.c: udplite_rcv()
  - ipv4/udplite.c: udplite_err()
- make the following needlessly global structs static:
  - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops
  - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific
  - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops
- net/ipv{4,6}/udplite.c: remove inline's from static functions
                          (gcc should know best when to inline them)

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:51 -08:00
Miika Komu
2718aa7c55 [IPSEC]: Add AF_KEY interface for encapsulation family.
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
2006-12-02 21:31:50 -08:00
Miika Komu
8511d01d7c [IPSEC]: Add netlink interface for the encapsulation family.
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:49 -08:00
Miika Komu
76b3f055f3 [IPSEC]: Add encapsulation family.
Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:48 -08:00
David S. Miller
08dd1a506b [TCP] MD5SIG: Kill CONFIG_TCP_MD5SIG_DEBUG.
It just obfuscates the code and adds limited value.  And as Adrian
Bunk noticed, it lacked Kconfig help text too, so just kill it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:47 -08:00
Patrick McHardy
e488eafcc5 [NET_SCHED]: Fix endless loops (part 5): netem/tbf/hfsc ->requeue failures
When peeking at the next packet in a child qdisc by calling dequeue/requeue,
the upper qdisc qlen counter may get out of sync in case the requeue fails.
The qdisc and the child qdisc both have their counter decremented, but since
no packet is given to the upper qdisc it won't decrement its counter itself.

requeue should not fail, so this is mostly for "correctness".

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:46 -08:00
Patrick McHardy
256d61b87b [NET_SCHED]: Fix endless loops (part 4): HTB
Convert HTB to use qdisc_tree_decrease_len() and add a callback
for deactivating a class when its child queue becomes empty.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:45 -08:00
Patrick McHardy
f973b913e1 [NET_SCHED]: Fix endless loops (part 3): HFSC
Convert HFSC to use qdisc_tree_decrease_len() and add a callback
for deactivating a class when its child queue becomes empty.

All queue purging goes through hfsc_purge_queue(), which is used in
three cases: grafting, class creation (when a leaf class is turned
into an intermediate class by attaching a new class) and class
deletion. In all cases qdisc_tree_decrease_len() is needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:44 -08:00
Patrick McHardy
5e50da01d0 [NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs
Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where
necessary:

- all graft operations
- destruction of old child qdiscs in prio, red and tbf change operation
- purging of queue in sfq change operation

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:43 -08:00
Patrick McHardy
43effa1e57 [NET_SCHED]: Fix endless loops caused by inaccurate qlen counters (part 1)
There are multiple problems related to qlen adjustment that can lead
to an upper qdisc getting out of sync with the real number of packets
queued, leading to endless dequeueing attempts by the upper layer code.

All qdiscs must maintain an accurate q.qlen counter. There are basically
two groups of operations affecting the qlen: operations that propagate
down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the
root qdisc and operations only affecting a subtree or single qdisc
(change, graft, delete class). Since qlen changes during operations from
the second group don't propagate to ancestor qdiscs, their qlen values
become desynchronized.

This patch adds a function to propagate qlen changes up the qdisc tree,
optionally calling a callback function to perform qdisc-internal
maintenance when the child qdisc becomes empty. The follow-up patches
will convert all qdiscs to use this function where necessary.

Noticed by Timo Steinbach <tsteinbach@astaro.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:42 -08:00
Patrick McHardy
9f9afec482 [NET_SCHED]: Set parent classid in default qdiscs
Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:41 -08:00
Patrick McHardy
814a175e7b [NET_SCHED]: sch_htb: perform qlen adjustment immediately in ->delete
qlen adjustment should happen immediately in ->delete and not in the
class destroy function because the reference count will not hit zero in
->delete (sch_api holds a reference) but in ->put. Since the qdisc
lock is released between deletion of the class and final destruction
this creates an externally visible error in the qlen counter.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:40 -08:00
Paul Moore
484b366932 NetLabel: add the ranged tag to the CIPSOv4 protocol
Add support for the ranged tag (tag type #5) to the CIPSOv4 protocol.

The ranged tag allows for seven, or eight if zero is the lowest category,
category ranges to be specified in a CIPSO option.  Each range is specified by
two unsigned 16 bit fields, each with a maximum value of 65534.  The two values
specify the start and end of the category range; if the start of the category
range is zero then it is omitted.

See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:31:38 -08:00
Paul Moore
654bbc2a2b NetLabel: add the enumerated tag to the CIPSOv4 protocol
Add support for the enumerated tag (tag type #2) to the CIPSOv4 protocol.

The enumerated tag allows for 15 categories to be specified in a CIPSO option,
where each category is an unsigned 16 bit field with a maximum value of 65534.

See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:31:37 -08:00
Paul Moore
0275276035 NetLabel: convert to an extensibile/sparse category bitmap
The original NetLabel category bitmap was a straight char bitmap which worked
fine for the initial release as it only supported 240 bits due to limitations
in the CIPSO restricted bitmap tag (tag type 0x01).  This patch converts that
straight char bitmap into an extensibile/sparse bitmap in order to lay the
foundation for other CIPSO tag types and protocols.

This patch also has a nice side effect in that all of the security attributes
passed by NetLabel into the LSM are now in a format which is in the host's
native byte/bit ordering which makes the LSM specific code much simpler; look
at the changes in security/selinux/ss/ebitmap.c as an example.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2006-12-02 21:31:36 -08:00
Pablo Neira Ayuso
ef91fd522b [NETFILTER]: remove the reference to ipchains from Kconfig
It is time to move on :-)

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:35 -08:00
Patrick McHardy
76592584be [NETFILTER]: Fix PROC_FS=n warnings
Fix some unused function/variable warnings.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:34 -08:00
Patrick McHardy
65195686ff [NETFILTER]: remove remaining ASSERT_{READ,WRITE}_LOCK
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:33 -08:00
Bart De Schuymer
d12cdc3ccf [NETFILTER]: ebtables: add --snap-arp option
The attached patch adds --snat-arp support, which makes it possible to
change the source mac address in both the mac header and the arp header
with one rule.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:32 -08:00
Patrick McHardy
baf7b1e112 [NETFILTER]: x_tables: add NFLOG target
Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6.
Currently we have two (unsupported by userspace) hacks in the LOG and ULOG
targets to optionally call to the nflog API. They lack a few features,
namely the IPv4 and IPv6 LOG targets can not specify a number of arguments
related to nfnetlink_log, while the ULOG target is only available for IPv4.
Remove those hacks and add a clean way to use nfnetlink_log.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:31 -08:00
Patrick McHardy
39b46fc6f0 [NETFILTER]: x_tables: add port of hashlimit match for IPv4 and IPv6
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:31 -08:00
Patrick McHardy
d7a5c32442 [NETFILTER]: nfnetlink_log: remove useless prefix length limitation
There is no reason for limiting netlink attributes in size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:30 -08:00
Eric Leblond
829e17a1a6 [NETFILTER]: nfnetlink_queue: allow changing queue length through netlink
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:29 -08:00
Pablo Neira Ayuso
7b621c1ea6 [NETFILTER]: ctnetlink: rework conntrack fields dumping logic on events
|   NEW   | UPDATE  | DESTROY |
     ----------------------------------------|
     tuples    |    Y    |    Y    |    Y    |
     status    |    Y    |    Y    |    N    |
     timeout   |    Y    |    Y    |    N    |
     protoinfo |    S    |    S    |    N    |
     helper    |    S    |    S    |    N    |
     mark      |    S    |    S    |    N    |
     counters  |    F    |    F    |    Y    |

 Leyend:
         Y: yes
         N: no
         S: iif the field is set
	 F: iif overflow

This patch also replace IPCT_HELPINFO by IPCT_HELPER since we want to
track the helper assignation process, not the changes in the private
information held by the helper.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:28 -08:00
Pablo Neira Ayuso
bbb3357d14 [NETFILTER]: ctnetlink: check for status attribute existence on conntrack creation
Check that status flags are available in the netlink message received
to create a new conntrack.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:27 -08:00
Patrick McHardy
1b683b5512 [NETFILTER]: sip conntrack: better NAT handling
The NAT handling of the SIP helper has a few problems:

- Request headers are only mangled in the reply direction, From/To headers
  not at all, which can lead to authentication failures with DNAT in case
  the authentication domain is the IP address

- Contact headers in responses are only mangled for REGISTER responses

- Headers may be mangled even though they contain addresses not
  participating in the connection, like alternative addresses

- Packets are droppen when domain names are used where the helper expects
  IP addresses

This patch takes a different approach, instead of fixed rules what field
to mangle to what content, it adds symetric mapping of From/To/Via/Contact
headers, which allows to deal properly with echoed addresses in responses
and foreign addresses not belonging to the connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:26 -08:00
Patrick McHardy
77a78dec48 [NETFILTER]: sip conntrack: make header shortcuts optional
Not every header has a shortcut, so make them optional instead
of searching for the same string twice.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:25 -08:00
Patrick McHardy
40883e8184 [NETFILTER]: sip conntrack: do case insensitive SIP header search
SIP headers are generally case-insensitive, only SDP headers are
case sensitive.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:24 -08:00
Patrick McHardy
9d5b8baa4e [NETFILTER]: sip conntrack: minor cleanup
- Use enum for header field enumeration
- Use numerical value instead of pointer to header info structure to
  identify headers, unexport ct_sip_hdrs
- group SIP and SDP entries in header info structure
- remove double forward declaration of ct_sip_get_info

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:23 -08:00
Patrick McHardy
337fbc4166 [NETFILTER]: ip_conntrack: fix NAT helper unload races
The NAT helpr hooks are protected by RCU, but all of the
conntrack helpers test and use the global pointers instead
of copying them first using rcu_dereference()

Also replace synchronize_net() by synchronize_rcu() for clarity
since sychronizing only with packet receive processing is
insufficient to prevent races.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:22 -08:00
Yasuyuki Kozakai
468ec44bd5 [NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_find
We usually uses 'xxx_find_get' for function which increments
reference count.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:21 -08:00
Patrick McHardy
e4bd8bce3e [NETFILTER]: nf_conntrack: /proc compatibility with old connection tracking
This patch adds /proc/net/ip_conntrack, /proc/net/ip_conntrack_expect and
/proc/net/stat/ip_conntrack files to keep old programs using them working.

The /proc/net/ip_conntrack and /proc/net/ip_conntrack_expect files show only
IPv4 entries, the /proc/net/stat/ip_conntrack shows global statistics.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:20 -08:00
Patrick McHardy
a999e68376 [NETFILTER]: nf_conntrack: sysctl compatibility with old connection tracking
This patch adds an option to keep the connection tracking sysctls visible
under their old names.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:19 -08:00
Patrick McHardy
933a41e7e1 [NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modules
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:18 -08:00
Patrick McHardy
d62f9ed4a4 [NETFILTER]: nf_conntrack: automatic sysctl registation for conntrack protocols
Add helper functions for sysctl registration with optional instantiating
of common path elements (like net/netfilter) and use it for support for
automatic registation of conntrack protocol sysctls.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:17 -08:00
Patrick McHardy
f8eb24a89a [NETFILTER]: nf_conntrack: move extern declaration to header files
Using extern in a C file is a bad idea because the compiler can't
catch type errors.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:16 -08:00
Patrick McHardy
d734685334 [NETFILTER]: nf_conntrack_ftp: fix missing helper mask initilization
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:15 -08:00
Martin Josefsson
be00c8e489 [NETFILTER]: nf_conntrack: reduce timer updates in __nf_ct_refresh_acct()
Only update the conntrack timer if there's been at least HZ jiffies since
the last update. Reduces the number of del_timer/add_timer cycles from one
per packet to one per connection per second (plus once for each state change
of a connection)

Should handle timer wraparounds and connection timeout changes.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:14 -08:00
Martin Josefsson
824621eddd [NETFILTER]: nf_conntrack: remove unused struct list_head from protocols
Remove unused struct list_head from struct nf_conntrack_l3proto and
nf_conntrack_l4proto as all protocols are kept in arrays, not linked
lists.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:13 -08:00
Martin Josefsson
3ffd5eeb1a [NETFILTER]: nf_conntrack: minor __nf_ct_refresh_acct() whitespace cleanup
Minor whitespace cleanup.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:12 -08:00
Martin Josefsson
951d36cace [NETFILTER]: nf_conntrack: remove ASSERT_{READ,WRITE}_LOCK
Remove the usage of ASSERT_READ_LOCK/ASSERT_WRITE_LOCK in nf_conntrack,
it didn't do anything, it was just an empty define and it uglified the code.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:11 -08:00
Martin Josefsson
ae5718fb3d [NETFILTER]: nf_conntrack: more sanity checks in protocol registration/unregistration
Add some more sanity checks when registering/unregistering l3/l4 protocols.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:10 -08:00
Martin Josefsson
605dcad6c8 [NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol
Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in
order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets
rather confusing with 'nf_conntrack_protocol'.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:09 -08:00
Martin Josefsson
e2b7606cdb [NETFILTER]: More __read_mostly annotations
Place rarely written variables in the read-mostly section by using
__read_mostly

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:08 -08:00
Martin Josefsson
8f03dea52b [NETFILTER]: nf_conntrack: split out protocol handling
This patch splits out L3/L4 protocol handling into its own file
nf_conntrack_proto.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:07 -08:00
Martin Josefsson
f61801218a [NETFILTER]: nf_conntrack: split out the event cache
This patch splits out the event cache into its own file
nf_conntrack_ecache.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:06 -08:00
Martin Josefsson
7e5d03bb9d [NETFILTER]: nf_conntrack: split out helper handling
This patch splits out handling of helpers into its own file
nf_conntrack_helper.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:05 -08:00
Martin Josefsson
77ab9cff0f [NETFILTER]: nf_conntrack: split out expectation handling
This patch splits out expectation handling into its own file
nf_conntrack_expect.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:04 -08:00
David S. Miller
d2e4bdc870 [TCP] Vegas: Increase default alpha to 2 and beta to 4.
This helps Vegas cope better with delayed ACKs, see
analysis at:

http://www.cs.caltech.edu/%7Eweixl/technical/ns2linux/known_linux/index.html#vegas

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:03 -08:00
Gerrit Renker
6b57c93dc3 [DCCP]: Use `unsigned' for packet lengths
This patch implements a suggestion by Ian McDonald and

 1) Avoids tests against negative packet lengths by using unsigned int
    for packet payload lengths in the CCID send_packet()/packet_sent() routines

 2) As a consequence, it removes an now unnecessary test with regard to `len > 0'
    in ccid3_hc_tx_packet_sent: that condition is always true, since
      * negative packet lengths are avoided
      * ccid3_hc_tx_send_packet flags an error whenever the payload length is 0.
        As a consequence, ccid3_hc_tx_packet_sent is never called as all errors
        returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit

 3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter),
    since it is currently always set to skb->len. The code is updated with regard
    to this parameter change.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:02 -08:00
Gerrit Renker
a79ef76f4d [DCCP] ccid3: Larger initial windows
This implements the larger-initial-windows feature for CCID 3, as described in
section 5 of RFC 4342. When the first feedback packet arrives, the sender can
send up to 2..4 packets per RTT, instead of just one.

The patch further
 * reduces the number of timestamping calls by passing the timestamp value
   (which is computed in one of the calling functions anyway) as argument

 * renames one constant with a very long name into one which is shorter and
   resembles the one in RFC 3448 (t_mbi)

 * simplifies some of the min_t/max_t cases where both `x', `y' have the same
   type

Commiter note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:01 -08:00
Arnaldo Carvalho de Melo
841bac1d60 [DCCP]: Make {set,get}sockopt(DCCP_SOCKOPT_PACKET_SIZE) return 0
To reflect the fact that this now is of no effect, not making apps
stop working, just be warned in the system log.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:00 -08:00
Gerrit Renker
5aed324369 [DCCP]: Tidy up unused structures
This removes and cleans up unused variables and structures which have become
unnecessary following the introduction of the EWMA patch to automatically track
the CCID 3 receiver/sender packet sizes `s'.

It deprecates the PACKET_SIZE socket option by returning an error code and
printing a deprecation warning if an application tries to read or write this
socket option.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:59 -08:00
Gerrit Renker
78ad713da6 [DCCP] ccid3: Track RX/TX packet size `s' using moving-average
Problem:
2006-12-02 21:30:58 -08:00
Gerrit Renker
2a1fda6f6c [DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448
This corrects the setting of the nofeedback timer with regard to RFC
3448 - previously it was not set to max(4*R, 2*s/X) as specified. Using
the maximum of 1 second as upper bound (as it was done before) can have
detrimental effects, especially if R is small.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:57 -08:00
Gerrit Renker
4384260443 [DCCP]: Remove allocation of sysctl numbers
This is in response to a request sent earlier by Eric W. Biederman
and replaces all sysctl numbers for net.dccp.default with CTL_UNNUMBERED.

It has been tested to compile and to work.

Commiter note: I've removed the use of CTL_UNNUMBERED, not setting .ctl_name
               sets it to 0, that is the what CTL_UNNUMBERED is, reason is
               to avoid unneeded source code cluttering.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:56 -08:00
Gerrit Renker
5d0dbc4a9b [DCCP] ccid3: Consolidate handling of t_RTO
This patch
 * removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is
   undefined until feedback has been received);

 * makes some trivial changes (updates of comments);

 * performs a small optimisation by exploiting that the feedback timeout
   uses the value of t_ipi. The way it is done is safe, because the timeouts
   appear after the changes to t_ipi, ensuring that up-to-date values are used;

 * in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation
   of the next_tmout. This makes the code clearer to read and is also safe, since
   t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not
   read by the functions called via ccid_wait_for_ccid();

 * removes a `max' statement in sk_reset_timer, this is not needed since the timeout
   value is always greater than 1E6 microseconds.

 * adds `XXX'es to highlight that currently the nofeedback timer is set
   in a non-standard way

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:52 -08:00
Gerrit Renker
17893bc1a6 [DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta
This patch:

 * consolidates updating of parameters (t_nom, t_ipi, t_delta) which
   need to be updated at the same time, since they are inter-dependent

 * removes two inline functions which are no longer needed as a result of
   the above consolidation

 * resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback
   timer, in the state where no feedback has previously been received

 * ties updating these parameters to updating the sending rate X, exploiting
   that all three parameters in turn depend on X; and using a small optimisation
   which can reduce the number of required instructions: only update the three
   parameters when X really changes

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:51 -08:00
Gerrit Renker
48e03eee71 [DCCP] ccid3: Consolidate timer resets
This patch concerns updating the value of the nofeedback timer when no feedback
has been received so far.

Since in this case the value of R is still undefined according to [RFC 3448,
4.2], we can not perform step (3) of [RFC 3448, 4.3].  A clarification is
provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback
timer (still) expires "after two seconds".

Many thanks to Ian McDonald for pointing this out and providing the
clarification.

The patch
  * implements [RFC 4342, sec. 5] with regard to the above case
  * consolidates handling timer restart by
	- adding an appropriate jump label and
	- initialising the timeout value

Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:50 -08:00
Jamal Hadi Salim
b798a9ede2 [XFRM]: Convert a few __u8 to proper u8
Caught by the EyeBalls(tm) of Thomas Graf

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:50 -08:00
Jamal Hadi Salim
0c51f53c57 [XFRM]: Make flush notifier prettier when subpolicy used
Might as well make flush notifier prettier when subpolicy used

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:49 -08:00
Gerrit Renker
4c0a6cb0db [UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code
This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The
justification is that UDP(-Lite) is a transport-layer protocol and therefore
the socket option code (at least in theory) should be AF-independent.

Furthermore, there is the following code reduplication:
 * do_udp{,v6}_getsockopt is 100% identical between v4 and v6
 * do_udp{,v6}_setsockopt is identical up to the following differerence
	--v4 in contrast to v4 additionally allows the experimental encapsulation
          types  UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE
	--the remainder is identical between v4 and v6
   I believe that this difference is of little relevance.

The advantages in not duplicating twice almost completely identical code.

The patch further simplifies the interface of udp{,v6}_push_pending_frames,
since for the second argument (struct udp_sock *up) it always holds that
up = udp_sk(sk); where sk is the first function argument.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:45 -08:00
Thomas Graf
e3703b3de1 [RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code
IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similiar
way, therefore rtnl_put_cacheinfo() is added to reuse code.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:44 -08:00
Thomas Graf
4e9b826935 [NETLINK]: Remove unused dst_pid field in netlink_skb_parms
The destination PID is passed directly to netlink_unicast()
respectively netlink_multicast().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:43 -08:00
Gerrit Renker
5e19e3fcd7 [DCCP] ccid3: Resolve small FIXME
This considers the  case - ACK received while no packet has been sent
so far. Resolved by printing a (rate-limited) warning message.

Further removes an unnecessary BUG_ON in ccid3_hc_tx_packet_recv,
received feedback on a terminating connection is simply ignored.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:41 -08:00
Gerrit Renker
70dbd5b0ef [DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent
This patch removes a switch statement which is redundant since,
 * nothing is done in states TFRC_SSTATE_NO_SENT/TFRC_SSTATE_NO_FBACK
 * it is impossible that the function is called in the state TFRC_SSTATE_TERM, since
       --the function is called, in dccp_write_xmit, after ccid3_hc_tx_send_packet
       --if ccid3_hc_tx_send_packet is called in state TFRC_SSTATE_TERM, it returns
         -EINVAL, which means that ccid3_hc_tx_packet_sent will not be called
	 (compare dccp_write_xmit)
       --> therefore, this case is logically impossible
 * the remaining state is TFRC_SSTATE_FBACK which conditionally updates t_ipi, t_nom,
   and t_delta. This is a no-op, since
       --t_ipi only changes when feedback is received
       --however, when feedback arrives via ccid3_hc_tx_packet_recv, there is an identical
         code block which performs the same set of operations
       --performing the same set of operations again in ccid3_hc_tx_packet_sent therefore
         does not change anything, since between the time of receiving the last feedback
	 (and therefore update of t_ipi, t_nom, and t_delta), the value of t_ipi has not
	 changed
       --since t_ipi has not changed, the values of t_delta and t_nom also do not change,
         they depend fully on t_ipi

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:40 -08:00
Gerrit Renker
da335baf9e [DCCP] ccid3: Avoid congestion control on zero-sized data packets
This resolves an `XXX' in ccid3_hc_tx_send_packet().

The function is only called on Data and DataAck packets and returns a negative
result on zero-sized messages. This is a reasonable policy since CCID 3 is a
congestion-control module and congestion control on zero-sized Data(Ack)
packets is in a way pathological.

The patch uses a more suitable error code for this case, it returns the Posix.1
code `EBADMSG' ("Not a data message") instead of `ENOTCONN'.

As a result of ignoring zero-sized packets, a the condition for a warning
"First packet is data" in ccid3_hc_tx_packet_sent is always satisfied; this
message has been removed since it will always be printed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:39 -08:00
Gerrit Renker
7da7f456d7 [DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet
This makes some logically equivalent simplifications, by replacing
rc - values plus goto's with direct return statements.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:38 -08:00
Gerrit Renker
91cf5a1725 [DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission
Problem:
2006-12-02 21:30:37 -08:00
Gerrit Renker
f5c2d6367b [DCCP] ccid3: Simplify control flow in the calculation of t_ipi
This patch performs a simplifying (performance) optimisation:

 In each call of the inline function ccid3_calc_new_t_ipi(), the state is
 tested against TFRC_SSTATE_NO_FBACK. This is expensive when the function
 is called very often. A simpler solution, implemented by this patch, is
 to adapt the control flow.

Background:
2006-12-02 21:30:36 -08:00
Gerrit Renker
90feeb951f [DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi
Problem:
2006-12-02 21:30:35 -08:00
Andrea Bittau
6472c051fc [DCCP] ccid2: Allow window to grow larger
Now that we can stuff bigger ack vectors into options.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:34 -08:00
Andrea Bittau
522f1d095b [DCCP] ackvec: Split long ack vectors across multiple options
Ack vectors grow proportional to the window size.  If an ack vector does not fit
into a single option, it must be spread across multiple options.  This patch
will allow for windows to grow larger.

Committer note: Simplified the patch a bit, original algorithm kept.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:33 -08:00
Andrea Bittau
bdf13d208d [DCCP] ackvec: infrastructure for sending more than one ackvec per packet
Commiter note:

This was split from Andrea's original patch, in the process I changed the type
of the ackvec index fields to u16 instead of to int and haven't folded
dccp_ackvec_parse with dccp_ackvec_check_rcv_ackno.

Next patch will actually do the insertion of more than one ackvec per packet,
using, initially, up to a max of 2 ackvecs as per Andrea's original patch, then
I'll work on support for larger ackvecs, be it using a sysctl or using
setsockopt.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:32 -08:00
Andrea Bittau
0bd4ff1b15 [DCCP] ackvec: Remove unused dccpav_ack_ptr field from dccp_ackvec
Commiter note: original patch was splitted.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:31 -08:00
Thomas Graf
4a89c2562c [DECNET] address: Convert to new netlink interface
Extends the netlink interface to support the __le16 type and
converts address addition, deletion and, dumping to use the
new netlink interface.

Fixes multiple occasions of possible illegal memory references
due to not validated netlink attributes.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:30 -08:00
Thomas Graf
b020b942cd [DECNET] address: Rename rtmsg_ifa() to dn_ifaddr_notify()
The name rtmsg_ifa is heavly overused and confusing.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:29 -08:00
Thomas Graf
a6f01cace3 [DECNET] address: Calculate accurate message size for netlink notifications
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:28 -08:00
Ville Nuorvala
107a5fe619 [IPV6]: Improve IPv6 tunnel error reporting
Log an error if the remote tunnel endpoint is unable to handle
tunneled packets.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:27 -08:00
Ville Nuorvala
6fb32ddeb2 [IPV6]: Don't allocate memory for Tunnel Encapsulation Limit Option
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:26 -08:00
Ville Nuorvala
305d4b3ce8 [IPV6]: Allow link-local tunnel endpoints
Allow link-local tunnel endpoints if the underlying link is defined.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:25 -08:00
Ville Nuorvala
09c6bbf090 [IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime
Doing the mandatory tunnel endpoint checks when the tunnel is set up
isn't enough as interfaces can go up or down and addresses can be
added or deleted after this. The checks need to be done realtime when
the tunnel is processing a packet.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:24 -08:00
Ville Nuorvala
567131a722 [IPV6]: Fix SIOCCHGTUNNEL bug in IPv6 tunnels
A logic bug in tunnel lookup could result in duplicate tunnels when
changing an existing device.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:24 -08:00
Thomas Graf
e94ef68205 [GENETLINK] ctrl: Avoid empty CTRL_ATTR_OPS attribute when dumping
Based on Jamal's patch but compiled and even tested. :-)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:23 -08:00
Arnaldo Carvalho de Melo
cdbc6dae5c [XFRM]: Use kmemdup where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:22 -08:00
Arnaldo Carvalho de Melo
2710b57ff9 [TIPC]: Use kzalloc where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:21 -08:00
Arnaldo Carvalho de Melo
e69062b4f7 [SUNRPC]: Use k{mem,str}dup where applicable
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:20 -08:00
Arnaldo Carvalho de Melo
af997d8c95 [SCTP]: Use kzalloc where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:19 -08:00
Arnaldo Carvalho de Melo
c7b1b24978 [SCHED]: Use kmemdup & kzalloc where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:18 -08:00