Commit graph

6239 commits

Author SHA1 Message Date
Radu Rendec
543821c6f5 [PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.
While trying to implement u32 hashes in my shaping machine I ran into
a possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).

The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).

Let's say that I would like to use 0x3fc0 as the hash mask. This means
8 contiguous "1" bits starting at b6. With such a mask, the expected
(and logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.

This is exactly what would happen on a big endian machine, but on
little endian machines, what would actually happen with current
implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
in the userspace tool and then applied to 192.168.x.x in the u32
classifier. When shifting right by 16 bits (rank of first "1" bit in
the reversed mask) and applying the divisor mask (0xff for divisor
256), what would actually remain is 0x3f applied on the "168" octet of
the address.

One could say is this can be easily worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc0000). But the actual problem is the network address (inside
the packet) not being converted to host order, but used as a
host-order value when computing the bucket.

Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15
etc in the machine's registers. Thus bits n7 and n8 would no longer be
adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be
consecutive.

The fix is to apply ntohl() on the hmask before computing fshift,
and in u32_hash_fold() convert the packet data to host order before
shifting down by fshift.

With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:11:45 -08:00
Jiri Olsa
40208d71e0 [NET]: Removing duplicit #includes
Removing duplicit #includes for net/

Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:11:44 -08:00
Pavel Emelyanov
c3e9a353d8 [IPV4]: Compact some ifdefs in the fib code.
There are places that check for CONFIG_IP_MULTIPLE_TABLES
twice in the same file, but the internals of these #ifdefs
can be merged.

As a side effect - remove one ifdef from inside a function.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:11:41 -08:00
Alexey Dobriyan
33120b30cc [IPV6]: Convert /proc/net/ipv6_route to seq_file interface
This removes last proc_net_create() user. Kudos to Benjamin Thery and
Stephen Hemminger for comments on previous version.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:09:18 -08:00
Evgeniy Polyakov
4f9f8311a0 [PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline
tecl_reset() is called from deactivate and qdisc is set to noop already,
but subsequent teql_xmit does not know about it and dereference private
data as teql qdisc and thus oopses.
not catch it first :)

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:09:17 -08:00
David S. Miller
c62cf5cb17 [DCCP]: Use DEFINE_PROTO_INUSE infrastructure.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:09:01 -08:00
Eric Dumazet
8295b6d9e6 [SCTP]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure
Trivial patch to make "sctcp,sctpv6" protocols uses the fast "inuse
sockets" infrastructure

Each protocol use then a static percpu var, instead of a dynamic one.
This saves some ram and some cpu cycles

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:09:00 -08:00
Eric Dumazet
c5a432f1a1 [IPV6]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure
Trivial patch to make "tcpv6,udpv6,udplitev6,rawv6" protocols uses the
fast "inuse sockets" infrastructure

Each protocol use then a static percpu var, instead of a dynamic one.
This saves some ram and some cpu cycles

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:59 -08:00
Eric Dumazet
47a31a6ffc [IPV4]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure
Trivial patch to make "tcp,udp,udplite,raw" protocols uses the fast
"inuse sockets" infrastructure

Each protocol use then a static percpu var, instead of a dynamic one.
This saves some ram and some cpu cycles

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:58 -08:00
Eric Dumazet
286ab3d460 [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way.
"struct proto" currently uses an array stats[NR_CPUS] to track change on
'inuse' sockets per protocol.

If NR_CPUS is big, this means we use a big memory area for this.
Moreover, all this memory area is located on a single node on NUMA
machines, increasing memory pressure on the boot node.

In this patch, I tried to :

- Keep a fast !CONFIG_SMP implementation
- Keep a fast CONFIG_SMP implementation for often used protocols
(tcp,udp,raw,...)
- Introduce a NUMA efficient implementation

Some helper macros are defined in include/net/sock.h
These macros take into account CONFIG_SMP

If a "struct proto" is declared without using DEFINE_PROTO_INUSE /
REF_PROTO_INUSE
macros, it will automatically use a default implementation, using a
dynamically allocated percpu zone.
This default implementation will be NUMA efficient, but might use 32/64
bytes per possible cpu
because of current alloc_percpu() implementation.
However it still should be better than previous implementation based on
stats[NR_CPUS] field.

When a "struct proto" is changed to use the new macros, we use a single
static "int" percpu variable,
lowering the memory and cpu costs, still preserving NUMA efficiency.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:57 -08:00
Pavel Emelyanov
6a9fb9479f [IPV4]: Clean the ip_sockglue.c from some ugly ifdefs
The #idfed CONFIG_IP_MROUTE is sometimes places inside the if-s,
which looks completely bad. Similar ifdefs inside the functions
looks a bit better, but they are also not recommended to be used.

Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:55 -08:00
Alexey Dobriyan
4e058063f4 [DECNET]: "addr" module param can't be __initdata
sysfs keeps references to module parameters via /sys/module/*/parameters,
so marking them as __initdata can't work.

Steps to reproduce:

	modprobe decnet
	cat /sys/module/decnet/parameters/addr

BUG: unable to handle kernel paging request at virtual address f88cd410
printing eip: c043dfd1 *pdpt = 0000000000004001 *pde = 0000000004408067 *pte = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
Pid: 2099, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #6)
EIP: 0060:[<c043dfd1>] EFLAGS: 00210286 CPU: 1
EIP is at param_get_int+0x6/0x20
EAX: c5c87000 EBX: 00000000 ECX: 000080d0 EDX: f88cd410
ESI: f8a108f8 EDI: c5c87000 EBP: 00000000 ESP: c5c97f00
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000)
Stack: 00000000 f8a108f8 c5c87000 c043db6b f8a108f1 00000124 c043de1a c043db2f
       f88cd410 ffffffff c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 c5dd5078
       c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 00001000
Call Trace:
 [<c043db6b>] param_array_get+0x3c/0x62
 [<c043de1a>] param_array_set+0x0/0xdf
 [<c043db2f>] param_array_get+0x0/0x62
 [<c043dd69>] param_attr_show+0x15/0x2d
 [<c043dd54>] param_attr_show+0x0/0x2d
 [<c043dbc8>] module_attr_show+0x1a/0x1e
 [<c04c431f>] sysfs_read_file+0x7c/0xd9
 [<c04c42a3>] sysfs_read_file+0x0/0xd9
 [<c048d4b2>] vfs_read+0x88/0x134
 [<c042090b>] do_page_fault+0x0/0x7d5
 [<c048d920>] sys_read+0x41/0x67
 [<c04080fa>] sysenter_past_esp+0x6b/0xc1
 =======================
Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 <8b> 12 c7 44 24 04 58 8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c
EIP: [<c043dfd1>] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:55 -08:00
Mitsuru Chinen
7a0ff716c2 [IPv6] SNMP: Restore Udp6InErrors incrementation
As the checksum verification is postponed till user calls recv or poll,
the inrementation of Udp6InErrors counter should be also postponed.
Currently, it is postponed in non-blocking operation case. However it
should be postponed in all case like the IPv4 code.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:54 -08:00
Alexey Dobriyan
3f192b5c58 [NET]: Remove /proc/net/stat/*_arp_cache upon module removal
neigh_table_init_no_netlink() creates them, but they aren't removed anywhere.

Steps to reproduce:

	modprobe clip
	rmmod clip
	cat /proc/net/stat/clip_arp_cache

BUG: unable to handle kernel paging request at virtual address f89d7758
printing eip: c05a99da *pdpt = 0000000000004001 *pde = 0000000004408067 *pte = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: atm af_packet ipv6 binfmt_misc sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos rtc_core rtc_lib serio_raw button k8temp hwmon amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
Pid: 2082, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #4)
EIP: 0060:[<c05a99da>] EFLAGS: 00210256 CPU: 0
EIP is at neigh_stat_seq_next+0x26/0x3f
EAX: 00000001 EBX: f89d7600 ECX: c587bf40 EDX: 00000000
ESI: 00000000 EDI: 00000001 EBP: 00000400 ESP: c587bf1c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 2082, ti=c587b000 task=c5984e10 task.ti=c587b000)
Stack: c06228cc c5313790 c049e5c0 0804f000 c45a7b00 c53137b0 00000000 00000000
       00000082 00000001 00000000 00000000 00000000 fffffffb c58d6780 c049e437
       c45a7b00 c04b1f93 c587bfa0 00000400 0804f000 00000400 0804f000 c04b1f2f
Call Trace:
 [<c049e5c0>] seq_read+0x189/0x281
 [<c049e437>] seq_read+0x0/0x281
 [<c04b1f93>] proc_reg_read+0x64/0x77
 [<c04b1f2f>] proc_reg_read+0x0/0x77
 [<c048907e>] vfs_read+0x80/0xd1
 [<c0489491>] sys_read+0x41/0x67
 [<c04080fa>] sysenter_past_esp+0x6b/0xc1
 =======================
Code: e9 ec 8d 05 00 56 8b 11 53 8b 40 70 8b 58 3c eb 29 0f a3 15 80 91 7b c0 19 c0 85 c0 8d 42 01 74 17 89 c6 c1 fe 1f 89 01 89 71 04 <8b> 83 58 01 00 00 f7 d0 8b 04 90 eb 09 89 c2 83 fa 01 7e d2 31
EIP: [<c05a99da>] neigh_stat_seq_next+0x26/0x3f SS:ESP 0068:c587bf1c

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:53 -08:00
Pavel Emelyanov
bf138862b1 [IPV6]: Consolidate the ip cork destruction in ip6_output.c
The ip6_push_pending_frames and ip6_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~100 bytes from the .text

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:26 -08:00
Pavel Emelyanov
429f08e950 [IPV4]: Consolidate the ip cork destruction in ip_output.c
The ip_push_pending_frames and ip_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~80 bytes from the .text

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:25 -08:00
Bart De Schuymer
e011ff48ab [NETFILTER]: ebt_arp: fix --arp-gratuitous matching dependence on --arp-ip-{src,dst}
Fix --arp-gratuitous matching dependence on --arp-ip-{src,dst}

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Lutz Preßler <Lutz.Pressler@SerNet.DE>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:25 -08:00
Alexey Dobriyan
55d84acd36 [NETFILTER]: nf_sockopts list head cleanup
Code is using knowledge that nf_sockopt_ops::list list_head is first
field in structure by using casts. Switch to list_for_each_entry()
itetators while I am at it.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:24 -08:00
Patrick McHardy
d1332e0ab8 [NETFILTER]: remove unneeded rcu_dereference() calls
As noticed by Paul McKenney, the rcu_dereference calls in the init path
of NAT modules are unneeded, remove them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:23 -08:00
Jan Engelhardt
0795c65d9f [NETFILTER]: Clean up Makefile
Sort matches and targets in the NF makefiles.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:22 -08:00
Jan Engelhardt
ba5dc2756c [NETFILTER]: Copyright/Email update
Transfer all my copyright over to our company.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:20 -08:00
Alexey Dobriyan
7351a22a3a [NETFILTER]: ip{,6}_queue: convert to seq_file interface
I plan to kill ->get_info which means killing proc_net_create().

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:08:20 -08:00
Latchesar Ionkov
55762690e2 9p: add missing end-of-options record for trans_fd
The list of options that the fd transport accepts is missing end-of-options
marker. This patch adds it.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-11-06 08:02:53 -06:00
Latchesar Ionkov
dd1a458412 9p: return NULL when trans not found
v9fs_match_trans function returns arbitrary transport module instead of NULL
when the requested transport is not registered. This patch modifies the
function to return NULL in that case.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-11-06 08:02:53 -06:00
Jens Axboe
c46f2334c8 [SG] Get rid of __sg_mark_end()
sg_mark_end() overwrites the page_link information, but all users want
__sg_mark_end() behaviour where we just set the end bit. That is the most
natural way to use the sg list, since you'll fill it in and then mark the
end point.

So change sg_mark_end() to only set the termination bit. Add a sg_magic
debug check as well, and clear a chain pointer if it is set.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-02 08:47:06 +01:00
Adrian Bunk
87ae9afdca cleanup asm/scatterlist.h includes
Not architecture specific code should not #include <asm/scatterlist.h>.

This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-02 08:47:06 +01:00
David S. Miller
49259d34c5 [IRDA] IRNET: Fix build when TCGETS2 is defined.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 02:26:38 -07:00
Stephen Hemminger
3b582cc14c [NET]: docbook fixes for netif_ functions
Documentation updates for network interfaces.

1. Add doc for netif_napi_add
2. Remove doc for unused returns from netif_rx
3. Add doc for netif_receive_skb

[ Incorporated minor mods from Randy Dunlap -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 02:21:47 -07:00
Pavel Emelyanov
d57a9212e0 [NET]: Hide the net_ns kmem cache
This cache is only required to create new namespaces,
but we won't have them in CONFIG_NET_NS=n case.

Hide it under the appropriate ifdef.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:46:50 -07:00
Pavel Emelyanov
1a2ee93d28 [NET]: Mark the setup_net as __net_init
The setup_net is called for the init net namespace
only (int the CONFIG_NET_NS=n of course) from the __init
function, so mark it as __net_init to disappear with the
caller after the boot.

Yet again, in the perfect world this has to be under
#ifdef CONFIG_NET_NS, but it isn't guaranteed that every
subsystem is registered *after* the init_net_ns is set
up. After we are sure, that we don't start registering
them before the init net setup, we'll be able to move
this code under the ifdef.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:45:59 -07:00
Pavel Emelyanov
6a1a3b9f68 [NET]: Hide the dead code in the net_namespace.c
The namespace creation/destruction code is never called
if the CONFIG_NET_NS is n, so it's OK to move it under
appropriate ifdef.

The copy_net_ns() in the "n" case checks for flags and
returns -EINVAL when new net ns is requested. In a perfect
world this stub must be in net_namespace.h, but this
function need to know the CLONE_NEWNET value and thus
requires sched.h. On the other hand this header is to be
injected into almost every .c file in the networking code,
and making all this code depend on the sched.h is a
suicidal attempt.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:44:50 -07:00
Pavel Emelyanov
1dba323b3f [NETNS]: Make the init/exit hooks checks outside the loop
When the new pernet something (subsys, device or operations) is
being registered, the init callback is to be called for each
namespace, that currently exitst in the system. During the
unregister, the same is to be done with the exit callback.

However, not every pernet something has both calls, but the
check for the appropriate pointer to be not NULL is performed
inside the for_each_net() loop.

This is (at least) strange, so tune this.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:42:43 -07:00
Pavel Emelyanov
6257ff2177 [NET]: Forget the zero_it argument of sk_alloc()
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.

Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.

This patch is rather big, and it is a part of the previous one.
I splitted it wishing to make the patches more readable. Hope 
this particular split helped.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:39:31 -07:00
Pavel Emelyanov
154adbc846 [NET]: Remove bogus zero_it argument from sk_alloc
At this point nobody calls the sk_alloc(() with zero_it == 0,
so remove unneeded checks from it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:38:43 -07:00
Pavel Emelyanov
8fd1d178a3 [NET]: Make the sk_clone() lighter
The sk_prot_alloc() already performs all the stuff needed by the
sk_clone(). Besides, the sk_prot_alloc() requires almost twice
less arguments than the sk_alloc() does, so call the sk_prot_alloc()
saving the stack a bit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:37:32 -07:00
Pavel Emelyanov
2e4afe7b35 [NET]: Move some core sock setup into sk_prot_alloc
The security_sk_alloc() and the module_get is a part of the
object allocations - move it in the proper place.

Note, that since we do not reset the newly allocated sock
in the sk_alloc() (memset() is removed with the previous
patch) we can safely do this.

Also fix the error path in sk_prot_alloc() - release the security
context if needed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:36:26 -07:00
Pavel Emelyanov
3f0666ee30 [NET]: Auto-zero the allocated sock object
We have a __GFP_ZERO flag that allocates a zeroed chunk of memory.
Use it in the sk_alloc() and avoid a hand-made memset().

This is a temporary patch that will help us in the nearest future :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:34:42 -07:00
Pavel Emelyanov
c308c1b20e [NET]: Cleanup the allocation/freeing of the sock object
The sock object is allocated either from the generic cache with
the kmalloc, or from the proc->slab cache.

Move this logic into an isolated set of helpers and make the
sk_alloc/sk_free look a bit nicer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:33:50 -07:00
Pavel Emelyanov
1e2e6b89f1 [NET]: Move the get_net() from sock_copy()
The sock_copy() is supposed to just clone the socket. In a perfect
world it has to be just memcpy, but we have to handle the security
mark correctly. All the extra setup must be performed in sk_clone() 
call, so move the get_net() into more proper place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:31:26 -07:00
Pavel Emelyanov
f1a6c4da14 [NET]: Move the sock_copy() from the header
The sock_copy() call is not used outside the sock.c file,
so just move it into a sock.c

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:29:45 -07:00
Ilpo Järvinen
261ab365fa [TCP]: Another TAGBITS -> SACKED_ACKED|LOST conversion
Similar to commit 3eec0047d9, point of this is to avoid
skipping R-bit skbs.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:10:18 -07:00
Ilpo Järvinen
e56d6cd605 [TCP]: Process DSACKs that reside within a SACK block
DSACK inside another SACK block were missed if start_seq of DSACK
was larger than SACK block's because sorting prioritizes full
processing of the SACK block before DSACK. After SACK block
sorting situation is like this:

             SSSSSSSSS
                  D
                        SSSSSS
                               SSSSSSS

Because write_queue is walked in-order, when the first SACK block
has been processed, TCP is already past the skb for which the
DSACK arrived and we haven't taught it to backtrack (nor should
we), so TCP just continues processing by going to the next SACK
block after the DSACK (if any).

Whenever such DSACK is present, do an embedded checking during
the previous SACK block.

If the DSACK is below snd_una, there won't be overlapping SACK
block, and thus no problem in that case. Also if start_seq of
the DSACK is equal to the actual block, it will be processed
first.

Tested this by using netem to duplicate 15% of packets, and
by printing SACK block when found_dup_sack is true and the 
selected skb in the dup_sack = 1 branch (if taken):

  SACK block 0: 4344-5792 (relative to snd_una 2019137317)
  SACK block 1: 4344-5792 (relative to snd_una 2019137317) 

equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...

  SACK block 0: 5792-7240 (relative to snd_una 2019214061)
  SACK block 1: 2896-7240 (relative to snd_una 2019214061)
  DSACK skb match 5792-7240 (relative to snd_una)

...and next_dup = 1 case (after the not shown start_seq sort),
went to dup_sack = 1 branch.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-01 00:09:37 -07:00
Stephen Rothwell
298bb62175 [AF_KEY]: suppress a warning for 64k pages.
On PowerPC allmodconfig build we get this:

net/key/af_key.c:400: warning: comparison is always false due to limited range of data type

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 23:57:05 -07:00
David S. Miller
51c739d1f4 [NET]: Fix incorrect sg_mark_end() calls.
This fixes scatterlist corruptions added by

	commit 68e3f5dd4d
	[CRYPTO] users: Fix up scatterlist conversion errors

The issue is that the code calls sg_mark_end() which clobbers the
sg_page() pointer of the final scatterlist entry.

The first part fo the fix makes skb_to_sgvec() do __sg_mark_end().

After considering all skb_to_sgvec() call sites the most correct
solution is to call __sg_mark_end() in skb_to_sgvec() since that is
what all of the callers would end up doing anyways.

I suspect this might have fixed some problems in virtio_net which is
the sole non-crypto user of skb_to_sgvec().

Other similar sg_mark_end() cases were converted over to
__sg_mark_end() as well.

Arguably sg_mark_end() is a poorly named function because it doesn't
just "mark", it clears out the page pointer as a side effect, which is
what led to these bugs in the first place.

The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
and arguably it could be converted to __sg_mark_end() if only so that
we can delete this confusing interface from linux/scatterlist.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:29:29 -07:00
Alexey Dobriyan
07afa04025 [IPVS]: Remove /proc/net/ip_vs_lblcr
It's under CONFIG_IP_VS_LBLCR_DEBUG option which never existed.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:16:27 -07:00
Daniel Lezcano
1675c7b254 [IPV6]: remove duplicate call to proc_net_remove
The file /proc/net/if_inet6 is removed twice.
First time in:
        inet6_exit
             ->addrconf_cleanup
And followed a few lines after by:
        inet6_exit
             -> if6_proc_exit

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:16:24 -07:00
Daniel Lezcano
310928d963 [NETNS]: fix net released by rcu callback
When a network namespace reference is held by a network subsystem,
and when this reference is decremented in a rcu update callback, we
must ensure that there is no more outstanding rcu update before
trying to free the network namespace.

In the normal case, the rcu_barrier is called when the network namespace
is exiting in the cleanup_net function.

But when a network namespace creation fails, and the subsystems are
undone (like the cleanup), the rcu_barrier is missing.

This patch adds the missing rcu_barrier.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:16:21 -07:00
Daniel Lezcano
93ee31f14f [NET]: Fix free_netdev on register_netdev failure.
Point 1:
The unregistering of a network device schedule a netdev_run_todo.
This function calls dev->destructor when it is set and the
destructor calls free_netdev.

Point 2:
In the case of an initialization of a network device the usual code
is:
 * alloc_netdev
 * register_netdev
    -> if this one fails, call free_netdev and exit with error.

Point 3:
In the register_netdevice function at the later state, when the device
is at the registered state, a call to the netdevice_notifiers is made.
If one of the notification falls into an error, a rollback to the
registered state is done using unregister_netdevice.

Conclusion:
When a network device fails to register during initialization because
one network subsystem returned an error during a notification call
chain, the network device is freed twice because of fact 1 and fact 2.
The second free_netdev will be done with an invalid pointer.

Proposed solution:
The following patch move all the code of unregister_netdevice *except*
the call to net_set_todo, to a new function "rollback_registered".

The following functions are changed in this way:
 * register_netdevice: calls rollback_registered when a notification fails
 * unregister_netdevice: calls rollback_register + net_set_todo, the call
                         order to net_set_todo is changed because it is the
                         latest now. Since it justs add an element to a list
                         that should not break anything.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 21:16:18 -07:00
Dirk Hohndel
e403149c92 Kbuild/doc: fix links to Documentation files
Fix links to files in Documentation/* in various Kconfig files

Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-30 14:26:30 -07:00
J. Bruce Fields
521c2a43b2 [SUNRPC]: fix rpc debugging
Commit baa3a2a0d2, by removing initialization
of the ctl_name field, broke this conditional, preventing the display of
rpc_tasks that you previously got when turning on rpc debugging.

[akpm@linux-foundation.org: coding-style fixes]

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-30 01:07:15 -07:00