kernel-fxtec-pro1x

Author	SHA1	Message	Date
Pavel Emelyanov	358352b8b8	[INET]: Cleanup the xfrm4_tunnel_(un)register Both check for the family to select an appropriate tunnel list. Consolidate this check and make the for() loop more readable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:48:54 -08:00
Pavel Emelyanov	99f933263a	[INET]: Add missed tunnel64_err handler The tunnel64_protocol uses the tunnel4_protocol's err_handler and thus calls the tunnel4_protocol's handlers. This is not very good, as in case of (icmp) error the wrong error handlers will be called (e.g. ipip ones instead of sit) and this won't be noticed at all, because the error is not reported. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:47:39 -08:00
Pavel Emelyanov	c2b42336f4	[IPX]: Use existing sock refcnt debugging infrastructure Just like in the af_packet.c, the ipx_sock_nr variable is used for debugging purposes. Switch to using existing infrastructure. Thanks to Arnaldo for pointing this out. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:39:26 -08:00
Pavel Emelyanov	17ab56a260	[PACKET]: Use existing sock refcnt debugging infrastructure The packet_socks_nr variable is used purely for debugging the number of sockets. As Arnaldo pointed out, there's already an infrastructure for this purposes, so switch to using it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:38:48 -08:00
Joe Perches	e9671fcb3b	[NET]: Fix infinite loop in dev_mc_unsync(). From: Joe Perches <joe@perches.com> Based upon an initial patch and report by Luis R. Rodriguez. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:36:04 -08:00
Pavel Emelyanov	03f49f3457	[NET]: Make helper to get dst entry and "use" it There are many places that get the dst entry, increase the __use counter and set the "lastuse" time stamp. Make a helper for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:28:34 -08:00
Pavel Emelyanov	b1667609cd	[IPV4]: Remove bugus goto-s from ip_route_input_slow Both places look like if (err == XXX) goto yyy; done: while both yyy targets look like err = XXX; goto done; so this is ok to remove the above if-s. yyy labels are used in other places and are not removed. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:26:41 -08:00
Ilpo J�rvinen	fbd52eb2bd	[TCP]: Split SACK FRTO flag clearing (fixes FRTO corner case bug) In case we run out of mem when fragmenting, the clearing of FLAG_ONLY_ORIG_SACKED might get missed which then feeds FRTO with false information. Move clearing outside skb processing loop so that it will get executed even if the skb loop terminates prematurely due to out-of-mem. Besides, now the core of the loop truly deals with a single skb only, which also enables creation a more self-contained of tcp_sacktag_one later on. In addition, small reorganization of if branches was made. Signed-off-by: Ilpo J�rvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:24:19 -08:00
Ilpo J�rvinen	e49aa5d456	[TCP]: Add unlikely() to sacktag out-of-mem in fragment case Signed-off-by: Ilpo J�rvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:23:08 -08:00
Ilpo J�rvinen	c7caf8d3ed	[TCP]: Fix reord detection due to snd_una covered holes Fixes subtle bug like the one with fastpath_cnt_hint happening due to the way the GSO and hints interact. Because hints are not reset when just a GSOed skb is partially ACKed, there's no guarantee that the relevant part of the write queue is going to be processed in sacktag at all (skbs below snd_una) because fastpath hint can fast forward the entrypoint. This was also on the way of future reductions in sacktag's skb processing. Also future cleanups in sacktag can be made after this (in 2.6.25). This may make reordering update in tcp_try_undo_partial redundant but I'm not too sure so I left it there. Signed-off-by: Ilpo J�rvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:22:18 -08:00
Ilpo J�rvinen	8dd71c5d28	[TCP]: Consider GSO while counting reord in sacktag Reordering detection fails to take account that the reordered skb may have pcount larger than 1. In such case the lowest of them had the largest reordering, the old formula used the highest of them which is pcount - 1 packets less reordered. Signed-off-by: Ilpo J�rvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:20:59 -08:00
Vlad Yasevich	7d54dc6876	SCTP: Always flush the queue when uncorcking. When the code calls uncork, trigger a queue flush, even if the queue was not corked. Most callers that explicitely cork the queue will have additinal checks to see if they corked it. Callers who do not cork the queue expect packets to flow when they call uncork. The scneario that showcased this bug happend when we were not able to bundle DATA with outgoing COOKIE-ECHO. As a result the data just sat in the outqueue and did not get transmitted. The application expected a response, but nothing happened. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-09 11:43:41 -05:00
Vlad Yasevich	cd3ae8e615	SCTP: Fix PR-SCTP to deliver all the accumulated ordered chunks There is a small bug when we process a FWD-TSN. We'll deliver anything upto the current next expected SSN. However, if the next expected is already in the queue, it will take another chunk to trigger its delivery. The fix is to simply check the current queued SSN is the next expected one. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-09 11:43:41 -05:00
Vlad Yasevich	7ab9080467	SCTP: Make sctp_verify_param return multiple indications. SCTP-AUTH and future ADD-IP updates have a requirement to do additional verification of parameters and an ability to ABORT the association if verification fails. So, introduce additional return code so that we can clear signal a required action. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-09 11:43:41 -05:00
Vlad Yasevich	d970dbf845	SCTP: Convert custom hash lists to use hlist. Convert the custom hash list traversals to use hlist functions. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-09 11:43:40 -05:00
Vlad Yasevich	123ed979ea	SCTP: Use hashed lookup when looking for an association. A SCTP endpoint may have a lot of associations on them and walking the list is fairly inefficient. Instead, use a hashed lookup, and filter out the hash list based on the endopoing we already have. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-09 11:41:36 -05:00
Vlad Yasevich	027f6e1ad3	SCTP: Fix a potential race between timers and receive path. There is a possible race condition where the timer code will free the association and the next packet in the queue will also attempt to free the same association. The example is, when we receive an ABORT at about the same time as the retransmission timer fires. If the timer wins the race, it will free the association. Once it releases the lock, the queue processing will recieve the ABORT and will try to free the association again. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-07 11:39:27 -05:00
Vlad Yasevich	73d9c4fd1a	SCTP: Allow ADD_IP to work with AUTH for backward compatibility. This patch adds a tunable that will allow ADD_IP to work without AUTH for backward compatibility. The default value is off since the default value for ADD_IP is off as well. People who need to use ADD-IP with older implementations take risks of connection hijacking and should consider upgrading or turning this tunable on. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-07 11:39:27 -05:00
Vlad Yasevich	88799fe5ec	SCTP: Correctly disable ADD-IP when AUTH is not supported. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-07 11:39:27 -05:00
Vlad Yasevich	0ed90fb0f6	SCTP: Update RCU handling during the ADD-IP case After learning more about rcu, it looks like the ADD-IP hadling doesn't need to call call_rcu_bh. All the rcu critical sections use rcu_read_lock, so using call_rcu_bh is wrong here. Now, restore the local_bh_disable() code blocks and use normal call_rcu() calls. Also restore the missing return statement. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-07 11:39:27 -05:00
Vlad Yasevich	b6157d8e03	SCTP: Fix difference cases of retransmit. Commit `d0ce92910b` broke several retransmit cases including fast retransmit. The reason is that we should only delay by rto while doing retranmists as a result of a timeout. Retransmit as a result of path mtu discover, fast retransmit, or other evernts that should trigger immidiate retransmissions got broken. Also, since rto is doubled prior to marking of packets elegable for retransmission, we never marked correct chunks anyway. The fix is provide a reason for a given retransmission so that we can mark chunks appropriately and to save the old rto value to do comparisons against. All regressions tests passed with this code. Spotted by Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-07 11:39:27 -05:00
Wei Yongjun	f3830ccc2e	SCTP : Fix to process bundled ASCONF chunk correctly If ASCONF chunk is bundled with other chunks as the first chunk, when process the ASCONF parameters, full packet data will be process as the parameters of the ASCONF chunk, not only the real parameters. So if you send a ASCONF chunk bundled with other chunks, you will get an unexpect result. This problem also exists when ASCONF-ACK chunk is bundled with other chunks. This patch fix this problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-07 11:39:27 -05:00
Wei Yongjun	64b0812b6d	SCTP : Fix bad formatted comment in outqueue.c Just fix the bad format of the comment in outqueue.c. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>	2007-11-07 11:39:26 -05:00
Patrick McHardy	c3d8d1e30c	[NETLINK]: Fix unicast timeouts Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:12 -08:00
Eric Dumazet	230140cffa	[INET]: Remove per bucket rwlock in tcp/dccp ehash table. As done two years ago on IP route cache table (commit `22c047ccbc`) , we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle performance differences. (we hit a different cache line for the rwlock, but then the bucket cache line have a better sharing factor among cpus, since we dirty it less often). For netstat or ss commands that want a full scan of hash table, we perform fewer memory accesses. Using a 'small' table of hashed rwlocks should be more than enough to provide correct SMP concurrency between different buckets, without using too much memory. Sizing of this table depends on num_possible_cpus() and various CONFIG settings. This patch provides some locking abstraction that may ease a future work using a different model for TCP/DCCP table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:11 -08:00
Rumen G. Bogdanovski	efac52762b	[IPVS]: Synchronize closing of Connections This patch makes the master daemon to sync the connection when it is about to close. This makes the connections on the backup to close or timeout according their state. Before the sync was performed only if the connection is in ESTABLISHED state which always made the connections to timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value) effectively did nothing more than increasing this to 15 minutes (Established state timeout). So this patch makes use of proper timeout since it syncs the connections on status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout). However if the backup misses CLOSE hopefully it did not miss FIN_WAIT. Otherwise we will just have to wait for the ESTABLISHED state timeout. As it is without this patch. This way the number of the hanging connections on the backup is kept to minimum. And very few of them will be left to timeout with a long timeout. This is important if we want to make use of the fix for the real server overcommit on master/backup fail-over. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:10 -08:00
Rumen G. Bogdanovski	1e356f9cdf	[IPVS]: Bind connections on stanby if the destination exists This patch fixes the problem with node overload on director fail-over. Given the scenario: 2 nodes each accepting 3 connections at a time and 2 directors, director failover occurs when the nodes are fully loaded (6 connections to the cluster) in this case the new director will assign another 6 connections to the cluster, If the same real servers exist there. The problem turned to be in not binding the inherited connections to the real servers (destinations) on the backup director. Therefore: "ipvsadm -l" reports 0 connections: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 0 -> node484.local:5999 Route 1000 0 0 while "ipvs -lnc" is right root@test2:~# ipvsadm -lnc IPVS connection entries pro expire state source virtual destination TCP 14:56 ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999 TCP 14:59 ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999 So the patch I am sending fixes the problem by binding the received connections to the appropriate service on the backup director, if it exists, else the connection will be handled the old way. So if the master and the backup directors are synchronized in terms of real services there will be no problem with server over-committing since new connections will not be created on the nonexistent real services on the backup. However if the service is created later on the backup, the binding will be performed when the next connection update is received. With this patch the inherited connections will show as inactive on the backup: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 1 -> node484.local:5999 Route 1000 0 1 rumen@test2:~$ cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP C0A800DE:176F wlc -> C0A80033:176F Route 1000 0 1 -> C0A80032:176F Route 1000 0 1 Regards, Rumen Bogdanovski Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2007-11-07 04:15:09 -08:00
Pavel Emelyanov	b733c007ed	[NET]: Clean proto_(un)register from in-code ifdefs The struct proto has the per-cpu "inuse" counter, which is handled with a special care. All the handling code hides under the ifdef CONFIG_SMP and it introduces some code duplication and makes it look worse than it could. Clean this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:04 -08:00
Herbert Xu	4999f3621f	[IPSEC]: Fix crypto_alloc_comp error checking The function crypto_alloc_comp returns an errno instead of NULL to indicate error. So it needs to be tested with IS_ERR. This is based on a patch by Vicen� Beltran Querol. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:03 -08:00
Patrick McHardy	fffe470a80	[VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl Based on report and patch by Doug Kehn <rdkehn@yahoo.com>: vconfig returns the following error when attempting to execute the set_ingress_map command: vconfig: socket or ioctl error for set_ingress_map: Operation not permitted In vlan.c, vlan_ioctl_handler for SET_VLAN_INGRESS_PRIORITY_CMD sets err = -EPERM and calls vlan_dev_set_ingress_priority. vlan_dev_set_ingress_priority is a void function so err remains at -EPERM and results in the vconfig error (even though the ingress map was set). Fix by setting err = 0 after the vlan_dev_set_ingress_priority call. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:02 -08:00
Johann Felix Soden	45a19b0a72	[NETNS]: Fix compiler error in net_namespace.c Because net_free is called by copy_net_ns before its declaration, the compiler gives an error. This patch puts net_free before copy_net_ns to fix this. The compiler error: net/core/net_namespace.c: In function 'copy_net_ns': net/core/net_namespace.c:97: error: implicit declaration of function 'net_free' net/core/net_namespace.c: At top level: net/core/net_namespace.c:104: warning: conflicting types for 'net_free' net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here The error was introduced by the '[NET]: Hide the dead code in the net_namespace.c' patch (`6a1a3b9f68`). Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:02 -08:00
Radu Rendec	543821c6f5	[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks. While trying to implement u32 hashes in my shaping machine I ran into a possible bug in the u32 hash/bucket computing algorithm (net/sched/cls_u32.c). The problem occurs only with hash masks that extend over the octet boundary, on little endian machines (where htonl() actually does something). Let's say that I would like to use 0x3fc0 as the hash mask. This means 8 contiguous "1" bits starting at b6. With such a mask, the expected (and logical) behavior is to hash any address in, for instance, 192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in bucket 1, then 192.168.0.128/26 in bucket 2 and so on. This is exactly what would happen on a big endian machine, but on little endian machines, what would actually happen with current implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl() in the userspace tool and then applied to 192.168.x.x in the u32 classifier. When shifting right by 16 bits (rank of first "1" bit in the reversed mask) and applying the divisor mask (0xff for divisor 256), what would actually remain is 0x3f applied on the "168" octet of the address. One could say is this can be easily worked around by taking endianness into account in userspace and supplying an appropriate mask (0xfc03) that would be turned into contiguous "1" bits when reversed (0x03fc0000). But the actual problem is the network address (inside the packet) not being converted to host order, but used as a host-order value when computing the bucket. Let's say the network address is written as n31 n30 ... n0, with n0 being the least significant bit. When used directly (without any conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15 etc in the machine's registers. Thus bits n7 and n8 would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be consecutive. The fix is to apply ntohl() on the hmask before computing fshift, and in u32_hash_fold() convert the packet data to host order before shifting down by fshift. With helpful feedback from Jamal Hadi Salim and Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:11:45 -08:00
Jiri Olsa	40208d71e0	[NET]: Removing duplicit #includes Removing duplicit #includes for net/ Signed-off-by: Jiri Olsa <olsajiri@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:11:44 -08:00
Pavel Emelyanov	c3e9a353d8	[IPV4]: Compact some ifdefs in the fib code. There are places that check for CONFIG_IP_MULTIPLE_TABLES twice in the same file, but the internals of these #ifdefs can be merged. As a side effect - remove one ifdef from inside a function. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:11:41 -08:00
Alexey Dobriyan	33120b30cc	[IPV6]: Convert /proc/net/ipv6_route to seq_file interface This removes last proc_net_create() user. Kudos to Benjamin Thery and Stephen Hemminger for comments on previous version. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:18 -08:00
Evgeniy Polyakov	4f9f8311a0	[PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline tecl_reset() is called from deactivate and qdisc is set to noop already, but subsequent teql_xmit does not know about it and dereference private data as teql qdisc and thus oopses. not catch it first :) Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:17 -08:00
David S. Miller	c62cf5cb17	[DCCP]: Use DEFINE_PROTO_INUSE infrastructure. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:01 -08:00
Eric Dumazet	8295b6d9e6	[SCTP]: Use the {DEFINE\|REF}_PROTO_INUSE infrastructure Trivial patch to make "sctcp,sctpv6" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:00 -08:00
Eric Dumazet	c5a432f1a1	[IPV6]: Use the {DEFINE\|REF}_PROTO_INUSE infrastructure Trivial patch to make "tcpv6,udpv6,udplitev6,rawv6" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:59 -08:00
Eric Dumazet	47a31a6ffc	[IPV4]: Use the {DEFINE\|REF}_PROTO_INUSE infrastructure Trivial patch to make "tcp,udp,udplite,raw" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:58 -08:00
Eric Dumazet	286ab3d460	[NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way. "struct proto" currently uses an array stats[NR_CPUS] to track change on 'inuse' sockets per protocol. If NR_CPUS is big, this means we use a big memory area for this. Moreover, all this memory area is located on a single node on NUMA machines, increasing memory pressure on the boot node. In this patch, I tried to : - Keep a fast !CONFIG_SMP implementation - Keep a fast CONFIG_SMP implementation for often used protocols (tcp,udp,raw,...) - Introduce a NUMA efficient implementation Some helper macros are defined in include/net/sock.h These macros take into account CONFIG_SMP If a "struct proto" is declared without using DEFINE_PROTO_INUSE / REF_PROTO_INUSE macros, it will automatically use a default implementation, using a dynamically allocated percpu zone. This default implementation will be NUMA efficient, but might use 32/64 bytes per possible cpu because of current alloc_percpu() implementation. However it still should be better than previous implementation based on stats[NR_CPUS] field. When a "struct proto" is changed to use the new macros, we use a single static "int" percpu variable, lowering the memory and cpu costs, still preserving NUMA efficiency. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:57 -08:00
Pavel Emelyanov	6a9fb9479f	[IPV4]: Clean the ip_sockglue.c from some ugly ifdefs The #idfed CONFIG_IP_MROUTE is sometimes places inside the if-s, which looks completely bad. Similar ifdefs inside the functions looks a bit better, but they are also not recommended to be used. Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:55 -08:00
Alexey Dobriyan	4e058063f4	[DECNET]: "addr" module param can't be __initdata sysfs keeps references to module parameters via /sys/module//parameters, so marking them as __initdata can't work. Steps to reproduce: modprobe decnet cat /sys/module/decnet/parameters/addr BUG: unable to handle kernel paging request at virtual address f88cd410 printing eip: c043dfd1 pdpt = 0000000000004001 pde = 0000000004408067 pte = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore Pid: 2099, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #6) EIP: 0060:[<c043dfd1>] EFLAGS: 00210286 CPU: 1 EIP is at param_get_int+0x6/0x20 EAX: c5c87000 EBX: 00000000 ECX: 000080d0 EDX: f88cd410 ESI: f8a108f8 EDI: c5c87000 EBP: 00000000 ESP: c5c97f00 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000) Stack: 00000000 f8a108f8 c5c87000 c043db6b f8a108f1 00000124 c043de1a c043db2f f88cd410 ffffffff c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 c5dd5078 c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 00001000 Call Trace: [<c043db6b>] param_array_get+0x3c/0x62 [<c043de1a>] param_array_set+0x0/0xdf [<c043db2f>] param_array_get+0x0/0x62 [<c043dd69>] param_attr_show+0x15/0x2d [<c043dd54>] param_attr_show+0x0/0x2d [<c043dbc8>] module_attr_show+0x1a/0x1e [<c04c431f>] sysfs_read_file+0x7c/0xd9 [<c04c42a3>] sysfs_read_file+0x0/0xd9 [<c048d4b2>] vfs_read+0x88/0x134 [<c042090b>] do_page_fault+0x0/0x7d5 [<c048d920>] sys_read+0x41/0x67 [<c04080fa>] sysenter_past_esp+0x6b/0xc1 ======================= Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 <8b> 12 c7 44 24 04 58 8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c EIP: [<c043dfd1>] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00 Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:55 -08:00
Mitsuru Chinen	7a0ff716c2	[IPv6] SNMP: Restore Udp6InErrors incrementation As the checksum verification is postponed till user calls recv or poll, the inrementation of Udp6InErrors counter should be also postponed. Currently, it is postponed in non-blocking operation case. However it should be postponed in all case like the IPv4 code. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:54 -08:00
Alexey Dobriyan	3f192b5c58	[NET]: Remove /proc/net/stat/_arp_cache upon module removal neigh_table_init_no_netlink() creates them, but they aren't removed anywhere. Steps to reproduce: modprobe clip rmmod clip cat /proc/net/stat/clip_arp_cache BUG: unable to handle kernel paging request at virtual address f89d7758 printing eip: c05a99da pdpt = 0000000000004001 pde = 0000000004408067 pte = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: atm af_packet ipv6 binfmt_misc sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos rtc_core rtc_lib serio_raw button k8temp hwmon amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore Pid: 2082, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #4) EIP: 0060:[<c05a99da>] EFLAGS: 00210256 CPU: 0 EIP is at neigh_stat_seq_next+0x26/0x3f EAX: 00000001 EBX: f89d7600 ECX: c587bf40 EDX: 00000000 ESI: 00000000 EDI: 00000001 EBP: 00000400 ESP: c587bf1c DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 2082, ti=c587b000 task=c5984e10 task.ti=c587b000) Stack: c06228cc c5313790 c049e5c0 0804f000 c45a7b00 c53137b0 00000000 00000000 00000082 00000001 00000000 00000000 00000000 fffffffb c58d6780 c049e437 c45a7b00 c04b1f93 c587bfa0 00000400 0804f000 00000400 0804f000 c04b1f2f Call Trace: [<c049e5c0>] seq_read+0x189/0x281 [<c049e437>] seq_read+0x0/0x281 [<c04b1f93>] proc_reg_read+0x64/0x77 [<c04b1f2f>] proc_reg_read+0x0/0x77 [<c048907e>] vfs_read+0x80/0xd1 [<c0489491>] sys_read+0x41/0x67 [<c04080fa>] sysenter_past_esp+0x6b/0xc1 ======================= Code: e9 ec 8d 05 00 56 8b 11 53 8b 40 70 8b 58 3c eb 29 0f a3 15 80 91 7b c0 19 c0 85 c0 8d 42 01 74 17 89 c6 c1 fe 1f 89 01 89 71 04 <8b> 83 58 01 00 00 f7 d0 8b 04 90 eb 09 89 c2 83 fa 01 7e d2 31 EIP: [<c05a99da>] neigh_stat_seq_next+0x26/0x3f SS:ESP 0068:c587bf1c Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:53 -08:00
Pavel Emelyanov	bf138862b1	[IPV6]: Consolidate the ip cork destruction in ip6_output.c The ip6_push_pending_frames and ip6_flush_pending_frames do the same things to flush the sock's cork. Move this into a separate function and save ~100 bytes from the .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:26 -08:00
Pavel Emelyanov	429f08e950	[IPV4]: Consolidate the ip cork destruction in ip_output.c The ip_push_pending_frames and ip_flush_pending_frames do the same things to flush the sock's cork. Move this into a separate function and save ~80 bytes from the .text Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:25 -08:00
Bart De Schuymer	e011ff48ab	[NETFILTER]: ebt_arp: fix --arp-gratuitous matching dependence on --arp-ip-{src,dst} Fix --arp-gratuitous matching dependence on --arp-ip-{src,dst} Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Lutz Pre�ler <Lutz.Pressler@SerNet.DE> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:25 -08:00
Alexey Dobriyan	55d84acd36	[NETFILTER]: nf_sockopts list head cleanup Code is using knowledge that nf_sockopt_ops::list list_head is first field in structure by using casts. Switch to list_for_each_entry() itetators while I am at it. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:24 -08:00
Patrick McHardy	d1332e0ab8	[NETFILTER]: remove unneeded rcu_dereference() calls As noticed by Paul McKenney, the rcu_dereference calls in the init path of NAT modules are unneeded, remove them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:23 -08:00
Jan Engelhardt	0795c65d9f	[NETFILTER]: Clean up Makefile Sort matches and targets in the NF makefiles. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:22 -08:00
Jan Engelhardt	ba5dc2756c	[NETFILTER]: Copyright/Email update Transfer all my copyright over to our company. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:20 -08:00
Alexey Dobriyan	7351a22a3a	[NETFILTER]: ip{,6}_queue: convert to seq_file interface I plan to kill ->get_info which means killing proc_net_create(). Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:08:20 -08:00
Latchesar Ionkov	55762690e2	9p: add missing end-of-options record for trans_fd The list of options that the fd transport accepts is missing end-of-options marker. This patch adds it. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>	2007-11-06 08:02:53 -06:00
Latchesar Ionkov	dd1a458412	9p: return NULL when trans not found v9fs_match_trans function returns arbitrary transport module instead of NULL when the requested transport is not registered. This patch modifies the function to return NULL in that case. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>	2007-11-06 08:02:53 -06:00
Jens Axboe	c46f2334c8	[SG] Get rid of __sg_mark_end() sg_mark_end() overwrites the page_link information, but all users want __sg_mark_end() behaviour where we just set the end bit. That is the most natural way to use the sg list, since you'll fill it in and then mark the end point. So change sg_mark_end() to only set the termination bit. Add a sg_magic debug check as well, and clear a chain pointer if it is set. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:06 +01:00
Adrian Bunk	87ae9afdca	cleanup asm/scatterlist.h includes Not architecture specific code should not #include <asm/scatterlist.h>. This patch therefore either replaces them with #include <linux/scatterlist.h> or simply removes them if they were unused. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-02 08:47:06 +01:00
David S. Miller	49259d34c5	[IRDA] IRNET: Fix build when TCGETS2 is defined. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 02:26:38 -07:00
Stephen Hemminger	3b582cc14c	[NET]: docbook fixes for netif_ functions Documentation updates for network interfaces. 1. Add doc for netif_napi_add 2. Remove doc for unused returns from netif_rx 3. Add doc for netif_receive_skb [ Incorporated minor mods from Randy Dunlap -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 02:21:47 -07:00
Pavel Emelyanov	d57a9212e0	[NET]: Hide the net_ns kmem cache This cache is only required to create new namespaces, but we won't have them in CONFIG_NET_NS=n case. Hide it under the appropriate ifdef. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:46:50 -07:00
Pavel Emelyanov	1a2ee93d28	[NET]: Mark the setup_net as __net_init The setup_net is called for the init net namespace only (int the CONFIG_NET_NS=n of course) from the __init function, so mark it as __net_init to disappear with the caller after the boot. Yet again, in the perfect world this has to be under #ifdef CONFIG_NET_NS, but it isn't guaranteed that every subsystem is registered after the init_net_ns is set up. After we are sure, that we don't start registering them before the init net setup, we'll be able to move this code under the ifdef. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:45:59 -07:00
Pavel Emelyanov	6a1a3b9f68	[NET]: Hide the dead code in the net_namespace.c The namespace creation/destruction code is never called if the CONFIG_NET_NS is n, so it's OK to move it under appropriate ifdef. The copy_net_ns() in the "n" case checks for flags and returns -EINVAL when new net ns is requested. In a perfect world this stub must be in net_namespace.h, but this function need to know the CLONE_NEWNET value and thus requires sched.h. On the other hand this header is to be injected into almost every .c file in the networking code, and making all this code depend on the sched.h is a suicidal attempt. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:44:50 -07:00
Pavel Emelyanov	1dba323b3f	[NETNS]: Make the init/exit hooks checks outside the loop When the new pernet something (subsys, device or operations) is being registered, the init callback is to be called for each namespace, that currently exitst in the system. During the unregister, the same is to be done with the exit callback. However, not every pernet something has both calls, but the check for the appropriate pointer to be not NULL is performed inside the for_each_net() loop. This is (at least) strange, so tune this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:42:43 -07:00
Pavel Emelyanov	6257ff2177	[NET]: Forget the zero_it argument of sk_alloc() Finally, the zero_it argument can be completely removed from the callers and from the function prototype. Besides, fix the checkpatch.pl warnings about using the assignments inside if-s. This patch is rather big, and it is a part of the previous one. I splitted it wishing to make the patches more readable. Hope this particular split helped. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:39:31 -07:00
Pavel Emelyanov	154adbc846	[NET]: Remove bogus zero_it argument from sk_alloc At this point nobody calls the sk_alloc(() with zero_it == 0, so remove unneeded checks from it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:38:43 -07:00
Pavel Emelyanov	8fd1d178a3	[NET]: Make the sk_clone() lighter The sk_prot_alloc() already performs all the stuff needed by the sk_clone(). Besides, the sk_prot_alloc() requires almost twice less arguments than the sk_alloc() does, so call the sk_prot_alloc() saving the stack a bit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:37:32 -07:00
Pavel Emelyanov	2e4afe7b35	[NET]: Move some core sock setup into sk_prot_alloc The security_sk_alloc() and the module_get is a part of the object allocations - move it in the proper place. Note, that since we do not reset the newly allocated sock in the sk_alloc() (memset() is removed with the previous patch) we can safely do this. Also fix the error path in sk_prot_alloc() - release the security context if needed. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:36:26 -07:00
Pavel Emelyanov	3f0666ee30	[NET]: Auto-zero the allocated sock object We have a __GFP_ZERO flag that allocates a zeroed chunk of memory. Use it in the sk_alloc() and avoid a hand-made memset(). This is a temporary patch that will help us in the nearest future :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:34:42 -07:00
Pavel Emelyanov	c308c1b20e	[NET]: Cleanup the allocation/freeing of the sock object The sock object is allocated either from the generic cache with the kmalloc, or from the proc->slab cache. Move this logic into an isolated set of helpers and make the sk_alloc/sk_free look a bit nicer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:33:50 -07:00
Pavel Emelyanov	1e2e6b89f1	[NET]: Move the get_net() from sock_copy() The sock_copy() is supposed to just clone the socket. In a perfect world it has to be just memcpy, but we have to handle the security mark correctly. All the extra setup must be performed in sk_clone() call, so move the get_net() into more proper place. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:31:26 -07:00
Pavel Emelyanov	f1a6c4da14	[NET]: Move the sock_copy() from the header The sock_copy() call is not used outside the sock.c file, so just move it into a sock.c Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:29:45 -07:00
Ilpo J�rvinen	261ab365fa	[TCP]: Another TAGBITS -> SACKED_ACKED\|LOST conversion Similar to commit `3eec0047d9`, point of this is to avoid skipping R-bit skbs. Signed-off-by: Ilpo J�rvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:10:18 -07:00
Ilpo J�rvinen	e56d6cd605	[TCP]: Process DSACKs that reside within a SACK block DSACK inside another SACK block were missed if start_seq of DSACK was larger than SACK block's because sorting prioritizes full processing of the SACK block before DSACK. After SACK block sorting situation is like this: SSSSSSSSS D SSSSSS SSSSSSS Because write_queue is walked in-order, when the first SACK block has been processed, TCP is already past the skb for which the DSACK arrived and we haven't taught it to backtrack (nor should we), so TCP just continues processing by going to the next SACK block after the DSACK (if any). Whenever such DSACK is present, do an embedded checking during the previous SACK block. If the DSACK is below snd_una, there won't be overlapping SACK block, and thus no problem in that case. Also if start_seq of the DSACK is equal to the actual block, it will be processed first. Tested this by using netem to duplicate 15% of packets, and by printing SACK block when found_dup_sack is true and the selected skb in the dup_sack = 1 branch (if taken): SACK block 0: 4344-5792 (relative to snd_una 2019137317) SACK block 1: 4344-5792 (relative to snd_una 2019137317) equal start seqnos => next_dup = 0, dup_sack = 1 won't occur... SACK block 0: 5792-7240 (relative to snd_una 2019214061) SACK block 1: 2896-7240 (relative to snd_una 2019214061) DSACK skb match 5792-7240 (relative to snd_una) ...and next_dup = 1 case (after the not shown start_seq sort), went to dup_sack = 1 branch. Signed-off-by: Ilpo J�rvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:09:37 -07:00
Stephen Rothwell	298bb62175	[AF_KEY]: suppress a warning for 64k pages. On PowerPC allmodconfig build we get this: net/key/af_key.c:400: warning: comparison is always false due to limited range of data type Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 23:57:05 -07:00
David S. Miller	51c739d1f4	[NET]: Fix incorrect sg_mark_end() calls. This fixes scatterlist corruptions added by commit `68e3f5dd4d` [CRYPTO] users: Fix up scatterlist conversion errors The issue is that the code calls sg_mark_end() which clobbers the sg_page() pointer of the final scatterlist entry. The first part fo the fix makes skb_to_sgvec() do __sg_mark_end(). After considering all skb_to_sgvec() call sites the most correct solution is to call __sg_mark_end() in skb_to_sgvec() since that is what all of the callers would end up doing anyways. I suspect this might have fixed some problems in virtio_net which is the sole non-crypto user of skb_to_sgvec(). Other similar sg_mark_end() cases were converted over to __sg_mark_end() as well. Arguably sg_mark_end() is a poorly named function because it doesn't just "mark", it clears out the page pointer as a side effect, which is what led to these bugs in the first place. The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable() and arguably it could be converted to __sg_mark_end() if only so that we can delete this confusing interface from linux/scatterlist.h Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:29:29 -07:00
Alexey Dobriyan	07afa04025	[IPVS]: Remove /proc/net/ip_vs_lblcr It's under CONFIG_IP_VS_LBLCR_DEBUG option which never existed. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:16:27 -07:00
Daniel Lezcano	1675c7b254	[IPV6]: remove duplicate call to proc_net_remove The file /proc/net/if_inet6 is removed twice. First time in: inet6_exit ->addrconf_cleanup And followed a few lines after by: inet6_exit -> if6_proc_exit Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:16:24 -07:00
Daniel Lezcano	310928d963	[NETNS]: fix net released by rcu callback When a network namespace reference is held by a network subsystem, and when this reference is decremented in a rcu update callback, we must ensure that there is no more outstanding rcu update before trying to free the network namespace. In the normal case, the rcu_barrier is called when the network namespace is exiting in the cleanup_net function. But when a network namespace creation fails, and the subsystems are undone (like the cleanup), the rcu_barrier is missing. This patch adds the missing rcu_barrier. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:16:21 -07:00
Daniel Lezcano	93ee31f14f	[NET]: Fix free_netdev on register_netdev failure. Point 1: The unregistering of a network device schedule a netdev_run_todo. This function calls dev->destructor when it is set and the destructor calls free_netdev. Point 2: In the case of an initialization of a network device the usual code is: * alloc_netdev * register_netdev -> if this one fails, call free_netdev and exit with error. Point 3: In the register_netdevice function at the later state, when the device is at the registered state, a call to the netdevice_notifiers is made. If one of the notification falls into an error, a rollback to the registered state is done using unregister_netdevice. Conclusion: When a network device fails to register during initialization because one network subsystem returned an error during a notification call chain, the network device is freed twice because of fact 1 and fact 2. The second free_netdev will be done with an invalid pointer. Proposed solution: The following patch move all the code of unregister_netdevice except the call to net_set_todo, to a new function "rollback_registered". The following functions are changed in this way: * register_netdevice: calls rollback_registered when a notification fails * unregister_netdevice: calls rollback_register + net_set_todo, the call order to net_set_todo is changed because it is the latest now. Since it justs add an element to a list that should not break anything. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 21:16:18 -07:00
Dirk Hohndel	e403149c92	Kbuild/doc: fix links to Documentation files Fix links to files in Documentation/* in various Kconfig files Signed-off-by: Dirk Hohndel <hohndel@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-30 14:26:30 -07:00
J. Bruce Fields	521c2a43b2	[SUNRPC]: fix rpc debugging Commit `baa3a2a0d2`, by removing initialization of the ctl_name field, broke this conditional, preventing the display of rpc_tasks that you previously got when turning on rpc debugging. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 01:07:15 -07:00
Jean Delvare	0ccfe61803	[TCP]: Saner thash_entries default with much memory. On systems with a very large amount of memory, the heuristics in alloc_large_system_hash() result in a very large TCP established hash table: 16 millions of entries for a 128 GB ia64 system. This makes reading from /proc/net/tcp pretty slow (well over a second) and as a result netstat is slow on these machines. I know that /proc/net/tcp is deprecated in favor of tcp_diag, however at the moment netstat only knows of the former. I am skeptical that such a large TCP established hash is often needed. Just because a system has a lot of memory doesn't imply that it will have several millions of concurrent TCP connections. Thus I believe that we should put an arbitrary high limit to the size of the TCP established hash by default. Users who really need a bigger hash can always use the thash_entries boot parameter to get more. I propose 2 millions of entries as the arbitrary high limit. This makes /proc/net/tcp reasonably fast on the system in question (0.2 s) while being still large enough for me to be confident that network performance won't suffer. This is just one way to limit the hash size, there are others; I am not familiar enough with the TCP code to decide which is best. Thus, I would welcome the proposals of alternatives. [ 2 million is still too large, thus I've modified the limit in the change to be '512 * 1024'. -DaveM ] Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 00:59:25 -07:00
Stephen Rothwell	e08a132b0e	[SUNRPC] rpc_rdma: we need to cast u64 to unsigned long long for printing as some architectures have unsigned long for u64. net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_create_chunks': net/sunrpc/xprtrdma/rpc_rdma.c:222: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/sunrpc/xprtrdma/rpc_rdma.c:234: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_count_chunks': net/sunrpc/xprtrdma/rpc_rdma.c:577: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64 Noticed on PowerPC pseries_defconfig build. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-30 00:44:32 -07:00
Mitsuru Chinen	064f3605be	[IPv4] SNMP: Refer correct memory location to display ICMP out-going statistics While displaying ICMP out-going statistics as Out<name> counters in /proc/net/snmp, the memory location for ICMP in-coming statistics was referred by mistake. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:36 -07:00
David S. Miller	bf3c23d171	[NET]: Fix error reporting in sys_socketpair(). If either of the two sock_alloc_fd() calls fail, we forget to update 'err' and thus we'll erroneously return zero in these cases. Based upon a report and patch from Rich Paul, and commentary from Chuck Ebbert. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:34 -07:00
Andrew Morton	29b67497f2	[NETFILTER]: nf_ct_alloc_hashtable(): use __GFP_NOWARN This allocation is expected to fail and we handle it by fallback to vmalloc(). So don't scare people with nasty messages like http://bugzilla.kernel.org/show_bug.cgi?id=9190 Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:31 -07:00
David S. Miller	0a7606c121	[NET]: Fix race between poll_napi() and net_rx_action() netpoll_poll_lock() synchronizes the ->poll() invocation code paths, but once we have the lock we have to make sure that NAPI_STATE_SCHED is still set. Otherwise we get: cpu 0 cpu 1 net_rx_action() poll_napi() netpoll_poll_lock() ... spin on ->poll_lock ->poll() netif_rx_complete netpoll_poll_unlock() acquire ->poll_lock() ->poll() netif_rx_complete() CRASH Based upon a bug report from Tina Yang. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:28 -07:00
Matthias M. Dellweg	b0a713e9e6	[TCP] MD5: Remove some more unnecessary casting. while reviewing the tcp_md5-related code further i came across with another two of these casts which you probably have missed. I don't actually think that they impose a problem by now, but as you said we should remove them. Signed-off-by: Matthias M. Dellweg <2500@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:27 -07:00
Xiaoliang (David) Wei	c940587bf6	[TCP] vegas: Fix a bug in disabling slow start by gamma parameter. TCP Vegas implementation has a bug in the process of disabling slow-start with gamma parameter. The bug may lead to extreme unfairness in the presence of early packet loss. See details in: http://www.cs.caltech.edu/~weixl/technical/ns2linux/known_linux/index.html#vegas Switch the order of "if (tp->snd_cwnd <= tp->snd_ssthresh)" statement and "if (diff > gamma)" statement to eliminate the problem. Signed-off-by: Xiaoliang (David) Wei <davidwei79@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:25 -07:00
Andy Gospodarek	5c81833c2f	[IPVS]: use proper timeout instead of fixed value Instead of using the default timeout of 3 minutes, this uses the timeout specific to the protocol used for the connection. The 3 minute timeout seems somewhat arbitrary (though I know it is used other places in the ipvs code) and when failing over it would be much nicer to use one of the configured timeout values. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:23 -07:00
YOSHIFUJI Hideaki	ad02ac145d	[IPV6] NDISC: Fix setting base_reachable_time_ms variable. This bug was introduced by the commit `d12af679bc` (sysctl: fix neighbour table sysctls). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-29 22:37:22 -07:00
Al Viro	d06f608265	SCTP endianness annotations regression Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-29 07:41:32 -07:00
Al Viro	2d8a972661	SUNRPC endianness annotations rpcrdma stuff lacks endianness annotations for on-the-wire data. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-29 07:41:32 -07:00
Herbert Xu	68e3f5dd4d	[CRYPTO] users: Fix up scatterlist conversion errors This patch fixes the errors made in the users of the crypto layer during the sg_init_table conversion. It also adds a few conversions that were missing altogether. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-27 00:52:07 -07:00
Eric W. Biederman	ceaa79c434	[NETNS]: Fix get_net_ns_by_pid The pid namespace patches changed the semantics of find_task_by_pid without breaking the compile resulting in get_net_ns_by_pid doing the wrong thing. So switch to using the intended find_task_by_vpid. Combined with Denis' earlier patch to make netlink traffic fully synchronous the inadvertent race I introduced with accessing current is actually removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:56:12 -07:00
Eric W. Biederman	2b008b0a8e	[NET]: Marking struct pernet_operations __net_initdata was inappropriate It is not safe to to place struct pernet_operations in a special section. We need struct pernet_operations to last until we call unregister_pernet_subsys. Which doesn't happen until module unload. So marking struct pernet_operations is a disaster for modules in two ways. - We discard it before we call the exit method it points to. - Because I keep struct pernet_operations on a linked list discarding it for compiled in code removes elements in the middle of a linked list and does horrible things for linked insert. So this looks safe assuming __exit_refok is not discarded for modules. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:54:53 -07:00
Adrian Bunk	72998d8c84	[INET] ESP: Must #include <linux/scatterlist.h> This patch fixes the following compile errors in some configurations: <-- snip --> ... CC net/ipv4/esp4.o /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c: In function 'esp_output': /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv4/esp4.c:113: error: implicit declaration of function 'sg_init_table' make[3]: * [net/ipv4/esp4.o] Error 1 ... /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c: In function 'esp6_output': /home/bunk/linux/kernel-2.6/git/linux-2.6/net/ipv6/esp6.c:112: error: implicit declaration of function 'sg_init_table' make[3]: * [net/ipv6/esp6.o] Error 1 <-- snip --> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:53:58 -07:00
Jeff Garzik	18134bed02	[TCP] IPV6: fix softnet build breakage net/ipv6/tcp_ipv6.c: In function 'tcp_v6_rcv': net/ipv6/tcp_ipv6.c:1736: error: implicit declaration of function 'get_softnet_dma' net/ipv6/tcp_ipv6.c:1736: warning: assignment makes pointer from integer without a cast Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:53:14 -07:00
Paul Moore	4be2700fb7	[NetLabel]: correct usage of RCU locking This fixes some awkward, and perhaps even problematic, RCU lock usage in the NetLabel code as well as some other related trivial cleanups found when looking through the RCU locking. Most of the changes involve removing the redundant RCU read locks wrapping spinlocks in the case of a RCU writer. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 04:29:08 -07:00
Ryousei Takano	94d3b1e586	[TCP]: fix D-SACK cwnd handling In the current net-2.6 kernel, handling FLAG_DSACKING_ACK is broken. The flag is cleared to 1 just after FLAG_DSACKING_ACK is set. if (found_dup_sack) flag \|= FLAG_DSACKING_ACK; : flag = 1; To fix it, this patch introduces a part of the tcp_sacktag_state patch: http://marc.info/?l=linux-netdev&m=119210560431519&w=2 Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 04:27:59 -07:00

1 2 3 4 5 ...

6320 commits