kernel-fxtec-pro1x

Author	SHA1	Message	Date
Sven Schnelle	338e8a7926	[NETFILTER]: x_tables: add TCPOPTSTRIP target Signed-off-by: Sven Schnelle <svens@bitebene.org> Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:51 -08:00
Patrick McHardy	6ac552fdc6	[NETLINK]: af_netlink.c checkpatch cleanups Fix large number of checkpatch errors. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:50 -08:00
Herbert Xu	2fcb45b6b8	[IPSEC]: Use the correct family for input state lookup When merging the input paths of IPsec I accidentally left a hard-coded AF_INET for the state lookup call. This broke IPv6 obviously. This patch fixes by getting the input callers to specify the family through skb->cb. Credit goes to Kazunori Miyazawa for diagnosing this and providing an initial patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:49 -08:00
Wang Chen	bbca17680f	[UDP]: Counter increment should be in USER mode for recvmsg System calls should be USER. So change the BH to USER for UDP*_INC_STATS_BH(). Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:49 -08:00
Wang Chen	b2bf1e2659	[UDP]: Clean up for IS_UDPLITE macro Since we have macro IS_UDPLITE, we can use it. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:48 -08:00
Wang Chen	cb75994ec3	[UDP]: Defer InDataGrams increment until recvmsg() does checksum Thanks dave, herbert, gerrit, andi and other people for your discussion about this problem. UdpInDatagrams can be confusing because it counts packets that might be dropped later. Move UdpInDatagrams into recvmsg() as allowed by the RFC. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:47 -08:00
Ilpo Järvinen	6859d49475	[TCP]: Abstract tp->highest_sack accessing & point to next skb Pointing to the next skb is necessary to avoid referencing already SACKed skbs which will soon be on a separate list. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:46 -08:00
Ilpo Järvinen	7201883599	[TCP]: Cleanup local variables of clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:46 -08:00
Ilpo Järvinen	ea60658cde	[TCP]: Add unlikely() to urgent handling in clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:45 -08:00
Ilpo Järvinen	89d478f7f2	[TCP]: Remove duplicated code block from clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:44 -08:00
Ilpo Järvinen	234b686070	[TCP]: Add tcp_for_write_queue_from_safe and use it in mtu_probe Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:43 -08:00
Ilpo Järvinen	d67c58e9ae	[TCP]: Remove local variable and use packets_in_flight directly Lines won't be that long and it's compiler's job to optimize them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:43 -08:00
Ilpo Järvinen	50c4817e99	[TCP]: MTUprobe: prepare skb fields earlier They better be valid when call to write_queue functions is made once things that follow are going in. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:42 -08:00
Ilpo Järvinen	c3a05c6050	[TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:41 -08:00
Ilpo Järvinen	ede9f3b186	[TCP]: Unite identical code from two seqno split blocks Bogus seqno compares just mislead, the code is identical for both sides of the seqno compare (and was even executed just once because of return in between). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:41 -08:00
Ilpo Järvinen	407ef1de03	[TCP]: Remove superflucious FLAG_DATA_SACKED To get there, highest_sack must have advanced. When it advances, a new skb is SACKed, which already sets that FLAG. Besides, the original purpose of it has puzzled me, never understood why LOST bit setting of retransmitted skb is marked with FLAG_DATA_SACKED. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:40 -08:00
Ilpo Järvinen	bce392f3b0	[TCP]: Move LOSTRETRANS MIB outside !(L|S) check Usually those skbs will have L set, not counting them as lost retransmissions is misleading. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:39 -08:00
Pavel Emelyanov	1dab62226d	[IPV6]: Use ctl paths to register addrconf sysctls This looks very much like the patch for ipv4's devinet. This is also intended to help us with the net namespaces and saves the ipv6.ko size by ~320 bytes. The difference from the first version is just the patch offsets, that changed due to changes in the patch #2. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:38 -08:00
Pavel Emelyanov	f52295a9c5	[IPV6]: Unify and cleanup calls to addrconf_sysctl_register Currently this call is (ab)used similar to devinet one - it registers sysctls for devices and for the "default" confs, while the "all" sysctls are registered separately. But unlike its devinet brother, the passed inet6_device is needed. The fix is to make a __addrconf_sysctl_register(), which registers sysctls for all "devices" we need, including "default" and "all" :) The original addrconf_sysctl_register() calls the introduced function, passing the inet6_device, device name and ifindex (to be used as procname and ctl_name) into it. Thanks to Herbert again for pointing out, that we can shrink the argument list to 1 :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:38 -08:00
Pavel Emelyanov	bfada697bd	[IPV4]: Use ctl paths to register devinet sysctls This looks very much like the patch for neighbors. The path is also located on the stack and is prepared inside the function. This time, the call to the registering function is guarded with the RTNL lock, but I decided to keep it on the stack not to litter the devinet.c file with unneeded names and to make it look similar to the neighbors code. This is also intended to help us with the net namespaces and saves the vmlinux size as well - this time by more than 670 bytes. The difference from the first version is just the patch offsets, that changed due to changes in the patch #2. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:37 -08:00
Pavel Emelyanov	66f27a5203	[IPV4]: Unify and cleanup calls to devinet_sysctl_register Currently this call is used to register sysctls for devices and for the "default" confs. The "all" sysctls are registered separately. Besides, the inet_device is passed to this function, but it is not needed there at all - just the device name and ifindex are required. Thanks to Herbert, who noticed, that this call doesn't even require the devconf pointer (the last argument) - all we need we can take from the in_device itself. The fix is to make a __devinet_sysctl_register(), which registers sysctls for all "devices" we need, including "default" and "all" :) The original devinet_sysctl_register() works with struct net_device, not the inet_device, and calls the introduced function, passing the device name and ifindex (to be used as procname and ctl_name) into it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:36 -08:00
John W. Linville	edae58ead5	softmac: mark as obsolete and schedule for removal Schedule softmac for for removal in the 2.6.26 development window. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:36 -08:00
John W. Linville	88fecd092e	mac80211: remove "bcn_int" and "capab" scan results info These bits were dead code before "mac80211: Remove local->scan_flags" (commit 6681dd3fd0e4d36a4547415853e83411baa7b705) and probably should have been removed as part of that commit. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:34 -08:00
Ron Rindjunsky	64bd4b693f	mac80211: move A-MSDU identifier to flags This patch moves u8 amsdu_frame in ieee80211_txrx_data to the flags section as IEEE80211_TXRXD_RX_AMSDU Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:34 -08:00
Ron Rindjunsky	d3c990fb26	mac80211: adding 802.11n configuration flows This patch configures the 802.11n mode of operation internally in ieee80211_conf structure and in the low-level driver as well (through op conf_ht). It does not include AP configuration flows. Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:33 -08:00
Ron Rindjunsky	fd4c7f2fce	mac80211: adding 802.11n essential A-MSDU Rx capability This patch adds the ability to receive and handle A-MSDU frames. Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:32 -08:00
Ron Rindjunsky	9f985b0eee	mac80211: adding 802.11n essential A-MPDU addBA capability This patch adds the capability to identify and answer an add block ACK request. As this series of patches only adds HT handling with no aggregations, (A-MPDU aggregations acceptance is not obligatory according to 802.11n draft) we are currently sending back a refusal upon this request. Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:31 -08:00
Ron Rindjunsky	c715350828	mac80211: adding 802.11n IEs handling This patch presents the ability to parse and compose HT IEs, and to put the IE relevant data inside the mac80211's internal HT structures Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:31 -08:00
Ron Rindjunsky	10816d40f2	mac80211: adding 802.11n HT framework definitions New structures: - ieee80211_ht_info: describing STA's HT capabilities - ieee80211_ht_bss_info: describing BSS's HT characteristics Changed structures: - ieee80211_hw_mode: now also holds PHY HT capabilities for each HW mode - ieee80211_conf: ht_conf holds current self HT configuration ht_bss_conf holds current BSS HT configuration - flag IEEE80211_CONF_SUPPORT_HT_MODE added to indicate if HT use is desired - sta_info: now also holds Peer's HT capabilities Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:30 -08:00
Ron Rindjunsky	82b3cad942	mac80211: adding MAC80211_HT_DEBUG config variable This patch adds MAC80211_HT_DEBUG config variable to separate HT debug features Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:29 -08:00
Johannes Berg	b1357a81a9	mac80211: allow setting drop_unencrypted with wext This patch allows wpa_supplicant to set the drop_unencrypted setting in mac80211. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:29 -08:00
Johannes Berg	e38bad4766	mac80211: make ieee80211_iterate_active_interfaces not need rtnl Interface iteration in mac80211 can be done without holding any locks because I converted it to RCU. Initially, I thought this wouldn't be needed for ieee80211_iterate_active_interfaces but it's turning out that multi-BSS AP support can be much simpler in a driver if ieee80211_iterate_active_interfaces can be called without holding locks. This converts it to use RCU, it adds a requirement that the callback it invokes cannot sleep. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:28 -08:00
Ron Rindjunsky	76ee65bfaa	mac80211: restructuring data Rx handlers This patch restructures the Rx handlers chain by incorporating previously handlers ieee80211_rx_h_802_1x_pae and ieee80211_rx_h_drop_unencrypted into ieee80211_rx_h_data, already in 802.3 form. this scheme follows more precisely after the IEEE802.11 data plane archituecture, and will prevent code duplication to IEEE8021.11n A-MSDU handler. added function: - ieee80211_data_to_8023: transfering 802.11 data frames to 802.3 frame - ieee80211_deliver_skb: delivering the 802.3 frames to upper stack eliminated handlers: - ieee80211_rx_h_drop_unencrypted: now function ieee80211_drop_unencrypted - ieee80211_rx_h_802_1x_pae: now function ieee80211_802_1x_pae changed handlers: - ieee80211_rx_h_data: now contains calls to four above function Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:27 -08:00
Zhu Yi	ece8edddf0	mac80211: hardware scan rework The scan code in mac80211 makes the software scan assumption in various places. For example, we stop the Tx queue during a software scan so that all the Tx packets will be queued by the stack. We also drop frames not related to scan in the software scan process. But these are not true for hardware scan. Some wireless hardwares (for example iwl3945/4965) has the ability to perform the whole scan process by hardware and/or firmware. The hardware scan is relative powerful in that it tries to maintain normal network traffic while doing a scan in the background. Some drivers (i.e iwlwifi) do provide a way to tune the hardware scan parameters (for example if the STA is associated, what's the max time could the STA leave from the associated channel, how long the scans get suspended after returning to the service channel, etc). But basically this is transparent to the stack. mac80211 should not stop Tx queues or drop Rx packets during a hardware scan. This patch resolves the above problem by spliting the current scan indicator local->sta_scanning into local->sta_sw_scanning and local->sta_hw_scanning. It then changes the scan related code to be aware of hardware scan or software scan in various places. With this patch, iwlwifi performs much better in the scan-while-associated condition and disable_hw_scan=1 should never be required. Cc: Mohamed Abbas <mohamed.abbas@intel.com> Cc: Ben Cahill <ben.m.cahill@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:27 -08:00
Pavel Emelyanov	f68635e627	[IPV6]: Cleanup the addconf_sysctl_register This only includes fixing the space-indented lines and removing one unneeded else after the goto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:26 -08:00
Pavel Emelyanov	9fa8964299	[IPV4]: Cleanup the devinet_sysctl_register I moved the call to kmalloc() from the *t declaration into the code (this is confusing when a variable is initialized with the result of some call) and removed unneeded comment near the error path. Just like I did with the neigh ctl-s. Besides, I fixed the goto's and the labels - they were indented with spaces :( Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:25 -08:00
Pavel Emelyanov	c3bac5a71b	[NEIGH]: Use the ctl paths to create neighbours sysctls The appropriate path is prepared right inside this function. It is prepared similar to how the ctl tables were. Since the path is modified, it is put on the stack, to avoid possible races with multiple calls to neigh_sysctl_register() : it is called by protocols and I didn't find any protection in this case. Did I overlooked the rtnl lock?. The stack growth of the neigh_sysctl_register() is 40 bytes. I believe this is OK, since this is not that much and this function is not called with the deep stack (device/protocols register). The device's name is stored on the template to free it later. This will help with the net namespaces, as each namespace should have its own set of these ctls. Besides, this saves ~350 bytes from the neigh template :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:24 -08:00
Pavel Emelyanov	3c607bbb47	[NEIGH]: Cleanup the neigh_sysctl_register This mainly removes the err variable, as this call always return the same error code (-ENOBUFS). Besides, I moved the call to kmalloc() from the *t declaration into the code (this is confusing when a variable is initialized with the result of some call) and removed unneeded comment near the error path. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:24 -08:00
Pavel Emelyanov	1597fbc0fa	[UNIX]: Make the unix sysctl tables per-namespace This is the core. * add the ctl_table_header on the struct net; * make the unix_sysctl_register and _unregister clone the table; * moves calls to them into per-net init and exit callbacks; * move the .data pointer in the proper place. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:23 -08:00
Pavel Emelyanov	1d430b913c	[UNIX]: Use ctl paths to register unix ctl tables Unlike previous ones, this patch is useful by its own, as it decreases the vmlinux size :) But it will be used later, when the per-namespace sysctl is added. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:22 -08:00
Pavel Emelyanov	d392e49756	[UNIX]: Move the sysctl_unix_max_dgram_qlen This will make all the sub-namespaces always use the default value (10) and leave the tuning via sysctl to the init namespace only. Per-namespace tuning is coming. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:22 -08:00
Pavel Emelyanov	97577e3828	[UNIX]: Extend unix_sysctl_(un)register prototypes Add the struct net * argument to both of them to use in the future. Also make the register one return an error code. It is useless right now, but will make the future patches much simpler. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:21 -08:00
Denis V. Lunev	dd88590995	[DECNET]: Remove extra memset from dn_fib_check_nh Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:20 -08:00
Paul Moore	875179fa60	[IPSEC]: SPD auditing fix to include the netmask/prefix-length Currently the netmask/prefix-length of an IPsec SPD entry is not included in any of the SPD related audit messages. This can cause a problem when the audit log is examined as the netmask/prefix-length is vital in determining what network traffic is affected by a particular SPD entry. This patch fixes this problem by adding two additional fields, "src_prefixlen" and "dst_prefixlen", to the SPD audit messages to indicate the source and destination netmasks. These new fields are only included in the audit message when the netmask/prefix-length is less than the address length, i.e. the SPD entry applies to a network address and not a host address. Example audit message: type=UNKNOWN[1415] msg=audit(1196105849.752:25): auid=0 \ subj=root:system_r:unconfined_t:s0-s0:c0.c1023 op=SPD-add res=1 \ src=192.168.0.0 src_prefixlen=24 dst=192.168.1.0 dst_prefixlen=24 In addition, this patch also fixes a few other things in the xfrm_audit_common_policyinfo() function. The IPv4 string formatting was converted to use the standard NIPQUAD_FMT constant, the memcpy() was removed from the IPv6 code path and replaced with a typecast (the memcpy() was acting as a slow, implicit typecast anyway), and two local variables were created to make referencing the XFRM security context and selector information cleaner. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:19 -08:00
Arnaldo Carvalho de Melo	9108d5f4b2	[TFRC]: Hide tx history details from the CCIDs Based on a previous patch by Gerrit Renker. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:19 -08:00
Eric W. Biederman	95bdfccb2b	[NET]: Implement the per network namespace sysctl infrastructure The user interface is: register_net_sysctl_table and unregister_net_sysctl_table. Very much like the current interface except there is a network namespace parameter. With this any sysctl registered with register_net_sysctl_table will only show up to tasks in the same network namespace. All other sysctls continue to be globally visible. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:18 -08:00
Patrick McHardy	be0ea7d5da	[NETFILTER]: Convert old checksum helper names Kill the defines again, convert to the new checksum helper names and remove the dependency of NET_ACT_NAT on NETFILTER. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:15 -08:00
Patrick McHardy	a99a00cf1a	[NET]: Move netfilter checksum helpers to net/core/utils.c This allows to get rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT. This patch redefines the old names to keep the noise low, the next patch converts all users. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:14 -08:00
Gerrit Renker	3159afe0d2	[DCCP]: Remove duplicate test for CloseReq This removes a redundant test for unexpected packet types. In dccp_rcv_state_process it is tested twice whether a DCCP-server has received a CloseReq (Step 7): * first in the combined if-statement, * then in the call to dccp_rcv_closereq(). The latter is necesssary since dccp_rcv_closereq() is also called from __dccp_rcv_established(). This patch removes the duplicate test. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:14 -08:00
Gerrit Renker	0c86962076	[DCCP]: Integrate state transitions for passive-close This adds the necessary state transitions for the two forms of passive-close * PASSIVE_CLOSE - which is entered when a host receives a Close; * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq. Here is a detailed account of what the patch does in each state. 1) Receiving CloseReq The pseudo-code in 8.5 says: Step 13: Process CloseReq If P.type == CloseReq and S.state < CLOSEREQ, Generate Close S.state := CLOSING Set CLOSING timer. This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN. * CLOSED: silently ignore - it may be a late or duplicate CloseReq; * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client); * REQUEST: perform Step 13 directly (no need to enqueue packet); * OPEN/PARTOPEN: enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data. When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored. I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it gets the Reset, so ignoring the CloseReq while in state CLOSING is sane. 2) Receiving Close The code below from 8.5 is unconditional. Step 14: Process Close If P.type == Close, Generate Reset(Closed) Tear down connection Drop packet and return Thus we need to consider all states: * CLOSED: silently ignore, since this can happen when a retransmitted or late Close arrives; * LISTEN: dccp_rcv_state_process() will generate a Reset ("No Connection"); * REQUEST: perform Step 14 directly (no need to enqueue packet); * RESPOND: dccp_check_req() will generate a Reset ("Packet Error") -- left it at that; * OPEN/PARTOPEN: enter PASSIVE_CLOSE so that application has a chance to process unread data; * CLOSEREQ: server performed active-close -- perform Step 14; * CLOSING: simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment); * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close); * TIMEWAIT: packet is ignored. Note that the condition of receiving a packet in state CLOSED here is different from the condition "there is no socket for such a connection": the socket still exists, but its state indicates it is unusable. Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:13 -08:00
Gerrit Renker	f11135a344	[DCCP]: Dedicated auxiliary states to support passive-close This adds two auxiliary states to deal with passive closes: * PASSIVE_CLOSE (reached from OPEN via reception of Close) and * PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq) as internal intermediate states. These states are used to allow a receiver to process unread data before acknowledging the received connection-termination-request (the Close/CloseReq). Without such support, it will happen that passively-closed sockets enter CLOSED state while there is still unprocessed data in the queue; leading to unexpected and erratic API behaviour. PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will seamlessly work with inet_accept() (which tests for this state). The state names are thanks to Arnaldo, who suggested this naming scheme following an earlier revision of this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:12 -08:00
Gerrit Renker	f53dc67c5e	[DCCP]: Use AF-independent rebuild_header routine This fixes a nasty bug: dccp_send_reset() is called by both DCCPv4 and DCCPv6, but uses inet_sk_rebuild_header() in each case. This leads to unpredictable and weird behaviour: under some conditions, DCCPv6 Resets were sent, in other not. The fix is to use the AF-independent rebuild_header routine. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:12 -08:00
Arnaldo Carvalho de Melo	276f2edc52	[TFRC]: Migrate TX history to singly-linked lis This patch was based on another made by Gerrit Renker, his changelog was: ------------------------------------------------------ The patch set migrates TFRC TX history to a singly-linked list. The details are: * use of a consistent naming scheme (all TFRC functions now begin with `tfrc_'); * allocation and cleanup are taken care of internally; * provision of a lookup function, which is used by the CCID TX infrastructure to determine the time a packet was sent (in turn used for RTT sampling); * integration of the new interface with the present use in CCID3. ------------------------------------------------------ Simplifications I did: . removing the tfrc_tx_hist_head that had a pointer to the list head and another for the slabcache. . No need for creating a slabcache for each CCID that wants to use the TFRC tx history routines, create a single slabcache when the dccp_tfrc_lib module init routine is called. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:11 -08:00
Ilpo Järvinen	ea4f76ae13	[TCP]: Two fixes to new sacktag code 1) Skip condition used to be wrong way around which made SACK processing very broken, missed many blocks because of that. 2) Use highest_sack advancement only if some skbs are already sacked because otherwise tcp_write_queue_next may move things too far (occurs mainly with GSO). The other similar advancement is not problem because highest_sack was previosly put to point a sacked skb. These problems were located because of problem report from Matt Mathis <mathis@psc.edu>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:10 -08:00
Pavel Emelyanov	df1b86c53d	[NET]: Nicer WARN_ON in netstat_show The if (statement) WARN_ON(1); looks much better as WARN_ON(statement); Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:10 -08:00
Fred L. Templin	c7dc89c0ac	[IPV6]: Add RFC4214 support This patch includes support for the Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) per RFC4214. It uses the SIT module, and is configured using extensions to the "iproute2" utility. The diffs are specific to the Linux 2.6.24-rc2 kernel distribution. This version includes the diff for ./include/linux/if.h which was missing in the v2.4 submission and is needed to make the patch compile. The patch has been installed, compiled and tested in a clean 2.6.24-rc2 kernel build area. Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:09 -08:00
Pavel Emelyanov	df97c708d5	[NET]: Eliminate unused argument from sk_stream_alloc_pskb The 3rd argument is always zero (according to grep :) Eliminate it and merge the function with sk_stream_alloc_skb. This saves 44 more bytes, and together with the previous patch we have: add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568) function old new delta sk_stream_alloc_skb - 183 +183 ip_rt_init 529 525 -4 arp_ignore 112 107 -5 __inet_lookup_listener 284 274 -10 tcp_sendmsg 2583 2481 -102 tcp_sendpage 1449 1300 -149 tso_fragment 417 258 -159 tcp_fragment 1149 988 -161 __tcp_push_pending_frames 1998 1837 -161 Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:08 -08:00
Pavel Emelyanov	f561d0f27d	[NET]: Uninline the sk_stream_alloc_pskb This function seems too big for inlining. Indeed, it saves half-a-kilo when uninlined: add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524) function old new delta sk_stream_alloc_pskb - 195 +195 ip_rt_init 529 525 -4 __inet_lookup_listener 284 274 -10 tcp_sendmsg 2583 2486 -97 tcp_sendpage 1449 1305 -144 tso_fragment 417 267 -150 tcp_fragment 1149 992 -157 __tcp_push_pending_frames 1998 1841 -157 Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:07 -08:00
Joonwoo Park	3015a347dc	[IPV4] fib_hash: kmalloc + memset conversion to kzalloc fib_hash: kmalloc + memset conversion to kzalloc fix to avoid memset entirely. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:07 -08:00
Joonwoo Park	88f8349164	[IPV4] fib_semantics: kmalloc + memset conversion to kzalloc fib_semantics: kmalloc + memset conversion to kzalloc fix to avoid memset entirely. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:06 -08:00
Joonwoo Park	dcaee95a1b	[IPSEC]: kmalloc + memset conversion to kzalloc 2007/11/26, Patrick McHardy <kaber@trash.net>: > How about also switching vmalloc/get_free_pages to GFP_ZERO > and getting rid of the memset entirely while you're at it? > xfrm_hash: kmalloc + memset conversion to kzalloc fix to avoid memset entirely. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:05 -08:00
Ilpo Järvinen	8512430e55	[TCP]: Move FRTO checks out from write queue abstraction funcs Better place exists in update_send_head (other non-queue related adjustments are done there as well) which is the only caller of tcp_advance_send_head (now that the bogus call from mtu_probe is gone). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:05 -08:00
Pavel Emelyanov	82d8a867ff	[NET]: Make macro to specify the ptype_base size Currently this size is 16, but as the comment says this is so only because all the chains (except one) has the length 1. I think, that some day this may change, so growing this hash will be much easier. Besides, symbolic names are read better than magic constants. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:04 -08:00
Pavel Emelyanov	8d8ad9d7c4	[NET]: Name magic constants in sock_wake_async() The sock_wake_async() performs a bit different actions depending on "how" argument. Unfortunately this argument ony has numerical magic values. I propose to give names to their constants to help people reading this function callers understand what's going on without looking into this function all the time. I suppose this is 2.6.25 material, but if it's not (or the naming seems poor/bad/awful), I can rework it against the current net-2.6 tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:03 -08:00
Gerrit Renker	ce865a61c8	[DCCP]: Add support for abortive release This continues from the previous patch and adds support for actively aborting a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an abortive release. I have tried this in various client/server settings and it works as expected. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:02 -08:00
Gerrit Renker	d83bd95bf1	[DCCP]: Check for unread data on close This removes one FIXME with regard to close when there is still unread data. The mechanism is implemented similar to TCP: with regard to DCCP-specifics, a Reset with Code 2, "Aborted" is sent to the peer. This corresponds in part to RFC 4340, 8.1.1 and 8.1.5. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:01 -08:00
Gerrit Renker	dcfbc7e97a	[CCID2]: Remove misleading comment This removes a comment which identifies an `issue' with dccp_write_xmit() where there is none. The comment assumes it is possible that a packet is sent between the calls to ccid_hc_tx_send_packet(), dccp_transmit_skb(), ccid_hc_tx_packet_sent() (in the above order) in dccp_write_xmit(). I think that this is impossible, since dccp_write_xmit() is always called under lock: * when called as dccp_write_xmit(sk, 1) from dccp_send_close(), the socket is locked (see code comment above dccp_send_close()); * when called as dccp_write_xmit(sk, 0) from dccp_send_msg(), it is after lock_sock() has been called; * when called as dccp_write_xmit(sk, 0) from dccp_write_xmit_timer(), bh_lock_sock() has been called and the if/else statement has made sure that sk_lock.owner is not set; * there are no other places where dccp_write_xmit() is called. Furthermore, the debug statement for printing the sequence number of the packet just sent has been removed, since the entire list is being printed anyway and so the entry of that number appears last. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:01 -08:00
Gerrit Renker	a302002516	[CCID2]: Remove redundant ack-counting variable The code used two different variables to count Acks, one of them redundant. This patch reduces the number of Ack counters to one. The type of the Ack counter has also been changed to u32 (twice the range of int); and the variable has been renamed into `packets_acked' - for consistency with RFC 3465 (and similarly named variables are used by TCP and SCTP). Lastly, a slightly less aggressive `maxincr' increment is used (for even Ack Ratios, maxincr was Ack Ratio/2 + 1 instead of Ack Ratio/2). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:00 -08:00
Gerrit Renker	83399361c3	[CCID2]: Remove redundant synchronisation variable This removes the synchronisation variable `ccid2hctx_sendwait', which is set to 1 when the CCID2 sender may send a new packet, and which is set to 0 otherwise The variable is redundant, since it is only used in combination with the hc_tx_send_packet/ hc_tx_packet_sent function pair. Both functions are called under socket lock, so the following happens when the CCID2 may send a new packet: * it sets sendwait = 1 in tx_send_packet and returns 0; * the subsequent call to tx_packet_sent clears the sendwait flag; * since tx_send_packet returns 0 if and only if sendwait == 1, the BUG_ON condition in tx_packet_sent is never satisfied, since that function is never called when tx_send_packet returns a value different from 0 (cf. dccp_write_xmit); * the call to tx_packet_sent clears the flag so that the condition "!sendwait" is true the next time tx_packet_sent is called. In other words, it is sufficient to just return 0 / not-0 to synchronise tx_send_packet and tx_packet_sent -- which is what the patch does. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:59 -08:00
Gerrit Renker	da98e0b5d4	[CCID2]: Redundant debugging output This reduces the amount of redundant debugging messages: * pipe/cwnd are printed in both tx_send_packet() and tx_packet_sent(). Both functions are called immediately after one another, so one occurrence is sufficient. * Since tx_packet_sent() prints pipe/cwnd already, the second printk for pipe is redundant. * In tx_packet_sent() the check_sanity function is called twice (at the begin and at the end). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:59 -08:00
Gerrit Renker	95b21d7e9d	[CCID2]: Replace pipe assignment-function with assignment The function ccid2_change_pipe only does an assignment. This patch simplifies the code by replacing the function with the assignment it performs. Furthermore, the type of pipe is promoted from `signed' to unsigned (increasing the range). As a result, a BUG_ON test for negative values now becomes obsolete (for safety not removed, but replaced with a less annoying `DCCP_BUG'). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:58 -08:00
Gerrit Renker	3deeadd74b	[CCID2]: Replace cwnd assignment-function with assignment The current function ccid2_change_cwnd in effect makes only an assignment, as the test whether cwnd has reached 0 is only required when cwnd is halved. This patch simplifies the code by replacing the function with the assignment it performs. Furthermore, since ssthresh derives from cwnd and appears in many assignments and comparisons, the type of ssthresh has also been changed to match that of cwnd. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:57 -08:00
Gerrit Renker	63df18ad7f	[CCID2]: Replace read-only variable with constant This replaces the field member `numdupack', which was used as a read-only constant in the code, with a #define. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:57 -08:00
Gerrit Renker	7792cd8885	[CCID2]: Remove unused variable This removes a variable `ccid2hctx_sent' which is incremented but never referenced/read (i.e., dead code). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:56 -08:00
Gerrit Renker	900bfed471	[CCID2]: Disable broken Ack Ratio adaptation algorithm This comments out a problematic section comprising a half-finished algorithm: - The variable `ccid2hctx_ackloss' is never initialised to a value different from 0 and hence in fact is a read-only constant. - The `arsent' variable counts packets other than Acks (it is incremented for every packet), and there is no test for Ack Loss. - The concept of counting Acks as such leads to a complex calculation, and the calculation at the moment is inconsistent with this concept. The problem is that the number of Acks - rather than the number of windows - is counted, which leads to a complex (cubic/quadratic) expression - this is not even implemented. In its current state, the commented-out algorithm interfers with normal processing by changing Ack Ratio incorrectly, and at the wrong times. A new algorithm is necessary, which will not necessarily use the same variables as used by the unfinished one; hence the old variables have been removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:55 -08:00
Gerrit Renker	b00d2bbc45	[CCID2]: Larger initial windows also for CCID2 RFC 4341, sec. 5 states that "The cwnd parameter is initialized to at most four packets for new connections, following the rules from [RFC3390]", which is implemented by this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:55 -08:00
Arnaldo Carvalho de Melo	e18d7a9857	[DCCP]: Initialize dccp_sock before calling the ccid constructors This is because in the next patch CCID2 will assume that dccps_mss_cache is non-zero. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:54 -08:00
Gerrit Renker	d50ad163e6	[CCID2]: Deadlock and spurious timeouts when Ack Ratio > cwnd This patch removes a bug in the current code. I agree with Andrea's comment that there is a problem here but the way it is treated does not fix it. The problem is that whenever Ack Ratio > cwnd, starvation/deadlock occurs: * the receiver will not send an Ack until (Ack Ratio - cwnd) data packets have arrived; * the sender will not send any data packet before the receipt of an Ack advances the send window. The only way that the connection then progresses was via RTO timeout. In one extreme case (bulk transfer), it was observed that this happened for every single packet; i.e. hundreds of packets, each a RTO timeout of 1..3 seconds apart: a transfer which normally would take a fraction of a second thus grew to several minutes. The solution taken by this approach is to observe the relation "Ack Ratio <= cwnd" by using the constraint (1) from RFC 4341, 6.1.2; i.e. set Ack Ratio = ceil(cwnd / 2) and update it whenever either Ack Ratio or cwnd change. This ensures that the deadlock problem can not arise. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:53 -08:00
Gerrit Renker	df054e1d00	[CCID2]: Don't assign negative values to Ack Ratio Since it makes not sense to assign negative values to Ack Ratio, this patch disallows this possibility. As a consequence, a Bug test for negative Ack Ratio values becomes obsolete. Furthermore, a check against overflow (as Ack Ratio may not exceed 2 bytes, due to RFC 4340, 11.3) has been added. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:53 -08:00
Gerrit Renker	cfbbeabc88	[CCID2]: Fix sequence number arithmetic/comparisons This replaces use of normal subtraction with modulo-48 subtraction. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:52 -08:00
Gerrit Renker	3de5489f47	[CCID2]: Bug in reading Ack Vectors In CCID2 the receiver-history is sorted in ascending order of sequence number, but the processing of received Ack Vectors requires the list traversal in the opposite direction. The current code has a bug in this regard: the list traversal is upwards. As a consequence, only Ack Vectors with a run length of 1 will pass, in all other Ack Vectors the remaining (acked) sequence numbers are missed, and may later falsely be identified as lost. Note: This bug is only visible when Ack Ratio > 1, since otherwise the run lengths of Ack Vectors are 0. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:51 -08:00
Gerrit Renker	a47c51044a	[ACKVEC]: Reduce length of identifiers This is reduces the length of the struct ackvec/ackvec_record fields. It is a purely text-based replacement: s#dccpavr_#avr_#g; s#dccpav_#av_#g; and increases readability somewhat. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:51 -08:00
Pavel Emelyanov	f126734735	[IPV6]: Correct the comment concerning inetsw6 table It seems that net/ipv6/af_inet6.c was copied from net/ipv4/af_inet.c, but one comment was not fixed. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:49 -08:00
Pavel Emelyanov	a53eb3feb2	[UNIX] Move the unix sock iterators in to proper place The first_unix_socket() and next_unix_sockets() are now used in proc file and in forall_unix_socets macro only. The forall_unix_sockets is not used in this file at all so remove it. After this move the helpers to where they really belong, i.e. closer to proc code under the #ifdef CONFIG_PROC_FS option. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:49 -08:00
Gerrit Renker	c86ab2b6a5	[DCCP]: Ignore Ack Vectors / Elapsed Time on DCCP-Request also Small update with regard to RFC 4340 (references added as documentation): on Requests, Ack Vectors / Elapsed Time should be ignored. Length handling of Elapsed Time also simplified. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:47 -08:00
Gerrit Renker	6d57b43bf8	[DCCP]: Remove redundant dependency on IP_DCCP This cleans up the consequences of an earlier patch which introduced the `if IP_DCCP' clause into net/dccp/Kconfig. The CCID Kconfig menu is sourced within this clause; as a consequence, all tests of type `depends on IP_DCCP' are now redundant. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:46 -08:00
Gerrit Renker	e333b3edc4	[DCCP]: Promote CCID2 as default CCID This patch addresses the following problems: 1. DCCP relies for its proper functioning on having at least one CCID module enabled (as in TCP plugable congestion control). Currently it is possible to disable both CCIDs and thus leave the DCCP module in a compiled, but entirely non-functional state: no sockets can be created when no CCID is available. Furthermore, the protocol is (again like TCP) not intended to be used without CCIDs. Last, a non-empty CCID list is needed for doing CCID feature negotiation. 2. Internally the default CCID that is advertised by the Linux host is set to CCID2 (DCCPF_INITIAL_CCID in include/linux/dccp.h). Disabling CCID2 in the Kconfig menu without changing the defaults leads to a failure `module not found' when trying to load the dccp module (which internally tries to load the default CCID). 3. The specification (RFC 4340, sec. 10) treats CCID2 somewhat like a `minimum common denominator'; the specification says that: * "New connections start with CCID 2 for both endpoints" * "A DCCP implementation intended for general use, such as an implementation in a general-purpose operating system kernel, SHOULD implement at least CCID 2. The intent is to make CCID 2 broadly available for interoperability [...]" Providing CCID2 as minimum-required CCID (like Reno/Cubic in TCP) thus seems reasonable. Hence this patch automatically selects CCID2 when DCCP is enabled. Documentation also added. Discussions with Ian McDonald on this subject are gratefully acknowledged. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:46 -08:00
Gerrit Renker	8e8c71f1ab	[DCCP]: Honour and make use of shutdown option set by user This extends the DCCP socket API by honouring any shutdown(2) option set by the user. The behaviour is, as much as possible, made consistent with the API for TCP's shutdown. This patch exploits the information provided by the user via the socket API to reduce processing costs: * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID; * if the write end is closed (SHUT_WR), the same idea applies, but with a difference - as long as the TX queue has not been drained, we need to receive feedback to keep congestion-control rates up to date. Hence SHUT_WR is honoured only after the last packet (under congestion control) has been sent; * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c). Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added to this by the present patch. There will also no longer be any delivery when the socket is in the final stages, i.e. when one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at that stage the connection is its final stages. Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7). Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make sure the socket is a TCP socket". This should probably be extended to mean `TCP or DCCP socket' (the code is also used by UDP and raw sockets). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:44 -08:00
Gerrit Renker	c3ada46a00	[CCID3]: Inline for moving average The moving average computation occurs so frequently in the CCID 3 code that it merits an inline function of its own. This is uses a suggestion by Arnaldo as per http://www.mail-archive.com/dccp@vger.kernel.org/msg01662.html Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:43 -08:00
Gerrit Renker	a5358fdc9c	[CCID3]: Accurately determine idle & application-limited periods This fixes/updates the handling of idle and application-limited periods in CCID3, which currently is broken: there is no detection as to how long a sender has been idle - there is only one flag which is toggled in between function calls. Being obsolete now, the `idle' flag is removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:42 -08:00
Gerrit Renker	eb279b79c4	[CCID3]: Ignore trivial amounts of elapsed time This patch fixes a previously undiscovered bug; the problem is in computing the elapsed time as the time between `receiving' the packet (i.e. skb enters CCID module) and sending feedback: - there is no layer-processing, queueing, or delay involved, - hence the elapsed time is in the order of 1 function call - this is in the dimension of maximally 50..100usec - which renders the use of elapsed time almost entirely useless. The fix is simply to ignore such trivial amounts of elapsed time. As a further advantage, the now useless elapsed_time field can be removed from the socket, which reduces the socket structure by another four bytes. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:42 -08:00
Gerrit Renker	6c08b2cf48	[CCID3]: Revert use of MSS instead of s This updates the CCID3 code with regard to two instances of using `MSS' in place of `s': 1. The RFC3390-based initial rate: both rfc3448bis as well as the Faster Restart draft now consistently use `s' instead of MSS. 2. Now agrees with section 4.2 of rfc3448bis: "If the sender is ready to send data when it does not yet have a round trip sample, the value of X is set to s bytes per second, for segment size s [...]" Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:41 -08:00
Arnaldo Carvalho de Melo	ebb53d7565	[NET] proto: Use pcounters for the inuse field Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:40 -08:00
Johannes Berg	0c884439db	mac80211: remove more forgotten code Hopefully that's the rest. Seems I didn't do a very thorough job removing the management interface. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:39 -08:00
Helmut Schaa	48933dea47	mac80211: Remove local->scan_flags This patch removes all references to local->scan_flags as these are not used anymore since the removal of prism2 ioctls. Signed-off-by: Helmut Schaa <hschaa@suse.de> Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:37 -08:00
Johannes Berg	dabeb344f5	mac80211: provide interface iterator for drivers Sometimes drivers need to know which interfaces are associated with their hardware. Rather than forcing those drivers to keep track of the interfaces that were added, this adds an iteration function to mac80211. As it is intended to be used from the interface add/remove callbacks, the iteration function may currently only be called under RTNL. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:37 -08:00
Pavel Emelyanov	9859a79023	[NET]: Compact sk_stream_mem_schedule() code This function references sk->sk_prot->xxx for many times. It turned out, that there's so many code in it, that gcc cannot always optimize access to sk->sk_prot's fields. After saving the sk->sk_prot on the stack and comparing disassembled code, it turned out that the function became ~10 bytes shorter and made less dereferences (on i386 and x86_64). Stack consumption didn't grow. Besides, this patch drives most of this function into the 80 columns limit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:36 -08:00
Benjamin Thery	3ef1355dcb	[NET]: Make netns cleanup to run in a separate queue This patch adds a separate workqueue for cleaning up a network namespace. If we use the keventd workqueue to execute cleanup_net(), there is a problem to unregister devices in IPv6. Indeed the code that cleans up also schedule work in keventd: as long as cleanup_net() hasn't return, dst_gc_task() cannot run and as long as dst_gc_task() has not run, there are still some references pending on the net devices and cleanup_net() can not unregister and exit the keventd workqueue. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Denis V. Lunev <den@openvz.org> Acked-By: Kirill Korotaev <dev@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:35 -08:00
Pavel Emelyanov	85b606800b	[IPVS]: Relax the module get/put in ip_vs_app.c Both try_module_get/module_put already handle the module == NULL case, so no need in manual checking. This patch fits both net-2.6 and net-2.6.25. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:35 -08:00
Adrian Bunk	02d45827fa	[NET] net/core/request_sock.c: Remove unused exports. This patch removes the following unused EXPORT_SYMBOL's: - reqsk_queue_alloc - __reqsk_queue_destroy - reqsk_queue_destroy Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:33 -08:00
Eric Dumazet	beb659bd8c	[PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue Every 600 seconds (ip_rt_secret_interval), a softirq flush of the whole ip route cache is triggered. On loaded machines, this can starve softirq for many seconds and can eventually crash. This patch moves this flush to a workqueue context, using the worker we introduced in commit `39c90ece75` (IPV4: Convert rt_check_expire() from softirq processing to workqueue.) Also, immediate flushes (echo 0 >/proc/sys/net/ipv4/route/flush) now use the rt_do_flush() helper function, which takes care of rescheduling. Next step will be to handle delayed flushes ("echo -1 >/proc/sys/net/ipv4/route/flush" or "ip route flush cache") Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:33 -08:00
Pavel Emelyanov	42a73808ed	[RAW]: Consolidate proc interface. Both ipv6/raw.c and ipv4/raw.c use the seq files to walk through the raw sockets hash and show them. The "walking" code is rather huge, but is identical in both cases. The difference is the hash table to walk over and the protocol family to check (this was not in the first version of the patch, which was noticed by YOSHIFUJI) Make the ->open store the needed hash table and the family on the allocated raw_iter_state and make the start/next/stop callbacks work with it. This removes most of the code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:32 -08:00
Pavel Emelyanov	ab70768ec7	[RAW]: Consolidate proto->unhash callback Same as the ->hash one, this is easily consolidated. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	65b4c50b47	[RAW]: Consolidate proto->hash callback Having the raw_hashinfo it's easy to consolidate the raw[46]_hash functions. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	b673e4dfc8	[RAW]: Introduce raw_hashinfo structure The ipv4/raw.c and ipv6/raw.c files contain a lot of common code (most of it the proc interface) which can be consolidated. Most of the places to consolidate deal with the raw sockets hashtable, so introduce a struct raw_hashinfo which describes the raw sockets hash. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:30 -08:00
Pavel Emelyanov	69d6da0b0f	[IPv6] RAW: Compact the API for the kernel Same as in the previous patch for ipv4, compact the API and hide hash table and rwlock inside the raw.c file. Plus fix some "bad" places from checkpatch.pl point of view (assignments inside if()). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:29 -08:00
Pavel Emelyanov	7bc54c9030	[IPv4] RAW: Compact the API for the kernel The raw sockets functions are explicitly used from inside the kernel in two places: 1. in ip_local_deliver_finish to intercept skb-s 2. in icmp_error. For these purposes many functions and even data structures that are naturally internal to the raw protocol are exported. Compact the API to two functions and hide all the others (including the hash table and rwlock) inside net/ipv4/raw.c Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
Denis V. Lunev	e372c41401	[NET]: Consolidate net namespace related proc files creation. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
Denis V. Lunev	097e66c578	[NET]: Make AF_UNIX per network namespace safe [v2] Because of the global nature of garbage collection, and because of the cost of per namespace hash tables, unix_socket_table has been kept global, with a filter added on lookups so we don't see sockets from the wrong namespace. Currently I don't fold the namespace into the hash, so multiple namespaces using the same socket name are guaranteed a hash collision. Changes from v1: - fixed unix_seq_open Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:27 -08:00
Denis V. Lunev	d12d01d6b4	[NET]: Make AF_PACKET handle multiple network namespaces This is done by making packet_sklist_lock and packet_sklist per network namespace and adding an additional filter condition on received packets to ensure they came from the proper network namespace. Changes from v1: - prohibit calling inet_dgram_ops.ioctl in namespaces other than init_net Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:26 -08:00
Eric W. Biederman	4b3da706bb	[NET]: Make the netlink methods in rtnetlink handle multiple network namespaces After the previous prep work this just consists of removing checks limiting the code to work in the initial network namespace, and updating rtmsg_ifinfo so we can generate events for devices in network namespaces other than the initial one. Referring to other network devices, as the IFLA_LINK and IFLA_MASTER attributes do, gets interesting if those network devices happen to be in other network namespaces. Currently ifindex numbers are allocated globally, so I have taken the path of least resistance and still report the information even though the devices it refers to are invisible. If applications start getting confused, or when ifindex numbers become local to the network namespace, we may need to do something different in the future. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Denis V. Lunev <den@openz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:26 -08:00
Denis V. Lunev	97c53cacf0	[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callbacks support anything except the initial network namespace, but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:25 -08:00
Denis V. Lunev	b854272b3c	[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnetlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-28 14:54:24 -08:00
David S. Miller	1b0b04f9fb	[IPCONFIG]: Mark vendor_class_identifier as __initdata. Based upon a suggestion by Francois Romieu. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:22 -08:00
Rumen G. Bogdanovski	b209639e8a	[IPVS]: Create synced connections with their real state With this patch the synced connections are created with their real state, which can be changed on the next synchronizations if necessary. This way on fail-over all the connections will be treated according to their actual state, causing no scheduling problems (the active and the nonactive connections have different weights in the schedulers). The backwards compatibility is preserved and the existing tools will show the true connection states even on the backup director. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:21 -08:00
Rumen G. Bogdanovski	7a4fbb1fa4	[IPVS]: Flag synced connections and expose them in proc This patch labels the sync-created connections with IP_VS_CONN_F_SYNC flag and creates /proc/net/ip_vs_conn_sync to enable monitoring of the origin of the connections, if they are local or created by the synchronization. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:21 -08:00
Mattias Nissler	6a4329554c	mac80211: Accept auto txpower setting This changes the SIWTXPOWER ioctl to also accept a txpower setting of "automatic". Since mac80211 currently cannot tell drivers to automatically adjust tx power, we select the tx power level of the current channel. While this is kind of a hack, it certainly saves some iwconfig users from headaches. Signed-off-by: Mattias Nissler <mattias.nissler@gmx.de> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:19 -08:00
Stephen Hemminger	c7b6ea24b4	[NETPOLL]: Don't need rx_flags. The rx_flags variable is redundant. Turning rx on/off is done via setting the rx_np pointer. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:18 -08:00
Stephen Hemminger	33f807ba0d	[NETPOLL]: Kill NETPOLL_RX_DROP, set but never tested. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:18 -08:00
Stephen Hemminger	0953864160	[NETPOLL]: no need to store local_mac The local_mac is managed by the network device, no need to keep a spare copy and all the management problems that could cause. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:17 -08:00
Stephen Hemminger	5106930bd6	[NETPOLL]: netpoll_poll() cleanup Restructure code slightly to improve readability: * dereference device once * change obvious while() loop * let poll_napi() handle null list itself Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:16 -08:00
Stephen Hemminger	0adc9add77	[NETPOLL]: Use skb_queue_purge(). Use standard routine for flushing queue. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:16 -08:00
Ilpo Järvinen	20de20beba	[TCP]: Correct DSACK check placing Previously one of the in-block skip branches was missing it. Also, drop it from tail-fully-processed case because the next iteration will do exactly the same thing, i.e., process the SACK block that contains the DSACK information. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:15 -08:00
Oliver Hartkopp	ccb2963799	[CAN]: Add virtual CAN netdevice driver This patch adds the virtual CAN bus (vcan) network driver. The vcan device is just a loopback device for CAN frames, no real CAN hardware is involved. Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:12 -08:00
Oliver Hartkopp	ffd980f976	[CAN]: Add broadcast manager (bcm) protocol This patch adds the CAN broadcast manager (bcm) protocol. Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:11 -08:00
Oliver Hartkopp	c18ce101f2	[CAN]: Add raw protocol This patch adds the CAN raw protocol. Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:10 -08:00
Oliver Hartkopp	0d66548a10	[CAN]: Add PF_CAN core module This patch adds the CAN core functionality but no protocols or drivers. No protocol implementations are included here. They come as separate patches. Protocol numbers are already in include/linux/can.h. Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:10 -08:00
Oliver Hartkopp	cd05acfe65	[CAN]: Allocate protocol numbers for PF_CAN This patch adds a protocol/address family number, ARP hardware type, ethernet packet type, and a line discipline number for the SocketCAN implementation. Signed-off-by: Oliver Hartkopp <oliver.hartkopp@volkswagen.de> Signed-off-by: Urs Thuermann <urs.thuermann@volkswagen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:09 -08:00
Eric Dumazet	8dbde28d97	[NET]: NET_CLS_ROUTE : convert ip_rt_acct to per_cpu variables ip_rt_acct needs 4096 bytes per cpu to perform some accounting. It is actually allocated as a single huge array [4096*NR_CPUS] (rounded up to a power of two) Converting it to a per cpu variable is wanted to : - Save space on machines where num_possible_cpus() < NR_CPUS - Better NUMA placement (each cpu gets memory on its node) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:08 -08:00
Ilpo Järvinen	68f8353b48	[TCP]: Rewrite SACK block processing & sack_recv_cache use Key points of this patch are: - If new SACK information is of the advance-only type, no skb processing below the previously discovered highest point is done - Optimize cases below the highest point too, since there's no need to always go up to the highest point (which is very likely still present in that SACK); this is not entirely true though, because I'm dropping the fastpath_skb_hint which could previously optimize those cases even better. Whether that's significant, I'm not too sure. Currently it will provide skipping by walking. Combined with an RB-tree, all skipping would become fast too regardless of window size (can be done incrementally later). Previously a number of cases in TCP SACK processing failed to take advantage of costly stored information in sack_recv_cache, most importantly expected events such as cumulative ACKs and new hole ACKs. Processing such ACKs results in rather long walks building up latencies (which easily gets nasty when the window is huge). Those latencies are often completely unnecessary compared with the amount of _new_ information received; usually for a cumulative ACK there's no new information at all, yet TCP walks the whole queue unnecessarily, potentially taking a number of costly cache misses on the way, etc.! Since the inclusion of highest_sack, there's a lot of information that is very likely redundant (SACK fastpath hint stuff, fackets_out, highest_sack), though there's no ultimate guarantee that they'll remain the same the whole time (in all unearthly scenarios). Take advantage of this knowledge here and drop the fastpath hint and use direct access to the highest SACKed skb as a replacement. Effectively the "special cased" fastpath is dropped. This change adds some complexity to introduce a "fastpath" with better coverage, though the added complexity should make TCP behave in a more cache-friendly way. The current ACK's SACK blocks are compared against each cached block individually and only ranges that are new are then scanned by the high constant walk. For other parts of the write queue, even when in a previously known part of the SACK blocks, a faster skip function is used (if necessary at all). In addition, whenever possible, TCP fast-forwards to the highest_sack skb that was made available by an earlier patch. In the typical case, nothing but this fast-forward and the mandatory markings after it occur, making the access pattern quite similar to the former fastpath "special case". DSACKs are a special case that must always be walked. The local to recv_sack_cache copying could be more intelligent w.r.t. DSACKs, which are likely to be there only once, but that is left to a separate patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	fd6dad616d	[TCP]: Earlier SACK block verification & simplify access to them Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	9e10c47cb9	[TCP]: Create tcp_sacktag_one(). Worker function that implements the main logic of the inner-most loop of tcp_sacktag_write_queue(). Idea was originally presented by David S. Miller. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:06 -08:00
Ilpo Järvinen	b7d4815f35	[TCP]: Prior_fackets can be replaced by highest_sack seq Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:05 -08:00
Ilpo Järvinen	9f58f3b721	[TCP]: Make lost retrans detection more self-contained Highest_sack_end_seq is no longer calculated in the loop, thus it can be pushed to the worker function altogether making that function independent of the sacktag. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:04 -08:00
Ilpo Järvinen	a47e5a988a	[TCP]: Convert highest_sack to sk_buff to allow direct access It is going to replace the sack fastpath hint quite soon... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Ilpo Järvinen	85cc391c0e	[TCP]: non-FACK SACK follows conservative SACK loss recovery Many assumptions that are true when no reordering or other strange events happen are not part of RFC3517. The FACK implementation is based on such assumptions. Previously (before the rewrite) the non-FACK SACK was basically doing fast rexmit and then timing out all skbs when the first cumulative ACK arrived, which cannot really be called SACK based recovery :-). RFC3517 SACK disables these things: - Per SKB timeouts & head timeout entry to recovery - Marking at least one skb while in recovery (RFC3517 does this only for the fast retransmission but not for the other skbs when cumulative ACKs arrive in the recovery) - Sacktag's loss detection flavors B and C (see comment before tcp_sacktag_write_queue) This does not implement the "last resort" rule 3 of NextSeg, which allows retransmissions also when not enough SACK blocks have yet arrived above a segment for IsLost to return true [RFC3517]. The implementation differs from RFC3517 in these points: - Rate-halving is used instead of FlightSize / 2 - Instead of using dupACKs to trigger the recovery, the number of SACK blocks is used, as FACK does with SACK blocks+holes (which provides a more accurate number). It seems that the difference can only have a negative effect if the receiver does not generate SACK blocks at all even though it claimed to be SACK-capable. - Dupthresh is not a constant one. Dynamic adjustments include both holes and sacked segments (equal to what FACK has) due to the complexity involved in determining the number of sacked blocks between highest_sack and the reordered segment. Thus it will be an over-estimate. Implementation note: tcp_clean_rtx_queue doesn't need a lost_cnt tweak because the head skb at that point cannot be SACKED_ACKED (nor would such a situation last long enough to cause problems). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Ilpo Järvinen	f577111302	[TCP]: Extend reordering detection to cover CA_Loss partially This implements more accurately what is stated in sacktag's overall comment: "Both of these heuristics are not used in Loss state, when we cannot account for retransmits accurately." When CA_Loss state is entered, the state changer ensures that undo_marker is only set if no TCPCB_RETRANS skbs were found, thus having non-zero undo_marker in CA_Loss basically tells that the R-bits still accurately reflect the current state of TCP. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:02 -08:00
Ilpo Järvinen	b9d86585dc	[TCP]: Move !in_sack test earlier in sacktag & reorganize if()s All intermediate conditions include it already, make them simpler as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:01 -08:00
Pavel Emelyanov	c0ef877b2c	[NET]: Move sock_valbool_flag to socket.c The sock_valbool_flag() helper is used in setsockopt to set or reset some flag on the sock. This helper is required in the net/socket.c only, so move it there. Besides, patch two places in sys_setsockopt() that repeat this helper functionality manually. Since this is not a bugfix, but a trivial cleanup, I prepared this patch against net-2.6.25, but it also applies (with a single offset) to the latest net-2.6. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:00 -08:00
Pavel Emelyanov	de0fa95c14	[NET]: Use sockfd_lookup_light in the rest of the net/socket.c Some time ago a sockfd_lookup_light was introduced and most of the socket.c file was patched to use it. However two routines were left - sys_sendto and sys_recvfrom. Patch them as well, since this helper does exactly what these two need. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:00 -08:00
Rainer Jochem	62013dbb84	[IPV4] ipconfig: Implement DHCP Class-identifier From : Rainer Jochem <rainer.jochem@mpi-sb.mpg.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:59 -08:00
Eric Dumazet	20fea08b5f	[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. Qdisc_class_ops are const, and Qdisc_ops are mostly read. Using "const" and "__read_mostly" qualifiers helps to reduce false sharing. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:58 -08:00
YOSHIFUJI Hideaki	2a8cc6c890	[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table. Policy table is implemented as an RCU linear list since we do not expect large list nor frequent updates. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:58 -08:00
YOSHIFUJI Hideaki	303065a854	[IPV6] ADDRCONF: Allow address selection policy with ifindex. This patch allows ifindex to be a key for address selection policy table. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:57 -08:00
YOSHIFUJI Hideaki	c1ee656ccb	[IPV6] ADDRCONF: Rename ipv6_saddr_label() to ipv6_addr_label(). This patch renames ipv6_saddr_label() to ipv6_addr_label() because address label is used for both of source address and destination address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:56 -08:00
David S. Miller	294b4baf29	[IPSEC]: Kill afinfo->nf_post_routing After changeset: [NETFILTER]: Introduce NF_INET_ hook values It always evaluates to NF_INET_POST_ROUTING. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Patrick McHardy	6e23ae2a48	[NETFILTER]: Introduce NF_INET_ hook values The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Herbert Xu	1bf06cd2e3	[IPSEC]: Add async resume support on input This patch adds support for async resumptions on input. To do so, the transform would return -EINPROGRESS and subsequently invoke the function xfrm_input_resume to resume processing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:54 -08:00
Herbert Xu	60d5fcfb19	[IPSEC]: Remove nhoff from xfrm_input The nhoff field isn't actually necessary in xfrm_input. For tunnel mode transforms we now throw away the output IP header so it makes no sense to fill in the nexthdr field. For transport mode we can now let the function transport_finish do the setting and it knows where the nexthdr field is. The only other thing that needs the nexthdr field to be set is the header extraction code. However, we can simply move the protocol extraction out of the generic header extraction. We want to minimise the amount of info we have to carry around between transforms as this simplifies the resumption process for async crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:53 -08:00
Herbert Xu	d26f398400	[IPSEC]: Make x->lastused an unsigned long Currently x->lastused is u64 which means that it cannot be read/written atomically on all architectures. David Miller observed that the value stored in it is only an unsigned long which is always atomic. So based on his suggestion this patch changes the internal representation from u64 to unsigned long while the user-interface still refers to it as u64. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:52 -08:00


6793 commits