In order to allow GRO packets without frag_list at all, we need to
store the MSS in the packet itself. The obvious place is gso_size.
The only thing to watch out for is if the packet ends up not being
GRO then we need to clear gso_size before pushing the packet into
the stack.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to excellent diagnosis by Eduard Guzovsky.
The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up. If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.
IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).
So simply mimick this on the ipv6 side.
Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we converted the protocol atomic counters such as the orphan
count and the total socket count deadlocks were introduced due to
the mismatch in BH status of the spots that used the percpu counter
operations.
Based on the diagnosis and patch by Peter Zijlstra, this patch
fixes these issues by disabling BH where we may be in process
context.
Reported-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In future all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators
and other comparisons.
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
cls_cgroup can't be compiled as a module, since it's not supported by
cgroup.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- It's better to use container_of() instead of casting cgroup_subsys_state *
to cgroup_cls_state *.
- Add helper function task_cls_state().
- Rename net_cls_state() to cgrp_cls_state().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing a cgroup, an oops was triggered immediately. The cause
is wrong kfree() in cgrp_destroy().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During network namespace teardown we either move or delete
all of the network devices associated with a network namespace.
In the case of veth devices deleting one will also delete it's
pair device. If both devices are in the same network namespace
then for_each_netdev_safe is insufficient as next may point
to the second veth device we have deleted.
To avoid problems I do what we do in __rtnl_kill_links and
restart the scan of the device list, after we have deleted
a device.
Currently dev_change_netnamespace does not appear to suffer from
this problem, but wireless devices are also paired and likely
should be moved between network namespaces together. So I have
errored on the side of caution and restart the scan of the network
devices in that case as well.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
The initial skb may have been freed after napi_gro_complete in
napi_gro_receive if it was merged into an existing packet. Thus
we cannot check same_flow (which indicates whether it was merged)
after calling napi_gro_complete.
This patch fixes this by saving the same_flow status before the
call to napi_gro_complete.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent GRO patches introduced the NAPI removal of devices in
free_netdev. For drivers that can change the number of queues during
driver operation, the NAPI infrastructure doesn't allow the freeing and
re-addition of NAPI entities without reloading the driver.
This change reinitializes the dev_list in each NAPI struct on delete,
instead of just deleting it (and assigning the list pointers to POISON).
Drivers that wish to remove/re-add NAPI will need to re-initialize the
netdev napi_list after removing all NAPI instances, before re-adding NAPI
devices again.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a useless ret variable from the IPv4 ESP/UDP
decapsulation code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
atif is tested for being NULL twice, with the same effect in each case. I
have kept the second test, as it seems to fit well with the comment above it.
A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r exists@
local idexpression x;
expression E;
position p1,p2;
@@
if (x@p1 == NULL || ...) { ... when forall
return ...; }
... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\)
(
x@p2 == NULL
|
x@p2 != NULL
)
// another path to the test that is not through p1?
@s exists@
local idexpression r.x;
position r.p1,r.p2;
@@
... when != x@p1
(
x@p2 == NULL
|
x@p2 != NULL
)
@fix depends on !s@
position r.p1,r.p2;
expression x,E;
statement S1,S2;
@@
(
- if ((x@p2 != NULL) || ...)
S1
|
- if ((x@p2 == NULL) && ...) S1
|
- BUG_ON(x@p2 == NULL);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.
This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.
Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.
Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted. This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.
The only potential downside is that applications on the remote
end may see multiple SIGURG notifications. However, this would
occur anyway with other TCP stacks. More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The latest ietf socket extensions API draft said:
8.1.21. Set or Get the SCTP Partial Delivery Point
Note also that the call will fail if the user attempts to set
this value larger than the socket receive buffer size.
This patch add this validity check for SCTP_PARTIAL_DELIVERY_POINT
socket option.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If FWD-TSN chunk is received with bad stream ID, the sctp will not do the
validity check, this may cause memory overflow when overwrite the TSN of
the stream ID.
The FORWARD-TSN chunk is like this:
FORWARD-TSN chunk
Type = 192
Flags = 0
Length = 172
NewTSN = 99
Stream = 10000
StreamSequence = 0xFFFF
This patch fix this problem by discard the chunk if stream ID is not
less than MIS.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket
extensions API draft.
8.2.5. Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER)
This option gets the current number of associations that are attached
to a one-to-many style socket. The option value is an uint32_t.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just fix a typo in socket.c.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brings maxseg socket option set/get into line with the latest ietf socket
extensions API draft, while maintaining backwards compatibility.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 656299f706
(vlan: convert to net_device_ops) added a net_device_ops
with a NULL ndo_start_xmit field.
This gives a crash in dev_hard_start_xmit()
Fix it using two net_device_ops structures, one for hwaccel vlan,
one for non hwaccel vlan.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the followinf proc entries per-netns:
/proc/net/igmp
/proc/net/mcfilter
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looks like everything is already ready.
Required for ebtables(8) for one thing.
Also, required for ipmr per-netns (coming soon). (Benjamin)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a locking free version of iucv_message_receive and iucv_message_send
that do not call local_bh_enable in a spin_lock_(bh|irqsave)() context.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
While implementing a TCQ_F_THROTTLED flag there was used an smp_wmb()
in qdisc_watchdog(), but since this flag is practically used only in
sch_netem(), and since it's not even clear what reordering is avoided
here (TCQ_F_THROTTLED vs. __QDISC_STATE_SCHED?) it seems the barrier
could be safely removed.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A command like this: "brctl addif br1 eth1" issued as a user gave me
an oops when bridge module wasn't loaded. It's caused by using a dev
pointer before checking for NULL.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some gcc versions warn that ret may be used uninitialized in
sfq_enqueue(). It's a false positive, so let's annotate this.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds the Backward Congestion Notification Address (BCNA) attribute to the
Backward Congestion Notification (BCN) interface for Data Center Bridging
(DCB), which was missing. Receive the BCNA attribute in the ixgbe driver.
The BCNA attribute is for a switch to inform the endstation about the physical
port identification in order to support BCN on aggregated links.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Data Center Bridging (DCB) had no way to know if setstate had failed in the
driver. This patch enables dcb netlink code to handle the status for the DCB
setstate interface. Likewise it allows the driver to return a failed status
if MSI-X isn't enabled.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Eric W Multanen <eric.w.multanen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements dynamic power save for mac80211. Basically it
means enabling power save mode after an idle period. Implementing it
dynamically gives a good compromise of low power consumption and low
latency. Some hardware have support for this in firmware, but some
require the host to do it.
The dynamic power save is implemented by adding an timeout to
ieee80211_subif_start_xmit(). The timeout can be enabled from userspace
with Wireless Extensions. For example, the command below enables the
dynamic power save and sets the time timeout to 500 ms:
iwconfig wlan0 power timeout 500m
Power save now only works with devices which handle power save in firmware.
It's also disabled by default and the heuristics when and how to enable is
considered as a policy decision and will be left for the userspace to handle.
In case the firmware has support for this, drivers can disable this feature
with IEEE80211_HW_NO_STACK_DYNAMIC_PS.
Big thanks to Johannes Berg for the help with the design and code.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is a preparation for the dynamic power save support. In future there are
two paths to stop the master queues and we need to track this properly to
avoid starting queues incorrectly. Implement this by adding a status
array for each queue.
The original idea and design is from Johannes Berg, I just did
the implementation based on his notes. All the bugs are mine, of course.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Also disable power save when disassociated. It makes no sense to have
power save enabled while disassociated.
iwlwifi seems to have this check in the driver, but it's better to do this
in mac80211 instead.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In stress testing p54usb, the WARN_ON() in ieee80211_tasklet_handler() was
triggered; however, there is no logging of the received value for packet
type. Adding that feature will improve the warning.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes a typo in ieee80211_send_assoc(), net/mac80211/mlme.c.
The error is usage of a wrong member when building
the ie80211 management frame (it should be assoc_req, and not reassoc_req).
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since we do not currently report HT rates (MCS index) in radiotap
header for HT rates, we should not claim the rate is present. The rate
octet itself is used as padding in this case, so only the it_present
flag needs to be removed in case of HT rates.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When a STA roams back to the same AP before the previous STA entry has
expired, a new STA entry is not added in mac80211. However, a Layer 2
Update frame still needs to be transmitted to update layer 2 devices
about the new location for the STA. Without this, switches may
continue to forward frames to the previous (now incorrect) port when
STA roams between APs.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds option for HT-enabled drivers to report HT rates
(HT20/HT40, short GI, MCS index) to mac80211. These rates are
currently not in the rate table, so the rate_idx is used to indicate
MCS index.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
HT management is done differently for AP and STA modes, unify
to just the ->config() callback since HT is fundamentally a
PHY property and cannot be per-BSS.
Rename enum nl80211_sec_chan_offset as nl80211_channel_type to denote
the channel type ( NO_HT, HT20, HT40+, HT40- ).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds signal strength and transmission bitrate
to the station_info of nl80211.
Signed-off-by: Henning Rogge <rogge@fgan.de>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The original message was unhelpful and extremely alarming to our poor
users, despite its charm. Make it less frightening.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel_accept() does not hold the module refcount of newsock->ops->owner,
so we need __module_get(newsock->ops->owner) code after call kernel_accept()
by hand.
In sunrpc, the module refcount is missing to hold. So this cause kernel panic.
Used following script to reproduct:
while [ 1 ];
do
mount -t nfs4 192.168.0.19:/ /mnt
touch /mnt/file
umount /mnt
lsmod | grep ipv6
done
This patch fixed the problem by add __module_get(newsock->ops->owner) to
kernel_accept(). So we do not need to used __module_get(newsock->ops->owner)
in every place when used kernel_accept().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 7035560287.
As pointed out by Mark McLoughlin IP_PKTINFO cmsg data is one
post-queueing user, so this optimization is not valid right
now.
Signed-off-by: David S. Miller <davem@davemloft.net>
And thus when we try to use 'ss -danemi' on these sockets that have no
ccid blocks (data collected using systemtap after I fixed the problem):
dccp_diag_get_info sk=0xffff8801220a3100, dp->dccps_hc_rx_ccid=0x0000000000000000, dp->dccps_hc_tx_ccid=0x0000000000000000
We get an OOPS:
mica.ghostprotocols.net login: BUG: unable to handle kernel NULL pointer
dereferenc0
IP: [<ffffffffa0136082>] dccp_diag_get_info+0x82/0xc0 [dccp_diag]
PGD 12106f067 PUD 122488067 PMD 0
Oops: 0000 [#1] PREEMPT
Fix is trivial, and 'ss -d' is working again:
[root@mica ~]# ss -danemi
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 0 *:5001 *:*
ino:7288 sk:220a3100ffff8801
mem:(r0,w0,f0,t0) cwnd:0 ssthresh:0
[root@mica ~]#
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>