This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine
in the suspended state. This leads to the connectivity problems because
after resuming local machine thinks that the SAD entry is still valid, while
it has already been expired on the remote server.
The cause of this is very simple: the timeouts in the net/xfrm are bound to
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.
I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.
This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).
This patch is against 2.6.31.4. Please CC me.
Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds compat_ioctl support for SIOCWANDEV, which has
always been missing.
The definition of struct compat_ifreq was missing an
ifru_settings fields that is needed to support SIOCWANDEV,
so add that and clean up the whitespace damage in the
struct definition.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIOCGMIIPHY and SIOCGMIIREG return data through ifreq,
so it needs to be converted on the way out as well.
SIOCGIFPFLAGS is unused, but has the same problem in theory.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb_clone() fails, we should increment sk_drops and SNMP counters.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6 UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.
Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.
Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.
It's also a base for a future RCU conversion of multicast recption.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We first locate the (local port) hash chain head
If few sockets are in this chain, we proceed with previous lookup algo.
If too many sockets are listed, we take a look at the secondary
(port, address) hash chain.
We choose the shortest chain and proceed with a RCU lookup on the elected chain.
But, if we chose (port, address) chain, and fail to find a socket on given address,
we must try another lookup on (port, in6addr_any) chain to find sockets not bound
to a particular IP.
-> No extra cost for typical setups, where the first lookup will probabbly
be performed.
RCU lookups everywhere, we dont acquire spinlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We first locate the (local port) hash chain head
If few sockets are in this chain, we proceed with previous lookup algo.
If too many sockets are listed, we take a look at the secondary
(port, address) hash chain we added in previous patch.
We choose the shortest chain and proceed with a RCU lookup on the elected chain.
But, if we chose (port, address) chain, and fail to find a socket on given address,
we must try another lookup on (port, INADDR_ANY) chain to find socket not bound
to a particular IP.
-> No extra cost for typical setups, where the first lookup will probabbly
be performed.
RCU lookups everywhere, we dont acquire spinlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends udp_table to contain a secondary hash table.
socket anchor for this second hash is free, because UDP
doesnt use skc_bind_node : We define an union to hold
both skc_bind_node & a new hlist_nulls_node udp_portaddr_node
udp_lib_get_port() inserts sockets into second hash chain
(additional cost of one atomic op)
udp_lib_unhash() deletes socket from second hash chain
(additional cost of one atomic op)
Note : No spinlock lockdep annotation is needed, because
lock for the secondary hash chain is always get after
lock for primary hash chain.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Union sk_hash with two u16 hashes for udp (no extra memory taken)
One 16 bits hash on (local port) value (the previous udp 'hash')
One 16 bits hash on (local address, local port) values, initialized
but not yet used. This second hash is using jenkin hash for better
distribution.
Because the 'port' is xored later, a partial hash is performed
on local address + net_hash_mix(net)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a counter in udp_hslot to keep an accurate count
of sockets present in chain.
This will permit to upcoming UDP lookup algo to chose
the shortest chain when secondary hash is added.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christian Pellegrin <chripell@fsfe.org>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.
We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call. To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.
In addtion we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
or it will taint the kernel and fail to load becuase
of_address_to_resource() is GPL only.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
On older kernels, e.g. 2.6.27, a WARN_ON dump in rtmsg_ifinfo()
is thrown when the CAN device is registered due to insufficient
skb space, as reported by various users. This patch adds the
rtnl_link_ops "get_size" to fix the problem. I think this patch
is required for more recent kernels as well, even if no WARN_ON
dumps are triggered. Maybe we also need "get_xstats_size" for
the CAN xstats.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
atalk_sum_partial can now use the rol16 function in bitops.h
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using RCU helps not touching device refcount in rawv6_bind()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit fba4ed030c ("gianfar: Add Multiple
Queue Support") introduced the following warnings:
CHECK gianfar.c
gianfar.c:333:8: warning: incorrect type in assignment (different address spaces)
gianfar.c:333:8: expected unsigned int [usertype] *baddr
gianfar.c:333:8: got unsigned int [noderef] <asn:2>*<noident>
[... 67 lines skipped ...]
gianfar.c:2565:3: warning: incorrect type in argument 1 (different type sizes)
gianfar.c:2565:3: expected unsigned long const *addr
gianfar.c:2565:3: got unsigned int *<noident>
CC gianfar.o
gianfar.c: In function 'gfar_probe':
gianfar.c:985: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:985: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:993: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:993: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c: In function 'gfar_configure_coalescing':
gianfar.c:1680: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:1680: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:1688: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:1688: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c: In function 'gfar_poll':
gianfar.c:2565: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:2565: warning: passing argument 1 of 'find_next_bit' from incompatible pointer type
gianfar.c:2566: warning: passing argument 2 of 'test_bit' from incompatible pointer type
gianfar.c:2585: warning: passing argument 2 of 'set_bit' from incompatible pointer type
Following warnings left unfixed (looks like sparse doesn't like
locks in loops, so __acquires/__releases() doesn't help):
gianfar.c:441:40: warning: context imbalance in 'lock_rx_qs': wrong count at exit
gianfar.c:441:40: context '<noident>': wanted 0, got 1
gianfar.c:449:40: warning: context imbalance in 'lock_tx_qs': wrong count at exit
gianfar.c:449:40: context '<noident>': wanted 0, got 1
gianfar.c:458:3: warning: context imbalance in 'unlock_rx_qs': __context__ statement expected different context
gianfar.c:458:3: context '<noident>': wanted >= 0, got -1
gianfar.c:466:3: warning: context imbalance in 'unlock_tx_qs': __context__ statement expected different context
gianfar.c:466:3: context '<noident>': wanted >= 0, got -1
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes following warnings:
fsl_pq_mdio.c:112:38: warning: cast adds address space to expression (<asn:2>)
fsl_pq_mdio.c:124:38: warning: cast adds address space to expression (<asn:2>)
fsl_pq_mdio.c:133:38: warning: cast adds address space to expression (<asn:2>)
fsl_pq_mdio.c:414:11: warning: cast adds address space to expression (<asn:2>)
Instead of adding __force all over the place, introduce convenient
fsl_pq_mdio_get_regs() call that does the ugly casting just once.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 1d2397d742 ("fsl_pq_mdio: Add
Suport for etsec2.0 devices") introduced the following warnings:
CHECK fsl_pq_mdio.c
fsl_pq_mdio.c:287:22: warning: incorrect type in initializer (different base types)
fsl_pq_mdio.c:287:22: expected unknown type 11 const *__mptr
fsl_pq_mdio.c:287:22: got unsigned long long [unsigned] [assigned] [usertype] addr
fsl_pq_mdio.c:287:19: warning: incorrect type in assignment (different base types)
fsl_pq_mdio.c:287:19: expected unsigned long long [unsigned] [usertype] ioremap_miimcfg
fsl_pq_mdio.c:287:19: got struct fsl_pq_mdio *<noident>
CC fsl_pq_mdio.o
fsl_pq_mdio.c: In function 'fsl_pq_mdio_probe':
fsl_pq_mdio.c:287: warning: initialization makes pointer from integer without a cast
fsl_pq_mdio.c:287: warning: assignment makes integer from pointer without a cast
These warnings are not easy to fix without ugly __force casts. So,
instead of introducing the casts, rework the code to substitute an
offset from an already mapped area. This makes the code a lot simpler
and less duplicated.
Plus, from now on we don't actually map reserved registers on
non-etsec2.0 devices, so we have more chances to catch programming
errors.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bcm_proc_getifname() is called with RTNL and dev_base_lock
not held. It calls __dev_get_by_index() without locks, and
this is illegal (might crash)
Close the race by holding dev_base_lock and copying dev->name
in the protected section.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users of wrapped proto_ops are now gone, so we can safely remove
the wrappers as well.
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The x25 driver uses lock_kernel() implicitly through
its proto_ops wrapper. The makes the usage explicit
in order to get rid of that wrapper and to better document
the usage of the BKL.
The next step should be to get rid of the usage of the BKL
in x25 entirely, which requires understanding what data
structures need serialized accesses.
Cc: Henner Eisen <eis@baty.hanse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-x25@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The irda driver uses the BKL implicitly in its protocol
operations. Replace the wrapped proto_ops with explicit
lock_kernel() calls makes the usage more obvious and
shrinks the size of the object code.
The calls t lock_kernel() should eventually all be replaced
by other serialization methods, which requires finding out
The calls t lock_kernel() should eventually all be replaced
by other serialization methods, which requires finding out
which data actually needs protection.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Making the BKL usage explicit in ipx makes it more
obvious where it is used, reduces code size and helps
getting rid of the BKL in common code.
I did not analyse how to kill lock_kernel from ipx
entirely, this will involve either proving that it's not
needed, or replacing with a proper mutex or spinlock,
after finding out which data structures are protected
by the lock.
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Making the BKL usage explicit in appletalk makes it more
obvious where it is used, reduces code size and helps
getting rid of the BKL in common code.
I did not analyse how to kill lock_kernel from appletalk
entirely, this will involve either proving that it's not
needed, or replacing with a proper mutex or spinlock,
after finding out which data structures are protected
by the lock.
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
change atl1c_buffer struct, use "u16 flags" instead of "u16 state"
to store more infomation for atl1c_buffer, and restructure clean
atl1c_buffer procedure, add common api atl1c_clean_buffer.
Signed-off-by: Jie Yang <jie.yang@atheros.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPIN_LOCK_UNLOCKED is deprecated. Use DEFINE_SPINLOCK instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MII ioctls and SIOCSIFNAME need to go through ifsioc conversion,
which they never did so far. Some others are not implemented in the
native path, so we can just return -EINVAL directly.
Add IFSLAVE ioctls to the EINVAL list and move it to the end to
optimize the code path for the common case.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes the original socket compat_ioctl code
from fs/compat_ioctl.c and converts the code from the copy
in net/socket.c into a single function. We add a few cycles
of runtime to compat_sock_ioctl() with the long switch()
statement, but gain some cycles in return by simplifying
the call chain to get there.
Due to better inlining, save 1.5kb of object size in the
process, and enable further savings:
before:
text data bss dec hex filename
13540 18008 2080 33628 835c obj/fs/compat_ioctl.o
14565 636 40 15241 3b89 obj/net/socket.o
after:
text data bss dec hex filename
8916 15176 2080 26172 663c obj/fs/compat_ioctl.o
20725 636 40 21401 5399 obj/net/socket.o
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We must not have a compat ioctl handler for SIOCATALKDIFADDR
in common code, because the same number is used in other protocols
with different data structures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes an identical copy of the socket compat_ioctl code
from fs/compat_ioctl.c to net/socket.c, as a preparation
for moving the functionality in a way that can be easily
reviewed.
The code is hidden inside of #if 0 and gets activated in the
patch that will make it work.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Slip and a few other drivers use the same ioctl numbers on
tty devices that are normally meant for sockets. This causes
problems with our compat_ioctl handling that tries to convert
the data structures in a different format.
Fortunately, these five drivers all use 32 bit compatible
data structures in the ioctl numbers, so we can just add
a trivial compat_ioctl conversion function to each of them.
SIOCSIFENCAP and SIOCGIFENCAP do not need to live in
fs/compat_ioctl.c after this any more, and they are not
used on any sockets.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tun driver is the only code in the kernel that operates
on a character device with struct ifreq. Change the driver
to handle the conversion itself so we can contain the
remaining ifreq handling in the socket layer.
This also fixes a bug in the handling of invalid ioctl
numbers on an unbound tun device. The driver treats this
as a TUNSETIFF in native mode, but there is no way for
the generic compat_ioctl() function to emulate this
behaviour. Possibly the driver was only doing this
accidentally anyway, but if any code relies on this
misfeature, it now also works in compat mode.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to move socket ioctl conversion code into multiple
places in the socket code, we need a common defintion of
the data structures it uses.
Also change the name from ifreq32 to compat_ifreq to
follow the naming convention for compat.h
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hisax ISDN driver fails to build on ARM with CONFIG_HISAX_ELSA:
| drivers/built-in.o: In function `modem_set_dial':
| drivers/isdn/hisax/elsa_ser.c:535: undefined reference to `__bad_udelay'
| drivers/isdn/hisax/elsa_ser.c:544: undefined reference to `__bad_udelay'
| drivers/built-in.o: In function `modem_set_init':
| drivers/isdn/hisax/elsa_ser.c:486: undefined reference to `__bad_udelay'
| [...]
According to the comment in arch/arm/include/asm/delay.h, __bad_udelay
is specifically designed on ARM to produce a build failure when udelay
is called with a value > 2000.
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 2003 requires the outer header to have DF set if DF is set
on the inner header, even when PMTU discovery is off for the
tunnel. Our implementation does exactly that.
For this to work properly the IPIP gateway also needs to engate
in PMTU when the inner DF bit is set. As otherwise the original
host would not be able to carry out its PMTU successfully since
part of the path is only visible to the gateway.
Unfortunately when the tunnel PMTU discovery setting is off, we
do not collect the necessary soft state, resulting in blackholes
when the original host tries to perform PMTU discovery.
This problem is not reproducible on the IPIP gateway itself as
the inner packet usually has skb->local_df set. This is not
correctly cleared (an unrelated bug) when the packet passes
through the tunnel, which allows fragmentation to occur. For
hosts behind the IPIP gateway it is readily visible with a simple
ping.
This patch fixes the problem by performing PMTU discovery for
all packets with the inner DF bit set, regardless of the PMTU
discovery setting on the tunnel itself.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This device requires a fundamental reset when recovering from EEH.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This line was accidentally left out of the previous commit #
da03945140 ("qlge: Fix firmware mailbox
command timeout.").
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ioatdma was loaded we we were unable to transmit traffic. We weren't
using the correct registers in ixgbe_update_tx_dca for 82599 systems.
Likewise in ixgbe_configure_tx() we weren't disabling the arbiter before
modifying MTQC.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When DCB is enabled, the ixgbe_check_tx_hang() should check the corresponding
TC's TXOFF in TFCS based on the TC that the tx ring belongs to. Adds a
function to map from the tx_ring hw reg_idx to the correspodning TC and read
TFCS accordingly.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 32k gso_max_size when DCB is enabled is for 82598 only, not for 82599.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No-one seems to know where the PowerBook 500 series store their ethernet
MAC addresses. So, rather than crash, use a MAC address from the SONIC
CAM. Failing that, generate a random one.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanse found that one error path in cas_open omits to unlock pm_mutex.
Fix that.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CPC-USB is using a ARM7 core with little endian byte order. The "id" field
in can_msg needs byte order conversion from/to CPU byte order.
Signed-off-by: Sebastian Haas <haas@ems-wuensche.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sending config commands to be2 hardware before netdev_register is
completed, is sometimes causing the async link notification to arrive
even before the driver is ready to handle it. The commands for vlan
config and flow control settings can infact wait till be_open.
This patch takes care of that.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>