a recent fix to e1000 (commit 15b2bee2) caused KVM/QEMU/VMware based
virtualized e1000 interfaces to begin failing when resetting.
This is because the driver in a virtual environment doesn't
get to run instructions *AT ALL* when an interrupt is asserted.
The interrupt code runs immediately and this recent bug fix
allows an interrupt to be possible when the interrupt handler
will reject it (due to the new code), when being called from
any path in the driver that holds the E1000_RESETTING flag.
the driver should use the __E1000_DOWN flag instead of the
__E1000_RESETTING flag to prevent interrupt execution
while reconfiguring the hardware.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix locking issue in alb MAC address management; removed
incorrect locking and replaced with correct locking. This bug was
introduced in commit 059fe7a578
("bonding: Convert locks to _bh, rework alb locking for new locking")
Bug reported by Paul Smith <paul@mad-scientist.net>, who also
tested the fix.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a semantic changes in flush_workqueue() the current approach of
synchronizing the sysfs handling for connections doesn't work anymore. The
whole approach is actually fully broken and based on assumptions that are
no longer valid.
With the introduction of Simple Pairing support, the creation of low-level
ACL links got changed. This change invalidates the reason why in the past
two independent work queues have been used for adding/removing sysfs
devices. The adding of the actual sysfs device is now postponed until the
host controller successfully assigns an unique handle to that link. So
the real synchronization happens inside the controller and not the host.
The only left-over problem is that some internals of the sysfs device
handling are not initialized ahead of time. This leaves potential access
to invalid data and can cause various NULL pointer dereferences. To fix
this a new function makes sure that all sysfs details are initialized
when an connection attempt is made. The actual sysfs device is only
registered when the connection has been successfully established. To
avoid a race condition with the registration, the check if a device is
registered has been moved into the removal work.
As an extra protection two flush_work() calls are left in place to
make sure a previous add/del work has been completed first.
Based on a report by Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Roger Quadros <ext-roger.quadros@nokia.com>
Tested-by: Marc Pignat <marc.pignat@hevs.ch>
This introduces a CDC Ethernet Emulation Model (EEM) host side
driver to support USB EEM devices.
EEM is different from the Ethernet Control Model (ECM) currently
supported by the "CDC Ethernet" driver. One key difference is
that it doesn't require of USB interface alternate settings to
manage interface state; some maldesigned hardware can't handle
that part of USB. It also avoids a separate USB interface for
control and status updates.
[ dbrownell@users.sourceforge.net: fix skb leaks, add rx packet
checks, improve fault handling, EEM conformance updates, cleanup ]
Signed-off-by: Omar Laazimani <omar.oberthur@gmail.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of
the actual value might be tuned. The following patches fix this and make
tcp_prequeue get the actual value returns from tcp_rto_min().
Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an invalid pointer access in case the receive queue
holds no pointer to the next skb when the queue is empty.
Signed-off-by: Hannes Hering <hering2@de.ibm.com>
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doing it in reverse order causes uevent to be sent before
we have a MAC address, which confuses udev.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0ba25ff4c6 ("br2684: convert to
net_device_ops") inadvertently deleted the initialization of the net_dev
pointer in the br2684_dev structure, leading to crashes. This patch
adds it back.
Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel should only be using the high 16 bits of a kernel
generated priority. Filter priorities in all other cases only
use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto',
but when the kernel generates the priority of a filter is saves all
32 bits which can result in incorrect lookup failures when a filter
needs to be deleted or modified.
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were avoiding calling sg_init* on scatterlists passed
into virtnet_send_command to prevent extraneous end markers.
This caused build warnings for uninitialized variables.
Cleanup the code to create proper scatterlists.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the cleanup in bond_create nicer :) Also now the forgotten
free_netdev is called.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_net.h uses the macro ETH_ALEN which is defined in linux/if_ether.h.
Discovered when hacking on virtio-over-pci patches.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
LAN9512 and LAN9514 are USB hubs with an integrated 10/100 ethernet
controller. Logically this looks like an ethernet controller (similar
to LAN9500) permanently attached to one of the hub's downstream ports.
This patch adds the usb device id of the new ethernet controller to the
smsc95xx driver. This id is the same in both new devices.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SMSC LAN9500 has dual purpose GPIO/LED pins, and by default at power-on
these are configured as GPIOs. This means that if LEDs are fitted they
won't ever light.
This patch sets them to be LED outputs for speed, duplex and
link/activity.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When netconsole is loaded and a network interface fades away (e.g. on
rmmod $interface_driver_module) the rmmod remains stuck and some locks
are taken that prevent any additional module loading/unloading as well
as interface up/down changes.
In addition kernel logs (and console) get flooded at 10s interval with
[ 122.464065] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
[ 132.704059] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
This patch lets netconsole take NETDEV_UNREGISTER event into account
and release the affected interface if it was in use.
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xt_socket can use connection tracking, and checks whether it is a module.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_slave_info_query() should keep a read lock while accessing slave info,
or risk accessing stale data and corruption.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial: fixing gcc 4.4 compiler warning:
drivers/net/cxgb3/t3_hw.c: In function ‘t3_prep_adapter’:
drivers/net/cxgb3/t3_hw.c:3782: warning: suggest parentheses around operand of ‘!’ or change ‘|’ to ‘||’ or ‘!’ to ‘~’
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb_rx_queue_recorded() is true, we dont want to use jash distribution
as the device driver exactly told us which queue was selected at RX time.
jhash makes a statistical shuffle, but this wont work with 8 static inputs.
Later improvements would be to compute reciprocal value of real_num_tx_queues
to avoid a divide here. But this computation should be done once,
when real_num_tx_queues is set. This needs a separate patch, and a new
field in struct net_device.
Reported-by: Andrew Dickinson <andrew@whydna.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lennert Buytenhek wrote:
> Since 4fb6699481 ("net: Optimize memory
> usage when splicing from sockets.") I'm seeing this oops (e.g. in
> 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver
> (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather
> than the frag mode:
My patch incorrectly assumed skb->sk was always valid, but for
"frag_listed" skbs we can only use skb->sk of their parent.
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org>
Tested-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On several mv643xx_eth hardware versions, the two 64bit mib counters
for 'good octets received' and 'good octets sent' are actually 32bit
counters, and reading from the upper half of the register has the same
effect as reading from the lower half of the register: an atomic
read-and-clear of the entire 32bit counter value. This can under heavy
traffic occasionally lead to small numbers being added to the upper
half of the 64bit mib counter even though no 32bit wrap has occured.
Since we poll the mib counters at least every 30 seconds anyway, we
might as well just skip the reads of the upper halves of the hardware
counters without breaking the stats, which this patch does.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when OOM occurs during rx ring refill, mv643xx_eth will get
into an infinite loop, due to the refill function setting the OOM bit
but not clearing the 'rx refill needed' bit for this queue, while the
calling function (the NAPI poll handler) will call the refill function
in a loop until the 'rx refill needed' bit goes off, without checking
the OOM bit.
This patch fixes this by checking the OOM bit in the NAPI poll handler
before attempting to do rx refill. This means that once OOM occurs,
we won't try to do any memory allocations again until the next invocation
of the poll handler.
While we're at it, change the OOM flag to be a single bit instead of
one bit per receive queue since OOM is a system state rather than a
per-queue state, and cancel the OOM timer on entry to the NAPI poll
handler if it's running to prevent it from firing when we've already
come out of OOM.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
In "mac80211: correct wext transmit power handler"
I fixed the wext handler, but forgot to make the default of the
user_power_level -1 (aka "auto"), so that now the transmit power
is always set to 0, causing associations to time out and similar
problems since we're transmitting with very little power. Correct
this by correcting the default user_power_level to -1.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Bisected-by: Niel Lambrechts <niel.lambrechts@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
- ieee80211_wep_init(), which is called with rtnl_lock held, blocks in
request_module() [waiting for modprobe to load a crypto module].
- modprobe blocks in a call to flush_workqueue(), when it closes a TTY
[presumably when it exits].
- The workqueue item linkwatch_event() blocks on rtnl_lock.
There's no reason for wep_init() to be called with rtnl_lock held, so
just move it outside the critical section.
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
After experimenting with kexec with the last merges after 2.6.29, I've
had some problems when probing e100. It would not read the eeprom. After
some bisects, I realized this has been like that since forever (at least
2.6.18). The problem is that shutdown is doing the same thing that
suspend does and puts the device in D3 state. I couldn't find a way to
get the device back to a sane state in the probe function. So, based on
some similar patches from Rafael J. Wysocki for e1000, e1000e, and ixgbe,
I wrote this one for e100.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The x_tables are organized with a table structure and a per-cpu copies
of the counters and rules. On older kernels there was a reader/writer
lock per table which was a performance bottleneck. In 2.6.30-rc, this
was converted to use RCU and the counters/rules which solved the performance
problems for do_table but made replacing rules much slower because of
the necessary RCU grace period.
This version uses a per-cpu set of spinlocks and counters to allow to
table processing to proceed without the cache thrashing of a global
reader lock and keeps the same performance for table updates.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
char bname[5] is too small for the string "X GHz" when the null
terminator is taken into account. Thus, turning on rate debugging
can crash unless we have lucky stack alignment.
Cc: stable@kernel.org
Reported-by: Paride Legovini <legovini@spiro.fisica.unipd.it>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Under certain circumstances iwlwifi can get stuck and will no
longer accept scan requests, because the core code (cfg80211)
thinks that it's still processing one. This fixes one of the
points where it can happen, but I've still seen it (although
only with my radio-off-when-idle patch).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rndis_wext_link_change() might be called from rndis_command() at
initialization stage and priv->workqueue/priv->work have not been
initialized yet. This causes invalid opcode at rndis_wext_bind on
some brands of bcm4320.
Fix by initializing workqueue/workers in rndis_wext_bind() before
rndis_command is used.
This bug has existed since 2.6.25, reported at:
http://bugzilla.kernel.org/show_bug.cgi?id=12794
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
drivers/net/wireless/iwlwifi/iwl3945-base.c:1415: error: __ksymtab_iwl3945_rx_queue_reset causes a section type conflict
I am pretty sure that this is a compiler bug, so not to worry. However,
as far as I can see, iwl-3945.o (the only user) and iwl3945-base.o are
always linked into the same module, so the EXPORT_SYMBOL (which causes
the problem) should not be needed. Correct?
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The Bluetooth 2.1 specification introduced four different security modes
that can be mapped using Legacy Pairing and Simple Pairing. With the
usage of Simple Pairing it is required that all connections (except
the ones for SDP) are encrypted. So even the low security requirement
mandates an encrypted connection when using Simple Pairing. When using
Legacy Pairing (for Bluetooth 2.0 devices and older) this is not required
since it causes interoperability issues.
To support this properly the low security requirement translates into
different host controller transactions depending if Simple Pairing is
supported or not. However in case of Simple Pairing the command to
switch on encryption after a successful authentication is not triggered
for the low security mode. This patch fixes this and actually makes
the logic to differentiate between Simple Pairing and Legacy Pairing
a lot simpler.
Based on a report by Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth stack uses a reference counting for all established ACL
links and if no user (L2CAP connection) is present, the link will be
terminated to save power. The problem part is the dedicated pairing
when using Legacy Pairing (Bluetooth 2.0 and before). At that point
no user is present and pairing attempts will be disconnected within
10 seconds or less. In previous kernel version this was not a problem
since the disconnect timeout wasn't triggered on incoming connections
for the first time. However this caused issues with broken host stacks
that kept the connections around after dedicated pairing. When the
support for Simple Pairing got added, the link establishment procedure
needed to be changed and now causes issues when using Legacy Pairing
When using Simple Pairing it is possible to do a proper reference
counting of ACL link users. With Legacy Pairing this is not possible
since the specification is unclear in some areas and too many broken
Bluetooth devices have already been deployed. So instead of trying to
deal with all the broken devices, a special pairing timeout will be
introduced that increases the timeout to 60 seconds when pairing is
triggered.
If a broken devices now puts the stack into an unforeseen state, the
worst that happens is the disconnect timeout triggers after 120 seconds
instead of 4 seconds. This allows successful pairings with legacy and
broken devices now.
Based on a report by Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In 2.6.25 we added UDP mem accounting.
This unfortunatly added a penalty when a frame is transmitted, since
we have at TX completion time to call sock_wfree() to perform necessary
memory accounting. This calls sock_def_write_space() and utimately
scheduler if any thread is waiting on the socket.
Thread(s) waiting for an incoming frame was scheduled, then had to sleep
again as event was meaningless.
(All threads waiting on a socket are using same sk_sleep anchor)
This adds lot of extra wakeups and increases latencies, as noted
by Christoph Lameter, and slows down softirq handler.
Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2
Fortunatly, Davide Libenzi recently added concept of keyed wakeups
into kernel, and particularly for sockets (see commit
37e5540b3c
epoll keyed wakeups: make sockets use keyed wakeups)
Davide goal was to optimize epoll, but this new wakeup infrastructure
can help non epoll users as well, if they care to setup an appropriate
handler.
This patch introduces new DEFINE_WAIT_FUNC() helper and uses it
in wait_for_packet(), so that only relevant event can wakeup a thread
blocked in this function.
Trace of function calls from bnx2 TX completion bnx2_poll_work() is :
__kfree_skb()
skb_release_head_state()
sock_wfree()
sock_def_write_space()
__wake_up_sync_key()
__wake_up_common()
receiver_wake_function() : Stops here since thread is waiting for an INPUT
Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now we have no upper limit on the size of the route cache hash table.
On a 128GB POWER6 box it ends up as 32MB:
IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)
It would be nice to cap this for memory consumption reasons, but a massive
hashtable also causes a significant spike when measuring OS jitter.
With a 32MB hashtable and 4 million entries, rt_worker_func is taking
5 ms to complete. On another system with more memory it's taking 14 ms.
Even though rt_worker_func does call cond_sched() to limit its impact,
in an HPC environment we want to keep all sources of OS jitter to a minimum.
With the patch applied we limit the number of entries to 512k which
can still be overriden by using the rt_entries boot option:
IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)
With this patch rt_worker_func now takes 0.460 ms on the same system.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the entry about CAPI 2.0 to the beginning and add a URL.
Incorporate changes suggested by Randy Dunlap, thanks for proofreading.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
isdn: document Kernel CAPI driver interface
Create a file Documentation/isdn/INTERFACE.CAPI describing the
interface between the kernel CAPI subsystem and ISDN device drivers,
analogous to the existing Documentation/isdn/INTERFACE for the old
isdn4linux subsystem. Also add kerneldoc comments to the exported
functions in drivers/isdn/capi/kcapi.c.
Impact: Documentation
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the merging of mISDN, state which files refer only to the
old isdn4linux subsystem. Also add a few missing files.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code writes the PME enabled bit in PCI config space which is
wrong. This was needed for pre-release hardware, and was not removed from
the driver. Also, we need to clear the WUS (wake up status) after we
resume. Otherwise we can't wake for the same event again since it's still
asserted in the hardware. Plus, the multicast lists were being written
improperly, causing multicast WoL to fail.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Stephen Hemminger <shemminger@vyatta.com>
The veth driver will oops if sysfs hooks are open while module is removed.
The net device destructor can not point to code in a module; basically
there are only two possible safe values: NULL - no destructor, or
free_netdev - free on last use
Signed-off-by: David S. Miller <davem@davemloft.net>
When kernel inserts a temporary SA for IKE, it uses the wrong hash
value for dst list. Two hash values were calcultated before: one with
source address and one with a wildcard source address.
Bug hinted by Junwei Zhang <junwei.zhang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the tx_timeout() to properly handle the clean up of the
tx ring. It also sets the tx put pointer back to the correct position to
be in sync with HW.
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we failed to allocate new fragments for receive buffer,
the packet should be dropped and packets should be reused.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>