Currently, netlink_broadcast() reports errors to the caller if no
messages at all were delivered:
1) If, at least, one message has been delivered correctly, returns 0.
2) Otherwise, if no messages at all were delivered due to skb_clone()
failure, return -ENOBUFS.
3) Otherwise, if there are no listeners, return -ESRCH.
With this patch, the caller knows if the delivery of any of the
messages to the listeners have failed:
1) If it fails to deliver any message (for whatever reason), return
-ENOBUFS.
2) Otherwise, if all messages were delivered OK, returns 0.
3) Otherwise, if no listeners, return -ESRCH.
In the current ctnetlink code and in Netfilter in general, we can add
reliable logging and connection tracking event delivery by dropping the
packets whose events were not successfully delivered over Netlink. Of
course, this option would be settable via /proc as this approach reduces
performance (in terms of filtered connections per seconds by a stateful
firewall) but providing reliable logging and event delivery (for
conntrackd) in return.
This patch also changes some clients of netlink_broadcast() that
may report ENOBUFS errors via printk. This error handling is not
of any help. Instead, the userspace daemons that are listening to
those netlink messages should resync themselves with the kernel-side
if they hit ENOBUFS.
BTW, netlink_broadcast() clients include those that call
cn_netlink_send(), nlmsg_multicast() and genlmsg_multicast() since they
internally call netlink_broadcast() and return its error value.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous fix ad0f990444 (gro:
Fix handling of imprecisely split packets) only fixed the case
of frags merging, frag_list merging in the same circumstances
were still broken.
In particular, the packet headers end up in the data stream.
This patch fixes this plus another issue where an imprecisely
split packet header may be read incorrectly (this is mostly
harmless since it'll simply cause the packet to not match and
be rejected for GRO).
Thanks to Emil Tantilov and Jeff Kirsher for helping to track
this down.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function sock_alloc_send_pskb is completely useless if not
exported since most of the code in it won't be used as is. In
fact, this code has already been duplicated in the tun driver.
Now that we need accounting in the tun driver, we can in fact
use this function as is. So this patch marks it for export again.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it currently stands, skb destructors are forbidden on the
receive path because the protocol end-points will overwrite
any existing destructor with their own.
This is the reason why we have to call skb_orphan in the loopback
driver before we reinject the packet back into the stack, thus
creating a period during which loopback traffic isn't charged
to any socket.
With virtualisation, we have a similar problem in that traffic
is reinjected into the stack without being associated with any
socket entity, thus providing no natural congestion push-back
for those poor folks still stuck with UDP.
Now had we been consistent in telling them that UDP simply has
no congestion feedback, I could just fob them off. Unfortunately,
we appear to have gone to some length in catering for this on
the standard UDP path, with skb/socket accounting so that has
created a very unhealthy dependency.
Alas habits are difficult to break out of, so we may just have
to allow skb destructors on the receive path.
It turns out that making skb destructors useable on the receive path
isn't as easy as it seems. For instance, simply adding skb_orphan
to skb_set_owner_r isn't enough. This is because we assume all
over the IP stack that skb->sk is an IP socket if present.
The new transparent proxy code goes one step further and assumes
that skb->sk is the receiving socket if present.
Now all of this can be dealt with by adding simple checks such
as only treating skb->sk as an IP socket if skb->sk->sk_family
matches. However, it turns out that for bridging at least we
don't need to do all of this work.
This is of interest because most virtualisation setups use bridging
so we don't actually go through the IP stack on the host (with
the exception of our old nemesis the bridge netfilter, but that's
easily taken care of).
So this patch simply adds skb_orphan to the point just before we
enter the IP stack, but after we've gone through the bridge on the
receive path. It also adds an skb_orphan to the one place in
netfilter that touches skb->sk/skb->destructor, that is, tproxy.
One word of caution, because of the internal code structure, anyone
wishing to deploy this must use skb_set_owner_w as opposed to
skb_set_owner_r since many functions that create a new skb from
an existing one will invoke skb_set_owner_w on the new skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 93821778de (udp: Fix rcv socket
locking) accidentally removed sk_drops increments for UDP IPV4
sockets.
This field can be used to detect incorrect sizing of socket receive
buffers.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_alloc now sets sk_family so this is redundant. In fact it caught
my eye because sock_init_data already uses sk_family so this is too
late anyway.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
And switch bsockets to atomic_t since it might be changed in parallel.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
packet_lookup_frames() fails to get user frame if current frame header
status contains extra flags.
This is due to the wrong assumption on the operators precedence during
frame status tests.
Fixed by forcing the right operators precedence order with explicit brackets.
Signed-off-by: Paolo Abeni <paolo.abeni@gmail.com>
Signed-off-by: Sebastiano Di Paola <sebastiano.dipaola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Stephen Hemminger <shemminger@vyatta.com>
Fix regression introduced by a9d8f9110d
("inet: Allowing more than 64k connections and heavily optimize
bind(0) time.")
Based upon initial patches and feedback from Evegniy Polyakov and
Eric Dumazet.
From Eric Dumazet:
--------------------
Also there might be a problem at line 175
if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
spin_unlock(&head->lock);
goto again;
If we entered inet_csk_get_port() with a non null snum, we can "goto again"
while it was not expected.
--------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 89a1b249edcf9be884e71f92df84d48355c576aa (gro: Avoid
copying headers of unmerged packets) only worked for packets
which are either completely linear, completely non-linear, or
packets which exactly split at the boundary between headers and
payload.
Anything else would cause bits in the header to go missing if
the packet is held by GRO.
This may have broken drivers such as ixgbe.
This patch fixes the places that assumed or only worked with
the above cases.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy <kaber@trash.net> suggested using a workqueue instead
of hrtimers to trigger netif_schedule() when there is a problem with
setting exact time of this event: 'The differnce - yeah, it shouldn't
make much, mainly wake up the qdisc earlier (but not too early) after
"too many events" occured _and_ no further enqueue events wake up the
qdisc anyways.'
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's get some info on possible config problems. This patch brings
back an old warning, but is printed only once now.
With feedback from Patrick McHardy <kaber@trash.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy <kaber@trash.net> suggested:
> How about making this flag and the warning message (in a out-of-line
> function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> inner non-work-conserving qdiscs as well.
This patch uses qdisc->flags field of "suspected" child qdisc.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds another inet device option to enable gratuitous ARP
when device is brought up or address change. This is handy for
clusters or virtualization.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Base versions handle constant folding now.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent fix of data corruption when splicing from sockets uses
memory very inefficiently allocating a new page to copy each chunk of
linear part of skb. This patch uses the same page until it's full
(almost) by caching the page in sk_sndmsg_page field.
With changes from David S. Miller <davem@davemloft.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
igb: fix link reporting when using sgmii
igb: prevent skb_over panic w/ mtu smaller than 1K
igb: Fix DCA errors and do not use context index for 82576
ipv6: compile fix for ip6mr.c
packet: Avoid lock_sock in mmap handler
sfc: Replace stats_enabled flag with a disable count
sfc: SFX7101/SFT9001: Fix AN advertisements
sfc: SFT9001: Always enable XNP exchange on SFT9001 rev B
sfc: Update board info for hardware monitor on SFN4111T-R5 and later
sfc: Test for PHYXS faults whenever we cannot test link state bits
sfc: Reinitialise the PHY completely in case of a PHY or NIC reset
sfc: Fix post-reset MAC selection
sfc: SFN4111T: Fix GPIO sharing between I2C and FLASH_CFG_1
sfc: SFT9001: Fix speed reporting in 1G PHY loopback
sfc: SFX7101: Remove workaround for bad link training
sfc: SFT9001: Enable robust link training
sky2: fix hard hang with netconsoling and iface going up
net/ipv6/ip6mr.c: In function 'pim6_rcv':
net/ipv6/ip6mr.c:368: error: implicit declaration of function 'csum_ipv6_magic'
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the mmap handler gets called under mmap_sem, and we may grab
mmap_sem elsewhere under the socket lock to access user data, we
should avoid grabbing the socket lock in the mmap handler.
Since the only thing we care about in the mmap handler is for
pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
can achieve this by simply using a new mutex.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Martin MOKREJŠ <mmokrejs@ribosome.natur.cuni.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the dynamic power save timer has been started before the power save
is disabled using iwconfig, we fail to cancel the timer. Hence cancel it
while disabling power save.
Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
tulip: fix 21142 with 10Mbps without negotiation
drivers/net/skfp: if !capable(CAP_NET_ADMIN): inverted logic
gianfar: Fix Wake-on-LAN support
smsc911x: timeout reaches -1
smsc9420: fix interrupt signalling test failures
ucc_geth: Change uec phy id to the same format as gianfar's
wimax: fix build issue when debugfs is disabled
netxen: fix memory leak in drivers/net/netxen_nic_init.c
tun: Add some missing TUN compat ioctl translations.
ipv4: fix infinite retry loop in IP-Config
net: update documentation ip aliases
net: Fix OOPS in skb_seq_read().
net: Fix frag_list handling in skb_seq_read
netxen: revert jumbo ringsize
ath5k: fix locking in ath5k_config
cfg80211: print correct intersected regulatory domain
cfg80211: Fix sanity check on 5 GHz when processing country IE
iwlwifi: fix kernel oops when ucode DMA memory allocation failure
rtl8187: Fix error in setting OFDM power settings for RTL8187L
mac80211: remove Michael Wu as maintainer
...
As reported by Toralf Förster and Randy Dunlap.
- http://linuxwimax.org/pipermail/wimax/2009-January/000460.html
- http://lkml.org/lkml/2009/1/29/279
The definitions needed for the wimax stack and i2400m driver debug
infrastructure was, by mistake, compiled depending on CONFIG_DEBUG_FS
(by them being placed in the debugfs.c files); thus the build broke in
2.6.29-rc3 when debugging was enabled (CONFIG_WIMAX_DEBUG) and
DEBUG_FS was disabled.
These definitions are always needed if debug is enabled at compile
time (independently of DEBUG_FS being or not enabled), so moving them
to a file that is always compiled fixes the issue.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch optimises napi_fraginfo_skb to only copy the bits
necessary. We also open-code the memcpy so that the alignment
information is always available to gcc.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
gro: Do not merge paged packets into frag_list
Bigger is not always better :)
It was easy to continue to merged packets into frag_list after the
page array is full. However, this turns out to be worse than LRO
because frag_list is a much less efficient form of storage than the
page array. So we're better off stopping the merge and starting
a new entry with an empty page array.
In future we can optimise this further by doing frag_list merging
but making sure that we continue to fill in the page array.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unfortunately simplicity isn't always the best. The fraginfo
interface turned out to be suboptimal. The problem was quite
obvious. For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.
LRO didn't have this problem because it directly read the headers
from the frags structure.
This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it. Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently VLAN still has a bit of common code handling the aftermath
of GRO that's shared with the common path. This patch moves them
into shared helpers to reduce code duplication.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It oopsd for me in skb_seq_read. addr2line said it was
linux-2.6/net/core/skbuff.c:2228, which is this line:
while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {
I added some printks in there and it looks like we hit this:
} else if (st->root_skb == st->cur_skb &&
skb_shinfo(st->root_skb)->frag_list) {
st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
st->frag_idx = 0;
goto next_skb;
}
Actually I did some testing and added a few printks and found that the
st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
This caused the kernel panic.
if (abs_offset < block_limit) {
- *data = st->cur_skb->data + abs_offset;
+ *data = st->cur_skb->data + (abs_offset - st->stepped_offset);
I enabled the debug_tcp and with a few printks found that the code did
not go to the next_skb label and could find that the sequence being
followed was this -
It hit this if condition -
if (st->cur_skb->next) {
st->cur_skb = st->cur_skb->next;
st->frag_idx = 0;
goto next_skb;
And so, now the st pointer is shifted to the next skb whereas actually
it should have hit the second else if first since the data is in the
frag_list.
else if (st->root_skb == st->cur_skb &&
skb_shinfo(st->root_skb)->frag_list) {
st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
goto next_skb;
}
Reversing the two conditions the attached patch fixes the issue for me
on top of Herbert's patches.
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The frag_list handling was broken in skb_seq_read:
1) We didn't add the stepped offset when looking at the head
are of fragments other than the first.
2) We didn't take the stepped offset away when setting the data
pointer in the head area.
3) The frag index wasn't reset.
This patch fixes both issues.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the regulatory domain is already set it is technically not an error
so do not pass an errno to userspace.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch enables low-level driver independent debugging of the TSF and remove the driver specific things of ath5k and ath9k from the debugfs.
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Using only the RTNL has a number of problems, most notably that
ieee80211_iterate_active_interfaces() and other interface list
traversals cannot be done from the internal workqueue because it
needs to be flushed under the RTNL.
This patch introduces a new mutex that protects the interface list
against modifications. A more detailed explanation is part of the
code change.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers can theoretically queue more work in one of their callbacks
from mac80211 suspend, so let's flush it once more to be on the safe
side, just before calling ->stop().
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
"mac80211: make workqueue freezable" made the mac80211
workqueue freezeable to prevent us from doing any work after the
driver went away. This was fine before mac80211 had any suspend
support.
However, now we want to flush this workqueue in suspend(). Because
the thread for a freezeable workqueue is stopped before the device
class suspend() is called, flush_workqueue() will hang in the
suspend-to-disk case.
Converting it back to a non-freezeable queue will keep suspend from
hanging. Moreover, since we flush the workqueue under RTNL and
userspace is stopped, there won't be any new work in the workqueue
until after resume. Thus we still don't have to worry about pinging
the AP without hardware.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch cleanup the fixed BSSID handling, that
ieee80211_sta_set_bssid() works like ieee80211_sta_set_ssid(). So
that the BSSID is only a second selection criterion besides the
SSID. This allows us to create new IBSS networks with fixed BSSIDs,
which was broken before.
In the second version of this patch the handling of the stupid merges
to the same BSSID is moved out to get reworked into an other patch.
And this version hopefully solves the problems with some low-level
drivers and re-adds the config BSSID warning to help debugging the
low-level drivers.
Much thanks to all who have helped testing! :)
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds an low-level driver independent entry to read the TSF value into the debugfs of mac80211. This makes debugging the IBSS handling of wifi drivers easier.
Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Let users be more compliant if so desired.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If a driver is given a wiphy and it wants to get to its private
mac80211 driver area it can use wiphy_to_ieee80211_hw() to get first
to its ieee80211_hw and then access the private structure via hw->priv. The
wiphy_priv() is already being used internally by mac80211 and drivers
should not use this. This can be helpful in a drivers reg_notifier().
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This allows drivers to request strict regulatory settings to
be applied to its devices. This is desirable for devices where
proper calibration and compliance can only be gauranteed for
for the device's programmed regulatory domain. Regulatory
domain settings will be ignored until the device's own
regulatory domain is properly configured. If no regulatory
domain is received only the world regulatory domain will be
applied -- if OLD_REG (default to "US") is not enabled. If
OLD_REG behaviour is not acceptable to drivers they must
update their wiphy with a custom reuglatory prior to wiphy
registration.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers may need more information than just who set the last regulatory domain,
as such lets just pass the last regulatory_request receipt. To do this we need
to move out to headers struct regulatory_request, and enum environment_cap. While
at it lets add documentation for enum environment_cap.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This ensures that the initial REGDOM_SET_BY_CORE upon wiphy registration
respects the wiphy->custom_regulatory setting. Without this and if OLD_REG
is disabled (which will be default soon as we remove it) the
wiphy->custom_regulatory is simply ignored.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers without firmware can also have custom regulatory maps
which do not map to a specific ISO / IEC alpha2 country code.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>