Commit graph

12699 commits

Author SHA1 Message Date
Pablo Neira Ayuso
274d383b9c netfilter: conntrack: don't report events on module removal
During the module removal there are no possible event listeners
since ctnetlink must be removed before to allow removing
nf_conntrack. This patch removes the event reporting for the
module removal case which is not of any use in the existing code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:38 +02:00
Pablo Neira Ayuso
03b64f518a netfilter: ctnetlink: cleanup message-size calculation
This patch cleans up the message calculation to make it similar
to rtnetlink, moreover, it removes unneeded verbose information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:08:27 +02:00
Pablo Neira Ayuso
96bcf938dc netfilter: ctnetlink: use nlmsg_* helper function to build messages
Replaces the old macros to build Netlink messages with the
new nlmsg_*() helper functions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:07:39 +02:00
Pablo Neira Ayuso
f2f3e38c63 netfilter: ctnetlink: rename tuple() by nf_ct_tuple() macro definition
This patch move the internal tuple() macro definition to the
header file as nf_ct_tuple().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:03:35 +02:00
Pablo Neira Ayuso
8b0a231d4d netfilter: ctnetlink: remove nowait parameter from *fill_info()
This patch is a cleanup, it removes the `nowait' parameter
from all *fill_info() function since it is always set to one.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:03:34 +02:00
Pablo Neira Ayuso
f49c857ff2 netfilter: nfnetlink: cleanup for nfnetlink_rcv_msg() function
This patch cleans up the message handling path in two aspects:

 * it uses NLMSG_LENGTH() instead of NLMSG_SPACE() like rtnetlink
does in this case to check if there is enough room for the
Netlink/nfnetlink headers. No need to check for the padding room.

 * it removes a redundant header size checking that has been
 already do at the beginning of the function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2009-06-02 20:03:33 +02:00
Linus Torvalds
ca55bd7e29 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup
  e1000: add missing length check to e1000 receive routine
  forcedeth: add phy_power_down parameter, leave phy powered up by default (v2)
  Bluetooth: Remove useless flush_work() causing lockdep warnings
2009-06-02 09:49:06 -07:00
Jozsef Kadlecsik
874ab9233e netfilter: nf_ct_tcp: TCP simultaneous open support
The patch below adds supporting TCP simultaneous open to conntrack. The
unused LISTEN state is replaced by a new state (SYN_SENT2) denoting the
second SYN sent from the reply direction in the new case. The state table
is updated and the function tcp_in_window is modified to handle
simultaneous open.

The functionality can fairly easily be tested by socat. A sample tcpdump
recording

23:21:34.244733 IP (tos 0x0, ttl 64, id 49224, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xe75f (correct), 3383710133:3383710133(0) win 5840 <mss 1460,sackOK,timestamp 173445629 0,nop,wscale 7>
23:21:34.244783 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.0.1.2020 > 192.168.0.254.2020: R, cksum 0x0253 (correct), 0:0(0) ack 3383710134 win 0
23:21:36.038680 IP (tos 0x0, ttl 64, id 28092, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.1.2020 > 192.168.0.254.2020: S, cksum 0x704b (correct), 2634546729:2634546729(0) win 5840 <mss 1460,sackOK,timestamp 824213 0,nop,wscale 1>
23:21:36.038777 IP (tos 0x0, ttl 64, id 49225, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xb179 (correct), 3383710133:3383710133(0) ack 2634546730 win 5840 <mss 1460,sackOK,timestamp 173447423 824213,nop,wscale 7>
23:21:36.038847 IP (tos 0x0, ttl 64, id 28093, offset 0, flags [DF], proto TCP (6), length 52) 192.168.0.1.2020 > 192.168.0.254.2020: ., cksum 0xebad (correct), ack 3383710134 win 2920 <nop,nop,timestamp 824213 173447423>

and the corresponding netlink events:

    [NEW] tcp      6 120 SYN_SENT src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 [UNREPLIED] src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
 [UPDATE] tcp      6 120 LISTEN src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [ASSURED]

The RST packet was dropped in the raw table, thus it did not reach
conntrack.  nfnetlink_conntrack is unpatched so it shows the new SYN_SENT2
state as the old unused LISTEN.

With TCP simultaneous open support we satisfy REQ-2 in RFC 5382  ;-) .

Additional minor correction in this patch is that in order to catch
uninitialized reply directions, "td_maxwin == 0" is used instead of
"td_end == 0" because the former can't be true except in uninitialized
state while td_end may accidentally be equal to zero in the mid of a
connection.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-06-02 13:58:56 +02:00
Patrick McHardy
8cc848fa34 Merge branch 'master' of git://dev.medozas.de/linux 2009-06-02 13:44:56 +02:00
Minoru Usui
12186be7d2 net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup
This patch fixes a bug which unconfigured struct tcf_proto keeps
chaining in tc_ctl_tfilter(), and avoids kernel panic in
cls_cgroup_classify() when we use cls_cgroup.

When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained.  After it's chained,
tc_ctl_tfilter() calls classifier's change().  When classifier's
change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto.

In addition, cls_cgroup is initialized in change() not in init().  It
accesses unconfigured struct tcf_proto which is chained before
change(), then hits Oops.

Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-02 02:17:34 -07:00
Nivedita Singhvi
f771bef980 ipv4: New multicast-all socket option
After some discussion offline with Christoph Lameter and David Stevens
regarding multicast behaviour in Linux, I'm submitting a slightly
modified patch from the one Christoph submitted earlier.

This patch provides a new socket option IP_MULTICAST_ALL.

In this case, default behaviour is _unchanged_ from the current
Linux standard. The socket option is set by default to provide
original behaviour. Sockets wishing to receive data only from
multicast groups they join explicitly will need to clear this
socket option.

Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
Signed-off-by: Christoph Lameter<cl@linux.com>
Acked-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-02 00:45:24 -07:00
Eric Dumazet
4d52cfbef6 net: ipv4/ip_sockglue.c cleanups
Pure cleanups

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-02 00:42:16 -07:00
Brian Haley
dae9de8e13 IPv6: Print error value when skb allocation fails
Print-out the error value when sock_alloc_send_skb() fails in
the IPv6 neighbor discovery code - can be useful for debugging.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-02 00:20:26 -07:00
Rémi Denis-Courmont
bbd5898d39 Phonet: fix accounting race between gprs_writeable() and gprs_xmit()
In the unlikely event that gprs_writeable() and gprs_xmit() check for
writeability at the same, we could stop the device queue forever.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-02 00:17:43 -07:00
David S. Miller
fc23ffe075 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2009-06-01 14:32:08 -07:00
Linus Torvalds
6e42910184 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  3c509: Add missing EISA IDs
  MAINTAINERS: take maintainership of the cpmac Ethernet driver
  net/firmare: Ignore .cis files
  ath1e: add new device id for asus hardware
  mlx4_en: Fix a kernel panic when waking tx queue
  rtl8187: add USB ID for Linksys WUSB54GC-EU v2 USB wifi dongle
  at76c50x-usb: avoid mutex deadlock in at76_dwork_hw_scan
  mac8390: fix build with NET_POLL_CONTROLLER
  cxgb3: link fault fixes
  cxgb3: fix dma mapping regression
  netfilter: nfnetlink_log: fix wrong skbuff size	calculation
  netfilter: xt_hashlimit does a wrong SEQ_SKIP
  bfin_mac: fix build error due to net_device_ops convert
  atlx: move modinfo data from atlx.h to atl1.c
  gianfar: fix babbling rx error event bug
  cls_cgroup: read classid atomically in classifier
  netfilter: nf_ct_dccp: add missing DCCP protocol changes in event cache
  netfilter: nf_ct_tcp: fix accepting invalid RST segments
2009-06-01 08:02:05 -07:00
Brian Haley
56d417b12e IPv6: Add 'autoconf' and 'disable_ipv6' module parameters
Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module.

The first controls if IPv6 addresses are autoconfigured from
prefixes received in Router Advertisements.  The IPv6 loopback
(::1) and link-local addresses are still configured.

The second controls if IPv6 addresses are desired at all.  No
IPv6 addresses will be added to any interfaces.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-01 03:07:33 -07:00
Jiri Pirko
ccffad25b5 net: convert unicast addr list
This patch converts unicast address list to standard list_head using
previously introduced struct netdev_hw_addr. It also relaxes the
locking. Original spinlock (still used for multicast addresses) is not
needed and is no longer used for a protection of this list. All
reading and writing takes place under rtnl (with no changes).

I also removed a possibility to specify the length of the address
while adding or deleting unicast address. It's always dev->addr_len.

The convertion touched especially e1000 and ixgbe codes when the
change is not so trivial.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c               |   13 +--
 drivers/net/e1000/e1000_main.c   |   24 +++--
 drivers/net/ixgbe/ixgbe_common.c |   14 ++--
 drivers/net/ixgbe/ixgbe_common.h |    4 +-
 drivers/net/ixgbe/ixgbe_main.c   |    6 +-
 drivers/net/ixgbe/ixgbe_type.h   |    4 +-
 drivers/net/macvlan.c            |   11 +-
 drivers/net/mv643xx_eth.c        |   11 +-
 drivers/net/niu.c                |    7 +-
 drivers/net/virtio_net.c         |    7 +-
 drivers/s390/net/qeth_l2_main.c  |    6 +-
 drivers/scsi/fcoe/fcoe.c         |   16 ++--
 include/linux/netdevice.h        |   18 ++--
 net/8021q/vlan.c                 |    4 +-
 net/8021q/vlan_dev.c             |   10 +-
 net/core/dev.c                   |  195 +++++++++++++++++++++++++++-----------
 net/dsa/slave.c                  |   10 +-
 net/packet/af_packet.c           |    4 +-
 18 files changed, 227 insertions(+), 137 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 22:12:32 -07:00
Ilpo Järvinen
2df9001edc tcp: fix loop in ofo handling code and reduce its complexity
Somewhat luckily, I was looking into these parts with very fine
comb because I've made somewhat similar changes on the same
area (conflicts that arose weren't that lucky though). The loop
was very much overengineered recently in commit 915219441d
(tcp: Use SKB queue and list helpers instead of doing it
by-hand), while it basically just wants to know if there are
skbs after 'skb'.

Also it got broken because skb1 = skb->next got translated into
skb1 = skb1->next (though abstracted) improperly. Note that
'skb1' is pointing to previous sk_buff than skb or NULL if at
head. Two things went wrong:
- We'll kfree 'skb' on the first iteration instead of the
  skbuff following 'skb' (it would require required SACK reneging
  to recover I think).
- The list head case where 'skb1' is NULL is checked too early
  and the loop won't execute whereas it previously did.

Conclusion, mostly revert the recent changes which makes the
cset very messy looking but using proper accessor in the
previous-like version.

The effective changes against the original can be viewed with:
  git-diff 915219441d566f1da0caa0e262be49b666159e17^ \
		net/ipv4/tcp_input.c | sed -n -e '57,70 p'

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 15:02:29 -07:00
Linus Torvalds
c8bce3d3bd Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
  svcrdma: dma unmap the correct length for the RPCRDMA header page.
  nfsd: Revert "svcrpc: take advantage of tcp autotuning"
  nfsd: fix hung up of nfs client while sync write data to nfs server
2009-05-29 08:49:09 -07:00
Eric Dumazet
108bfa895c net: unset IFF_XMIT_DST_RELEASE in ipgre_tunnel_setup()
ipgre_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit()
to no release it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 01:46:29 -07:00
Eric Dumazet
a489e51c1a atm: unset IFF_XMIT_DST_RELEASE in clip_setup()
clip_start_xmit() needs skb->dst so tell dev_hard_start_xmit()
to no release it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 01:46:28 -07:00
Eric Dumazet
28e72216d7 net: unset IFF_XMIT_DST_RELEASE in ipip_tunnel_setup()
ipip_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit()
to no release it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 01:46:27 -07:00
David S. Miller
3f1f39c42b Merge branch 'linux-2.6.31.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2009-05-29 01:41:32 -07:00
David S. Miller
dfe9a83798 llc: Kill outdated and incorrect comment.
This comment suggested storing two pieces of state in the
LLC skb control block, and in fact we do.  Someone did
the implementation but never killed this todo comment :-)

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-28 23:31:56 -07:00
David S. Miller
528be7ff82 irda: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-28 23:26:33 -07:00
David S. Miller
915219441d tcp: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-28 21:35:47 -07:00
Paulius Zaleckas
7f0333eb2f wimax: Add netlink interface to get device state
wimax connection manager / daemon has to know what is current
state of the device. Previously it was only possible to get
notification whet state has changed.

Note:

 By mistake, the new generic netlink's number for
 WIMAX_GNL_OP_STATE_GET was declared inserting into the existing list
 of API calls, not appending; thus, it'd break existing API.

 Fixed by Inaky Perez-Gonzalez <inaky@linux.intel.com> by moving to
 the tail, where we add to the interface, not modify the interface.

 Thanks to Stephen Hemminger <shemminger@vyatta.com> for catching this.

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
2009-05-28 18:02:20 -07:00
Inaky Perez-Gonzalez
52a8d96308 wimax: document why wimax_msg_*() operations can be used in any state
Funcion documentation for wimax_msg_alloc() and wimax_msg_send() needs
to clarify that they can be used in the very early stages of a
wimax_dev lifecycle.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
2009-05-28 18:02:04 -07:00
David S. Miller
de1033428b econet: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-28 16:46:29 -07:00
David S. Miller
bec571ec76 decnet: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-28 16:43:52 -07:00
David S. Miller
b6211ae7f2 atm: Use SKB queue and list helpers instead of doing it by-hand.
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-28 16:36:47 -07:00
Steve Wise
98779be861 svcrdma: dma unmap the correct length for the RPCRDMA header page.
The svcrdma module was incorrectly unmapping the RPCRDMA header page.
On IBM pserver systems this causes a resource leak that results in
running out of bus address space (10 cthon iterations will reproduce it).
The code was mapping the full page but only unmapping the actual header
length.  The fix is to only map the header length.

I also cleaned up the use of ib_dma_map_page() calls since the unmap
logic always uses ib_dma_unmap_single().  I made these symmetrical.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 18:57:24 -04:00
David S. Miller
4d3383d0ad Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-05-27 15:51:25 -07:00
J. Bruce Fields
7f4218354f nfsd: Revert "svcrpc: take advantage of tcp autotuning"
This reverts commit 47a14ef1af "svcrpc:
take advantage of tcp autotuning", which uncovered some further problems
in the server rpc code, causing significant performance regressions in
common cases.

We will likely reinstate this patch after releasing 2.6.30 and applying
some work on the underlying fixes to the problem (developed by Trond).

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: Jim Rees <rees@umich.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-27 18:51:06 -04:00
Eric Dumazet
2a91525c20 net: net/core/sock.c cleanup
Pure style cleanup patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 15:47:07 -07:00
Eric Dumazet
1ce8e7b57b net: ALIGN/PTR_ALIGN cleanup in alloc_netdev_mq()/netdev_priv()
Use ALIGN() and PTR_ALIGN() macros instead of handcoding them.

Get rid of NETDEV_ALIGN_CONST ugly define

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 15:47:06 -07:00
Jiri Pirko
0bb32417ff bridge: avoid an extra space in br_fdb_update()
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 15:46:54 -07:00
Pablo Neira Ayuso
a17c859849 netfilter: conntrack: add support for DCCP handshake sequence to ctnetlink
This patch adds CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ that exposes
the u64 handshake sequence number to user-space.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-27 17:50:35 +02:00
Pablo Neira Ayuso
eeff9beec3 netfilter: nfnetlink_log: fix wrong skbuff size calculation
This problem was introduced in 72961ecf84
since no space was reserved for the new attributes NFULA_HWTYPE,
NFULA_HWLEN and NFULA_HWHEADER.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-27 15:49:11 +02:00
Jesper Dangaard Brouer
683a04cebc netfilter: xt_hashlimit does a wrong SEQ_SKIP
The function dl_seq_show() returns 1 (equal to SEQ_SKIP) in case
a seq_printf() call return -1.  It should return -1.

This SEQ_SKIP behavior brakes processing the proc file e.g. via a
pipe or just through less.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-27 15:45:34 +02:00
Herbert Xu
a2a804cddf tcp: Do not check flush when comparing options for GRO
There is no need to repeatedly check flush when comparing TCP
options for GRO as it will be false 99% of the time where it
matters.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:05 -07:00
Herbert Xu
9aaa156cf9 gro: Store shinfo in local variable in skb_gro_receive
This patch stores the two shinfo pointers in local variables
because they're used over and over again in skb_gro_receive.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:05 -07:00
Herbert Xu
66e92fcf1d gro: Nasty optimisations for page frags in skb_gro_receive
This patch reverses the direction of the frags array copy in
skb_gro_receive in order simplify the loop conditional.  It
also avoids touching the first element of the original frags
array.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:04 -07:00
Herbert Xu
cb18978cbf gro: Open-code final pskb_may_pull
As we know the only packets which need the final pskb_may_pull
are completely non-linear, and have all the required bits in
frag0, we can perform a straight memcpy instead of going through
pskb_may_pull and doing skb_copy_bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:02 -07:00
Herbert Xu
1075f3f65d ipv4: Use 32-bit loads for ID and length in GRO
This patch optimises the IPv4 GRO code by using 32-bit loads
(instead of 16-bit ones) on the ID and length checks in the receive
function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:02 -07:00
Herbert Xu
a5b1cf288d gro: Avoid unnecessary comparison after skb_gro_header
For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL.  Yet we must check it because of its current
form.  This patch splits it up into multiple functions in order
to avoid this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:01 -07:00
Herbert Xu
7489594cb2 gro: Optimise length comparison in skb_gro_header
By caching frag0_len, we can avoid checking both frag0 and the
length separately in skb_gro_header.  This helps as skb_gro_header
is called four times per packet which amounts to a few million
times at 10Gb/s.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:01 -07:00
Herbert Xu
30a3ae30c7 tcp: Optimise len/mss comparison
Instead of checking len > mss || len == 0, we can accomplish
both by checking (len - 1) > mss using the unsigned wraparound.
At nearly a million times a second, this might just help.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:26:00 -07:00
Herbert Xu
4a9a2968a1 tcp: Remove unnecessary window comparisons for GRO
The window has already been checked as part of the flag word
so there is no need to check it explicitly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:59 -07:00
Herbert Xu
745898eaf0 tcp: Optimise GRO port comparisons
Instead of doing two 16-bit operations for the source/destination
ports, we can do one 32-bit operation to take care both.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:57 -07:00
Herbert Xu
78d3fd0b7d gro: Only use skb_gro_header for completely non-linear packets
Currently skb_gro_header is used for packets which put the hardware
header in skb->data with the rest in frags.  Since the drivers that
need this optimisation all provide completely non-linear packets,
we can gain extra optimisations by only performing the frag0
optimisation for completely non-linear packets.

In particular, we can simply test frag0 (instead of skb_headlen)
to see whether the optimisation is in force.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:57 -07:00
Herbert Xu
67147ba99a gro: Localise offset/headlen in skb_gro_offset
This patch stores the offset/headlen in local variables as they're
used repeatedly in skb_gro_offset.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:55 -07:00
Herbert Xu
78a478d0ef gro: Inline skb_gro_header and cache frag0 virtual address
The function skb_gro_header is called four times per packet which
quickly adds up at 10Gb/s.  This patch inlines it to allow better
optimisations.

Some architectures perform multiplication for page_address, which
is done by each skb_gro_header invocation.  This patch caches that
value in skb->cb to avoid the unnecessary multiplications.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:55 -07:00
Herbert Xu
42da6994ca gro: Open-code frags copy in skb_gro_receive
gcc does a poor job at generating code for the memcpy of the frags
array in skb_gro_receive, which is the primary purpose of that
function when merging frags.  In particular, it can't utilise the
alignment information of the source and destination.  This patch
open-codes the copy so we process words instead of bytes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-27 03:25:54 -07:00
Dave Young
4c71318948 Bluetooth: Remove useless flush_work() causing lockdep warnings
The calls to flush_work() are pointless in a single thread workqueue
and they are actually causing a lockdep warning.

=============================================
[ INFO: possible recursive locking detected ]
2.6.30-rc6-02911-gbb803cf #16
---------------------------------------------
bluetooth/2518 is trying to acquire lock:
 (bluetooth){+.+.+.}, at: [<c0130c14>] flush_work+0x28/0xb0

but task is already holding lock:
 (bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e

other info that might help us debug this:
2 locks held by bluetooth/2518:
 #0:  (bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e
 #1:  (&conn->work_del){+.+...}, at: [<c0130424>] worker_thread+0x149/0x25e

stack backtrace:
Pid: 2518, comm: bluetooth Not tainted 2.6.30-rc6-02911-gbb803cf #16
Call Trace:
 [<c03d64d9>] ? printk+0xf/0x11
 [<c0140d96>] __lock_acquire+0x7ce/0xb1b
 [<c0141173>] lock_acquire+0x90/0xad
 [<c0130c14>] ? flush_work+0x28/0xb0
 [<c0130c2e>] flush_work+0x42/0xb0
 [<c0130c14>] ? flush_work+0x28/0xb0
 [<f8b84966>] del_conn+0x1c/0x84 [bluetooth]
 [<c0130469>] worker_thread+0x18e/0x25e
 [<c0130424>] ? worker_thread+0x149/0x25e
 [<f8b8494a>] ? del_conn+0x0/0x84 [bluetooth]
 [<c0133843>] ? autoremove_wake_function+0x0/0x33
 [<c01302db>] ? worker_thread+0x0/0x25e
 [<c013355a>] kthread+0x45/0x6b
 [<c0133515>] ? kthread+0x0/0x6b
 [<c01034a7>] kernel_thread_helper+0x7/0x10

Based on a report by Oliver Hartkopp <oliver@hartkopp.net>

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-05-27 09:15:57 +02:00
David S. Miller
079e24ed80 nl80211: Eliminate reference to BUS_ID_SIZE.
It's going away.  Just leave the constant "20" here so that
behavior doesn't change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-26 21:15:00 -07:00
David S. Miller
2b0cc7f78b net: Remove bogus reference to BUS_ID_SIZE in sysfs code.
BUS_ID_SIZE is really no more, and device names are dynamically
allocated and thus can be any necessary size.

So remove the BUG check here making sure BUS_ID_SIZE is at least
as large as IFNAMSIZ.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-26 21:05:19 -07:00
Paul Menage
e65fcfd63a cls_cgroup: read classid atomically in classifier
Avoid reading the unsynchronized value cs->classid multiple times,
since it could change concurrently from non-zero to zero; this would
result in the classifier returning a positive result with a bogus
(zero) classid.

Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-26 20:47:02 -07:00
Linus Torvalds
e2a1b9ee23 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFSv4: Fix the case where NFSv4 renewal fails
  nfs: fix build error in nfsroot with initconst
  XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices
2009-05-26 12:15:35 -07:00
Vu Pham
68743082b5 XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices
mlx4/connectX FRMR requires local write enable together with remote
rdma write enable. This fixes NFS/RDMA operation over the ConnectX
Infiniband HCA in the default memreg mode.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-05-26 14:51:00 -04:00
Eric Dumazet
08baf56108 net: txq_trans_update() helper
We would like to get rid of netdev->trans_start = jiffies; that about all net
drivers have to use in their start_xmit() function, and use txq->trans_start
instead.

This can be done generically in core network, as suggested by David.

Some devices, (particularly loopback) dont need trans_start update, because
they dont have transmit watchdog. We could add a new device flag, or rely
on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
different than -1. Use a helper function to hide our choice.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 22:58:01 -07:00
Jarek Poplawski
a1dcb6628b pkt_sched: gen_estimator: Fix signed integers right-shifts.
Right-shifts of signed integers are implementation-defined so unportable.

With feedback from: Eric Dumazet <dada1@cosmosbay.com>

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 22:47:01 -07:00
Doug Leith
c80a5cdfc5 tcp: tcp_vegas ssthresh bugfix
This patch fixes ssthresh accounting issues in tcp_vegas when cwnd decreases

Signed-off-by: Doug Leith <doug.leith@nuim.ie>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 22:44:59 -07:00
Pablo Neira Ayuso
b38b1f6168 netfilter: nf_ct_dccp: add missing DCCP protocol changes in event cache
This patch adds the missing protocol state-change event reporting
for DCCP.

$ sudo conntrack -E
    [NEW] dccp     33 240 src=192.168.0.2 dst=192.168.1.2 sport=57040 dport=5001 [UNREPLIED] src=192.168.1.2 dst=192.168.1.100 sport=5001 dport=57040

With this patch:

$ sudo conntrack -E
    [NEW] dccp     33 240 REQUEST src=192.168.0.2 dst=192.168.1.2 sport=57040 dport=5001 [UNREPLIED] src=192.168.1.2 dst=192.168.1.100 sport=5001 dport=57040

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-25 17:29:43 +02:00
Jozsef Kadlecsik
bfcaa50270 netfilter: nf_ct_tcp: fix accepting invalid RST segments
Robert L Mathews discovered that some clients send evil TCP RST segments,
which are accepted by netfilter conntrack but discarded by the
destination. Thus the conntrack entry is destroyed but the destination
retransmits data until timeout.

The same technique, i.e. sending properly crafted RST segments, can easily
be used to bypass connlimit/connbytes based restrictions (the sample
script written by Robert can be found in the netfilter mailing list
archives).

The patch below adds a new flag and new field to struct ip_ct_tcp_state so
that checking RST segments can be made more strict and thus TCP conntrack
can catch the invalid ones: the RST segment is accepted only if its
sequence number higher than or equal to the highest ack we seen from the
other direction. (The last_ack field cannot be reused because it is used
to catch resent packets.)

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-05-25 17:23:15 +02:00
Alexander Beregalov
e3804cbebb net: remove COMPAT_NET_DEV_OPS
All drivers are already converted to new net_device_ops API
and nobody uses old API anymore.

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 01:53:53 -07:00
David S. Miller
c649c0e31d Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/ath/ath5k/phy.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-05-25 01:42:21 -07:00
Herbert Xu
9bcb97cace skbuff: Copy csum instead of csum_start/csum_offset
Hi:

skbuff: Copy csum instead of csum_start/csum_offset

It's easier to copy the u32 csum instead of its two u16
constituents.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 00:40:43 -07:00
Herbert Xu
82c49a352e skbuff: Move new code into __copy_skb_header
Hi:

skbuff: Move new __skb_clone code into __copy_skb_header

It seems that people just keep on adding stuff to __skb_clone
instead __copy_skb_header.  This is wrong as it means your brand-new
attributes won't always get copied as you intended.

This patch moves them to the right place, and adds a comment to
prevent this from happening again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-25 00:40:43 -07:00
David S. Miller
45ea4ea2af Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-05-25 00:38:24 -07:00
Zhu Yi
e31a16d6f6 wireless: move some utility functions from mac80211 to cfg80211
The patch moves some utility functions from mac80211 to cfg80211.
Because these functions are doing generic 802.11 operations so they
are not mac80211 specific. The moving allows some fullmac drivers
to be also benefit from these utility functions.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Samuel Ortiz <samuel.ortiz@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-22 14:06:02 -04:00
Johannes Berg
a971be223f mac80211: correct probe wait time
My first patch submission used 200ms, which I then somehow
managed to revert back to the earlier 50ms I had used for
some tests in the second patch submission -- but that was
wrong, I should have used 200ms here. Correct that.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-22 14:06:01 -04:00
Johannes Berg
4ef699fb77 mac80211: fix probe response wait timing
In "mac80211: split out and decrease probe wait time" I tried
to reduce the time waiting for a probe response, but failed to
take into account the case where we are detecting beacon loss
in software -- in that case we still wait the monitoring time
rather than the probe wait time. Fix this by refactoring the
mod_timer() calls in ieee80211_associated().

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-22 14:05:59 -04:00
Johannes Berg
8705782582 wext: remove atomic requirement for wireless stats
The requirement for wireless stats to be atomic is now mostly
artificial since we hold the rtnl _and_ the dev_base_lock for
iterating the device list. Doing that is not required, just the
rtnl is sufficient (and the rtnl is required for other reasons
outlined in commit "wext: fix get_wireless_stats locking").

This will fix http://bugzilla.kernel.org/show_bug.cgi?id=13344
and make things easier for drivers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-22 14:05:59 -04:00
Herbert Xu
3699067381 tcp: Unexport TCPv6 GRO functions
Sinec the TCPv6 GRO functions are used in the same file where
they are defined, we do not need to export them.  This was a
cut-n-paste from the IPv4 code which does need to export them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-22 00:45:28 -07:00
David S. Miller
7d18f11489 net: Fix arg to trace_napi_poll() in netpoll.
Reproted by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 23:30:09 -07:00
Michał Mirosław
0d63cbb535 wireless: Use genl_register_family_with_ops()
Use genl_register_family_with_ops() instead of a copy.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:25 -07:00
Michał Mirosław
7ae740df3a netlabel: Use genl_register_family_with_ops()
Use genl_register_family_with_ops() instead of a copy. This fixes genetlink
family leak on error path.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:24 -07:00
Michał Mirosław
8f698d5453 ipvs: Use genl_register_family_with_ops()
Use genl_register_family_with_ops() instead of a copy.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:24 -07:00
Michał Mirosław
acb0a200ae tipc: Use genl_register_family_with_ops()
Use genl_register_family_with_ops() instead of a copy. This also changes
netlink related variable names to be kernel-wide unique for consistency
with other users.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:23 -07:00
Michał Mirosław
502664eeaf irda: Use genl_register_family_with_ops()
Use genl_register_family_with_ops() instead of a copy.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:22 -07:00
Michał Mirosław
a7b11d7382 genetlink: Introduce genl_register_family_with_ops()
This introduces genl_register_family_with_ops() that registers a genetlink
family along with operations from a table. This is used to kill copy'n'paste
occurrences in following patches.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:22 -07:00
Neil Horman
4ea7e38696 dropmon: add ability to detect when hardware dropsrxpackets
Patch to add the ability to detect drops in hardware interfaces via dropwatch.
Adds a tracepoint to net_rx_action to signal everytime a napi instance is
polled.  The dropmon code then periodically checks to see if the rx_frames
counter has changed, and if so, adds a drop notification to the netlink
protocol, using the reserved all-0's vector to indicate the drop location was in
hardware, rather than somewhere in the code.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/net_dropmon.h |    8 ++
 include/trace/napi.h        |   11 +++
 net/core/dev.c              |    5 +
 net/core/drop_monitor.c     |  124 ++++++++++++++++++++++++++++++++++++++++++--
 net/core/net-traces.c       |    4 +
 net/core/netpoll.c          |    2
 6 files changed, 149 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 16:50:21 -07:00
Dan Carpenter
0975ecba3b RxRPC: Error handling for rxrpc_alloc_connection()
rxrpc_alloc_connection() doesn't return an error code on failure, it just
returns NULL.  IS_ERR(NULL) is false.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:22:02 -07:00
Robert Olsson
3ed18d76d9 ipv4: Fix oops with FIB_TRIE
It seems we can fix this by disabling preemption while we re-balance the 
trie. This is with the CONFIG_CLASSIC_RCU. It's been stress-tested at high 
loads continuesly taking a full BGP table up/down via iproute -batch.

Note. fib_trie is not updated for CONFIG_PREEMPT_RCU

Reported-by: Andrei Popa
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:20:59 -07:00
Eric W. Biederman
d95ed9275e af_packet: Teach to listen for multiple unicast addresses.
The the PACKET_ADD_MEMBERSHIP and the PACKET_DROP_MEMBERSHIP setsockopt
calls for af_packet already has all of the infrastructure needed to subscribe
to multiple mac addresses.  All that is missing is a flag to say that
the address we want to listen on is a unicast address.

So introduce PACKET_MR_UNICAST and wire it up to dev_unicast_add and
dev_unicast_delete.

Additionally I noticed that errors from dev_mc_add were not propagated
from packet_dev_mc so fix that.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:13:39 -07:00
Stephen Hemminger
ca0f31125c netns: simplify net_ns_init
The net_ns_init code can be simplified. No need to save error code
if it is only going to panic if it is set 4 lines later.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:10:31 -07:00
Stephen Hemminger
1f7a2bb4ef netns: remove leftover debugging message
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:10:05 -07:00
Florian Westphal
5b5f792a6a pktgen: do not access flows[] beyond its length
typo -- pkt_dev->nflows is for stats only, the number of concurrent
flows is stored in cflows.

Reported-By: Vladimir Ivashchenko <hazard@francoudi.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-21 15:07:12 -07:00
Jean-Mickael Guerin
4f72427998 IPv6: set RTPROT_KERNEL to initial route
The use of unspecified protocol in IPv6 initial route prevents quagga to
install IPv6 default route:
# show ipv6 route
S   ::/0 [1/0] via fe80::1, eth1_0
K>* ::/0 is directly connected, lo, rej
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0

# ip -6 route
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
unreachable default dev lo  proto none  metric -1  error -101 hoplimit 255

The attached patch ensures RTPROT_KERNEL to the default initial route
and fixes the problem for quagga.
This is similar to "ipv6: protocol for address routes"
f410a1fba7.

# show ipv6 route
S>* ::/0 [1/0] via fe80::1, eth1_0
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0

# ip -6 route
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
default via fe80::1 dev eth1_0  proto zebra  metric 1024  mtu 1500
advmss 1440 hoplimit -1
unreachable default dev lo  proto kernel  metric -1  error -101 hoplimit 255

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:38:59 -07:00
David S. Miller
86c2fe1e3a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-05-20 17:31:25 -07:00
Rami Rosen
04af8cf6f3 net: Remove unused parameter from fill method in fib_rules_ops.
The netlink message header (struct nlmsghdr) is an unused parameter in
fill method of fib_rules_ops struct.  This patch removes this
parameter from this method and fixes the places where this method is
called.

(include/net/fib_rules.h)

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:26:23 -07:00
Eric Dumazet
1ddbcb005c net: fix rtable leak in net/ipv4/route.c
Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :

begin_of_quotation
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation

Problematic commit is 1080d709fb
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.

Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:18:02 -07:00
Eric Dumazet
cf8da764fc net: fix length computation in rt_check_expire()
rt_check_expire() computes average and standard deviation of chain lengths,
but not correclty reset length to 0 at beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-20 17:18:01 -07:00
Johannes Berg
9cef873798 mac80211: fix managed mode BSSID handling
Currently, we will ask the driver to configure right away
when somebody changes the desired BSSID. That's totally
strange because then we will configure the driver without
even knowing whether the BSS exists. Change this to only
configure the BSSID when associated, and configure a zero
BSSID when not associated.

As a side effect, this fixes an issue with the iwlwifi
driver which doesn't implement sta_notify properly and
uses the BSSID instead and gets very confused if the
BSSID is cleared before we disassociate, which results
in the warning Marcel posted [1] and iwlwifi bug 1995 [2].

[1] http://thread.gmane.org/gmane.linux.kernel.wireless.general/32598
[2] http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1995

Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:37 -04:00
Luis R. Rodriguez
bbcf3f0277 cfg80211: warn when wiphy_apply_custom_regulatory() does nothing
Device drivers using wiphy_apply_custom_regulatory() want some
regulatory settings applied to their wiphy, if no bands were
configured on the wiphy then something went wrong.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:37 -04:00
Johannes Berg
db67645db6 mac80211: fix parameter confusion when finding IBSS
When I fixed the crypto bit I must have done the negative
test only -- it is quite clearly impossible to find _any_
IBSS to join with the parameters put the wrong way around.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:36 -04:00
Johannes Berg
175427ce40 mac80211: don't try to do anything on unchanged genIE
When the genIE hasn't changed there's no reason to kick
the state machine since it won't be able to do anything
new -- doing this decreases the useless work we do for
reassociating because if we do kick the state machine
it will try to find a usable BSS but there might not be
one because wpa_supplicant will only change the BSSID
a little later.

In a sense this is a workaround for userspace behaviour,
but on the other hand userspace cannot really keep track
of what the kernel currently has for genIE since any
process could have changed that while wpa_supplicant
wasn't looking.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:35 -04:00
Jouni Malinen
7e0aae4732 mac80211: Do not override AID in the duration field
When updating the duration field for TX frames, skip the update for
PS-Poll frames that use this field for other purposes (AID).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:34 -04:00
Jouni Malinen
30196673fe mac80211: PS processing for every Beacon with our AID in TIM
If the AP includes our AID in the TIM IE, we need to process the
Beacon frame as far as PS is concerned (send PS-Poll or nullfunc data
with PM=0). The previous code skipped this in cases where the CRC
value did not change and it would not change if the AP continues
including our AID in the TIM..

There is no need to count the crc32 value for directed_tim with this
change, so we can remove that part. In order not to change the order
of operations (i.e., update WMM parameters prior to sending PS-Poll),
the CRC match is checked twice as only after the PS processing step,
the rest of the function is skipped if nothing changed in the Beacon.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:33 -04:00
Johannes Berg
cce4c77b87 mac80211: fix kernel-doc
Moving information from config_interface to bss_info_changed
removed struct ieee80211_if_conf which the documentation still
refers to, additionally there's one kernel-doc description too
much and one other missing, fix all this.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:32 -04:00
Pavel Roskin
e43e820c9c cfg80211: fix compile error with CONFIG_CFG80211_DEBUGFS
If CONFIG_CFG80211_DEBUGFS is enabled and CONFIG_MAC80211_DEBUGFS is
not, compilation fails in net/wireless/debugfs.c:

net/wireless/debugfs.c: In function 'cfg80211_debugfs_drv_add':
net/wireless/debugfs.c:117: error: 'struct cfg80211_registered_device'
has no member named 'debugfs'

The debugfs filed is needed if and only if CONFIG_CFG80211_DEBUGFS is
enabled, so use that instead of CONFIG_MAC80211_DEBUGFS.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:32 -04:00
Luis R. Rodriguez
61405e9778 cfg80211: fix in nl80211_set_reg()
There is a race on access to last_request and its alpha2
through reg_is_valid_request() and us possibly processing
first another regulatory request on another CPU. We avoid
this improbably race by locking with the cfg80211_mutex as
we should have done in the first place. While at it add
the assert on locking on reg_is_valid_request().

Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:32 -04:00
Luis R. Rodriguez
d0e18f833d cfg80211: cleanup return calls on nl80211_set_reg()
This has no functional change, but it will make the race
fix easier to spot in my next patch.

Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:31 -04:00
Luis R. Rodriguez
4776c6e7f6 cfg80211: return immediately if num reg rules > NL80211_MAX_SUPP_REG_RULES
This has no functional change except we save a kfree(rd) and
allows us to clean this code up a bit after this. We do
avoid an unnecessary kfree(NULL) but calling that was OK too.

Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:31 -04:00
Johannes Berg
e3da574a0d cfg80211: allow wext to remove keys that don't exist
Some applications using wireless extensions expect to be able to
remove a key that doesn't exist. One example is wpa_supplicant
which doesn't actually change behaviour when running into an
error while trying to do that, but it prints an error message
which users interpret as wpa_supplicant having problems.

The safe thing to do is not change the behaviour of wireless
extensions any more, so when the driver reports -ENOENT let
the wext bridge code return success to userspace. To guarantee
this, also document that drivers should return -ENOENT when the
key doesn't exist.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:30 -04:00
David Kilroy
3dcf670baf cfg80211: mark ops as pointer to const
This allows drivers to mark their cfg80211_ops tables const.

Signed-off-by: David Kilroy <kilroyd@googlemail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:27 -04:00
Johannes Berg
5bb644a0fd mac80211: cancel/restart all timers across suspend/resume
We forgot to cancel all timers in mac80211 when suspending.
In particular we forgot to deal with some things that can
cause hardware reconfiguration -- while it is down.

While at it we go ahead and add a warning in ieee80211_sta_work()
if its run while the suspend->resume cycle is in effect. This
should not happen and if it does it would indicate there is
a bug lurking in either mac80211 or mac80211 drivers.

With this now wpa_supplicant doesn't blink when I go to suspend
and resume where as before there where issues with some timers
running during the suspend->resume cycle. This caused a lot of
incorrect assumptions and would at times bring back the device
in an incoherent, but mostly recoverable, state.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:25 -04:00
Johannes Berg
cc32abd494 mac80211: move channel switch code
The channel switch code is currently in the spectrum
management file, where arguably it belongs. However,
it is for managed mode only and uses the structures
for that mode only so having it in a more generic
file can be confusing. Additionally, my next patch
gets simpler with the code here.

When/if we ever implement this for IBSS or mesh then
we will need to rework the structures it uses anyway
at which point we could move the code back.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:25 -04:00
Jouni Malinen
9f26a95221 nl80211: Validate NL80211_ATTR_KEY_SEQ length
Validate RSC (NL80211_ATTR_KEY_SEQ) length in nl80211/cfg80211 instead
of having to do this in all the drivers.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:25 -04:00
Jouni Malinen
92778180f7 mac80211: Cancel pending probereq poll on beacon RX
While the probe request poll is expected to work, it looks like it
does not always result in getting a response. The exact reason for
this is unclear, but anyway, if we do receive a Beacon frame from our
AP, there is no need to disconnect based on the probereq poll. This
seems to help keep the connection bit more stable in cases where
beacon loss is occurring semi-frequently.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:24 -04:00
Senthil Balasubramanian
cccaec98a3 mac80211: Initialize RX's last received sequence number
The STA may drop the very first frame if it happens to be a retried
frame. This is because we maintian the last received sequence number
per TID for QoS frames and it is initialized to zero through kzalloc
during sta_info_alloc and the sequence number of the very first date
frame received would be ZERO (as per IEEE 802.11-2007, 7.1.3.4.1).

If the frame dropped happens to be an EAP Request Identity(very first
frame from the AP), then wpa_supplicnat disconnects the STA and the
whole procedure starts again.

Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:23 -04:00
Luis R. Rodriguez
80a3511d70 cfg80211: add debugfs HT40 allow map
Here's a screenshot of what this looks like with ath9k:

mcgrof@pogo /debug/ieee80211/phy0 $ cat ht40allow_map
2412 HT40  +
2417 HT40  +
2422 HT40  +
2427 HT40  +
2432 HT40 -+
2437 HT40 -+
2442 HT40 -+
2447 HT40 -
2452 HT40 -
2457 HT40 -
2462 HT40 -
2467 Disabled
2472 Disabled
2484 Disabled
5180 HT40  +
5200 HT40 -+
5220 HT40 -+
5240 HT40 -+
5260 HT40 -+
5280 HT40 -+
5300 HT40 -+
5320 HT40 -
5500 HT40  +
5520 HT40 -+
5540 HT40 -+
5560 HT40 -+
5580 HT40 -+
5600 HT40 -+
5620 HT40 -+
5640 HT40 -+
5660 HT40 -+
5680 HT40 -+
5700 HT40 -
5745 HT40  +
5765 HT40 -+
5785 HT40 -+
5805 HT40 -+
5825 HT40 -

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:23 -04:00
Luis R. Rodriguez
1ac61302dc mac80211/cfg80211: move wiphy specific debugfs entries to cfg80211
This moves the cfg80211 specific stuff to new cfg80211 debugfs
entries. Non-mac80211 will also get these entries now. There were
only 4 which we take:

rts_threshold
fragmentation_threshold
short_retry_limit
long_retry_limit

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:23 -04:00
Luis R. Rodriguez
294196ab22 cfg80211: check allowed channel type upon userspace requests
Thanks to nl80211 userspace can be very specific upon device
configuration. Before processing the request for the new HT40
channel types (HT40- or HT40+) we need to ensure we can use them
regulatory-wise. This wasn't required with wireless extensions as
specifying the channel type wasn't not available and configuration
was done towards the end implicitly upon association or reception
of beacons from the AP. For the new nl80211 we have to check this
when configuring the interfaces explicitly.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:23 -04:00
Luis R. Rodriguez
768777ea11 mac80211: check if HT40+/- is allowed before sending assoc
We weren't checking this at all.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:23 -04:00
Luis R. Rodriguez
689da1b3b8 wireless: rename IEEE80211_CHAN_NO_FAT_* to HT40-/+
This is more consistent with our nl80211 naming convention
for HT40-/+.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:22 -04:00
Luis R. Rodriguez
038659e7c6 cfg80211: Process regulatory max bandwidth checks for HT40
We are not correctly listening to the regulatory max bandwidth
settings. To actually make use of it we need to redesign things
a bit. This patch does the work for that. We do this to so we
can obey to regulatory rules accordingly for use of HT40.

We end up dealing with HT40 by having two passes for each channel.

The first check will see if a 20 MHz channel fits into the channel's
center freq on a given frequency range. We check for a 20 MHz
banwidth channel as that is the maximum an individual channel
will use, at least for now. The first pass will go ahead and
check if the regulatory rule for that given center of frequency
allows 40 MHz bandwidths and we use this to determine whether
or not the channel supports HT40 or not. So to support HT40 you'll
need at a regulatory rule that allows you to use 40 MHz channels
but you're channel must also be enabled and support 20 MHz by itself.

The second pass is done after we do the regulatory checks over
an device's supported channel list. On each channel we'll check
if the control channel and the extension both:

 o exist
 o are enabled
 o regulatory allows 40 MHz bandwidth on its frequency range

This work allows allows us to idependently check for HT40- and
HT40+.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:46:22 -04:00
Luis R. Rodriguez
5078b2e32a cfg80211: fix race between core hint and driver's custom apply
Its possible for cfg80211 to have scheduled the work and for
the global workqueue to not have kicked in prior to a cfg80211
driver's regulatory hint or wiphy_apply_custom_regulatory().

Although this is very unlikely its possible and should fix
this race. When this race would happen you are expected to have
hit a null pointer dereference panic.

Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:29:54 -04:00
Johannes Berg
88f16db7a2 wext: verify buffer size for SIOCSIWENCODEEXT
Another design flaw in wireless extensions (is anybody
surprised?) in the way it handles the iw_encode_ext
structure: The structure is part of the 'extra' memory
but contains the key length explicitly, instead of it
just being the length of the extra buffer - size of
the struct and using the explicit key length only for
the get operation (which only writes it).

Therefore, we have this layout:

extra: +-------------------------+
       | struct iw_encode_ext  { |
       |     ...                 |
       |     u16 key_len;        |
       |     u8 key[0];          |
       | };                      |
       +-------------------------+
       | key material            |
       +-------------------------+

Now, all drivers I checked use ext->key_len without
checking that both key_len and the struct fit into the
extra buffer that has been copied from userspace. This
leads to a buffer overrun while reading that buffer,
depending on the driver it may be possible to specify
arbitrary key_len or it may need to be a proper length
for the key algorithm specified.

Thankfully, this is only exploitable by root, but root
can actually cause a segfault or use kernel memory as
a key (which you can even get back with siocgiwencode
or siocgiwencodeext from the key buffer).

Fix this by verifying that key_len fits into the buffer
along with struct iw_encode_ext.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-20 14:07:50 -04:00
Sascha Hlusiak
645069299a sit: stateless autoconf for isatap
be sent periodically. The rs_delay can be speficied when adding the
PRL entry and defaults to 15 minutes.

The RS is sent from every link local adress that's assigned to the
tunnel interface. It's directed to the (guessed) linklocal address
of the router and is sent through the tunnel.

Better: send to ff02::2 encapsuled in unicast directed to router-v4.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 16:02:02 -07:00
Sascha Hlusiak
9af28511be addrconf: refuse isatap eui64 for INADDR_ANY
A tunnel with no local ipv4 endpoint would otherwise use the
ISATAP linklocal address fe80::5efe:0:0, which is invalid. Rather not
add a linklocal address at all.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 16:02:02 -07:00
Sascha Hlusiak
4b27960174 sit: ipip6_tunnel_del_prl: return err
Typo. When deleting a PRL entry, return status to userspace
instead of success.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 16:02:01 -07:00
Sascha Hlusiak
4fddbf5d78 sit: strictly restrict incoming traffic to tunnel link device
Check link device when looking up a tunnel. When a tunnel is
linked to a interface, traffic from a different interface must not
reach the tunnel.

This also allows creating of multiple tunnels with the same
endpoints, if the link device differs.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 16:02:00 -07:00
Sascha Hlusiak
8db99e5717 sit: Fail to create tunnel, if it already exists
When locating the tunnel, do not continue if it is found. Otherwise
a different tunnel with similar configuration would be returned and
parts could be overwritten.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 16:02:00 -07:00
Chris Friesen
9643f45512 ipv4: teach ipconfig about the MTU option in DHCP
The DHCP spec allows the server to specify the MTU.  This can be useful
for netbooting with UDP-based NFS-root on a network using jumbo frames.
This patch allows the kernel IP autoconfiguration to handle this option
correctly.

It would be possible to use initramfs and add a script to set the MTU,
but that seems like a complicated solution if no initramfs is otherwise
necessary, and would bloat the kernel image more than this code would.

This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen.

Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 15:36:17 -07:00
Pablo Neira Ayuso
fd2120ca0d net: use NLMSG_DEFAULT_SIZE in nlmsg_new() allocations
nlmsg_new() adds the size of the netlink header to the value
that has been passed as parameter. If NLMSG_GOODSIZE is selected,
we request an allocation of one memory page plus the size of the
header. Instead, NLMSG_DEFAULT_SIZE should be used since it
already substracts the size of the Netlink header.

I have the impression that the similar naming in both constant
is error prone when using it with nlmsg_new(). This is already
documented in include/net/netlink.h

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 15:36:16 -07:00
Eric Dumazet
ab35cd4b8f sch_teql: Use net_device internal stats
We can slightly reduce size of teqlN structure, not duplicating stats
structure in teql_master but using stats field from net_device.stats
for tx_errors and from netdev_queue for tx_bytes/tx_packets/tx_dropped
values.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-19 15:36:15 -07:00
Eric Dumazet
93f154b594 net: release dst entry in dev_hard_start_xmit()
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).

CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.

It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.

David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag is :

- loopback device, because it calls netif_rx() and quoting Patrick :
    "ip_route_input() doesn't accept loopback addresses, so loopback packets
     already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function

- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:19:19 -07:00
Eric W. Biederman
af38f29895 net: Fix bridgeing sysfs handling of rtnl_lock
Holding rtnl_lock when we are unregistering the sysfs files can
deadlock if we unconditionally take rtnl_lock in a sysfs file.  So fix
it with the now familiar patter of: rtnl_trylock and syscall_restart()

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:59 -07:00
Eric W. Biederman
9b8adb5ea0 net: Fix devinet_sysctl_forward
sysctls are unregistered with the rntl_lock held making
it unsafe to unconditionally grab the the rtnl_lock.  Instead
we need to call rtnl_trylock and restart the system call
if we can not grab it.  Otherwise we could deadlock at unregistration
time.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:58 -07:00
Eric W. Biederman
5007392d85 net: FIX ipv6_forward sysctl restart
Just returning -ERESTARTSYS without a signal pending is not
good that will just leak it to userspace.  We need return
-ERESTARTNOINTR so we always restart and set signal pending
so that we fall of the fast path of syscall return and setup
the system call restart.

So use restart_syscall() which does all of this for us.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:58 -07:00
Eric W. Biederman
336ca57c3b net-sysfs: Use rtnl_trylock in sysfs methods.
The earlier patch to fix the deadlock between a network device going
away and writing to sysfs attributes was incomplete.
- It did not set signal_pending so we would leak ERSTARTSYS to user space.
- It used ERESTARTSYS which only restarts if sigaction configures it to.
- It did not cover store and show for ifalias.

So fix all of these up and use the new helper restart_syscall so we get
the details correct on what it takes.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:15:57 -07:00
Johann Baudy
69e3c75f4d net: TX_RING and packet mmap
New packet socket feature that makes packet socket more efficient for
transmission.

- It reduces number of system call through a PACKET_TX_RING mechanism,
  based on PACKET_RX_RING (Circular buffer allocated in kernel space
  which is mmapped from user space).

- It minimizes CPU copy using fragmented SKB (almost zero copy).

Signed-off-by: Johann Baudy <johann.baudy@gnu-log.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:11:22 -07:00
Frans Pop
bc8a539743 ipv4: make default for INET_LRO consistent with help text
Commit e81963b1 ("ipv4: Make INET_LRO a bool instead of tristate.")
changed this config from tristate to bool.  Add default so that it is
consistent with the help text.

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 21:48:38 -07:00
Thomas Chenault
995b337952 net: fix skb_seq_read returning wrong offset/length for page frag data
When called with a consumed value that is less than skb_headlen(skb)
bytes into a page frag, skb_seq_read() incorrectly returns an
offset/length relative to skb->data. Ensure that data which should come
from a page frag does.

Signed-off-by: Thomas Chenault <thomas_chenault@dell.com>
Tested-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 21:43:27 -07:00
David S. Miller
bb803cfbec Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/scsi/fcoe/fcoe.c
2009-05-18 21:08:20 -07:00
Eric Dumazet
511e11e396 pkt_sched: gen_estimator: use 64 bit intermediate counters for bps
gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theorical limit of 34360Mbit
(2^32 bytes)

Using 64 bit intermediate avbps/brate counters can allow us to reach
this theorical limit.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 19:26:37 -07:00
Rami Rosen
d23a9b5baa ipv4: cleanup: remove unnecessary include.
There is no need for net/icmp.h header in net/ipv4/fib_frontend.c.
This patch removes the #include net/icmp.h from it.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:16:38 -07:00
Rami Rosen
e204a345a0 ipv4: cleanup - remove two unused parameters from fib_semantic_match().
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:16:37 -07:00
Eric Dumazet
450c4ea15e vlan: use struct netdev_queue counters instead of dev->stats
We can update netdev_queue tx_bytes/tx_packets/tx_dropped counters instead
of dev->stats ones, to reduce number of cache lines dirtied in xmit path.

This fixes a performance problem on SMP when many different cpus take
vlan tx path.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:15:06 -07:00
Eric Dumazet
7004bf252c net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue
offsetof(struct net_device, features)=0x44
offsetof(struct net_device, stats.tx_packets)=0x54
offsetof(struct net_device, stats.tx_bytes)=0x5c
offsetof(struct net_device, stats.tx_dropped)=0x6c

Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
tx path can slow down SMP operations, since they dirty a cache line
that should stay shared (dev->features is needed in rx and tx paths)

We could move away stats field in net_device but it wont help that much.
(Two cache lines dirtied in tx path, we can do one only)

Better solution is to add tx_packets/tx_bytes/tx_dropped in struct
netdev_queue because this structure is already touched in tx path and
counters updates will then be free (no increase in size)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:15:06 -07:00
Eric Dumazet
c0f84d0d4b sch_teql: should not dereference skb after ndo_start_xmit()
It is illegal to dereference a skb after a successful ndo_start_xmit()
call. We must store skb length in a local variable instead.

Bug was introduced in 2.6.27 by commit 0abf77e55a
(net_sched: Add accessor function for packet length for qdiscs)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:12:31 -07:00
Ilpo Järvinen
7752731318 tcp: fix MSG_PEEK race check
Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.

I tried my best to deal with urg hole madness too which happens
here:
	if (!sock_flag(sk, SOCK_URGINLINE)) {
		++*seq;
		...
by using additional offset by one but I certainly have very
little interest in testing that part.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Frans Pop <elendil@planet.nl>
Tested-by: Ian Zimmermann <itz@buug.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 15:05:40 -07:00
David S. Miller
82d048186e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-05-18 14:48:30 -07:00
Ingo Molnar
1079cac0f4 Merge commit 'v2.6.30-rc6' into tracing/core
Merge reason: we were on an -rc4 base, sync up to -rc6

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18 10:15:35 +02:00
Stephen Hemminger
4f0611af47 bridge: fix initial packet flood if !STP
If bridge is configured with no STP and forwarding delay of 0 (which
is typical for virtualization) then when link starts it will flood all
packets for the first 20 seconds.

This bug was introduced by a combination of earlier changes:
  * forwarding database uses hold time of zero to indicate
    user wants to always flood packets
  * optimzation of the case of forwarding delay of 0 avoids the initial
    timer tick

The fix is to just skip all the topology change detection code if
kernel STP is not being used.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 21:12:55 -07:00
Stephen Hemminger
a598f6aebe bridge: relay bridge multicast pkgs if !STP
Currently the bridge catches all STP packets; even if STP is turned
off.  This prevents other systems (which do have STP turned on)
from being able to detect loops in the network.

With this patch, if STP is off, then any packet sent to the STP
multicast group address is forwarded to all ports.

Based on earlier patch by Joakim Tjernlund with changes
to go through forwarding (not local chain), and optimization
that only last octet needs to be checked.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 21:12:54 -07:00
Eric Dumazet
9d21493b4b net: tx scalability works : trans_start
struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multi queues ones, because every transmitter dirties
it. Is main use is tx watchdog and bonding alive checks.

But as most devices dont use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on a already present (and in exclusive state) cache line, for
free.

We can do this transition smoothly. An old driver continue to
update dev->trans_start, while an updated one updates txq->trans_start.

Further patches could also put tx_bytes/tx_packets counters in 
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:55:16 -07:00
John Dykstra
9dc20c5f78 tcp: tcp_prequeue() can use keyed wakeups
When TCP frees up write buffer space, avoid waking up tasks that have
done a poll() or select() on the same socket specifying read-side
events.

This is an extension of a read-side patch by Eric Dumazet.

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:44:43 -07:00
Chris Friesen
2513dfb83f ipconfig: handle case of delayed DHCP server
If a DHCP server is delayed, it's possible for the client to receive the 
DHCPOFFER after it has already sent out a new DHCPDISCOVER message from 
a second interface.  The client then sends out a DHCPREQUEST from the 
second interface, but the server doesn't recognize the device and 
rejects the request.

This patch simply tracks the current device being configured and throws 
away the OFFER if it is not intended for the current device.  A more 
sophisticated approach would be to put the OFFER information into the 
struct ic_device rather than storing it globally.

Signed-off-by: Chris Friesen <cfriesen@nortel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:39:33 -07:00
Pavel Emelyanov
5e392739d6 netpoll: don't dereference NULL dev from np
It looks like the dev in netpoll_poll can be NULL - at lease it's
checked at the function beginning. Thus the dev->netde_ops dereference
looks dangerous.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 20:37:55 -07:00
Jiri Pirko
a186d2aead net: remove needless (now buggy) & from dev->dev_addr (part2)
Missed part of "&" removal.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 11:59:53 -07:00
Li Zefan
cb1c4b71f6 cls_cgroup: remove unneeded cgroup_lock
We can remove this lock here, since we are in cgroup write handler and
thus the cgrp is guaranteed to be valid, and no lock is needed when
writing a u32 variable.

Signed-off-by: Li Zefan <lizf@cn.fujitsuc.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 11:59:48 -07:00
Jiri Pirko
3a6d54c563 net: remove needless (now buggy) & from dev->dev_addr
Patch fixes issues with dev->dev_addr changing from array to pointer.
Hopefully there are no others.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 11:59:47 -07:00
Rami Rosen
8b3521eeb7 ipv4: remove an unused parameter from configure method of fib_rules_ops.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-17 11:59:45 -07:00
Linus Torvalds
7c7327d966 Merge git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6:
  Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing
  Bluetooth: Don't use hci_acl_connect_cancel() for incoming connections
  Bluetooth: Fix wrong module refcount when connection setup fails

Another case of me handling the fallout from Davem's unfortunate
addiction to shuffleboard.

Won't anybody think of the children? Join the anti-shuffleboard league
today!
2009-05-15 14:30:02 -07:00
Linus Torvalds
3346857f6f Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
  iwlwifi: fix device id registration for 6000 series 2x2 devices
  ath5k: update channel in sw state after stopping RX and TX
  rtl8187: use DMA-aware buffers with usb_control_msg
  mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
  airo: airo_get_encode{,ext} potential buffer overflow

Pulled directly by Linus because Davem is off playing shuffle-board at
some Alaskan cruise, and the NULL ptr deref issue hits people and should
get merged sooner rather than later.

David - make us proud on the shuffle-board tournament!
2009-05-15 12:02:06 -07:00
Johannes Berg
d3707d9918 mac80211: make noack test available
There's this internal wifi_wme_noack_test variable that
we use to set the QoS control if set. For one, it is
unlikely that it is set. Secondly, if set it needs to
influence the IEEE80211_TX_CTL_NO_ACK TX control flag,
and finally we should also be able to set it at all, so
make it available in debugfs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:53 -04:00
Johannes Berg
b59066a291 mac80211: IBSS supported rate fixes
Currently mac80211 announces a rate set with no basic rates,
this fixes it to use 1/2 or 6/9 Mbit as basic rates by default.
Additionally, mac80211 will currently adopt the peer's entire
rate set, rather than just the basic rate set; fix that too.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:49 -04:00
Johannes Berg
e0d61887c2 mac80211: don't connect to IBSS network with different privacy
Even when we find an IBSS with the SSID we're looking for, we
may not be able to connect to it because it has a key and we
don't, or vice versa. Avoid such situations by checking the
privacy capability bit.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:48 -04:00
Johannes Berg
e0502de6fe mac80211: split out and decrease probe wait time
The time we wait for a probe response after probing an AP due to
beacon loss is currently the same as the monitoring interval, 2s.
This is far too long, APs should respond to probes within a
fraction of that time. To be able to adjust both values, add a
new constant IEEE80211_PROBE_WAIT, use it for checking the probe
response, and adjust it down to 200ms instead of 2 seconds.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:45 -04:00
Johannes Berg
34bfc411f6 mac80211: respond to beacon loss report only once
The driver might keep reporting beacon loss until we
disassociate -- catch that and don't respond to any
subsequent events until the probe is either successful
or we disassociate.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:42 -04:00
Johannes Berg
f7eef3563c wext: remove seq_start/stop sparse annotations
Even though they are true, they cause sparse to complain
because it doesn't see the __acquires(dev_base_lock) on
dev_seq_start() because it is only added to the function
in net/core/dev.c, not the header file. To keep track of
the nesting correctly we should probably annotate those
functions publically, but for now let's just remove the
annotation I added to wext.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:40 -04:00
Jouni Malinen
faa8fdc853 nl80211: Add RSC configuration for new keys
When setting a key with NL80211_CMD_NEW_KEY, we should allow the key
sequence number (RSC) to be set in order to allow replay protection to
work correctly for group keys. This patch documents this use for
nl80211 and adds the couple of missing pieces in nl80211/cfg80211 and
mac80211 to support this. In addition, WEXT SIOCSIWENCODEEXT compat
processing in cfg80211 is extended to handle the RSC (this was already
specified in WEXT, but just not implemented in cfg80211/mac80211).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:39 -04:00
Jouni Malinen
3f77316c6b nl80211: Add IEEE 802.1X PAE control for station mode
Add a new NL80211_ATTR_CONTROL_PORT flag for NL80211_CMD_ASSOCIATE to
allow user space to indicate that it will control the IEEE 802.1X port
in station mode. Previously, mac80211 was always marking the port
authorized in station mode. This was enough when drop_unencrypted flag
was set. However, drop_unencrypted can currently be controlled only
with WEXT and the current nl80211 design does not allow fully secure
configuration. Fix this by providing a mechanism for user space to
control the IEEE 802.1X port in station mode (i.e., do the same that
we are already doing in AP mode).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:37 -04:00
Johannes Berg
eccb8e8f0c nl80211: improve station flags handling
It is currently not possible to modify station flags, but that
capability would be very useful. This patch introduces a new
nl80211 attribute that contains a set/mask for station flags,
and updates the internal API (and mac80211) to mirror that.

The new attribute is parsed before falling back to the old so
that userspace can specify both (if it can) to work on all
kernels.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:35 -04:00
Jouni Malinen
0e46724a48 nl80211: Validate MFP flag type when parsing STA flags
NL80211_STA_FLAG_MFP was forgotten from sta_flags_policy. The previous
version added the flag due to the loop used in parse_station_flags,
but the proper behavior would be to allow nla_parse_nested() to go
through the policy for all flags.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:33 -04:00
Johannes Berg
08645126dd cfg80211: implement wext key handling
Move key handling wireless extension ioctls from mac80211 to cfg80211
so that all drivers that implement the cfg80211 operations get wext
compatibility.

Note that this drops the SIOCGIWENCODE ioctl support for getting
IW_ENCODE_RESTRICTED/IW_ENCODE_OPEN. This means that iwconfig will
no longer report "Security mode:open" or "Security mode:restricted"
for mac80211. However, what we displayed there (the authentication
algo used) was actually wrong -- linux/wireless.h states that this
setting is meant to differentiate between "Refuse non-encoded packets"
and "Accept non-encoded packets".

(Combined with "cfg80211: fix a couple of bugs with key ioctls". -- JWL)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-13 15:44:32 -04:00
Linus Torvalds
2ea3f86848 Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
  nfsd: silence lockdep warning
  lockd: fix list corruption on lockd restart
  nfsd4: check for negative dentry before use in nfsv4 readdir
  nfsd41: slots are freed with session
  svcrdma: clean up error paths.
  svcrdma: Fix dma map direction for rdma read targets
2009-05-12 17:11:56 -07:00
Johannes Berg
7be69c0b9a wext: fix get_wireless_stats locking
Currently, get_wireless_stats is racy by _design_. This is
because it returns a buffer, which needs to be statically
allocated since it cannot be freed if it was allocated
dynamically. Also, SIOCGIWSTATS and /proc/net/wireless use
no common lock, and /proc/net/wireless accesses are not
synchronised against each other. This is a design flaw in
get_wireless_stats since the beginning.

This patch fixes it by wrapping /proc/net/wireless accesses
with the RTNL so they are protected against each other and
SIOCGIWSTATS. The more correct method of fixing this would
be to pass in the buffer instead of returning it and have
the caller take care of synchronisation of the buffer, but
even then most drivers probably assume that their callback
is protected by the RTNL like all other wext callbacks.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:24:07 -04:00
Johannes Berg
e80cf8537a cfg80211: disallow interfering with stations on non-AP
On non-AP interfaces userspace has no business interfering with
the station management, this can confuse mac80211 (and other
drivers probably wouldn't support it anyway). Allow adding and
removing stations only on AP interfaces.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:24:07 -04:00
Johannes Berg
cbe8fa9c5e cfg80211: put wext data into substructure
To make it more apparent in the code what is for wext
only (and needs to be #ifdef'ed) put all the info for
wext into a substruct in each wireless_dev.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:24:07 -04:00
Johannes Berg
4e943900fb cfg80211: constify key mac address in ops
The address pointed to by mac_addr can be marked as const.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:24:07 -04:00
Johannes Berg
413ad50a5c mac80211: properly track HT operation_mode
When we disassociate, we set the channel to non-HT which
obviously invalidates any ht_operation_mode setting. But
when we then associate with the next AP again, we might
still have the ht_operation_mode from the previous AP
cached and fail to configure the hardware with the new
(but unchanged) operation mode. This patch fixes it by
separately tracking whether our cache is valid.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:57 -04:00
Johannes Berg
9ed6bcce77 mac80211: move HT operation mode BSS info
There really is no need to have a separate struct for a
single variable. The fact that it exists is due to the
code legacy, but we can remove that now. Very simple.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:57 -04:00
Johannes Berg
99c84cb069 mac80211: improve scan timing
The call to ieee80211_hw_config() is supposed to apply changes
synchronously, so once it returns the parameters are applied to
the hardware. Thus, there really is no need to delay the probing
by the channel switch time again since the channel switch has
already happened once we get to this code.

Additionally, there is no need to wait for a NAV update (probe
delay) when the channel is passively scanned. Remove that extra
time too.

This cuts scanning time from over 7 seconds to under 4 on ar9170,
which is due to the number of channels scanned and ar9170's switch
time being advertised as 135ms (my test now indicates it is about
77ms with the current driver, but the difference might also be due
to using a different machine with different USB controllers).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:56 -04:00
Jouni Malinen
f2ca3ea484 mac80211: MFP - Drop unprotected Action frames prior key setup
When management frame protection (IEEE 802.11w) is used, unprotected
Robust Action frames are not allowed prior to key configuration.
However, unprotected Deauthentication and Disassociation frames are
allowed at that point, but not after key configuration.

Make ieee80211_drop_unencrypted() handle the special cases for MFP by
separating the basic Data frame case from Management frame processing
and handle the Management frames only if MFP has been negotiated. In
addition, do not use sdata->drop_unencrypted for Management frames
since the decision on whether to accept the frame depends on the key
being configured.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:55 -04:00
Jouni Malinen
0c7c10c7cc mac80211: Drop unencrypted frames based on key setup
When using nl80211, we do not have a mechanism to set
sdata->drop_unencrypted. Currently, this breaks code that is supposed
to drop unencrypted frames when protection is expected since
ieee80211_rx_h_decrypt() is optimized to not set rx->key when the
frame is not protected.

This patch modifies ieee80211_rx_h_decrypt() to set rx->key for all
frames and only skip decryption if the frame is not protected. This
allows ieee80211_drop_unencrypted() to correctly drop frames even if
drop_unencrypted is not set.

The changes here are not enough to handle all cases, though. Additional
patches will be needed to implement proper IEEE 802.1X PAE for station
mode (currently, this is only used for AP mode) and some additional
rules are needed for MFP to drop unprotected Robust Action frames prior
to having PTK and IGTK configured.

In theory, the unprotected frames could and should be dropped in
ieee80211_rx_h_decrypt(). However, due to the special case with EAPOL
frames that have to be allowed to be received unprotected even when
keys are set, it is simpler to only set rx->key and allow the
ieee80211_frame_allowed() function to handle the actual dropping of
data frames after 802.11->802.3 header conversion. In addition,
unprotected robust management frames are dropped before they are
processed.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:55 -04:00
Johannes Berg
0b258582fe cfg80211: fix wext iw_freq parsing
The function to parse a struct iw_freq has a stupid bug,
it returns NULL when the channel cannot be found at all,
but NULL is supposed to mean "auto". Fix this by checking
the return value of ieee80211_get_channel() and returning
ERR_PTR(-EINVAL) if it returned NULL (channel not found).

This fixes an issue where you could say (in IBSS mode)
	iwconfig wlan0 channel 21
and it would use channel 1 instead because that's the
first available channel with IBSS allowed (which is what
the "auto" setting uses).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:54 -04:00
Johannes Berg
aa837e1d6b mac80211: set default QoS values according to spec
We've never really cared about the default QoS (WMM) values, but
we really should if the AP doesn't send any. This patch makes
mac80211 use the default values according to 802.11-2007, and
additionally syncs the default values when we disassociate so
whatever the last AP said gets "unconfigured".

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:54 -04:00
Johannes Berg
58905ca5b1 mac80211: fix scan channel race
When a software scan starts, it first sets sw_scanning, but
leaves the scan_channel "unset" (it currently actually gets
initialised to a default). Now, when something else tries
to (re)configure the hardware in the window between these two
events (after sw_scanning = true, but before scan_channel is
set), the current code switches to the (unset!) scan_channel.
This causes trouble, especially when switching bands and
sending frames on the wrong channel.

To work around this, leave scan_channel initialised to NULL
and use it to determine whether or not a switch to a different
channel should occur (and also use the same condition to check
whether to adjust power for scan or not).

Additionally, avoid reconfiguring the hardware completely when
recalculating idle resulted in no changes, this was the problem
that originally led us to discover the race condition in the
first place, which was helpfully bisected by Pavel. This part
of the patch should not be necessary with the other fixes, but
not calling the ieee80211_hw_config function when we know it to
be unnecessary is certainly a correct thing to do.

Unfortunately, this patch cannot and does not fix the race
condition completely, but due to the way the scan code is
structured it makes the particular problem Pavel discovered
(race while changing channel at the same time as transmitting
frames) go away. To fix it completely, more work especially
with locking configuration is needed.

Bisected-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:54 -04:00
Jouni Malinen
dc6382ced0 nl80211 : Add support for configuring MFP
NL80211_CMD_ASSOCIATE request must be able to indicate whether
management frame protection (IEEE 802.11w) is being used. mac80211 was
able to use MFP in client mode only with WEXT, but the new
NL80211_ATTR_USE_MFP attribute will allow this to be done with
nl80211, too.

Since we are currently using nl80211 for MFP only with drivers that
use user space SME, only MFP disabled and required values are
used. However, the NL80211_ATTR_USE_MFP attribute is an enum that can
be extended with MFP optional in the future, if that is needed with
some drivers (e.g., if the RSN IE is generated by the driver).

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:23:54 -04:00
John W. Linville
621ad7c96a mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
"There is another problem with this piece of code. The sband will be NULL
after second iteration on single band device and cause null pointer
dereference. Everything is working with dual band card. Sorry, but i
don't know how to explain this clearly in English. I have looked on the
second patch for pid algorithm and found similar bug."

Reported-by: Karol Szuster <qflon@o2.pl>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-05-11 15:07:01 -04:00
Linus Torvalds
2ad20802b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
  bonding: fix panic if initialization fails
  IXP4xx: complete Ethernet netdev setup before calling register_netdev().
  IXP4xx: use "ENODEV" instead of "ENOSYS" in module initialization.
  ipvs: Fix IPv4 FWMARK virtual services
  ipv4: Make INET_LRO a bool instead of tristate.
  net: remove stale reference to fastroute from Kconfig help text
  net: update skb_recycle_check() for hardware timestamping changes
  bnx2: Fix panic in bnx2_poll_work().
  net-sched: fix bfifo default limit
  igb: resolve panic on shutdown when SR-IOV is enabled
  wimax: oops: wimax_dev_add() is the only one that can initialize the state
  wimax: fix oops if netlink fails to add attribute
  Bluetooth: Move dev_set_name() to a context that can sleep
  netfilter: ctnetlink: fix wrong message type in user updates
  netfilter: xt_cluster: fix use of cluster match with 32 nodes
  netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE
  netfilter: add missing linux/types.h include to xt_LED.h
  mac80211: pid, fix memory corruption
  mac80211: minstrel, fix memory corruption
  cfg80211: fix comment on regulatory hint processing
  ...
2009-05-10 10:46:45 -07:00
Marcel Holtmann
3d7a9d1c7e Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing
A remote device in security mode 3 that tries to connect will require
the pairing during the connection setup phase. The disconnect timeout
is now triggered within 10 milliseconds and causes the pairing to fail.

If a connection is not fully established and a PIN code request is
received, don't trigger the disconnect timeout. The either successful
or failing connection complete event will make sure that the timeout
is triggered at the right time.

The biggest problem with security mode 3 is that many Bluetooth 2.0
device and before use a temporary security mode 3 for dedicated
bonding.

Based on a report by Johan Hedberg <johan.hedberg@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Johan Hedberg <johan.hedberg@nokia.com>
2009-05-09 18:09:52 -07:00
Marcel Holtmann
1b0336bb36 Bluetooth: Don't use hci_acl_connect_cancel() for incoming connections
The connection setup phase takes around 2 seconds or longer and in
that time it is possible that the need for an ACL connection is no
longer present. If that happens then, the connection attempt will
be canceled.

This only applies to outgoing connections, but currently it can also
be triggered by incoming connection. Don't call hci_acl_connect_cancel()
on incoming connection since these have to be either accepted or rejected
in this state. Once they are successfully connected they need to be
fully disconnected anyway.

Also remove the wrong hci_acl_disconn() call for SCO and eSCO links
since at this stage they can't be disconnected either, because the
connection handle is still unknown.

Based on a report by Johan Hedberg <johan.hedberg@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Johan Hedberg <johan.hedberg@nokia.com>
2009-05-09 18:09:45 -07:00
Marcel Holtmann
384943ec1b Bluetooth: Fix wrong module refcount when connection setup fails
The module refcount is increased by hci_dev_hold() call in hci_conn_add()
and decreased by hci_dev_put() call in del_conn(). In case the connection
setup fails, hci_dev_put() is never called.

Procedure to reproduce the issue:

  # hciconfig hci0 up
  # lsmod | grep btusb                   -> "used by" refcount = 1

  # hcitool cc <non-exisiting bdaddr>    -> will get timeout

  # lsmod | grep btusb                   -> "used by" refcount = 2
  # hciconfig hci0 down
  # lsmod | grep btusb                   -> "used by" refcount = 1
  # rmmod btusb                          -> ERROR: Module btusb is in use

The hci_dev_put() call got moved into del_conn() with the 2.6.25 kernel
to fix an issue with hci_dev going away before hci_conn. However that
change was wrong and introduced this problem.

When calling hci_conn_del() it has to call hci_dev_put() after freeing
the connection details. This handling should be fully symmetric. The
execution of del_conn() is done in a work queue and needs it own calls
to hci_dev_hold() and hci_dev_put() to ensure that the hci_dev stays
until the connection cleanup has been finished.

Based on a report by Bing Zhao <bzhao@marvell.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: Bing Zhao <bzhao@marvell.com>
2009-05-09 18:09:38 -07:00
Jiri Pirko
ab9c73ccb5 net: check retval of dev_addr_init()
Add missed checking of dev_addr_init return value in alloc_netdev_mq.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 net/core/dev.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-09 13:15:48 -07:00
Steven Whitehouse
9948bb6a6d decnet: Use data ready call back, rather than hand coding it
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-09 13:15:47 -07:00
John Dykstra
61de71c67c Network Drop Monitor: Fix skb_kill_datagram
Commit ead2ceb0ec ("Network Drop Monitor:
Adding kfree_skb_clean for non-drops and modifying end-of-line points
for skbs") established new conventions for identifying dropped packets.

Align skb_kill_datagram() with these conventions so that packets that
get dropped just before the copy to userspace are properly tracked.

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08 14:57:01 -07:00
Simon Horman
be8be9eccb ipvs: Fix IPv4 FWMARK virtual services
This fixes the use of fwmarks to denote IPv4 virtual services
which was unfortunately broken as a result of the integration
of IPv6 support into IPVS, which was included in 2.6.28.

The problem arises because fwmarks are stored in the 4th octet
of a union nf_inet_addr .all, however in the case of IPv4 only
the first octet, corresponding to .ip, is assigned and compared.

In other words, using .all = { 0, 0, 0, htonl(svc->fwmark) always
results in a value of 0 (32bits) being stored for IPv4. This means
that one fwmark can be used, as it ends up being mapped to 0, but things
break down when multiple fwmarks are used, as they all end up being mapped
to 0.

As fwmarks are 32bits a reasonable fix seems to be to just store the fwmark
in .ip, and comparing and storing .ip when fwmarks are used.

This patch makes the assumption that in calls to ip_vs_ct_in_get()
and ip_vs_sched_persist() if the proto parameter is IPPROTO_IP then
we are dealing with an fwmark. I believe this is valid as ip_vs_in()
does fairly strict filtering on the protocol and IPPROTO_IP should
not be used in these calls unless explicitly passed when making
these calls for fwmarks in ip_vs_sched_persist().

Tested-by: Fabien Duchêne <fabien.duchene@student.uclouvain.be>
Cc: Joseph Mack NA3T <jmack@wm7d.net>
Cc: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08 14:54:47 -07:00
David S. Miller
a8679be207 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-05-08 12:46:17 -07:00
David S. Miller
e81963b180 ipv4: Make INET_LRO a bool instead of tristate.
This code is used as a library by several device drivers,
which select INET_LRO.

If some are modules and some are statically built into the
kernel, we get build failures if INET_LRO is modular.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-08 12:45:26 -07:00
David S. Miller
22f6dacdfc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	include/net/tcp.h
2009-05-08 02:48:30 -07:00
Jan Engelhardt
451853645f netfilter: xtables: print hook name instead of mask
Users cannot make anything of these numbers. Let's just tell them
directly.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08 10:30:50 +02:00
Jan Engelhardt
bb70dfa5f8 netfilter: xtables: consolidate comefrom debug cast access
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08 10:30:49 +02:00
Jan Engelhardt
7a6b1c46e2 netfilter: xtables: remove another level of indent
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08 10:30:49 +02:00
Jan Engelhardt
9452258d81 netfilter: xtables: remove some goto
Combining two ifs, and goto is easily gone.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-05-08 10:30:48 +02:00