Commit graph

2431 commits

Author SHA1 Message Date
Michael Chan
b0da853703 [NET]: Add ECN support for TSO
In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

With help from Herbert Xu <herbert@gondor.apana.org.au>.

Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:08 -07:00
Catherine Zhang
877ce7c1b3 [AF_UNIX]: Datagram getpeersec
This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via getsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_SOCKET &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:58:06 -07:00
Herbert Xu
3d3a853379 [NET]: Make illegal_highdma more anal
Rather than having illegal_highdma as a macro when HIGHMEM is off, we
can turn it into an inline function that returns zero.  This will catch
callers that give it bad arguments.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:59 -07:00
Sridhar Samudrala
47da8ee681 [TCP]: Export accept queue len of a TCP listening socket via rx_queue
While debugging a TCP server hang issue, we noticed that currently there is
no way for a user to get the acceptq backlog value for a TCP listen socket.

All the standard networking utilities that display socket info like netstat,
ss and /proc/net/tcp have 2 fields called rx_queue and tx_queue. These
fields do not mean much for listening sockets. This patch uses one of these
unused fields(rx_queue) to export the accept queue len for listening sockets.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:57 -07:00
Darrel Goeddel
c7bdb545d2 [NETLINK]: Encapsulate eff_cap usage within security framework.
This patch encapsulates the usage of eff_cap (in netlink_skb_params) within
the security framework by extending security_netlink_recv to include a required
capability parameter and converting all direct usage of eff_caps outside
of the lsm modules to use the interface.  It also updates the SELinux
implementation of the security_netlink_send and security_netlink_recv
hooks to take advantage of the sid in the netlink_skb_params struct.
This also enables SELinux to perform auditing of netlink capability checks.
Please apply, for 2.6.18 if possible.

Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by:  James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:55 -07:00
Herbert Xu
576a30eb64 [NET]: Added GSO header verification
When GSO packets come from an untrusted source (e.g., a Xen guest domain),
we need to verify the header integrity before passing it to the hardware.

Since the first step in GSO is to verify the header, we can reuse that
code by adding a new bit to gso_type: SKB_GSO_DODGY.  Packets with this
bit set can only be fed directly to devices with the corresponding bit
NETIF_F_GSO_ROBUST.  If the device doesn't have that bit, then the skb
is fed to the GSO engine which will allow the packet to be sent to the
hardware if it passes the header check.

This patch changes the sg flag to a full features flag.  The same method
can be used to implement TSO ECN support.  We simply have to mark packets
with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
NETIF_F_TSO_ECN can accept them.  The GSO engine can either fully segment
the packet, or segment the first MTU and pass the rest to the hardware for
further segmentation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:53 -07:00
Patrick McHardy
68c1692e3e [NETFILTER]: statistic match: add missing Kconfig help text
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:50 -07:00
Patrick McHardy
ef47c6a7b8 [NETFILTER]: ip_queue/nfnetlink_queue: drop bridge port references when dev disappears
When a device that is acting as a bridge port is unregistered, the
ip_queue/nfnetlink_queue notifier doesn't check if its one of
physindev/physoutdev and doesn't release the references if it is.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:48 -07:00
Jorge Matias
1c7e47726a [NETFILTER]: xt_sctp: fix --chunk-types matching
xt_sctp uses an incorrect header offset when --chunk-types is used.

Signed-off-by: Jorge Matias <jorge.matias@motorola.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:46 -07:00
Yuri Gushin
9abdcf6b6c [NETFILTER]: xt_tcpudp: fix double unregistration in error path
"xt_unregister_match(AF_INET, &tcp_matchstruct)" is called twice,
leaving "udp_matchstruct" registered, in case of a failure in the
registration of the udp6 structure.

Signed-off-by: Yuri Gushin <yuri@ecl-labs.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:44 -07:00
Yasuyuki Kozakai
40a839fdbd [NETFILTER]: nf_conntrack: Fix undefined references to local_bh_*
CC      net/netfilter/nf_conntrack_proto_sctp.o
net/netfilter/nf_conntrack_proto_sctp.c: In function `sctp_print_conntrack':
net/netfilter/nf_conntrack_proto_sctp.c:206: warning: implicit declaration of function `local_bh_disable'
net/netfilter/nf_conntrack_proto_sctp.c:208: warning: implicit declaration of function `local_bh_enable'
  CC      net/netfilter/nf_conntrack_netlink.o
net/netfilter/nf_conntrack_netlink.c: In function `ctnetlink_dump_table':
net/netfilter/nf_conntrack_netlink.c:429: warning: implicit declaration of function `local_bh_disable'
net/netfilter/nf_conntrack_netlink.c:452: warning: implicit declaration of function `local_bh_enable'

Spotted by Toralf Förster

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:42 -07:00
Patrick McHardy
da298d3a4f [NETFILTER]: x_tables: fix xt_register_table error propagation
When xt_register_table fails the error is not properly propagated back.
Based on patch by Lepton Wu <ytht.net@gmail.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29 16:57:40 -07:00
Linus Torvalds
602cada851 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits)
  [PATCH] devfs: Remove it from the feature_removal.txt file
  [PATCH] devfs: Last little devfs cleanups throughout the kernel tree.
  [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV
  [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed
  [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
  [PATCH] devfs: Remove devfs_remove() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree
  [PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
  [PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree
  [PATCH] devfs: Remove devfs support from the sound subsystem
  [PATCH] devfs: Remove devfs support from the ide subsystem.
  [PATCH] devfs: Remove devfs support from the serial subsystem
  [PATCH] devfs: Remove devfs from the init code
  [PATCH] devfs: Remove devfs from the partition code
  ...
2006-06-29 14:19:21 -07:00
Paul Fulghum
817d6d3bce [PATCH] remove TTY_DONT_FLIP
Remove TTY_DONT_FLIP tty flag.  This flag was introduced in 2.1.X kernels
to prevent the N_TTY line discipline functions read_chan() and
n_tty_receive_buf() from running at the same time.  2.2.15 introduced
tty->read_lock to protect access to the N_TTY read buffer, which is the
only state requiring protection between these two functions.

The current TTY_DONT_FLIP implementation is broken for SMP, and is not
universally honored by drivers that send data directly to the line
discipline receive_buf function.

Because TTY_DONT_FLIP is not necessary, is broken in implementation, and is
not universally honored, it is removed.

Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28 14:59:05 -07:00
Ingo Molnar
34af946a22 [PATCH] spin/rwlock init cleanups
locking init cleanups:

 - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
 - convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

 - cleanliness
 - lockdep needs control of lock initialization, which the open-coded
   variants do not give
 - it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:39 -07:00
Linus Torvalds
da206c9e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  typo fixes
  Clean up 'inline is not at beginning' warnings for usb storage
  Storage class should be first
  i386: Trivial typo fixes
  ixj: make ixj_set_tone_off() static
  spelling fixes
  fix paniced->panicked typos
  Spelling fixes for Documentation/atomic_ops.txt
  move acknowledgment for Mark Adler to CREDITS
  remove the bouncing email address of David Campbell
2006-06-26 13:33:14 -07:00
Greg Kroah-Hartman
331b831983 [PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV
I've always found this flag confusing.  Now that devfs is no longer around, it
has been renamed, and the documentation for when this flag should be used has
been updated.

Also fixes all drivers that use this flag.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:09 -07:00
Greg Kroah-Hartman
f4eaa37017 [PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed
Also fixes all drivers that set this field.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:09 -07:00
Greg Kroah-Hartman
ff23eca3e8 [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
Also fixes up all files that #include it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:08 -07:00
Linus Torvalds
61a46dc9d1 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  [IOAT]: Do not dereference THIS_MODULE directly to set unsafe.
  [NETROM]: Fix possible null pointer dereference.
  [NET] netpoll: break recursive loop in netpoll rx path
  [NET] netpoll: don't spin forever sending to stopped queues
  [IRDA]: add some IBM think pads
  [ATM]: atm/mpc.c warning fix
  [NET]: skb_find_text ignores to argument
  [NET]: make net/core/dev.c:netdev_nit static
  [NET]: Fix GSO problems in dev_hard_start_xmit()
  [NET]: Fix CHECKSUM_HW GSO problems.
  [TIPC]: Fix incorrect correction to discovery timer frequency computation.
  [TIPC]: Get rid of dynamically allocated arrays in broadcast code.
  [TIPC]: Fixed link switchover bugs
  [TIPC]: Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.
  [TIPC]: First phase of assert() cleanup
  [TIPC]: Disallow config operations that aren't supported in certain modes.
  [TIPC]: Fixed memory leak in tipc_link_send() when destination is unreachable
  [TIPC]: Added missing warning for out-of-memory condition
  [TIPC]: Withdrawing all names from nameless port now returns success, not error
  [TIPC]: Optimized argument validation done by connect().
  ...
2006-06-26 10:08:13 -07:00
Akinobu Mita
a842ef297f [PATCH] net/rxrpc: use list_move()
This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under net/rxrpc.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:17 -07:00
Andreas Mohr
d6e05edc59 spelling fixes
acquired (aquired)
contiguous (contigious)
successful (succesful, succesfull)
surprise (suprise)
whether (weather)
some other misspellings

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-26 18:35:02 +02:00
Ralf Baechle
52383678a8 [NETROM]: Fix possible null pointer dereference.
If in nr_link_failed the neighbour list is non-empty but the node list
is empty we'll end dereferencing a  in a NULL pointer.

This fixes coverity 362.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-26 00:05:23 -07:00
Neil Horman
068c6e98bc [NET] netpoll: break recursive loop in netpoll rx path
The netpoll system currently has a rx to tx path via:

netpoll_rx
 __netpoll_rx
  arp_reply
   netpoll_send_skb
    dev->hard_start_tx

This rx->tx loop places network drivers at risk of inadvertently causing a
deadlock or BUG halt by recursively trying to acquire a spinlock that is
used in both their rx and tx paths (this problem was origionally reported
to me in the 3c59x driver, which shares a spinlock between the
boomerang_interrupt and boomerang_start_xmit routines).

This patch breaks this loop, by queueing arp frames, so that they can be
responded to after all receive operations have been completed.  Tested by
myself and the reported with successful results.

Specifically it was tested with netdump.  Heres the BZ with details:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=194055

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-26 00:04:27 -07:00
Jeremy Fitzhardinge
8834807b43 [NET] netpoll: don't spin forever sending to stopped queues
When transmitting a skb in netpoll_send_skb(), only retry a limited number
of times if the device queue is stopped.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-26 00:03:40 -07:00
Andrew Morton
1e4fd51e2c [ATM]: atm/mpc.c warning fix
net/atm/mpc.c: In function 'MPOA_res_reply_rcvd':
net/atm/mpc.c:1116: warning: unused variable 'ip'

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-26 00:01:58 -07:00
Phil Oester
f72b948dcb [NET]: skb_find_text ignores to argument
skb_find_text takes a "to" argument which is supposed to limit how
far into the skb it will search for the given text.  At present,
it seems to ignore that argument on the first skb, and instead
return a match even if the text occurs beyond the limit.

Patch below fixes this, after adjusting for the "from" starting
point.  This consequently fixes the netfilter string match's "--to"
handling, which currently is broken.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-26 00:00:57 -07:00
Adrian Bunk
6048126440 [NET]: make net/core/dev.c:netdev_nit static
netdev_nit can now become static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:58:10 -07:00
Michael Chan
f54d9e8d7f [NET]: Fix GSO problems in dev_hard_start_xmit()
Fix 2 problems in dev_hard_start_xmit():

1. nskb->next needs to link back to skb->next if hard_start_xmit()
returns non-zero.

2. Since the total number of GSO fragments may exceed MAX_SKB_FRAGS + 1,
it needs to stop transmitting if the netif_queue is stopped.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:57:04 -07:00
Herbert Xu
0718bcc09b [NET]: Fix CHECKSUM_HW GSO problems.
Fix checksum problems in the GSO code path for CHECKSUM_HW packets.

The ipv4 TCP pseudo header checksum has to be adjusted for GSO
segmented packets.

The adjustment is needed because the length field in the pseudo-header
changes.  However, because we have the inequality oldlen > newlen, we
know that delta = (u16)~oldlen + newlen is still a 16-bit quantity.
This also means that htonl(delta) + th->check still fits in 32 bits.
Therefore we don't have to use csum_add on this operations.

This is based on a patch by Michael Chan <mchan@broadcom.com>.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:55:46 -07:00
Allan Stephens
3ba07e65b2 [TIPC]: Fix incorrect correction to discovery timer frequency computation.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:53:47 -07:00
Allan Stephens
65f51ef097 [TIPC]: Get rid of dynamically allocated arrays in broadcast code.
This change improves an earlier change which replaced the large local
variable arrays used during broadcasting with dynamically allocated arrays.
The temporary arrays are now incoprorated into the multicast link data
structure.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:53:20 -07:00
Allan Stephens
5392d64688 [TIPC]: Fixed link switchover bugs
Incorporates several related fixes:
- switchover now occurs when switching from an active link to a standby link
- failure of a standby link no longer initiates switchover
- links now display correct # of received packtes following reactivation

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:52:50 -07:00
Allan Stephens
a10bd924a4 [TIPC]: Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:52:17 -07:00
Allan Stephens
f131072c3d [TIPC]: First phase of assert() cleanup
This also contains enhancements to simplify comparisons in name table
publication removal algorithm and to simplify name table sanity checking
when shutting down TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:51:37 -07:00
Allan Stephens
e100ae92a6 [TIPC]: Disallow config operations that aren't supported in certain modes.
This change provides user-friendly feedback when TIPC is unable to perform
certain configuration operations that don't work properly in certain modes.
(In particular, any reconfiguration request that would temporarily take TIPC
from network mode to standalone mode, or from standalone mode to not running
mode, is disallowed.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:51:08 -07:00
Allan Stephens
c33d53b235 [TIPC]: Fixed memory leak in tipc_link_send() when destination is unreachable
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:50:30 -07:00
Allan Stephens
a75bf87427 [TIPC]: Added missing warning for out-of-memory condition
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:50:01 -07:00
Allan Stephens
a7513528cd [TIPC]: Withdrawing all names from nameless port now returns success, not error
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:49:33 -07:00
Allan Stephens
51f9cc1ff8 [TIPC]: Optimized argument validation done by connect().
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:49:06 -07:00
Allan Stephens
a3b0a5a9d0 [TIPC]: Simplify code for returning partial success of stream send request.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:48:22 -07:00
Allan Stephens
4b087b28a6 [TIPC]: recvmsg() now returns TIPC ancillary data using correct level (SOL_TIPC)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:47:44 -07:00
Allan Stephens
499786516f [TIPC]: Improved performance of error checking during socket creation.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:47:18 -07:00
Allan Stephens
1303e8f173 [TIPC]: Stream socket send indicates partial success if data partially sent.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:46:50 -07:00
Allan Stephens
bdd94789d2 [TIPC]: Connected send now checks socket state when retrying congested send.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:45:53 -07:00
Allan Stephens
3546c7508d [TIPC]: Can now return destination name of form {0,x,y} via ancillary data.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:45:24 -07:00
Allan Stephens
3388007bc4 [TIPC]: Implied connect now saves dest name for retrieval as ancillary data.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:44:57 -07:00
Allan Stephens
6b384de853 [TIPC]: Fixed connect() to detect a dest address that is missing or too short.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:44:27 -07:00
Allan Stephens
e9024f0f79 [TIPC]: Non-operation-affecting corrections to comments & function definitions.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:43:57 -07:00
Allan Stephens
687a25f1cd [TIPC]: Validate entire interface name when locating bearer to enable.
This fix prevents a bearer from being enabled using the wrong interface.
For example, specifying "eth:eth14" might enable "eth:eth1" by mistake.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-06-25 23:43:21 -07:00
Allan Stephens
a592ea6362 [TIPC]: Added support for MODULE_VERSION capability.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:42:47 -07:00
Allan Stephens
8b1f0a92e9 [TIPC]: Fix misleading comment in buf_discard() routine.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:42:19 -07:00
Allan Stephens
70cb234770 [TIPC]: Fixed privilege checking typo in dest_name_check().
This patch originated by Stephane Ouellette <ouellettes@videotron.ca>.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:41:47 -07:00
Eric Sesterhenn
3ac90216ab [TIPC] Fix for NULL pointer dereference
This fixes a bug spotted by the coverity checker, bug id #366. If
(mod(seqno - prev) != 1) we set buf to NULL, dereference it in the for
case, and set it to whatever value happes to be at adress 0+next, if it
happens to be non-zero, we even stay in the loop. It seems that the author
intended to break there.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:41:15 -07:00
Allan Stephens
a4e0927902 [TIPC]: Allow compilation when CONFIG_TIPC_DEBUG is not set.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:40:35 -07:00
Allan Stephens
d356eeba8e [TIPC]: Multicast link failure now resets all links to "nacking" node.
This fix prevents node from crashing.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:40:01 -07:00
Allan Stephens
260082471e [TIPC]: Links now validate destination node specified by incoming messages.
This fix prevents link flopping and name table inconsistency problems arising
when a node is assigned a different <Z.C.N> value than it used previously.
(Changing the <Z.C.N> value causes other nodes to have two link endpoints
sending to the same MAC address using two different destination <Z.C.N> values,
requiring the receiving node to filter out the unwanted messages.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:39:31 -07:00
Allan Stephens
9688243b63 [TIPC]: Allow ports to receive multicast messages through native API.
This fix prevents a kernel panic if an application mistakenly sends a
multicast message to  TIPC's topology service or configuration service.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:38:58 -07:00
Allan Stephens
4938450789 [TIPC]: Corrected potential misuse of tipc_media_addr structure.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:38:29 -07:00
Allan Stephens
2535ec50b7 [TIPC]: Use correct upper bound when validating network zone number.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:38:00 -07:00
Allan Stephens
9ab230f82f [TIPC]: Prevent name table corruption if no room for new publication
Now exits cleanly if attempt to allocate larger array of subsequences fails,
without losing track of pointer to existing array.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:37:24 -07:00
Jon Maloy
5e3c8854c1 [TIPC] Improved tolerance to promiscuous mode interface
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-25 23:36:43 -07:00
Linus Torvalds
1d77062b14 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (51 commits)
  nfs: remove nfs_put_link()
  nfs-build-fix-99
  git-nfs-build-fixes
  Merge branch 'odirect'
  NFS: alloc nfs_read/write_data as direct I/O is scheduled
  NFS: Eliminate nfs_get_user_pages()
  NFS: refactor nfs_direct_free_user_pages
  NFS: remove user_addr, user_count, and pos from nfs_direct_req
  NFS: "open code" the NFS direct write rescheduler
  NFS: Separate functions for counting outstanding NFS direct I/Os
  NLM: Fix reclaim races
  NLM: sem to mutex conversion
  locks.c: add the fl_owner to nlm_compare_locks
  NFS: Display the chosen RPCSEC_GSS security flavour in /proc/mounts
  NFS: Split fs/nfs/inode.c
  NFS: Fix typo in nfs_do_clone_mount()
  NFS: Fix compile errors introduced by referrals patches
  NFSv4: Ensure that referral mounts bind to a reserved port
  NFSv4: A root pathname is sent as a zero component4
  NFSv4: Follow a referral
  ...
2006-06-25 10:54:14 -07:00
Paul Mackerras
bfe5d83419 [PATCH] Define __raw_get_cpu_var and use it
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled.  For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.

This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Andrew Morton
fb1bb34d45 [PATCH] remove for_each_cpu()
Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:00:54 -07:00
Trond Myklebust
816724e65c Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Conflicts:

	fs/nfs/inode.c
	fs/super.c

Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch
'VFS: Permit filesystem to override root dentry on mount'
2006-06-24 13:07:53 -04:00
Linus Torvalds
199f4c9f76 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NET]: Require CAP_NET_ADMIN to create tuntap devices.
  [NET]: fix net-core kernel-doc
  [TCP]: Move inclusion of <linux/dmaengine.h> to correct place in <linux/tcp.h>
  [IPSEC]: Handle GSO packets
  [NET]: Added GSO toggle
  [NET]: Add software TSOv4
  [NET]: Add generic segmentation offload
  [NET]: Merge TSO/UFO fields in sk_buff
  [NET]: Prevent transmission after dev_deactivate
  [IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY
  [IPV6]: Fix source address selection.
  [NET]: Avoid allocating skb in skb_pad
2006-06-23 08:00:01 -07:00
Oleg Nesterov
626ab0e69d [PATCH] list: use list_replace_init() instead of list_splice_init()
list_splice_init(list, head) does unneeded job if it is known that
list_empty(head) == 1.  We can use list_replace_init() instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:07 -07:00
Jean-Luc Leger
538c5902b8 [PATCH] clean up default value of IP_DCCP_ACKVEC
Default values for boolean and tristate options can only be 'y', 'm' or 'n'.
This patch removes wrong default for IP_DCCP_ACKVEC.

Signed-off-by: Jean-Luc Leger <jean-luc.leger@dspnet.fr.eu.org>
Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:43:04 -07:00
David Howells
454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Randy Dunlap
f4b8ea7849 [NET]: fix net-core kernel-doc
Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:42 -07:00
Herbert Xu
09b8f7a93e [IPSEC]: Handle GSO packets
This patch segments GSO packets received by the IPsec stack.  This can
happen when a NIC driver injects GSO packets into the stack which are
then forwarded to another host.

The primary application of this is going to be Xen where its backend
driver may inject GSO packets into dom0.

Of course this also can be used by other virtualisation schemes such as
VMWare or UML since the tap device could be modified to inject GSO packets
received through splice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:38 -07:00
Herbert Xu
37c3185a02 [NET]: Added GSO toggle
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device.  For now it only supports in TCPv4.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:36 -07:00
Herbert Xu
f4c50d990d [NET]: Add software TSOv4
This patch adds the GSO implementation for IPv4 TCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:33 -07:00
Herbert Xu
f6a78bfcb1 [NET]: Add generic segmentation offload
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.

The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:31 -07:00
Herbert Xu
7967168cef [NET]: Merge TSO/UFO fields in sk_buff
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP).  So
let's merge them.

They were used to tell the protocol of a packet.  This function has been
subsumed by the new gso_type field.  This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb.  As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.

I've made gso_type a conjunction.  The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
to be emulated in software.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:29 -07:00
Herbert Xu
d4828d85d1 [NET]: Prevent transmission after dev_deactivate
The dev_deactivate function has bit-rotted since the introduction of
lockless drivers.  In particular, the spin_unlock_wait call at the end
has no effect on the xmit routine of lockless drivers.

With a little bit of work, we can make it much more useful by providing
the guarantee that when it returns, no more calls to the xmit routine
of the underlying driver will be made.

The idea is simple.  There are two entry points in to the xmit routine.
The first comes from dev_queue_xmit.  That one is easily stopped by
using synchronize_rcu.  This works because we set the qdisc to noop_qdisc
before the synchronize_rcu call.  That in turn causes all subsequent
packets sent to dev_queue_xmit to be dropped.  The synchronize_rcu call
also ensures all outstanding calls leave their critical section.

The other entry point is from qdisc_run.  Since we now have a bit that
indicates whether it's running, all we have to do is to wait until the
bit is off.

I've removed the loop to wait for __LINK_STATE_SCHED to clear.  This is
useless because netif_wake_queue can cause it to be set again.  It is
also harmless because we've disarmed qdisc_run.

I've also removed the spin_unlock_wait on xmit_lock because its only
purpose of making sure that all outstanding xmit_lock holders have
exited is also given by dev_watchdog_down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:26 -07:00
YOSHIFUJI Hideaki
5e2707fa3a [IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY
We need to update hiscore.rule even if we don't enable CONFIG_IPV6_PRIVACY,
because we have more less significant rule; longest match.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:24 -07:00
Łukasz Stelmach
102128e3a2 [IPV6]: Fix source address selection.
Two additional labels (RFC 3484, sec. 10.3) for IPv6 addreses
are defined to make a distinction between global unicast
addresses and Unique Local Addresses (fc00::/7, RFC 4193) and
Teredo (2001::/32, RFC 4380). It is necessary to avoid attempts
of connection that would either fail (eg. fec0:: to 2001:feed::)
or be sub-optimal (2001:0:: to 2001:feed::).

Signed-off-by: Łukasz Stelmach <stlman@poczta.fm>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:07:22 -07:00
Herbert Xu
5b057c6b1a [NET]: Avoid allocating skb in skb_pad
First of all it is unnecessary to allocate a new skb in skb_pad since
the existing one is not shared.  More importantly, our hard_start_xmit
interface does not allow a new skb to be allocated since that breaks
requeueing.

This patch uses pskb_expand_head to expand the existing skb and linearize
it if needed.  Actually, someone should sift through every instance of
skb_pad on a non-linear skb as they do not fit the reasons why this was
originally created.

Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
TCP, etc.).  As it is skb_pad will simply write over a cloned skb.  Because
of the position of the write it is unlikely to cause problems but still
it's best if we don't do it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23 02:06:41 -07:00
Jeff Garzik
dbe1ab9514 Merge branch 'master' into upstream 2006-06-22 22:51:46 -04:00
Trond Myklebust
d59bf96cdd Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ 2006-06-20 08:59:45 -04:00
Al Viro
ff7512e1a2 [ATM]: fix broken uses of NIPQUAD in net/atm
NIPQUAD expects an l-value of type __be32, _NOT_ a pointer to __be32.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-20 03:27:27 -07:00
Al Viro
8ca84481b6 [SCTP]: sctp_unpack_cookie() fix
sizeof(pointer) != sizeof(array)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-20 03:26:14 -07:00
Jeff Garzik
4b2d9cf009 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream 2006-06-20 04:46:02 -04:00
Herbert Xu
48d83325b6 [NET]: Prevent multiple qdisc runs
Having two or more qdisc_run's contend against each other is bad because
it can induce packet reordering if the packets have to be requeued.  It
appears that this is an unintended consequence of relinquinshing the queue
lock while transmitting.  That in turn is needed for devices that spend a
lot of time in their transmit routine.

There are no advantages to be had as devices with queues are inherently
single-threaded (the loopback device is not but then it doesn't have a
queue).

Even if you were to add a queue to a parallel virtual device (e.g., bolt
a tbf filter in front of an ipip tunnel device), you would still want to
process the queue in sequence to ensure that the packets are ordered
correctly.

The solution here is to steal a bit from net_device to prevent this.

BTW, as qdisc_restart is no longer used by anyone as a module inside the
kernel (IIRC it used to with netif_wake_queue), I have not exported the
new __qdisc_run function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-19 23:57:59 -07:00
Patrick McHardy
d3dcd4efe2 [NETFILTER]: xt_sctp: fix endless loop caused by 0 chunk length
Fix endless loop in the SCTP match similar to those already fixed in
the SCTP conntrack helper (was CVE-2006-1527).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-19 23:39:45 -07:00
Linus Torvalds
4c84a39c8a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (46 commits)
  IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
  IB/mthca: Make all device methods truly reentrant
  IB/mthca: Fix memory leak on modify_qp error paths
  IB/uverbs: Factor out common idr code
  IB/uverbs: Don't decrement usecnt on error paths
  IB/uverbs: Release lock on error path
  IB/cm: Use address handle helpers
  IB/sa: Add ib_init_ah_from_path()
  IB: Add ib_init_ah_from_wc()
  IB/ucm: Get rid of duplicate P_Key parameter
  IB/srp: Factor out common request reset code
  IB/srp: Support SRP rev. 10 targets
  [SCSI] srp.h: Add I/O Class values
  IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool.
  IB/mthca: Fill in max_map_per_fmr device attribute
  IB/ipath: Add client reregister event generation
  IB/mthca: Add client reregister event generation
  IB: Move struct port_info from ipath to <rdma/ib_smi.h>
  IPoIB: Handle client reregister events
  IB: Add client reregister event type
  ...
2006-06-19 19:01:59 -07:00
Linus Torvalds
d0b952a983 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (109 commits)
  [ETHTOOL]: Fix UFO typo
  [SCTP]: Fix persistent slowdown in sctp when a gap ack consumes rx buffer.
  [SCTP]: Send only 1 window update SACK per message.
  [SCTP]: Don't do CRC32C checksum over loopback.
  [SCTP] Reset rtt_in_progress for the chunk when processing its sack.
  [SCTP]: Reject sctp packets with broadcast addresses.
  [SCTP]: Limit association max_retrans setting in setsockopt.
  [PFKEYV2]: Fix inconsistent typing in struct sadb_x_kmprivate.
  [IPV6]: Sum real space for RTAs.
  [IRDA]: Use put_unaligned() in irlmp_do_discovery().
  [BRIDGE]: Add support for NETIF_F_HW_CSUM devices
  [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM
  [TG3]: Convert to non-LLTX
  [TG3]: Remove unnecessary tx_lock
  [TCP]: Add tcp_slow_start_after_idle sysctl.
  [BNX2]: Update version and reldate
  [BNX2]: Use CPU native page size
  [BNX2]: Use compressed firmware
  [BNX2]: Add firmware decompression
  [BNX2]: Allow WoL settings on new 5708 chips
  ...

Manual fixup for conflict in drivers/net/tulip/winbond-840.c
2006-06-19 18:55:56 -07:00
Herbert Xu
47552c4e55 [ETHTOOL]: Fix UFO typo
The function ethtool_get_ufo was referring to ETHTOOL_GTSO instead of
ETHTOOL_GUFO.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 23:00:20 -07:00
Neil Horman
d5b9f4c083 [SCTP]: Fix persistent slowdown in sctp when a gap ack consumes rx buffer.
In the event that our entire receive buffer is full with a series of
chunks that represent a single gap-ack, and then we accept a chunk
(or chunks) that fill in the gap between the ctsn and the first gap,
we renege chunks from the end of the buffer, which effectively does
nothing but move our gap to the end of our received tsn stream. This
does little but move our missing tsns down stream a little, and, if the
sender is sending sufficiently large retransmit frames, the result is a
perpetual slowdown which can never be recovered from, since the only
chunk that can be accepted to allow progress in the tsn stream necessitates
that a new gap be created to make room for it. This leads to a constant
need for retransmits, and subsequent receiver stalls. The fix I've come up
with is to deliver the frame without reneging if we have a full receive
buffer and the receiving sockets sk_receive_queue is empty(indicating that
the receive buffer is being blocked by a missing tsn).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:59:03 -07:00
Tsutomu Fujii
d7c2c9e397 [SCTP]: Send only 1 window update SACK per message.
Right now, every time we increase our rwnd by more then MTU bytes, we
trigger a SACK.  When processing large messages, this will generate a
SACK for almost every other SCTP fragment. However since we are freeing
the entire message at the same time, we might as well collapse the SACK
generation to 1.

Signed-off-by: Tsutomu Fujii <t-fujii@nb.jp.nec.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:58:28 -07:00
Sridhar Samudrala
503b55fd77 [SCTP]: Don't do CRC32C checksum over loopback.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:57:28 -07:00
Vlad Yasevich
4c9f5d5305 [SCTP] Reset rtt_in_progress for the chunk when processing its sack.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:56:08 -07:00
Vlad Yasevich
5636bef732 [SCTP]: Reject sctp packets with broadcast addresses.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:55:35 -07:00
Vlad Yasevich
402d68c433 [SCTP]: Limit association max_retrans setting in setsockopt.
When using ASSOCINFO socket option, we need to limit the number of
maximum association retransmissions to be no greater than the sum
of all the path retransmissions. This is specified in Section 7.1.2
of the SCTP socket API draft.
However, we only do this if the association has multiple paths. If
there is only one path, the protocol stack will use the
assoc_max_retrans setting when trying to retransmit packets.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:54:51 -07:00
YOSHIFUJI Hideaki
c5396a31b2 [IPV6]: Sum real space for RTAs.
This patch fixes RTNLGRP_IPV6_IFINFO netlink notifications.  Issue
pointed out by Patrick McHardy <kaber@trash.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:48:48 -07:00
David S. Miller
b293acfd31 [IRDA]: Use put_unaligned() in irlmp_do_discovery().
irda_device_info->hints[] is byte aligned but is being
accessed as a u16

Based upon a patch by Luke Yang <luke.adi@gmail.com>.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:16:13 -07:00
Herbert Xu
2c6cc0d853 [BRIDGE]: Add support for NETIF_F_HW_CSUM devices
As it is the bridge will only ever declare NETIF_F_IP_CSUM even if all
its constituent devices support NETIF_F_HW_CSUM.  This patch fixes
this by supporting the first one out of NETIF_F_NO_CSUM,
NETIF_F_HW_CSUM, and NETIF_F_IP_CSUM that is supported by all
constituent devices.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:06:45 -07:00
Herbert Xu
8648b3053b [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM
The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
identically so we test for them in quite a few places.  For the sake
of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two.  We
also test the disjunct of NETIF_F_IP_CSUM and the other two in various
places, for that purpose I've added NETIF_F_ALL_CSUM.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17 22:06:05 -07:00