In light of my recent patch to net/ipv4/udp.c that replaced the
spin_lock_irq calls on the receive queue lock with spin_lock_bh,
here is a similar patch for all other occurences of spin_lock_irq
on receive/error queue locks in IPv4 and IPv6.
In these stacks, we know that they can only be entered from user
or softirq context. Therefore it's safe to disable BH only.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that netlink events created as a result of programns
using ioctls (such as ifconfig, route etc) contains the correct PID of
those events.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts "unsigned flags" to use more explict types like u16
instead and incrementally introduces NLMSG_NEW().
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch was supposed to be part of the neighbour tables related
patchset but apparently got lost.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the format of the XFRM_MSG_DELSA and
XFRM_MSG_DELPOLICY notification so that the main message
sent is of the same format as that received by the kernel
if the original message was via netlink. This also means
that we won't lose the byid information carried in km_event.
Since this user interface is introduced by Jamal's patch
we can still afford to change it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch rectifies some rtnetlink message builders that derive the
flags from the pid. It is now explicit like the other cases
which get it right. Also fixes half a dozen dumpers which did not
set NLM_F_MULTI at all.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduces a new macro NLMSG_NEW which extends NLMSG_PUT but takes
a flags argument. NLMSG_PUT stays there for compatibility but now
calls NLMSG_NEW with flags == 0. NLMSG_PUT_ANSWER is renamed to
NLMSG_NEW_ANSWER which now also takes a flags argument.
Also converts the users of NLMSG_PUT_ANSWER to use NLMSG_NEW_ANSWER
and fixes the two direct users of __nlmsg_put to either provide
the flags or use NLMSG_NEW(_ANSWER).
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes dsmark to do all configuration sanity checks first and
only apply the changes if all of them can be applied without
any errors. Also fixes the weak sanity checks for DSMARK_VALUE
and DSMASK_MASK.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
To retrieve the neighbour tables send RTM_GETNEIGHTBL with the
NLM_F_DUMP flag set. Every neighbour table configuration is
spread over multiple messages to avoid running into message
size limits on systems with many interfaces. The first message
in the sequence transports all not device specific data such as
statistics, configuration, and the default parameter set.
This message is followed by 0..n messages carrying device
specific parameter sets.
Although the ordering should be sufficient, NDTA_NAME can be
used to identify sequences. The initial message can be identified
by checking for NDTA_CONFIG. The device specific messages do
not contain this TLV but have NDTPA_IFINDEX set to the
corresponding interface index.
To change neighbour table attributes, send RTM_SETNEIGHTBL
with NDTA_NAME set. Changeable attribute include NDTA_THRESH[1-3],
NDTA_GC_INTERVAL, and all TLVs in NDTA_PARMS unless marked
otherwise. Device specific parameter sets can be changed by
setting NDTPA_IFINDEX to the interface index of the corresponding
device.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This chunks out the accept_queue and tcp_listen_opt code and moves
them to net/core/request_sock.c and include/net/request_sock.h, to
make it useful for other transport protocols, DCCP being the first one
to use it.
Next patches will rename tcp_listen_opt to accept_sock and remove the
inline tcp functions that just call a reqsk_queue_ function.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ok, this one just renames some stuff to have a better namespace and to
dissassociate it from TCP:
struct open_request -> struct request_sock
tcp_openreq_alloc -> reqsk_alloc
tcp_openreq_free -> reqsk_free
tcp_openreq_fastfree -> __reqsk_free
With this most of the infrastructure closely resembles a struct
sock methods subset.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kept this first changeset minimal, without changing existing names to
ease peer review.
Basicaly tcp_openreq_alloc now receives the or_calltable, that in turn
has two new members:
->slab, that replaces tcp_openreq_cachep
->obj_size, to inform the size of the openreq descendant for
a specific protocol
The protocol specific fields in struct open_request were moved to a
class hierarchy, with the things that are common to all connection
oriented PF_INET protocols in struct inet_request_sock, the TCP ones
in tcp_request_sock, that is an inet_request_sock, that is an
open_request.
I.e. this uses the same approach used for the struct sock class
hierarchy, with sk_prot indicating if the protocol wants to use the
open_request infrastructure by filling in sk_prot->rsk_prot with an
or_calltable.
Results? Performance is improved and TCP v4 now uses only 64 bytes per
open request minisock, down from 96 without this patch :-)
Next changeset will rename some of the structs, fields and functions
mentioned above, struct or_calltable is way unclear, better name it
struct request_sock_ops, s/struct open_request/struct request_sock/g,
etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Small fixup to use netlink macros instead of hardcoding.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu wrote:
> @@ -1254,6 +1326,7 @@ static int pfkey_add(struct sock *sk, st
> if (IS_ERR(x))
> return PTR_ERR(x);
>
> + xfrm_state_hold(x);
This introduces a leak when xfrm_state_add()/xfrm_state_update()
fail. We hold two references (one from xfrm_state_alloc(), one
from xfrm_state_hold()), but only drop one. We need to take the
reference because the reference from xfrm_state_alloc() can
be dropped by __xfrm_state_delete(), so the fix is to drop both
references on error. Same problem in xfrm_user.c.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes XFRM_SAP_* and converts them over to XFRM_MSG_*.
The netlink interface is meant to map directly onto the underlying
xfrm subsystem. Therefore rather than using a new independent
representation for the events we can simply use the existing ones
from xfrm_user.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes policy deletion in xfrm_user so that it sets
km_event.data.byid. This puts xfrm_user on par with what af_key
does in this case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adjusts the SA state conversion in af_key such that
XFRM_STATE_ERROR/XFRM_STATE_DEAD will be converted to SADB_STATE_DEAD
instead of SADB_STATE_DYING.
According to RFC 2367, SADB_STATE_DYING SAs can be turned into
mature ones through updating their lifetime settings. Since SAs
which are in the states XFRM_STATE_ERROR/XFRM_STATE_DEAD cannot
be resurrected, this value is unsuitable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch ensures that the hard state/policy expire notifications are
only sent when the state/policy is successfully removed from their
respective tables.
As it is, it's possible for a state/policy to both expire through
reaching a hard limit, as well as being deleted by the user.
Note that this behaviour isn't actually forbidden by RFC 2367.
However, it is a quality of implementation issue.
As an added bonus, the restructuring in this patch will help
eventually in moving the expire notifications from softirq
context into process context, thus improving their reliability.
One important side-effect from this change is that SAs reaching
their hard byte/packet limits are now deleted immediately, just
like SAs that have reached their hard time limits.
Previously they were announced immediately but only deleted after
30 seconds.
This is bad because it prevents the system from issuing an ACQUIRE
command until the existing state was deleted by the user or expires
after the time is up.
In the scenario where the expire notification was lost this introduces
a 30 second delay into the system for no good reason.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Heres the final patch.
What this patch provides
- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a fixed-up version of the broken "upstream-2.6.13" branch, where
I re-did the manual merge of drivers/net/r8169.c by hand, and made sure
the history is all good.
This fixes various crashes on 64-bit when using this module.
Based upon a patch by Juergen Kreileder <jk@blackdown.de>.
Signed-off-by: David S. Miller <davem@davemloft.net>
ACKed-by: Patrick McHardy <kaber@trash.net>
This patch alows you to change the source address of icmp error
messages. It applies cleanly to 2.6.11.11 and retains the default
behaviour.
In the old (default) behaviour icmp error messages are sent with the ip
of the exiting interface.
The new behaviour (when the sysctl variable is toggled on), it will send
the message with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the later
behaviour.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userland layer-2 tunneling devices allocated through the TUNTAP driver
(drivers/net/tun.c) have a type of ARPHRD_NONE, and have no link-layer
address. The kernel complains at regular interval when IPv6 Privacy
extension are enabled because it can't find an hardware address :
Dec 29 11:02:04 auguste kernel: __ipv6_regen_rndid(idev=cb3e0c00):
cannot get EUI64 identifier; use random bytes.
IPv6 Privacy extensions should probably be disabled on that sort of
device. They won't work anyway. If userland wants a more usual
Ethernet-ish interface with usual IPv6 autoconfiguration, it will use a
TAP device with an emulated link-layer and a random hardware address
rather than a TUN device.
As far as I could fine, TUN virtual device from TUNTAP is the very only
sort of device using ARPHRD_NONE as kernel device type.
Signed-off-by: Rémi Denis-Courmont <rdenis@simphalempin.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We saw following trace several times:
|BUG: using smp_processor_id() in preemptible [00000001] code: httpd/30137
|caller is icmpv6_send+0x23/0x540
| [<c01ad63b>] smp_processor_id+0x9b/0xb8
| [<c02993e7>] icmpv6_send+0x23/0x540
This is because of icmpv6_socket, which is the only one user of
smp_processor_id() in icmpv6_send(), AFAIK.
Since it should be used in non-preemptive context,
let's defer the dereference after disabling preemption
(by icmpv6_xmit_lock()).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Ralf Baechle <ralf@linux-mips.org>
There are archives of the old list at http://oss.sgi.com/archives/netdev
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is brought to you by the department of applied stupidity.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds meta collectors for all socket attributes that make sense
to be filtered upon. Some of them are only useful for debugging
but having them doesn't hurt.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the sysctl net.core.dev_weight has no effect because the weight
of the backlog devices is set during initialization and never changed.
This patch propagates any changes to the global value affected by sysctl
to the per-cpu devices. It is done every time the packet handler
function is run.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple interface to allow changing network device scheduling weight
with sysfs. Please consider this for 2.6.12, since risk/impact is small.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was checking the "GET" function pointer instead of
the "SET" one. Looks like a cut&paste error :-)
Signed-off-by: David S. Miller <davem@davemloft.net>