Commit graph

21175 commits

Author SHA1 Message Date
Pavel Emelyanov
3c4d05c805 inet_diag: Introduce the inet socket dumping routine
The existing inet_csk_diag_fill dumps the inet connection sock info
into the netlink inet_diag_message. Prepare this routine to be able
to dump only the inet_sock part of a socket if the icsk part is missing.

This will be used by UDP diag module when dumping UDP sockets.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
8d07d1518a inet_diag: Introduce the byte-code run on an inet socket
The upcoming UDP module will require exactly this ability, so just
move the existing code to provide one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
efb3cb428d inet_diag: Split inet_diag_get_exact into parts
Similar to previous patch: the 1st part locks the inet handler
and will get generalized and the 2nd one dumps icsk-s and will
be used by TCP and DCCP handlers.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
476f7dbff3 inet_diag: Split inet_diag_get_exact into parts
The 1st part locks the inet handler and the 2nd one dump the
inet connection sock.

In the next patches the 1st part will be generalized to call
the socket dumping routine indirectly (i.e. TCP/UDP/DCCP) and
the 2nd part will be used by TCP and DCCP handlers.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
b005ab4ef8 inet_diag: Export inet diag cookie checking routine
The netlink diag susbsys stores sk address bits in the nl message
as a "cookie" and uses one when dumps details about particular
socket.

The same will be required for udp diag module, so introduce a heler
in inet_diag module

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:08 -05:00
Pavel Emelyanov
87c22ea52e inet_diag: Reduce the number of args for bytecode run routine
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:07 -05:00
Pavel Emelyanov
7b35eadd7e inet_diag: Remove indirect sizeof from inet diag handlers
There's an info_size value stored on inet_diag_handler, but for existing
code this value is effectively constant, so just use sizeof(struct tcp_info)
where required.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 14:14:07 -05:00
Eric Dumazet
a73ed26bba sch_red: generalize accurate MAX_P support to RED/GRED/CHOKE
Now RED uses a Q0.32 number to store max_p (max probability), allow
RED/GRED/CHOKE to use/report full resolution at config/dump time.

Old tc binaries are non aware of new attributes, and still set/get Plog.

New tc binary set/get both Plog and max_p for backward compatibility,
they display "probability value" if they get max_p from new kernels.

# tc -d  qdisc show dev ...
...
qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
probability 0.09 Scell_log 15

Make sure we avoid potential divides by 0 in reciprocal_value(), if
(max_th - min_th) is big.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 13:46:15 -05:00
John Fastabend
0221cd5154 Revert "net: netprio_cgroup: make net_prio_subsys static"
This reverts commit 865d9f9f74.

This commit breaks the build with CONFIG_NETPRIO_CGROUP=y so
revert it. It does build as a module though. The SUBSYS macro
in the cgroup core code automatically defines a subsys structure
as extern. Long term we should fix the macro. And I need to
fully build test things.

Tested with CONFIG_NETPRIO_CGROUP={y|m|n} with and without
CONFIG_CGROUPS defined.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Reported-By: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-09 13:39:27 -05:00
Dan Carpenter
6f8e4ad0ef sock_diag: off by one checks
These tests are off by one because sock_diag_handlers[] only has AF_MAX
elements.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:58:35 -05:00
John Fastabend
865d9f9f74 net: netprio_cgroup: make net_prio_subsys static
net_prio_subsys can be made static this removes the sparse
warning it was throwing.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:58:35 -05:00
Benjamin LaHaise
6d4cdf47d2 vlan: add 802.1q netpoll support
Add netpoll support to 802.1q vlan devices.  Based on the netpoll support
in the bridging code.  Tested on a forced_eth device with netconsole.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:58:30 -05:00
Eric Dumazet
8af2a218de sch_red: Adaptative RED AQM
Adaptative RED AQM for linux, based on paper from Sally FLoyd,
Ramakrishna Gummadi, and Scott Shenker, August 2001 :

http://icir.org/floyd/papers/adaptiveRed.pdf

Goal of Adaptative RED is to make max_p a dynamic value between 1% and
50% to reach the target average queue : (max_th - min_th) / 2

Every 500 ms:
 if (avg > target and max_p <= 0.5)
  increase max_p : max_p += alpha;
 else if (avg < target and max_p >= 0.01)
  decrease max_p : max_p *= beta;

target :[min_th + 0.4*(min_th - max_th),
          min_th + 0.6*(min_th - max_th)].
alpha : min(0.01, max_p / 4)
beta : 0.9
max_P is a Q0.32 fixed point number (unsigned, with 32 bits mantissa)

Changes against our RED implementation are :

max_p is no longer a negative power of two (1/(2^Plog)), but a Q0.32
fixed point number, to allow full range described in Adatative paper.

To deliver a random number, we now use a reciprocal divide (thats really
a multiply), but this operation is done once per marked/droped packet
when in RED_BETWEEN_TRESH window, so added cost (compared to previous
AND operation) is near zero.

dump operation gives current max_p value in a new TCA_RED_MAX_P
attribute.

Example on a 10Mbit link :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec red \
   limit 400000 min 30000 max 90000 avpkt 1000 \
   burst 55 ecn adaptative bandwidth 10Mbit

# tc -s -d qdisc show dev eth3
...
qdisc red 10: parent 1:1 limit 400000b min 30000b max 90000b ecn
adaptative ewma 5 max_p=0.113335 Scell_log 15
 Sent 50414282 bytes 34504 pkt (dropped 35, overlimits 1392 requeues 0)
 rate 9749Kbit 831pps backlog 72056b 16p requeues 0
  marked 1357 early 35 pdrop 0 other 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:43 -05:00
Jiri Pirko
348a1443cc vlan: introduce functions to do mass addition/deletion of vids by another device
Introduce functions handy to copy vlan ids from one driver's list to
another.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:42 -05:00
Jiri Pirko
5b9ea6e022 vlan: introduce vid list with reference counting
This allows to keep track of vids needed to be in rx vlan filters of
devices even if they are used in bond/team etc.

vlan_info as well as vlan_group previously was, is allocated when first
vid is added and dealocated whan last vid is deleted.

vlan_group definition is moved to private header.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:42 -05:00
Jiri Pirko
87002b03ba net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls
This patch adds wrapper for ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid
functions. Check for NETIF_F_HW_VLAN_FILTER feature is done in this
wrapper.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:42 -05:00
Jiri Pirko
7da82c06de vlan: rename vlan_dev_info to vlan_dev_priv
As this structure is priv, name it approprietely. Also for pointer to it
use name "vlan".

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:51:30 -05:00
stephen hemminger
4359881338 bridge: add local MAC address to forwarding table (v2)
If user has configured a MAC address that is not one of the existing
ports of the bridge, then we need to add a special entry in the forwarding
table. This forwarding table entry has no outgoing port so it has to be
treated a little differently. The special entry is reported by the netlink
interface with ifindex of bridge, but ignored by the old interface since there
is no usable way to put it in the ABI.

Reported-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:40:28 -05:00
stephen hemminger
31e8a49c16 bridge: rearrange fdb notifications (v2)
Pass bridge to fdb_notify so it can determine correct namespace based
on namespace of bridge rather than namespace of destination port.
Also makes next patch easier.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:40:28 -05:00
stephen hemminger
f58ee4e1a2 bridge: refactor fdb_notify
Move fdb_notify outside of fdb_create. This fixes the problem
that notification of local entries are not flagged correctly.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:40:28 -05:00
David S. Miller
959327c784 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-06 21:10:05 -05:00
Roar Førde
f84ea779c2 caif: Replace BUG_ON with WARN_ON.
BUG_ON is too strict in a number of circumstances,
use WARN_ON instead. Protocol errors should not halt the system.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 17:21:47 -05:00
sjur.brandeland@stericsson.com
005b0b076f caif: Bad assert triggering false positive.
Fix bad assert on fragment size triggering false positive.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 17:21:47 -05:00
David S. Miller
87a115783e ipv6: Move xfrm_lookup() call down into icmp6_dst_alloc().
And return error pointers.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 17:04:13 -05:00
David S. Miller
8f0315190d ipv6: Make third arg to anycast_dst_alloc() bool.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 16:48:14 -05:00
Stephen Boyd
681090902e net: Silence seq_scale() unused warning
On a CONFIG_NET=y build

net/core/secure_seq.c:22: warning: 'seq_scale' defined but not
used

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:59:16 -05:00
Pavel Emelyanov
8ef874bfc7 sock_diag: Move the sock_ code to net/core/
This patch moves the sock_ code from inet_diag.c to generic sock_diag.c
file and provides necessary request_module-s calls and a pointer on
inet_diag_compat dumping routine.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:02 -05:00
Pavel Emelyanov
a029fe26b6 inet_diag: Cleanup type2proto last user
Now all the code works with sock_diag_req-compatible structs, so it's
possible to stop using the inet_diag_type2proto in inet_csk_diag_fill.
Pass the inet_diag_req into it and use the sdiag_protocol field. At the
same time remove the explicit ext argument, since it's also on the req.

However, this conversion is still required in _compat code, so just move
this routine, not remove.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:02 -05:00
Pavel Emelyanov
d23deaa07b inet_diag: Introduce socket family checks
The new API will specify family to work with. Teach the existing
socket walking code to bypass not interesting ones.

To preserve compatibility with existing behavior the _compat code
sets interesting family to AF_UNSPEC to dump them all.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:02 -05:00
Pavel Emelyanov
25c4cd2b6d inet_diag: Switch the _dump to work with new header
Make inet_diag_dumo work with given header instead of calculating
one from the nl message.

The SOCK_DIAG_BY_FAMILY just passes skb's one through, the compat code
converts the old header to new one.

Also fix the bytecode calculation to find one at proper offset.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:02 -05:00
Pavel Emelyanov
fe50ce2846 inet_diag: Switch the _get_exact to work with new header
Make inet_diag_get_exact work with given header instead of calculating
one from the nl message.

The SOCK_DIAG_BY_FAMILY just passes skb's one through, the compat code
converts the old header to new one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:01 -05:00
Pavel Emelyanov
126fdc3249 inet_diag: Introduce new inet_diag_req header
This one coinsides with the sock_diag_req in the beginning and
contains only used fields from its previous analogue.

The existing code is patched to use the _compat version of it
for now.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:01 -05:00
Pavel Emelyanov
d366477a52 sock_diag: Initial skeleton
When receiving the SOCK_DIAG_BY_FAMILY message we have to find the
handler for provided family and pass the nl message to it.

This patch describes an infrastructure to work with such nandlers
and implements stubs for AF_INET(6) ones.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:01 -05:00
Pavel Emelyanov
f13c95f0e2 inet_diag: Switch from _GETSOCK to IPPROTO_ numbers
Sorry, but the vger didn't let this message go to the list. Re-sending it with
less spam-filter-prone subject.

When dumping the AF_INET/AF_INET6 sockets user will also specify the protocol,
so prepare the protocol diag handlers to work with IPPROTO_ constants.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:01 -05:00
Pavel Emelyanov
37f352b5e3 inet_diag: Move byte-code finding up the call-stack
Current code calculates it at fixed offset. This offset will change, so
move the BC calculation upper to make the further patching simpler.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:01 -05:00
Pavel Emelyanov
8d34172dfd sock_diag: Introduce new message type
This type will run the family+protocol based socket dumping.
Also prepare the stub function for it.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:58:01 -05:00
Pavel Emelyanov
7f1fb60c4f inet_diag: Partly rename inet_ to sock_
The ultimate goal is to get the sock_diag module, that works in
family+protocol terms. Currently this is suitable to do on the
inet_diag basis, so rename parts of the code. It will be moved
to sock_diag.c later.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:57:36 -05:00
David S. Miller
9e998a7550 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2011-12-06 13:31:19 -05:00
Peter Pan(潘卫平)
99b53bdd81 ipv4:correct description for tcp_max_syn_backlog
Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning),
sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask,
and the minimal value is 128, and it will increase in proportion to the
memory of machine.
The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog
are out of date.

Changelog:
V2: update description for sysctl_max_syn_backlog

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: Shan Wei <shanwei88@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 13:02:28 -05:00
Dan Carpenter
f0a98ae8db openvswitch: small potential memory leak in ovs_vport_alloc()
We're unlikely to hit this leak, but the static checkers complain if we
don't take care of it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 12:58:57 -05:00
John W. Linville
d39aeaf260 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2011-12-06 10:47:12 -05:00
Igor Maravic
40e4783ee6 ipv4: arp: Cleanup in arp.c
Use "IS_ENABLED(CONFIG_FOO)" macro instead of
"defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)"

Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-06 00:34:40 -05:00
Eric Dumazet
0a5912db7b tcp: remove TCP_OFF and TCP_PAGE macros
As mentioned by Joe Perches, TCP_OFF() and TCP_PAGE() macros are
useless.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 18:30:03 -05:00
Eric Dumazet
b474ae7760 bql: fix CONFIG_XPS=n build
netdev_queue_release() should be called even if CONFIG_XPS=n
to properly release device reference.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 18:30:03 -05:00
Eric Dumazet
4fa48bf3c7 tcp: fix tcp_trim_head()
commit f07d960df3 (tcp: avoid frag allocation for small frames)
breaked assumption in tcp stack that skb is either linear (skb->data_len
== 0), or fully fragged (skb->data_len == skb->len)

tcp_trim_head() made this assumption, we must fix it.

Thanks to Vijay for providing a very detailed explanation.

Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 18:30:03 -05:00
sjur.brandeland@stericsson.com
7d31130428 caif: Stash away hijacked skb destructor and call it later
This patch adds functionality for avoiding orphaning SKB too early.
The original skb is stashed away and the original destructor is called
from the hi-jacked flow-on callback. If CAIF interface goes down and a
hi-jacked SKB exists, the original skb->destructor is restored.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 18:27:56 -05:00
sjur.brandeland@stericsson.com
0e4c7d85d5 caif: Add support for flow-control on device's tx-queue
Flow control is implemented by inspecting the qdisc queue length
in order to detect potential overflow on the TX queue. When a threshold
is reached flow-off is sent upwards in the CAIF stack. At the same time
the skb->destructor is hi-jacked by orphaning the SKB and the original
destructor is replaced with a "flow-on" callback. When the "hi-jacked"
SKB is consumed the queue should be empty, and the "flow-on" callback
is called and xon is sent upwards in the CAIF stack.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 18:27:56 -05:00
sjur.brandeland@stericsson.com
7ad65bf68d caif: Add support for CAIF over CDC NCM USB interface
NCM 1.0 does not support anything but Ethernet framing, hence
CAIF payload will be put into Ethernet frames.

Discovery is based on fixed USB vendor 0x04cc (ST-Ericsson),
product-id 0x230f (NCM). In this variant only CAIF payload is sent over
the NCM interface.

The CAIF stack (cfusbl.c) will when USB interface register first check if
we got a CDC NCM USB interface with the right VID, PID.
It will then read the device's Ethernet address and create a 'template'
Ethernet TX header, using a broadcast address as the destination address,
and EthType 0x88b5 (802.1 Local Experimental - vendor specific).

A protocol handler for 0x88b5 is setup for reception of CAIF frames from
the CDC NCM USB interface.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05 18:27:56 -05:00
David Miller
2721745501 net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:19 -05:00
David S. Miller
321f3b8708 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2011-12-05 13:23:14 -05:00