Commit graph

18678 commits

Author SHA1 Message Date
Dan Siemon
4a2b9c3756 net_sched: fix ip_tos2prio
ECN support incorrectly maps ECN BESTEFFORT packets to TC_PRIO_FILLER
(1) instead of TC_PRIO_BESTEFFORT (0)

This means ECN enabled flows are placed in pfifo_fast/prio low priority
band, giving ECN enabled flows [ECT(0) and CE codepoints] higher drop
probabilities.

This is rather unfortunate, given we would like ECN to be more widely
used.

Ref : http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/

Signed-off-by: Dan Siemon <dan@coverfire.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dave Täht <d@taht.net>
Cc: Jonathan Morton <chromatix99@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-15 18:53:54 -07:00
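The mis-mapping described in this commit can be illustrated with a small table lookup. This is only a sketch, not the kernel's table: it assumes the priority table is indexed by TOS bits 1..4 shifted right by one, shows a hypothetical two-entry excerpt, and uses the priority values quoted in the message (BESTEFFORT = 0, FILLER = 1). All `demo_` names are invented for illustration.

```c
#include <stdint.h>

/* Priority values as quoted in the commit message. */
enum { DEMO_PRIO_BESTEFFORT = 0, DEMO_PRIO_FILLER = 1 };

/* ECN codepoints in the two low bits of the TOS byte (RFC 3168). */
enum { DEMO_NOT_ECT = 0x00, DEMO_ECT1 = 0x01, DEMO_ECT0 = 0x02, DEMO_CE = 0x03 };

/* Hypothetical two-entry excerpts of the lookup table, before and after. */
static const uint8_t demo_table_old[2] = { DEMO_PRIO_BESTEFFORT, DEMO_PRIO_FILLER };
static const uint8_t demo_table_new[2] = { DEMO_PRIO_BESTEFFORT, DEMO_PRIO_BESTEFFORT };

/* Assumed indexing: TOS bits 1..4 shifted right by one. Only the lowest
 * index bit is kept because the excerpt has two entries, so callers must
 * pass a TOS byte whose upper six bits are zero. */
static unsigned demo_index(uint8_t tos)
{
    return ((tos & 0x1E) >> 1) & 0x1;
}

/* Priority chosen before the fix. */
int demo_prio_old(uint8_t tos) { return demo_table_old[demo_index(tos)]; }

/* Priority chosen after the fix. */
int demo_prio_new(uint8_t tos) { return demo_table_new[demo_index(tos)]; }
```

Under these assumptions ECT(0) and CE both set bit 1 of the TOS byte, so they land on index 1, which is why those two codepoints were the ones pushed into the low priority band.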
Randy Dunlap
986d4abbdd sunrpc: fix printk format warning
Fix printk format build warning:

net/sunrpc/xprtrdma/verbs.c:1463: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'dma_addr_t'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-15 20:17:32 -04:00
j223yang@asset.uwaterloo.ca
4d4a76f330 xprt: remove redundant null check
'req' is dereferenced before checked for NULL.
The patch simply removes the check.

Signed-off-by: Jinqiu Yang <crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-15 20:16:14 -04:00
Linus Torvalds
422e6c4bc4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
  tidy the trailing symlinks traversal up
  Turn resolution of trailing symlinks iterative everywhere
  simplify link_path_walk() tail
  Make trailing symlink resolution in path_lookupat() iterative
  update nd->inode in __do_follow_link() instead of after do_follow_link()
  pull handling of one pathname component into a helper
  fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
  Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
  readlinkat(), fchownat() and fstatat() with empty relative pathnames
  Allow O_PATH for symlinks
  New kind of open files - "location only".
  ext4: Copy fs UUID to superblock
  ext3: Copy fs UUID to superblock.
  vfs: Export file system uuid via /proc/<pid>/mountinfo
  unistd.h: Add new syscalls numbers to asm-generic
  x86: Add new syscalls for x86_64
  x86: Add new syscalls for x86_32
  fs: Remove i_nlink check from file system link callback
  fs: Don't allow to create hardlink for deleted file
  vfs: Add open by file handle support
  ...
2011-03-15 15:48:13 -07:00
James Morris
a002951c97 Merge branch 'next' into for-linus 2011-03-16 09:41:17 +11:00
Eric Dumazet
7313714775 xfrm: fix __xfrm_route_forward()
This function should return 0 in case of error, 1 if OK
commit 452edd598f (xfrm: Return dst directly from xfrm_lookup())
got it wrong.

Reported-and-bisected-by: Michael Smith <msmith@cbnco.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-15 15:26:43 -07:00
David S. Miller
c337ffb68e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-03-15 15:15:17 -07:00
Rémi Denis-Courmont
638be34459 Phonet: fix aligned-mode pipe socket buffer header reserve
When the pipe uses aligned-mode data packets, we must reserve 4 bytes
instead of 3 for the pipe protocol header. Otherwise the Phonet header
would not be aligned, resulting in potentially corrupted headers with
later unaligned memory writes.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-15 14:55:49 -07:00
David S. Miller
918690f981 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-03-15 13:57:18 -07:00
David S. Miller
31111c26d9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
2011-03-15 13:03:27 -07:00
Florian Westphal
2f5dc63123 netfilter: xt_addrtype: ipv6 support
The kernel will refuse certain types that do not work in ipv6 mode.
We can then add these features incrementally without risk of userspace
breakage.

Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 20:17:44 +01:00
Florian Westphal
de81bbea17 netfilter: ipt_addrtype: rename to xt_addrtype
Followup patch will add ipv6 support.

ipt_addrtype.h is retained for compatibility reasons, but no longer used
by the kernel.

Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 20:16:20 +01:00
John W. Linville
106af2c99a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2011-03-15 14:16:48 -04:00
Tommi Virtanen
b09734b1f4 libceph: Fix base64-decoding when input ends in newline.
It used to return -EINVAL because it thought the end was not aligned
to 4 bytes.

Clean up superfluous src < end test in if, the while itself guarantees
that.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-15 09:14:02 -07:00
Aneesh Kumar K.V
c0aa4caf4c net/9p: Implement syncfs 9P operation
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:38 -05:00
Venkateswararao Jujjuri (JV)
f735195d51 [net/9p] Small non-IO PDUs for zero-copy supporting transports.
If a transport prefers payload to be sent separate from the PDU
(P9_TRANS_PREF_PAYLOAD_SEP), there is no need to allocate msize
PDU buffers(struct p9_fcall).

This patch allocates only up to 4k buffers for this kind of transport
and there won't be any change to the legacy transports.

Hence, this patch on top of zero copy changes allows user to
specify higher msizes through the mount option
without hogging the kernel heap.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:36 -05:00
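The sizing rule in this commit can be sketched as follows. This is a hypothetical illustration, not the patch itself: the function name, the boolean parameter (standing in for the P9_TRANS_PREF_PAYLOAD_SEP preference), and the constant are assumptions, with 4096 taken from the "up to 4k" wording in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cap for PDUs whose payload travels separately ("up to 4k"). */
#define DEMO_SMALL_PDU 4096

/* Return how many bytes to allocate for a PDU buffer.
 * Transports that send the payload separately only need room for the
 * header-sized part; legacy transports keep the full msize. */
size_t demo_pdu_alloc_size(size_t msize, bool prefers_separate_payload)
{
    if (prefers_separate_payload && msize > DEMO_SMALL_PDU)
        return DEMO_SMALL_PDU;
    return msize;
}
```

With a rule like this, a mount that requests a very large msize costs the kernel heap no more than 4k per PDU on a zero-copy transport.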
Venkateswararao Jujjuri (JV)
ca41bb3e21 [net/9p] Handle Zero Copy TREAD/RERROR case in !dotl case.
This takes care of copying out error buffers from user buffer
payloads when we are using zero copy.  This happens because the
only payload buffer the server has to respond to the request is
the user buffer given for the zero copy read.

Because we only use zerocopy when the amount of data to transfer
is greater than a certain size (currently 4K) and error strings are
limited to ERRMAX (currently 128) we don't need to worry about there
being sufficient space for the error to fit in the payload.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:36 -05:00
Venkateswararao Jujjuri (JV)
2c66523fd2 [net/9p] readdir zerocopy changes for 9P2000.L protocol.
Modify p9_client_readdir() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)
1fc52481c2 [net/9p] Write side zerocopy changes for 9P2000.L protocol.
Modify p9_client_write() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)
bb2f8a5515 [net/9p] Read side zerocopy changes for 9P2000.L protocol.
Modify p9_client_read() to check the transport preference and act accordingly.
If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload
separately instead of putting it directly on PDU.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)
6f69c395ce [net/9p] Add preferences to transport layer.
This patch adds preferences field to the p9_trans_module.
Through this, the transport layer can express its preference about the
payload, i.e. whether the payload needs to be part of the PDU or should
be sent separately so that the transport layer can handle it in
a better way.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)
4038866dab [net/9p] Add gup/zero_copy support to VirtIO transport layer.
Modify p9_virtio_request() and req_done() functions to support
additional payload sent down to the transport layer through
tc->pubuf and tc->pkbuf.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)
9bb6c10a4e [net/9p] Assign type of transaction to tc->pdu->id which is otherwise unused.
This will be used by the transport layer to determine the outgoing
request type. Transport layer uses this information to correctly
place the mapped pages in the PDU. Patches following this will make
use of this to achieve zero copy.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)
022cae3655 [net/9p] Preparation and helper functions for zero copy
This patch prepares p9_fcall structure for zero copy. Added
fields send the payload buffer information to the transport layer.
In addition it adds a 'private' field for the transport layer to
store mapped/pinned page information so that it can be freed/unpinned
during req_done.

This patch also creates trans_common.[ch] to house helper functions.
It adds the following helper functions.

p9_release_req_pages - Release pages after the transaction.
p9_nr_pages - Return number of pages needed to accommodate the payload.
payload_gup - Translates user buffer into kernel pages.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-15 09:57:34 -05:00
Vasiliy Kulikov
6a8ab06077 ipv6: netfilter: ip6_tables: fix infoleak to userspace
Structures ip6t_replace, compat_ip6t_replace, and xt_get_revision are
copied from userspace.  Fields of these structs that are
zero-terminated strings are not checked.  When they are used as argument
to a format string containing "%s" in request_module(), some sensitive
information is leaked to userspace via argument of spawned modprobe
process.

The first bug was introduced before the git epoch;  the second was
introduced in 3bc3fe5e (v2.6.25-rc1);  the third is introduced by
6b7d31fc (v2.6.15-rc1).  To trigger the bug one should have
CAP_NET_ADMIN.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 13:37:13 +01:00
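The commit message does not quote the patch, so the following is only a sketch of one common remedy for this class of bug: force NUL termination on a fixed-size string field right after the structure has been copied from an untrusted source. The struct, its field length, and the function name are hypothetical.

```c
#include <string.h>

#define DEMO_NAME_LEN 32  /* assumed field size, for illustration only */

/* Stand-in for a structure copied from userspace. */
struct demo_replace {
    char name[DEMO_NAME_LEN];
};

/* Guarantee that name is a valid C string before it reaches any "%s"
 * formatting. Without this, a name that fills the whole field would let
 * the formatter read past the end of the array into adjacent memory. */
void demo_sanitize(struct demo_replace *r)
{
    r->name[sizeof(r->name) - 1] = '\0';
}
```

After this call, any later use of the field as a string is bounded by the field size, whatever the caller supplied.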
Vasiliy Kulikov
78b7987676 netfilter: ip_tables: fix infoleak to userspace
Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
copied from userspace.  Fields of these structs that are
zero-terminated strings are not checked.  When they are used as argument
to a format string containing "%s" in request_module(), some sensitive
information is leaked to userspace via argument of spawned modprobe
process.

The first and the third bugs were introduced before the git epoch; the
second was introduced in 2722971c (v2.6.17-rc1).  To trigger the bug
one should have CAP_NET_ADMIN.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 13:36:05 +01:00
Vasiliy Kulikov
42eab94fff netfilter: arp_tables: fix infoleak to userspace
Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
copied from userspace.  Fields of these structs that are
zero-terminated strings are not checked.  When they are used as argument
to a format string containing "%s" in request_module(), some sensitive
information is leaked to userspace via argument of spawned modprobe
process.

The first bug was introduced before the git epoch;  the second is
introduced by 6b7d31fc (v2.6.15-rc1);  the third is introduced by
6b7d31fc (v2.6.15-rc1).  To trigger the bug one should have
CAP_NET_ADMIN.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 13:35:21 +01:00
Changli Gao
4656c4d61a netfilter: xt_connlimit: remove connlimit_rnd_inited
A potential race condition when generating connlimit_rnd is also fixed.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 13:26:32 +01:00
Changli Gao
3e0d5149e6 netfilter: xt_connlimit: use hlist instead
The header of hlist is smaller than list.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 13:25:42 +01:00
Changli Gao
0e23ca14f8 netfilter: xt_connlimit: use kmalloc() instead of kzalloc()
All the members are initialized after kzalloc().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 13:24:56 +01:00
Changli Gao
8183e3a88a netfilter: xt_connlimit: fix daddr connlimit in SNAT scenario
We use the reply tuples when limiting the connections by the destination
addresses, however, in SNAT scenario, the final reply tuples won't be
ready until SNAT is done in POSTROUTING or INPUT chain, and the following
nf_conntrack_find_get() in count_them() will get nothing, so connlimit
can't work as expected.

In this patch, the original tuples are always used, and an additional
member addr is appended to save the address in either end.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-15 13:23:28 +01:00
Al Viro
326be7b484 Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
Just need to make sure that AF_UNIX garbage collector won't
confuse O_PATHed socket on filesystem for real AF_UNIX opened
socket.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-15 02:21:45 -04:00
Simon Horman
14e405461e IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()
Break out the portions of __ip_vs_control_init() and
__ip_vs_control_cleanup() that aren't necessary when
CONFIG_SYSCTL is undefined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:37:01 +09:00
Simon Horman
fb1de432c1 IPVS: Conditionally define and use ip_vs_lblc{r}_table
ip_vs_lblc_table and ip_vs_lblcr_table, and code that uses them
are unnecessary when CONFIG_SYSCTL is undefined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:37:01 +09:00
Simon Horman
a7a86b8616 IPVS: Minimise ip_vs_leave when CONFIG_SYSCTL is undefined
Much of ip_vs_leave() is unnecessary if CONFIG_SYSCTL is undefined.

I tried an approach of breaking the now #ifdef'ed portions out
into a separate function. However this appeared to grow the
compiled code on x86_64 by about 200 bytes in the case where
CONFIG_SYSCTL is defined. So I have gone with the simpler though
less elegant #ifdef'ed solution for now.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:37:00 +09:00
Simon Horman
b27d777ec5 IPVS: Conditionally use sysctl_lblc{r}_expiration
In preparation for not including sysctl_lblc{r}_expiration in
struct netns_ipvs when CONFIG_SYSCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:59 +09:00
Simon Horman
8e1b0b1b56 IPVS: Add expire_quiescent_template()
In preparation for not including sysctl_expire_quiescent_template in
struct netns_ipvs when CONFIG_SYSCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:58 +09:00
Simon Horman
71a8ab6cad IPVS: Add sysctl_expire_nodest_conn()
In preparation for not including sysctl_expire_nodest_conn in
struct netns_ipvs when CONFIG_SYSCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:58 +09:00
Simon Horman
7532e8d40c IPVS: Add sysctl_sync_ver()
In preparation for not including sysctl_sync_ver in
struct netns_ipvs when CONFIG_SYSCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:57 +09:00
Simon Horman
59e0350ead IPVS: Add {sysctl_sync_threshold,period}()
In preparation for not including sysctl_sync_threshold in
struct netns_ipvs when CONFIG_SYSCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:57 +09:00
Simon Horman
0cfa558e2c IPVS: Add sysctl_nat_icmp_send()
In preparation for not including sysctl_nat_icmp_send in
struct netns_ipvs when CONFIG_SYSCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:56 +09:00
Simon Horman
84b3cee39f IPVS: Add sysctl_snat_reroute()
In preparation for not including sysctl_snat_reroute in
struct netns_ipvs when CONFIG_SYSCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:55 +09:00
Simon Horman
ba4fd7e966 IPVS: Add ip_vs_route_me_harder()
Add ip_vs_route_me_harder() to avoid repeating the same code twice.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:54 +09:00
Julian Anastasov
6ef757f965 ipvs: rename estimator functions
Rename ip_vs_new_estimator to ip_vs_start_estimator
and ip_vs_kill_estimator to ip_vs_stop_estimator to better
match their logic.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:54 +09:00
Julian Anastasov
ea9f22cce9 ipvs: optimize rates reading
Move the estimator reading from estimation_timer to user
context. ip_vs_read_estimator() will be used to decode the rate
values. As the decoded rates are not set by estimation timer
there is no need to reset them in ip_vs_zero_stats.

 	There is no need for ip_vs_new_estimator() to encode stats
to rates, if the destination is in trash both the stats and the
rates are inactive.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:53 +09:00
Julian Anastasov
55a3d4e15c ipvs: properly zero stats and rates
Currently, the new percpu counters are not zeroed and
the zero commands do not work as expected, we still show the old
sum of percpu values. OTOH, we can not reset the percpu counters
from user context without causing the incrementing to use old
and bogus values.

 	So, as Eric Dumazet suggested, fix that by moving all overhead
to stats reading in user context. Do not introduce overhead in
timer context (estimator) and incrementing (packet handling in
softirqs).

 	The new ustats0 field holds the zero point for all
counter values, the rates always use 0 as base value as before.
When showing the values to user space just give the difference
between counters and the base values. The only drawback is that
percpu stats are not zeroed, they are accessible only from /proc
and are new interface, so it should not be a compatibility problem
as long as the sum stats are correct after zeroing.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:52 +09:00
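The zero-point idea in this commit can be sketched with a single counter. This is a simplified illustration, not the ipvs structures: the struct and function names are hypothetical, and only the `0` suffix mirrors the ustats0 naming in the message.

```c
#include <stdint.h>

struct demo_stats {
    uint64_t conns;   /* running sum of the per-CPU counters, never reset */
    uint64_t conns0;  /* zero point recorded by the last "zero" command */
};

/* "Zero" the statistics without touching the live counter:
 * remember the current value as the new base. */
void demo_zero(struct demo_stats *s)
{
    s->conns0 = s->conns;
}

/* Value shown to user space: difference between counter and base. */
uint64_t demo_read(const struct demo_stats *s)
{
    return s->conns - s->conns0;
}
```

All the extra work happens on the read path in user context, so the packet path keeps doing a plain increment.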
Julian Anastasov
2a0751af09 ipvs: reorganize tot_stats
The global tot_stats contains cpustats field just like the
stats for dest and svc, so better use it to simplify the usage
in estimation_timer. As tot_stats is registered as estimator
we can remove the special ip_vs_read_cpu_stats call for
tot_stats. Fix ip_vs_read_cpu_stats to be called under
stats lock because it is still used as synchronization between
estimation timer and user context (the stats readers).

 	Also, make sure ip_vs_stats_percpu_show reads properly
the u64 stats from user context.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:52 +09:00
Shan Wei
6060c74a3d netfilter:ipvs: use kmemdup
The semantic patch that makes this output is available
in scripts/coccinelle/api/memdup.cocci.

More information about semantic patching is available at
http://coccinelle.lip6.fr/

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:49 +09:00
Julian Anastasov
4a569c0c0f ipvs: remove _bh from percpu stats reading
ip_vs_read_cpu_stats is called only from timer, so
no need for _bh locks.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:48 +09:00
Julian Anastasov
097fc76a08 ipvs: avoid lookup for fwmark 0
Restore the previous behaviour to lookup for fwmark
service only when fwmark is non-null. This saves only CPU.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:48 +09:00
Mark Rustad
698e1d23cf net: dcbnl: Update copyright dates
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 17:02:42 -07:00
Sangtae Ha
b5ccd07337 tcp_cubic: fix low utilization of CUBIC with HyStart
HyStart sets the initial exit point of slow start.
Suppose that HyStart exits at 0.5BDP in a BDP network and no history exists.
If the BDP of a network is large, CUBIC's initial cwnd growth may be
too conservative to utilize the link.
CUBIC increases the cwnd 20% per RTT in this case.

Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:54:42 -07:00
Sangtae Ha
2b4636a5f8 tcp_cubic: make the delay threshold of HyStart less sensitive
Make HyStart less sensitive to abrupt delay variations due to buffer bloat.

Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:54:42 -07:00
stephen hemminger
3b585b3449 tcp_cubic: enable high resolution ack time if needed
This is a refined version of an earlier patch by Lucas Nussbaum.
Cubic needs RTT values in milliseconds. If HZ < 1000 then
the values will be too coarse.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:54:40 -07:00
stephen hemminger
17a6e9f1aa tcp_cubic: fix clock dependency
The hystart code was written with assumption that HZ=1000.
Replace the use of jiffies with bictcp_clock as a millisecond
real time clock.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:54:39 -07:00
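Why a jiffies-based clock is too coarse when HZ < 1000 can be seen from the conversion itself. The helper below is a hypothetical userspace sketch, not the kernel's bictcp_clock; it only shows how large one tick is in milliseconds for a given HZ.

```c
#include <stdint.h>

/* Convert a tick count to milliseconds for a given tick rate (hz > 0).
 * With hz = 1000 one tick is 1 ms; with hz = 100 one tick is 10 ms, so
 * any delay threshold below 10 ms cannot be resolved from ticks alone. */
uint32_t demo_ticks_to_ms(uint32_t ticks, uint32_t hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000u) / hz);
}
```

Code written under the HZ=1000 assumption treats one tick as one millisecond, which is off by a factor of ten at HZ=100.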
stephen hemminger
aac46324e1 tcp_cubic: make ack train delta value a parameter
Make the spacing between ACK's that indicates a train a tuneable
value like other hystart values.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:54:39 -07:00
stephen hemminger
c54b4b7655 tcp_cubic: fix comparison of jiffies
Jiffies wraps around therefore the correct way to compare is
to use cast to signed value.

Note: cubic is not using full jiffies value on 64 bit arch
because using full unsigned long makes struct bictcp grow too
large for the available ca_priv area.

Includes correction from Sangtae Ha to improve ack train detection.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:54:38 -07:00
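The wrap-safe comparison described in this commit can be sketched with 32-bit timestamps. The function names are hypothetical; the technique is the usual one of subtracting first and then interpreting the difference as a signed value.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if timestamp a is later than b, even when the counter has wrapped
 * between the two samples. Valid while the real distance between a and b
 * is less than 2^31 ticks. */
bool demo_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Naive comparison, shown for contrast: it gives the wrong answer when
 * a has wrapped past zero and b has not. */
bool demo_naive_after(uint32_t a, uint32_t b)
{
    return a > b;
}
```

For example, with b = 0xFFFFFFF0 and a = 5 (21 ticks later, after wrapping), the signed difference is positive while the naive comparison reports that a is earlier.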
stephen hemminger
febf081987 tcp: fix RTT for quick packets in congestion control
In the congestion control interface, the callback for each ACK
includes an estimated round trip time in microseconds.
Some algorithms need high resolution (Vegas style) but most only
need jiffie resolution.  If RTT is not accurate (like a retransmission)
-1 is used as a flag value.

When doing coarse resolution if RTT is less than a jiffie
then 0 should be returned rather than no estimate. Otherwise algorithms
that expect good ack's to trigger slow start (like CUBIC Hystart)
will be confused.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:54:38 -07:00
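The distinction this commit draws between "no estimate" and "faster than one tick" can be sketched as below. The function name, the `valid` flag, and the fixed tick rate are assumptions made for illustration; the -1 flag value and the microsecond unit come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HZ 100  /* assumed tick rate for this sketch */

/* Coarse RTT estimate in microseconds.
 * Returns -1 only when the sample is unusable (e.g. a retransmission).
 * An ACK that arrives within the same tick yields 0, which is still a
 * valid sample and must not be confused with "no estimate". */
int32_t demo_coarse_rtt_us(bool valid, uint32_t now_ticks, uint32_t sent_ticks)
{
    if (!valid)
        return -1;
    uint32_t delta = now_ticks - sent_ticks;
    return (int32_t)(delta * (1000000u / DEMO_HZ));
}
```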
Daniel Baluta
e5537bfc98 af_unix: update locking comment
We latch our state using a spinlock not a r/w kind of lock.

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:25:33 -07:00
stephen hemminger
a461c0297f bridge: skip forwarding delay if not using STP
If Spanning Tree Protocol is not enabled, there is no good reason for
the bridge code to wait for the forwarding delay period before enabling
the link. The purpose of the forwarding delay is to allow STP to
learn about other bridges before nominating itself.

The only possible impact is that when starting up a new port
the bridge may flood a packet now, where previously it might have
seen traffic from the other host and preseeded the forwarding table.

Includes change for local variable br already available in that func.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 15:06:49 -07:00
stephen hemminger
1faa4356a3 bridge: control carrier based on ports online
This makes the bridge device behave like a physical device.
In earlier releases the bridge always asserted carrier. This
changes the behavior so that bridge device carrier is on only
if one or more ports are in the forwarding state. This
should help IPv6 autoconfiguration, DHCP, and routing daemons.

I did brief testing with Network and Virt manager and they
seem fine, but since this changes behavior of bridge, it should
wait until net-next (2.6.39).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Tested-By: Adam Majer <adamm@zombino.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-14 14:29:02 -07:00
David S. Miller
201a11c1db Merge branch 'tipc-Mar14-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6 2011-03-14 13:49:53 -07:00
Daniel Turull
05aebe2e5d pktgen: bug fix in transmission headers with frags=0
(bug introduced by commit 26ad787962
(pktgen: speedup fragmented skbs))

The headers of pktgen were incorrectly added in a pktgen packet
without frags (frags=0). There was an offset in the pktgen headers.

The cause was in reusing the pgh variable as a return variable in skb_put
when adding the payload to the skb.

Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-03-14 13:47:40 -07:00
Felix Fietkau
9db372fdd5 mac80211: fix channel type recalculation with HT and non-HT interfaces
When running an AP interface along with the cooked monitor interface created
by hostapd, adding an interface and deleting it again triggers a channel type
recalculation during which the (non-HT) monitor interface takes precedence
over the HT AP interface, thus causing the channel type to be set to non-HT.
Fix this by ensuring that a wider channel type will not be overwritten
by a narrower channel type.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-14 14:46:58 -04:00
Helmut Schaa
cf28d7934c mac80211: Shortcut minstrel_ht rate setup for non-MRR capable devices
Devices without multi rate retry support won't be able to use all rates
as specified by minstrel_ht. Hence, we can simply skip setting up further
rates as the devices will only use the first one.

Also add a special case for devices with only two possible tx rates. We
use sample_rate -> max_prob_rate for sampling and max_tp_rate ->
max_prob_rate by default.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-03-14 14:46:58 -04:00
Stephen Hemminger
fe8f661f2c netfilter: nf_conntrack: fix sysctl memory leak
Message in log because sysctl table was not empty at netns exit
 WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c()

Instrumenting showed that the nf_conntrack_timestamp was the entry
that was being created but not cleared.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-14 19:20:44 +01:00
Linus Torvalds
5f40d42094 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: NFSROOT should default to "proto=udp"
  nfs4: remove duplicated #include
  NFSv4: nfs4_state_mark_reclaim_nograce() should be static
  NFSv4: Fix the setlk error handler
  NFSv4.1: Fix the handling of the SEQUENCE status bits
  NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
  NFSv4.1 reclaim complete must wait for completion
  NFSv4: remove duplicate clientid in struct nfs_client
  NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
  sunrpc: Propagate errors from xs_bind() through xs_create_sock()
  (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
  nfs: fix compilation warning
  nfs: add kmalloc return value check in decode_and_add_ds
  SUNRPC: Remove resource leak in svc_rdma_send_error()
  nfs: close NFSv4 COMMIT vs. CLOSE race
  SUNRPC: Close a race in __rpc_wait_for_completion_task()
2011-03-14 11:19:50 -07:00
Patrick McHardy
42046e2e45 netfilter: x_tables: return -ENOENT for nonexistent matches/targets
As Stephen correctly points out, we need to return -ENOENT in
xt_find_match()/xt_find_target() after the patch "netfilter: x_tables:
misuse of try_then_request_module" in order to properly indicate
a nonexistent module to the caller.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-03-14 19:11:44 +01:00
Paul Gortmaker
1fa073803e tipc: delete extra semicolon blocking node deletion
Remove bogus semicolon only recently introduced in 34e46258cb
that blocks cleanup of nodes for N>1 on shutdown.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-14 12:21:12 -04:00
Al Viro
c9c6cac0c2 kill path_lookup()
all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-14 09:15:23 -04:00
Eric Dumazet
4e75db2e8f inetpeer: should use call_rcu() variant
After commit 7b46ac4e77 (inetpeer: Don't disable BH for initial
fast RCU lookup.), we should use call_rcu() to wait proper RCU grace
period.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 23:22:23 -07:00
Steffen Klassert
d8647b79c3 xfrm: Add user interface for esn and big anti-replay windows
This patch adds a netlink based user interface to configure
esn and big anti-replay windows. The new netlink attribute
XFRMA_REPLAY_ESN_VAL is used to configure the new implementation.
If the XFRM_STATE_ESN flag is set, we use esn and support for big
anti-replay windows for the configured state. If this flag is not
set we use the new implementation with 32 bit sequence numbers.
A big anti-replay window can be configured in this case anyway.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 20:22:31 -07:00
Steffen Klassert
2cd084678f xfrm: Add support for IPsec extended sequence numbers
This patch adds support for IPsec extended sequence numbers (esn)
as defined in RFC 4303. The bits to manage the anti-replay window
are based on a patch from Alex Badea.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 20:22:31 -07:00
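A minimal sketch of how a receiver can recover the high-order 32 bits of an extended sequence number, following the procedure described in RFC 4303, Appendix A. Only the low 32 bits travel in the packet, so the receiver infers the high half from the position of its anti-replay window. The function and parameter names are hypothetical and are not taken from this patch.

```c
#include <stdint.h>

/*
 * Illustrative only: infer the high-order 32 bits of a received ESN.
 * top_lo/top_hi: low and high halves of the highest sequence number
 *                authenticated so far (right edge of the window).
 * window:        anti-replay window size in packets (assumed >= 1).
 * seq_lo:        the 32-bit sequence number carried in the packet.
 */
uint32_t esn_guess_seq_hi(uint32_t top_lo, uint32_t top_hi,
			  uint32_t window, uint32_t seq_lo)
{
	/* Left edge of the window; wraps modulo 2^32 by design. */
	uint32_t bottom = top_lo - window + 1;

	if (top_lo >= window - 1) {
		/* The window lies within a single 2^32 subspace. */
		return seq_lo >= bottom ? top_hi : top_hi + 1;
	}
	/* The window straddles a subspace boundary. */
	return seq_lo >= bottom ? top_hi - 1 : top_hi;
}
```

The guessed value is only a candidate: the integrity check over the full 64-bit number decides whether the packet is accepted.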
Steffen Klassert
97e15c3a85 xfrm: Support anti-replay window size bigger than 32 packets
As it is, the anti-replay bitmap in struct xfrm_replay_state can
only accommodate 32 packets, even though it is possible to configure
anti-replay window sizes up to 255 packets from userspace. So we
reject any packet with a sequence number within the configured window
but outside the bitmap. With this patch, we represent the anti-replay
window as a bitmap of variable length that can be accessed via the
new struct xfrm_replay_state_esn. Thus, we have no limit on the
window size anymore. To use the new anti-replay window implementation,
new userspace tools are required. We leave the old implementation
untouched to stay in sync with old userspace tools.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 20:22:30 -07:00
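The idea of a variable-length anti-replay bitmap can be sketched as follows. This is a simplified userspace illustration, not the kernel's struct xfrm_replay_state_esn: the type, field, and function names are assumptions, and 32-bit sequence number wraparound is deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration only. */
struct replay_window {
	uint32_t top;    /* highest sequence number accepted so far */
	uint32_t size;   /* window size in packets */
	uint32_t words;  /* number of 32-bit words in bmp[] */
	uint32_t bmp[];  /* one bit per sequence number, used as a ring */
};

struct replay_window *replay_window_new(uint32_t size)
{
	uint32_t words = (size + 31) / 32;
	struct replay_window *w;

	if (size == 0)
		return NULL;
	w = calloc(1, sizeof(*w) + words * sizeof(uint32_t));
	if (w) {
		w->size = size;
		w->words = words;
	}
	return w;
}

static uint32_t rw_pos(const struct replay_window *w, uint32_t seq)
{
	return seq % (w->words * 32);
}

static bool rw_test(const struct replay_window *w, uint32_t seq)
{
	uint32_t p = rw_pos(w, seq);
	return w->bmp[p / 32] & (1u << (p % 32));
}

static void rw_assign(struct replay_window *w, uint32_t seq, bool on)
{
	uint32_t p = rw_pos(w, seq);
	if (on)
		w->bmp[p / 32] |= 1u << (p % 32);
	else
		w->bmp[p / 32] &= ~(1u << (p % 32));
}

/* Returns true if seq is new and inside the window, and records it. */
bool replay_accept(struct replay_window *w, uint32_t seq)
{
	if (seq == 0)
		return false;
	if (seq > w->top) {
		if (seq - w->top >= w->size) {
			/* Jumped past the whole window: nothing old remains valid. */
			memset(w->bmp, 0, w->words * sizeof(uint32_t));
		} else {
			/* Clear stale ring slots for the numbers we skipped over. */
			for (uint32_t s = w->top + 1; s < seq; s++)
				rw_assign(w, s, false);
		}
		rw_assign(w, seq, true);
		w->top = seq;
		return true;
	}
	if (w->top - seq >= w->size)
		return false;           /* older than the window */
	if (rw_test(w, seq))
		return false;           /* already seen: replay */
	rw_assign(w, seq, true);
	return true;
}
```

With a window of 64 packets, a packet 50 sequence numbers behind the newest one is accepted once and rejected on replay, which a fixed 32-bit bitmap could not track.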
Steffen Klassert
9fdc4883d9 xfrm: Move IPsec replay detection functions to a separate file
To support multiple versions of replay detection, we move the replay
detection functions to a separate file and make them accessible
via function pointers contained in the struct xfrm_replay.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 20:22:30 -07:00
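A rough sketch of dispatching replay detection through a table of function pointers, so that several implementations can coexist. The struct layout, member names, and the two implementations below are hypothetical; the actual members of struct xfrm_replay are not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

struct replay_state;

/* Hypothetical operations table, one instance per replay scheme. */
struct replay_ops {
	bool (*check)(const struct replay_state *st, uint32_t seq);
	void (*advance)(struct replay_state *st, uint32_t seq);
};

struct replay_state {
	uint32_t top;                  /* highest sequence number seen */
	uint32_t bitmap;               /* bit i set: (top - i) was seen */
	const struct replay_ops *ops;  /* selected when the state is set up */
};

/* Classic fixed 32-packet window. */
static bool legacy_check(const struct replay_state *st, uint32_t seq)
{
	uint32_t diff;

	if (seq == 0)
		return false;
	if (seq > st->top)
		return true;
	diff = st->top - seq;
	return diff < 32 && !(st->bitmap & (1u << diff));
}

static void legacy_advance(struct replay_state *st, uint32_t seq)
{
	if (seq > st->top) {
		uint32_t diff = seq - st->top;

		st->bitmap = diff < 32 ? (st->bitmap << diff) | 1u : 1u;
		st->top = seq;
	} else {
		st->bitmap |= 1u << (st->top - seq);
	}
}

/* Trivial variant that performs no replay detection at all. */
static bool none_check(const struct replay_state *st, uint32_t seq)
{
	(void)st;
	(void)seq;
	return true;
}

static void none_advance(struct replay_state *st, uint32_t seq)
{
	(void)st;
	(void)seq;
}

const struct replay_ops replay_legacy = { legacy_check, legacy_advance };
const struct replay_ops replay_none = { none_check, none_advance };

/* Callers never name an implementation; they go through st->ops. */
bool replay_input(struct replay_state *st, uint32_t seq)
{
	if (!st->ops->check(st, seq))
		return false;
	st->ops->advance(st, seq);
	return true;
}
```

Adding another scheme then means adding one more ops table rather than touching every caller.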
Steffen Klassert
d212a4c290 esp6: Add support for IPsec extended sequence numbers
This patch adds IPsec extended sequence numbers support to esp6.
We use the authencesn crypto algorithm to handle esp with separate
encryption/authentication algorithms.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 20:22:29 -07:00
Steffen Klassert
0dc49e9b28 esp4: Add support for IPsec extended sequence numbers
This patch adds IPsec extended sequence numbers support to esp4.
We use the authencesn crypto algorithm to handle esp with separate
encryption/authentication algorithms.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 20:22:29 -07:00
Steffen Klassert
1ce3644ade xfrm: Use separate low and high order bits of the sequence numbers in xfrm_skb_cb
To support IPsec extended sequence numbers, we split the
output sequence numbers of xfrm_skb_cb in low and high order 32 bits
and we add the high order 32 bits to the input sequence numbers.
All users are updated accordingly.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 20:22:28 -07:00
David S. Miller
27b61ae2d7 Merge branch 'tipc-Mar13-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6
2011-03-13 18:49:11 -07:00
Hiroaki SHIMODA
46af31800b ipv4: Fix PMTU update.
On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4
(Destination unreachable (Fragmentation needed)),

  icmp_unreach
    -> ip_rt_frag_needed
         (peer->pmtu_expires is set here)
    -> tcp_v4_err
         -> do_pmtu_discovery
              -> ip_rt_update_pmtu
                   (peer->pmtu_expires is already set,
                    so check_peer_pmtu is skipped.)
                   -> check_peer_pmtu

check_peer_pmtu is skipped and MTU is not updated.

To fix this, let check_peer_pmtu execute unconditionally.
This patch also includes some minor fixes:
1) Avoid peer->pmtu_expires potentially being set to zero.
2) In check_peer_pmtu, the arguments of time_before are reversed.
3) check_peer_pmtu expects peer->pmtu_orig to be initialized to zero,
   but it is not initialized.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-13 18:37:49 -07:00
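To illustrate why the argument order of time_before matters (minor fix 2 above), here is a small userspace sketch of the signed-difference idiom that wrap-safe tick comparisons rely on. The helper names are hypothetical, and the cast assumes a two's-complement long.

```c
#include <stdbool.h>

/* Wrap-safe "a is earlier than b" for a free-running tick counter. */
static inline bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical helper: true while the stored expiry is still in the future. */
static inline bool expiry_pending(unsigned long now, unsigned long expires)
{
	return tick_before(now, expires);
}
```

Swapping the two arguments turns "not yet expired" into "already expired" whenever the two times differ, so a reversed call silently inverts the check instead of failing loudly.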
Allan Stephens
390bce4237 tipc: Eliminate obsolete routine for handling routed messages
Eliminates a routine that is used in handling messages arriving from
another cluster or zone. Such messages can no longer be received by TIPC
now that multi-cluster and multi-zone network support has been eliminated.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:19 -04:00
Allan Stephens
7945c1fb02 tipc: Eliminate remaining support for routing table messages
Gets rid of all remaining code relating to ROUTE_DISTRIBUTOR messages.
These messages were only used in multi-cluster and multi-zone networks,
which TIPC no longer supports. (For safety, TIPC now treats such messages
the same way that it handles other unrecognized messages.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:19 -04:00
Allan Stephens
50d492321a tipc: Remove bearer flag indicating existence of broadcast address
Eliminates the flag in the TIPC bearer structure that indicates if
the bearer supports broadcasting, since the flag is always set to 1
and serves no useful purpose.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:19 -04:00
Allan Stephens
f9107ebe7d tipc: Don't respond to neighbor discovery request on blocked bearer
Adds a check to prevent TIPC from trying to respond to an incoming
LINK_CONFIG request message if the associated bearer is currently
prohibited from sending messages.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:19 -04:00
Allan Stephens
d901a42b27 tipc: Eliminate unnecessary constant for neighbor discovery msg size
Eliminates an unnecessary constant that defines the size of a LINK_CONFIG
message, and uses one of the existing standard message size symbols in
its place. (The defunct constant was located in the wrong place anyway,
since it was grouped with other constants that define message users instead
of message sizes.)

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:18 -04:00
Allan Stephens
a2b58de2e3 tipc: Remove unused field in bearer structure
Eliminates a field in TIPC's bearer objects that is set, but never
referenced.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:18 -04:00
Allan Stephens
50d3e6399a tipc: Correct misnamed references to neighbor discovery domain
Renames items that are improperly labelled as "network scope" items
(which are represented by simple integer values) rather than "network
domain" items (which are represented by <Z.C.N>-type network addresses).
This change is purely cosmetic, and does not affect the operation of TIPC.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:18 -04:00
Allan Stephens
37b9c08a88 tipc: Optimizations to link creation code
Enhances link creation code as follows:

1) Detects illegal attempts to add a requested link earlier in the
   link creation process. This prevents TIPC from wasting time
   initializing a link object it then throws away, and also eliminates
   the code needed to do the throwing away.

2) Passes in the node object associated with the requested link.
   This allows TIPC to eliminate a search to locate the node object,
   as well as code that attempted to create the node if it doesn't
   exist.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:18 -04:00
Allan Stephens
fa2bae2d5b tipc: Give Tx of discovery responses priority over link messages
Delay releasing the node lock when processing a neighbor discovery
message until after the optional discovery response message has been
sent. This helps ensure that any link protocol messages sent by a
link endpoint created as a result of a neighbor discovery request
are received after the discovery response is received, thereby
giving the receiving node a chance to create a peer link endpoint to
consume those link protocol messages, if one does not already exist.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:18 -04:00
Allan Stephens
a728750e4f tipc: Cosmetic changes to neighbor discovery logic
Reworks the appearance of the routine that processes incoming
LINK_CONFIG messages to keep the main logic flow at a consistent level
of indentation, and to add comments outlining the various phases involved
in processing each message. This rework is being done to allow upcoming
enhancements to this routine to be integrated more cleanly.

The diff isn't really readable, so know that it was a case of the
old code being like:

	tipc_disc_recv_msg(..)
	{
		if (in_own_cluster(orig)) {
			...
			lines and lines of stuff
			...
		}
	}

which is now replaced with the more sane:

	tipc_disc_recv_msg(..)
	{
		if (!in_own_cluster(orig))
			return;
		...
		lines and lines of stuff
		...
	}

Instances of spin locking within the reindented block were replaced with
the identical tipc_node_[un]lock() abstractions.  Note that all these
changes are cosmetic in nature, and do not change the way LINK_CONFIG
messages are processed.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:18 -04:00
Allan Stephens
75f0aa4990 tipc: Fix redundant link field handling in link protocol message
Ensures that the "redundant link exists" field of the LINK_PROTOCOL
messages sent by a link endpoint is set if and only if the sending
node has at least one other working link to the peer node. Previously,
the bit was set only if there were at least 2 working links to the peer
node, meaning the bit was incorrectly left unset in messages sent by a
non-working link endpoint when exactly one alternate working link was
available. The revised code now takes the state of the link sending
the message into account when deciding if an alternate link exists.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:18 -04:00
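The difference between the old and the revised rule can be written down as two small predicates. This is an illustrative sketch with hypothetical names, where working_links counts every working link to the peer, including the sending link if it is itself working.

```c
#include <stdbool.h>

/* Old rule as described above: needs two or more working links in total. */
bool redundant_link_old(int working_links)
{
	return working_links > 1;
}

/* Revised rule: at least one working link other than the sender itself. */
bool redundant_link_new(int working_links, bool sender_is_working)
{
	int others = working_links - (sender_is_working ? 1 : 0);

	return others >= 1;
}
```

The two rules disagree only when the sending link is down and exactly one other link is up: the old rule reports no redundant link, the revised rule reports one.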
Allan Stephens
77f167fcce tipc: make msg_set_redundant_link() consistent with other set ops
All the other boolean-like msg_set_X(m) operations don't
export both a msg_set_X(m) and a msg_clear_X(m), but instead
just have the single msg_set_X(m, val) variant.

Make the redundant_link one consistent by having the set take
a value, and delete the msg_clear_redundant_link() anomaly.
This is a cosmetic change and should not change behaviour.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
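A minimal sketch of the single-setter convention: one msg_set_X(m, val) built on a generic bit-field helper, with no separate clear routine. The header layout is simplified to host byte order, and the word index and bit position used for the redundant-link flag are placeholders chosen for illustration.

```c
#include <stdint.h>

#define MSG_WORDS 10	/* hypothetical header length in 32-bit words */

struct msg {
	uint32_t hdr[MSG_WORDS];
};

static inline uint32_t msg_bits(const struct msg *m, unsigned int w,
				unsigned int pos, uint32_t mask)
{
	return (m->hdr[w] >> pos) & mask;
}

static inline void msg_set_bits(struct msg *m, unsigned int w,
				unsigned int pos, uint32_t mask, uint32_t val)
{
	m->hdr[w] &= ~(mask << pos);        /* clear the field */
	m->hdr[w] |= (val & mask) << pos;   /* write the new value */
}

/* One setter that takes the value; passing 0 replaces a "clear" routine. */
static inline void msg_set_redundant_link(struct msg *m, uint32_t val)
{
	msg_set_bits(m, 5, 12, 0x1, val);   /* word 5, bit 12: placeholders */
}

static inline uint32_t msg_redundant_link(const struct msg *m)
{
	return msg_bits(m, 5, 12, 0x1);
}
```

Callers can then pass a computed boolean directly, for example msg_set_redundant_link(m, has_other_link), instead of branching between a set and a clear call.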
Paul Gortmaker
8f19afb2db tipc: cosmetic - function names are not to be full sentences
Function names like "tipc_node_has_redundant_links" are unwieldy
and result in long lines even for simple statements.  The "has" doesn't
add any value, so dropping it is a slight step in the
right direction.   This is a cosmetic change, basically the result of:

for i in `grep -l tipc_node_has_ *` ; do sed -i s/tipc_node_has_/tipc_node_/ $i ; done

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
Allan Stephens
e7b3acb6a8 tipc: Eliminate timestamp from link protocol messages
Removes support for the timestamp field of TIPC's link protocol messages.

This field was previously used to hold an OS-dependent timestamp value
that was used to assist in debugging early versions of TIPC. The field
has now been deemed unnecessary and has been removed from the latest TIPC
specification. This change has no impact on the operation of TIPC since
the field was set by TIPC, but never referenced.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
Allan Stephens
34e46258cb tipc: manually inline net_start/stop, make assoc. vars static
Relocates network-related variables into the subsystem files where
they are now primarily used (following the recent rework of TIPC's
node table), and converts globals into locals where possible. Changes
the initialization of tipc_num_links from run-time to compile-time,
and eliminates the net_start routine that becomes empty as a result.
Also eliminates the corresponding net_stop routine by moving its
(trivial) content into the one location that called the routine.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
Allan Stephens
672d99e19a tipc: Convert node object array to a hash table
Replaces the dynamically allocated array of pointers to the cluster's
node objects with a static hash table. Hash collisions are resolved
using chaining, with a typical hash chain having only a single node,
to avoid degrading performance during processing of incoming packets.
The conversion to a hash table reduces the memory requirements for
TIPC's node table to approximately the same size it had prior to
the previous commit.

In addition to the hash table itself, TIPC now also maintains a
linked list for the node objects, sorted by ascending network address.
This list allows TIPC to continue sending responses to user space
applications that request node and link information in sorted order.
The list also improves performance when name table update messages are
sent by making it easier to identify the nodes that must be notified.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
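The combination described above, a static hash table with chaining plus a list kept sorted by address, can be sketched in userspace C as follows. The bucket count, hash function, and all names are assumptions made for illustration and are not taken from the TIPC sources.

```c
#include <stdint.h>
#include <stdlib.h>

#define NODE_HTABLE_SIZE 512	/* hypothetical bucket count (power of two) */

struct node {
	uint32_t addr;
	struct node *hash_next;	/* chain within one hash bucket */
	struct node *list_next;	/* global list, ascending by addr */
};

static struct node *node_htable[NODE_HTABLE_SIZE];	/* static table */
static struct node *node_list;				/* sorted list head */

static unsigned int node_hashfn(uint32_t addr)
{
	return addr & (NODE_HTABLE_SIZE - 1);
}

/* Fast path for incoming packets: walk one (usually short) chain. */
struct node *node_find(uint32_t addr)
{
	struct node *n;

	for (n = node_htable[node_hashfn(addr)]; n; n = n->hash_next)
		if (n->addr == addr)
			return n;
	return NULL;
}

/* Create a node if needed, linking it into both structures. */
struct node *node_create(uint32_t addr)
{
	struct node *n = node_find(addr);
	struct node **pos;
	unsigned int h = node_hashfn(addr);

	if (n)
		return n;
	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->addr = addr;

	/* Push onto the head of the bucket's chain. */
	n->hash_next = node_htable[h];
	node_htable[h] = n;

	/* Insert into the global list, keeping ascending address order. */
	for (pos = &node_list; *pos && (*pos)->addr < addr;
	     pos = &(*pos)->list_next)
		;
	n->list_next = *pos;
	*pos = n;
	return n;
}
```

Lookups pay only for the hash chain, while requests that need nodes in address order simply walk list_next from the head.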
Allan Stephens
f831c963b5 tipc: Eliminate configuration for maximum number of cluster nodes
Gets rid of the need for users to specify the maximum number of
cluster nodes supported by TIPC. TIPC now automatically provides
support for all 4K nodes allowed by its addressing scheme.

Note: This change sets TIPC's memory usage to the amount used by
a maximum size node table with 4K entries.  An upcoming patch that
converts the node table from a linear array to a hash table will
compact the node table to a more efficient design, but for clarity
it is nice to have all the Kconfig infrastructure go away separately.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
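The 4K figure follows from the addressing scheme if a <Z.C.N> network address is packed into 32 bits with a 12-bit node field. The sketch below assumes an 8-bit zone, 12-bit cluster, 12-bit node split; the helper names are hypothetical.

```c
#include <stdint.h>

#define ZCN_NODE_BITS     12
#define ZCN_CLUSTER_BITS  12
#define ZCN_MAX_NODES     (1u << ZCN_NODE_BITS)	/* 4096 nodes per cluster */

/* Pack zone, cluster, and node numbers into one 32-bit address. */
static inline uint32_t zcn_addr(uint32_t z, uint32_t c, uint32_t n)
{
	return (z << (ZCN_CLUSTER_BITS + ZCN_NODE_BITS)) |
	       (c << ZCN_NODE_BITS) | n;
}

static inline uint32_t zcn_zone(uint32_t addr)
{
	return addr >> (ZCN_CLUSTER_BITS + ZCN_NODE_BITS);
}

static inline uint32_t zcn_cluster(uint32_t addr)
{
	return (addr >> ZCN_NODE_BITS) & ((1u << ZCN_CLUSTER_BITS) - 1);
}

static inline uint32_t zcn_node(uint32_t addr)
{
	return addr & (ZCN_MAX_NODES - 1);
}
```

Under that assumption, a node table indexed directly by node number needs ZCN_MAX_NODES entries, which is the memory cost the note above refers to.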
Allan Stephens
d1bcb11544 tipc: Split up unified structure of network-related variables
Converts the fields of the global "tipc_net" structure into individual
variables.  Since the struct was never referenced as a complete unit,
its existence was pointless.  This will facilitate upcoming changes to
TIPC's node table and simplify upcoming relocation of the variables so
they are only visible to the files that actually use them.

This change is essentially cosmetic in nature, and doesn't affect the
operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:17 -04:00
Allan Stephens
9df3b7eb6e tipc: Fix problem with missing link in "tipc-config -l" output
Removes a race condition that could cause TIPC's internal counter
of the number of links it has to neighboring nodes to have the
incorrect value if two independent threads of control simultaneously
create new link endpoints connecting to two different nodes using two
different bearers. Such undercounting would result in TIPC failing to
list the final link(s) in its response to a configuration request to
list all of the node's links. The counter is now updated atomically
to ensure that simultaneous increments do not interfere with each
other.

Thanks go to Peter Butler <pbutler@pt.com> for his assistance in
diagnosing and fixing this problem.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
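A plain "count++" is a read-modify-write sequence, so two threads can both read the same old value and one increment is lost. The sketch below shows the atomic alternative using C11 atomics in userspace; the counter and function names are hypothetical stand-ins, not the identifiers used in TIPC.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the link counter. */
static atomic_int num_links;

/* Each call is a single indivisible update, safe from any thread. */
void link_created(void)
{
	atomic_fetch_add(&num_links, 1);
}

void link_deleted(void)
{
	atomic_fetch_sub(&num_links, 1);
}

int link_count(void)
{
	return atomic_load(&num_links);
}
```

In kernel code the analogous primitives would be an atomic_t updated with atomic_inc() and read with atomic_read().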
Allan Stephens
71092ea122 tipc: Add support for SO_RCVTIMEO socket option
Adds support for the SO_RCVTIMEO socket option to TIPC's socket
receive routines.

Thanks go out to Raj Hegde <rajenhegde@yahoo.ca> for his contribution
to the development and testing of this enhancement.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-13 16:35:16 -04:00
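From an application's point of view, SO_RCVTIMEO is set with the standard setsockopt() call. The sketch below uses hypothetical helper names and works on any socket descriptor; with this change, a TIPC socket would be expected to honour the same option.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Set a receive timeout in milliseconds; returns 0 on success, -1 on error. */
int set_recv_timeout(int fd, long ms)
{
	struct timeval tv;

	tv.tv_sec = ms / 1000;
	tv.tv_usec = (ms % 1000) * 1000;
	return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* True if a recv()-style return value indicates the timeout expired. */
int recv_timed_out(ssize_t ret)
{
	return ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

A caller would set the timeout once, then treat recv_timed_out(recv(fd, buf, len, 0)) as "no data arrived in time" rather than as a hard error.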