c7d4426a98
While doing stress tests with IP route cache disabled, and multi queue devices, I noticed a very high contention on one rwlock used in neighbour code. When many cpus are trying to send frames (possibly using a high performance multiqueue device) to the same neighbour, they fight for the neigh->lock rwlock in order to call neigh_hh_init(), and fight on hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()) But we dont need to call neigh_hh_init() for dst that are used only once. It costs four atomic operations at least, on two contended cache lines, plus the high contention on neigh->lock rwlock. Introduce a new dst flag, DST_NOCACHE, that is set when dst was not inserted in route cache. With the stress test bench, sending 160000000 frames on one neighbour, results are : Before patch: real 2m28.406s user 0m11.781s sys 36m17.964s After patch: real 1m26.532s user 0m12.185s sys 20m3.903s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
---|---|---|
.. | ||
datagram.c | ||
dev.c | ||
dev_addr_lists.c | ||
drop_monitor.c | ||
dst.c | ||
ethtool.c | ||
fib_rules.c | ||
filter.c | ||
flow.c | ||
gen_estimator.c | ||
gen_stats.c | ||
iovec.c | ||
kmap_skb.h | ||
link_watch.c | ||
Makefile | ||
neighbour.c | ||
net-sysfs.c | ||
net-sysfs.h | ||
net-traces.c | ||
net_namespace.c | ||
netevent.c | ||
netpoll.c | ||
pktgen.c | ||
request_sock.c | ||
rtnetlink.c | ||
scm.c | ||
skbuff.c | ||
sock.c | ||
stream.c | ||
sysctl_net_core.c | ||
timestamping.c | ||
user_dma.c | ||
utils.c |