With CONFIG_EXPERT=y, CONFIG_KASAN=y, CONFIG_RANDOMIZE_BASE=n,
CONFIG_RELOCATABLE=n, we observe the following failure when trying to
link the kernel image with LD=ld.lld:
error: section: .exit.data is not contiguous with other relro sections
ld.lld defaults to -z relro while ld.bfd defaults to -z norelro. This
was previously fixed, but only for CONFIG_RELOCATABLE=y.
Cc: stable@vger.kernel.org
Fixes: 3bbd3db86470 ("arm64: relocatable: fix inconsistencies in linker script and options")
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Bug: 170775238
Link: https://lore.kernel.org/lkml/20201016175339.2429280-1-ndesaulniers@google.com/T/
Change-Id: I6a084b9610dec6c23c6bdd6e0eaeecc2d27f5dca
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Add native kernel support for a sane VPN
Note, this drops CONFIG_ARM64_CRYPTO from the gki arm64 defconfig
because CONFIG_WIREGUARD explicitly enables it. So the functionality is
still there, but the defconfig no longer needs to list it.
Bug: 152722841
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iece00c78770ac146d0f5bc252902794fec2e3955
Eric's suggested fix for the previous commit's mentioned race condition
was to simply take the table->lock in wg_index_hashtable_replace(). The
table->lock of the hash table is supposed to protect the bucket heads,
not the entries, but actually, since all the mutator functions are
already taking it, it makes sense to take it too for the test to
hlist_unhashed, as a defense in depth measure, so that it no longer
races with deletions, regardless of what other locks are protecting
individual entries. This is sensible from a performance perspective
because, as Eric pointed out, the case of being unhashed is already the
unlikely case, so this won't add common contention. And comparing
instructions, this basically doesn't make much of a difference other
than pushing and popping %r13, used by the new `bool ret`. More
generally, I like the idea of locking consistency across table mutator
functions, and this might let me rest slightly easier at night.
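A rough sketch of what taking the lock there looks like (abridged; the
real function also re-initializes old->index_hash under the same lock):

    bool wg_index_hashtable_replace(struct index_hashtable *table,
                                    struct index_hashtable_entry *old,
                                    struct index_hashtable_entry *new)
    {
            bool ret;

            spin_lock_bh(&table->lock);
            /* The unhashed test now happens under table->lock, so it can
             * no longer race with wg_index_hashtable_remove(). */
            ret = !hlist_unhashed(&old->index_hash);
            if (unlikely(!ret))
                    goto out;
            new->index = old->index;
            hlist_replace_rcu(&old->index_hash, &new->index_hash);
            INIT_HLIST_NODE(&old->index_hash);
    out:
            spin_unlock_bh(&table->lock);
            return ret;
    }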
Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6147f7b1e90ff09bd52afc8b9206a7fcd133daf7)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3f3c44100fe655f3f278dc8a57cee1171ced4147
Eric reported that syzkaller found a race of this variety:
CPU 1                                          CPU 2
----------------------------------------------|---------------------------------------
wg_index_hashtable_replace(old, ...)          |
  if (hlist_unhashed(&old->index_hash))       |
                                              | wg_index_hashtable_remove(old)
                                              |   hlist_del_init_rcu(&old->index_hash)
                                              |     old->index_hash.pprev = NULL
  hlist_replace_rcu(&old->index_hash, ...)    |
    *old->index_hash.pprev                    |
Syzbot wasn't actually able to reproduce this more than once or create a
reproducer, because the race window between checking "hlist_unhashed" and
calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or
similar there helps make this demonstrable using this simple script:
#!/bin/bash
set -ex
trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
ip link add wg0 type wireguard
ip link add wg1 type wireguard
wg set wg0 private-key <(wg genkey) listen-port 9999
wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
wg set wg0 peer $(wg show wg1 public-key)
ip link set wg0 up
yes link set wg1 up | ip -force -batch - &
pid1=$!
yes link set wg1 down | ip -force -batch - &
pid2=$!
wait
The fundamental underlying problem is that we permit calls to wg_index_
hashtable_remove(handshake.entry) without requiring the caller to take
the handshake mutex that is intended to protect members of handshake
during mutations. This is consistently the case with calls to wg_index_
hashtable_insert(handshake.entry) and wg_index_hashtable_replace(
handshake.entry), but it's missing from a pertinent callsite of wg_
index_hashtable_remove(handshake.entry). So, this patch makes sure that
mutex is taken.
The original code was a little bit funky though, in the form of:
remove(handshake.entry)
lock(), memzero(handshake.some_members), unlock()
remove(handshake.entry)
The original intention of that double removal pattern outside the lock
appears to be some attempt to prevent insertions that might happen while
locks are dropped during expensive crypto operations, but actually, all
callers of wg_index_hashtable_insert(handshake.entry) take the write
lock and then explicitly check handshake.state, as they should, which
the aforementioned memzero clears, which means an insertion should
already be impossible. And regardless, the original intention was
necessarily racy, since it wasn't guaranteed that something else would
run after the unlock() instead of after the remove(). So, from a
soundness perspective, it seems positive to remove what looks like a
hack at best.
The crash from both syzbot and from the script above is as follows:
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker
RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline]
RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174
Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5
RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010
RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000
R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000
R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500
FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820
wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline]
wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220
process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9179ba31367bcf481c3c79b5f028c94faad9f30a)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5d1ca2bb77b61c654b2f56660a6b1c3f5fb2446f
Now that wg_examine_packet_protocol has been added for general
consumption as ip_tunnel_parse_protocol, it's possible to remove
wg_examine_packet_protocol and simply use the new
ip_tunnel_parse_protocol function directly.
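For reference, the shared parser behaves roughly like this (a simplified
sketch of ip_tunnel_parse_protocol(), which returns 0 for anything that
is not recognizably IPv4 or IPv6):

    __be16 ip_tunnel_parse_protocol(const struct sk_buff *skb)
    {
            if (skb->len >= sizeof(struct iphdr) && ip_hdr(skb)->version == 4)
                    return htons(ETH_P_IP);
            else if (skb->len >= sizeof(struct ipv6hdr) && ip_hdr(skb)->version == 6)
                    return htons(ETH_P_IPV6);
            return 0;
    }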
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1a574074ae7d1d745c16f7710655f38a53174c27)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia65b9100a5647c0084108b732f445f3d8a146c3a
Some devices that take straight up layer 3 packets benefit from having a
shared header_ops so that AF_PACKET sockets can inject packets that are
recognized. This shared infrastructure will be used by other drivers
that currently can't inject packets using AF_PACKET. It also exposes the
parser function, as it is useful in standalone form too.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Jason: no parse_protocol in ip_tunnel_header_ops]
(cherry picked from commit 2606aff916854b61234bf85001be9777bab2d5f8)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib255de1683387e8adfa5042cf11ea004a06cd78f
The napi_gro_receive function no longer returns GRO_DROP ever, making
handling GRO_DROP dead code. This commit removes that dead code.
Further, it's not even clear that device drivers have any business in
taking action after passing off received packets; that's arguably out of
their hands.
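The shape of the cleanup, roughly (variable names are illustrative, not
the exact hunk):

    /* Before: a branch on a value napi_gro_receive() can no longer return. */
    if (napi_gro_receive(napi, skb) == GRO_DROP)
            ++dev->stats.rx_dropped;

    /* After: just hand the skb to GRO and let the stack account for it. */
    napi_gro_receive(napi, skb);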
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit df08126e3833e9dca19e2407db5f5860a7c194fb)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fb4902fcb0e8bb522317518120651defed7804f
Before, we took a reference to the creating netns if the new netns was
different. This caused issues with circular references, with two
wireguard interfaces swapping namespaces. The solution is to rather not
take any extra references at all, but instead simply invalidate the
creating netns pointer when that netns is deleted.
In order to prevent this from happening again, this commit improves the
rough object leak tracking by allowing it to account for created and
destroyed interfaces, aside from just peers and keys. That then makes it
possible to check for the object leak when having two interfaces take a
reference to each others' namespaces.
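A minimal sketch of the invalidation idea (field and function names here
are illustrative, and as noted below this backport hooks the netns exit
callback rather than pre_exit):

    static void __net_exit wg_netns_exit(struct net *net)
    {
            struct wg_device *wg;

            rtnl_lock();
            list_for_each_entry(wg, &device_list, device_list)
                    if (rcu_access_pointer(wg->creating_net) == net)
                            /* No extra netns reference is held anymore; just
                             * forget the pointer (the real code also tears
                             * down the device's sockets here). */
                            RCU_INIT_POINTER(wg->creating_net, NULL);
            rtnl_unlock();
    }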
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 900575aa33a3eaaef802b31de187a85c4a4b4bd0)
Bug: 152722841
[Jason: netlink notifier uses exit instead of pre_exit]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iea52fe3ca0e41318c392d9e91edb1856de6c9528
Fixes an error condition reported by checkpatch.pl which was caused by
assigning a variable in an if condition in
wg_noise_handshake_consume_initiation().
Signed-off-by: Frank Werner-Krippendorf <mail@hb9fxq.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 558b353c9c2a717509f291c066c6bd8f5f5e21be)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib44bfe39c4897ba774732e148efb73fbd9d6409c
In "wireguard: queueing: preserve flow hash across packet scrubbing", we
were required to slightly increase the size of the receive replay
counter to something still fairly small, but an increase nonetheless.
It turns out that we can recoup some of the additional memory overhead
by splitting up the prior union type into two distinct types. Before, we
used the same "noise_counter" union for both sending and receiving, with
sending just using a simple atomic64_t, while receiving used the full
replay counter checker. This meant that most of the memory being
allocated for the sending counter was being wasted. Since the old
"noise_counter" type increased in size in the prior commit, now is a
good time to split up that union type into a distinct "noise_replay_
counter" for receiving and a boring atomic64_t for sending, each using
neither more nor less memory than required.
Also, since sometimes the replay counter is accessed without
necessitating additional accesses to the bitmap, we can reduce cache
misses by hoisting the always-necessary lock above the bitmap in the
struct layout. We also change a "noise_replay_counter" stack allocation
to kmalloc in a -DDEBUG selftest so that KASAN doesn't trigger a stack
frame warning.
All in all, removing a bit of abstraction in this commit makes the code
simpler and smaller, in addition to the motivating memory usage
recuperation. For example, passing around raw "noise_symmetric_key"
structs is something that really only makes sense within noise.c, in the
one place where the sending and receiving keys can safely be thought of
as the same type of object; subsequent to that, it's important that we
uniformly access these through keypair->{sending,receiving}, where their
distinct roles are always made explicit. So this patch allows us to draw
that distinction clearly as well.
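Roughly, the resulting layout looks like this (abridged; exact fields
and ordering may differ slightly):

    struct noise_replay_counter {
            u64 counter;
            spinlock_t lock;        /* hoisted above the bitmap to cut cache misses */
            unsigned long backtrack[COUNTER_BITS_TOTAL / BITS_PER_LONG];
    };

    struct noise_keypair {
            struct noise_symmetric_key sending;
            atomic64_t sending_counter;             /* tx needs no replay window */
            struct noise_symmetric_key receiving;
            struct noise_replay_counter receiving_counter;
            /* ... */
    };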
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a9e90d9931f3a474f04bab782ccd9d77904941e9)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I164a6693651e8238fd24961939b1ade74f5f2d52
It's important that we clear most header fields during encapsulation and
decapsulation, because the packet is substantially changed, and we don't
want any info leak or logic bug due to an accidental correlation. But,
for encapsulation, it's wrong to clear skb->hash, since it's used by
fq_codel and flow dissection in general. Without it, classification does
not proceed as usual. This change might make it easier to estimate the
number of innerflows by examining clustering of out of order packets,
but this shouldn't open up anything that can't already be inferred
otherwise (e.g. syn packet size inference), and fq_codel can be disabled
anyway.
Furthermore, it might be the case that the hash isn't used or queried at
all until after wireguard transmits the encrypted UDP packet, which
means skb->hash might still be zero at this point, and thus no hash
taken over the inner packet data. In order to address this situation, we
force a calculation of skb->hash before encrypting packet data.
Of course this means that fq_codel might transmit packets slightly more
out of order than usual. Toke did some testing on beefy machines with
high quantities of parallel flows and found that increasing the
replay-attack counter to 8192 takes care of the most pathological cases
pretty well.
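The forcing step itself is tiny (sketch; exactly where it sits in the tx
path is up to the driver):

    /* Compute and store the flow hash over the inner headers now, while they
     * are still readable; after encryption only the outer UDP flow remains. */
    skb_get_hash(skb);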
Reported-by: Dave Taht <dave.taht@gmail.com>
Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c78a0b4a78839d572d8a80f6a62221c0d7843135)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I12466d816c672380dc20e395f72e6ddc24c45d13
Prior we read the preshared key after dropping the handshake lock, which
isn't an actual crypto issue if it races, but it's still not quite
correct. So copy that part of the state into a temporary like we do with
the rest of the handshake state variables. Then we can release the lock,
operate on the temporary, and zero it out at the end of the function. In
performance tests, the impact of this was entirely unnoticeable, probably
because those bytes are coming from the same cacheline as other things
that are being copied out in the same manner.
Reported-by: Matt Dunwoodie <ncon@noconroy.net>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc67d371256f5c47d824e2eec51e46c8d62d022e)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I41c2eeee65162fac791cd27812c6b9ba5a9eeec2
gcc-10 switched to defaulting to -fno-common, which broke iproute2-5.4.
This was fixed in iproute2-5.6, so switch to that. Because we're after a
stable testing surface, we generally don't like to bump these
unnecessarily, but in this case, being able to actually build is a basic
necessity.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ee3c1aa3f34b7842c1557cfe5d8c3f7b8c692de8)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id88bafaca825112ed4e3d53baf2b724bcf70fe00
It's very unlikely that send will become true. It's nearly always false
between 0 and 120 seconds of a session, and in most cases becomes true
only between 120 and 121 seconds before becoming false again. So,
unlikely(send) is clearly the right option here.
What happened before was that we had this complex boolean expression
with multiple likely and unlikely clauses nested. Since this is
evaluated left-to-right anyway, the whole thing got converted to
unlikely. So, we can clean this up to better represent what's going on.
The generated code is the same.
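The resulting shape, roughly (abridged from the keep_key_fresh()-style
helpers):

    bool send;

    /* ... decide, under the keypair RCU read lock, whether a rekey is due ... */

    if (unlikely(send))
            wg_packet_send_queued_handshake_initiation(peer, false);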
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 243f2148937adc72bcaaa590d482d599c936efde)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0cc67841703296acab0892503d34d3ce84c40ddb
Without setting these to NULL, clang complains in certain
configurations that have CONFIG_IPV6=n:
In file included from drivers/net/wireguard/ratelimiter.c:223:
drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized]
ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
^~~~
drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning
struct sk_buff *skb4, *skb6;
^
= NULL
drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized]
ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
^~~~
drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning
struct ipv6hdr *hdr6;
^
We silence this warning by setting the variables to NULL as the warning
suggests.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4fed818ef54b08d4b29200e416cce65546ad5312)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8de706d1ba2563560b60df3b9a08b005bb313188
Users with pathological hardware reported CPU stalls on CONFIG_
PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning
these workers would never terminate. That turned out not to be okay on
systems without forced preemption, which Sultan observed. This commit
adds a cond_resched() to the bottom of each loop iteration, so that
these workers don't hog the core. Note that we don't need this on the
napi poll worker, since that terminates after its budget is expended.
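Sketch of the worker-loop change (loop body abridged):

    while ((skb = ptr_ring_consume_bh(&queue->ring)) != NULL) {
            /* ... encrypt or decrypt the packet and pass it onward ... */

            /* Yield once per iteration so a constantly-refilled ring cannot
             * monopolize the CPU on kernels without forced preemption. */
            cond_resched();
    }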
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reported-by: Wang Jian <larkwang@gmail.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4005f5c3c9d006157ba716594e0d70c88a235c5e)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I95440398070e8f6bf11894bd967a3fc4e06c29a8
It's already possible to create two different interfaces and loop
packets between them. This has always been possible with tunnels in the
kernel, and isn't specific to wireguard. Therefore, the networking stack
already needs to deal with that. At the very least, the packet winds up
exceeding the MTU and is discarded at that point. So, since this is
already something that happens, there's no need to forbid the not very
exceptional case of routing a packet back to the same interface; this
loop is no different than others, and we shouldn't special case it, but
rather rely on generic handling of loops in general. This also makes it
easier to do interesting things with wireguard such as onion routing.
At the same time, we add a selftest for this, ensuring that both onion
routing works and infinite routing loops do not crash the kernel. We
also add a test case for wireguard interfaces nesting packets and
sending traffic between each other, as well as the loop in this case
too. We make sure to send some throughput-heavy traffic for this use
case, to stress out any possible recursion issues with the locks around
workqueues.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b673e24aad36981f327a6570412ffa7754de8911)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6e5cb7a9f76372af8e03b54e6c0b0d5d20787604
While at some point it might have made sense to be running these tests
on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on
64-bit powerpc in a long time, and more interesting things that we test
don't really work when we deviate from the default (16k). So, we stop
pushing our luck in this commit, and return to the default instead of
the minimum.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a0fd7cc87a018df1a17f9d3f0bd994c1f22c6b34)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0442ce3ce4a954519f3d10e5db3607522707f35d
WireGuard currently only propagates ECN markings on tunnel decap according
to the old RFC3168 specification. However, the spec has since been updated
in RFC6040 to recommend slightly different decapsulation semantics. This
was implemented in the kernel as a set of common helpers for ECN
decapsulation, so let's just switch over WireGuard to using those, so it
can benefit from this enhancement and any future tweaks. We do not drop
packets with invalid ECN marking combinations, because WireGuard is
frequently used to work around broken ISPs, which could be doing that.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit eebabcb26ea1e3295704477c6cd4e772c96a9559)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I422eb94327f92d9436e998588e62041cb01902b9
Prior, if the alloc_percpu of packet_percpu_multicore_worker_alloc
failed, the previously allocated ptr_ring wouldn't be freed. This commit
adds the missing call to ptr_ring_cleanup in the error case.
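The fixed error path looks roughly like this (abridged; names follow the
driver's queueing code):

    int wg_packet_queue_init(struct crypt_queue *queue, work_func_t function,
                             bool multicore, unsigned int len)
    {
            int ret;

            ret = ptr_ring_init(&queue->ring, len, GFP_KERNEL);
            if (ret)
                    return ret;
            if (multicore) {
                    queue->worker = wg_packet_percpu_multicore_worker_alloc(function, queue);
                    if (!queue->worker) {
                            /* Previously leaked: the ring allocated just above. */
                            ptr_ring_cleanup(&queue->ring, NULL);
                            return -ENOMEM;
                    }
            }
            return 0;
    }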
Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 130c58606171326c81841a49cc913cd354113dd9)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iad1c5bc2be2459b3dbe4791a5fec09a9403d7d56
This commit removes a useless newline at the end of a scope, which
doesn't add anything in the way of organization or readability.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d6833e42786e050e7522d6a91a9361e54085897d)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I47c458befe22630e82944f362a56585bbd5e2b7b
We precompute the static-static ECDH during configuration time, in order
to save an expensive computation later when receiving network packets.
However, not all ECDH computations yield a contributory result. Prior,
we were just not letting those peers be added to the interface. However,
this creates a strange inconsistency, since it was still possible to add
other weird points, like a valid public key plus a low-order point, and,
like points that result in zeros, a handshake would not complete. In
order to make the behavior more uniform and less surprising, simply
allow all peers to be added. Then, we'll error out later when doing the
crypto if there's an issue. This also adds more separation between the
crypto layer and the configuration layer.
Discussed-with: Mathias Hall-Andersen <mathias@hall-andersen.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 11a7686aa99c7fe4b3f80f6dcccd54129817984d)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iae7e1688340109decefa565b848b97ce444c20b6
The situation in which we wind up hitting the default case here
indicates a major bug in earlier parsing code. It is not a usual thing
that should ever happen, which means a "friendly" message for it doesn't
make sense. Rather, replace this with a WARN_ON, just like we do earlier
in the file for a similar situation, so that somebody sends us a bug
report and we can fix it.
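The gist of the change in the default case, roughly (the quiet debug
message becomes a loud warning):

    default:
            WARN_ON(1);
            goto err;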
Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2b8765c52db24c0fbcc81bac9b5e8390f2c7d3c8)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I40e109e5bac6541d309b588958ef0e09b3b9c19b
We carry out checks to the effect of:
if (skb->protocol != wg_examine_packet_protocol(skb))
goto err;
By having wg_skb_examine_untrusted_ip_hdr return 0 on failure, this
means that the check above still passes in the case where skb->protocol
is zero, which is possible to hit with AF_PACKET:
struct sockaddr_pkt saddr = { .spkt_device = "wg0" };
unsigned char buffer[5] = { 0 };
sendto(socket(AF_PACKET, SOCK_PACKET, /* skb->protocol = */ 0),
buffer, sizeof(buffer), 0, (const struct sockaddr *)&saddr, sizeof(saddr));
Additional checks mean that this isn't actually a problem in the code
base, but I could imagine it becoming a problem later if the function is
used more liberally.
I would prefer to fix this by having wg_examine_packet_protocol return a
32-bit ~0 value on failure, which will never match any value of
skb->protocol, which would simply change the generated code from a mov
to a movzx. However, sparse complains, and adding __force casts doesn't
seem like a good idea, so instead we just add a simple helper function
to check for the zero return value. Since wg_examine_packet_protocol
itself gets inlined, this winds up not adding an additional branch to
the generated code, since the 0 return value already happens in a
mergable branch.
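The helper described above is small; it mirrors this idea (the exact
body may differ slightly):

    static inline bool wg_check_packet_protocol(struct sk_buff *skb)
    {
            __be16 real_protocol = wg_examine_packet_protocol(skb);

            /* A zero return means "unrecognized", which must not be allowed to
             * compare equal to a zero skb->protocol injected via AF_PACKET. */
            return real_protocol && skb->protocol == real_protocol;
    }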
Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a5588604af448664e796daf3c1d5a4523c60667b)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I55d75c41f92b0eec88553ac5c2910a8400c427b4
This commit removes a duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 166391159c5deb84795d2ff46e95f276177fa5fb)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d3becf426a100767f3d83970ba1eeac91b08683
synchronize_net() is a wrapper around synchronize_rcu(), so there's no
point in having synchronize_net and synchronize_rcu back to back,
despite the documentation comment suggesting maybe it's somewhat useful,
"Wait for packets currently being received to be done." This commit
removes the extra call.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1fbc33b0a7feb6ca72bf7dc8a05d81485ee8ee2e)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8f152ba49c926f249c4e0fd0f9bace30df269d5d
It turns out there's an easy way to get packets queued up while still
having an MTU of zero, and that's via persistent keep alive. This commit
makes sure that in whatever condition, we don't wind up dividing by
zero. Note that an MTU of zero for a wireguard interface is something
quasi-valid, so I don't think the correct fix is to limit it via
min_mtu. This can be reproduced easily with:
ip link add wg0 type wireguard
ip link add wg1 type wireguard
ip link set wg0 up mtu 0
ip link set wg1 up
wg set wg0 private-key <(wg genkey)
wg set wg1 listen-port 1 private-key <(wg genkey) peer $(wg show wg0 public-key)
wg set wg0 peer $(wg show wg1 public-key) persistent-keepalive 1 endpoint 127.0.0.1:1
However, while min_mtu=0 seems fine, it makes sense to restrict the
max_mtu. This commit also restricts the maximum MTU to the greatest
number for which rounding up to the padding multiple won't overflow a
signed integer. Packets this large were always rejected anyway
eventually, due to checks deeper in, but it seems more sound not to even
let the administrator configure something that won't work anyway.
We use this opportunity to clean up this function a bit so that it's
clear which paths we're expecting.
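Sketch of the MTU cap ("overhead" stands in for the driver's
encapsulation overhead and is illustrative):

    /* Largest MTU for which rounding the plaintext up to the padding multiple,
     * plus encapsulation overhead, still fits in a signed int. */
    dev->max_mtu = round_down(INT_MAX, MESSAGE_PADDING_MULTIPLE) - overhead;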
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 175f1ca9a9ed8689d2028da1a7c624bb4fb4ff7e)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I94193470b4e586f6e55c9d8aa77ea738a6fe9df9
This is a small optimization that prevents more expensive comparisons
from happening when they are no longer necessary, by clearing the
last_under_load variable whenever we wind up in a state where we were
under load but we no longer are.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Suggested-by: Matt Dunwoodie <ncon@noconroy.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2a8a4df36462aa85b0db87b7c5ea145ba67e34a8)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I87d5a99e857f1d1ec5e683e2540cba40b03b833b
This gives us fewer dependencies and shortens build time, fixes up some
hash checking race conditions, and also fixes missing directory creation
that caused issues on massively parallel builds.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 04ddf1208f03e1dbc39a4619c40eba640051b950)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I842d8f3e2fbe3cf29d50d2bc9463299e74c7aef1
Because wireguard is calling icmp from network device context, it should
use the ndo helper so that the rate limiting applies correctly. This
commit adds a small test to the wireguard test suite to ensure that the
new functions continue doing the right thing in the context of
wireguard. It does this by setting up a condition that will definitely
evoke an icmp error message from the driver, but along a nat'd path.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a12d7f3cbdc72c7625881c8dc2660fc2c979fdf2)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifb36defa0c8d6dc7d51884078826f5b6ca793bfd
Without this, we wind up proceeding too early sometimes when the
previous process has just used the same listening port. So, we tie the
listening socket query to the specific pid we're interested in.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 88f404a9b1d75388225b1c67b6dd327cb2182777)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I717d6d68773dfd7f114320cc79d1f3bd24bb230a
Ensure that peers with low order points are ignored, both in the case
where we already have a device private key and in the case where we do
not. This adds points that naturally give a zero output.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f9398acba6a4ae9cb98bfe4d56414d376eff8d57)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I76ee06a8c84b5dfc333981fe4f69ead816db2917
Our static-static calculation returns a failure if the public key is of
low order. We check for this when peers are added, and don't allow them
to be added if they're low order, except in the case where we haven't
yet been given a private key. In that case, we would defer the removal
of the peer until we're given a private key, since at that point we're
doing new static-static calculations which incur failures we can act on.
This meant, however, that we wound up removing peers rather late in the
configuration flow.
Syzkaller points out that peer_remove calls flush_workqueue, which in
turn might then wait for sending a handshake initiation to complete.
Since handshake initiation needs the static identity lock, holding the
static identity lock while calling peer_remove can result in a rare
deadlock. We have precisely this case in this situation of late-stage
peer removal based on an invalid public key. We can't drop the lock when
removing, because then incoming handshakes might interact with a bogus
static-static calculation.
While the band-aid patch for this would involve breaking up the peer
removal into two steps like wg_peer_remove_all does, in order to solve
the locking issue, there's actually a much more elegant way of fixing
this:
If the static-static calculation succeeds with one private key, it
*must* succeed with all others, because all 32-byte strings map to valid
private keys, thanks to clamping. That means we can get rid of this
silly dance and locking headaches of removing peers late in the
configuration flow, and instead just reject them early on, regardless of
whether the device has yet been assigned a private key. For the case
where the device doesn't yet have a private key, we safely use zeros
just for the purposes of checking for low order points by way of
checking the output of the calculation.
The following PoC will trigger the deadlock:
ip link add wg0 type wireguard
ip addr add 10.0.0.1/24 dev wg0
ip link set wg0 up
ping -f 10.0.0.2 &
while true; do
wg set wg0 private-key /dev/null peer AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= allowed-ips 10.0.0.0/24 endpoint 10.0.0.3:1234
wg set wg0 private-key <(echo AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=)
done
[ 0.949105] ======================================================
[ 0.949550] WARNING: possible circular locking dependency detected
[ 0.950143] 5.5.0-debug+ #18 Not tainted
[ 0.950431] ------------------------------------------------------
[ 0.950959] wg/89 is trying to acquire lock:
[ 0.951252] ffff8880333e2128 ((wq_completion)wg-kex-wg0){+.+.}, at: flush_workqueue+0xe3/0x12f0
[ 0.951865]
[ 0.951865] but task is already holding lock:
[ 0.952280] ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[ 0.953011]
[ 0.953011] which lock already depends on the new lock.
[ 0.953011]
[ 0.953651]
[ 0.953651] the existing dependency chain (in reverse order) is:
[ 0.954292]
[ 0.954292] -> #2 (&wg->static_identity.lock){++++}:
[ 0.954804] lock_acquire+0x127/0x350
[ 0.955133] down_read+0x83/0x410
[ 0.955428] wg_noise_handshake_create_initiation+0x97/0x700
[ 0.955885] wg_packet_send_handshake_initiation+0x13a/0x280
[ 0.956401] wg_packet_handshake_send_worker+0x10/0x20
[ 0.956841] process_one_work+0x806/0x1500
[ 0.957167] worker_thread+0x8c/0xcb0
[ 0.957549] kthread+0x2ee/0x3b0
[ 0.957792] ret_from_fork+0x24/0x30
[ 0.958234]
[ 0.958234] -> #1 ((work_completion)(&peer->transmit_handshake_work)){+.+.}:
[ 0.958808] lock_acquire+0x127/0x350
[ 0.959075] process_one_work+0x7ab/0x1500
[ 0.959369] worker_thread+0x8c/0xcb0
[ 0.959639] kthread+0x2ee/0x3b0
[ 0.959896] ret_from_fork+0x24/0x30
[ 0.960346]
[ 0.960346] -> #0 ((wq_completion)wg-kex-wg0){+.+.}:
[ 0.960945] check_prev_add+0x167/0x1e20
[ 0.961351] __lock_acquire+0x2012/0x3170
[ 0.961725] lock_acquire+0x127/0x350
[ 0.961990] flush_workqueue+0x106/0x12f0
[ 0.962280] peer_remove_after_dead+0x160/0x220
[ 0.962600] wg_set_device+0xa24/0xcc0
[ 0.962994] genl_rcv_msg+0x52f/0xe90
[ 0.963298] netlink_rcv_skb+0x111/0x320
[ 0.963618] genl_rcv+0x1f/0x30
[ 0.963853] netlink_unicast+0x3f6/0x610
[ 0.964245] netlink_sendmsg+0x700/0xb80
[ 0.964586] __sys_sendto+0x1dd/0x2c0
[ 0.964854] __x64_sys_sendto+0xd8/0x1b0
[ 0.965141] do_syscall_64+0x90/0xd9a
[ 0.965408] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 0.965769]
[ 0.965769] other info that might help us debug this:
[ 0.965769]
[ 0.966337] Chain exists of:
[ 0.966337] (wq_completion)wg-kex-wg0 --> (work_completion)(&peer->transmit_handshake_work) --> &wg->static_identity.lock
[ 0.966337]
[ 0.967417] Possible unsafe locking scenario:
[ 0.967417]
[ 0.967836]        CPU0                    CPU1
[ 0.968155]        ----                    ----
[ 0.968497]   lock(&wg->static_identity.lock);
[ 0.968779]                                lock((work_completion)(&peer->transmit_handshake_work));
[ 0.969345]                                lock(&wg->static_identity.lock);
[ 0.970146]   lock((wq_completion)wg-kex-wg0);
[ 0.970146]
[ 0.970146] *** DEADLOCK ***
[ 0.970146]
[ 0.970531] 5 locks held by wg/89:
[ 0.970908] #0: ffffffff827433c8 (cb_lock){++++}, at: genl_rcv+0x10/0x30
[ 0.971400] #1: ffffffff82743480 (genl_mutex){+.+.}, at: genl_rcv_msg+0x642/0xe90
[ 0.971924] #2: ffffffff827160c0 (rtnl_mutex){+.+.}, at: wg_set_device+0x9f/0xcc0
[ 0.972488] #3: ffff888032819de0 (&wg->device_update_lock){+.+.}, at: wg_set_device+0xb0/0xcc0
[ 0.973095] #4: ffff888032819bc0 (&wg->static_identity.lock){++++}, at: wg_set_device+0x95d/0xcc0
[ 0.973653]
[ 0.973653] stack backtrace:
[ 0.973932] CPU: 1 PID: 89 Comm: wg Not tainted 5.5.0-debug+ #18
[ 0.974476] Call Trace:
[ 0.974638] dump_stack+0x97/0xe0
[ 0.974869] check_noncircular+0x312/0x3e0
[ 0.975132] ? print_circular_bug+0x1f0/0x1f0
[ 0.975410] ? __kernel_text_address+0x9/0x30
[ 0.975727] ? unwind_get_return_address+0x51/0x90
[ 0.976024] check_prev_add+0x167/0x1e20
[ 0.976367] ? graph_lock+0x70/0x160
[ 0.976682] __lock_acquire+0x2012/0x3170
[ 0.976998] ? register_lock_class+0x1140/0x1140
[ 0.977323] lock_acquire+0x127/0x350
[ 0.977627] ? flush_workqueue+0xe3/0x12f0
[ 0.977890] flush_workqueue+0x106/0x12f0
[ 0.978147] ? flush_workqueue+0xe3/0x12f0
[ 0.978410] ? find_held_lock+0x2c/0x110
[ 0.978662] ? lock_downgrade+0x6e0/0x6e0
[ 0.978919] ? queue_rcu_work+0x60/0x60
[ 0.979166] ? netif_napi_del+0x151/0x3b0
[ 0.979501] ? peer_remove_after_dead+0x160/0x220
[ 0.979871] peer_remove_after_dead+0x160/0x220
[ 0.980232] wg_set_device+0xa24/0xcc0
[ 0.980516] ? deref_stack_reg+0x8e/0xc0
[ 0.980801] ? set_peer+0xe10/0xe10
[ 0.981040] ? __ww_mutex_check_waiters+0x150/0x150
[ 0.981430] ? __nla_validate_parse+0x163/0x270
[ 0.981719] ? genl_family_rcv_msg_attrs_parse+0x13f/0x310
[ 0.982078] genl_rcv_msg+0x52f/0xe90
[ 0.982348] ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[ 0.982690] ? register_lock_class+0x1140/0x1140
[ 0.983049] netlink_rcv_skb+0x111/0x320
[ 0.983298] ? genl_family_rcv_msg_attrs_parse+0x310/0x310
[ 0.983645] ? netlink_ack+0x880/0x880
[ 0.983888] genl_rcv+0x1f/0x30
[ 0.984168] netlink_unicast+0x3f6/0x610
[ 0.984443] ? netlink_detachskb+0x60/0x60
[ 0.984729] ? find_held_lock+0x2c/0x110
[ 0.984976] netlink_sendmsg+0x700/0xb80
[ 0.985220] ? netlink_broadcast_filtered+0xa60/0xa60
[ 0.985533] __sys_sendto+0x1dd/0x2c0
[ 0.985763] ? __x64_sys_getpeername+0xb0/0xb0
[ 0.986039] ? sockfd_lookup_light+0x17/0x160
[ 0.986397] ? __sys_recvmsg+0x8c/0xf0
[ 0.986711] ? __sys_recvmsg_sock+0xd0/0xd0
[ 0.987018] __x64_sys_sendto+0xd8/0x1b0
[ 0.987283] ? lockdep_hardirqs_on+0x39b/0x5a0
[ 0.987666] do_syscall_64+0x90/0xd9a
[ 0.987903] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 0.988223] RIP: 0033:0x7fe77c12003e
[ 0.988508] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 4
[ 0.989666] RSP: 002b:00007fffada2ed58 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 0.990137] RAX: ffffffffffffffda RBX: 00007fe77c159d48 RCX: 00007fe77c12003e
[ 0.990583] RDX: 0000000000000040 RSI: 000055fd1d38e020 RDI: 0000000000000004
[ 0.991091] RBP: 000055fd1d38e020 R08: 000055fd1cb63358 R09: 000000000000000c
[ 0.991568] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000002c
[ 0.992014] R13: 0000000000000004 R14: 000055fd1d38e020 R15: 0000000000000001
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ec31c2676a10e064878927b243fada8c2fb0c03c)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I860bfac72c98c8c9b26f4490b4f346dc67892f87
This worked before, because we made all callers name their next pointer
"next". But in trying to be more "drop-in" ready, the silliness here is
revealed. This commit fixes the problem by making the macro argument and
the member use different names.
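After the rename, the macro argument no longer shadows the ->next
member; roughly:

    #define skb_list_walk_safe(first, skb, next_skb)                          \
            for ((skb) = (first), (next_skb) = (skb) ? (skb)->next : NULL;    \
                 (skb);                                                       \
                 (skb) = (next_skb), (next_skb) = (skb) ? (skb)->next : NULL)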
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5eee7bd7e245914e4e050c413dfe864e31805207)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4778eb05d0ed284543822bd3ba537379c3c19b85
As part of the continual effort to remove direct usage of skb->next and
skb->prev, this patch adds a helper for iterating through the
singly-linked variant of skb lists, which are used for lists of GSO
packet. The name "skb_list_..." has been chosen to match the existing
function, "kfree_skb_list, which also operates on these singly-linked
lists, and the "..._walk_safe" part is the same idiom as elsewhere in
the kernel.
This patch removes the helper from wireguard and puts it into
linux/skbuff.h, while making it a bit more robust for general usage. In
particular, parentheses are added around the macro argument usage, and it
now accounts for trying to iterate through an already-null skb pointer,
which will simply run the iteration zero times. This latter enhancement
means it can be used to replace both do { ... } while and while (...)
open-coded idioms.
This should take care of these three possible usages, which match all
current methods of iterations.
skb_list_walk_safe(segs, skb, next) { ... }
skb_list_walk_safe(skb, skb, next) { ... }
skb_list_walk_safe(segs, skb, segs) { ... }
Gcc appears to generate efficient code for each of these.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dcfea72e79b0aa7a057c8f6024169d86a1bbc84b)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia8fe1237424161e4de23df4515dbb2275e15e180
Certain drivers will pass gro skbs to udp, at which point the udp driver
simply iterates through them and passes them off to encap_rcv, which is
where we pick up. At the moment, we're not attempting to coalesce these
into bundles, but we also don't want to wind up having cascaded lists of
skbs treated separately. The right behavior here, then, is to just mark
each incoming one as not on a list. This can be seen in practice, for
example, with Qualcomm's rmnet_perf driver.
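The fix itself is a one-liner at the top of the receive path (sketch):

    /* The UDP layer may hand us an skb that is still threaded onto a GRO
     * list; detach it before queueing it on our own rings. */
    skb_mark_not_on_list(skb);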
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 736775d06bac60d7a353e405398b48b2bd8b1e54)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Icee93ca63e947dc93838bd40f8b51751c417f0f6
Before 8b7008620b ("net: Don't copy pfmemalloc flag in __copy_skb_
header()"), the pfmemalloc flag used to be between headers_start and
headers_end, which is a region we clear when preparing the packet for
encryption/decryption. This is a parameter we certainly want to
preserve, which is why 8b7008620b moved it out of there. The code here
was written in a world before 8b7008620b, though, where we had to
manually account for it. This commit brings things up to speed.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 04d2ea92a18417619182cbb79063f154892b0150)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1f2c3176c00b01adae90143d958f1d98c7fea8c9
Quite a bit of the test suite was designed to work with ancient kernels.
Thankfully we no longer have to deal with this. This commit updates
things that we can finally update and removes things that we can finally
remove, to avoid the build-up of the last several years as a result of
having to support ancient kernels. We can finally rely on suppress_
prefixlength being available. On the build side of things, the no-PIE
hack is no longer required, and we can bump some of the tools, repair
our m68k and i686-kvm support, and get better coverage of the static
branches used in the crypto lib and in udp_tunnel.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9a69a4c8802adf642bc4a13d471b5a86b44ed434)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ief6e041886e649dd900cae53fca86320ea4c4f09
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.
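Illustrative shape of the change (the struct and field names are
placeholders):

    /* Before: a callback whose only job was to kfree() the object. */
    call_rcu(&entry->rcu, entry_free_rcu);

    /* After: let the helper handle it, no callback needed. */
    kfree_rcu(entry, rcu);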
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d89ee7d5c73af15c1c6f12b016cdf469742b5726)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8a50d841105381849825c9cf0185dad55a33fc36
Remove <linux/version.h> from the includes for main.c, which is unused.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
[Jason: reworded commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 43967b6ff91e53bcce5ae08c16a0588a475b53a1)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5ee6cedcb037dbe522ab50dcc5b7c66f06d76b16
This fixes two spelling errors in source code comments.
Signed-off-by: Josh Soref <jsoref@gmail.com>
[Jason: rewrote commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a2ec8b5706944d228181c8b91d815f41d6dd8e7b)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id8dbd0f7ffa71358b7ecdee6a890cdfb4341f3c1
This fixes the crypto selection submenu dependencies. Otherwise, we'd
wind up issuing warnings that certain dependencies we also select
couldn't be satisfied. This condition was triggered by the addition of
the test suite autobuilder in the previous commit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d7c68a38bb4f9b7c1a2e4a772872c752ee5c44a6)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I47e3c1bbf9f2feaad1db9ce639e98c7d1074fd38
WireGuard has been using this on build.wireguard.com for the last
several years with considerable success. It allows for very quick and
iterative development cycles, and supports several platforms.
To run the test suite on your current platform in QEMU:
$ make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
To run it with KASAN and such turned on:
$ DEBUG_KERNEL=yes make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
To run it emulated for another platform in QEMU:
$ ARCH=arm make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
At the moment, we support aarch64_be, aarch64, arm, armeb, i686, m68k,
mips64, mips64el, mips, mipsel, powerpc64le, powerpc, and x86_64.
The system supports incremental rebuilding, so it should be very fast to
change a single file and then test it out and have immediate feedback.
This requires for the right toolchain and qemu to be installed prior.
I've had success with those from musl.cc.
This is tailored for WireGuard at the moment, though later projects
might generalize it for other network testing.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 65d88d04114bca7d85faebd5fed61069cb2b632c)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibfe5c5b231bc213756a3c2baaf6744f3f5864b38
WireGuard is a layer 3 secure networking tunnel made specifically for
the kernel, that aims to be much simpler and easier to audit than IPsec.
Extensive documentation and description of the protocol and
considerations, along with formal proofs of the cryptography, are
available at:
* https://www.wireguard.com/
* https://www.wireguard.com/papers/wireguard.pdf
This commit implements WireGuard as a simple network device driver,
accessible in the usual RTNL way used by virtual network drivers. It
makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of
networking subsystem APIs. It has a somewhat novel multicore queueing
system designed for maximum throughput and minimal latency of encryption
operations, but it is implemented modestly using workqueues and NAPI.
Configuration is done via generic Netlink, and following a review from
the Netlink maintainer a year ago, several high profile userspace tools
have already implemented the API.
This commit also comes with several different tests, both in-kernel
tests and out-of-kernel tests based on network namespaces, taking
advantage of the fact that sockets used by WireGuard intentionally stay
in the namespace in which the WireGuard interface was originally created,
exactly like the semantics of userspace tun devices. See wireguard.com/netns/ for
pictures and examples.
The source code is fairly short, but rather than combining everything
into a single file, WireGuard is developed as cleanly separable files,
making auditing and comprehension easier. Things are laid out as
follows:
* noise.[ch], cookie.[ch], messages.h: These implement the bulk of the
cryptographic aspects of the protocol, and are mostly data-only in
nature, taking in buffers of bytes and spitting out buffers of
bytes. They also handle reference counting for their various shared
pieces of data, like keys and key lists.
* ratelimiter.[ch]: Used as an integral part of cookie.[ch] for
ratelimiting certain types of cryptographic operations in accordance
with particular WireGuard semantics.
* allowedips.[ch], peerlookup.[ch]: The main lookup structures of
WireGuard, the former being trie-like with particular semantics, an
integral part of the design of the protocol, and the latter just
being nice helper functions around the various hashtables we use.
* device.[ch]: Implementation of functions for the netdevice and for
rtnl, responsible for maintaining the life of a given interface and
wiring it up to the rest of WireGuard.
* peer.[ch]: Each interface has a list of peers, with helper functions
available here for creation, destruction, and reference counting.
* socket.[ch]: Implementation of functions related to udp_socket and
the general set of kernel socket APIs, for sending and receiving
ciphertext UDP packets, and taking care of WireGuard-specific sticky
socket routing semantics for the automatic roaming.
* netlink.[ch]: Userspace API entry point for configuring WireGuard
peers and devices. The API has been implemented by several userspace
tools and network management utilities, and the WireGuard project
distributes the basic wg(8) tool.
* queueing.[ch]: Shared function on the rx and tx path for handling
the various queues used in the multicore algorithms.
* send.c: Handles encrypting outgoing packets in parallel on
multiple cores, before sending them in order on a single core, via
workqueues and ring buffers. Also handles sending handshake and cookie
messages as part of the protocol, in parallel.
* receive.c: Handles decrypting incoming packets in parallel on
multiple cores, before passing them off in order to be ingested via
the rest of the networking subsystem with GRO via the typical NAPI
poll function. Also handles receiving handshake and cookie messages
as part of the protocol, in parallel.
* timers.[ch]: Uses the timer wheel to implement protocol particular
event timeouts, and gives a set of very simple event-driven entry
point functions for callers.
* main.c, version.h: Initialization and deinitialization of the module.
* selftest/*.h: Runtime unit tests for some of the most security
sensitive functions.
* tools/testing/selftests/wireguard/netns.sh: Aforementioned testing
script using network namespaces.
This commit aims to be as self-contained as possible, implementing
WireGuard as a standalone module not needing much special handling or
coordination from the network subsystem. I expect for future
optimizations to the network stack to positively improve WireGuard, and
vice-versa, but for the time being, this exists as intentionally
standalone.
We introduce a menu option for CONFIG_WIREGUARD, as well as providing a
verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: David Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
[Jason: ported to 4.19 by doing the following:
- wg_get_device_start uses genl_family_attrbuf
- skb_probe_transport_header has an extra argument
- NLA_EXACT/MIN_LEN is not there yet
- nla policy is per verb not family
- totalram_pages isn't a function
- __kernel_timespec -> __uapi_kernel_timespec]
(cherry picked from commit e7096c131e5161fa3b8e52a650d7719d2857adfd)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I04cd661a4cfec9b9fa64c3ab0ea39e4e2352fa13
Somewhere in all the patchsets before, this cleanup got lost.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20190624091539.13512-1-Jason@zx2c4.com
(cherry picked from commit d48e0cd8fcaf314175a15d3076d7a1e71bd4e628)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4986ad53fbb6b2ddc99627c2ded37327c6f66af6
This further unifies the accessors for the fast and coarse functions, so
that the same types of functions are available for each. There was also
a bit of confusion with the documentation, which prior advertised a
function that has never existed. Finally, the vanilla ktime_get_coarse()
was omitted from the API originally, so this fills in that oversight.
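For instance, the plain coarse accessor is now available alongside the
rest of the family (illustrative usage):

    /* Coarse monotonic time: cheaper than ktime_get(), tick granularity. */
    ktime_t now = ktime_get_coarse();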
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20190621203249.3909-3-Jason@zx2c4.com
(cherry picked from commit 4c54294d01e605a9f992361b924c5d8b12822a6d)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6beab35b160e85c9f473c86d248af5ae962f9ac2
This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. We don't want to pollute the non-driver path, though,
so we introduce this as a helper to be called by places that actually
make use of this, as suggested by Florian.
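Typical intended usage from a driver's transmit error path (sketch;
surrounding error handling omitted):

    /* Like icmp_send(), but first reverses any conntrack NAT on the packet,
     * so the error goes back to the original source address. */
    icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);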
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0b41713b606694257b90d61ba7e2712d8457648b)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic11fe236d0c404a389017a713b972954ae067439
x86_64 zero extends 32bit operations, so for 64bit operands,
XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids
a REX prefix byte when legacy registers are used.
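For example, zeroing a legacy register (illustrative, not a hunk from
the patch):

    xorq %rax, %rax         /* 3 bytes: REX.W prefix + opcode + ModRM */
    xorl %eax, %eax         /* 2 bytes: opcode + ModRM, same effect on the full %rax */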
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
(cherry picked from commit 7dfd1e01b3dfc13431b1b25720cf2692a7e111ef)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ice946da2856551122eca8b589323ecc7b3b311af