`jiffies` and `jiffies_64` are meant to alias (two different symbols
that share the same address). Most architectures make the symbols alias
to the same address via linker script assignment in their
arch/<arch>/kernel/vmlinux.lds.S:
jiffies = jiffies_64;
which is effectively a definition of jiffies.
jiffies and jiffies_64 are both forward declared for all arch's in:
include/linux/jiffies.h.
jiffies_64 is defined in kernel/time/timer.c for all arch's.
x86_64 was peculiar in that it wasn't doing the above linker script
assignment, but rather was:
1. defining jiffies in arch/x86/kernel/time.c instead via linker script.
2. overriding the symbol jiffies_64 from kernel/time/timer.c in
arch/x86/kernel/vmlinux.lds.s via `jiffies_64 = jiffies;`.
As Fangrui notes:
```
In LLD, symbol assignments in linker scripts override definitions in
object files. GNU ld appears to have the same behavior. It would
probably make sense for LLD to error "duplicate symbol" but GNU ld is
unlikely to adopt for compatibility reasons.
```
So we have an ODR violation (UB), which we seem to have gotten away
with thus far. Where it becomes harmful is when we:
1. Use -fno-semantic-interposition.
As Fangrui notes:
```
Clang after LLVM
commit 5b22bcc2b70d ("[X86][ELF] Prefer to lower MC_GlobalAddress
operands to .Lfoo$local")
defaults to -fno-semantic-interposition similar semantics which help
-fpic/-fPIC code avoid GOT/PLT when the referenced symbol is defined
within the same translation unit. Unlike GCC
-fno-semantic-interposition, Clang emits such relocations referencing
local symbols for non-pic code as well.
```
This causes references to jiffies to refer to `.Ljiffies$local` when
jiffies is defined in the same translation unit. Likewise, references
to jiffies_64 become references to `.Ljiffies_64$local` in translation
units that define jiffies_64. Because these differ from the names
used in the linker script, they will not be rewritten to alias one
another.
Combined with ...
2. Full LTO effectively treats all source files as one translation
unit, causing these local references to be produced everywhere. When
the linker processes the linker script, there are no longer any
references to `jiffies_64` anywhere to replace with `jiffies`. And
thus `.Ljiffies$local` and `.Ljiffies_64$local` no longer alias
at all.
In the process of porting patches enabling Full LTO from arm64 to
x86_64, we observe spooky bugs where the kernel appeared to boot, but
init doesn't get scheduled.
Instead, we can avoid the ODR violation by matching other arch's by
defining jiffies only by linker script. For -fno-semantic-interposition
+ Full LTO, there is no longer a global definition of jiffies for the
compiler to produce a local symbol which the linker script won't ensure
aliases to jiffies_64.
Link: https://github.com/ClangBuiltLinux/linux/issues/852
Fixes: 40747ffa5a ("asmlinkage: Make jiffies visible")
Cc: stable@vger.kernel.org
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Alistair Delva <adelva@google.com>
Suggested-by: Fangrui Song <maskray@google.com>
Debugged-by: Nick Desaulniers <ndesaulniers@google.com>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Bob Haarman <inglorion@google.com>
(am from https://lore.kernel.org/lkml/20200515180544.59824-1-inglorion@google.com/T/#u)
Bug: 155426344
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Change-Id: I6a5d3aeac388a0bc8254b83fa92e12376a04f922
Commit 6545d64a18 ("BACKPORT: arm64: vdso: Substitute gettimeofday()
with C implementation") added an assignment to
CFLAGS_REMOVE_vgettimeofday.o without adjusting the existing one for the
removal of the shadow call stack flags.
Without this patch, the SCS flags do not get filtered:
clang ... -g -fsanitize=shadow-call-stack -Wdeclaration-after-statement
... -c -o arch/arm64/kernel/vdso/.tmp_vgettimeofday.o ...
After this, everything works as it should:
clang ... -g -Wdeclaration-after-statement -Wno-pointer-sign ... -c
-o arch/arm64/kernel/vdso/.tmp_vgettimeofday.o ...
This is an Android specific issue to the backport of the above patch,
this issue will not be present in the mainline version of SCS.
Change-Id: I2ff00e1b407155501ada4c36a961db0b720b80bf
Fixes: 6545d64a18 ("BACKPORT: arm64: vdso: Substitute gettimeofday() with C implementation")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Waking up the waiters accounts for 80+% of the total logging
time, and about 40% of overall read_single_page() with no
signature verification. By throttling it to once every 16ms
we get back all read performance, reduce the waiter's CPU
usage and still leave it enough time to pull the logs out.
Bug: 155996534
Test: adb install megacity.apk & dd from the installed apk
Signed-off-by: Yurii Zubrytskyi <zyy@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I4a118dc226d7ca318cf099ba3e239f0120bb23c2
If an incfs file is created, then the file system is sync'd,on opening
the incfs file inode_set reads the size from the backing file from
within iget5_locked, causing this error.
Test: incfs_test passes, this no longer occurs
Bug: 156413528
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I8939c4afa514d39d251c044d7680cfc69272669e
With a verified file (use incfs_perf to create a verified file), throughput
measured using dd after dropping caches increases from 200M/s to 290M/s
Test: incfs_test passes
Bug: 155996534
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Change-Id: I7abb5ad92e4167f82f3452acc9db322fec8307dd
Trimming non-whitelisted symbols from modules will be useful once we
introduce GKI modules, but in the meantime it causes pain downstream for
out-of-tree modules as vendors will have to whitelist the symbols they
need.
The right solution would be to apply the trimming or not to a module
based on an explicit list of GKI modules, but as we currently don't have
any (nor we have a good way to differentiate them in the kernel build
system), let's simply disable the trimming for all modules altogether,
and get things unblocked for GKI 1.0.
Bug: 148277666
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I2513d85c46eefc62c59883b724b6153dd27b3671
Leaf changes summary: 3 artifacts changed
Changed leaf types summary: 0 leaf type changed
Removed/Changed/Added functions summary: 0 Removed, 2 Changed, 1 Added function
Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable
1 Added function:
[A] 'function void keyslot_manager_set_max_dun_bytes(keyslot_manager*, unsigned int)'
2 functions with some sub-type change:
[C] 'function int blk_crypto_init_key(blk_crypto_key*, const unsigned char*, unsigned int, bool, blk_crypto_mode_num, unsigned int)' at blk-crypto.h:17:1 has some sub-type changes:
parameter 7 of type 'unsigned int' was added
[C] 'function int blk_crypto_start_using_mode(blk_crypto_mode_num, unsigned int, bool, request_queue*)' at blk-crypto.h:24:1 has some sub-type changes:
parameter 5 of type 'request_queue*' was added
parameter 3 of type 'bool' changed:
type name changed from 'bool' to 'unsigned int'
type size changed from 8 to 32 (in bits)
parameter 4 of type 'request_queue*' changed:
entity changed from 'request_queue*' to 'bool'
type size changed from 64 to 8 (in bits)
Bug: 144046242
Bug: 153512828
BUg: 156673772
Change-Id: I774418738cb01631dc4d5b9b85e229441e718266
Signed-off-by: Eric Biggers <ebiggers@google.com>
Make fscrypt set dun_bytes to only what it actually needs, so that it
can make use of inline crypto hardware in more cases.
Bug: 144046242
Bug: 153512828
Change-Id: I36f90ea6b64ef51a9d58ffb069d2cba74965c239
Signed-off-by: Eric Biggers <ebiggers@google.com>
Make dm-default-key set dun_bytes to only what it actually needs, so
that it can make use of inline crypto hardware in more cases.
Bug: 144046242
Bug: 153512828
Change-Id: I338e6444e71be9c7c16ce70172d14a8e05301023
Signed-off-by: Eric Biggers <ebiggers@google.com>
Backport a fix from the v7 inline crypto patchset which ensures that the
block layer knows the number of DUN bytes the inline encryption hardware
supports, so that hardware isn't used when it shouldn't be.
(This unfortunately means introducing some increasing long argument
lists; this was all already fixed up in later versions of the patchset.)
To avoid breaking the KMI for drivers, don't add a dun_bytes argument to
keyslot_manager_create() but rather allow drivers to call
keyslot_manager_set_max_dun_bytes() to override the default. Also,
don't add dun_bytes as a new field in 'struct blk_crypto_key' but rather
pack it into the existing 'hash' field which is for block layer use.
Bug: 144046242
Bug: 153512828
Change-Id: I285f36557fb3eafc5f2f64727ef1740938b59dd7
Signed-off-by: Eric Biggers <ebiggers@google.com>
This reverts commit 7fc2fedd65.
There's actually no need for vendors to override dm-default-key.
Bug: 154836329
Bug: 156673772
Change-Id: Id7fd2b970f0526949934102b9aba48d3a12f579b
Signed-off-by: Eric Biggers <ebiggers@google.com>
commit e08df079b23e2e982df15aa340bfbaf50f297504 upstream.
If the trapping instruction contains a ':', for a memory access through
segment registers for example, the sed substitution will insert the '*'
marker in the middle of the instruction instead of the line address:
2b: 65 48 0f c7 0f cmpxchg16b %gs:*(%rdi) <-- trapping instruction
I started to think I had forgotten some quirk of the assembly syntax
before noticing that it was actually coming from the script. Fix it to
add the address marker at the right place for these instructions:
28: 49 8b 06 mov (%r14),%rax
2b:* 65 48 0f c7 0f cmpxchg16b %gs:(%rdi) <-- trapping instruction
30: 0f 94 c0 sete %al
Fixes: 18ff44b189 ("scripts/decodecode: make faulting insn ptr more robust")
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20200419223653.GA31248@visor
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8dd25a461e4eec7190cb9d66616aceacc5110ad upstream.
When the current frame address (CFA) is stored on the stack (i.e.,
cfa->base == CFI_SP_INDIRECT), objtool neglects to adjust the stack
offset when there are subsequent pushes or pops. This results in bad
ORC data at the end of the ENTER_IRQ_STACK macro, when it puts the
previous stack pointer on the stack and does a subsequent push.
This fixes the following unwinder warning:
WARNING: can't dereference registers at 00000000f0a6bdba for ip interrupt_entry+0x9f/0xa0
Fixes: 627fce1480 ("objtool: Add ORC unwind table generation")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Dave Jones <dsj@fb.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/853d5d691b29e250333332f09b8e27410b2d9924.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c165d57b552aaca607fa5daf3fb524a6efe3c5a3 upstream.
gcc-10 points out that a code path exists where a pointer to a stack
variable may be passed back to the caller:
net/netfilter/nfnetlink_osf.c: In function 'nf_osf_hdr_ctx_init':
cc1: warning: function may return address of local variable [-Wreturn-local-addr]
net/netfilter/nfnetlink_osf.c:171:16: note: declared here
171 | struct tcphdr _tcph;
| ^~~~~
I am not sure whether this can happen in practice, but moving the
variable declaration into the callers avoids the problem.
Fixes: 31a9c29210 ("netfilter: nf_osf: add struct nf_osf_hdr_ctx")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea64d8d6c675c0bb712689b13810301de9d8f77a upstream.
If the UDP header of a local VXLAN endpoint is NAT-ed, and the VXLAN
device has disabled UDP checksums and enabled Tx checksum offloading,
then the skb passed to udp_manip_pkt() has hdr->check == 0 (outer
checksum disabled) and skb->ip_summed == CHECKSUM_PARTIAL (inner packet
checksum offloaded).
Because of the ->ip_summed value, udp_manip_pkt() tries to update the
outer checksum with the new address and port, leading to an invalid
checksum sent on the wire, as the original null checksum obviously
didn't take the old address and port into account.
So, we can't take ->ip_summed into account in udp_manip_pkt(), as it
might not refer to the checksum we're acting on. Instead, we can base
the decision to update the UDP checksum entirely on the value of
hdr->check, because it's null if and only if checksum is disabled:
* A fully computed checksum can't be 0, since a 0 checksum is
represented by the CSUM_MANGLED_0 value instead.
* A partial checksum can't be 0, since the pseudo-header always adds
at least one non-zero value (the UDP protocol type 0x11) and adding
more values to the sum can't make it wrap to 0 as the carry is then
added to the wrapped number.
* A disabled checksum uses the special value 0.
The problem seems to be there from day one, although it was probably
not visible before UDP tunnels were implemented.
Fixes: 5b1158e909 ("[NETFILTER]: Add NAT support for nf_conntrack")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81b67439d147677d844d492fcbd03712ea438f42 upstream.
The following execution path is possible:
fsnotify()
[ realign the stack and store previous SP in R10 ]
<IRQ>
[ only IRET regs saved ]
common_interrupt()
interrupt_entry()
<NMI>
[ full pt_regs saved ]
...
[ unwind stack ]
When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers. So the unwind stops
prematurely.
However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.
Handle this case properly. When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.
Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers. For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between. So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0f81bf26888048100bf017fadf438a5bdffa8d8 upstream.
If the ORC entry type is unknown, nothing else can be done other than
reporting an error. Exit the function instead of breaking out of the
switch statement.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98d0c8ebf77e0ba7c54a9ae05ea588f0e9e3f46e upstream.
If the unwinder is called before the ORC data has been initialized,
orc_find() returns NULL, and it tries to fall back to using frame
pointers. This can cause some unexpected warnings during boot.
Move the 'orc_init' check from orc_find() to __unwind_init(), so that it
doesn't even try to unwind from an uninitialized state.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1d9a2abff66aa8156fbc1493abed468db63ea48 upstream.
When unwinding an inactive task, the ORC unwinder skips the first frame
by default. If both the 'regs' and 'first_frame' parameters of
unwind_start() are NULL, 'state->sp' and 'first_frame' are later
initialized to the same value for an inactive task. Given there is a
"less than or equal to" comparison used at the end of __unwind_start()
for skipping stack frames, the first frame is skipped.
Drop the equal part of the comparison and make the behavior equivalent
to the frame pointer unwinder.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f977df7b7ca45a4ac4b66d30a8931d0434c394b1 upstream.
The LEAQ instruction in rewind_stack_do_exit() moves the stack pointer
directly below the pt_regs at the top of the task stack before calling
do_exit(). Tell the unwinder to expect pt_regs.
Fixes: 8c1f75587a ("x86/entry/64: Add unwind hint annotations")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/68c33e17ae5963854916a46f522624f8e1d264f2.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fb143634a38095b641a3a21220774799772dc4c upstream.
In swapgs_restore_regs_and_return_to_usermode, after the stack is
switched to the trampoline stack, the existing UNWIND_HINT_REGS hint is
no longer valid, which can result in the following ORC unwinder warning:
WARNING: can't dereference registers at 000000003aeb0cdd for ip swapgs_restore_regs_and_return_to_usermode+0x93/0xa0
For full correctness, we could try to add complicated unwind hints so
the unwinder could continue to find the registers, but when when it's
this close to kernel exit, unwind hints aren't really needed anymore and
it's fine to just use an empty hint which tells the unwinder to stop.
For consistency, also move the UNWIND_HINT_EMPTY in
entry_SYSCALL_64_after_hwframe to a similar location.
Fixes: 3e3b9293d3 ("x86/entry/64: Return to userspace from the trampoline stack")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Dave Jones <dsj@fb.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/60ea8f562987ed2d9ace2977502fe481c0d7c9a0.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06a9750edcffa808494d56da939085c35904e618 upstream.
The PUSH_AND_CLEAR_REGS macro zeroes each register immediately after
pushing it. If an NMI or exception hits after a register is cleared,
but before the UNWIND_HINT_REGS annotation, the ORC unwinder will
wrongly think the previous value of the register was zero. This can
confuse the unwinding process and cause it to exit early.
Because ORC is simpler than DWARF, there are a limited number of unwind
annotation states, so it's not possible to add an individual unwind hint
after each push/clear combination. Instead, the register clearing
instructions need to be consolidated and moved to after the
UNWIND_HINT_REGS annotation.
Fixes: 3f01daecd5 ("x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/68fd3d0bc92ae2d62ff7879d15d3684217d51f08.1587808742.git.jpoimboe@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f91a3f7af4186099dd10fa530dd7e0d9c29747d upstream.
batadv_v_ogm_process() invokes batadv_hardif_neigh_get(), which returns
a reference of the neighbor object to "hardif_neigh" with increased
refcount.
When batadv_v_ogm_process() returns, "hardif_neigh" becomes invalid, so
the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling paths of
batadv_v_ogm_process(). When batadv_v_ogm_orig_get() fails to get the
orig node and returns NULL, the refcnt increased by
batadv_hardif_neigh_get() is not decreased, causing a refcnt leak.
Fix this issue by jumping to "out" label when batadv_v_ogm_orig_get()
fails to get the orig node.
Fixes: 9323158ef9 ("batman-adv: OGMv2 - implement originators logic")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6107c5da0fca8b50b4d3215e94d619d38cc4a18c upstream.
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(),
which gets a batadv_hard_iface object from net_dev with increased refcnt
and its reference is assigned to a local pointer 'hard_iface'.
When batadv_store_throughput_override() returns, "hard_iface" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The issue happens in one error path of
batadv_store_throughput_override(). When batadv_parse_throughput()
returns NULL, the refcnt increased by batadv_hardif_get_by_netdev() is
not decreased, causing a refcnt leak.
Fix this issue by jumping to "out" label when batadv_parse_throughput()
returns NULL.
Fixes: 0b5ecc6811 ("batman-adv: add throughput override attribute to hard_ifaces")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f872de8185acf1b48b954ba5bd8f9bc0a0d14016 upstream.
batadv_show_throughput_override() invokes batadv_hardif_get_by_netdev(),
which gets a batadv_hard_iface object from net_dev with increased refcnt
and its reference is assigned to a local pointer 'hard_iface'.
When batadv_show_throughput_override() returns, "hard_iface" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The issue happens in the normal path of
batadv_show_throughput_override(), which forgets to decrease the refcnt
increased by batadv_hardif_get_by_netdev() before the function returns,
causing a refcnt leak.
Fix this issue by calling batadv_hardif_put() before the
batadv_show_throughput_override() returns in the normal path.
Fixes: 0b5ecc6811 ("batman-adv: add throughput override attribute to hard_ifaces")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd0c42c4dea54335967c5a86f15fc064235a2797 upstream.
and change to pseudorandom numbers, as this is a traffic dithering
operation that doesn't need crypto-grade.
The previous code operated in 4 steps:
1. Generate a random byte 0 <= rand_tq <= 255
2. Multiply it by BATADV_TQ_MAX_VALUE - tq
3. Divide by 255 (= BATADV_TQ_MAX_VALUE)
4. Return BATADV_TQ_MAX_VALUE - rand_tq
This would apperar to scale (BATADV_TQ_MAX_VALUE - tq) by a random
value between 0/255 and 255/255.
But! The intermediate value between steps 3 and 4 is stored in a u8
variable. So it's truncated, and most of the time, is less than 255, after
which the division produces 0. Specifically, if tq is odd, the product is
always even, and can never be 255. If tq is even, there's exactly one
random byte value that will produce a product byte of 255.
Thus, the return value is 255 (511/512 of the time) or 254 (1/512
of the time).
If we assume that the truncation is a bug, and the code is meant to scale
the input, a simpler way of looking at it is that it's returning a random
value between tq and BATADV_TQ_MAX_VALUE, inclusive.
Well, we have an optimized function for doing just that.
Fixes: 3c12de9a5c ("batman-adv: network coding - code and transmit packets if possible")
Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on upstream commit f3689e3f17f064fd4cd5f0cb01ae2395c94f39d9.
Save RCX, RDX and RSI to fake outputs to coerce the compiler into
treating them as clobbered. RCX in particular is likely to be reused by
the compiler to dereference the 'struct vcpu_vmx' pointer, which will
result in a null pointer dereference now that RCX is zeroed by the asm
blob.
Tag the asm() blob as volatile to prevent GCC from dropping the blob,
which is possible now that the blob has output values, all of which are
unused.
Upstream commit f3689e3f17f06 ("KVM: VMX: Save RSI to an unused output
in the vCPU-run asm blob") is not a direct equivalent of this patch. As
its shortlog states, it only tagged RSI as clobbered, whereas here RCX
and RDX are also clobbered.
In upstream at the time of the offending commit (b4be98039a in 4.19,
0e0ab73c9a024 upstream), the inline asm blob had previously been moved
to a dedicated helper, __vmx_vcpu_run(). For unrelated reasons,
__vmx_vcpu_run() was put into its own optimization unit, which for all
intents and purposes made it impossible to consume clobbered registers
because RCX, RDX and RSI are volatile and __vmx_vcpu_run() couldn't
itself be inlined. In other words, the bug existed but couldn't be hit.
Similarly, the lack of "volatile" was also a bug in upstream that was
hidden by an unrelated change that exists in upstream but not in 4.19.
In this case, the asm blob also uses ASM_CALL_CONSTRAINT (marks RSP as
being an input/output constraint) in upstream to play nice with objtool
due the blob making a CALL. In 4.19, there is no CALL and thus no
ASM_CALL_CONSTRAINT.
Furthermore, both of the lurking bugs were blasted away in upstream by
commits 5e0781df1899 ("KVM: VMX: Move vCPU-run code to a proper assembly
routine") and fc2ba5a27a1a ("KVM: VMX: Call vCPU-run asm sub-routine
from C and remove clobbering"), i.e. these bugs will never be directly
fixed in upstream.
Reported-by: Tobias Urdin <tobias.urdin@binero.com>
Fixes: b4be98039a ("KVM: VMX: Zero out *all* general purpose registers after VM-Exit")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 051a2d3e59e51ae49fd56aef34e472832897ce46 upstream.
Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the
'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run()
and nested_vmx_check_vmentry_hw(). Using the symbolic name means that
adding/removing an output parameter(s) requires "rewriting" almost all
of the asm blob, which makes it nearly impossible to understand what's
being changed in even the most minor patches.
Opportunistically improve the code comments.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 769acc3656d93aaacada814939743361d284fd87 upstream.
Check the return value of gasket_get_bar_index function as it can return
a negative one (-EINVAL). If this happens, a negative index is used in
the "gasket_dev->bar_data" array.
Addresses-Coverity-ID: 1438542 ("Negative array index read")
Fixes: 9a69f5087c ("drivers/staging: Gasket driver framework + Apex driver")
Signed-off-by: Oscar Carter <oscar.carter@gmx.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Richard Yeh <rcy@google.com>
Link: https://lore.kernel.org/r/20200501155118.13380-1-oscar.carter@gmx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e84fe99b68ce353c37ceeecc95dce9696c976556 upstream.
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: __pageblock_pfn_to_page+0x134/0x1c0
Call Trace:
set_zone_contiguous+0x56/0x70
page_alloc_init_late+0x166/0x176
kernel_init_freeable+0xfa/0x255
kernel_init+0xa/0x106
ret_from_fork+0x35/0x40
The issue becomes visible when having a lot of memory (e.g., 4TB)
assigned to a single NUMA node - a system that can easily be created
using QEMU. Inside VMs on a hypervisor with quite some memory
overcommit, this is fairly easy to trigger.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 027d0c7101f50cf03aeea9eebf484afd4920c8d3 upstream.
The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may
pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL:
| CC arch/arm64/mm/pageattr.o
| CC arch/arm64/mm/hugetlbpage.o
| from arch/arm64/mm/hugetlbpage.c:10:
| arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’:
| ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference]
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
| |arch/arm64/mm/hugetlbpage.c:232:10:
| |./arch/arm64/include/asm/pgtable-types.h:28:24:
| ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’
| arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’
This can only occur when the kernel cannot allocate a page, and so is
unlikely to happen in practice before other systems start failing.
We can avoid this by bailing out if pmd_alloc() fails, as we do earlier
in the function if pud_alloc() fails.
Fixes: 66b3923a1a ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Kyrill Tkachov <kyrylo.tkachov@arm.com>
Cc: <stable@vger.kernel.org> # 4.5.x-
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0225fd5e0a6a32af7af0aefac45c8ebf19dc5183 upstream.
In the unlikely event that a 32bit vcpu traps into the hypervisor
on an instruction that is located right at the end of the 32bit
range, the emulation of that instruction is going to increment
PC past the 32bit range. This isn't great, as userspace can then
observe this value and get a bit confused.
Conversly, userspace can do things like (in the context of a 64bit
guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0,
set PC to a 64bit value, change PSTATE to AArch32-USR, and observe
that PC hasn't been truncated. More confusion.
Fix both by:
- truncating PC increments for 32bit guests
- sanitizing all 32bit regs every time a core reg is changed by
userspace, and that PSTATE indicates a 32bit mode.
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c32ca5dc6d00012f0c964e5fdd7042fcc71efb1 upstream.
When deciding whether a guest has to be stopped we check whether this
is a private interrupt or not. Unfortunately, there's an off-by-one bug
here, and we fail to recognize a whole range of interrupts as being
global (GICv2 SPIs 32-63).
Fix the condition from > to be >=.
Cc: stable@vger.kernel.org
Fixes: abd7229626 ("KVM: arm/arm64: Simplify active_change_prepare and plug race")
Reported-by: André Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11f5efc3ab66284f7aaacc926e9351d658e2577b upstream.
x86_64 lazily maps in the vmalloc pages, and the way this works with per_cpu
areas can be complex, to say the least. Mappings may happen at boot up, and
if nothing synchronizes the page tables, those page mappings may not be
synced till they are used. This causes issues for anything that might touch
one of those mappings in the path of the page fault handler. When one of
those unmapped mappings is touched in the page fault handler, it will cause
another page fault, which in turn will cause a page fault, and leave us in
a loop of page faults.
Commit 763802b53a42 ("x86/mm: split vmalloc_sync_all()") split
vmalloc_sync_all() into vmalloc_sync_unmappings() and
vmalloc_sync_mappings(), as on system exit, it did not need to do a full
sync on x86_64 (although it still needed to be done on x86_32). By chance,
the vmalloc_sync_all() would synchronize the page mappings done at boot up
and prevent the per cpu area from being a problem for tracing in the page
fault handler. But when that synchronization in the exit of a task became a
nop, it caused the problem to appear.
Link: https://lore.kernel.org/r/20200429054857.66e8e333@oasis.local.home
Cc: stable@vger.kernel.org
Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Reported-by: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
Suggested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9b3c610a05c1cdf8e959a6d89c38807ff758ee6 upstream.
We must not process packets shorter than a packet ID
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-and-tested-by: syzbot+d29e9263e13ce0b9f4fd@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ed08faded1da03eb3def61502b27f81aef2e615 upstream.
The syzbot fuzzer discovered a bad race between in the usbhid driver
between usbhid_stop() and usbhid_close(). In particular,
usbhid_stop() does:
usb_free_urb(usbhid->urbin);
...
usbhid->urbin = NULL; /* don't mess up next start */
and usbhid_close() does:
usb_kill_urb(usbhid->urbin);
with no mutual exclusion. If the two routines happen to run
concurrently so that usb_kill_urb() is called in between the
usb_free_urb() and the NULL assignment, it will access the
deallocated urb structure -- a use-after-free bug.
This patch adds a mutex to the usbhid private structure and uses it to
enforce mutual exclusion of the usbhid_start(), usbhid_stop(),
usbhid_open() and usbhid_close() callbacks.
Reported-and-tested-by: syzbot+7bf5a7b0f0a1f9446f4c@syzkaller.appspotmail.com
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: <stable@vger.kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 145cb2f7177d94bc54563ed26027e952ee0ae03c upstream.
When we start shutdown in sctp_sf_do_dupcook_a(), we want to bundle
the SHUTDOWN with the COOKIE-ACK to ensure that the peer receives them
at the same time and in the correct order. This bundling was broken by
commit 4ff40b86262b ("sctp: set chunk transport correctly when it's a
new asoc"), which assigns a transport for the COOKIE-ACK, but not for
the SHUTDOWN.
Fix this by passing a reference to the COOKIE-ACK chunk as an argument
to sctp_sf_do_9_2_start_shutdown() and onward to
sctp_make_shutdown(). This way the SHUTDOWN chunk is assigned the same
transport as the COOKIE-ACK chunk, which allows them to be bundled.
In sctp_sf_do_9_2_start_shutdown(), the void *arg parameter was
previously unused. Now that we're taking it into use, it must be a
valid pointer to a chunk, or NULL. There is only one call site where
it's not, in sctp_sf_autoclose_timer_expire(). Fix that too.
Fixes: 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 778fbf4179991e7652e97d7f1ca1f657ef828422 upstream.
We've recently switched from extracting the value of HID_DG_CONTACTMAX
at a fixed offset (which may not be correct for all tablets) to
injecting the report into the driver for the generic codepath to handle.
Unfortunately, this change was made for *all* tablets, even those which
aren't generic. Because `wacom_wac_report` ignores reports from non-
generic devices, the contact count never gets initialized. Ultimately
this results in the touch device itself failing to probe, and thus the
loss of touch input.
This commit adds back the fixed-offset extraction for non-generic devices.
Link: https://github.com/linuxwacom/input-wacom/issues/155
Fixes: 184eccd40389 ("HID: wacom: generic: read HID_DG_CONTACTMAX from any feature report")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
CC: stable@vger.kernel.org # 5.3+
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9274124f023b5c56dc4326637d4f787968b03607 ]
Syzkaller again found a path to a kernel crash through bad gso input:
a packet with transport header extending beyond skb_headlen(skb).
Tighten validation at kernel entry:
- Verify that the transport header lies within the linear section.
To avoid pulling linux/tcp.h, verify just sizeof tcphdr.
tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use.
- Match the gso_type against the ip_proto found by the flow dissector.
Fixes: bfd5f4a3d6 ("packet: Add GSO/csum offload support.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c71c4e49afe173823a2a85b0cabc9b3f1176ffa2 ]
Fix the logic that sets the enable/disable flag for the source MAC
filter according to firmware spec 1.7.1.
In the original firmware spec. before 1.7.1, the VF spoof check flags
were not latched after making the HWRM_FUNC_CFG call, so there was a
need to keep the func_flags so that subsequent calls would perserve
the VF spoof check setting. A change was made in the 1.7.1 spec
so that the flags became latched. So we now set or clear the anti-
spoof setting directly without retrieving the old settings in the
stored vf->func_flags which are no longer valid. We also remove the
unneeded vf->func_flags.
Fixes: 8eb992e876 ("bnxt_en: Update firmware interface spec to 1.7.6.2.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bae361c54fb6ac6eba3b4762f49ce14beb73ef13 ]
Improve the slot reset sequence by disabling the device to prevent bad
DMAs if slot reset fails. Return the proper result instead of always
PCI_ERS_RESULT_RECOVERED to the caller.
Fixes: 6316ea6db9 ("bnxt_en: Enable AER support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cece6f432cca9f18900463ed01b97a152a03600a ]
Processing commands by cmd_work_handler() while already in Internal
Error State will result in entry leak, since the handler process force
completion without doorbell. Forced completion doesn't release the entry
and event completion will never arrive, so entry should be released.
Fixes: 73dd3a4839 ("net/mlx5: Avoid using pending command interface slots")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>