kernel-fxtec-pro1x/arch
Cristian Marussi d59848cab4 arm64: smp: fix crash_smp_send_stop() behaviour
commit f50b7dacccbab2b9e3ef18f52a6dcc18ed2050b9 upstream.

On a system configured to trigger a crash_kexec() reboot, when only one CPU
is online and another CPU panics while starting-up, crash_smp_send_stop()
will fail to send any STOP message to the other already online core,
resulting in fail to freeze and registers not properly saved.

Moreover even if the proper messages are sent (case CPUs > 2)
it will similarly fail to account for the booting CPU when executing
the final stop wait-loop, so potentially resulting in some CPU not
been waited for shutdown before rebooting.

A tangible effect of this behaviour can be observed when, after a panic
with kexec enabled and loaded, on the following reboot triggered by kexec,
the cpu that could not be successfully stopped fails to come back online:

[  362.291022] ------------[ cut here ]------------
[  362.291525] kernel BUG at arch/arm64/kernel/cpufeature.c:886!
[  362.292023] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[  362.292400] Modules linked in:
[  362.292970] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.6.0-rc4-00003-gc780b890948a #105
[  362.293136] Hardware name: Foundation-v8A (DT)
[  362.293382] pstate: 200001c5 (nzCv dAIF -PAN -UAO)
[  362.294063] pc : has_cpuid_feature+0xf0/0x348
[  362.294177] lr : verify_local_elf_hwcaps+0x84/0xe8
[  362.294280] sp : ffff800011b1bf60
[  362.294362] x29: ffff800011b1bf60 x28: 0000000000000000
[  362.294534] x27: 0000000000000000 x26: 0000000000000000
[  362.294631] x25: 0000000000000000 x24: ffff80001189a25c
[  362.294718] x23: 0000000000000000 x22: 0000000000000000
[  362.294803] x21: ffff8000114aa018 x20: ffff800011156a00
[  362.294897] x19: ffff800010c944a0 x18: 0000000000000004
[  362.294987] x17: 0000000000000000 x16: 0000000000000000
[  362.295073] x15: 00004e53b831ae3c x14: 00004e53b831ae3c
[  362.295165] x13: 0000000000000384 x12: 0000000000000000
[  362.295251] x11: 0000000000000000 x10: 00400032b5503510
[  362.295334] x9 : 0000000000000000 x8 : ffff800010c7e204
[  362.295426] x7 : 00000000410fd0f0 x6 : 0000000000000001
[  362.295508] x5 : 00000000410fd0f0 x4 : 0000000000000000
[  362.295592] x3 : 0000000000000000 x2 : ffff8000100939d8
[  362.295683] x1 : 0000000000180420 x0 : 0000000000180480
[  362.296011] Call trace:
[  362.296257]  has_cpuid_feature+0xf0/0x348
[  362.296350]  verify_local_elf_hwcaps+0x84/0xe8
[  362.296424]  check_local_cpu_capabilities+0x44/0x128
[  362.296497]  secondary_start_kernel+0xf4/0x188
[  362.296998] Code: 52805001 72a00301 6b01001f 54000ec0 (d4210000)
[  362.298652] SMP: stopping secondary CPUs
[  362.300615] Starting crashdump kernel...
[  362.301168] Bye!
[    0.000000] Booting Linux on physical CPU 0x0000000003 [0x410fd0f0]
[    0.000000] Linux version 5.6.0-rc4-00003-gc780b890948a (crimar01@e120937-lin) (gcc version 8.3.0 (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36))) #105 SMP PREEMPT Fri Mar 6 17:00:42 GMT 2020
[    0.000000] Machine model: Foundation-v8A
[    0.000000] earlycon: pl11 at MMIO 0x000000001c090000 (options '')
[    0.000000] printk: bootconsole [pl11] enabled
.....
[    0.138024] rcu: Hierarchical SRCU implementation.
[    0.153472] its@2f020000: unable to locate ITS domain
[    0.154078] its@2f020000: Unable to locate ITS domain
[    0.157541] EFI services will not be available.
[    0.175395] smp: Bringing up secondary CPUs ...
[    0.209182] psci: failed to boot CPU1 (-22)
[    0.209377] CPU1: failed to boot: -22
[    0.274598] Detected PIPT I-cache on CPU2
[    0.278707] GICv3: CPU2: found redistributor 1 region 0:0x000000002f120000
[    0.285212] CPU2: Booted secondary processor 0x0000000001 [0x410fd0f0]
[    0.369053] Detected PIPT I-cache on CPU3
[    0.372947] GICv3: CPU3: found redistributor 2 region 0:0x000000002f140000
[    0.378664] CPU3: Booted secondary processor 0x0000000002 [0x410fd0f0]
[    0.401707] smp: Brought up 1 node, 3 CPUs
[    0.404057] SMP: Total of 3 processors activated.

Make crash_smp_send_stop() account also for the online status of the
calling CPU while evaluating how many CPUs are effectively online: this way
the right number of STOPs is sent and all other stopped-cores's registers
are properly saved.

Fixes: 78fd584cde ("arm64: kdump: implement machine_crash_shutdown()")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 08:06:15 +01:00
..
alpha alpha: Fix Eiger NR_IRQS to 128 2019-02-20 10:25:47 +01:00
arc ARC: define __ALIGN_STR and __ALIGN symbols for ARC 2020-03-18 07:14:21 +01:00
arm ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes 2020-03-25 08:06:06 +01:00
arm64 arm64: smp: fix crash_smp_send_stop() behaviour 2020-03-25 08:06:15 +01:00
c6x
h8300 h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux- 2019-04-05 22:32:55 +02:00
hexagon hexagon: work around compiler crash 2020-01-17 19:47:17 +01:00
ia64 mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
m68k m68k: Call timer_interrupt() with interrupts disabled 2020-01-27 14:51:23 +01:00
microblaze microblaze: Prevent the overflow of the start 2020-02-24 08:34:53 +01:00
mips MIPS: VPE: Fix a double free and a memory leak in 'release_vpe()' 2020-03-05 16:42:19 +01:00
nds32 nds32: Fix the items of hwcap_str ordering issue. 2019-12-13 08:51:35 +01:00
nios2 nios2: ksyms: Add missing symbol exports 2020-01-27 14:50:30 +01:00
openrisc openrisc: Fix broken paths to arch/or32 2019-12-05 09:20:40 +01:00
parisc parisc: Use proper printk format for resource_size_t 2020-02-05 14:43:45 +00:00
powerpc powerpc: Include .BTF section 2020-03-25 08:06:05 +01:00
riscv riscv: avoid the PIC offset of static percpu data in module beyond 2G limits 2020-03-25 08:06:07 +01:00
s390 s390/qdio: fill SL with absolute addresses 2020-03-11 14:14:54 +01:00
sh pinctrl: sh-pfc: sh7269: Fix CAN function GPIOs 2020-02-24 08:34:44 +01:00
sparc sparc: Add .exit.data section. 2020-02-24 08:34:37 +01:00
um um: Fix off by one error in IRQ enumeration 2020-01-27 14:51:13 +01:00
unicore32
x86 x86/mm: split vmalloc_sync_all() 2020-03-25 08:06:13 +01:00
xtensa xtensa: fix TLB sanity checker 2019-12-21 10:57:25 +01:00
.gitignore
Kconfig jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00