kernel-fxtec-pro1x

Matt Fleming 50db905f00 x86/asm/64: Align start of __clear_user() loop to 16-bytes	2020-06-30 23:17:16 -04:00

commit bb5570ad3b54e7930997aec76ab68256d5236d94 upstream.

x86 CPUs can suffer severe performance drops if a tight loop, such as the ones in __clear_user(), straddles a 16-byte instruction fetch window, or worse, a 64-byte cacheline. This issue was discovered in the SUSE kernel with the following commit,

  `1153933703` ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

which increased the code object size from 10 bytes to 15 bytes and caused the 8-byte copy loop in __clear_user() to be split across a 64-byte cacheline.

Aligning the start of the loop to 16 bytes makes it fit neatly inside a single instruction fetch window again and restores the performance of __clear_user(), which is used heavily when reading from /dev/zero.

Here are some numbers from running libmicro's read_z* and pread_z* microbenchmarks, which read from /dev/zero:

Zen 1 (Naples)

libmicro-file
                                 5.7.0-rc6           5.7.0-rc6           5.7.0-rc6
                                          revert-1153933703d9+            align16+
Time mean95-pread_z100k   9.9195 (  0.00%)    5.9856 ( 39.66%)    5.9938 ( 39.58%)
Time mean95-pread_z10k    1.1378 (  0.00%)    0.7450 ( 34.52%)    0.7467 ( 34.38%)
Time mean95-pread_z1k     0.2623 (  0.00%)    0.2251 ( 14.18%)    0.2252 ( 14.15%)
Time mean95-pread_zw100k  9.9974 (  0.00%)    6.0648 ( 39.34%)    6.0756 ( 39.23%)
Time mean95-read_z100k    9.8940 (  0.00%)    5.9885 ( 39.47%)    5.9994 ( 39.36%)
Time mean95-read_z10k     1.1394 (  0.00%)    0.7483 ( 34.33%)    0.7482 ( 34.33%)

Note that this doesn't affect the Haswell or Broadwell microarchitectures, which seem to avoid the alignment issue by executing the loop straight out of the Loop Stream Detector (verified using perf events).

Fixes: `1153933703` ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.19+
Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
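
The alignment argument in the commit message above comes down to simple address arithmetic. The following is a minimal sketch in C, not code from the kernel: `straddles()` and `align_up()` are hypothetical helper names, and the loop offset and loop size used in the usage note are made-up illustrative values, since the commit message does not state the actual addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns true if the byte
 * range [start, start + len) crosses a boundary that is a multiple of
 * `window`. `window` must be a power of two, e.g. 16 for an instruction
 * fetch window or 64 for a cacheline.
 */
static bool straddles(uint64_t start, uint64_t len, uint64_t window)
{
	assert(len > 0 && window > 0 && (window & (window - 1)) == 0);
	/* First and last byte fall into different windows. */
	return (start / window) != ((start + len - 1) / window);
}

/*
 * Hypothetical helper: round `addr` up to the next multiple of `align`
 * (a power of two). This is the effect an assembler alignment directive
 * has on the address of the code that follows it.
 */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align > 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

As an illustration under these assumptions, a loop body of 8 bytes starting at offset 60 gives `straddles(60, 8, 64) == true`, because it spans bytes 60 to 67 and so crosses the 64-byte boundary. After `align_up(60, 16)` moves the start to 64, the same loop no longer crosses either a 16-byte or a 64-byte boundary. Any loop body of at most 16 bytes that starts on a 16-byte boundary fits in one fetch window, which is the property the commit relies on.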
..
.gitignore
atomic64_32.c
atomic64_386_32.S
atomic64_cx8_32.S
cache-smp.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
checksum_32.S	x86/retpoline/checksum32: Convert assembler indirect jumps	2018-01-12 00:14:31 +01:00
clear_page_64.S	x86/asm: Trim clear_page.S includes	2018-02-13 17:37:07 +01:00
cmdline.c	x86/boot: Add early cmdline parsing for options with arguments	2017-07-18 11:38:06 +02:00
cmpxchg8b_emu.S	x86: move exports to actual definitions	2016-08-07 23:47:15 -04:00
cmpxchg16b_emu.S
copy_page_64.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
copy_user_64.S	x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings	2017-06-30 09:52:51 +02:00
cpu.c	x86/lib/cpu: Address missing prototypes warning	2019-08-29 08:28:45 +02:00
csum-copy_64.S	x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()	2017-05-05 07:59:24 +02:00
csum-partial_64.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
csum-wrappers_64.c
delay.c	x86/asm: Fix MWAITX C-state hint value	2019-10-17 13:45:43 -07:00
error-inject.c	x86/error_inject: Make just_return_func() globally visible	2018-02-13 14:33:35 +01:00
getuser.S	x86/get_user: Use pointer masking to limit speculation	2018-01-30 21:54:31 +01:00
hweight.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
inat.c
insn-eval.c	x86/insn-eval: Fix use-after-free access to LDT entry	2019-06-11 12:20:52 +02:00
insn.c
iomap_copy_64.S
kaslr.c	x86/kaslr: Fix incorrect i8254 outb() parameters	2019-01-31 08:14:39 +01:00
Makefile	x86/mm/mem_encrypt: Disable all instrumentation for early SME setup	2019-05-25 18:23:45 +02:00
memcpy_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
memcpy_64.S	x86/uaccess: Fix up the fixup	2019-05-31 06:46:27 -07:00
memmove_64.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
memset_64.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
misc.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mmx_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
msr-reg-export.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
msr-reg.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
msr-smp.c	x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well	2018-03-28 10:34:13 +02:00
msr.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
putuser.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
retpoline.S	Revert "x86/retpoline: Simplify vmexit_fill_RSB()"	2018-02-20 09:38:26 +01:00
rwsem.S	locking/arch, x86: Add __down_read_killable()	2017-10-10 11:50:15 +02:00
string_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
strstr_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
usercopy.c	x86/nmi: Fix NMI uaccess race against CR3 switching	2018-08-31 17:08:22 +02:00
usercopy_32.c	x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec	2018-01-30 21:54:31 +01:00
usercopy_64.c	x86/asm/64: Align start of __clear_user() loop to 16-bytes	2020-06-30 23:17:16 -04:00
x86-opcode-map.txt	x86/decoder: Add TEST opcode to Group3-2	2020-02-24 08:34:50 +01:00