kernel-fxtec-pro1x/arch/x86/lib
Matt Fleming 50db905f00 x86/asm/64: Align start of __clear_user() loop to 16-bytes
commit bb5570ad3b54e7930997aec76ab68256d5236d94 upstream.

x86 CPUs can suffer severe performance drops if a tight loop, such as
the ones in __clear_user(), straddles a 16-byte instruction fetch
window, or worse, a 64-byte cacheline. This issues was discovered in the
SUSE kernel with the following commit,

  1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

which increased the code object size from 10 bytes to 15 bytes and
caused the 8-byte copy loop in __clear_user() to be split across a
64-byte cacheline.

Aligning the start of the loop to 16-bytes makes this fit neatly inside
a single instruction fetch window again and restores the performance of
__clear_user() which is used heavily when reading from /dev/zero.

Here are some numbers from running libmicro's read_z* and pread_z*
microbenchmarks which read from /dev/zero:

  Zen 1 (Naples)

  libmicro-file
                                        5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                                    revert-1153933703d9+               align16+
  Time mean95-pread_z100k       9.9195 (   0.00%)      5.9856 (  39.66%)      5.9938 (  39.58%)
  Time mean95-pread_z10k        1.1378 (   0.00%)      0.7450 (  34.52%)      0.7467 (  34.38%)
  Time mean95-pread_z1k         0.2623 (   0.00%)      0.2251 (  14.18%)      0.2252 (  14.15%)
  Time mean95-pread_zw100k      9.9974 (   0.00%)      6.0648 (  39.34%)      6.0756 (  39.23%)
  Time mean95-read_z100k        9.8940 (   0.00%)      5.9885 (  39.47%)      5.9994 (  39.36%)
  Time mean95-read_z10k         1.1394 (   0.00%)      0.7483 (  34.33%)      0.7482 (  34.33%)

Note that this doesn't affect Haswell or Broadwell microarchitectures
which seem to avoid the alignment issue by executing the loop straight
out of the Loop Stream Detector (verified using perf events).

Fixes: 1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.19+
Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30 23:17:16 -04:00
..
.gitignore
atomic64_32.c
atomic64_386_32.S
atomic64_cx8_32.S
cache-smp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
checksum_32.S x86/retpoline/checksum32: Convert assembler indirect jumps 2018-01-12 00:14:31 +01:00
clear_page_64.S x86/asm: Trim clear_page.S includes 2018-02-13 17:37:07 +01:00
cmdline.c x86/boot: Add early cmdline parsing for options with arguments 2017-07-18 11:38:06 +02:00
cmpxchg8b_emu.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
cmpxchg16b_emu.S
copy_page_64.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
copy_user_64.S x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings 2017-06-30 09:52:51 +02:00
cpu.c x86/lib/cpu: Address missing prototypes warning 2019-08-29 08:28:45 +02:00
csum-copy_64.S x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() 2017-05-05 07:59:24 +02:00
csum-partial_64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
csum-wrappers_64.c
delay.c x86/asm: Fix MWAITX C-state hint value 2019-10-17 13:45:43 -07:00
error-inject.c x86/error_inject: Make just_return_func() globally visible 2018-02-13 14:33:35 +01:00
getuser.S x86/get_user: Use pointer masking to limit speculation 2018-01-30 21:54:31 +01:00
hweight.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
inat.c
insn-eval.c x86/insn-eval: Fix use-after-free access to LDT entry 2019-06-11 12:20:52 +02:00
insn.c
iomap_copy_64.S
kaslr.c x86/kaslr: Fix incorrect i8254 outb() parameters 2019-01-31 08:14:39 +01:00
Makefile x86/mm/mem_encrypt: Disable all instrumentation for early SME setup 2019-05-25 18:23:45 +02:00
memcpy_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memcpy_64.S x86/uaccess: Fix up the fixup 2019-05-31 06:46:27 -07:00
memmove_64.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memset_64.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
misc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mmx_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-reg-export.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-reg.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-smp.c x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well 2018-03-28 10:34:13 +02:00
msr.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
putuser.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
retpoline.S Revert "x86/retpoline: Simplify vmexit_fill_RSB()" 2018-02-20 09:38:26 +01:00
rwsem.S locking/arch, x86: Add __down_read_killable() 2017-10-10 11:50:15 +02:00
string_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
strstr_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
usercopy.c x86/nmi: Fix NMI uaccess race against CR3 switching 2018-08-31 17:08:22 +02:00
usercopy_32.c x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec 2018-01-30 21:54:31 +01:00
usercopy_64.c x86/asm/64: Align start of __clear_user() loop to 16-bytes 2020-06-30 23:17:16 -04:00
x86-opcode-map.txt x86/decoder: Add TEST opcode to Group3-2 2020-02-24 08:34:50 +01:00