Commit graph

13222 commits

Author SHA1 Message Date
Greg Kroah-Hartman
d8a623cfbb This is the 4.19.80 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2o0vkACgkQONu9yGCS
 aT50rQ//So/ShGy5+D6cmsPVbd8p/9X0fM5D6VjgJ2pj+HbalxW7JhJ0tyVdfkO1
 AS6p/24bmUMWI3pxJBbJjfggbyleV2UFqlHIei+SjfHd0sghZ+E/XY7mLdcKvx0a
 dGdXuxF+enb18pNiUKUrJUgqksgfO40yMXyZhjGf5A+/JmYsNaJhkyCWgPfGIB5i
 YuowOWC4Rkds/CQxyRSRJl+YSQWh8j7Ke0tdjhMQxnmyzFD0wcE5vUtq2vIP7yxO
 ypPrz9zWTase3y1yCcPsXd0m4Ji1VpF99hlgAr0HLFNcZn2kNYFpyblfOrI/zfoK
 RqUJgVVlZaWvhMOrueQO+XbebXdCWTJHiNTUDxIFakEAPNz+XkTQQf6LvzOjcFxB
 oLHnp1qBCn72y6o6Q3XmauFcmYBokyfBvXDAJsXYr+X5B4akn/fpyzzBCInX0h+r
 og5zpLbXDLszG3j76TPSpcjd+t4xqMy+MG+jkH/mLtMAFyaVi2sfMzsZJAQCACr2
 wNHRFQQGjDmQjovkwSRYENQBUuITSj+VrBUD0M7lnfA/Fni3BAJaxY5I6WeEvbeW
 ejzPVFZB8dfEy7hAKKiVEuI8/akcp8MUXfLJOfTxytLIpYllOBoVC4LtjgO5FYu3
 grSbRuowjrlUCtnCU9H18avLE1ScFFEGTmPOqI6xDqKiZf59QsQ=
 =sKVA
 -----END PGP SIGNATURE-----

Merge 4.19.80 into android-4.19

Changes in 4.19.80
	panic: ensure preemption is disabled during panic()
	f2fs: use EINVAL for superblock with invalid magic
	USB: rio500: Remove Rio 500 kernel driver
	USB: yurex: Don't retry on unexpected errors
	USB: yurex: fix NULL-derefs on disconnect
	USB: usb-skeleton: fix runtime PM after driver unbind
	USB: usb-skeleton: fix NULL-deref on disconnect
	xhci: Fix false warning message about wrong bounce buffer write length
	xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
	xhci: Check all endpoints for LPM timeout
	xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts
	usb: xhci: wait for CNR controller not ready bit in xhci resume
	xhci: Prevent deadlock when xhci adapter breaks during init
	xhci: Increase STS_SAVE timeout in xhci_suspend()
	USB: adutux: fix use-after-free on disconnect
	USB: adutux: fix NULL-derefs on disconnect
	USB: adutux: fix use-after-free on release
	USB: iowarrior: fix use-after-free on disconnect
	USB: iowarrior: fix use-after-free on release
	USB: iowarrior: fix use-after-free after driver unbind
	USB: usblp: fix runtime PM after driver unbind
	USB: chaoskey: fix use-after-free on release
	USB: ldusb: fix NULL-derefs on driver unbind
	serial: uartlite: fix exit path null pointer
	USB: serial: keyspan: fix NULL-derefs on open() and write()
	USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
	USB: serial: option: add Telit FN980 compositions
	USB: serial: option: add support for Cinterion CLS8 devices
	USB: serial: fix runtime PM after driver unbind
	USB: usblcd: fix I/O after disconnect
	USB: microtek: fix info-leak at probe
	USB: dummy-hcd: fix power budget for SuperSpeed mode
	usb: renesas_usbhs: gadget: Do not discard queues in usb_ep_set_{halt,wedge}()
	usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior
	USB: legousbtower: fix slab info leak at probe
	USB: legousbtower: fix deadlock on disconnect
	USB: legousbtower: fix potential NULL-deref on disconnect
	USB: legousbtower: fix open after failed reset request
	USB: legousbtower: fix use-after-free on release
	mei: me: add comet point (lake) LP device ids
	mei: avoid FW version request on Ibex Peak and earlier
	gpio: eic: sprd: Fix the incorrect EIC offset when toggling
	Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc
	staging: vt6655: Fix memory leak in vt6655_probe
	iio: adc: hx711: fix bug in sampling of data
	iio: adc: ad799x: fix probe error handling
	iio: adc: axp288: Override TS pin bias current for some models
	iio: light: opt3001: fix mutex unlock race
	efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified
	perf llvm: Don't access out-of-scope array
	perf inject jit: Fix JIT_CODE_MOVE filename
	blk-wbt: fix performance regression in wbt scale_up/scale_down
	CIFS: Gracefully handle QueryInfo errors during open
	CIFS: Force revalidate inode when dentry is stale
	CIFS: Force reval dentry if LOOKUP_REVAL flag is set
	kernel/sysctl.c: do not override max_threads provided by userspace
	mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
	firmware: google: increment VPD key_len properly
	gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source
	iio: adc: stm32-adc: move registers definitions
	iio: adc: stm32-adc: fix a race when using several adcs with dma and irq
	cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic
	btrfs: fix incorrect updating of log root tree
	btrfs: fix uninitialized ret in ref-verify
	NFS: Fix O_DIRECT accounting of number of bytes read/written
	MIPS: Disable Loongson MMI instructions for kernel build
	MIPS: elf_hwcap: Export userspace ASEs
	ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags
	ACPI/PPTT: Add support for ACPI 6.3 thread flag
	arm64: topology: Use PPTT to determine if PE is a thread
	Fix the locking in dcache_readdir() and friends
	media: stkwebcam: fix runtime PM after driver unbind
	arm64/sve: Fix wrong free for task->thread.sve_state
	tracing/hwlat: Report total time spent in all NMIs during the sample
	tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
	ftrace: Get a reference counter for the trace_array on filter files
	tracing: Get trace_array reference for available_tracers files
	hwmon: Fix HWMON_P_MIN_ALARM mask
	x86/asm: Fix MWAITX C-state hint value
	PCI: vmd: Fix config addressing when using bus offsets
	perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
	Linux 4.19.80

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8ef1835585f79b752712a835852a4bc960ae3b97
2019-10-17 15:33:07 -07:00
Dan Carpenter
491a39dcee mm/vmpressure.c: fix a signedness bug in vmpressure_register_event()
commit 518a86713078168acd67cf50bc0b45d54b4cce6c upstream.

The "mode" and "level" variables are enums and in this context GCC will
treat them as unsigned ints so the error handling is never triggered.

I also removed the bogus initializer because it isn't required any more
and it's sort of confusing.

[akpm@linux-foundation.org: reduce implicit and explicit typecasting]
[akpm@linux-foundation.org: fix return value, add comment, per Matthew]
Link: http://lkml.kernel.org/r/20190925110449.GO3264@mwanda
Fixes: 3cadfa2b94 ("mm/vmpressure.c: convert to use match_string() helper")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Enrico Weigelt <info@metux.net>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17 13:45:19 -07:00
Roman Gushchin
91711ef418 UPSTREAM: mm: vmalloc: show number of vmalloc pages in /proc/meminfo
Vmalloc() is getting more and more used these days (kernel stacks, bpf and
percpu allocator are new top users), and the total % of memory consumed by
vmalloc() can be pretty significant and changes dynamically.

/proc/meminfo is the best place to display this information: its top goal
is to show top consumers of the memory.

Since the VmallocUsed field in /proc/meminfo is not in use for quite a
long time (it has been defined to 0 by a5ad88ce8c ("mm: get rid of
'vmalloc_info' from /proc/meminfo")), let's reuse it for showing the
actual physical memory consumption of vmalloc().

Link: http://lkml.kernel.org/r/20190417194002.12369-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 97105f0ab7b877a8ece2005e214894e93793950c)
Signed-off-by: Minchan Kim <minchan@google.com>
Bug: 141257998
Test: "cat /proc/meminfo | grep VmallocUsed" doesn't show 0 any longer
Change-Id: Ibedcd5c1290ba7be3293e220a917434008a30996
2019-10-14 20:24:53 +00:00
Greg Kroah-Hartman
ef55d5261c This is the 4.19.79 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2grBgACgkQONu9yGCS
 aT6xRBAA0pTW2W/VvzBHBLeVlmNtwQZb8x7civVb72iZkltKR9tTPim90PULpz/P
 iO7kh8KqkgVUqdgBE0VzkHGWUSThggfSTQiqzCqOgTwV8WQWqSF8ET0HU8zbglYB
 5pXSojoRYmurGVznd4Ll6aWa5brXIKwf1mDSrFHagOyOLxQmyggHaTRSLx36BSfj
 gunE2ideB1oTaPmd/2aTI03CU3jRwXmowe8rZIDa8pJEpplZPFdk0YOPXg2t6uRI
 bjJGO8bhfR/14r/3h76IwsEiVVXIcCeEVm0fos/H6NUypedfi7jlT0Ldzg1/zZti
 mUMkbPGHcJbOWfBYPQq8xQzviCa+MFraA4Tek5h/Lf7kf3NpjE20AnH3pb9TaqQf
 mJYUGziCoOOOz8k+0eNtIjIZiCysOnf9sI5rGhMYb9qfZoZGG6RiitqyVYNa+rzJ
 wvIUQZ4vSnYmQMAXqxyayfSZvFbMxv6pAdeH0NrXVRgFF6dnKG9TSsCnIuQaJxAE
 OQRaYEJktMUBs81hS0IjnJNDFLW3r++s87xEYvCt4L7XGSrxMJ3jW6xLZlmET68G
 4UIddJ81zIuqpGY1qoWdWZAp3nfRfSX4ehOnoNmIDyC9pRhiCKc+N6j5rX8gBNO/
 SO8YOaNf9RTphhEG6Op7u4ZbU+UR4pYP+rjKveyT2HKPH6D/Tv0=
 =wt6H
 -----END PGP SIGNATURE-----

Merge 4.19.79 into android-4.19

Changes in 4.19.79
	s390/process: avoid potential reading of freed stack
	KVM: s390: Test for bad access register and size at the start of S390_MEM_OP
	s390/topology: avoid firing events before kobjs are created
	s390/cio: exclude subchannels with no parent from pseudo check
	KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts
	KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores
	KVM: PPC: Book3S HV: Don't lose pending doorbell request on migration on P9
	KVM: X86: Fix userspace set invalid CR4
	KVM: nVMX: handle page fault in vmread fix
	nbd: fix max number of supported devs
	PM / devfreq: tegra: Fix kHz to Hz conversion
	ASoC: Define a set of DAPM pre/post-up events
	ASoC: sgtl5000: Improve VAG power and mute control
	powerpc/mce: Fix MCE handling for huge pages
	powerpc/mce: Schedule work from irq_work
	powerpc/powernv: Restrict OPAL symbol map to only be readable by root
	powerpc/powernv/ioda: Fix race in TCE level allocation
	powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions
	can: mcp251x: mcp251x_hw_reset(): allow more time after a reset
	tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file
	crypto: qat - Silence smp_processor_id() warning
	crypto: skcipher - Unmap pages after an external error
	crypto: cavium/zip - Add missing single_release()
	crypto: caam - fix concurrency issue in givencrypt descriptor
	crypto: ccree - account for TEE not ready to report
	crypto: ccree - use the full crypt length value
	MIPS: Treat Loongson Extensions as ASEs
	power: supply: sbs-battery: use correct flags field
	power: supply: sbs-battery: only return health when battery present
	tracing: Make sure variable reference alias has correct var_ref_idx
	usercopy: Avoid HIGHMEM pfn warning
	timer: Read jiffies once when forwarding base clk
	PCI: vmd: Fix shadow offsets to reflect spec changes
	PCI: Restore Resizable BAR size bits correctly for 1MB BARs
	watchdog: imx2_wdt: fix min() calculation in imx2_wdt_set_timeout
	perf stat: Fix a segmentation fault when using repeat forever
	drm/omap: fix max fclk divider for omap36xx
	drm/msm/dsi: Fix return value check for clk_get_parent
	drm/nouveau/kms/nv50-: Don't create MSTMs for eDP connectors
	drm/i915/gvt: update vgpu workload head pointer correctly
	mmc: sdhci: improve ADMA error reporting
	mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
	Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
	xen/xenbus: fix self-deadlock after killing user process
	ieee802154: atusb: fix use-after-free at disconnect
	s390/cio: avoid calling strlen on null pointer
	cfg80211: initialize on-stack chandefs
	arm64: cpufeature: Detect SSBS and advertise to userspace
	ima: always return negative code for error
	ima: fix freeing ongoing ahash_request
	fs: nfs: Fix possible null-pointer dereferences in encode_attrs()
	9p: Transport error uninitialized
	9p: avoid attaching writeback_fid on mmap with type PRIVATE
	xen/pci: reserve MCFG areas earlier
	ceph: fix directories inode i_blkbits initialization
	ceph: reconnect connection if session hang in opening state
	watchdog: aspeed: Add support for AST2600
	netfilter: nf_tables: allow lookups in dynamic sets
	drm/amdgpu: Fix KFD-related kernel oops on Hawaii
	drm/amdgpu: Check for valid number of registers to read
	pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
	pwm: stm32-lp: Add check in case requested period cannot be achieved
	x86/purgatory: Disable the stackleak GCC plugin for the purgatory
	ntb: point to right memory window index
	thermal: Fix use-after-free when unregistering thermal zone device
	thermal_hwmon: Sanitize thermal_zone type
	libnvdimm/region: Initialize bad block for volatile namespaces
	fuse: fix memleak in cuse_channel_open
	libnvdimm/nfit_test: Fix acpi_handle redefinition
	sched/membarrier: Call sync_core only before usermode for same mm
	sched/membarrier: Fix private expedited registration check
	sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
	perf build: Add detection of java-11-openjdk-devel package
	kernel/elfcore.c: include proper prototypes
	perf unwind: Fix libunwind build failure on i386 systems
	nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
	drm/radeon: Bail earlier when radeon.cik_/si_support=0 is passed
	KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP
	KVM: nVMX: Fix consistency check on injected exception error code
	nbd: fix crash when the blksize is zero
	powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()
	powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag
	tools lib traceevent: Do not free tep->cmdlines in add_new_comm() on failure
	tick: broadcast-hrtimer: Fix a race in bc_set_next
	perf tools: Fix segfault in cpu_cache_level__read()
	perf stat: Reset previous counts on repeat with interval
	riscv: Avoid interrupts being erroneously enabled in handle_exception()
	arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
	KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
	arm64: docs: Document SSBS HWCAP
	arm64: fix SSBS sanitization
	arm64: Add sysfs vulnerability show for spectre-v1
	arm64: add sysfs vulnerability show for meltdown
	arm64: enable generic CPU vulnerabilites support
	arm64: Always enable ssb vulnerability detection
	arm64: Provide a command line to disable spectre_v2 mitigation
	arm64: Advertise mitigation of Spectre-v2, or lack thereof
	arm64: Always enable spectre-v2 vulnerability detection
	arm64: add sysfs vulnerability show for spectre-v2
	arm64: add sysfs vulnerability show for speculative store bypass
	arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
	arm64: Force SSBS on context switch
	arm64: Use firmware to detect CPUs that are not affected by Spectre-v2
	arm64/speculation: Support 'mitigations=' cmdline option
	vfs: Fix EOVERFLOW testing in put_compat_statfs64
	coresight: etm4x: Use explicit barriers on enable/disable
	staging: erofs: fix an error handling in erofs_readdir()
	staging: erofs: some compressed cluster should be submitted for corrupted images
	staging: erofs: add two missing erofs_workgroup_put for corrupted images
	staging: erofs: detect potential multiref due to corrupted images
	cfg80211: add and use strongly typed element iteration macros
	cfg80211: Use const more consistently in for_each_element macros
	nl80211: validate beacon head
	Linux 4.19.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie4f85994b5f3e53658c42833d0dc712575d0902e
2019-10-11 19:13:57 +02:00
Kees Cook
12c6c4a50f usercopy: Avoid HIGHMEM pfn warning
commit 314eed30ede02fa925990f535652254b5bad6b65 upstream.

When running on a system with >512MB RAM with a 32-bit kernel built with:

	CONFIG_DEBUG_VIRTUAL=y
	CONFIG_HIGHMEM=y
	CONFIG_HARDENED_USERCOPY=y

all execve()s will fail due to argv copying into kmap()ed pages, and on
usercopy checking the calls ultimately of virt_to_page() will be looking
for "bad" kmap (highmem) pointers due to CONFIG_DEBUG_VIRTUAL=y:

 ------------[ cut here ]------------
 kernel BUG at ../arch/x86/mm/physaddr.c:83!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc8 #6
 Hardware name: Dell Inc. Inspiron 1318/0C236D, BIOS A04 01/15/2009
 EIP: __phys_addr+0xaf/0x100
 ...
 Call Trace:
  __check_object_size+0xaf/0x3c0
  ? __might_sleep+0x80/0xa0
  copy_strings+0x1c2/0x370
  copy_strings_kernel+0x2b/0x40
  __do_execve_file+0x4ca/0x810
  ? kmem_cache_alloc+0x1c7/0x370
  do_execve+0x1b/0x20
  ...

The check is from arch/x86/mm/physaddr.c:

	VIRTUAL_BUG_ON((phys_addr >> PAGE_SHIFT) > max_low_pfn);

Due to the kmap() in fs/exec.c:

		kaddr = kmap(kmapped_page);
	...
	if (copy_from_user(kaddr+offset, str, bytes_to_copy)) ...

Now we can fetch the correct page to avoid the pfn check. In both cases,
hardened usercopy will need to walk the page-span checker (if enabled)
to do sanity checking.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: f5509cc18d ("mm: Hardened usercopy")
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/201909171056.7F2FFD17@keescook
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11 18:20:58 +02:00
Catalin Marinas
13daec4d70 UPSTREAM: mm: untag user pointers in mmap/munmap/mremap/brk
(Upstream commit ce18d171cb7368557e6498a3ce111d7d3dc03e4d).

There isn't a good reason to differentiate between the user address space
layout modification syscalls and the other memory permission/attributes
ones (e.g.  mprotect, madvise) w.r.t.  the tagged address ABI.  Untag the
user addresses on entry to these functions.

Link: http://lkml.kernel.org/r/20190821164730.47450-2-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Dave P Martin <Dave.Martin@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: If6a7be6d5f39f411bd65c49100088d0ceae530ee
2019-10-07 15:27:41 -04:00
Andrey Konovalov
6e651342c6 UPSTREAM: mm: untag user pointers in get_vaddr_frames
(Upstream commit 5d65e7a7d8cd5c77baa1acf129a11b8b45ffee75).

This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

get_vaddr_frames uses provided user pointers for vma lookups, which can
only by done with untagged pointers.  Instead of locating and changing all
callers of this function, perform untagging in it.

Link: http://lkml.kernel.org/r/28f05e49c92b2a69c4703323d6c12208f3d881fe.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I1a8fd78531a7cf299cb41192519857b40e0d2305
2019-10-07 15:27:40 -04:00
Andrey Konovalov
d59b4a6646 UPSTREAM: mm: untag user pointers in mm/gup.c
(Upstream commit f9652594195fca8c3d8b8ee392ad0ff9f701bb20).

This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

mm/gup.c provides a kernel interface that accepts user addresses and
manipulates user pages directly (for example get_user_pages, that is used
by the futex syscall).  Since a user can provided tagged addresses, we
need to handle this case.

Add untagging to gup.c functions that use user addresses for vma lookups.

Link: http://lkml.kernel.org/r/4731bddba3c938658c10ff4ed55cc01c60f4c8f8.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: Icdfc4cf0851cd84bac89908d53ba17c65df40077
2019-10-07 15:27:40 -04:00
Andrey Konovalov
690c4ca8a5 UPSTREAM: mm: untag user pointers passed to memory syscalls
(Upstream commit 057d3389108eda8a20c7f496f011846932680d88).

This patch is a part of a series that extends kernel ABI to allow to pass
tagged user pointers (with the top byte set to something else other than
0x00) as syscall arguments.

This patch allows tagged pointers to be passed to the following memory
syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect,
mremap, msync, munlock, move_pages.

The mmap and mremap syscalls do not currently accept tagged addresses.
Architectures may interpret the tag as a background colour for the
corresponding vma.

Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I1a2d89eedb45e618e85ca515f4c9121460711efb
2019-10-07 15:27:40 -04:00
Greg Kroah-Hartman
7f1f24fed2 This is the 4.19.77 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2YehoACgkQONu9yGCS
 aT536hAA0w6XF1ogPi23xS5BQ386c3BjWaOJddu0PFkR9utguuZaDG5AKjnkHtm2
 foxKeCGyChCc1h6qFD2wLavsQMephzLWXkMw1/uXfwAPXGWY1hquD46zrFO0IYOc
 dsmKh5V8ZT70lmPTBHCDVowp1xUczNFhuNRHz1KzVQ3lsKMAhsRiy6vpsXWzIeOM
 VCrayI4jxTVZvoA9Odm9fTXpau0Qb9k4pqGGU84dz/xWfGzrz/AOmp+kWuZRGepS
 fQi+46HK3idmGGNBVGkJYkDdpkefdrYdjlVHkkoAZHlaB+QaOrre0trNArK+knRu
 jJIl8R/M50yYbkn97vVIpwoh19rV0BI7KUFKO4C4NCByOOV+9vjJ5g1FuZ7c3pmj
 XKZiBmFBJVqbI1uwgqEn+/Tv6DyEz4gUzo5GHOe2i/Bh2nSguHZzO+Yhat1U+J+Y
 3QVfrIS10jICWBAqm147FepGtNxbCPP3plyqbJdrMwgM6GOMd83v+rY/5okB2YK4
 SHhzuevQoxujjeoOgHKjIqTiCIaJMt5ZvlCFqODg8NgpjTNDAbCA/UUQWs7MlrqX
 6uwH2QLwk3h+bEXUfGzsbsk5isTDpHr1ch6SI9T675JHRZCE461RoR82XpjLOyHD
 90tBJk9pfKlLqSEyiaWPPpXErhoS5WCVKM5yuT/rSS/i2Ve4VH4=
 =Q+c9
 -----END PGP SIGNATURE-----

Merge 4.19.77 into android-4.19

Changes in 4.19.77
	arcnet: provide a buffer big enough to actually receive packets
	cdc_ncm: fix divide-by-zero caused by invalid wMaxPacketSize
	macsec: drop skb sk before calling gro_cells_receive
	net/phy: fix DP83865 10 Mbps HDX loopback disable function
	net: qrtr: Stop rx_worker before freeing node
	net/sched: act_sample: don't push mac header on ip6gre ingress
	net_sched: add max len check for TCA_KIND
	nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
	openvswitch: change type of UPCALL_PID attribute to NLA_UNSPEC
	ppp: Fix memory leak in ppp_write
	sch_netem: fix a divide by zero in tabledist()
	skge: fix checksum byte order
	usbnet: ignore endpoints with invalid wMaxPacketSize
	usbnet: sanity checking of packet sizes and device mtu
	net: sched: fix possible crash in tcf_action_destroy()
	tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
	net/mlx5: Add device ID of upcoming BlueField-2
	mISDN: enforce CAP_NET_RAW for raw sockets
	appletalk: enforce CAP_NET_RAW for raw sockets
	ax25: enforce CAP_NET_RAW for raw sockets
	ieee802154: enforce CAP_NET_RAW for raw sockets
	nfc: enforce CAP_NET_RAW for raw sockets
	nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
	ALSA: hda: Flush interrupts on disabling
	regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
	ASoC: tlv320aic31xx: suppress error message for EPROBE_DEFER
	ASoC: sgtl5000: Fix of unmute outputs on probe
	ASoC: sgtl5000: Fix charge pump source assignment
	firmware: qcom_scm: Use proper types for dma mappings
	dmaengine: bcm2835: Print error in case setting DMA mask fails
	leds: leds-lp5562 allow firmware files up to the maximum length
	media: dib0700: fix link error for dibx000_i2c_set_speed
	media: mtk-cir: lower de-glitch counter for rc-mm protocol
	media: exynos4-is: fix leaked of_node references
	media: hdpvr: Add device num check and handling
	media: i2c: ov5640: Check for devm_gpiod_get_optional() error
	time/tick-broadcast: Fix tick_broadcast_offline() lockdep complaint
	sched/fair: Fix imbalance due to CPU affinity
	sched/core: Fix CPU controller for !RT_GROUP_SCHED
	x86/apic: Make apic_pending_intr_clear() more robust
	sched/deadline: Fix bandwidth accounting at all levels after offline migration
	x86/reboot: Always use NMI fallback when shutdown via reboot vector IPI fails
	x86/apic: Soft disable APIC before initializing it
	ALSA: hda - Show the fatal CORB/RIRB error more clearly
	ALSA: i2c: ak4xxx-adda: Fix a possible null pointer dereference in build_adc_controls()
	EDAC/mc: Fix grain_bits calculation
	media: iguanair: add sanity checks
	base: soc: Export soc_device_register/unregister APIs
	ALSA: usb-audio: Skip bSynchAddress endpoint check if it is invalid
	ia64:unwind: fix double free for mod->arch.init_unw_table
	EDAC/altera: Use the proper type for the IRQ status bits
	ASoC: rsnd: don't call clk_get_rate() under atomic context
	arm64/prefetch: fix a -Wtype-limits warning
	md/raid1: end bio when the device faulty
	md: don't call spare_active in md_reap_sync_thread if all member devices can't work
	md: don't set In_sync if array is frozen
	media: media/platform: fsl-viu.c: fix build for MICROBLAZE
	ACPI / processor: don't print errors for processorIDs == 0xff
	loop: Add LOOP_SET_DIRECT_IO to compat ioctl
	EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
	efi: cper: print AER info of PCIe fatal error
	firmware: arm_scmi: Check if platform has released shmem before using
	sched/fair: Use rq_lock/unlock in online_fair_sched_group
	idle: Prevent late-arriving interrupts from disrupting offline
	media: gspca: zero usb_buf on error
	perf config: Honour $PERF_CONFIG env var to specify alternate .perfconfig
	perf test vfs_getname: Disable ~/.perfconfig to get default output
	media: mtk-mdp: fix reference count on old device tree
	media: fdp1: Reduce FCP not found message level to debug
	media: em28xx: modules workqueue not inited for 2nd device
	media: rc: imon: Allow iMON RC protocol for ffdc 7e device
	dmaengine: iop-adma: use correct printk format strings
	perf record: Support aarch64 random socket_id assignment
	media: vsp1: fix memory leak of dl on error return path
	media: i2c: ov5645: Fix power sequence
	media: omap3isp: Don't set streaming state on random subdevs
	media: imx: mipi csi-2: Don't fail if initial state times-out
	net: lpc-enet: fix printk format strings
	m68k: Prevent some compiler warnings in Coldfire builds
	ARM: dts: imx7d: cl-som-imx7: make ethernet work again
	ARM: dts: imx7-colibri: disable HS400
	media: radio/si470x: kill urb on error
	media: hdpvr: add terminating 0 at end of string
	ASoC: uniphier: Fix double reset assersion when transitioning to suspend state
	tools headers: Fixup bitsperlong per arch includes
	ASoC: sun4i-i2s: Don't use the oversample to calculate BCLK
	led: triggers: Fix a memory leak bug
	nbd: add missing config put
	media: mceusb: fix (eliminate) TX IR signal length limit
	media: dvb-frontends: use ida for pll number
	posix-cpu-timers: Sanitize bogus WARNONS
	media: dvb-core: fix a memory leak bug
	libperf: Fix alignment trap with xyarray contents in 'perf stat'
	EDAC/amd64: Recognize DRAM device type ECC capability
	EDAC/amd64: Decode syndrome before translating address
	PM / devfreq: passive: Use non-devm notifiers
	PM / devfreq: exynos-bus: Correct clock enable sequence
	media: cec-notifier: clear cec_adap in cec_notifier_unregister
	media: saa7146: add cleanup in hexium_attach()
	media: cpia2_usb: fix memory leaks
	media: saa7134: fix terminology around saa7134_i2c_eeprom_md7134_gate()
	perf trace beauty ioctl: Fix off-by-one error in cmd->string table
	media: ov9650: add a sanity check
	ASoC: es8316: fix headphone mixer volume table
	ACPI / CPPC: do not require the _PSD method
	sched/cpufreq: Align trace event behavior of fast switching
	x86/apic/vector: Warn when vector space exhaustion breaks affinity
	arm64: kpti: ensure patched kernel text is fetched from PoU
	x86/mm/pti: Do not invoke PTI functions when PTI is disabled
	ASoC: fsl_ssi: Fix clock control issue in master mode
	x86/mm/pti: Handle unaligned address gracefully in pti_clone_pagetable()
	nvmet: fix data units read and written counters in SMART log
	nvme-multipath: fix ana log nsid lookup when nsid is not found
	ALSA: firewire-motu: add support for MOTU 4pre
	iommu/amd: Silence warnings under memory pressure
	libata/ahci: Drop PCS quirk for Denverton and beyond
	iommu/iova: Avoid false sharing on fq_timer_on
	libtraceevent: Change users plugin directory
	ARM: dts: exynos: Mark LDO10 as always-on on Peach Pit/Pi Chromebooks
	ACPI: custom_method: fix memory leaks
	ACPI / PCI: fix acpi_pci_irq_enable() memory leak
	closures: fix a race on wakeup from closure_sync
	hwmon: (acpi_power_meter) Change log level for 'unsafe software power cap'
	md/raid1: fail run raid1 array when active disk less than one
	dmaengine: ti: edma: Do not reset reserved paRAM slots
	kprobes: Prohibit probing on BUG() and WARN() address
	s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding
	x86/cpu: Add Tiger Lake to Intel family
	platform/x86: intel_pmc_core: Do not ioremap RAM
	ASoC: dmaengine: Make the pcm->name equal to pcm->id if the name is not set
	raid5: don't set STRIPE_HANDLE to stripe which is in batch list
	mmc: core: Clarify sdio_irq_pending flag for MMC_CAP2_SDIO_IRQ_NOTHREAD
	mmc: sdhci: Fix incorrect switch to HS mode
	mmc: core: Add helper function to indicate if SDIO IRQs is enabled
	mmc: dw_mmc: Re-store SDIO IRQs mask at system resume
	raid5: don't increment read_errors on EILSEQ return
	libertas: Add missing sentinel at end of if_usb.c fw_table
	e1000e: add workaround for possible stalled packet
	ALSA: hda - Drop unsol event handler for Intel HDMI codecs
	drm/amd/powerplay/smu7: enforce minimal VBITimeout (v2)
	media: ttusb-dec: Fix info-leak in ttusb_dec_send_command()
	ALSA: hda/realtek - Blacklist PC beep for Lenovo ThinkCentre M73/93
	iommu/amd: Override wrong IVRS IOAPIC on Raven Ridge systems
	btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type
	media: omap3isp: Set device on omap3isp subdevs
	PM / devfreq: passive: fix compiler warning
	iwlwifi: fw: don't send GEO_TX_POWER_LIMIT command to FW version 36
	ALSA: firewire-tascam: handle error code when getting current source of clock
	ALSA: firewire-tascam: check intermediate state of clock status and retry
	scsi: scsi_dh_rdac: zero cdb in send_mode_select()
	scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag
	printk: Do not lose last line in kmsg buffer dump
	IB/mlx5: Free mpi in mp_slave mode
	IB/hfi1: Define variables as unsigned long to fix KASAN warning
	randstruct: Check member structs in is_pure_ops_struct()
	Revert "ceph: use ceph_evict_inode to cleanup inode's resource"
	ceph: use ceph_evict_inode to cleanup inode's resource
	ALSA: hda/realtek - PCI quirk for Medion E4254
	blk-mq: add callback of .cleanup_rq
	scsi: implement .cleanup_rq callback
	powerpc/imc: Dont create debugfs files for cpu-less nodes
	fuse: fix missing unlock_page in fuse_writepage()
	parisc: Disable HP HSC-PCI Cards to prevent kernel crash
	KVM: x86: always stop emulation on page fault
	KVM: x86: set ctxt->have_exception in x86_decode_insn()
	KVM: x86: Manually calculate reserved bits when loading PDPTRS
	media: sn9c20x: Add MSI MS-1039 laptop to flip_dmi_table
	media: don't drop front-end reference count for ->detach
	binfmt_elf: Do not move brk for INTERP-less ET_EXEC
	ASoC: Intel: NHLT: Fix debug print format
	ASoC: Intel: Skylake: Use correct function to access iomem space
	ASoC: Intel: Fix use of potentially uninitialized variable
	ARM: samsung: Fix system restart on S3C6410
	ARM: zynq: Use memcpy_toio instead of memcpy on smp bring-up
	Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
	arm64: tlb: Ensure we execute an ISB following walk cache invalidation
	arm64: dts: rockchip: limit clock rate of MMC controllers for RK3328
	alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP
	regulator: Defer init completion for a while after late_initcall
	efifb: BGRT: Improve efifb_bgrt_sanity_check
	gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
	memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
	memcg, kmem: do not fail __GFP_NOFAIL charges
	i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
	block: fix null pointer dereference in blk_mq_rq_timed_out()
	smb3: allow disabling requesting leases
	ovl: Fix dereferencing possible ERR_PTR()
	ovl: filter of trusted xattr results in audit
	btrfs: fix allocation of free space cache v1 bitmap pages
	Btrfs: fix use-after-free when using the tree modification log
	btrfs: Relinquish CPUs in btrfs_compare_trees
	btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
	btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
	Btrfs: fix race setting up and completing qgroup rescan workers
	md/raid6: Set R5_ReadError when there is read failure on parity disk
	md: don't report active array_state until after revalidate_disk() completes.
	md: only call set_in_sync() when it is expected to succeed.
	cfg80211: Purge frame registrations on iftype change
	/dev/mem: Bail out upon SIGKILL.
	ext4: fix warning inside ext4_convert_unwritten_extents_endio
	ext4: fix punch hole for inline_data file systems
	quota: fix wrong condition in is_quota_modification()
	hwrng: core - don't wait on add_early_randomness()
	i2c: riic: Clear NACK in tend isr
	CIFS: fix max ea value size
	CIFS: Fix oplock handling for SMB 2.1+ protocols
	md/raid0: avoid RAID0 data corruption due to layout confusion.
	fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock
	mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
	drm/amd/display: Restore backlight brightness after system resume
	Linux 4.19.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2c74f09f497a4b45b244a7dd263c5116533dfccb
2019-10-06 11:27:45 +02:00
Yafang Shao
4d8bdf7f3a mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
[ Upstream commit a94b525241c0fff3598809131d7cfcfe1d572d8c ]

total_{migrate,free}_scanned will be added to COMPACTMIGRATE_SCANNED and
COMPACTFREE_SCANNED in compact_zone().  We should clear them before
scanning a new zone.  In the proc triggered compaction, we forgot clearing
them.

[laoar.shao@gmail.com: introduce a helper compact_zone_counters_init()]
  Link: http://lkml.kernel.org/r/1563869295-25748-1-git-send-email-laoar.shao@gmail.com
[akpm@linux-foundation.org: expand compact_zone_counters_init() into its single callsite, per mhocko]
[vbabka@suse.cz: squash compact_zone() list_head init as well]
  Link: http://lkml.kernel.org/r/1fb6f7da-f776-9e42-22f8-bbb79b030b98@suse.cz
[akpm@linux-foundation.org: kcompactd_do_work(): avoid unnecessary initialization of cc.zone]
Link: http://lkml.kernel.org/r/1563789275-9639-1-git-send-email-laoar.shao@gmail.com
Fixes: 7f354a548d ("mm, compaction: add vmstats for kcompactd work")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05 13:10:13 +02:00
Michal Hocko
b4a734a529 memcg, kmem: do not fail __GFP_NOFAIL charges
commit e55d9d9bfb69405bd7615c0f8d229d8fafb3e9b8 upstream.

Thomas has noticed the following NULL ptr dereference when using cgroup
v1 kmem limit:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 0
P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 16923 Comm: gtk-update-icon Not tainted 4.19.51 #42
Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
RIP: 0010:create_empty_buffers+0x24/0x100
Code: cd 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 ba 01 00 00 00 55 53 48 89 fb e8 97 fe ff ff 48 89 c5 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 22 48 85 c9 75 f1 48 89 6a 08 48 8b 43 18 48 8d
RSP: 0018:ffff927ac1b37bf8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: fffff2d4429fd740 RCX: 0000000100097149
RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9075a99fbe00
RBP: 0000000000000000 R08: fffff2d440949cc8 R09: 00000000000960c0
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
R13: ffff907601f18360 R14: 0000000000002000 R15: 0000000000001000
FS:  00007fb55b288bc0(0000) GS:ffff90761f8c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000007aebc002 CR4: 00000000001606e0
Call Trace:
 create_page_buffers+0x4d/0x60
 __block_write_begin_int+0x8e/0x5a0
 ? ext4_inode_attach_jinode.part.82+0xb0/0xb0
 ? jbd2__journal_start+0xd7/0x1f0
 ext4_da_write_begin+0x112/0x3d0
 generic_perform_write+0xf1/0x1b0
 ? file_update_time+0x70/0x140
 __generic_file_write_iter+0x141/0x1a0
 ext4_file_write_iter+0xef/0x3b0
 __vfs_write+0x17e/0x1e0
 vfs_write+0xa5/0x1a0
 ksys_write+0x57/0xd0
 do_syscall_64+0x55/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Tetsuo then noticed that this is because the __memcg_kmem_charge_memcg
fails __GFP_NOFAIL charge when the kmem limit is reached.  This is a wrong
behavior because nofail allocations are not allowed to fail.  Normal
charge path simply forces the charge even if that means to cross the
limit.  Kmem accounting should be doing the same.

Link: http://lkml.kernel.org/r/20190906125608.32129-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05 13:10:08 +02:00
Tetsuo Handa
d40b3eafb5 memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
commit f9c645621a28e37813a1de96d9cbd89cde94a1e4 upstream.

Masoud Sharbiani noticed that commit 29ef680ae7 ("memcg, oom: move
out_of_memory back to the charge path") broke memcg OOM called from
__xfs_filemap_fault() path.  It turned out that try_charge() is retrying
forever without making forward progress because mem_cgroup_oom(GFP_NOFS)
cannot invoke the OOM killer due to commit 3da88fb3ba ("mm, oom:
move GFP_NOFS check to out_of_memory").

Allowing forced charge due to being unable to invoke memcg OOM killer will
lead to global OOM situation.  Also, just returning -ENOMEM will be risky
because OOM path is lost and some paths (e.g.  get_user_pages()) will leak
-ENOMEM.  Therefore, invoking memcg OOM killer (despite GFP_NOFS) will be
the only choice we can choose for now.

Until 29ef680ae7, we were able to invoke memcg OOM killer when
GFP_KERNEL reclaim failed [1].  But since 29ef680ae7, we need to
invoke memcg OOM killer when GFP_NOFS reclaim failed [2].  Although in the
past we did invoke memcg OOM killer for GFP_NOFS [3], we might get
pre-mature memcg OOM reports due to this patch.

[1]

 leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
 CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ #19
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
 Call Trace:
  dump_stack+0x63/0x88
  dump_header+0x67/0x27a
  ? mem_cgroup_scan_tasks+0x91/0xf0
  oom_kill_process+0x210/0x410
  out_of_memory+0x10a/0x2c0
  mem_cgroup_out_of_memory+0x46/0x80
  mem_cgroup_oom_synchronize+0x2e4/0x310
  ? high_work_func+0x20/0x20
  pagefault_out_of_memory+0x31/0x76
  mm_fault_error+0x55/0x115
  ? handle_mm_fault+0xfd/0x220
  __do_page_fault+0x433/0x4e0
  do_page_fault+0x22/0x30
  ? page_fault+0x8/0x30
  page_fault+0x1e/0x30
 RIP: 0033:0x4009f0
 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
 RSP: 002b:00007ffe29ae96f0 EFLAGS: 00010206
 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001ce1000
 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000
 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f94be09220d
 R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0
 R13: 0000000000000003 R14: 00007f949d845000 R15: 0000000002800000
 Task in /leaker killed as a result of limit of /leaker
 memory: usage 524288kB, limit 524288kB, failcnt 158965
 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
 kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0
 Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB
 Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child
 Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB
 oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

[2]

 leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0
 CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ #20
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
 Call Trace:
  dump_stack+0x63/0x88
  dump_header+0x67/0x27a
  ? mem_cgroup_scan_tasks+0x91/0xf0
  oom_kill_process+0x210/0x410
  out_of_memory+0x109/0x2d0
  mem_cgroup_out_of_memory+0x46/0x80
  try_charge+0x58d/0x650
  ? __radix_tree_replace+0x81/0x100
  mem_cgroup_try_charge+0x7a/0x100
  __add_to_page_cache_locked+0x92/0x180
  add_to_page_cache_lru+0x4d/0xf0
  iomap_readpages_actor+0xde/0x1b0
  ? iomap_zero_range_actor+0x1d0/0x1d0
  iomap_apply+0xaf/0x130
  iomap_readpages+0x9f/0x150
  ? iomap_zero_range_actor+0x1d0/0x1d0
  xfs_vm_readpages+0x18/0x20 [xfs]
  read_pages+0x60/0x140
  __do_page_cache_readahead+0x193/0x1b0
  ondemand_readahead+0x16d/0x2c0
  page_cache_async_readahead+0x9a/0xd0
  filemap_fault+0x403/0x620
  ? alloc_set_pte+0x12c/0x540
  ? _cond_resched+0x14/0x30
  __xfs_filemap_fault+0x66/0x180 [xfs]
  xfs_filemap_fault+0x27/0x30 [xfs]
  __do_fault+0x19/0x40
  __handle_mm_fault+0x8e8/0xb60
  handle_mm_fault+0xfd/0x220
  __do_page_fault+0x238/0x4e0
  do_page_fault+0x22/0x30
  ? page_fault+0x8/0x30
  page_fault+0x1e/0x30
 RIP: 0033:0x4009f0
 Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
 RSP: 002b:00007ffda45c9290 EFLAGS: 00010206
 RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1e000
 RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000
 RBP: 000000000000000c R08: 0000000000000000 R09: 00007f6d061ff20d
 R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0
 R13: 0000000000000003 R14: 00007f6ce59b2000 R15: 0000000002800000
 Task in /leaker killed as a result of limit of /leaker
 memory: usage 524288kB, limit 524288kB, failcnt 7221
 memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
 kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0
 Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB
 Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child
 Killed process 2746 (leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB
 oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

[3]

 leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0
 leaker cpuset=/ mems_allowed=0
 CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 #1
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
 Call Trace:
  [<ffffffffaf364147>] dump_stack+0x19/0x1b
  [<ffffffffaf35eb6a>] dump_header+0x90/0x229
  [<ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0
  [<ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60
  [<ffffffffaedbb904>] oom_kill_process+0x254/0x3d0
  [<ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570
  [<ffffffffaee360b0>] ? mem_cgroup_charge_common+0xc0/0xc0
  [<ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90
  [<ffffffffaf35d072>] mm_fault_error+0x6a/0x157
  [<ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0
  [<ffffffffaf371925>] do_page_fault+0x35/0x90
  [<ffffffffaf36d768>] page_fault+0x28/0x30
 Task in /leaker killed as a result of limit of /leaker
 memory: usage 524288kB, limit 524288kB, failcnt 20628
 memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0
 kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
 Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB
 Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child
 Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB

Bisected by Masoud Sharbiani.

Link: http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura.ne.jp
Fixes: 3da88fb3ba ("mm, oom: move GFP_NOFS check to out_of_memory") [necessary after 29ef680ae7]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Masoud Sharbiani <msharbiani@apple.com>
Tested-by: Masoud Sharbiani <msharbiani@apple.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05 13:10:07 +02:00
Andrey Ryabinin
a0aef70a47 UPSTREAM: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
(Upstream commit 00fb24a42a68b1ee0f6495993fe1be7124433dfb).

The code like this:

	ptr = kmalloc(size, GFP_KERNEL);
	page = virt_to_page(ptr);
	offset = offset_in_page(ptr);
	kfree(page_address(page) + offset);

may produce false-positive invalid-free reports on the kernel with
CONFIG_KASAN_SW_TAGS=y.

In the example above we lose the original tag assigned to 'ptr', so
kfree() gets the pointer with 0xFF tag.  In kfree() we check that 0xFF
tag is different from the tag in shadow hence print false report.

Instead of just comparing tags, do the following:

1) Check that shadow doesn't contain KASAN_TAG_INVALID.  Otherwise it's
   double-free and it doesn't matter what tag the pointer have.

2) If pointer tag is different from 0xFF, make sure that tag in the
   shadow is the same as in the pointer.

Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Walter Wu <walter-zh.wu@mediatek.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I3250be3a6e243086c561899a3458387370db693a
2019-09-24 17:44:16 -07:00
Nathan Chancellor
c89c308809 UPSTREAM: kasan: initialize tag to 0xff in __kasan_kmalloc
(Upstream commit 0600597c854e53d2f9b7a6a718c1da2b8b4cb4db).

When building with -Wuninitialized and CONFIG_KASAN_SW_TAGS unset, Clang
warns:

mm/kasan/common.c:484:40: warning: variable 'tag' is uninitialized when
used here [-Wuninitialized]
        kasan_unpoison_shadow(set_tag(object, tag), size);
                                              ^~~

set_tag ignores tag in this configuration but clang doesn't realize it at
this point in its pipeline, as it points to arch_kasan_set_tag as being
the point where it is used, which will later be expanded to (void
*)(object) without a use of tag.  Initialize tag to 0xff, as it removes
this warning and doesn't change the meaning of the code.

Link: https://github.com/ClangBuiltLinux/linux/issues/465
Link: http://lkml.kernel.org/r/20190502163057.6603-1-natechancellor@gmail.com
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I15c57bb15b36ed9f2dbfe31641a5a2905d3ab668
2019-09-24 17:44:16 -07:00
Peter Zijlstra
40b539bf2e UPSTREAM: x86/uaccess, kasan: Fix KASAN vs SMAP
(Upstream commit 57b78a62e7f23c4686fe54091cdc3d12e60d6513).

KASAN inserts extra code for every LOAD/STORE emitted by te compiler.
Much of this code is simple and safe to run with AC=1, however the
kasan_report() function, called on error, is most certainly not safe
to call with AC=1.

Therefore wrap kasan_report() in user_access_{save,restore}; which for
x86 SMAP, saves/restores EFLAGS and clears AC before calling the real
function.

Also ensure all the functions are without __fentry__ hook. The
function tracer is also not safe.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I636890ae2e0dcb5a54f27942d1b00098bb1a43fe
2019-09-24 17:44:16 -07:00
Qian Cai
0ee9b4391a UPSTREAM: kasan: fix variable 'tag' set but not used warning
(Upstream commit c412a769d2452161e97f163c4c4f31efc6626f06).

set_tag() compiles away when CONFIG_KASAN_SW_TAGS=n, so make
arch_kasan_set_tag() a static inline function to fix warnings below.

  mm/kasan/common.c: In function '__kasan_kmalloc':
  mm/kasan/common.c:475:5: warning: variable 'tag' set but not used [-Wunused-but-set-variable]
    u8 tag;
       ^~~

Link: http://lkml.kernel.org/r/20190307185244.54648-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I46b33353a25e5e3e0ef7b61dba56ae52922a5452
2019-09-24 17:44:15 -07:00
Andrey Konovalov
6395402ef0 UPSTREAM: kasan: fix coccinelle warnings in kasan_p*_table
(Upstream commit 5c0198b6fb73867dea296a69a944a4fdbceff3d8).

kasan_p4d_table(), kasan_pmd_table() and kasan_pud_table() are declared
as returning bool, but return 0 instead of false, which produces a
coccinelle warning.  Fix it.

Link: http://lkml.kernel.org/r/1fa6fadf644859e8a6a8ecce258444b49be8c7ee.1551716733.git.andreyknvl@google.com
Fixes: 0207df4fa1 ("kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I6db14d97607078ac7b2231d323653815caddadf1
2019-09-24 17:44:15 -07:00
Arnd Bergmann
cdb775fce4 UPSTREAM: kasan: fix kasan_check_read/write definitions
(Upstream commit bcf6f55a0d05eedd8ebb6ecc60ae3f93205ad833).

Building little-endian allmodconfig kernels on arm64 started failing
with the generated atomic.h implementation, since we now try to call
kasan helpers from the EFI stub:

  aarch64-linux-gnu-ld: drivers/firmware/efi/libstub/arm-stub.stub.o: in function `atomic_set':
  include/generated/atomic-instrumented.h:44: undefined reference to `__efistub_kasan_check_write'

I suspect that we get similar problems in other files that explicitly
disable KASAN for some reason but call atomic_t based helper functions.

We can fix this by checking the predefined __SANITIZE_ADDRESS__ macro
that the compiler sets instead of checking CONFIG_KASAN, but this in
turn requires a small hack in mm/kasan/common.c so we do see the extern
declaration there instead of the inline function.

Link: http://lkml.kernel.org/r/20181211133453.2835077-1-arnd@arndb.de
Fixes: b1864b828644 ("locking/atomics: build atomic headers as required")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ib6836f63fc9dfe594cd8ce4e99f2ad4489c98a53
2019-09-24 17:44:15 -07:00
Andrey Ryabinin
f46851e627 BACKPORT: kasan: remove use after scope bugs detection.
(Upstream commit 7771bdbbfd3d6f204631b6fd9e1bbc30cd15918e).

Use after scope bugs detector seems to be almost entirely useless for
the linux kernel.  It exists over two years, but I've seen only one
valid bug so far [1].  And the bug was fixed before it has been
reported.  There were some other use-after-scope reports, but they were
false-positives due to different reasons like incompatibility with
structleak plugin.

This feature significantly increases stack usage, especially with GCC <
9 version, and causes a 32K stack overflow.  It probably adds
performance penalty too.

Given all that, let's remove use-after-scope detector entirely.

While preparing this patch I've noticed that we mistakenly enable
use-after-scope detection for clang compiler regardless of
CONFIG_KASAN_EXTRA setting.  This is also fixed now.

[1] http://lkml.kernel.org/r/<20171129052106.rhgbjhhis53hkgfn@wfg-t540p.sh.intel.com>

Link: http://lkml.kernel.org/r/20190111185842.13978-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Will Deacon <will.deacon@arm.com>		[arm64]
Cc: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I00d869a5e287e3e198950b78ad17bd6ff6a51595
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
2019-09-24 17:44:15 -07:00
Qian Cai
e50674c599 UPSTREAM: slub: fix a crash with SLUB_DEBUG + KASAN_SW_TAGS
(Upstream commit 6373dca16c911b2828ef8d836d7f6f1800e1bbbc).

In process_slab(), "p = get_freepointer()" could return a tagged
pointer, but "addr = page_address()" always return a native pointer.  As
the result, slab_index() is messed up here,

    return (p - addr) / s->size;

All other callers of slab_index() have the same situation where "addr"
is from page_address(), so just need to untag "p".

    # cat /sys/kernel/slab/hugetlbfs_inode_cache/alloc_calls

    Unable to handle kernel paging request at virtual address 2bff808aa4856d48
    Mem abort info:
      ESR = 0x96000007
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000007
      CM = 0, WnR = 0
    swapper pgtable: 64k pages, 48-bit VAs, pgdp = 0000000002498338
    [2bff808aa4856d48] pgd=00000097fcfd0003, pud=00000097fcfd0003, pmd=00000097fca30003, pte=00e8008b24850712
    Internal error: Oops: 96000007 [#1] SMP
    CPU: 3 PID: 79210 Comm: read_all Tainted: G             L    5.0.0-rc7+ #84
    Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.0.6 07/10/2018
    pstate: 00400089 (nzcv daIf +PAN -UAO)
    pc : get_map+0x78/0xec
    lr : get_map+0xa0/0xec
    sp : aeff808989e3f8e0
    x29: aeff808989e3f940 x28: ffff800826200000
    x27: ffff100012d47000 x26: 9700000000002500
    x25: 0000000000000001 x24: 52ff8008200131f8
    x23: 52ff8008200130a0 x22: 52ff800820013098
    x21: ffff800826200000 x20: ffff100013172ba0
    x19: 2bff808a8971bc00 x18: ffff1000148f5538
    x17: 000000000000001b x16: 00000000000000ff
    x15: ffff1000148f5000 x14: 00000000000000d2
    x13: 0000000000000001 x12: 0000000000000000
    x11: 0000000020000002 x10: 2bff808aa4856d48
    x9 : 0000020000000000 x8 : 68ff80082620ebb0
    x7 : 0000000000000000 x6 : ffff1000105da1dc
    x5 : 0000000000000000 x4 : 0000000000000000
    x3 : 0000000000000010 x2 : 2bff808a8971bc00
    x1 : ffff7fe002098800 x0 : ffff80082620ceb0
    Process read_all (pid: 79210, stack limit = 0x00000000f65b9361)
    Call trace:
     get_map+0x78/0xec
     process_slab+0x7c/0x47c
     list_locations+0xb0/0x3c8
     alloc_calls_show+0x34/0x40
     slab_attr_show+0x34/0x48
     sysfs_kf_seq_show+0x2e4/0x570
     kernfs_seq_show+0x12c/0x1a0
     seq_read+0x48c/0xf84
     kernfs_fop_read+0xd4/0x448
     __vfs_read+0x94/0x5d4
     vfs_read+0xcc/0x194
     ksys_read+0x6c/0xe8
     __arm64_sys_read+0x68/0xb0
     el0_svc_handler+0x230/0x3bc
     el0_svc+0x8/0xc
    Code: d3467d2a 9ac92329 8b0a0e6a f9800151 (c85f7d4b)
    ---[ end trace a383a9a44ff13176 ]---
    Kernel panic - not syncing: Fatal exception
    SMP: stopping secondary CPUs
    SMP: failed to stop secondary CPUs 1-7,32,40,127
    Kernel Offset: disabled
    CPU features: 0x002,20000c18
    Memory Limit: none
    ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190220020251.82039-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I14d53cd0c3ce9672edc3b9666c68241e42d2fced
2019-09-24 17:44:15 -07:00
Andrey Konovalov
b26d7811dd UPSTREAM: kasan, slab: remove redundant kasan_slab_alloc hooks
(Upstream commit 557ea25383a231fe3ffc72881ada35c24b960dbc).

kasan_slab_alloc() calls in kmem_cache_alloc() and kmem_cache_alloc_node()
are redundant as they are already called via slab_alloc/slab_alloc_node()->
slab_post_alloc_hook()->kasan_slab_alloc().  Remove them.

Link: http://lkml.kernel.org/r/4ca1655cdcfc4379c49c50f7bf80f81c4ad01485.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I61be422e28a4e31f36f9ec5bea654077691b89b9
2019-09-24 17:44:15 -07:00
Andrey Konovalov
8607c71441 UPSTREAM: kasan, slab: make freelist stored without tags
(Upstream commit 51dedad06b5f6c3eea7ec1069631b1ef7796912a).

Similarly to "kasan, slub: move kasan_poison_slab hook before
page_address", move kasan_poison_slab() before alloc_slabmgmt(), which
calls page_address(), to make page_address() return value to be
non-tagged.  This, combined with calling kasan_reset_tag() for off-slab
slab management object, leads to freelist being stored non-tagged.

Link: http://lkml.kernel.org/r/dfb53b44a4d00de3879a05a9f04c1f55e584f7a1.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I65954f8eed880ff2a0fe76a81bcf2703f707a585
2019-09-24 17:44:15 -07:00
Andrey Konovalov
137a7d4213 UPSTREAM: kasan, slab: fix conflicts with CONFIG_HARDENED_USERCOPY
(Upstream commit 219667c23c68eb3dbc0d5662b9246f28477fe529).

Similarly to commit 96fedce27e13 ("kasan: make tag based mode work with
CONFIG_HARDENED_USERCOPY"), we need to reset pointer tags in
__check_heap_object() in mm/slab.c before doing any pointer math.

Link: http://lkml.kernel.org/r/9a5c0f958db10e69df5ff9f2b997866b56b7effc.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I411184ecec0e3f77014bb330de3db5728d2065ed
2019-09-24 17:44:15 -07:00
Andrey Konovalov
d735952416 UPSTREAM: kasan: prevent tracing of tags.c
(Upstream commit dc15a8a2543cb9ebe67998eefe2880ce1a20d42e).

Similarly to commit 0d0c8de8788b ("kasan: mark file common so ftrace
doesn't trace it") add the -pg flag to mm/kasan/tags.c to prevent
conflicts with tracing.

Link: http://lkml.kernel.org/r/9c4c3ce5ccfb894c7fe66d91de7c1da2787b4da4.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I41a129790bd12e699f816e27fb086a6570d4223e
2019-09-24 17:44:15 -07:00
Andrey Konovalov
aa846c8644 UPSTREAM: kasan: fix random seed generation for tag-based mode
(Upstream commit 3f41b609382388f95c0a05b69b8db0d706adafb4).

There are two issues with assigning random percpu seeds right now:

1. We use for_each_possible_cpu() to iterate over cpus, but cpumask is
   not set up yet at the moment of kasan_init(), and thus we only set
   the seed for cpu #0.

2. A call to get_random_u32() always returns the same number and produces
   a message in dmesg, since the random subsystem is not yet initialized.

Fix 1 by calling kasan_init_tags() after cpumask is set up.

Fix 2 by using get_cycles() instead of get_random_u32(). This gives us
lower quality random numbers, but it's good enough, as KASAN is meant to
be used as a debugging tool and not a mitigation.

Link: http://lkml.kernel.org/r/1f815cc914b61f3516ed4cc9bfd9eeca9bd5d9de.1550677973.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ic4b29dfd24515f02302d5cbd0d79eab5c6f0642c
2019-09-24 17:44:15 -07:00
Qian Cai
2a9788f5f8 UPSTREAM: slub: fix SLAB_CONSISTENCY_CHECKS + KASAN_SW_TAGS
(Upstream commit 338cfaad4993d3bc35a740e28981747770a65f90).

Enabling SLUB_DEBUG's SLAB_CONSISTENCY_CHECKS with KASAN_SW_TAGS
triggers endless false positives during boot below due to
check_valid_pointer() checks tagged pointers which have no addresses
that is valid within slab pages:

  BUG radix_tree_node (Tainted: G    B            ): Freelist Pointer check fails
  -----------------------------------------------------------------------------

  INFO: Slab objects=69 used=69 fp=0x          (null) flags=0x7ffffffc000200
  INFO: Object @offset=15060037153926966016 fp=0x

  Redzone: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 18 6b 06 00 08 80 ff d0  .........k......
  Object : 18 6b 06 00 08 80 ff d0 00 00 00 00 00 00 00 00  .k..............
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  Redzone: bb bb bb bb bb bb bb bb                          ........
  Padding: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B             5.0.0-rc5+ #18
  Call trace:
    dump_backtrace+0x0/0x450
    show_stack+0x20/0x2c
    __dump_stack+0x20/0x28
    dump_stack+0xa0/0xfc
    print_trailer+0x1bc/0x1d0
    object_err+0x40/0x50
    alloc_debug_processing+0xf0/0x19c
    ___slab_alloc+0x554/0x704
    kmem_cache_alloc+0x2f8/0x440
    radix_tree_node_alloc+0x90/0x2fc
    idr_get_free+0x1e8/0x6d0
    idr_alloc_u32+0x11c/0x2a4
    idr_alloc+0x74/0xe0
    worker_pool_assign_id+0x5c/0xbc
    workqueue_init_early+0x49c/0xd50
    start_kernel+0x52c/0xac4
  FIX radix_tree_node: Marking all objects used

Link: http://lkml.kernel.org/r/20190209044128.3290-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: Id7d2fe26fd720bd4bd9fa9ae7f5a04be881066e7
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
2019-09-24 17:44:14 -07:00
Andrey Konovalov
712b50a15c UPSTREAM: kasan, slub: fix more conflicts with CONFIG_SLAB_FREELIST_HARDENED
(Upstream commit d36a63a943e37081e92e4abdf4a207fd2e83a006).

When CONFIG_KASAN_SW_TAGS is enabled, ptr_addr might be tagged.  Normally,
this doesn't cause any issues, as both set_freepointer() and
get_freepointer() are called with a pointer with the same tag.  However,
there are some issues with CONFIG_SLUB_DEBUG code.  For example, when
__free_slub() iterates over objects in a cache, it passes untagged
pointers to check_object().  check_object() in turns calls
get_freepointer() with an untagged pointer, which causes the freepointer
to be restored incorrectly.

Add kasan_reset_tag to freelist_ptr(). Also add a detailed comment.

Link: http://lkml.kernel.org/r/bf858f26ef32eb7bd24c665755b3aee4bc58d0e4.1550103861.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ie57f08f676ea7a244a20f1dee98fc725594cafa6
2019-09-24 17:44:14 -07:00
Andrey Konovalov
5dbe32c94e UPSTREAM: kasan, slub: fix conflicts with CONFIG_SLAB_FREELIST_HARDENED
(Upstream commit 18e506610238eda2b0c5a19a123d3d6ec0ab2de6).

CONFIG_SLAB_FREELIST_HARDENED hashes freelist pointer with the address of
the object where the pointer gets stored.  With tag based KASAN we don't
account for that when building freelist, as we call set_freepointer() with
the first argument untagged.  This patch changes the code to properly
propagate tags throughout the loop.

Link: http://lkml.kernel.org/r/3df171559c52201376f246bf7ce3184fe21c1dc7.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I218d23e0fe3164f6f03d4f08bb93630f3e496f82
2019-09-24 17:44:14 -07:00
Andrey Konovalov
d79e51f2a9 UPSTREAM: kasan, slub: move kasan_poison_slab hook before page_address
(Upstream commit a71012242837fe5e67d8c999cfc357174ed5dba0).

With tag based KASAN page_address() looks at the page flags to see whether
the resulting pointer needs to have a tag set.  Since we don't want to set
a tag when page_address() is called on SLAB pages, we call
page_kasan_tag_reset() in kasan_poison_slab().  However in allocate_slab()
page_address() is called before kasan_poison_slab().  Fix it by changing
the order.

[andreyknvl@google.com: fix compilation error when CONFIG_SLUB_DEBUG=n]
  Link: http://lkml.kernel.org/r/ac27cc0bbaeb414ed77bcd6671a877cf3546d56e.1550066133.git.andreyknvl@google.com
Link: http://lkml.kernel.org/r/cd895d627465a3f1c712647072d17f10883be2a1.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I53ade71e532b3e4163b84a3a23c7c54ab2d059bd
2019-09-24 17:44:14 -07:00
Andrey Konovalov
11543367c3 UPSTREAM: kasan, kmemleak: pass tagged pointers to kmemleak
(Upstream commit 53128245b43daad600d9fe72940206570e064112).

Right now we call kmemleak hooks before assigning tags to pointers in
KASAN hooks.  As a result, when an objects gets allocated, kmemleak sees a
differently tagged pointer, compared to the one it sees when the object
gets freed.  Fix it by calling KASAN hooks before kmemleak's ones.

Link: http://lkml.kernel.org/r/cd825aa4897b0fc37d3316838993881daccbe9f5.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I7766731cf2fe22effd91bfb89da915355d0e7981
2019-09-24 17:44:14 -07:00
Andrey Konovalov
ceface4ee3 UPSTREAM: kasan: fix assigning tags twice
(Upstream commit e1db95befb3e9e3476629afec6e0f5d0707b9825).

When an object is kmalloc()'ed, two hooks are called: kasan_slab_alloc()
and kasan_kmalloc().  Right now we assign a tag twice, once in each of the
hooks.  Fix it by assigning a tag only in the former hook.

Link: http://lkml.kernel.org/r/ce8c6431da735aa7ec051fd6497153df690eb021.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I70f85dda8fcbcc796ebde8294a6f110d390bff3f
2019-09-24 17:44:14 -07:00
Anders Roxell
b4d4745fe4 UPSTREAM: kasan: mark file common so ftrace doesn't trace it
(Upstream commit 0d0c8de8788b6c441ea01365612de7efc20cc682).

When option CONFIG_KASAN is enabled toghether with ftrace, function
ftrace_graph_caller() gets in to a recursion, via functions
kasan_check_read() and kasan_check_write().

 Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 179             mcount_get_pc             x0    //     function's pc
 (gdb) bt
 #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 #1  0xffffff90101406c8 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
 #2  0xffffff90106fd084 in kasan_check_write (p=0xffffffc06c170878, size=4) at ../mm/kasan/common.c:105
 #3  0xffffff90104a2464 in atomic_add_return (v=<optimized out>, i=<optimized out>) at ./include/generated/atomic-instrumented.h:71
 #4  atomic_inc_return (v=<optimized out>) at ./include/generated/atomic-fallback.h:284
 #5  trace_graph_entry (trace=0xffffffc03f5ff380) at ../kernel/trace/trace_functions_graph.c:441
 #6  0xffffff9010481774 in trace_graph_entry_watchdog (trace=<optimized out>) at ../kernel/trace/trace_selftest.c:741
 #7  0xffffff90104a185c in function_graph_enter (ret=<optimized out>, func=<optimized out>, frame_pointer=18446743799894897728, retp=<optimized out>) at ../kernel/trace/trace_functions_graph.c:196
 #8  0xffffff9010140628 in prepare_ftrace_return (self_addr=18446743592948977792, parent=0xffffffc03f5ff418, frame_pointer=18446743799894897728) at ../arch/arm64/kernel/ftrace.c:231
 #9  0xffffff90101406f4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 (gdb)

Rework so that the kasan implementation isn't traced.

Link: http://lkml.kernel.org/r/20181212183447.15890-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Id4dc25b795f81b4193c5f861657e9091acd99cef
2019-09-24 17:44:14 -07:00
Andrey Konovalov
0f007a2bf6 UPSTREAM: kasan: fix krealloc handling for tag-based mode
(Upstream commit a3fe7cdf02e318870fb71218726cc2321ff41f30).

Right now tag-based KASAN can retag the memory that is reallocated via
krealloc and return a differently tagged pointer even if the same slab
object gets used and no reallocated technically happens.

There are a few issues with this approach.  One is that krealloc callers
can't rely on comparing the return value with the passed argument to
check whether reallocation happened.  Another is that if a caller knows
that no reallocation happened, that it can access object memory through
the old pointer, which leads to false positives.  Look at
nf_ct_ext_add() to see an example.

Fix this by keeping the same tag if the memory don't actually gets
reallocated during krealloc.

Link: http://lkml.kernel.org/r/bb2a71d17ed072bcc528cbee46fcbd71a6da3be4.1546540962.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ie9bd73c43fa76ac2ba946bfbd2cb88e5c0000dfb
2019-09-24 17:44:14 -07:00
Andrey Konovalov
d8c656e717 UPSTREAM: kasan: make tag based mode work with CONFIG_HARDENED_USERCOPY
(Upstream commit 96fedce27e1356a2fff1c270710d9405848db562).

With CONFIG_HARDENED_USERCOPY enabled __check_heap_object() compares and
then subtracts a potentially tagged pointer with a non-tagged address of
the page that this pointer belongs to, which leads to unexpected
behavior.

Untag the pointer in __check_heap_object() before doing any of these
operations.

Link: http://lkml.kernel.org/r/7e756a298d514c4482f52aea6151db34818d395d.1546540962.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Id1a000155fe12fc39b3caee7a097f9bea0ba3fce
2019-09-24 17:44:14 -07:00
Andrey Konovalov
c84291fd95 UPSTREAM: kasan, arm64: use ARCH_SLAB_MINALIGN instead of manual aligning
(Upstream commit eb214f2dda31ffa989033b1e0f848ba0d3cb6188).

Instead of changing cache->align to be aligned to KASAN_SHADOW_SCALE_SIZE
in kasan_cache_create() we can reuse the ARCH_SLAB_MINALIGN macro.

Link: http://lkml.kernel.org/r/52ddd881916bcc153a9924c154daacde78522227.1546540962.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ia1c8de45c621b68555f77a569431e232ff4dc781
2019-09-24 17:44:14 -07:00
Qian Cai
88299f649b BACKPORT: mm/memblock.c: skip kmemleak for kasan_init()
(Upstream commit fed84c78527009d4f799a3ed9a566502fa026d82).

Kmemleak does not play well with KASAN (tested on both HPE Apollo 70 and
Huawei TaiShan 2280 aarch64 servers).

After calling start_kernel()->setup_arch()->kasan_init(), kmemleak early
log buffer went from something like 280 to 260000 which caused kmemleak
disabled and crash dump memory reservation failed.  The multitude of
kmemleak_alloc() calls is from nested loops while KASAN is setting up full
memory mappings, so let early kmemleak allocations skip those
memblock_alloc_internal() calls came from kasan_init() given that those
early KASAN memory mappings should not reference to other memory.  Hence,
no kmemleak false positives.

kasan_init
  kasan_map_populate [1]
    kasan_pgd_populate [2]
      kasan_pud_populate [3]
        kasan_pmd_populate [4]
          kasan_pte_populate [5]
            kasan_alloc_zeroed_page
              memblock_alloc_try_nid
                memblock_alloc_internal
                  kmemleak_alloc

[1] for_each_memblock(memory, reg)
[2] while (pgdp++, addr = next, addr != end)
[3] while (pudp++, addr = next, addr != end && pud_none(READ_ONCE(*pudp)))
[4] while (pmdp++, addr = next, addr != end && pmd_none(READ_ONCE(*pmdp)))
[5] while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)))

Link: http://lkml.kernel.org/r/1543442925-17794-1-git-send-email-cai@gmx.us
Signed-off-by: Qian Cai <cai@gmx.us>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I2423d511c3938c882c738341673b13b3beff5475
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
2019-09-24 17:44:13 -07:00
Andrey Konovalov
f7de9038cd UPSTREAM: kasan: add SPDX-License-Identifier mark to source files
(Upstream commit e886bf9d9abedf8236464bfd21bc5707748b4a02).

This patch adds a "SPDX-License-Identifier: GPL-2.0" mark to all source
files under mm/kasan.

Link: http://lkml.kernel.org/r/bce2d1e618afa5142e81961ab8fa4b4165337380.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ia5cdf04c5183e6c2cb75d98d60e6f4f5d961cd6e
2019-09-24 17:44:13 -07:00
Andrey Konovalov
c80e94ddfc UPSTREAM: kasan: add __must_check annotations to kasan hooks
(Upstream commit 66afc7f1e07a1db74453be9167ac0d1205653854).

This patch adds __must_check annotations to kasan hooks that return a
pointer to make sure that a tagged pointer always gets propagated.

Link: http://lkml.kernel.org/r/03b269c5e453945f724bfca3159d4e1333a8fb1c.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I758f33506b67ed5d4b7539745db311ab31c37f46
2019-09-24 17:44:13 -07:00
Andrey Konovalov
be21fa3044 UPSTREAM: kasan, mm, arm64: tag non slab memory allocated via pagealloc
(Upstream commit 2813b9c0296259fb11e75c839bab2d958ba4f96c).

Tag-based KASAN doesn't check memory accesses through pointers tagged with
0xff.  When page_address is used to get pointer to memory that corresponds
to some page, the tag of the resulting pointer gets set to 0xff, even
though the allocated memory might have been tagged differently.

For slab pages it's impossible to recover the correct tag to return from
page_address, since the page might contain multiple slab objects tagged
with different values, and we can't know in advance which one of them is
going to get accessed.  For non slab pages however, we can recover the tag
in page_address, since the whole page was marked with the same tag.

This patch adds tagging to non slab memory allocated with pagealloc.  To
set the tag of the pointer returned from page_address, the tag gets stored
to page->flags when the memory gets allocated.

Link: http://lkml.kernel.org/r/d758ddcef46a5abc9970182b9137e2fbee202a2c.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I500bdf42462fee0ee14495a6be51815e7e44460f
2019-09-24 17:44:13 -07:00
Andrey Konovalov
854b1b3576 UPSTREAM: kasan: add hooks implementation for tag-based mode
(Upstream commit 7f94ffbc4c6a1bdb51d39965e4f2acaa19bd798f).

This commit adds tag-based KASAN specific hooks implementation and
adjusts common generic and tag-based KASAN ones.

1. When a new slab cache is created, tag-based KASAN rounds up the size of
   the objects in this cache to KASAN_SHADOW_SCALE_SIZE (== 16).

2. On each kmalloc tag-based KASAN generates a random tag, sets the shadow
   memory, that corresponds to this object to this tag, and embeds this
   tag value into the top byte of the returned pointer.

3. On each kfree tag-based KASAN poisons the shadow memory with a random
   tag to allow detection of use-after-free bugs.

The rest of the logic of the hook implementation is very much similar to
the one provided by generic KASAN. Tag-based KASAN saves allocation and
free stack metadata to the slab object the same way generic KASAN does.

Link: http://lkml.kernel.org/r/bda78069e3b8422039794050ddcb2d53d053ed41.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I23c40472207cba40dc962bb182225e378322249e
2019-09-24 17:44:13 -07:00
Andrey Konovalov
cf9a2eedfd UPSTREAM: mm: move obj_to_index to include/linux/slab_def.h
(Upstream commit 5b7c4148222d7acaa1612e5eec84fc66c88d54f3).

While with SLUB we can actually preassign tags for caches with contructors
and store them in pointers in the freelist, SLAB doesn't allow that since
the freelist is stored as an array of indexes, so there are no pointers to
store the tags.

Instead we compute the tag twice, once when a slab is created before
calling the constructor and then again each time when an object is
allocated with kmalloc.  Tag is computed simply by taking the lowest byte
of the index that corresponds to the object.  However in kasan_kmalloc we
only have access to the objects pointer, so we need a way to find out
which index this object corresponds to.

This patch moves obj_to_index from slab.c to include/linux/slab_def.h to
be reused by KASAN.

Link: http://lkml.kernel.org/r/c02cd9e574cfd93858e43ac94b05e38f891fef64.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I6cb3332ea05e6152ab8de0490171ed5d4947def3
2019-09-24 17:44:13 -07:00
Andrey Konovalov
f8adce49ea UPSTREAM: kasan: add bug reporting routines for tag-based mode
(Upstream commit 121e8f81d38cc43834195722d0768340dc130a33).

This commit adds rountines, that print tag-based KASAN error reports.
Those are quite similar to generic KASAN, the difference is:

1. The way tag-based KASAN finds the first bad shadow cell (with a
   mismatching tag). Tag-based KASAN compares memory tags from the shadow
   memory to the pointer tag.

2. Tag-based KASAN reports all bugs with the "KASAN: invalid-access"
   header.

Also simplify generic KASAN find_first_bad_addr.

Link: http://lkml.kernel.org/r/aee6897b1bd077732a315fd84c6b4f234dbfdfcb.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I20ef95c8c568df73adf69b140dbbf4cd9da52f34
2019-09-24 17:44:13 -07:00
Andrey Konovalov
1ff3965086 UPSTREAM: kasan: split out generic_report.c from report.c
(Upstream commit 11cd3cd69a256a353dd1a249b48ccd727d945952).

Move generic KASAN specific error reporting routines to generic_report.c
without any functional changes, leaving common error reporting code in
report.c to be later reused by tag-based KASAN.

Link: http://lkml.kernel.org/r/ba48c32f8e5aefedee78998ccff0413bee9e0f5b.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ic65fb3da92607d345be5bbe2c486269d1db53745
2019-09-24 17:44:13 -07:00
Andrey Konovalov
70b223ca1d UPSTREAM: kasan, mm: perform untagged pointers comparison in krealloc
(Upstream commit 772a2fa50ffb2f4282be8436da6e70530a2ac63c).

The krealloc function checks where the same buffer was reused or a new one
allocated by comparing kernel pointers.  Tag-based KASAN changes memory
tag on the krealloc'ed chunk of memory and therefore also changes the
pointer tag of the returned pointer.  Therefore we need to perform
comparison on untagged (with tags reset) pointers to check whether it's
the same memory region or not.

Link: http://lkml.kernel.org/r/14f6190d7846186a3506cd66d82446646fe65090.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I1e64158a5a0d683fc19c76296bc5fa345639bf30
2019-09-24 17:44:12 -07:00
Andrey Konovalov
9900949b99 UPSTREAM: kasan: preassign tags to objects with ctors or SLAB_TYPESAFE_BY_RCU
(Upstream commit 4d176711ea7a8d4873e7157ac6ab242ade3ba351).

An object constructor can initialize pointers within this objects based on
the address of the object.  Since the object address might be tagged, we
need to assign a tag before calling constructor.

The implemented approach is to assign tags to objects with constructors
when a slab is allocated and call constructors once as usual.  The
downside is that such object would always have the same tag when it is
reallocated, so we won't catch use-after-frees on it.

Also pressign tags for objects from SLAB_TYPESAFE_BY_RCU caches, since
they can be validy accessed after having been freed.

Link: http://lkml.kernel.org/r/f158a8a74a031d66f0a9398a5b0ed453c37ba09a.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I48b1de0516b9998f3b3e3917a15dd0bde27897cf
2019-09-24 17:44:12 -07:00
Andrey Konovalov
f8ce5f5926 UPSTREAM: kasan: add tag related helper functions
(Upstream commit 3c9e3aa11094e821aff4a8f6812a6e032293dbc0).

This commit adds a few helper functions, that are meant to be used to work
with tags embedded in the top byte of kernel pointers: to set, to get or
to reset the top byte.

Link: http://lkml.kernel.org/r/f6c6437bb8e143bc44f42c3c259c62e734be7935.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I6a46eb172205fe869d126ae296f64d44e007ac88
2019-09-24 17:44:12 -07:00
Andrey Konovalov
fe1cf9c054 BACKPORT: kasan: initialize shadow to 0xff for tag-based mode
Use kasan_alloc_zeroed_page instead of defining kasan_alloc_raw_page.

(Upstream commit 080eb83f54cf5b96ae5b6ce3c1896e35c341aff9).

A tag-based KASAN shadow memory cell contains a memory tag, that
corresponds to the tag in the top byte of the pointer, that points to that
memory.  The native top byte value of kernel pointers is 0xff, so with
tag-based KASAN we need to initialize shadow memory to 0xff.

[cai@lca.pw: arm64: skip kmemleak for KASAN again\
  Link: http://lkml.kernel.org/r/20181226020550.63712-1-cai@lca.pw
Link: http://lkml.kernel.org/r/5cc1b789aad7c99cf4f3ec5b328b147ad53edb40.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I370390e529de29fee99820f3ae2fc3d0408379ef
2019-09-24 17:44:12 -07:00
Andrey Konovalov
02e8a3fcb0 BACKPORT: kasan: rename kasan_zero_page to kasan_early_shadow_page
(Upstream commit 9577dd7486487722ed8f0773243223f108e8089f).

With tag based KASAN mode the early shadow value is 0xff and not 0x00, so
this patch renames kasan_zero_(page|pte|pmd|pud|p4d) to
kasan_early_shadow_(page|pte|pmd|pud|p4d) to avoid confusion.

Link: http://lkml.kernel.org/r/3fed313280ebf4f88645f5b89ccbc066d320e177.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 128674696
Change-Id: I22d043b1f2ce489b12832f6f1ba1593c4ccdaedf
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
2019-09-24 17:44:12 -07:00
Andrey Konovalov
8712cc728d BACKPORT: kasan: add CONFIG_KASAN_GENERIC and CONFIG_KASAN_SW_TAGS
The conflict during backport is caused by the
include/linux/compiler_attributes.h file not being present.

(Upstream commit 2bd926b439b4cb6b9ed240a9781cd01958b53d85).

This commit splits the current CONFIG_KASAN config option into two:
1. CONFIG_KASAN_GENERIC, that enables the generic KASAN mode (the one
   that exists now);
2. CONFIG_KASAN_SW_TAGS, that enables the software tag-based KASAN mode.

The name CONFIG_KASAN_SW_TAGS is chosen as in the future we will have
another hardware tag-based KASAN mode, that will rely on hardware memory
tagging support in arm64.

With CONFIG_KASAN_SW_TAGS enabled, compiler options are changed to
instrument kernel files with -fsantize=kernel-hwaddress (except the ones
for which KASAN_SANITIZE := n is set).

Both CONFIG_KASAN_GENERIC and CONFIG_KASAN_SW_TAGS support both
CONFIG_KASAN_INLINE and CONFIG_KASAN_OUTLINE instrumentation modes.

This commit also adds empty placeholder (for now) implementation of
tag-based KASAN specific hooks inserted by the compiler and adjusts
common hooks implementation.

While this commit adds the CONFIG_KASAN_SW_TAGS config option, this option
is not selectable, as it depends on HAVE_ARCH_KASAN_SW_TAGS, which we will
enable once all the infrastracture code has been added.

Link: http://lkml.kernel.org/r/b2550106eb8a68b10fefbabce820910b115aa853.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Id95c0c0b6857c6b30f2bea4597aea6c90273ef89
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
2019-09-24 17:44:12 -07:00
Andrey Konovalov
3915eb8bdf UPSTREAM: kasan: rename source files to reflect the new naming scheme
(Upstream commit b938fcf42739de8270e6ea41593722929c8a7dd0).

We now have two KASAN modes: generic KASAN and tag-based KASAN.  Rename
kasan.c to generic.c to reflect that.  Also rename kasan_init.c to init.c
as it contains initialization code for both KASAN modes.

Link: http://lkml.kernel.org/r/88c6fd2a883e459e6242030497230e5fb0d44d44.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I375981e1e9013517f073b29e27e750c48ba24879
2019-09-24 17:44:11 -07:00
Andrey Konovalov
17045587b4 UPSTREAM: kasan: move common generic and tag-based code to common.c
(Upstream commit bffa986c6f80e39d9903015fc7d0d99a66bbf559).

Tag-based KASAN reuses a significant part of the generic KASAN code, so
move the common parts to common.c without any functional changes.

Link: http://lkml.kernel.org/r/114064d002356e03bb8cc91f7835e20dc61b51d9.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Iffe8a07fa53751e4232c8458aab0a8adda951f55
2019-09-24 17:44:11 -07:00
Andrey Konovalov
9938c593b5 UPSTREAM: kasan, slub: handle pointer tags in early_kmem_cache_node_alloc
(Upstream commit 12b22386998ccf97497a49c88f9579cf9c0dee55).

The previous patch updated KASAN hooks signatures and their usage in SLAB
and SLUB code, except for the early_kmem_cache_node_alloc function.  This
patch handles that function separately, as it requires to reorder some of
the initialization code to correctly propagate a tagged pointer in case a
tag is assigned by kasan_kmalloc.

Link: http://lkml.kernel.org/r/fc8d0fdcf733a7a52e8d0daaa650f4736a57de8c.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: Ifcfcd2a679ebf9444a1c6525ad55b3a65a4a5941
2019-09-24 17:44:11 -07:00
Andrey Konovalov
0697e14f31 UPSTREAM: kasan, mm: change hooks signatures
(Upstream commit 0116523cfffa62aeb5aa3b85ce7419f3dae0c1b8).

Patch series "kasan: add software tag-based mode for arm64", v13.

This patchset adds a new software tag-based mode to KASAN [1].  (Initially
this mode was called KHWASAN, but it got renamed, see the naming rationale
at the end of this section).

The plan is to implement HWASan [2] for the kernel with the incentive,
that it's going to have comparable to KASAN performance, but in the same
time consume much less memory, trading that off for somewhat imprecise bug
detection and being supported only for arm64.

The underlying ideas of the approach used by software tag-based KASAN are:

1. By using the Top Byte Ignore (TBI) arm64 CPU feature, we can store
   pointer tags in the top byte of each kernel pointer.

2. Using shadow memory, we can store memory tags for each chunk of kernel
   memory.

3. On each memory allocation, we can generate a random tag, embed it into
   the returned pointer and set the memory tags that correspond to this
   chunk of memory to the same value.

4. By using compiler instrumentation, before each memory access we can add
   a check that the pointer tag matches the tag of the memory that is being
   accessed.

5. On a tag mismatch we report an error.

With this patchset the existing KASAN mode gets renamed to generic KASAN,
with the word "generic" meaning that the implementation can be supported
by any architecture as it is purely software.

The new mode this patchset adds is called software tag-based KASAN.  The
word "tag-based" refers to the fact that this mode uses tags embedded into
the top byte of kernel pointers and the TBI arm64 CPU feature that allows
to dereference such pointers.  The word "software" here means that shadow
memory manipulation and tag checking on pointer dereference is done in
software.  As it is the only tag-based implementation right now, "software
tag-based" KASAN is sometimes referred to as simply "tag-based" in this
patchset.

A potential expansion of this mode is a hardware tag-based mode, which
would use hardware memory tagging support (announced by Arm [3]) instead
of compiler instrumentation and manual shadow memory manipulation.

Same as generic KASAN, software tag-based KASAN is strictly a debugging
feature.

[1] https://www.kernel.org/doc/html/latest/dev-tools/kasan.html

[2] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

[3] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a

====== Rationale

On mobile devices generic KASAN's memory usage is significant problem.
One of the main reasons to have tag-based KASAN is to be able to perform a
similar set of checks as the generic one does, but with lower memory
requirements.

Comment from Vishwath Mohan <vishwath@google.com>:

I don't have data on-hand, but anecdotally both ASAN and KASAN have proven
problematic to enable for environments that don't tolerate the increased
memory pressure well.  This includes

(a) Low-memory form factors - Wear, TV, Things, lower-tier phones like Go,
(c) Connected components like Pixel's visual core [1].

These are both places I'd love to have a low(er) memory footprint option at
my disposal.

Comment from Evgenii Stepanov <eugenis@google.com>:

Looking at a live Android device under load, slab (according to
/proc/meminfo) + kernel stack take 8-10% available RAM (~350MB).  KASAN's
overhead of 2x - 3x on top of it is not insignificant.

Not having this overhead enables near-production use - ex.  running
KASAN/KHWASAN kernel on a personal, daily-use device to catch bugs that do
not reproduce in test configuration.  These are the ones that often cost
the most engineering time to track down.

CPU overhead is bad, but generally tolerable.  RAM is critical, in our
experience.  Once it gets low enough, OOM-killer makes your life
miserable.

[1] https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/

====== Technical details

Software tag-based KASAN mode is implemented in a very similar way to the
generic one. This patchset essentially does the following:

1. TCR_TBI1 is set to enable Top Byte Ignore.

2. Shadow memory is used (with a different scale, 1:16, so each shadow
   byte corresponds to 16 bytes of kernel memory) to store memory tags.

3. All slab objects are aligned to shadow scale, which is 16 bytes.

4. All pointers returned from the slab allocator are tagged with a random
   tag and the corresponding shadow memory is poisoned with the same value.

5. Compiler instrumentation is used to insert tag checks. Either by
   calling callbacks or by inlining them (CONFIG_KASAN_OUTLINE and
   CONFIG_KASAN_INLINE flags are reused).

6. When a tag mismatch is detected in callback instrumentation mode
   KASAN simply prints a bug report. In case of inline instrumentation,
   clang inserts a brk instruction, and KASAN has it's own brk handler,
   which reports the bug.

7. The memory in between slab objects is marked with a reserved tag, and
   acts as a redzone.

8. When a slab object is freed it's marked with a reserved tag.

Bug detection is imprecise for two reasons:

1. We won't catch some small out-of-bounds accesses, that fall into the
   same shadow cell, as the last byte of a slab object.

2. We only have 1 byte to store tags, which means we have a 1/256
   probability of a tag match for an incorrect access (actually even
   slightly less due to reserved tag values).

Despite that there's a particular type of bugs that tag-based KASAN can
detect compared to generic KASAN: use-after-free after the object has been
allocated by someone else.

====== Testing

Some kernel developers voiced a concern that changing the top byte of
kernel pointers may lead to subtle bugs that are difficult to discover.
To address this concern deliberate testing has been performed.

It doesn't seem feasible to do some kind of static checking to find
potential issues with pointer tagging, so a dynamic approach was taken.
All pointer comparisons/subtractions have been instrumented in an LLVM
compiler pass and a kernel module that would print a bug report whenever
two pointers with different tags are being compared/subtracted (ignoring
comparisons with NULL pointers and with pointers obtained by casting an
error code to a pointer type) has been used.  Then the kernel has been
booted in QEMU and on an Odroid C2 board and syzkaller has been run.

This yielded the following results.

The two places that look interesting are:

is_vmalloc_addr in include/linux/mm.h
is_kernel_rodata in mm/util.c

Here we compare a pointer with some fixed untagged values to make sure
that the pointer lies in a particular part of the kernel address space.
Since tag-based KASAN doesn't add tags to pointers that belong to rodata
or vmalloc regions, this should work as is.  To make sure debug checks to
those two functions that check that the result doesn't change whether we
operate on pointers with or without untagging has been added.

A few other cases that don't look that interesting:

Comparing pointers to achieve unique sorting order of pointee objects
(e.g. sorting locks addresses before performing a double lock):

tty_ldisc_lock_pair_timeout in drivers/tty/tty_ldisc.c
pipe_double_lock in fs/pipe.c
unix_state_double_lock in net/unix/af_unix.c
lock_two_nondirectories in fs/inode.c
mutex_lock_double in kernel/events/core.c

ep_cmp_ffd in fs/eventpoll.c
fsnotify_compare_groups fs/notify/mark.c

Nothing needs to be done here, since the tags embedded into pointers
don't change, so the sorting order would still be unique.

Checks that a pointer belongs to some particular allocation:

is_sibling_entry in lib/radix-tree.c
object_is_on_stack in include/linux/sched/task_stack.h

Nothing needs to be done here either, since two pointers can only belong
to the same allocation if they have the same tag.

Overall, since the kernel boots and works, there are no critical bugs.
As for the rest, the traditional kernel testing way (use until fails) is
the only one that looks feasible.

Another point here is that tag-based KASAN is available under a separate
config option that needs to be deliberately enabled. Even though it might
be used in a "near-production" environment to find bugs that are not found
during fuzzing or running tests, it is still a debug tool.

====== Benchmarks

The following numbers were collected on Odroid C2 board. Both generic and
tag-based KASAN were used in inline instrumentation mode.

Boot time [1]:
* ~1.7 sec for clean kernel
* ~5.0 sec for generic KASAN
* ~5.0 sec for tag-based KASAN

Network performance [2]:
* 8.33 Gbits/sec for clean kernel
* 3.17 Gbits/sec for generic KASAN
* 2.85 Gbits/sec for tag-based KASAN

Slab memory usage after boot [3]:
* ~40 kb for clean kernel
* ~105 kb (~260% overhead) for generic KASAN
* ~47 kb (~20% overhead) for tag-based KASAN

KASAN memory overhead consists of three main parts:
1. Increased slab memory usage due to redzones.
2. Shadow memory (the whole reserved once during boot).
3. Quaratine (grows gradually until some preset limit; the more the limit,
   the more the chance to detect a use-after-free).

Comparing tag-based vs generic KASAN for each of these points:
1. 20% vs 260% overhead.
2. 1/16th vs 1/8th of physical memory.
3. Tag-based KASAN doesn't require quarantine.

[1] Time before the ext4 driver is initialized.
[2] Measured as `iperf -s & iperf -c 127.0.0.1 -t 30`.
[3] Measured as `cat /proc/meminfo | grep Slab`.

====== Some notes

A few notes:

1. The patchset can be found here:
   https://github.com/xairy/kasan-prototype/tree/khwasan

2. Building requires a recent Clang version (7.0.0 or later).

3. Stack instrumentation is not supported yet and will be added later.

This patch (of 25):

Tag-based KASAN changes the value of the top byte of pointers returned
from the kernel allocation functions (such as kmalloc).  This patch
updates KASAN hooks signatures and their usage in SLAB and SLUB code to
reflect that.

Link: http://lkml.kernel.org/r/aec2b5e3973781ff8a6bb6760f8543643202c451.1544099024.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I62e554e732ec79ffd195e2269c8a50aed14381c0
2019-09-24 17:44:11 -07:00
Clark Williams
52023a31dc UPSTREAM: mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t
(Upstream commit 026d1eaf5ef1a5d6258b46e4e411cd9f5ab8c41d).

The static lock quarantine_lock is used in quarantine.c to protect the
quarantine queue datastructures.  It is taken inside quarantine queue
manipulation routines (quarantine_put(), quarantine_reduce() and
quarantine_remove_cache()), with IRQs disabled.  This is not a problem on
a stock kernel but is problematic on an RT kernel where spin locks are
sleeping spinlocks, which can sleep and can not be acquired with disabled
interrupts.

Convert the quarantine_lock to a raw spinlock_t.  The usage of
quarantine_lock is confined to quarantine.c and the work performed while
the lock is held is used for debug purpose.

[bigeasy@linutronix.de: slightly altered the commit message]
Link: http://lkml.kernel.org/r/20181010214945.5owshc3mlrh74z4b@linutronix.de
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 128674696
Change-Id: I12f35246b81b23cad5ce8b90407c00b86bf90cc0
2019-09-24 17:44:11 -07:00
Greg Kroah-Hartman
8ca5759502 This is the 4.19.73 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1/KiEACgkQONu9yGCS
 aT49JBAAy7b3wv1WXAtg9wsyS1JL4HbMXt3YjtokIX+UpkznoqII4B85QftPBbiD
 9zDuTWPjhrqKv1GsMkFRCqBVp5wGVik1MIbjVuKdstFN5W8KQybpbYnSW4T52+wS
 cs6oOPkLydAfWzKeq+ekEeU8yr5dua+Ui3huundZ49wseJWQP3fh9T+ToUx8V/cr
 tsLiRRgI0djj7KQWVuM1j8YGKT/6qk/UL0HMVZyoIdLmsxpLap+LWe0+CRXn8rvs
 eJJlVQTVtYf/ySoHkpnwR12VsjRYjx6pNkm/GrebMCkM7wF/4RMqxk7j9EU0PENH
 VUdRrUd+j/YPp6QzjSFMK0+0eb7Gm3X0FEN0IGZshu1r/CDnoj/7hqnBmOlYIbhv
 pdteYaLqWq7JjAHu7vF+S4aNQRGpAZb05LsbTJ39Eu3FbdVTLXsAuUveZ7Y4/y0X
 ri2M3d/sF/cjc3C+V7Y7h422SM36jSAK6496VAoRyqqjX/3JyROhgfU9NAMzVr83
 4uI904z9lH4TZGOd5YQgX2VuOtBcGwa7+g6fy97u1tp8UxSWFZRGDDLRysF/dIJO
 Wi51UK0Q7EWnqBTe0TFF6TjE5tC7R3ZgzqEQ1MU4eLI5mqokg82DAK4Ub2Wk5Qch
 CGs5/d16OOrLtG2RoaOGz9UdQR7IHUXLSqkKbaEdstc16MXNXns=
 =cmGh
 -----END PGP SIGNATURE-----

Merge 4.19.73 into android-4.19

Changes in 4.19.73
	ALSA: hda - Fix potential endless loop at applying quirks
	ALSA: hda/realtek - Fix overridden device-specific initialization
	ALSA: hda/realtek - Add quirk for HP Pavilion 15
	ALSA: hda/realtek - Enable internal speaker & headset mic of ASUS UX431FL
	ALSA: hda/realtek - Fix the problem of two front mics on a ThinkCentre
	sched/fair: Don't assign runtime for throttled cfs_rq
	drm/vmwgfx: Fix double free in vmw_recv_msg()
	vhost/test: fix build for vhost test
	vhost/test: fix build for vhost test - again
	powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
	batman-adv: fix uninit-value in batadv_netlink_get_ifindex()
	batman-adv: Only read OGM tvlv_len after buffer len check
	hv_sock: Fix hang when a connection is closed
	Blk-iolatency: warn on negative inflight IO counter
	blk-iolatency: fix STS_AGAIN handling
	{nl,mac}80211: fix interface combinations on crypto controlled devices
	timekeeping: Use proper ktime_add when adding nsecs in coarse offset
	selftests: fib_rule_tests: use pre-defined DEV_ADDR
	x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()
	powerpc/64: mark start_here_multiplatform as __ref
	media: stm32-dcmi: fix irq = 0 case
	arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64
	scripts/decode_stacktrace: match basepath using shell prefix operator, not regex
	riscv: remove unused variable in ftrace
	nvme-fc: use separate work queue to avoid warning
	clk: s2mps11: Add used attribute to s2mps11_dt_match
	remoteproc: qcom: q6v5: shore up resource probe handling
	modules: always page-align module section allocations
	kernel/module: Fix mem leak in module_add_modinfo_attrs
	drm/i915: Re-apply "Perform link quality check, unconditionally during long pulse"
	media: cec/v4l2: move V4L2 specific CEC functions to V4L2
	media: cec: remove cec-edid.c
	scsi: qla2xxx: Move log messages before issuing command to firmware
	keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
	Drivers: hv: kvp: Fix two "this statement may fall through" warnings
	x86, hibernate: Fix nosave_regions setup for hibernation
	remoteproc: qcom: q6v5-mss: add SCM probe dependency
	drm/amdgpu/gfx9: Update gfx9 golden settings.
	drm/amdgpu: Update gc_9_0 golden settings.
	KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS
	KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables
	KVM: x86: hyperv: keep track of mismatched VP indexes
	KVM: hyperv: define VP assist page helpers
	x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinit
	drm/i915: Fix intel_dp_mst_best_encoder()
	drm/i915: Rename PLANE_CTL_DECOMPRESSION_ENABLE
	drm/i915/gen9+: Fix initial readout for Y tiled framebuffers
	drm/atomic_helper: Disallow new modesets on unregistered connectors
	Drivers: hv: kvp: Fix the indentation of some "break" statements
	Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up
	powerplay: Respect units on max dcfclk watermark
	drm/amd/pp: Fix truncated clock value when set watermark
	drm/amd/dm: Understand why attaching path/tile properties are needed
	ARM: davinci: da8xx: define gpio interrupts as separate resources
	ARM: davinci: dm365: define gpio interrupts as separate resources
	ARM: davinci: dm646x: define gpio interrupts as separate resources
	ARM: davinci: dm355: define gpio interrupts as separate resources
	ARM: davinci: dm644x: define gpio interrupts as separate resources
	s390/zcrypt: reinit ap queue state machine during device probe
	media: vim2m: use workqueue
	media: vim2m: use cancel_delayed_work_sync instead of flush_schedule_work
	drm/i915: Restore sane defaults for KMS on GEM error load
	drm/i915: Cleanup gt powerstate from gem
	KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch
	Btrfs: clean up scrub is_dev_replace parameter
	Btrfs: fix deadlock with memory reclaim during scrub
	btrfs: Remove extent_io_ops::fill_delalloc
	btrfs: Fix error handling in btrfs_cleanup_ordered_extents
	scsi: megaraid_sas: Fix combined reply queue mode detection
	scsi: megaraid_sas: Add check for reset adapter bit
	scsi: megaraid_sas: Use 63-bit DMA addressing
	powerpc/pkeys: Fix handling of pkey state across fork()
	btrfs: volumes: Make sure no dev extent is beyond device boundary
	btrfs: Use real device structure to verify dev extent
	media: vim2m: only cancel work if it is for right context
	ARC: show_regs: lockdep: re-enable preemption
	ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault
	IB/uverbs: Fix OOPs upon device disassociation
	crypto: ccree - fix resume race condition on init
	crypto: ccree - add missing inline qualifier
	drm/vblank: Allow dynamic per-crtc max_vblank_count
	drm/i915/ilk: Fix warning when reading emon_status with no output
	mfd: Kconfig: Fix I2C_DESIGNWARE_PLATFORM dependencies
	tpm: Fix some name collisions with drivers/char/tpm.h
	bcache: replace hard coded number with BUCKET_GC_GEN_MAX
	bcache: treat stale && dirty keys as bad keys
	KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run
	iio: adc: exynos-adc: Add S5PV210 variant
	dt-bindings: iio: adc: exynos-adc: Add S5PV210 variant
	iio: adc: exynos-adc: Use proper number of channels for Exynos4x12
	mt76: fix corrupted software generated tx CCMP PN
	drm/nouveau: Don't WARN_ON VCPI allocation failures
	iwlwifi: fix devices with PCI Device ID 0x34F0 and 11ac RF modules
	iwlwifi: add new card for 9260 series
	x86/kvmclock: set offset for kvm unstable clock
	spi: spi-gpio: fix SPI_CS_HIGH capability
	powerpc/kvm: Save and restore host AMR/IAMR/UAMOR
	mmc: renesas_sdhi: Fix card initialization failure in high speed mode
	btrfs: scrub: pass fs_info to scrub_setup_ctx
	btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
	btrfs: scrub: fix circular locking dependency warning
	btrfs: init csum_list before possible free
	PCI: qcom: Fix error handling in runtime PM support
	PCI: qcom: Don't deassert reset GPIO during probe
	drm: add __user attribute to ptr_to_compat()
	CIFS: Fix error paths in writeback code
	CIFS: Fix leaking locked VFS cache pages in writeback retry
	drm/i915: Handle vm_mmap error during I915_GEM_MMAP ioctl with WC set
	drm/i915: Sanity check mmap length against object size
	usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps
	arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
	IB/mlx5: Reset access mask when looping inside page fault handler
	kvm: mmu: Fix overflow on kvm mmu page limit calculation
	x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
	KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
	cifs: Fix lease buffer length error
	media: i2c: tda1997x: select V4L2_FWNODE
	ext4: protect journal inode's blocks using block_validity
	ARM: dts: qcom: ipq4019: fix PCI range
	ARM: dts: qcom: ipq4019: Fix MSI IRQ type
	ARM: dts: qcom: ipq4019: enlarge PCIe BAR range
	dt-bindings: mmc: Add supports-cqe property
	dt-bindings: mmc: Add disable-cqe-dcmd property.
	PCI: Add macro for Switchtec quirk declarations
	PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary
	dm mpath: fix missing call of path selector type->end_io
	blk-mq: free hw queue's resource in hctx's release handler
	mmc: sdhci-pci: Add support for Intel CML
	PCI: dwc: Use devm_pci_alloc_host_bridge() to simplify code
	cifs: smbd: take an array of reqeusts when sending upper layer data
	dm crypt: move detailed message into debug level
	signal/arc: Use force_sig_fault where appropriate
	ARC: mm: fix uninitialised signal code in do_page_fault
	ARC: mm: SIGSEGV userspace trying to access kernel virtual memory
	drm/amdkfd: Add missing Polaris10 ID
	kvm: Check irqchip mode before assign irqfd
	drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)
	drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
	Btrfs: fix race between block group removal and block group allocation
	cifs: add spinlock for the openFileList to cifsInodeInfo
	clk: tegra: Fix maximum audio sync clock for Tegra124/210
	clk: tegra210: Fix default rates for HDA clocks
	IB/hfi1: Avoid hardlockup with flushlist_lock
	apparmor: reset pos on failure to unpack for various functions
	scsi: target/core: Use the SECTOR_SHIFT constant
	scsi: target/iblock: Fix overrun in WRITE SAME emulation
	staging: wilc1000: fix error path cleanup in wilc_wlan_initialize()
	scsi: zfcp: fix request object use-after-free in send path causing wrong traces
	cifs: Properly handle auto disabling of serverino option
	ALSA: hda - Don't resume forcibly i915 HDMI/DP codec
	ceph: use ceph_evict_inode to cleanup inode's resource
	KVM: x86: optimize check for valid PAT value
	KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value
	KVM: VMX: Fix handling of #MC that occurs during VM-Entry
	KVM: VMX: check CPUID before allowing read/write of IA32_XSS
	KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct
	KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation
	ARM: dts: gemini: Set DIR-685 SPI CS as active low
	RDMA/srp: Document srp_parse_in() arguments
	RDMA/srp: Accept again source addresses that do not have a port number
	btrfs: correctly validate compression type
	resource: Include resource end in walk_*() interfaces
	resource: Fix find_next_iomem_res() iteration issue
	resource: fix locking in find_next_iomem_res()
	pstore: Fix double-free in pstore_mkfile() failure path
	dm thin metadata: check if in fail_io mode when setting needs_check
	drm/panel: Add support for Armadeus ST0700 Adapt
	ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips
	powerpc/mm: Limit rma_size to 1TB when running without HV mode
	iommu/iova: Remove stale cached32_node
	gpio: don't WARN() on NULL descs if gpiolib is disabled
	i2c: at91: disable TXRDY interrupt after sending data
	i2c: at91: fix clk_offset for sama5d2
	mm/migrate.c: initialize pud_entry in migrate_vma()
	iio: adc: gyroadc: fix uninitialized return code
	NFSv4: Fix delegation state recovery
	bcache: only clear BTREE_NODE_dirty bit when it is set
	bcache: add comments for mutex_lock(&b->write_lock)
	bcache: fix race in btree_flush_write()
	drm/i915: Make sure cdclk is high enough for DP audio on VLV/CHV
	virtio/s390: fix race on airq_areas[]
	drm/atomic_helper: Allow DPMS On<->Off changes for unregistered connectors
	ext4: don't perform block validity checks on the journal inode
	ext4: fix block validity checks for journal inodes using indirect blocks
	ext4: unsigned int compared against zero
	PCI: Reset both NVIDIA GPU and HDA in ThinkPad P50 workaround
	powerpc/tm: Remove msr_tm_active()
	powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
	vhost: make sure log_num < in_num
	Linux 4.19.73

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7bc57825aeb36759bb8e8726888da9af06392c09
2019-09-16 09:35:02 +02:00
Ralph Campbell
2e7e7c8f94 mm/migrate.c: initialize pud_entry in migrate_vma()
[ Upstream commit 7b358c6f12dc82364f6d317f8c8f1d794adbc3f5 ]

When CONFIG_MIGRATE_VMA_HELPER is enabled, migrate_vma() calls
migrate_vma_collect() which initializes a struct mm_walk but didn't
initialize mm_walk.pud_entry.  (Found by code inspection) Use a C
structure initialization to make sure it is set to NULL.

Link: http://lkml.kernel.org/r/20190719233225.12243-1-rcampbell@nvidia.com
Fixes: 8763cb45ab ("mm/migrate: new memory migration helper for use with device memory")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16 08:22:22 +02:00
Greg Kroah-Hartman
6b1f307bd0 This is the 4.19.70 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1yF0AACgkQONu9yGCS
 aT64ew/6AzJDRMcmnx1COeRP8tfQ5A8ghjnp6REEca1MYJWjDlqd0X+EMd/7zorZ
 YkBzb1ND1c/9KeGQzdx8lJ5lcNRVJcD7tT6irT5zMnBcyR9uPamAaVgmVHXxuorK
 el4nvX9g/MxBLsuqLHxGr5pNXi7mHu6zfXyQ86TJzlez7yHlPuiJb8bDpHnCRJ1P
 n/MemPq/nQgC5jPBQRhT+IpqC1MTIHNhRaHHg/5Gdrrz+eVumnk+1zbWqtBRuJKS
 qS7RL1pI0Su00i5bY1r76iSkoRkGw9SoeIgz3sycbtAvGo9TI16/hZPvWAsYOAjL
 2DQS0rMPSnM4QV0odbUImFt86f0YJiAL7xYS8EYCc4GX/eLbNRtP8yLq+8rlo4Oa
 36HbiNhGSEjRxenfVRUD/STgBYzfVeQOMEyFJNRtfNDP/l66sLk/pEOEd83j06H4
 G87BJgKFC35dv5QrbCmJO8P1IXLs5QaChD3dL6R9/hbvCU2A/MOqhPL16JCXWA20
 +hOWn8ryrtBa5Dt0avAkrrnUNC8cVWyD44uAm+Hu/49CZpkTasZMB7Z81VSIsuvO
 xoT1Jx0J1W/LtHwCghSkui/fjQjVpkP3xnB7zVGem73Mpcm68g6mLajKaUrLnJHp
 /sz2mppF7gZob43anTtUhSV9OvrzqftkR+iFg3rCKSJyQE1o48o=
 =hMvg
 -----END PGP SIGNATURE-----

Merge 4.19.70 into android-4.19

Changes in 4.19.70
	dmaengine: ste_dma40: fix unneeded variable warning
	nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns
	afs: Fix the CB.ProbeUuid service handler to reply correctly
	afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u()
	fs: afs: Fix a possible null-pointer dereference in afs_put_read()
	afs: Only update d_fsdata if different in afs_d_revalidate()
	nvmet-loop: Flush nvme_delete_wq when removing the port
	nvme: fix a possible deadlock when passthru commands sent to a multipath device
	nvme-pci: Fix async probe remove race
	soundwire: cadence_master: fix register definition for SLAVE_STATE
	soundwire: cadence_master: fix definitions for INTSTAT0/1
	auxdisplay: panel: need to delete scan_timer when misc_register fails in panel_attach
	dmaengine: stm32-mdma: Fix a possible null-pointer dereference in stm32_mdma_irq_handler()
	omap-dma/omap_vout_vrfb: fix off-by-one fi value
	iommu/dma: Handle SG length overflow better
	usb: gadget: composite: Clear "suspended" on reset/disconnect
	usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt
	xen/blkback: fix memory leaks
	arm64: cpufeature: Don't treat granule sizes as strict
	i2c: rcar: avoid race when unregistering slave client
	i2c: emev2: avoid race when unregistering slave client
	drm/ast: Fixed reboot test may cause system hanged
	usb: host: fotg2: restart hcd after port reset
	tools: hv: fixed Python pep8/flake8 warnings for lsvmbus
	tools: hv: fix KVP and VSS daemons exit code
	drm/i915: fix broadwell EU computation
	watchdog: bcm2835_wdt: Fix module autoload
	drm/bridge: tfp410: fix memleak in get_modes()
	scsi: ufs: Fix RX_TERMINATION_FORCE_ENABLE define value
	drm/tilcdc: Register cpufreq notifier after we have initialized crtc
	net/tls: Fixed return value when tls_complete_pending_work() fails
	net/tls: swap sk_write_space on close
	net: tls, fix sk_write_space NULL write when tx disabled
	ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is set
	ipv6: Default fib6_type to RTN_UNICAST when not set
	net/smc: make sure EPOLLOUT is raised
	tcp: make sure EPOLLOUT wont be missed
	ipv4/icmp: fix rt dst dev null pointer dereference
	mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n
	ALSA: usb-audio: Check mixer unit bitmap yet more strictly
	ALSA: line6: Fix memory leak at line6_init_pcm() error path
	ALSA: hda - Fixes inverted Conexant GPIO mic mute led
	ALSA: seq: Fix potential concurrent access to the deleted pool
	ALSA: usb-audio: Fix invalid NULL check in snd_emuusb_set_samplerate()
	ALSA: usb-audio: Add implicit fb quirk for Behringer UFX1604
	kvm: x86: skip populating logical dest map if apic is not sw enabled
	KVM: x86: Don't update RIP or do single-step on faulting emulation
	uprobes/x86: Fix detection of 32-bit user mode
	x86/apic: Do not initialize LDR and DFR for bigsmp
	x86/apic: Include the LDR when clearing out APIC registers
	ftrace: Fix NULL pointer dereference in t_probe_next()
	ftrace: Check for successful allocation of hash
	ftrace: Check for empty hash and comment the race with registering probes
	usb-storage: Add new JMS567 revision to unusual_devs
	USB: cdc-wdm: fix race between write and disconnect due to flag abuse
	usb: hcd: use managed device resources
	usb: chipidea: udc: don't do hardware access if gadget has stopped
	usb: host: ohci: fix a race condition between shutdown and irq
	usb: host: xhci: rcar: Fix typo in compatible string matching
	USB: storage: ums-realtek: Update module parameter description for auto_delink_en
	USB: storage: ums-realtek: Whitelist auto-delink support
	mei: me: add Tiger Lake point LP device ID
	mmc: sdhci-of-at91: add quirk for broken HS200
	mmc: core: Fix init of SD cards reporting an invalid VDD range
	stm class: Fix a double free of stm_source_device
	intel_th: pci: Add support for another Lewisburg PCH
	intel_th: pci: Add Tiger Lake support
	typec: tcpm: fix a typo in the comparison of pdo_max_voltage
	fsi: scom: Don't abort operations for minor errors
	lib: logic_pio: Fix RCU usage
	lib: logic_pio: Avoid possible overlap for unregistering regions
	lib: logic_pio: Add logic_pio_unregister_range()
	drm/amdgpu: Add APTX quirk for Dell Latitude 5495
	drm/i915: Don't deballoon unused ggtt drm_mm_node in linux guest
	drm/i915: Call dma_set_max_seg_size() in i915_driver_hw_probe()
	bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
	bus: hisi_lpc: Add .remove method to avoid driver unbind crash
	VMCI: Release resource if the work is already queued
	crypto: ccp - Ignore unconfigured CCP device on suspend/resume
	Revert "cfg80211: fix processing world regdomain when non modular"
	mac80211: fix possible sta leak
	mac80211: Don't memset RXCB prior to PAE intercept
	mac80211: Correctly set noencrypt for PAE frames
	KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
	KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long
	KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI
	NFS: Clean up list moves of struct nfs_page
	NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend()
	NFS: Pass error information to the pgio error cleanup routine
	NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0
	i2c: piix4: Fix port selection for AMD Family 16h Model 30h
	x86/ptrace: fix up botched merge of spectrev1 fix
	mt76: mt76x0u: do not reset radio on resume
	Revert "ASoC: Fail card instantiation if DAI format setup fails"
	Linux 4.19.70

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I35ff8a403a05a8c66d87cb4b542997e63c422288
2019-09-06 12:08:42 +02:00
Andrew Morton
5dd2db1ab0 mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n
commit 441e254cd40dc03beec3c650ce6ce6074bc6517f upstream.

Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%25lkp@intel.com
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-06 10:22:08 +02:00
Greg Kroah-Hartman
12dc90c620 This is the 4.19.69 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1ncKwACgkQONu9yGCS
 aT6FPg//RiiJo8O+CUzkP4MFohy8JUuGC1MnnSfSFJn9bzAljWhYtSoJlZ9PbfHq
 qx9oWuQNNVZRn9nWuRbTRfRlz6ztc7whsjhAth4eNCtXvu+xAvFLvFhlbVt6xiZ0
 Wg3jXDtIY3Y8km01uJdzVk/juUqvTU8nioM4s1OWTFRfOfakLMK9CkxOKfZMFnxP
 mVILTcOxZAf0Js2tRMRPvm8c6OhegkXZjWUhGMlvmFKk/pqUouVXH8pKbBoTj8zR
 VoHB6pWs3YG3S15WgkNfKiR9WpeXywC7XN9ilziczaQ8HbsH6Y+5wM9Ncx+3FVxd
 mzygLCtlckYWjiabS/w3tQHrH+LV8MaPYuW/2tlL9sBljlDW5RBW+g7vaBQjIpCK
 gco/z0qmeEIYt8ktLL08i9FQBPp00Fra9x3jZKLz8Tp+W//EBm4ENMm2cxHtRKtd
 3fG70ngJmycksCK/e8N466/1f/aAOxfBZkog3R/4yqNvOX8rkQGRJlhv0AzIdsy8
 RlTDotDwwQFbMWROVs/Jea+9Wwp71jlPOXyMqX2EqUR/WfDWuIZEyzSdbqKPNgHL
 Q9OoUB7kIKGusqQC6ABYKtDn7T046o6ePEB8r2aIvPmw5AiWMefSubHNV6mP3KJb
 0NbQlUelMTvxuwm5NLMY2EWbWi+jvg4OgJvajwcDretTPeBGMZI=
 =945Y
 -----END PGP SIGNATURE-----

Merge 4.19.69 into android-4.19

Changes in 4.19.69
	HID: Add 044f:b320 ThrustMaster, Inc. 2 in 1 DT
	MIPS: kernel: only use i8253 clocksource with periodic clockevent
	mips: fix cacheinfo
	netfilter: ebtables: fix a memory leak bug in compat
	ASoC: dapm: Fix handling of custom_stop_condition on DAPM graph walks
	selftests/bpf: fix sendmsg6_prog on s390
	bonding: Force slave speed check after link state recovery for 802.3ad
	net: mvpp2: Don't check for 3 consecutive Idle frames for 10G links
	selftests: forwarding: gre_multipath: Enable IPv4 forwarding
	selftests: forwarding: gre_multipath: Fix flower filters
	can: dev: call netif_carrier_off() in register_candev()
	can: mcp251x: add error check when wq alloc failed
	can: gw: Fix error path of cgw_module_init
	ASoC: Fail card instantiation if DAI format setup fails
	st21nfca_connectivity_event_received: null check the allocation
	st_nci_hci_connectivity_event_received: null check the allocation
	ASoC: rockchip: Fix mono capture
	ASoC: ti: davinci-mcasp: Correct slot_width posed constraint
	net: usb: qmi_wwan: Add the BroadMobi BM818 card
	qed: RDMA - Fix the hw_ver returned in device attributes
	isdn: mISDN: hfcsusb: Fix possible null-pointer dereferences in start_isoc_chain()
	mac80211_hwsim: Fix possible null-pointer dereferences in hwsim_dump_radio_nl()
	netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
	netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
	netfilter: ipset: Fix rename concurrency with listing
	rxrpc: Fix potential deadlock
	rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet
	isdn: hfcsusb: Fix mISDN driver crash caused by transfer buffer on the stack
	net: phy: phy_led_triggers: Fix a possible null-pointer dereference in phy_led_trigger_change_speed()
	perf bench numa: Fix cpu0 binding
	can: sja1000: force the string buffer NULL-terminated
	can: peak_usb: force the string buffer NULL-terminated
	net/ethernet/qlogic/qed: force the string buffer NULL-terminated
	NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
	NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
	HID: quirks: Set the INCREMENT_USAGE_ON_DUPLICATE quirk on Saitek X52
	HID: input: fix a4tech horizontal wheel custom usage
	drm/rockchip: Suspend DP late
	SMB3: Fix potential memory leak when processing compound chain
	SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
	s390: put _stext and _etext into .text section
	net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
	net: stmmac: Fix issues when number of Queues >= 4
	net: stmmac: tc: Do not return a fragment entry
	net: hisilicon: make hip04_tx_reclaim non-reentrant
	net: hisilicon: fix hip04-xmit never return TX_BUSY
	net: hisilicon: Fix dma_map_single failed on arm64
	libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
	libata: add SG safety checks in SFF pio transfers
	x86/lib/cpu: Address missing prototypes warning
	drm/vmwgfx: fix memory leak when too many retries have occurred
	block, bfq: handle NULL return value by bfq_init_rq()
	perf ftrace: Fix failure to set cpumask when only one cpu is present
	perf cpumap: Fix writing to illegal memory in handling cpumap mask
	perf pmu-events: Fix missing "cpu_clk_unhalted.core" event
	KVM: arm64: Don't write junk to sysregs on reset
	KVM: arm: Don't write junk to CP15 registers on reset
	selftests: kvm: Adding config fragments
	HID: wacom: correct misreported EKR ring values
	HID: wacom: Correct distance scale for 2nd-gen Intuos devices
	Revert "dm bufio: fix deadlock with loop device"
	clk: socfpga: stratix10: fix rate caclulationg for cnt_clks
	ceph: clear page dirty before invalidate page
	ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
	libceph: fix PG split vs OSD (re)connect race
	drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
	gpiolib: never report open-drain/source lines as 'input' to user-space
	Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAE
	userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
	x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
	x86/apic: Handle missing global clockevent gracefully
	x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
	x86/boot: Save fields explicitly, zero out everything else
	x86/boot: Fix boot regression caused by bootparam sanitizing
	dm kcopyd: always complete failed jobs
	dm btree: fix order of block initialization in btree_split_beneath
	dm integrity: fix a crash due to BUG_ON in __journal_read_write()
	dm raid: add missing cleanup in raid_ctr()
	dm space map metadata: fix missing store of apply_bops() return value
	dm table: fix invalid memory accesses with too high sector number
	dm zoned: improve error handling in reclaim
	dm zoned: improve error handling in i/o map code
	dm zoned: properly handle backing device failure
	genirq: Properly pair kobject_del() with kobject_add()
	mm, page_owner: handle THP splits correctly
	mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
	mm/zsmalloc.c: fix race condition in zs_destroy_pool
	xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
	xfs: don't trip over uninitialized buffer on extent read of corrupted inode
	xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
	xfs: Add helper function xfs_attr_try_sf_addname
	xfs: Add attibute set and helper functions
	xfs: Add attibute remove and helper functions
	xfs: always rejoin held resources during defer roll
	dm zoned: fix potential NULL dereference in dmz_do_reclaim()
	powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB
	rxrpc: Fix local endpoint refcounting
	rxrpc: Fix read-after-free in rxrpc_queue_local()
	rxrpc: Fix local endpoint replacement
	rxrpc: Fix local refcounting
	Linux 4.19.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9824a29e0434a6a80e2f32fdb88c0ac1fe8e5af5
2019-09-02 17:39:29 +02:00
Laura Abbott
c41e798f4f UPSTREAM: mm: slub: Fix slab walking for init_on_free
Upstream commit 1b7e816fc80e ("mm: slub: Fix slab walking for
init_on_free").

To properly clear the slab on free with slab_want_init_on_free, we walk
the list of free objects using get_freepointer/set_freepointer.

The value we get from get_freepointer may not be valid.  This isn't an
issue since an actual value will get written later but this means
there's a chance of triggering a bug if we use this value with
set_freepointer:

  kernel BUG at mm/slub.c:306!
  invalid opcode: 0000 [#1] PREEMPT PTI
  CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-05754-g6471384a #4
  RIP: 0010:kfree+0x58a/0x5c0
  Code: 48 83 05 78 37 51 02 01 0f 0b 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 d6 37 51 02 01 <0f> 0b 48 83 05 d4 37 51 02 01 48 83 05 d4 37 51 02 01 48 83 05 d4
  RSP: 0000:ffffffff82603d90 EFLAGS: 00010002
  RAX: ffff8c3976c04320 RBX: ffff8c3976c04300 RCX: 0000000000000000
  RDX: ffff8c3976c04300 RSI: 0000000000000000 RDI: ffff8c3976c04320
  RBP: ffffffff82603db8 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff8c3976c04320 R11: ffffffff8289e1e0 R12: ffffd52cc8db0100
  R13: ffff8c3976c01a00 R14: ffffffff810f10d4 R15: ffff8c3976c04300
  FS:  0000000000000000(0000) GS:ffffffff8266b000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8c397ffff000 CR3: 0000000125020000 CR4: 00000000000406b0
  Call Trace:
   apply_wqattrs_prepare+0x154/0x280
   apply_workqueue_attrs_locked+0x4e/0xe0
   apply_workqueue_attrs+0x36/0x60
   alloc_workqueue+0x25a/0x6d0
   workqueue_init_early+0x246/0x348
   start_kernel+0x3c7/0x7ec
   x86_64_start_reservations+0x40/0x49
   x86_64_start_kernel+0xda/0xe4
   secondary_startup_64+0xb6/0xc0
  Modules linked in:
  ---[ end trace f67eb9af4d8d492b ]---

Fix this by ensuring the value we set with set_freepointer is either NULL
or another value in the chain.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Change-Id: I89414f6bf1559771d5d7c14853ddfe9f769989e5
Bug: 138435492
Test: Boot cuttlefish with and without
Test:   CONFIG_INIT_ON_ALLOC_DEFAULT_ON/CONFIG_INIT_ON_FREE_DEFAULT_ON
Signed-off-by: Alexander Potapenko <glider@google.com>
2019-08-30 12:03:55 +02:00
Alexander Potapenko
a5587d8fce UPSTREAM: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options
Upstream commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1
and init_on_free=1 boot options").

Patch series "add init_on_alloc/init_on_free boot options", v10.

Provide init_on_alloc and init_on_free boot options.

These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.

Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AU]B is initialized with zeroes.  SLOB allocator
isn't supported at the moment, as its emulation of kmem caches complicates
handling of SLAB_TYPESAFE_BY_RCU caches correctly.

Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.

As suggested by Michal Hocko, right now we don't let the heap users to
disable initialization for certain allocations.  There's not enough
evidence that doing so can speed up real-life cases, and introducing ways
to opt-out may result in things going out of control.

This patch (of 2):

The new options are needed to prevent possible information leaks and make
control-flow bugs that depend on uninitialized values more deterministic.

This is expected to be on-by-default on Android and Chrome OS.  And it
gives the opportunity for anyone else to use it under distros too via the
boot args.  (The init_on_free feature is regularly requested by folks
where memory forensics is included in their threat models.)

init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes.  Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.

init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion.  This helps to ensure sensitive data
doesn't leak via use-after-free accesses.

Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory.  The two exceptions are slab caches with
constructors and SLAB_TYPESAFE_BY_RCU flag.  Those are never
zero-initialized to preserve their semantics.

Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.

If either SLUB poisoning or page poisoning is enabled, those options take
precedence over init_on_alloc and init_on_free: initialization is only
applied to unpoisoned allocations.

Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:

hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)

Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)

The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
is within the standard error.

The new features are also going to pave the way for hardware memory
tagging (e.g.  arm64's MTE), which will require both on_alloc and on_free
hooks to set the tags for heap objects.  With MTE, tagging will have the
same cost as memory initialization.

Although init_on_free is rather costly, there are paranoid use-cases where
in-memory data lifetime is desired to be minimized.  There are various
arguments for/against the realism of the associated threat models, but
given that we'll need the infrastructure for MTE anyway, and there are
people who want wipe-on-free behavior no matter what the performance cost,
it seems reasonable to include it in this series.

[glider@google.com: v8]
  Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
[glider@google.com: v9]
  Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
[glider@google.com: v10]
  Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.cz>		[page and dmapool parts
Acked-by: James Morris <jamorris@linux.microsoft.com>]
Cc: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Change-Id: If0620a6a8aed34c21e98458c965e94f5a9dfd297
Bug: 138435492
Test: Boot cuttlefish with and without
Test:   CONFIG_INIT_ON_ALLOC_DEFAULT_ON/CONFIG_INIT_ON_FREE_DEFAULT_ON
Signed-off-by: Alexander Potapenko <glider@google.com>
2019-08-30 11:58:12 +02:00
Henry Burns
ed11e60033 mm/zsmalloc.c: fix race condition in zs_destroy_pool
commit 701d678599d0c1623aaf4139c03eea260a75b027 upstream.

In zs_destroy_pool() we call flush_work(&pool->free_work).  However, we
have no guarantee that migration isn't happening in the background at
that time.

Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages.  But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work().  Which would mean
pages still pointing at the inode when we free it.

Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages).  This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding.  Keeping that state under the class spinlock
keeps the logic straightforward.

In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page.  This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).

Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-29 08:28:57 +02:00
Henry Burns
b30a2f608e mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
commit 1a87aa03597efa9641e92875b883c94c7f872ccb upstream.

In zs_page_migrate() we call putback_zspage() after we have finished
migrating all pages in this zspage.  However, the return value is
ignored.  If a zs_free() races in between zs_page_isolate() and
zs_page_migrate(), freeing the last object in the zspage,
putback_zspage() will leave the page in ZS_EMPTY for potentially an
unbounded amount of time.

To fix this, we need to do the same thing as zs_page_putback() does:
schedule free_work to occur.

To avoid duplicated code, move the sequence to a new
putback_zspage_deferred() function which both zs_page_migrate() and
zs_page_putback() call.

Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-29 08:28:57 +02:00
Vlastimil Babka
db67ac0316 mm, page_owner: handle THP splits correctly
commit f7da677bc6e72033f0981b9d58b5c5d409fa641e upstream.

THP splitting path is missing the split_page_owner() call that
split_page() has.

As a result, split THP pages are wrongly reported in the page_owner file
as order-9 pages.  Furthermore when the former head page is freed, the
remaining former tail pages are not listed in the page_owner file at
all.  This patch fixes that by adding the split_page_owner() call into
__split_huge_page().

Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
Fixes: a9627bc5e3 ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-29 08:28:57 +02:00
Greg Kroah-Hartman
bf707e4df1 This is the 4.19.68 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1iS0cACgkQONu9yGCS
 aT7zbhAAyPU6KNVLa1Pj/xQf8puJ3v+FTws0X9Ii9zyWCNNuTXSi8mf4arP8oMjH
 NsYvhCGBHIQO0l3kFJRnOMLp0pPZPPUgHLvFWQljKWRABUOzMLjCWolnjegIPo8E
 cUwYkwx5T5oZtYH7ScxfQLiQUJB28L65+gi+/LBqANcEYL6WaSa2+UIBeVUzbMTt
 tQw6TAmE6f/kGAbmiQFeIdHj6Q9MrQyGBwTdhVSe+OENlZnZq8pxbwy3GXwEBuOP
 rtmUfZrSxdUmWBa7+/oW14TBe/c6j2LT0tVoyZdUFGZNbOUNv4vEImXghd28YSuv
 fppeSkom5di+RDH+B+LCNm+rV5vbfQqGTMuwBT6do2EDuQQ5KqS5kR8tULI4GKVl
 pNejWeK2qcNNloC7imH+4rHIy9AFewKz1ixbGolSaPXyCYfYEBx1xfpw4npng2+X
 aWJuk7/DnEEdzeKu9msdycQpf0aT5vkJqquBZQTzd5HAMlgqAF/sjvKjIsaUPNu1
 wDc+tyyAF0lObR7aswuhvttZ/8yczMW6pOJiM/XAfFn/a+Y7V2FRedGcIsxyeKOw
 YMNoW6VskIunShpbKhxVjPViI27NM0o8vmwCKmRQ6FRpxOX6vViNj+uQOjGQxfez
 N5g9jqxLsq9YGVQfoseS2Taao8JkBrTOW0wiJtE3/nwt/FaeHpU=
 =M37u
 -----END PGP SIGNATURE-----

Merge 4.19.68 into android-4.19

Changes in 4.19.68
	sh: kernel: hw_breakpoint: Fix missing break in switch statement
	seq_file: fix problem when seeking mid-record
	mm/hmm: fix bad subpage pointer in try_to_unmap_one
	mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
	mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
	mm/memcontrol.c: fix use after free in mem_cgroup_iter()
	mm/usercopy: use memory range to be accessed for wraparound check
	Revert "pwm: Set class for exported channels in sysfs"
	cpufreq: schedutil: Don't skip freq update when limits change
	xtensa: add missing isync to the cpu_reset TLB code
	ALSA: hda/realtek - Add quirk for HP Envy x360
	ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term
	ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit
	ALSA: hda - Apply workaround for another AMD chip 1022:1487
	ALSA: hda - Fix a memory leak bug
	ALSA: hda - Add a generic reboot_notify
	ALSA: hda - Let all conexant codec enter D3 when rebooting
	HID: holtek: test for sanity of intfdata
	HID: hiddev: avoid opening a disconnected device
	HID: hiddev: do cleanup in failure of opening a device
	Input: kbtab - sanity check for endpoint type
	Input: iforce - add sanity checks
	net: usb: pegasus: fix improper read if get_registers() fail
	netfilter: ebtables: also count base chain policies
	riscv: Make __fstate_clean() work correctly.
	clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1
	clk: sprd: Select REGMAP_MMIO to avoid compile errors
	clk: renesas: cpg-mssr: Fix reset control race condition
	xen/pciback: remove set but not used variable 'old_state'
	irqchip/gic-v3-its: Free unused vpt_page when alloc vpe table fail
	irqchip/irq-imx-gpcv2: Forward irq type to parent
	perf header: Fix divide by zero error if f_header.attr_size==0
	perf header: Fix use of unitialized value warning
	libata: zpodd: Fix small read overflow in zpodd_get_mech_type()
	drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m
	Btrfs: fix deadlock between fiemap and transaction commits
	scsi: hpsa: correct scsi command status issue after reset
	scsi: qla2xxx: Fix possible fcport null-pointer dereferences
	drm/amdgpu: fix a potential information leaking bug
	ata: libahci: do not complain in case of deferred probe
	kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules
	kbuild: Check for unknown options with cc-option usage in Kconfig and clang
	arm64/efi: fix variable 'si' set but not used
	arm64: unwind: Prohibit probing on return_address()
	arm64/mm: fix variable 'pud' set but not used
	IB/core: Add mitigation for Spectre V1
	IB/mlx5: Fix MR registration flow to use UMR properly
	IB/mad: Fix use-after-free in ib mad completion handling
	drm: msm: Fix add_gpu_components
	drm/exynos: fix missing decrement of retry counter
	Revert "kmemleak: allow to coexist with fault injection"
	ocfs2: remove set but not used variable 'last_hash'
	asm-generic: fix -Wtype-limits compiler warnings
	arm64: KVM: regmap: Fix unexpected switch fall-through
	KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
	staging: comedi: dt3000: Fix signed integer overflow 'divider * base'
	staging: comedi: dt3000: Fix rounding up of timer divisor
	iio: adc: max9611: Fix temperature reading in probe
	USB: core: Fix races in character device registration and deregistraion
	usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role"
	usb: cdc-acm: make sure a refcount is taken early enough
	USB: CDC: fix sanity checks in CDC union parser
	USB: serial: option: add D-Link DWM-222 device ID
	USB: serial: option: Add support for ZTE MF871A
	USB: serial: option: add the BroadMobi BM818 card
	USB: serial: option: Add Motorola modem UARTs
	drm/i915/cfl: Add a new CFL PCI ID.
	dm: disable DISCARD if the underlying storage no longer supports it
	arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
	netfilter: conntrack: Use consistent ct id hash calculation
	Input: psmouse - fix build error of multiple definition
	iommu/amd: Move iommu_init_pci() to .init section
	bnx2x: Fix VF's VLAN reconfiguration in reload.
	bonding: Add vlan tx offload to hw_enc_features
	net: dsa: Check existence of .port_mdb_add callback before calling it
	net/mlx4_en: fix a memory leak bug
	net/packet: fix race in tpacket_snd()
	sctp: fix memleak in sctp_send_reset_streams
	sctp: fix the transport error_count check
	team: Add vlan tx offload to hw_enc_features
	tipc: initialise addr_trail_end when setting node addresses
	xen/netback: Reset nr_frags before freeing skb
	net/mlx5e: Only support tx/rx pause setting for port owner
	net/mlx5e: Use flow keys dissector to parse packets for ARFS
	mmc: sdhci-of-arasan: Do now show error message in case of deffered probe
	Linux 4.19.68

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d9bb84a852deab3581f6cb37b4cb16e9bfe927c
2019-08-25 14:19:34 +02:00
Yang Shi
01d8d08f4c Revert "kmemleak: allow to coexist with fault injection"
[ Upstream commit df9576def004d2cd5beedc00cb6e8901427634b9 ]

When running ltp's oom test with kmemleak enabled, the below warning was
triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
passed in:

  WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
  Modules linked in: loop dax_pmem dax_pmem_core ip_tables x_tables xfs virtio_net net_failover virtio_blk failover ata_generic virtio_pci virtio_ring virtio libata
  CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
  ...
   kmemleak_alloc+0x4e/0xb0
   kmem_cache_alloc+0x2a7/0x3e0
   mempool_alloc_slab+0x2d/0x40
   mempool_alloc+0x118/0x2b0
   bio_alloc_bioset+0x19d/0x350
   get_swap_bio+0x80/0x230
   __swap_writepage+0x5ff/0xb20

The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, however kmemleak
has __GFP_NOFAIL set all the time due to d9570ee3bd ("kmemleak:
allow to coexist with fault injection").  But, it doesn't make any sense
to have __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM specified at the same
time.

According to the discussion on the mailing list, the commit should be
reverted for short term solution.  Catalin Marinas would follow up with
a better solution for longer term.

The failure rate of kmemleak metadata allocation may increase in some
circumstances, but this should be expected side effect.

Link: http://lkml.kernel.org/r/1563299431-111710-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: d9570ee3bd ("kmemleak: allow to coexist with fault injection")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-25 10:47:58 +02:00
Isaac J. Manjarres
056368fc3e mm/usercopy: use memory range to be accessed for wraparound check
commit 951531691c4bcaa59f56a316e018bc2ff1ddf855 upstream.

Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the
range of addresses that will be accessed is [ptr, ptr + (n - 1)].

This can lead to incorrectly detecting a wraparound in the memory
address, when trying to read 4 KB from memory that is mapped to the the
last possible page in the virtual address space, when in fact, accessing
that range of memory would not cause a wraparound to occur.

Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to
wrap around.

Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org
Fixes: f5509cc18d ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Trilok Soni <tsoni@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:47:44 +02:00
Miles Chen
c8282f1b56 mm/memcontrol.c: fix use after free in mem_cgroup_iter()
commit 54a83d6bcbf8f4700013766b974bf9190d40b689 upstream.

This patch is sent to report an use after free in mem_cgroup_iter()
after merging commit be2657752e9e ("mm: memcg: fix use after free in
mem_cgroup_iter()").

I work with android kernel tree (4.9 & 4.14), and commit be2657752e9e
("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged
to the trees.  However, I can still observe use after free issues
addressed in the commit be2657752e9e.  (on low-end devices, a few times
this month)

backtrace:
        css_tryget <- crash here
        mem_cgroup_iter
        shrink_node
        shrink_zones
        do_try_to_free_pages
        try_to_free_pages
        __perform_reclaim
        __alloc_pages_direct_reclaim
        __alloc_pages_slowpath
        __alloc_pages_nodemask

To debug, I poisoned mem_cgroup before freeing it:

  static void __mem_cgroup_free(struct mem_cgroup *memcg)
        for_each_node(node)
        free_mem_cgroup_per_node_info(memcg, node);
        free_percpu(memcg->stat);
  +     /* poison memcg before freeing it */
  +     memset(memcg, 0x78, sizeof(struct mem_cgroup));
        kfree(memcg);
  }

The coredump shows the position=0xdbbc2a00 is freed.

  (gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8]
  $13 = {position = 0xdbbc2a00, generation = 0x2efd}

  0xdbbc2a00:     0xdbbc2e00      0x00000000      0xdbbc2800      0x00000100
  0xdbbc2a10:     0x00000200      0x78787878      0x00026218      0x00000000
  0xdbbc2a20:     0xdcad6000      0x00000001      0x78787800      0x00000000
  0xdbbc2a30:     0x78780000      0x00000000      0x0068fb84      0x78787878
  0xdbbc2a40:     0x78787878      0x78787878      0x78787878      0xe3fa5cc0
  0xdbbc2a50:     0x78787878      0x78787878      0x00000000      0x00000000
  0xdbbc2a60:     0x00000000      0x00000000      0x00000000      0x00000000
  0xdbbc2a70:     0x00000000      0x00000000      0x00000000      0x00000000
  0xdbbc2a80:     0x00000000      0x00000000      0x00000000      0x00000000
  0xdbbc2a90:     0x00000001      0x00000000      0x00000000      0x00100000
  0xdbbc2aa0:     0x00000001      0xdbbc2ac8      0x00000000      0x00000000
  0xdbbc2ab0:     0x00000000      0x00000000      0x00000000      0x00000000
  0xdbbc2ac0:     0x00000000      0x00000000      0xe5b02618      0x00001000
  0xdbbc2ad0:     0x00000000      0x78787878      0x78787878      0x78787878
  0xdbbc2ae0:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2af0:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b00:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b10:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b20:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b30:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b40:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b50:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b60:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b70:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2b80:     0x78787878      0x78787878      0x00000000      0x78787878
  0xdbbc2b90:     0x78787878      0x78787878      0x78787878      0x78787878
  0xdbbc2ba0:     0x78787878      0x78787878      0x78787878      0x78787878

In the reclaim path, try_to_free_pages() does not setup
sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ...,
shrink_node().

In mem_cgroup_iter(), root is set to root_mem_cgroup because
sc->target_mem_cgroup is NULL.  It is possible to assign a memcg to
root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter().

        try_to_free_pages
        	struct scan_control sc = {...}, target_mem_cgroup is 0x0;
        do_try_to_free_pages
        shrink_zones
        shrink_node
        	 mem_cgroup *root = sc->target_mem_cgroup;
        	 memcg = mem_cgroup_iter(root, NULL, &reclaim);
        mem_cgroup_iter()
        	if (!root)
        		root = root_mem_cgroup;
        	...

        	css = css_next_descendant_pre(css, &root->css);
        	memcg = mem_cgroup_from_css(css);
        	cmpxchg(&iter->position, pos, memcg);

My device uses memcg non-hierarchical mode.  When we release a memcg:
invalidate_reclaim_iterators() reaches only dead_memcg and its parents.
If non-hierarchical mode is used, invalidate_reclaim_iterators() never
reaches root_mem_cgroup.

  static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
  {
        struct mem_cgroup *memcg = dead_memcg;

        for (; memcg; memcg = parent_mem_cgroup(memcg)
        ...
  }

So the use after free scenario looks like:

  CPU1						CPU2

  try_to_free_pages
  do_try_to_free_pages
  shrink_zones
  shrink_node
  mem_cgroup_iter()
      if (!root)
      	root = root_mem_cgroup;
      ...
      css = css_next_descendant_pre(css, &root->css);
      memcg = mem_cgroup_from_css(css);
      cmpxchg(&iter->position, pos, memcg);

        				invalidate_reclaim_iterators(memcg);
        				...
        				__mem_cgroup_free()
        					kfree(memcg);

  try_to_free_pages
  do_try_to_free_pages
  shrink_zones
  shrink_node
  mem_cgroup_iter()
      if (!root)
      	root = root_mem_cgroup;
      ...
      mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id);
      iter = &mz->iter[reclaim->priority];
      pos = READ_ONCE(iter->position);
      css_tryget(&pos->css) <- use after free

To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter
in invalidate_reclaim_iterators().

[cai@lca.pw: fix -Wparentheses compilation warning]
  Link: http://lkml.kernel.org/r/1564580753-17531-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com
Fixes: 5ac8fb31ad ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:47:44 +02:00
Yang Shi
3c0cb90e92 mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
commit a53190a4aaa36494f4d7209fd1fcc6f2ee08e0e0 upstream.

When running syzkaller internally, we ran into the below bug on 4.9.x
kernel:

  kernel BUG at mm/huge_memory.c:2124!
  invalid opcode: 0000 [#1] SMP KASAN
  CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
  task: ffff880067b34900 task.stack: ffff880068998000
  RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
  Call Trace:
    split_huge_page include/linux/huge_mm.h:100 [inline]
    queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538
    walk_pmd_range mm/pagewalk.c:50 [inline]
    walk_pud_range mm/pagewalk.c:90 [inline]
    walk_pgd_range mm/pagewalk.c:116 [inline]
    __walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208
    walk_page_range+0x154/0x370 mm/pagewalk.c:285
    queue_pages_range+0x115/0x150 mm/mempolicy.c:694
    do_mbind mm/mempolicy.c:1241 [inline]
    SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370
    SyS_mbind+0x46/0x60 mm/mempolicy.c:1352
    do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282
    entry_SYSCALL_64_after_swapgs+0x5d/0xdb
  Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c
  RIP  [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
   RSP <ffff88006899f980>

with the below test:

  uint64_t r[1] = {0xffffffffffffffff};

  int main(void)
  {
        syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
                                intptr_t res = 0;
        res = syscall(__NR_socket, 0x11, 3, 0x300);
        if (res != -1)
                r[0] = res;
        *(uint32_t*)0x20000040 = 0x10000;
        *(uint32_t*)0x20000044 = 1;
        *(uint32_t*)0x20000048 = 0xc520;
        *(uint32_t*)0x2000004c = 1;
        syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10);
        syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0);
        *(uint64_t*)0x20000340 = 2;
        syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3);
        return 0;
  }

Actually the test does:

  mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
  socket(AF_PACKET, SOCK_RAW, 768)        = 3
  setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0
  mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000
  mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0

The setsockopt() would allocate compound pages (16 pages in this test)
for packet tx ring, then the mmap() would call packet_mmap() to map the
pages into the user address space specified by the mmap() call.

When calling mbind(), it would scan the vma to queue the pages for
migration to the new node.  It would split any huge page since 4.9
doesn't support THP migration, however, the packet tx ring compound
pages are not THP and even not movable.  So, the above bug is triggered.

However, the later kernel is not hit by this issue due to commit
d44d363f65 ("mm: don't assume anonymous pages have SwapBacked flag"),
which just removes the PageSwapBacked check for a different reason.

But, there is a deeper issue.  According to the semantic of mbind(), it
should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and
MPOL_MF_STRICT was also specified, but the kernel was unable to move all
existing pages in the range.  The tx ring of the packet socket is
definitely not movable, however, mbind() returns success for this case.

Although the most socket file associates with non-movable pages, but XDP
may have movable pages from gup.  So, it sounds not fine to just check
the underlying file type of vma in vma_migratable().

Change migrate_page_add() to check if the page is movable or not, if it
is unmovable, just return -EIO.  But do not abort pte walk immediately,
since there may be pages off LRU temporarily.  We should migrate other
pages if MPOL_MF_MOVE* is specified.  Set has_unmovable flag if some
paged could not be not moved, then return -EIO for mbind() eventually.

With this change the above test would return -EIO as expected.

[yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
  Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:47:44 +02:00
Yang Shi
cd825d8714 mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
commit d883544515aae54842c21730b880172e7894fde9 upstream.

When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should
try best to migrate misplaced pages, if some of the pages could not be
migrated, then return -EIO.

There are three different sub-cases:
 1. vma is not migratable
 2. vma is migratable, but there are unmovable pages
 3. vma is migratable, pages are movable, but migrate_pages() fails

If #1 happens, kernel would just abort immediately, then return -EIO,
after a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified").

If #3 happens, kernel would set policy and migrate pages with
best-effort, but won't rollback the migrated pages and reset the policy
back.

Before that commit, they behaves in the same way.  It'd better to keep
their behavior consistent.  But, rolling back the migrated pages and
resetting the policy back sounds not feasible, so just make #1 behave as
same as #3.

Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt
rollback, determine which exact page(s) are violating the policy, etc.

Make queue_pages_range() return 1 to indicate there are unmovable pages
or vma is not migratable.

The #2 is not handled correctly in the current kernel, the following
patch will fix it.

[yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
  Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:47:43 +02:00
Ralph Campbell
f0fed8283d mm/hmm: fix bad subpage pointer in try_to_unmap_one
commit 1de13ee59225dfc98d483f8cce7d83f97c0b31de upstream.

When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is
increased.  This is so rmap_walk() can be used to unmap and migrate the
page back to system memory.

However, try_to_unmap_one() computes the subpage pointer from a swap pte
which computes an invalid page pointer and a kernel panic results such
as:

  BUG: unable to handle page fault for address: ffffea1fffffffc8

Currently, only single pages can be migrated to device private memory so
no subpage computation is needed and it can be set to "page".

[rcampbell@nvidia.com: add comment]
  Link: http://lkml.kernel.org/r/20190724232700.23327-4-rcampbell@nvidia.com
Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com
Fixes: a5430dda8a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:47:43 +02:00
Greg Kroah-Hartman
b1e96f1650 This is the 4.19.67 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1WZYYACgkQONu9yGCS
 aT5VjRAApdD6wuKcKhZ8j010Ni18w6W+3qs6IuIXv94eav0zFSRaO9Zp93lZq2p0
 h+k+ssZ+P8a4EuDquzDydlagno9hojHFAYr+9loPZlZUw578Jzg9JbJK9Z1MyQCo
 BCRElzZG67E+WjLP0wGHnS0oVhIoHlJaVWP3pEYkTJILY65ErLT/fYGs64YUAEKr
 Ct1pKoIHPEC0606IKx12kmV645ME4z6pI7g4kLDhk2BozglbxGlwdHgVuIe/NzDP
 PraR1gqMoOD2skjK673ozsZ65yuiVeqSTsbs49Xao1lAS6etUMbC/ACU/yrhL48H
 mMM/EFTSKb5TjJSxQAXU1ANQrm4X6n1yPkNW/MdthnPAotDY3Nda4NNVE9X2toM7
 XW0HfFdcVD7aJtpC/h6ckndGTaOGkHSPjhJtSlBEjF76BA+KhZ9hhcjNWng92bWL
 d5Nws4b82wvgM6T99mkZfbMc2MOopPMf+I94W0JcMa49+rXhyhJdrC72GpxKLdSq
 +XtZJupFWg0RrPlZfmc4Az8f/uY0UfR9gNSaHJokaZAkMzH2x4MzMnPxwRiXAw4W
 qz1s+sgZlqUQcWvODzaNvG1l7QtjD5rbdJ+FAjN2+16F8rep52Yl/IQffYr04DDD
 wikYmcUoMh8hCoj6Atj2LAAU9ulhl6ja8s0YpmHz/HQETufHAZc=
 =gOG+
 -----END PGP SIGNATURE-----

Merge 4.19.67 into android-4.19

Changes in 4.19.67
	iio: cros_ec_accel_legacy: Fix incorrect channel setting
	iio: adc: max9611: Fix misuse of GENMASK macro
	staging: gasket: apex: fix copy-paste typo
	staging: android: ion: Bail out upon SIGKILL when allocating memory.
	crypto: ccp - Fix oops by properly managing allocated structures
	crypto: ccp - Add support for valid authsize values less than 16
	crypto: ccp - Ignore tag length when decrypting GCM ciphertext
	usb: usbfs: fix double-free of usb memory upon submiturb error
	usb: iowarrior: fix deadlock on disconnect
	sound: fix a memory leak bug
	mmc: cavium: Set the correct dma max segment size for mmc_host
	mmc: cavium: Add the missing dma unmap when the dma has finished.
	loop: set PF_MEMALLOC_NOIO for the worker thread
	Input: usbtouchscreen - initialize PM mutex before using it
	Input: elantech - enable SMBus on new (2018+) systems
	Input: synaptics - enable RMI mode for HP Spectre X360
	x86/mm: Check for pfn instead of page in vmalloc_sync_one()
	x86/mm: Sync also unmappings in vmalloc_sync_all()
	mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
	perf annotate: Fix s390 gap between kernel end and module start
	perf db-export: Fix thread__exec_comm()
	perf record: Fix module size on s390
	x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS
	gfs2: gfs2_walk_metadata fix
	usb: host: xhci-rcar: Fix timeout in xhci_suspend()
	usb: yurex: Fix use-after-free in yurex_delete
	usb: typec: tcpm: free log buf memory when remove debug file
	usb: typec: tcpm: remove tcpm dir if no children
	usb: typec: tcpm: Add NULL check before dereferencing config
	usb: typec: tcpm: Ignore unsupported/unknown alternate mode requests
	can: rcar_canfd: fix possible IRQ storm on high load
	can: peak_usb: fix potential double kfree_skb()
	netfilter: nfnetlink: avoid deadlock due to synchronous request_module
	vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn
	netfilter: Fix rpfilter dropping vrf packets by mistake
	netfilter: conntrack: always store window size un-scaled
	netfilter: nft_hash: fix symhash with modulus one
	scripts/sphinx-pre-install: fix script for RHEL/CentOS
	drm/amd/display: Wait for backlight programming completion in set backlight level
	drm/amd/display: use encoder's engine id to find matched free audio device
	drm/amd/display: Fix dc_create failure handling and 666 color depths
	drm/amd/display: Only enable audio if speaker allocation exists
	drm/amd/display: Increase size of audios array
	iscsi_ibft: make ISCSI_IBFT dependson ACPI instead of ISCSI_IBFT_FIND
	nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
	mac80211: don't warn about CW params when not using them
	allocate_flower_entry: should check for null deref
	hwmon: (nct6775) Fix register address and added missed tolerance for nct6106
	drm: silence variable 'conn' set but not used
	cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
	s390/qdio: add sanity checks to the fast-requeue path
	ALSA: compress: Fix regression on compressed capture streams
	ALSA: compress: Prevent bypasses of set_params
	ALSA: compress: Don't allow paritial drain operations on capture streams
	ALSA: compress: Be more restrictive about when a drain is allowed
	perf tools: Fix proper buffer size for feature processing
	perf probe: Avoid calling freeing routine multiple times for same pointer
	drbd: dynamically allocate shash descriptor
	ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id()
	nvme: fix multipath crash when ANA is deactivated
	ARM: davinci: fix sleep.S build error on ARMv4
	ARM: dts: bcm: bcm47094: add missing #cells for mdio-bus-mux
	scsi: megaraid_sas: fix panic on loading firmware crashdump
	scsi: ibmvfc: fix WARN_ON during event pool release
	scsi: scsi_dh_alua: always use a 2 second delay before retrying RTPG
	test_firmware: fix a memory leak bug
	tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop
	perf/core: Fix creating kernel counters for PMUs that override event->cpu
	s390/dma: provide proper ARCH_ZONE_DMA_BITS value
	HID: sony: Fix race condition between rumble and device remove.
	x86/purgatory: Do not use __builtin_memcpy and __builtin_memset
	ALSA: usb-audio: fix a memory leak bug
	can: peak_usb: pcan_usb_pro: Fix info-leaks to USB devices
	can: peak_usb: pcan_usb_fd: Fix info-leaks to USB devices
	hwmon: (nct7802) Fix wrong detection of in4 presence
	drm/i915: Fix wrong escape clock divisor init for GLK
	ALSA: firewire: fix a memory leak bug
	ALSA: hiface: fix multiple memory leak bugs
	ALSA: hda - Don't override global PCM hw info flag
	ALSA: hda - Workaround for crackled sound on AMD controller (1022:1457)
	mac80211: don't WARN on short WMM parameters from AP
	dax: dax_layout_busy_page() should not unmap cow pages
	SMB3: Fix deadlock in validate negotiate hits reconnect
	smb3: send CAP_DFS capability during session setup
	NFSv4: Fix an Oops in nfs4_do_setattr
	KVM: Fix leak vCPU's VMCS value into other pCPU
	mwifiex: fix 802.11n/WPA detection
	iwlwifi: don't unmap as page memory that was mapped as single
	iwlwifi: mvm: fix an out-of-bound access
	iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT on version < 41
	iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT support
	Linux 4.19.67

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5ea813ed5ba6d1eeda51eb4031395ee3e8ba54c3
2019-08-16 11:27:10 +02:00
Joerg Roedel
46b306f3cd mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
commit 3f8fd02b1bf1d7ba964485a56f2f4b53ae88c167 upstream.

On x86-32 with PTI enabled, parts of the kernel page-tables are not shared
between processes. This can cause mappings in the vmalloc/ioremap area to
persist in some page-tables after the region is unmapped and released.

When the region is re-used the processes with the old mappings do not fault
in the new mappings but still access the old ones.

This causes undefined behavior, in reality often data corruption, kernel
oopses and panics and even spontaneous reboots.

Fix this problem by activly syncing unmaps in the vmalloc/ioremap area to
all page-tables in the system before the regions can be re-used.

References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-4-joro@8bytes.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-16 10:12:40 +02:00
Greg Kroah-Hartman
de4c70d6a9 This is the 4.19.65 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1Js7MACgkQONu9yGCS
 aT4PQxAAo7xa4kYvDxc1RjUY/yIlp6lQ3rpYAAfZB0t8vN+dqivnJZ7m6JHeWX1Y
 CMcxg85zxLVFeuiXdP821Zj68AB5zqlWMhX0bXm2lhw/Eo9+XHzXtnrLZHhz0/Xd
 M5cmfIPmoyPCUQQfzSfUMvch+ZpwzEt5op5pUfSjckSpjHQZ0HFj1WJ4D8Hn9jAJ
 y4+DAKDZgtqhb3GvpS6MoVnBJgcPk9+mBiDkSb12L392+FvHqfeBi3tDRhvyiZAO
 iJrk747SPds7NlNmuRnj7YyUSDhBzaceRCz0Jsv9FT5EKXoPErXdsL3Bkfa9TREM
 pH0OaMgNr6WSXLO9qIMcfxMeaKVIvIbotqBTkBTzhEAGPkHA75dhi0lpixXXFExg
 MaqhLfmHO0dOEr9FrvYGe7f2wUA1Rdw/qRTM3KPEKmHxMqBS7eufIWMHwie1n9Oe
 cYoP6UkxUIvhUyFV2BlMRFdMfaDbtR0iqy8Dqh36NISD6PAYaUGSoVeSO1fEg4Jy
 5GgrKPg6rcz2XNY2cVbsm2zLpqY4dY58SFK9ORfuULdKUQvScvFGrdSSW0CgX+uc
 F/5NmPutUoboHVxFraDPx7yo46pHf1RW0Me4xZ0aJ3e9ituLAN4fmJ9u46nofb5M
 thPelQlMVt30O41uViJ0ADkOjCsiBr3AxOFvc76Ct9Q/BJVxhLk=
 =JVBv
 -----END PGP SIGNATURE-----

Merge 4.19.65 into android-4.19

Changes in 4.19.65
	ARM: riscpc: fix DMA
	ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200
	ARM: dts: rockchip: Make rk3288-veyron-mickey's emmc work again
	ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend
	ftrace: Enable trampoline when rec count returns back to one
	dmaengine: tegra-apb: Error out if DMA_PREP_INTERRUPT flag is unset
	arm64: dts: rockchip: fix isp iommu clocks and power domain
	kernel/module.c: Only return -EEXIST for modules that have finished loading
	firmware/psci: psci_checker: Park kthreads before stopping them
	MIPS: lantiq: Fix bitfield masking
	dmaengine: rcar-dmac: Reject zero-length slave DMA requests
	clk: tegra210: fix PLLU and PLLU_OUT1
	fs/adfs: super: fix use-after-free bug
	clk: sprd: Add check for return value of sprd_clk_regmap_init()
	btrfs: fix minimum number of chunk errors for DUP
	btrfs: qgroup: Don't hold qgroup_ioctl_lock in btrfs_qgroup_inherit()
	cifs: Fix a race condition with cifs_echo_request
	ceph: fix improper use of smp_mb__before_atomic()
	ceph: return -ERANGE if virtual xattr value didn't fit in buffer
	ACPI: blacklist: fix clang warning for unused DMI table
	scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized
	perf version: Fix segfault due to missing OPT_END()
	x86: kvm: avoid constant-conversion warning
	ACPI: fix false-positive -Wuninitialized warning
	be2net: Signal that the device cannot transmit during reconfiguration
	x86/apic: Silence -Wtype-limits compiler warnings
	x86: math-emu: Hide clang warnings for 16-bit overflow
	mm/cma.c: fail if fixed declaration can't be honored
	lib/test_overflow.c: avoid tainting the kernel and fix wrap size
	lib/test_string.c: avoid masking memset16/32/64 failures
	coda: add error handling for fget
	coda: fix build using bare-metal toolchain
	uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers
	drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
	ipc/mqueue.c: only perform resource calculation if user valid
	mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed
	xen/pv: Fix a boot up hang revealed by int3 self test
	x86/kvm: Don't call kvm_spurious_fault() from .fixup
	x86/paravirt: Fix callee-saved function ELF sizes
	x86, boot: Remove multiple copy of static function sanitize_boot_params()
	drm/nouveau: fix memory leak in nouveau_conn_reset()
	kconfig: Clear "written" flag to avoid data loss
	kbuild: initialize CLANG_FLAGS correctly in the top Makefile
	Btrfs: fix incremental send failure after deduplication
	Btrfs: fix race leading to fs corruption after transaction abort
	mmc: dw_mmc: Fix occasional hang after tuning on eMMC
	mmc: meson-mx-sdio: Fix misuse of GENMASK macro
	gpiolib: fix incorrect IRQ requesting of an active-low lineevent
	IB/hfi1: Fix Spectre v1 vulnerability
	mtd: rawnand: micron: handle on-die "ECC-off" devices correctly
	selinux: fix memory leak in policydb_init()
	ALSA: hda: Fix 1-minute detection delay when i915 module is not available
	mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
	s390/dasd: fix endless loop after read unit address configuration
	cgroup: kselftest: relax fs_spec checks
	parisc: Fix build of compressed kernel even with debug enabled
	drivers/perf: arm_pmu: Fix failure path in PM notifier
	arm64: compat: Allow single-byte watchpoints on all addresses
	arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG}
	nbd: replace kill_bdev() with __invalidate_device() again
	xen/swiotlb: fix condition for calling xen_destroy_contiguous_region()
	IB/mlx5: Fix unreg_umr to ignore the mkey state
	IB/mlx5: Use direct mkey destroy command upon UMR unreg failure
	IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache
	IB/mlx5: Fix clean_mr() to work in the expected order
	IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification
	IB/hfi1: Check for error on call to alloc_rsm_map_table
	drm/i915/gvt: fix incorrect cache entry for guest page mapping
	eeprom: at24: make spd world-readable again
	ARC: enable uboot support unconditionally
	objtool: Support GCC 9 cold subfunction naming scheme
	gcc-9: properly declare the {pv,hv}clock_page storage
	x86/vdso: Prevent segfaults due to hoisted vclock reads
	scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA
	x86/cpufeatures: Carve out CQM features retrieval
	x86/cpufeatures: Combine word 11 and 12 into a new scattered features word
	x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
	x86/speculation: Enable Spectre v1 swapgs mitigations
	x86/entry/64: Use JMP instead of JMPQ
	x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
	Documentation: Add swapgs description to the Spectre v1 documentation
	Linux 4.19.65

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iceeabdb164657e0a616db618e6aa8445d56b0dc1
2019-08-06 20:08:18 +02:00
Yang Shi
beb0cc781b mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
commit fa1e512fac717f34e7c12d7a384c46e90a647392 upstream.

Shakeel Butt reported premature oom on kernel with
"cgroup_disable=memory" since mem_cgroup_is_root() returns false even
though memcg is actually NULL.  The drop_caches is also broken.

It is because commit aeed1d325d ("mm/vmscan.c: generalize
shrink_slab() calls in shrink_node()") removed the !memcg check before
!mem_cgroup_is_root().  And, surprisingly root memcg is allocated even
though memory cgroup is disabled by kernel boot parameter.

Add mem_cgroup_disabled() check to make reclaimer work as expected.

Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: aeed1d325d ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Hadrava <had@kam.mff.cuni.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06 19:06:54 +02:00
Doug Berger
439c79ed77 mm/cma.c: fail if fixed declaration can't be honored
[ Upstream commit c633324e311243586675e732249339685e5d6faa ]

The description of cma_declare_contiguous() indicates that if the
'fixed' argument is true the reserved contiguous area must be exactly at
the address of the 'base' argument.

However, the function currently allows the 'base', 'size', and 'limit'
arguments to be silently adjusted to meet alignment constraints.  This
commit enforces the documented behavior through explicit checks that
return an error if the region does not fit within a specified region.

Link: http://lkml.kernel.org/r/1561422051-16142-1-git-send-email-opendmb@gmail.com
Fixes: 5ea3b1b2f8 ("cma: add placement specifier for "cma=" kernel parameter")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Yue Hu <huyue2@yulong.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Peng Fan <peng.fan@nxp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-06 19:06:51 +02:00
Greg Kroah-Hartman
75ff56e1a2 This is the 4.19.63 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1BJrAACgkQONu9yGCS
 aT6MpRAAt2nuozm5Z/MshFAuGFzddAwPoYtkIPSy8BiPHYjf7x0+D5Ew4dz5OihS
 ElfbA94hMOpvhhXlzBU3ZFJsWZIK78gzV6+LiHyb5R97Jdzj/zT4h40y0kxKw+pS
 gghnZ6zx+pGSIXm/EsODW2gg98yTrmhBFpUhXpAGoC/71c1vxVlj3jHcuhd778YK
 NRlj2tFWJGIBpmXrApo1Eg7qQRj4tzbOjthfgANWPq+EP68PgiOaxMDMMp7IUwju
 KrXEFcXlk5aHvfaJ06FSBRBnn45XMyGXPYV/76HsaqkBNmg1r7o02dZKy/0S84Fn
 YXoEFxHt6NFOJ52LiO95z7cQ0xeTSYygNCtYXJ70uyDMnrCPXCrKp7DRyka+vDPs
 RCrcpB1QjcCb3xTL//SPkNWM3oZEW9CawpRFh9bmqbw/h7ZjaUEuwNIJnunISulu
 2fvOjUmFWfUVIARiwKFVuIkXzgf3cSLYZTtiDFC5/yBpkGVNnXqyO7YtWZwnrMHq
 L3DC3pOKuYXMa03KWGEzZoCZEXjtoRhRwCSgV7wbK5o90sZeRj/HC1zXEyDPPD/R
 7A1rePTuwlAH3gHCJGhYkmYqULx62ZdvV6IC2N7xNxeTL1Y7OVNBT7ZUqexxY6WC
 OG1vVxUKNJIvBLYmc6cmQATgR6XH5/B9H2p1YRBLuAQuqHk+jqo=
 =ZQf3
 -----END PGP SIGNATURE-----

Merge 4.19.63 into android-4.19

Changes in 4.19.63
	hvsock: fix epollout hang from race condition
	drm/panel: simple: Fix panel_simple_dsi_probe
	iio: adc: stm32-dfsdm: manage the get_irq error case
	iio: adc: stm32-dfsdm: missing error case during probe
	staging: vt6656: use meaningful error code during buffer allocation
	usb: core: hub: Disable hub-initiated U1/U2
	tty: max310x: Fix invalid baudrate divisors calculator
	pinctrl: rockchip: fix leaked of_node references
	tty: serial: cpm_uart - fix init when SMC is relocated
	drm/amd/display: Fill prescale_params->scale for RGB565
	drm/amdgpu/sriov: Need to initialize the HDP_NONSURFACE_BAStE
	drm/amd/display: Disable ABM before destroy ABM struct
	drm/amdkfd: Fix a potential memory leak
	drm/amdkfd: Fix sdma queue map issue
	drm/edid: Fix a missing-check bug in drm_load_edid_firmware()
	PCI: Return error if cannot probe VF
	drm/bridge: tc358767: read display_props in get_modes()
	drm/bridge: sii902x: pixel clock unit is 10kHz instead of 1kHz
	gpu: host1x: Increase maximum DMA segment size
	drm/crc-debugfs: User irqsafe spinlock in drm_crtc_add_crc_entry
	drm/crc-debugfs: Also sprinkle irqrestore over early exits
	memstick: Fix error cleanup path of memstick_init
	tty/serial: digicolor: Fix digicolor-usart already registered warning
	tty: serial: msm_serial: avoid system lockup condition
	serial: 8250: Fix TX interrupt handling condition
	drm/amd/display: Always allocate initial connector state state
	drm/virtio: Add memory barriers for capset cache.
	phy: renesas: rcar-gen2: Fix memory leak at error paths
	drm/amd/display: fix compilation error
	powerpc/pseries/mobility: prevent cpu hotplug during DT update
	drm/rockchip: Properly adjust to a true clock in adjusted_mode
	serial: imx: fix locking in set_termios()
	tty: serial_core: Set port active bit in uart_port_activate
	usb: gadget: Zero ffs_io_data
	mmc: sdhci: sdhci-pci-o2micro: Check if controller supports 8-bit width
	powerpc/pci/of: Fix OF flags parsing for 64bit BARs
	drm/msm: Depopulate platform on probe failure
	serial: mctrl_gpio: Check if GPIO property exisits before requesting it
	PCI: sysfs: Ignore lockdep for remove attribute
	i2c: stm32f7: fix the get_irq error cases
	kbuild: Add -Werror=unknown-warning-option to CLANG_FLAGS
	genksyms: Teach parser about 128-bit built-in types
	PCI: xilinx-nwl: Fix Multi MSI data programming
	iio: iio-utils: Fix possible incorrect mask calculation
	powerpc/cacheflush: fix variable set but not used
	powerpc/xmon: Fix disabling tracing while in xmon
	recordmcount: Fix spurious mcount entries on powerpc
	mfd: madera: Add missing of table registration
	mfd: core: Set fwnode for created devices
	mfd: arizona: Fix undefined behavior
	mfd: hi655x-pmic: Fix missing return value check for devm_regmap_init_mmio_clk
	mm/swap: fix release_pages() when releasing devmap pages
	um: Silence lockdep complaint about mmap_sem
	powerpc/4xx/uic: clear pending interrupt after irq type/pol change
	RDMA/i40iw: Set queue pair state when being queried
	serial: sh-sci: Terminate TX DMA during buffer flushing
	serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
	IB/mlx5: Fixed reporting counters on 2nd port for Dual port RoCE
	powerpc/mm: Handle page table allocation failures
	IB/ipoib: Add child to parent list only if device initialized
	arm64: assembler: Switch ESB-instruction with a vanilla nop if !ARM64_HAS_RAS
	PCI: mobiveil: Fix PCI base address in MEM/IO outbound windows
	PCI: mobiveil: Fix the Class Code field
	kallsyms: exclude kasan local symbols on s390
	PCI: mobiveil: Initialize Primary/Secondary/Subordinate bus numbers
	PCI: mobiveil: Use the 1st inbound window for MEM inbound transactions
	perf test mmap-thread-lookup: Initialize variable to suppress memory sanitizer warning
	perf stat: Fix use-after-freed pointer detected by the smatch tool
	perf top: Fix potential NULL pointer dereference detected by the smatch tool
	perf session: Fix potential NULL pointer dereference found by the smatch tool
	perf annotate: Fix dereferencing freed memory found by the smatch tool
	perf hists browser: Fix potential NULL pointer dereference found by the smatch tool
	RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM
	PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB
	powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
	block: init flush rq ref count to 1
	f2fs: avoid out-of-range memory access
	mailbox: handle failed named mailbox channel request
	dlm: check if workqueues are NULL before flushing/destroying
	powerpc/eeh: Handle hugepages in ioremap space
	block/bio-integrity: fix a memory leak bug
	sh: prevent warnings when using iounmap
	mm/kmemleak.c: fix check for softirq context
	9p: pass the correct prototype to read_cache_page
	mm/gup.c: mark undo_dev_pagemap as __maybe_unused
	mm/gup.c: remove some BUG_ONs from get_gate_page()
	memcg, fsnotify: no oom-kill for remote memcg charging
	mm/mmu_notifier: use hlist_add_head_rcu()
	proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollup
	proc: use down_read_killable mmap_sem for /proc/pid/pagemap
	proc: use down_read_killable mmap_sem for /proc/pid/clear_refs
	proc: use down_read_killable mmap_sem for /proc/pid/map_files
	cxgb4: reduce kernel stack usage in cudbg_collect_mem_region()
	proc: use down_read_killable mmap_sem for /proc/pid/maps
	locking/lockdep: Fix lock used or unused stats error
	mm: use down_read_killable for locking mmap_sem in access_remote_vm
	locking/lockdep: Hide unused 'class' variable
	usb: wusbcore: fix unbalanced get/put cluster_id
	usb: pci-quirks: Correct AMD PLL quirk detection
	btrfs: inode: Don't compress if NODATASUM or NODATACOW set
	x86/sysfb_efi: Add quirks for some devices with swapped width and height
	x86/speculation/mds: Apply more accurate check on hypervisor platform
	binder: prevent transactions to context manager from its own process.
	fpga-manager: altera-ps-spi: Fix build error
	mei: me: add mule creek canyon (EHL) device ids
	hpet: Fix division by zero in hpet_time_div()
	ALSA: ac97: Fix double free of ac97_codec_device
	ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1
	ALSA: hda - Add a conexant codec entry to let mute led work
	powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
	powerpc/tm: Fix oops on sigreturn on systems without TM
	libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
	access: avoid the RCU grace period for the temporary subjective credentials
	Linux 4.19.63

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic31529aa6fd283d16d6bfb182187a9402a4db44f
2019-07-31 08:03:42 +02:00
Konstantin Khlebnikov
b07687243d mm: use down_read_killable for locking mmap_sem in access_remote_vm
[ Upstream commit 1e426fe28261b03f297992e89da3320b42816f4e ]

This function is used by ptrace and proc files like /proc/pid/cmdline and
/proc/pid/environ.

Access_remote_vm never returns error codes, all errors are ignored and
only size of successfully read data is returned.  So, if current task was
killed we'll simply return 0 (bytes read).

Mmap_sem could be locked for a long time or forever if something goes
wrong.  Using a killable lock permits cleanup of stuck tasks and
simplifies investigation.

Link: http://lkml.kernel.org/r/156007494202.3335.16782303099589302087.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:09 +02:00
Jean-Philippe Brucker
a8c568fc48 mm/mmu_notifier: use hlist_add_head_rcu()
[ Upstream commit 543bdb2d825fe2400d6e951f1786d92139a16931 ]

Make mmu_notifier_register() safer by issuing a memory barrier before
registering a new notifier.  This fixes a theoretical bug on weakly
ordered CPUs.  For example, take this simplified use of notifiers by a
driver:

	my_struct->mn.ops = &my_ops; /* (1) */
	mmu_notifier_register(&my_struct->mn, mm)
		...
		hlist_add_head(&mn->hlist, &mm->mmu_notifiers); /* (2) */
		...

Once mmu_notifier_register() releases the mm locks, another thread can
invalidate a range:

	mmu_notifier_invalidate_range()
		...
		hlist_for_each_entry_rcu(mn, &mm->mmu_notifiers, hlist) {
			if (mn->ops->invalidate_range)

The read side relies on the data dependency between mn and ops to ensure
that the pointer is properly initialized.  But the write side doesn't have
any dependency between (1) and (2), so they could be reordered and the
readers could dereference an invalid mn->ops.  mmu_notifier_register()
does take all the mm locks before adding to the hlist, but those have
acquire semantics which isn't sufficient.

By calling hlist_add_head_rcu() instead of hlist_add_head() we update the
hlist using a store-release, ensuring that readers see prior
initialization of my_struct.  This situation is better illustated by
litmus test MP+onceassign+derefonce.

Link: http://lkml.kernel.org/r/20190502133532.24981-1-jean-philippe.brucker@arm.com
Fixes: cddb8a5c14 ("mmu-notifiers: core")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:08 +02:00
Andy Lutomirski
041b127df7 mm/gup.c: remove some BUG_ONs from get_gate_page()
[ Upstream commit b5d1c39f34d1c9bca0c4b9ae2e339fbbe264a9c7 ]

If we end up without a PGD or PUD entry backing the gate area, don't BUG
-- just fail gracefully.

It's not entirely implausible that this could happen some day on x86.  It
doesn't right now even with an execute-only emulated vsyscall page because
the fixmap shares the PUD, but the core mm code shouldn't rely on that
particular detail to avoid OOPSing.

Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:08 +02:00
Guenter Roeck
fa099d6ddf mm/gup.c: mark undo_dev_pagemap as __maybe_unused
[ Upstream commit 790c73690c2bbecb3f6f8becbdb11ddc9bcff8cc ]

Several mips builds generate the following build warning.

  mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used

The function is declared unconditionally but only called from behind
various ifdefs. Mark it __maybe_unused.

Link: http://lkml.kernel.org/r/1562072523-22311-1-git-send-email-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:08 +02:00
Dmitry Vyukov
071f2135cf mm/kmemleak.c: fix check for softirq context
[ Upstream commit 6ef9056952532c3b746de46aa10d45b4d7797bd8 ]

in_softirq() is a wrong predicate to check if we are in a softirq
context.  It also returns true if we have BH disabled, so objects are
falsely stamped with "softirq" comm.  The correct predicate is
in_serving_softirq().

If user does cat from /sys/kernel/debug/kmemleak previously they would
see this, which is clearly wrong, this is system call context (see the
comm):

unreferenced object 0xffff88805bd661c0 (size 64):
  comm "softirq", pid 0, jiffies 4294942959 (age 12.400s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
    00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000007dcb30c>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
    [<0000000007dcb30c>] slab_post_alloc_hook mm/slab.h:439 [inline]
    [<0000000007dcb30c>] slab_alloc mm/slab.c:3326 [inline]
    [<0000000007dcb30c>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
    [<00000000969722b7>] kmalloc include/linux/slab.h:547 [inline]
    [<00000000969722b7>] kzalloc include/linux/slab.h:742 [inline]
    [<00000000969722b7>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline]
    [<00000000969722b7>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085
    [<00000000a4134b5f>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475
    [<00000000d20248ad>] do_ip_setsockopt.isra.0+0x19fe/0x1c00 net/ipv4/ip_sockglue.c:957
    [<000000003d367be7>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246
    [<000000003c7c76af>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
    [<000000000c1aeb23>] sock_common_setsockopt+0x3e/0x50 net/core/sock.c:3130
    [<000000000157b92b>] __sys_setsockopt+0x9e/0x120 net/socket.c:2078
    [<00000000a9f3d058>] __do_sys_setsockopt net/socket.c:2089 [inline]
    [<00000000a9f3d058>] __se_sys_setsockopt net/socket.c:2086 [inline]
    [<00000000a9f3d058>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
    [<000000001b8da885>] do_syscall_64+0x7c/0x1a0 arch/x86/entry/common.c:301
    [<00000000ba770c62>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

now they will see this:

unreferenced object 0xffff88805413c800 (size 64):
  comm "syz-executor.4", pid 8960, jiffies 4294994003 (age 14.350s)
  hex dump (first 32 bytes):
    00 7a 8a 57 80 88 ff ff e0 00 00 01 00 00 00 00  .z.W............
    00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000c5d3be64>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
    [<00000000c5d3be64>] slab_post_alloc_hook mm/slab.h:439 [inline]
    [<00000000c5d3be64>] slab_alloc mm/slab.c:3326 [inline]
    [<00000000c5d3be64>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
    [<0000000023865be2>] kmalloc include/linux/slab.h:547 [inline]
    [<0000000023865be2>] kzalloc include/linux/slab.h:742 [inline]
    [<0000000023865be2>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline]
    [<0000000023865be2>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085
    [<000000003029a9d4>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475
    [<00000000ccd0a87c>] do_ip_setsockopt.isra.0+0x19fe/0x1c00 net/ipv4/ip_sockglue.c:957
    [<00000000a85a3785>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246
    [<00000000ec13c18d>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
    [<0000000052d748e3>] sock_common_setsockopt+0x3e/0x50 net/core/sock.c:3130
    [<00000000512f1014>] __sys_setsockopt+0x9e/0x120 net/socket.c:2078
    [<00000000181758bc>] __do_sys_setsockopt net/socket.c:2089 [inline]
    [<00000000181758bc>] __se_sys_setsockopt net/socket.c:2086 [inline]
    [<00000000181758bc>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
    [<00000000d4b73623>] do_syscall_64+0x7c/0x1a0 arch/x86/entry/common.c:301
    [<00000000c1098bec>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/20190517171507.96046-1-dvyukov@gmail.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:08 +02:00
Ira Weiny
30edc7c1fe mm/swap: fix release_pages() when releasing devmap pages
[ Upstream commit c5d6c45e90c49150670346967971e14576afd7f1 ]

release_pages() is an optimized version of a loop around put_page().
Unfortunately for devmap pages the logic is not entirely correct in
release_pages().  This is because device pages can be more than type
MEMORY_DEVICE_PUBLIC.  There are in fact 4 types, private, public, FS DAX,
and PCI P2PDMA.  Some of these have specific needs to "put" the page while
others do not.

This logic to handle any special needs is contained in
put_devmap_managed_page().  Therefore all devmap pages should be processed
by this function where we can contain the correct logic for a page put.

Handle all device type pages within release_pages() by calling
put_devmap_managed_page() on all devmap pages.  If
put_devmap_managed_page() returns true the page has been put and we
continue with the next page.  A false return of put_devmap_managed_page()
means the page did not require special processing and should fall to
"normal" processing.

This was found via code inspection while determining if release_pages()
and the new put_user_pages() could be interchangeable.[1]

[1] https://lkml.kernel.org/r/20190523172852.GA27175@iweiny-DESK2.sc.intel.com

Link: https://lkml.kernel.org/r/20190605214922.17684-1-ira.weiny@intel.com
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-31 07:27:03 +02:00
Greg Kroah-Hartman
f232ce65ba This is the 4.19.62 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl09QMoACgkQONu9yGCS
 aT7bBQ//dApu9DLQ3tyfc+tucIc3ViOcif3zlfLWRqClcLXw1HF63/Q/u40kYuf1
 TOnMkVXFWmpeGPgz2/tWw9TN+ibHaPBfyMoPGw/Ehs5dB8qn/F4AnLqkePFimIYQ
 +YOZTbvPqW/xSh8I79i6y2NxrZLGvO9FDY9YUblBKz4lKWc+HULXn9RmjtxlBOLE
 s6epoM0tWKPPeM5MyJIwT+9MZ0o3Cj89YNv+PqYLQSzcvbUbHDUiMt+DO4rqSyyf
 6fQe4mSX4tk+tbviyEruvKH3kJg0rWS1Maw1/h+AOJ8Gmr2WZ+vYBorPEdwdYu5K
 eootc3Jkj4wtcPsvKtNzWVPqJxdd5j651vS8/bxA0DmOVQM7A8dUBWKtMJGGG7nM
 nIRvxoMo+kv9QE/DJpeQ/apE9WnBflSqw6/DYdqs11gTX4E+beR75aRoVD1ue0lS
 if5CfnM0BTiOO1rMdMo42rzcBfh2DLt5a18WXsMmYOaSMGEJuF0KjgaGc5E4N00A
 QlqIJ7PKk+Kb53jz5oLV2vB/SXNvAQRLAvMTMH2Mst4veDonYyiVTfp6C+r8DJYc
 hvyaF1FoCMTWV5XsINmM2mlJ2+/G0nV5kLDmVPUHiFxE0JXnPQ8iCx9a96Ub+Zim
 F//muwohNIUa6EEN6AwrEUsoyVZ7cP/aR5wHQ9PAY3RV5nis474=
 =zWcX
 -----END PGP SIGNATURE-----

Merge 4.19.62 into android-4.19

Changes in 4.19.62
	bnx2x: Prevent load reordering in tx completion processing
	caif-hsi: fix possible deadlock in cfhsi_exit_module()
	hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
	igmp: fix memory leak in igmpv3_del_delrec()
	ipv4: don't set IPv6 only flags to IPv4 addresses
	ipv6: rt6_check should return NULL if 'from' is NULL
	ipv6: Unlink sibling route in case of failure
	net: bcmgenet: use promisc for unsupported filters
	net: dsa: mv88e6xxx: wait after reset deactivation
	net: make skb_dst_force return true when dst is refcounted
	net: neigh: fix multiple neigh timer scheduling
	net: openvswitch: fix csum updates for MPLS actions
	net: phy: sfp: hwmon: Fix scaling of RX power
	net: stmmac: Re-work the queue selection for TSO packets
	nfc: fix potential illegal memory access
	r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
	rxrpc: Fix send on a connected, but unbound socket
	sctp: fix error handling on stream scheduler initialization
	sky2: Disable MSI on ASUS P6T
	tcp: be more careful in tcp_fragment()
	tcp: fix tcp_set_congestion_control() use from bpf hook
	tcp: Reset bytes_acked and bytes_received when disconnecting
	vrf: make sure skb->data contains ip header to make routing
	net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn
	macsec: fix use-after-free of skb during RX
	macsec: fix checksumming after decryption
	netrom: fix a memory leak in nr_rx_frame()
	netrom: hold sock when setting skb->destructor
	net_sched: unset TCQ_F_CAN_BYPASS when adding filters
	net/tls: make sure offload also gets the keys wiped
	sctp: not bind the socket in sctp_connect
	net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
	net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
	net: bridge: don't cache ether dest pointer on input
	net: bridge: stp: don't cache eth dest pointer before skb pull
	dma-buf: balance refcount inbalance
	dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc
	gpio: davinci: silence error prints in case of EPROBE_DEFER
	MIPS: lb60: Fix pin mappings
	perf/core: Fix exclusive events' grouping
	perf/core: Fix race between close() and fork()
	ext4: don't allow any modifications to an immutable file
	ext4: enforce the immutable flag on open files
	mm: add filemap_fdatawait_range_keep_errors()
	jbd2: introduce jbd2_inode dirty range scoping
	ext4: use jbd2_inode dirty range scoping
	ext4: allow directory holes
	KVM: nVMX: do not use dangling shadow VMCS after guest reset
	KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
	mm: vmscan: scan anonymous pages on file refaults
	net: sched: verify that q!=NULL before setting q->flags
	Linux 4.19.62

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2eb23bda9d5294a5c874fe4f403934fd99e84661
2019-07-28 08:43:04 +02:00
Kuo-Hsin Yang
c1d98b766e mm: vmscan: scan anonymous pages on file refaults
commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa upstream.

When file refaults are detected and there are many inactive file pages,
the system never reclaim anonymous pages, the file pages are dropped
aggressively when there are still a lot of cold anonymous pages and
system thrashes.  This issue impacts the performance of applications
with large executable, e.g.  chrome.

With this patch, when file refault is detected, inactive_list_is_low()
always returns true for file pages in get_scan_count() to enable
scanning anonymous pages.

The problem can be reproduced by the following test program.

---8<---
void fallocate_file(const char *filename, off_t size)
{
	struct stat st;
	int fd;

	if (!stat(filename, &st) && st.st_size >= size)
		return;

	fd = open(filename, O_WRONLY | O_CREAT, 0600);
	if (fd < 0) {
		perror("create file");
		exit(1);
	}
	if (posix_fallocate(fd, 0, size)) {
		perror("fallocate");
		exit(1);
	}
	close(fd);
}

long *alloc_anon(long size)
{
	long *start = malloc(size);
	memset(start, 1, size);
	return start;
}

long access_file(const char *filename, long size, long rounds)
{
	int fd, i;
	volatile char *start1, *end1, *start2;
	const int page_size = getpagesize();
	long sum = 0;

	fd = open(filename, O_RDONLY);
	if (fd == -1) {
		perror("open");
		exit(1);
	}

	/*
	 * Some applications, e.g. chrome, use a lot of executable file
	 * pages, map some of the pages with PROT_EXEC flag to simulate
	 * the behavior.
	 */
	start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED,
		      fd, 0);
	if (start1 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	end1 = start1 + size / 2;

	start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2);
	if (start2 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	for (i = 0; i < rounds; ++i) {
		struct timeval before, after;
		volatile char *ptr1 = start1, *ptr2 = start2;
		gettimeofday(&before, NULL);
		for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size)
			sum += *ptr1 + *ptr2;
		gettimeofday(&after, NULL);
		printf("File access time, round %d: %f (sec)
", i,
		       (after.tv_sec - before.tv_sec) +
		       (after.tv_usec - before.tv_usec) / 1000000.0);
	}
	return sum;
}

int main(int argc, char *argv[])
{
	const long MB = 1024 * 1024;
	long anon_mb, file_mb, file_rounds;
	const char filename[] = "large";
	long *ret1;
	long ret2;

	if (argc != 4) {
		printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS
");
		exit(0);
	}
	anon_mb = atoi(argv[1]);
	file_mb = atoi(argv[2]);
	file_rounds = atoi(argv[3]);

	fallocate_file(filename, file_mb * MB);
	printf("Allocate %ld MB anonymous pages
", anon_mb);
	ret1 = alloc_anon(anon_mb * MB);
	printf("Access %ld MB file pages
", file_mb);
	ret2 = access_file(filename, file_mb * MB, file_rounds);
	printf("Print result to prevent optimization: %ld
",
	       *ret1 + ret2);
	return 0;
}
---8<---

Running the test program on 2GB RAM VM with kernel 5.2.0-rc5, the program
fills ram with 2048 MB memory, access a 200 MB file for 10 times.  Without
this patch, the file cache is dropped aggresively and every access to the
file is from disk.

  $ ./thrash 2048 200 10
  Allocate 2048 MB anonymous pages
  Access 200 MB file pages
  File access time, round 0: 2.489316 (sec)
  File access time, round 1: 2.581277 (sec)
  File access time, round 2: 2.487624 (sec)
  File access time, round 3: 2.449100 (sec)
  File access time, round 4: 2.420423 (sec)
  File access time, round 5: 2.343411 (sec)
  File access time, round 6: 2.454833 (sec)
  File access time, round 7: 2.483398 (sec)
  File access time, round 8: 2.572701 (sec)
  File access time, round 9: 2.493014 (sec)

With this patch, these file pages can be cached.

  $ ./thrash 2048 200 10
  Allocate 2048 MB anonymous pages
  Access 200 MB file pages
  File access time, round 0: 2.475189 (sec)
  File access time, round 1: 2.440777 (sec)
  File access time, round 2: 2.411671 (sec)
  File access time, round 3: 1.955267 (sec)
  File access time, round 4: 0.029924 (sec)
  File access time, round 5: 0.000808 (sec)
  File access time, round 6: 0.000771 (sec)
  File access time, round 7: 0.000746 (sec)
  File access time, round 8: 0.000738 (sec)
  File access time, round 9: 0.000747 (sec)

Checked the swap out stats during the test [1], 19006 pages swapped out
with this patch, 3418 pages swapped out without this patch. There are
more swap out, but I think it's within reasonable range when file backed
data set doesn't fit into the memory.

$ ./thrash 2000 100 2100 5 1 # ANON_MB FILE_EXEC FILE_NOEXEC ROUNDS
PROCESSES Allocate 2000 MB anonymous pages active_anon: 1613644,
inactive_anon: 348656, active_file: 892, inactive_file: 1384 (kB)
pswpout: 7972443, pgpgin: 478615246 Access 100 MB executable file pages
Access 2100 MB regular file pages File access time, round 0: 12.165,
(sec) active_anon: 1433788, inactive_anon: 478116, active_file: 17896,
inactive_file: 24328 (kB) File access time, round 1: 11.493, (sec)
active_anon: 1430576, inactive_anon: 477144, active_file: 25440,
inactive_file: 26172 (kB) File access time, round 2: 11.455, (sec)
active_anon: 1427436, inactive_anon: 476060, active_file: 21112,
inactive_file: 28808 (kB) File access time, round 3: 11.454, (sec)
active_anon: 1420444, inactive_anon: 473632, active_file: 23216,
inactive_file: 35036 (kB) File access time, round 4: 11.479, (sec)
active_anon: 1413964, inactive_anon: 471460, active_file: 31728,
inactive_file: 32224 (kB) pswpout: 7991449 (+ 19006), pgpgin: 489924366
(+ 11309120)

With 4 processes accessing non-overlapping parts of a large file, 30316
pages swapped out with this patch, 5152 pages swapped out without this
patch.  The swapout number is small comparing to pgpgin.

[1]: https://github.com/vovo/testing/blob/master/mem_thrash.c

Link: http://lkml.kernel.org/r/20190701081038.GA83398@google.com
Fixes: e986850598 ("mm,vmscan: only evict file pages when we have plenty")
Fixes: 7c5bd705d8 ("mm: memcg: only evict file pages when we have plenty")
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>	[4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[backported to 4.14.y, 4.19.y, 5.1.y: adjust context]
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28 08:29:30 +02:00
Ross Zwisler
4becd6c11e mm: add filemap_fdatawait_range_keep_errors()
commit aa0bfcd939c30617385ffa28682c062d78050eba upstream.

In the spirit of filemap_fdatawait_range() and
filemap_fdatawait_keep_errors(), introduce
filemap_fdatawait_range_keep_errors() which both takes a range upon
which to wait and does not clear errors from the address space.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-28 08:29:29 +02:00
Greg Kroah-Hartman
5ad6eeba58 This is the 4.19.58 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0lmYwACgkQONu9yGCS
 aT4h5w//ZG0BYEwxoa4Qc8rwvncnk78miK/VRH5JVTiToDqTuttHZQoMp+NLD2fQ
 V679f/2+VqEPn8o6yJsrbM8uea0iIratI8U6L2OEt6TKPbar3CPcRUPJeqlPWkej
 tf3qjAtvNNjLcl7xCYt9JNvpF4RwA8rLWWP5hZyYMi7xcMiB0FOriTlVJYHJ0PLK
 Iqg+edkBxKwx7mvFlZnJkT0ln5hCqT4QBq2XrOYGUfy2Ans5Ytg5dhhp41QDD6iu
 oE4mS+fybCzNOR3BWl7pfpeJRg8TKq4XNzYsQr9ftt2e3OZxOi3Jg+RLsgzjJB9P
 1aTsuSzSeMXVGrAwRpBAot7TC+8F88sci0gibh4pg5N0ujGdvRW4gyzYHtdKhsTc
 wmjYMKbAxJWwz0vkRp1aSnUMSRur4Wo3qCWaOWpjkP4xhSBTTER5e5cqeuVSWde5
 FaD8s0yjnQsUaH3oxZ7zDL//MR0N+C4Izs9c2A8HkdksWTdTvI7YX8c766iIZgrm
 JFV0FIZYIHAyuXT04W9n3VSvV4tLS+ouwYZpgG09oK0lBA8NT6RyZWzijY3VE0ed
 Kl+t6iu02qZgZrvnq4pHUVnLQtw7KfyL3mzeljVxEeaTbGODPOJfypY1OMfhWYw+
 dIlmsmfa2aANf5wttl8CjLkAIIG3JmuWO2exMQidvXlGCE+rKVM=
 =u7q2
 -----END PGP SIGNATURE-----

Merge 4.19.58 into android-4.19

Changes in 4.19.58
	Bluetooth: Fix faulty expression for minimum encryption key size check
	block: Fix a NULL pointer dereference in generic_make_request()
	md/raid0: Do not bypass blocking queue entered for raid0 bios
	netfilter: nf_flow_table: ignore DF bit setting
	netfilter: nft_flow_offload: set liberal tracking mode for tcp
	netfilter: nft_flow_offload: don't offload when sequence numbers need adjustment
	netfilter: nft_flow_offload: IPCB is only valid for ipv4 family
	ASoC : cs4265 : readable register too low
	ASoC: ak4458: add return value for ak4458_probe
	ASoC: soc-pcm: BE dai needs prepare when pause release after resume
	ASoC: ak4458: rstn_control - return a non-zero on error only
	spi: bitbang: Fix NULL pointer dereference in spi_unregister_master
	drm/mediatek: fix unbind functions
	drm/mediatek: unbind components in mtk_drm_unbind()
	drm/mediatek: call drm_atomic_helper_shutdown() when unbinding driver
	drm/mediatek: clear num_pipes when unbind driver
	drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable()
	ASoC: max98090: remove 24-bit format support if RJ is 0
	ASoC: sun4i-i2s: Fix sun8i tx channel offset mask
	ASoC: sun4i-i2s: Add offset to RX channel select
	x86/CPU: Add more Icelake model numbers
	usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
	usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
	ALSA: hdac: fix memory release for SST and SOF drivers
	SoC: rt274: Fix internal jack assignment in set_jack callback
	scsi: hpsa: correct ioaccel2 chaining
	drm: panel-orientation-quirks: Add quirk for GPD pocket2
	drm: panel-orientation-quirks: Add quirk for GPD MicroPC
	platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
	platform/x86: intel-vbtn: Report switch events when event wakes device
	platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration
	platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow
	i2c: pca-platform: Fix GPIO lookup code
	cpuset: restore sanity to cpuset_cpus_allowed_fallback()
	scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE
	mm/mlock.c: change count_mm_mlocked_page_nr return type
	tracing: avoid build warning with HAVE_NOP_MCOUNT
	module: Fix livepatch/ftrace module text permissions race
	ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
	drm/i915/dmc: protect against reading random memory
	ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME
	crypto: user - prevent operating on larval algorithms
	crypto: cryptd - Fix skcipher instance memory leak
	ALSA: seq: fix incorrect order of dest_client/dest_ports arguments
	ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages
	ALSA: line6: Fix write on zero-sized buffer
	ALSA: usb-audio: fix sign unintended sign extension on left shifts
	ALSA: hda/realtek: Add quirks for several Clevo notebook barebones
	ALSA: hda/realtek - Change front mic location for Lenovo M710q
	lib/mpi: Fix karactx leak in mpi_powm
	fs/userfaultfd.c: disable irqs for fault_pending and event locks
	tracing/snapshot: Resize spare buffer if size changed
	ARM: dts: armada-xp-98dx3236: Switch to armada-38x-uart serial node
	arm64: kaslr: keep modules inside module region when KASAN is enabled
	drm/amd/powerplay: use hardware fan control if no powerplay fan table
	drm/amdgpu/gfx9: use reset default for PA_SC_FIFO_SIZE
	drm/etnaviv: add missing failure path to destroy suballoc
	drm/imx: notify drm core before sending event during crtc disable
	drm/imx: only send event on crtc disable if kept disabled
	ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
	mm/vmscan.c: prevent useless kswapd loops
	btrfs: Ensure replaced device doesn't have pending chunk allocation
	tty: rocket: fix incorrect forward declaration of 'rp_init()'
	mlxsw: spectrum: Handle VLAN device unlinking
	net/smc: move unhash before release of clcsock
	media: s5p-mfc: fix incorrect bus assignment in virtual child device
	drm/fb-helper: generic: Don't take module ref for fbcon
	f2fs: don't access node/meta inode mapping after iput
	mac80211: mesh: fix missing unlock on error in table_path_del()
	scsi: tcmu: fix use after free
	selftests: fib_rule_tests: Fix icmp proto with ipv6
	x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting
	net: hns: Fixes the missing put_device in positive leg for roce reset
	ALSA: hda: Initialize power_state field properly
	rds: Fix warning.
	ip6: fix skb leak in ip6frag_expire_frag_queue()
	netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments
	sc16is7xx: move label 'err_spi' to correct section
	net: hns: fix unsigned comparison to less than zero
	bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
	netfilter: ipv6: nf_defrag: accept duplicate fragments again
	KVM: x86: degrade WARN to pr_warn_ratelimited
	KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC
	nfsd: Fix overflow causing non-working mounts on 1 TB machines
	svcrdma: Ignore source port when computing DRC hash
	MIPS: Fix bounds check virt_addr_valid
	MIPS: Add missing EHB in mtc0 -> mfc0 sequence.
	MIPS: have "plain" make calls build dtbs for selected platforms
	dmaengine: qcom: bam_dma: Fix completed descriptors count
	dmaengine: imx-sdma: remove BD_INTR for channel0
	Linux 4.19.58

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-10 11:40:00 +02:00
Shakeel Butt
27ce6c2675 mm/vmscan.c: prevent useless kswapd loops
commit dffcac2cb88e4ec5906235d64a83d802580b119e upstream.

In production we have noticed hard lockups on large machines running
large jobs due to kswaps hoarding lru lock within isolate_lru_pages when
sc->reclaim_idx is 0 which is a small zone.  The lru was couple hundred
GiBs and the condition (page_zonenum(page) > sc->reclaim_idx) in
isolate_lru_pages() was basically skipping GiBs of pages while holding
the LRU spinlock with interrupt disabled.

On further inspection, it seems like there are two issues:

(1) If kswapd on the return from balance_pgdat() could not sleep (i.e.
    node is still unbalanced), the classzone_idx is unintentionally set
    to 0 and the whole reclaim cycle of kswapd will try to reclaim only
    the lowest and smallest zone while traversing the whole memory.

(2) Fundamentally isolate_lru_pages() is really bad when the
    allocation has woken kswapd for a smaller zone on a very large machine
    running very large jobs.  It can hoard the LRU spinlock while skipping
    over 100s of GiBs of pages.

This patch only fixes (1).  (2) needs a more fundamental solution.  To
fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is
invalid use the classzone_idx of the previous kswapd loop otherwise use
the one the waker has requested.

Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com
Fixes: e716f2eb24 ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:53:44 +02:00
swkhack
79fccb9815 mm/mlock.c: change count_mm_mlocked_page_nr return type
[ Upstream commit 0874bb49bb21bf24deda853e8bf61b8325e24bcb ]

On a 64-bit machine the value of "vma->vm_end - vma->vm_start" may be
negative when using 32 bit ints and the "count >> PAGE_SHIFT"'s result
will be wrong.  So change the local variable and return value to
unsigned long to fix the problem.

Link: http://lkml.kernel.org/r/20190513023701.83056-1-swkhack@gmail.com
Fixes: 0cf2f6f6dc ("mm: mlock: check against vma for actual mlock() size")
Signed-off-by: swkhack <swkhack@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10 09:53:40 +02:00
Greg Kroah-Hartman
5b2dde5e0b This is the 4.19.57 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0cjioACgkQONu9yGCS
 aT4TNg//Sr2cN3HmcbJrjfNAifpjT1XRix0Qy0EOYMhieCh26SbHyB0yo/N0UMCK
 iGv4ThqoBE+goK9bfb1F4CL0iMo88RM11lTy7UbemSQg2+MNJb8mvaq8YkpexTdw
 SRgXT1kyOPoHVGCypTgQcKHLdLAuOkQQGCxccU0n+Vc006nLPI0b9yRvgUnUwzvY
 EO9zLSfMLQhCcsLVoXLqaJ0AeU+VG5mkILjHZjcNElT+0T/LwoPO+VBLkuQt3KLp
 BWe+N11xsc2ZR53jptpl9UU2aaUGIKeYttKgwj7rcqUuigk4hQ0AIZmZuQWzhgBu
 6ERnKRgKARKQt4igxL5IsbIJiSK4/VJvuaR+26Sobc6zfDPQ0qfOuJaZeLYQjRQe
 SXjLNXzozA1SV593o1atLhFeY+tGMRQ4dlFCE9x/gJ68v5dya+f0e7X+zP8+HV+v
 u7pfgHT3Jb43D/G6H4sHE0VZZF4vh3Ba675Xp4NzOQOaFHJtQQUPCROiyYjJF6+H
 2fgkwsokE8oFPgqWrYuOIzV9t5THjSNqhT7lyZ/LNDJiMTnJytqfQ01zbHoaHCAb
 i5QB09x+72L7L/U9B9BGH+zEPTC2myw3dKmMv7kUxNx/3QKVDb/6cVnLnTWs4zrJ
 lw52HzgB2aV8pRtvgg0OeHedJ8UGVYfVq2/YHUHbiukgZ61n3J8=
 =OkFp
 -----END PGP SIGNATURE-----

Merge 4.19.57 into android-4.19

Changes in 4.19.57
	perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
	perf help: Remove needless use of strncpy()
	perf header: Fix unchecked usage of strncpy()
	arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS
	Revert "x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP"
	IB/hfi1: Close PSM sdma_progress sleep window
	9p/xen: fix check for xenbus_read error in front_probe
	9p: Use a slab for allocating requests
	9p: embed fcall in req to round down buffer allocs
	9p: add a per-client fcall kmem_cache
	9p: rename p9_free_req() function
	9p: Add refcount to p9_req_t
	9p/rdma: do not disconnect on down_interruptible EAGAIN
	9p: Rename req to rreq in trans_fd
	9p: acl: fix uninitialized iattr access
	9p/rdma: remove useless check in cm_event_handler
	9p: p9dirent_read: check network-provided name length
	9p: potential NULL dereference
	9p/trans_fd: abort p9_read_work if req status changed
	9p/trans_fd: put worker reqs on destroy
	net/9p: include trans_common.h to fix missing prototype warning.
	qmi_wwan: Fix out-of-bounds read
	Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"
	usb: dwc3: gadget: combine unaligned and zero flags
	usb: dwc3: gadget: track number of TRBs per request
	usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()
	usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()
	usb: dwc3: gadget: introduce cancelled_list
	usb: dwc3: gadget: move requests to cancelled_list
	usb: dwc3: gadget: remove wait_end_transfer
	usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
	fs/proc/array.c: allow reporting eip/esp for all coredumping threads
	mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
	fs/binfmt_flat.c: make load_flat_shared_library() work
	clk: socfpga: stratix10: fix divider entry for the emac clocks
	mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
	mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
	mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
	dm log writes: make sure super sector log updates are written in order
	scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
	x86/speculation: Allow guests to use SSBD even if host does not
	x86/microcode: Fix the microcode load on CPU hotplug for real
	x86/resctrl: Prevent possible overrun during bitmap operations
	KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT
	NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
	cpu/speculation: Warn on unsupported mitigations= parameter
	SUNRPC: Clean up initialisation of the struct rpc_rqst
	irqchip/mips-gic: Use the correct local interrupt map registers
	eeprom: at24: fix unexpected timeout under high load
	af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
	bonding: Always enable vlan tx offload
	ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
	net/packet: fix memory leak in packet_set_ring()
	net: remove duplicate fetch in sock_getsockopt
	net: stmmac: fixed new system time seconds value calculation
	net: stmmac: set IC bit when transmitting frames with HW timestamp
	sctp: change to hold sk after auth shkey is created successfully
	team: Always enable vlan tx offload
	tipc: change to use register_pernet_device
	tipc: check msg->req data len in tipc_nl_compat_bearer_disable
	tun: wake up waitqueues after IFF_UP is set
	bpf: simplify definition of BPF_FIB_LOOKUP related flags
	bpf: lpm_trie: check left child of last leftmost node for NULL
	bpf: fix nested bpf tracepoints with per-cpu data
	bpf: fix unconnected udp hooks
	bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro
	bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err
	arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg()
	bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
	futex: Update comments and docs about return values of arch futex code
	RDMA: Directly cast the sockaddr union to sockaddr
	tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
	usb: dwc3: Reset num_trbs after skipping
	arm64: insn: Fix ldadd instruction encoding
	Linux 4.19.57

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-03 13:19:45 +02:00
Colin Ian King
87cf811ab6 mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
commit 7298e3b0a149c91323b3205d325e942c3b3b9ef6 upstream.

Currently the calcuation of end_pfn can round up the pfn number to more
than the actual maximum number of pfns, causing an Oops.  Fix this by
ensuring end_pfn is never more than max_pfn.

This can be easily triggered when on systems where the end_pfn gets
rounded up to more than max_pfn using the idle-page stress-ng stress test:

sudo stress-ng --idle-page 0

  BUG: unable to handle kernel paging request at 00000000000020d8
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  RIP: 0010:page_idle_get_page+0xc8/0x1a0
  Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48
  RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202
  RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f
  RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700
  RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276
  R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080
  R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400
  FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0
  Call Trace:
    page_idle_bitmap_write+0x8c/0x140
    sysfs_kf_bin_write+0x5c/0x70
    kernfs_fop_write+0x12e/0x1b0
    __vfs_write+0x1b/0x40
    vfs_write+0xab/0x1b0
    ksys_write+0x55/0xc0
    __x64_sys_write+0x1a/0x20
    do_syscall_64+0x5a/0x110
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/20190618124352.28307-1-colin.king@canonical.com
Fixes: 33c3fc71c8 ("mm: introduce idle page tracking")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-03 13:14:45 +02:00
Naoya Horiguchi
1192fb703d mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
commit faf53def3b143df11062d87c12afe6afeb6f8cc7 upstream.

madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
for hugepages with overcommitting enabled.  That was caused by the
suboptimal code in current soft-offline code.  See the following part:

    ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
                            MIGRATE_SYNC, MR_MEMORY_FAILURE);
    if (ret) {
            ...
    } else {
            /*
             * We set PG_hwpoison only when the migration source hugepage
             * was successfully dissolved, because otherwise hwpoisoned
             * hugepage remains on free hugepage list, then userspace will
             * find it as SIGBUS by allocation failure. That's not expected
             * in soft-offlining.
             */
            ret = dissolve_free_huge_page(page);
            if (!ret) {
                    if (set_hwpoison_free_buddy_page(page))
                            num_poisoned_pages_inc();
            }
    }
    return ret;

Here dissolve_free_huge_page() returns -EBUSY if the migration source page
was freed into buddy in migrate_pages(), but even in that case we actually
has a chance that set_hwpoison_free_buddy_page() succeeds.  So that means
current code gives up offlining too early now.

dissolve_free_huge_page() checks that a given hugepage is suitable for
dissolving, where we should return success for !PageHuge() case because
the given hugepage is considered as already dissolved.

This change also affects other callers of dissolve_free_huge_page(), which
are cleaned up together.

[n-horiguchi@ah.jp.nec.com: v3]
  Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp.nec.comLink: http://lkml.kernel.org/r/1560154686-18497-3-git-send-email-n-horiguchi@ah.jp.nec.com
Fixes: 6bc9b56433 ("mm: fix race on soft-offlining")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Chen, Jerry T <jerry.t.chen@intel.com>
Tested-by: Chen, Jerry T <jerry.t.chen@intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen@intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-03 13:14:45 +02:00
Naoya Horiguchi
aab6291888 mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
commit b38e5962f8ed0d2a2b28a887fc2221f7f41db119 upstream.

The pass/fail of soft offline should be judged by checking whether the
raw error page was finally contained or not (i.e.  the result of
set_hwpoison_free_buddy_page()), but current code do not work like
that.  It might lead us to misjudge the test result when
set_hwpoison_free_buddy_page() fails.

Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may
not offline the original page and will not return an error.

Link: http://lkml.kernel.org/r/1560154686-18497-2-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Fixes: 6bc9b56433 ("mm: fix race on soft-offlining")
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Xishi Qiu <xishi.qiuxishi@alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen@intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-03 13:14:44 +02:00
zhong jiang
49e9b499a3 mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
commit 29b190fa774dd1b72a1a6f19687d55dc72ea83be upstream.

mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE
mempoclicies when the tasks's cpuset's mems_allowed changes.  For
policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES,
it works by remapping the policy's allowed nodes (stored in v.nodes)
using the previous value of mems_allowed (stored in
w.cpuset_mems_allowed) as the domain of map and the new mems_allowed
(passed as nodes) as the range of the map (see the comment of
bitmap_remap() for details).

The result of remapping is stored back as policy's nodemask in v.nodes,
and the new value of mems_allowed should be stored in
w.cpuset_mems_allowed to facilitate the next rebind, if it happens.

However, 213980c0f2 ("mm, mempolicy: simplify rebinding mempolicies
when updating cpusets") introduced a bug where the result of remapping
is stored in w.cpuset_mems_allowed instead.  Thus, a mempolicy's
allowed nodes can evolve in an unexpected way after a series of
rebinding due to cpuset mems_allowed changes, possibly binding to a
wrong node or a smaller number of nodes which may e.g.  overload them.
This patch fixes the bug so rebinding again works as intended.

[vbabka@suse.cz: new changlog]
  Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz
Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawei.com
Fixes: 213980c0f2 ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-03 13:14:44 +02:00
Greg Kroah-Hartman
237d383597 This is the 4.19.54 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0Nx3oACgkQONu9yGCS
 aT7hJRAAzdu1EKr/VUiFJshvPL/+veH1MLdMgafW6gXcoQCQSdMuv1j3mv3axElC
 dz2e/nHXvIhMKMrhzgEvVwV8m8eKPwM4IZjTJ8ji8st9uy0R2UIdbPUHrLtRAQzI
 bK2BIk6697B9/9z2s8nDXhKex9EtZ2vj8DeQLT1AgQKMM0+abPA2YR9+cp9HEjQR
 Wa5NjajwWgpty1VVk+Q6GMD+GBnEILyqxtVGv30z/prWoVadFAXJEZrWsaXotUvh
 ySc2CS9Iu/muiIegTSStnNXaWaawZblA4JdDcdRJ235rPK8sbACYtDgZispYopi4
 y/9kUtuZzWEv61aFZkXd13jKZ8pJsrLZLoc+yDqew0IA6M2Hz7d7Pq/UWQ0m4qnt
 rCcU1vTFooSp/IHOToXN0oQQrTqD2oK5zO4V9S5McwQniuO/RqUPJfX/sKjtrvNT
 SI/yLVKMXImAwVXxNEfO4bE+T9F/QYptOHdpfqJkPvpKLzjxnPBPswG8poIKwQzJ
 w3p1AoQ1ISuW6b94a/nI5nSC28gVslVg+oTjRjF7kfIcOtvglX79XPvfdM5bB2DC
 qD51A8veEGCtAu57/tSyisGOomqTqqqbaiYXwuhgTI1NSszbqKDLNvjsw1GKj0rK
 mSmDzVMgQhxaHOagOeDqLYlj1zgTamx5R6+dretf8AOXyEE2RSg=
 =+MJy
 -----END PGP SIGNATURE-----

Merge 4.19.54 into android-4.19

Changes in 4.19.54
	ax25: fix inconsistent lock state in ax25_destroy_timer
	be2net: Fix number of Rx queues used for flow hashing
	hv_netvsc: Set probe mode to sync
	ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero
	lapb: fixed leak of control-blocks.
	neigh: fix use-after-free read in pneigh_get_next
	net: dsa: rtl8366: Fix up VLAN filtering
	net: openvswitch: do not free vport if register_netdevice() is failed.
	nfc: Ensure presence of required attributes in the deactivate_target handler
	sctp: Free cookie before we memdup a new one
	sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg
	tipc: purge deferredq list for each grp member in tipc_group_delete
	vsock/virtio: set SOCK_DONE on peer shutdown
	net/mlx5: Avoid reloading already removed devices
	net: mvpp2: prs: Fix parser range for VID filtering
	net: mvpp2: prs: Use the correct helpers when removing all VID filters
	Staging: vc04_services: Fix a couple error codes
	perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
	netfilter: nf_queue: fix reinject verdict handling
	ipvs: Fix use-after-free in ip_vs_in
	selftests: netfilter: missing error check when setting up veth interface
	clk: ti: clkctrl: Fix clkdm_clk handling
	powerpc/powernv: Return for invalid IMC domain
	usb: xhci: Fix a potential null pointer dereference in xhci_debugfs_create_endpoint()
	mISDN: make sure device name is NUL terminated
	x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
	perf/ring_buffer: Fix exposing a temporarily decreased data_head
	perf/ring_buffer: Add ordering to rb->nest increment
	perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
	gpio: fix gpio-adp5588 build errors
	net: stmmac: update rx tail pointer register to fix rx dma hang issue.
	net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE()
	ACPI/PCI: PM: Add missing wakeup.flags.valid checks
	drm/etnaviv: lock MMU while dumping core
	net: aquantia: tx clean budget logic error
	net: aquantia: fix LRO with FCS error
	i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
	ALSA: hda - Force polling mode on CNL for fixing codec communication
	configfs: Fix use-after-free when accessing sd->s_dentry
	perf data: Fix 'strncat may truncate' build failure with recent gcc
	perf namespace: Protect reading thread's namespace
	perf record: Fix s390 missing module symbol and warning for non-root users
	ia64: fix build errors by exporting paddr_to_nid()
	xen/pvcalls: Remove set but not used variable
	xenbus: Avoid deadlock during suspend due to open transactions
	KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
	KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
	arm64: fix syscall_fn_t type
	arm64: use the correct function type in SYSCALL_DEFINE0
	arm64: use the correct function type for __arm64_sys_ni_syscall
	net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
	net: phylink: ensure consistent phy interface mode
	net: phy: dp83867: Set up RGMII TX delay
	scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
	scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
	scsi: scsi_dh_alua: Fix possible null-ptr-deref
	scsi: libsas: delete sas port if expander discover failed
	mlxsw: spectrum: Prevent force of 56G
	ocfs2: fix error path kobject memory leak
	coredump: fix race condition between collapse_huge_page() and core dumping
	Abort file_remove_privs() for non-reg. files
	Linux 4.19.54

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-22 08:47:30 +02:00
Andrea Arcangeli
465ce9a50f coredump: fix race condition between collapse_huge_page() and core dumping
commit 59ea6d06cfa9247b586a695c21f94afa7183af74 upstream.

When fixing the race conditions between the coredump and the mmap_sem
holders outside the context of the process, we focused on
mmget_not_zero()/get_task_mm() callers in 04f5866e41fb70 ("coredump: fix
race condition between mmget_not_zero()/get_task_mm() and core
dumping"), but those aren't the only cases where the mmap_sem can be
taken outside of the context of the process as Michal Hocko noticed
while backporting that commit to older -stable kernels.

If mmgrab() is called in the context of the process, but then the
mm_count reference is transferred outside the context of the process,
that can also be a problem if the mmap_sem has to be taken for writing
through that mm_count reference.

khugepaged registration calls mmgrab() in the context of the process,
but the mmap_sem for writing is taken later in the context of the
khugepaged kernel thread.

collapse_huge_page() after taking the mmap_sem for writing doesn't
modify any vma, so it's not obvious that it could cause a problem to the
coredump, but it happens to modify the pmd in a way that breaks an
invariant that pmd_trans_huge_lock() relies upon.  collapse_huge_page()
needs the mmap_sem for writing just to block concurrent page faults that
call pmd_trans_huge_lock().

Specifically the invariant that "!pmd_trans_huge()" cannot become a
"pmd_trans_huge()" doesn't hold while collapse_huge_page() runs.

The coredump will call __get_user_pages() without mmap_sem for reading,
which eventually can invoke a lockless page fault which will need a
functional pmd_trans_huge_lock().

So collapse_huge_page() needs to use mmget_still_valid() to check it's
not running concurrently with the coredump...  as long as the coredump
can invoke page faults without holding the mmap_sem for reading.

This has "Fixes: khugepaged" to facilitate backporting, but in my view
it's more a bug in the coredump code that will eventually have to be
rewritten to stop invoking page faults without the mmap_sem for reading.
So the long term plan is still to drop all mmget_still_valid().

Link: http://lkml.kernel.org/r/20190607161558.32104-1-aarcange@redhat.com
Fixes: ba76149f47 ("thp: khugepaged")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:15:21 +02:00
Greg Kroah-Hartman
f613e8938d This is the 4.19.53 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0J4rgACgkQONu9yGCS
 aT6JVhAAouXsxBjKDxMlJkMrcCFPVHTgWyrHMJ0aeV3jWACGoKbd+RBZtEoNrD4n
 X5/AJdDMvSPuPMWqE8BO/+Q1e4YQYRoHQyQ24aRkq66a8+UpybRMpshFb+tfGQO+
 ysVypoBg97Y+8ao0MFKxu79+rrJor566f8jXznrxvFG3MnI0Kq3BBGq59UcQTu4z
 e1jd2OJjQxttZovOlA6v9rRTdHdzoI/84DZfYgqV1pUoC8pwjg7UO0+1cwnKlqem
 No4iWGP4lmfXN/VuzRpBs9a6kqE14GL5Ah13U3uhYXk9Dd2iLkS2fPKPiaDpiHql
 /SIL/3q4BCJVDZePIZuLoRe9qKLDh7rdNziIDhSUskW1H1tHAhsJUktWo0HarZW2
 6Z5K10S/MsXAxWoFnY1TMBhZg+h+f/pHxy6FqZdvV4DGsx7uDWoVhn+o4Qw5WTSG
 7KGMvvyDE7EyciwXVkQoN3hwX9pcxBwDtMswzSbW1VM9h3nYOzfY6OwSc/GDbM+f
 9zQb+sgLIu1K5FUqyYNMfV/YXGGHpml181ZRWm3bZ4N63odDKMwzWfkc5RlIlDoM
 kqmMZKvpTSA4MeRzGzcUMsIpXv8SeNvR3cotCBqau+IRAIkLX8tZClJL+4FilLIN
 PkOjXtuEt648ZDybVGZykmAPIVuWreosWdtIH79ZOmnpveRQN7I=
 =DiXP
 -----END PGP SIGNATURE-----

Merge 4.19.53 into android-4.19

Changes in 4.19.53
	drm/nouveau: add kconfig option to turn off nouveau legacy contexts. (v3)
	nouveau: Fix build with CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT disabled
	HID: multitouch: handle faulty Elo touch device
	HID: wacom: Don't set tool type until we're in range
	HID: wacom: Don't report anything prior to the tool entering range
	HID: wacom: Send BTN_TOUCH in response to INTUOSP2_BT eraser contact
	HID: wacom: Correct button numbering 2nd-gen Intuos Pro over Bluetooth
	HID: wacom: Sync INTUOSP2_BT touch state after each frame if necessary
	Revert "ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops"
	ALSA: oxfw: allow PCM capture for Stanton SCS.1m
	ALSA: hda/realtek - Update headset mode for ALC256
	ALSA: firewire-motu: fix destruction of data for isochronous resources
	libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
	mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
	fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
	mm/vmscan.c: fix trying to reclaim unevictable LRU page
	signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO
	ptrace: restore smp_rmb() in __ptrace_may_access()
	iommu/arm-smmu: Avoid constant zero in TLBI writes
	i2c: acorn: fix i2c warning
	bcache: fix stack corruption by PRECEDING_KEY()
	bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
	cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
	ASoC: cs42xx8: Add regcache mask dirty
	ASoC: fsl_asrc: Fix the issue about unsupported rate
	drm/i915/sdvo: Implement proper HDMI audio support for SDVO
	x86/uaccess, kcov: Disable stack protector
	ALSA: seq: Protect in-kernel ioctl calls with mutex
	ALSA: seq: Fix race of get-subscription call vs port-delete ioctls
	Revert "ALSA: seq: Protect in-kernel ioctl calls with mutex"
	s390/kasan: fix strncpy_from_user kasan checks
	Drivers: misc: fix out-of-bounds access in function param_set_kgdbts_var
	f2fs: fix to avoid accessing xattr across the boundary
	scsi: qedi: remove memset/memcpy to nfunc and use func instead
	scsi: qedi: remove set but not used variables 'cdev' and 'udev'
	scsi: lpfc: correct rcu unlock issue in lpfc_nvme_info_show
	scsi: lpfc: add check for loss of ndlp when sending RRQ
	arm64/mm: Inhibit huge-vmap with ptdump
	nvme: fix srcu locking on error return in nvme_get_ns_from_disk
	nvme: remove the ifdef around nvme_nvm_ioctl
	nvme: merge nvme_ns_ioctl into nvme_ioctl
	nvme: release namespace SRCU protection before performing controller ioctls
	nvme: fix memory leak for power latency tolerance
	platform/x86: pmc_atom: Add Lex 3I380D industrial PC to critclk_systems DMI table
	platform/x86: pmc_atom: Add several Beckhoff Automation boards to critclk_systems DMI table
	scsi: bnx2fc: fix incorrect cast to u64 on shift operation
	libnvdimm: Fix compilation warnings with W=1
	selftests: fib_rule_tests: fix local IPv4 address typo
	selftests/timers: Add missing fflush(stdout) calls
	tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts
	usbnet: ipheth: fix racing condition
	KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation
	KVM: x86/pmu: mask the result of rdpmc according to the width of the counters
	KVM: x86/pmu: do not mask the value that is written to fixed PMUs
	KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
	tools/kvm_stat: fix fields filter for child events
	drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read
	drm/vmwgfx: NULL pointer dereference from vmw_cmd_dx_view_define()
	usb: dwc2: Fix DMA cache alignment issues
	usb: dwc2: host: Fix wMaxPacketSize handling (fix webcam regression)
	USB: Fix chipmunk-like voice when using Logitech C270 for recording audio.
	USB: usb-storage: Add new ID to ums-realtek
	USB: serial: pl2303: add Allied Telesis VT-Kit3
	USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
	USB: serial: option: add Telit 0x1260 and 0x1261 compositions
	timekeeping: Repair ktime_get_coarse*() granularity
	RAS/CEC: Convert the timer callback to a workqueue
	RAS/CEC: Fix binary search function
	x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback
	x86/kasan: Fix boot with 5-level paging and KASAN
	x86/mm/KASLR: Compute the size of the vmemmap section properly
	x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled
	drm/edid: abstract override/firmware EDID retrieval
	drm: add fallback override/firmware EDID modes workaround
	rtc: pcf8523: don't return invalid date when battery is low
	Linux 4.19.53

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-19 10:10:24 +02:00
Minchan Kim
54a20289cb mm/vmscan.c: fix trying to reclaim unevictable LRU page
commit a58f2cef26e1ca44182c8b22f4f4395e702a5795 upstream.

There was the below bug report from Wu Fangsuo.

On the CMA allocation path, isolate_migratepages_range() could isolate
unevictable LRU pages and reclaim_clean_page_from_list() can try to
reclaim them if they are clean file-backed pages.

  page:ffffffbf02f33b40 count:86 mapcount:84 mapping:ffffffc08fa7a810 index:0x24
  flags: 0x19040c(referenced|uptodate|arch_1|mappedtodisk|unevictable|mlocked)
  raw: 000000000019040c ffffffc08fa7a810 0000000000000024 0000005600000053
  raw: ffffffc009b05b20 ffffffc009b05b20 0000000000000000 ffffffc09bf3ee80
  page dumped because: VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page))
  page->mem_cgroup:ffffffc09bf3ee80
  ------------[ cut here ]------------
  kernel BUG at /home/build/farmland/adroid9.0/kernel/linux/mm/vmscan.c:1350!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 7125 Comm: syz-executor Tainted: G S              4.14.81 #3
  Hardware name: ASR AQUILAC EVB (DT)
  task: ffffffc00a54cd00 task.stack: ffffffc009b00000
  PC is at shrink_page_list+0x1998/0x3240
  LR is at shrink_page_list+0x1998/0x3240
  pc : [<ffffff90083a2158>] lr : [<ffffff90083a2158>] pstate: 60400045
  sp : ffffffc009b05940
  ..
     shrink_page_list+0x1998/0x3240
     reclaim_clean_pages_from_list+0x3c0/0x4f0
     alloc_contig_range+0x3bc/0x650
     cma_alloc+0x214/0x668
     ion_cma_allocate+0x98/0x1d8
     ion_alloc+0x200/0x7e0
     ion_ioctl+0x18c/0x378
     do_vfs_ioctl+0x17c/0x1780
     SyS_ioctl+0xac/0xc0

Wu found it's due to commit ad6b67041a ("mm: remove SWAP_MLOCK in
ttu").  Before that, unevictable pages go to cull_mlocked so that we
can't reach the VM_BUG_ON_PAGE line.

To fix the issue, this patch filters out unevictable LRU pages from the
reclaim_clean_pages_from_list in CMA.

Link: http://lkml.kernel.org/r/20190524071114.74202-1-minchan@kernel.org
Fixes: ad6b67041a ("mm: remove SWAP_MLOCK in ttu")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Debugged-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Tested-by: Wu Fangsuo <fangsuowu@asrmicro.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Suryawanshi <pankaj.suryawanshi@einfochips.com>
Cc: <stable@vger.kernel.org>	[4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 08:18:00 +02:00
Shakeel Butt
553a1f0d3c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
commit 3510955b327176fd4cbab5baa75b449f077722a2 upstream.

Syzbot reported following memory leak:

ffffffffda RBX: 0000000000000003 RCX: 0000000000441f79
BUG: memory leak
unreferenced object 0xffff888114f26040 (size 32):
  comm "syz-executor626", pid 7056, jiffies 4294948701 (age 39.410s)
  hex dump (first 32 bytes):
    40 60 f2 14 81 88 ff ff 40 60 f2 14 81 88 ff ff  @`......@`......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
     slab_post_alloc_hook mm/slab.h:439 [inline]
     slab_alloc mm/slab.c:3326 [inline]
     kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
     kmalloc include/linux/slab.h:547 [inline]
     __memcg_init_list_lru_node+0x58/0xf0 mm/list_lru.c:352
     memcg_init_list_lru_node mm/list_lru.c:375 [inline]
     memcg_init_list_lru mm/list_lru.c:459 [inline]
     __list_lru_init+0x193/0x2a0 mm/list_lru.c:626
     alloc_super+0x2e0/0x310 fs/super.c:269
     sget_userns+0x94/0x2a0 fs/super.c:609
     sget+0x8d/0xb0 fs/super.c:660
     mount_nodev+0x31/0xb0 fs/super.c:1387
     fuse_mount+0x2d/0x40 fs/fuse/inode.c:1236
     legacy_get_tree+0x27/0x80 fs/fs_context.c:661
     vfs_get_tree+0x2e/0x120 fs/super.c:1476
     do_new_mount fs/namespace.c:2790 [inline]
     do_mount+0x932/0xc50 fs/namespace.c:3110
     ksys_mount+0xab/0x120 fs/namespace.c:3319
     __do_sys_mount fs/namespace.c:3333 [inline]
     __se_sys_mount fs/namespace.c:3330 [inline]
     __x64_sys_mount+0x26/0x30 fs/namespace.c:3330
     do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is a simple off by one bug on the error path.

Link: http://lkml.kernel.org/r/20190528043202.99980-1-shakeelb@google.com
Fixes: 60d3fd32a7 ("list_lru: introduce per-memcg lists")
Reported-by: syzbot+f90a420dfe2b1b03cb2c@syzkaller.appspotmail.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 08:17:59 +02:00
Greg Kroah-Hartman
d1f7f3be99 This is the 4.19.51 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0EwEMACgkQONu9yGCS
 aT4UpA/3UqmMAaHH4g20cic8YDoBkYXoQUyj/kbf1w6bXqCuLTHOFS/qXa6QtPK3
 WfwNWChIob+EbfAMjreQZjT6pwBbxyCNUvsvpB0k8YhJ3v/sIZL/Wc+1ZDu+jBC9
 xfqwT9+JGzgu8P4PUTr1BsGMhgiH9qoJAeq7RCuDcicMIJ4/aJVr4Cvrs+18PVUe
 95T5vSn6G62QyOrUExmuuztvNM2P/kos6yJTkN80l3uPLMUjsnWCsMKu+9utd5ea
 ew332Z/BQs+ff4oljH1uRwsM/Z7+AKXlXatXD0sHQ8CTEqh44SgUSU96vB/h9W8I
 6a0t4M2atsdaGjMHmiiPA+gIgd1rW0lsHk6ob6qgfzuRBFGN9BTUfZgQwhOW7uXt
 e4o5RrWELkbk/TlJzrG1dFjhfyeb7q3LHOOg8kOVU0KdPD44ekJW9qoI8tlMPI+5
 mafCCS/oS6TaW20ZKmjjkIbfTndzdO3dy5EWLy3elCEyLF2gDZ6WCAL+SMngdApC
 /dbuBigF/+RBaEU1e56DkcYUYXjt6UO84O3dYAx69s6EhHS5CP4yCySuYpdxyg/G
 MWdFDhtbnFyMXqoK1ROS0hNnxpydkh+R1Ns0TeSYibJI2J2enMGIa0thu4aVLD3v
 +GLqHV2PsPTRQmF5ChnvNV7O53i4j4WBQQjL+80PQulJwRFb/A==
 =bIqz
 -----END PGP SIGNATURE-----

Merge 4.19.51 into android-4.19

Changes in 4.19.51
	rapidio: fix a NULL pointer dereference when create_workqueue() fails
	fs/fat/file.c: issue flush after the writeback of FAT
	sysctl: return -EINVAL if val violates minmax
	ipc: prevent lockup on alloc_msg and free_msg
	drm/pl111: Initialize clock spinlock early
	ARM: prevent tracing IPI_CPU_BACKTRACE
	mm/hmm: select mmu notifier when selecting HMM
	hugetlbfs: on restore reserve error path retain subpool reservation
	mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE
	mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
	initramfs: free initrd memory if opening /initrd.image fails
	mm/cma.c: fix the bitmap status to show failed allocation reason
	mm: page_mkclean vs MADV_DONTNEED race
	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
	mm/slab.c: fix an infinite loop in leaks_show()
	kernel/sys.c: prctl: fix false positive in validate_prctl_map()
	thermal: rcar_gen3_thermal: disable interrupt in .remove
	drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER
	mfd: tps65912-spi: Add missing of table registration
	mfd: intel-lpss: Set the device in reset state when init
	drm/nouveau/disp/dp: respect sink limits when selecting failsafe link configuration
	mfd: twl6040: Fix device init errors for ACCCTL register
	perf/x86/intel: Allow PEBS multi-entry in watermark mode
	drm/nouveau/kms/gf119-gp10x: push HeadSetControlOutputResource() mthd when encoders change
	drm/bridge: adv7511: Fix low refresh rate selection
	objtool: Don't use ignore flag for fake jumps
	drm/nouveau/kms/gv100-: fix spurious window immediate interlocks
	bpf: fix undefined behavior in narrow load handling
	EDAC/mpc85xx: Prevent building as a module
	pwm: meson: Use the spin-lock only to protect register modifications
	mailbox: stm32-ipcc: check invalid irq
	ntp: Allow TAI-UTC offset to be set to zero
	f2fs: fix to avoid panic in do_recover_data()
	f2fs: fix to avoid panic in f2fs_inplace_write_data()
	f2fs: fix to avoid panic in f2fs_remove_inode_page()
	f2fs: fix to do sanity check on free nid
	f2fs: fix to clear dirty inode in error path of f2fs_iget()
	f2fs: fix to avoid panic in dec_valid_block_count()
	f2fs: fix to use inline space only if inline_xattr is enable
	f2fs: fix to do sanity check on valid block count of segment
	f2fs: fix to do checksum even if inode page is uptodate
	percpu: remove spurious lock dependency between percpu and sched
	configfs: fix possible use-after-free in configfs_register_group
	uml: fix a boot splat wrt use of cpu_all_mask
	PCI: dwc: Free MSI in dw_pcie_host_init() error path
	PCI: dwc: Free MSI IRQ page in dw_pcie_free_msi()
	ovl: do not generate duplicate fsnotify events for "fake" path
	mmc: mmci: Prevent polling for busy detection in IRQ context
	netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast
	netfilter: nf_conntrack_h323: restore boundary check correctness
	mips: Make sure dt memory regions are valid
	netfilter: nf_tables: fix base chain stat rcu_dereference usage
	watchdog: imx2_wdt: Fix set_timeout for big timeout values
	watchdog: fix compile time error of pretimeout governors
	blk-mq: move cancel of requeue_work into blk_mq_release
	iommu/vt-d: Set intel_iommu_gfx_mapped correctly
	misc: pci_endpoint_test: Fix test_reg_bar to be updated in pci_endpoint_test
	PCI: designware-ep: Use aligned ATU window for raising MSI interrupts
	nvme-pci: unquiesce admin queue on shutdown
	nvme-pci: shutdown on timeout during deletion
	netfilter: nf_flow_table: check ttl value in flow offload data path
	netfilter: nf_flow_table: fix netdev refcnt leak
	ALSA: hda - Register irq handler after the chip initialization
	nvmem: core: fix read buffer in place
	nvmem: sunxi_sid: Support SID on A83T and H5
	fuse: retrieve: cap requested size to negotiated max_write
	nfsd: allow fh_want_write to be called twice
	nfsd: avoid uninitialized variable warning
	vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"
	iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel
	switchtec: Fix unintended mask of MRPC event
	net: thunderbolt: Unregister ThunderboltIP protocol handler when suspending
	x86/PCI: Fix PCI IRQ routing table memory leak
	i40e: Queues are reserved despite "Invalid argument" error
	platform/chrome: cros_ec_proto: check for NULL transfer function
	PCI: keystone: Prevent ARM32 specific code to be compiled for ARM64
	soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
	clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
	soc: rockchip: Set the proper PWM for rk3288
	ARM: dts: imx51: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx50: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx53: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx6sll: Specify IMX6SLL_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx7d: Specify IMX7D_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6ul: Specify IMX6UL_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
	PCI: rpadlpar: Fix leaked device_node references in add/remove paths
	drm/amd/display: Use plane->color_space for dpp if specified
	ARM: OMAP2+: pm33xx-core: Do not Turn OFF CEFUSE as PPA may be using it
	platform/x86: intel_pmc_ipc: adding error handling
	power: supply: max14656: fix potential use-before-alloc
	net: hns3: return 0 and print warning when hit duplicate MAC
	PCI: rcar: Fix a potential NULL pointer dereference
	PCI: rcar: Fix 64bit MSI message address handling
	scsi: qla2xxx: Reset the FCF_ASYNC_{SENT|ACTIVE} flags
	video: hgafb: fix potential NULL pointer dereference
	video: imsttfb: fix potential NULL pointer dereferences
	block, bfq: increase idling for weight-raised queues
	PCI: xilinx: Check for __get_free_pages() failure
	gpio: gpio-omap: add check for off wake capable gpios
	ice: Add missing case in print_link_msg for printing flow control
	dmaengine: idma64: Use actual device for DMA transfers
	pwm: tiehrpwm: Update shadow register for disabling PWMs
	ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa
	pwm: Fix deadlock warning when removing PWM device
	ARM: exynos: Fix undefined instruction during Exynos5422 resume
	usb: typec: fusb302: Check vconn is off when we start toggling
	soc: renesas: Identify R-Car M3-W ES1.3
	gpio: vf610: Do not share irq_chip
	percpu: do not search past bitmap when allocating an area
	Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections"
	Revert "drm/nouveau: add kconfig option to turn off nouveau legacy contexts. (v3)"
	ovl: check the capability before cred overridden
	ovl: support stacked SEEK_HOLE/SEEK_DATA
	drm/vc4: fix fb references in async update
	ALSA: seq: Cover unsubscribe_port() in list_mutex
	Linux 4.19.51

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-15 16:12:59 +02:00
Dennis Zhou
526972e95e percpu: do not search past bitmap when allocating an area
[ Upstream commit 8c43004af01635cc9fbb11031d070e5e0d327ef2 ]

pcpu_find_block_fit() guarantees that a fit is found within
PCPU_BITMAP_BLOCK_BITS. Iteration is used to determine the first fit as
it compares against the block's contig_hint. This can lead to
incorrectly scanning past the end of the bitmap. The behavior was okay
given the check after for bit_off >= end and the correctness of the
hints from pcpu_find_block_fit().

This patch fixes this by bounding the end offset by the number of bits
in a chunk.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:11 +02:00
John Sperbeck
5329dcafea percpu: remove spurious lock dependency between percpu and sched
[ Upstream commit 198790d9a3aeaef5792d33a560020861126edc22 ]

In free_percpu() we sometimes call pcpu_schedule_balance_work() to
queue a work item (which does a wakeup) while holding pcpu_lock.
This creates an unnecessary lock dependency between pcpu_lock and
the scheduler's pi_lock.  There are other places where we call
pcpu_schedule_balance_work() without hold pcpu_lock, and this case
doesn't need to be different.

Moving the call outside the lock prevents the following lockdep splat
when running tools/testing/selftests/bpf/{test_maps,test_progs} in
sequence with lockdep enabled:

======================================================
WARNING: possible circular locking dependency detected
5.1.0-dbg-DEV #1 Not tainted
------------------------------------------------------
kworker/23:255/18872 is trying to acquire lock:
000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520

but task is already holding lock:
00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (pcpu_lock){..-.}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock_irqsave+0x3a/0x50
       pcpu_alloc+0xfa/0x780
       __alloc_percpu_gfp+0x12/0x20
       alloc_htab_elem+0x184/0x2b0
       __htab_percpu_map_update_elem+0x252/0x290
       bpf_percpu_hash_update+0x7c/0x130
       __do_sys_bpf+0x1912/0x1be0
       __x64_sys_bpf+0x1a/0x20
       do_syscall_64+0x59/0x400
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #3 (&htab->buckets[i].lock){....}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock_irqsave+0x3a/0x50
       htab_map_update_elem+0x1af/0x3a0

-> #2 (&rq->lock){-.-.}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock+0x2f/0x40
       task_fork_fair+0x37/0x160
       sched_fork+0x211/0x310
       copy_process.part.43+0x7b1/0x2160
       _do_fork+0xda/0x6b0
       kernel_thread+0x29/0x30
       rest_init+0x22/0x260
       arch_call_rest_init+0xe/0x10
       start_kernel+0x4fd/0x520
       x86_64_start_reservations+0x24/0x26
       x86_64_start_kernel+0x6f/0x72
       secondary_startup_64+0xa4/0xb0

-> #1 (&p->pi_lock){-.-.}:
       lock_acquire+0x9e/0x180
       _raw_spin_lock_irqsave+0x3a/0x50
       try_to_wake_up+0x41/0x600
       wake_up_process+0x15/0x20
       create_worker+0x16b/0x1e0
       workqueue_init+0x279/0x2ee
       kernel_init_freeable+0xf7/0x288
       kernel_init+0xf/0x180
       ret_from_fork+0x24/0x30

-> #0 (&(&pool->lock)->rlock){-.-.}:
       __lock_acquire+0x101f/0x12a0
       lock_acquire+0x9e/0x180
       _raw_spin_lock+0x2f/0x40
       __queue_work+0xb2/0x520
       queue_work_on+0x38/0x80
       free_percpu+0x221/0x260
       pcpu_freelist_destroy+0x11/0x20
       stack_map_free+0x2a/0x40
       bpf_map_free_deferred+0x3c/0x50
       process_one_work+0x1f7/0x580
       worker_thread+0x54/0x410
       kthread+0x10f/0x150
       ret_from_fork+0x24/0x30

other info that might help us debug this:

Chain exists of:
  &(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(pcpu_lock);
                               lock(&htab->buckets[i].lock);
                               lock(pcpu_lock);
  lock(&(&pool->lock)->rlock);

 *** DEADLOCK ***

3 locks held by kworker/23:255/18872:
 #0: 00000000b36a6e16 ((wq_completion)events){+.+.},
     at: process_one_work+0x17a/0x580
 #1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
     at: process_one_work+0x17a/0x580
 #2: 00000000e3e7a6aa (pcpu_lock){..-.},
     at: free_percpu+0x36/0x260

stack backtrace:
CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
Hardware name: ...
Workqueue: events bpf_map_free_deferred
Call Trace:
 dump_stack+0x67/0x95
 print_circular_bug.isra.38+0x1c6/0x220
 check_prev_add.constprop.50+0x9f6/0xd20
 __lock_acquire+0x101f/0x12a0
 lock_acquire+0x9e/0x180
 _raw_spin_lock+0x2f/0x40
 __queue_work+0xb2/0x520
 queue_work_on+0x38/0x80
 free_percpu+0x221/0x260
 pcpu_freelist_destroy+0x11/0x20
 stack_map_free+0x2a/0x40
 bpf_map_free_deferred+0x3c/0x50
 process_one_work+0x1f7/0x580
 worker_thread+0x54/0x410
 kthread+0x10f/0x150
 ret_from_fork+0x24/0x30

Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:05 +02:00
Qian Cai
515d18ced8 mm/slab.c: fix an infinite loop in leaks_show()
[ Upstream commit 745e10146c31b1c6ed3326286704ae251b17f663 ]

"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled due to other CPUs running do_drain()
will keep making kmemleak_object or debug_objects_cache dirty and unable
to escape the first loop in leaks_show(),

do {
	set_store_user_clean(cachep);
	drain_cpu_caches(cachep);
	...

} while (!is_store_user_clean(cachep));

For example,

do_drain
  slabs_destroy
    slab_destroy
      kmem_cache_free
        __cache_free
          ___cache_free
            kmemleak_free_recursive
              delete_object_full
                __delete_object
                  put_object
                    free_object_rcu
                      kmem_cache_free
                        cache_free_debugcheck --> dirty kmemleak_object

One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show().  The other is to set store_user_clean
after drain_cpu_caches() which leaves a small window between
drain_cpu_caches() and set_store_user_clean() where per-CPU caches could
be dirty again lead to slightly wrong information has been stored but
could also speed up things significantly which sounds like a good
compromise.  For example,

 # cat /proc/slab_allocators
 0m42.778s # 1st approach
 0m0.737s  # 2nd approach

[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/20190411032635.10325-1-cai@lca.pw
Fixes: d31676dfde ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:01 +02:00
Yue Hu
13e1ea0881 mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
[ Upstream commit f0fd50504a54f5548eb666dc16ddf8394e44e4b7 ]

If not find zero bit in find_next_zero_bit(), it will return the size
parameter passed in, so the start bit should be compared with bitmap_maxno
rather than cma->count.  Although getting maxchunk is working fine due to
zero value of order_per_bit currently, the operation will be stuck if
order_per_bit is set as non-zero.

Link: http://lkml.kernel.org/r/20190319092734.276-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Safonov <d.safonov@partner.samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:01 +02:00
Aneesh Kumar K.V
38c5fce7fc mm: page_mkclean vs MADV_DONTNEED race
[ Upstream commit 024eee0e83f0df52317be607ca521e0fc572aa07 ]

MADV_DONTNEED is handled with mmap_sem taken in read mode.  We call
page_mkclean without holding mmap_sem.

MADV_DONTNEED implies that pages in the region are unmapped and subsequent
access to the pages in that range is handled as a new page fault.  This
implies that if we don't have parallel access to the region when
MADV_DONTNEED is run we expect those range to be unallocated.

w.r.t page_mkclean() we need to make sure that we don't break the
MADV_DONTNEED semantics.  MADV_DONTNEED check for pmd_none without holding
pmd_lock.  This implies we skip the pmd if we temporarily mark pmd none.
Avoid doing that while marking the page clean.

Keep the sequence same for dax too even though we don't support
MADV_DONTNEED for dax mapping

The bug was noticed by code review and I didn't observe any failures w.r.t
test run.  This is similar to

commit 58ceeb6bec
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Apr 13 14:56:26 2017 -0700

    thp: fix MADV_DONTNEED vs. MADV_FREE race

commit ced108037c
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Apr 13 14:56:20 2017 -0700

    thp: fix MADV_DONTNEED vs. numa balancing race

Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc:"Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:01 +02:00
Yue Hu
77a01e3357 mm/cma.c: fix the bitmap status to show failed allocation reason
[ Upstream commit 2b59e01a3aa665f751d1410b99fae9336bd424e1 ]

Currently one bit in cma bitmap represents number of pages rather than
one page, cma->count means cma size in pages. So to find available pages
via find_next_zero_bit()/find_next_bit() we should use cma size not in
pages but in bits although current free pages number is correct due to
zero value of order_per_bit. Once order_per_bit is changed the bitmap
status will be incorrect.

The size input in cma_debug_show_areas() is not correct.  It will
affect the available pages at some position to debug the failure issue.

This is an example with order_per_bit = 1

Before this change:
[    4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages

After this change:
[    4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages

Obviously the bitmap status before is incorrect.

Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:01 +02:00
Yue Hu
e5f8857ea9 mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
[ Upstream commit 1df3a339074e31db95c4790ea9236874b13ccd87 ]

f022d8cb7e ("mm: cma: Don't crash on allocation if CMA area can't be
activated") fixes the crash issue when activation fails via setting
cma->count as 0, same logic exists if bitmap allocation fails.

Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:01 +02:00
Linxu Fang
5094a85d6d mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE
[ Upstream commit 299c83dce9ea3a79bb4b5511d2cb996b6b8e5111 ]

342332e6a9 ("mm/page_alloc.c: introduce kernelcore=mirror option") and
later patches rewrote the calculation of node spanned pages.

e506b99696 ("mem-hotplug: fix node spanned pages when we have a movable
node"), but the current code still has problems,

When we have a node with only zone_movable and the node id is not zero,
the size of node spanned pages is double added.

That's because we have an empty normal zone, and zone_start_pfn or
zone_end_pfn is not between arch_zone_lowest_possible_pfn and
arch_zone_highest_possible_pfn, so we need to use clamp to constrain the
range just like the commit <96e907d13602> (bootmem: Reimplement
__absent_pages_in_range() using for_each_mem_pfn_range()).

e.g.
Zone ranges:
  DMA      [mem 0x0000000000001000-0x0000000000ffffff]
  DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
  Normal   [mem 0x0000000100000000-0x000000023fffffff]
Movable zone start for each node
  Node 0: 0x0000000100000000
  Node 1: 0x0000000140000000
Early memory node ranges
  node   0: [mem 0x0000000000001000-0x000000000009efff]
  node   0: [mem 0x0000000000100000-0x00000000bffdffff]
  node   0: [mem 0x0000000100000000-0x000000013fffffff]
  node   1: [mem 0x0000000140000000-0x000000023fffffff]

node 0 DMA	spanned:0xfff   present:0xf9e   absent:0x61
node 0 DMA32	spanned:0xff000 present:0xbefe0	absent:0x40020
node 0 Normal	spanned:0	present:0	absent:0
node 0 Movable	spanned:0x40000 present:0x40000 absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages:1310719
node 1 DMA	spanned:0	    present:0		absent:0
node 1 DMA32	spanned:0	    present:0		absent:0
node 1 Normal	spanned:0x100000    present:0x100000	absent:0
node 1 Movable	spanned:0x100000    present:0x100000	absent:0
On node 1 totalpages(node_present_pages): 2097152
node_spanned_pages:2097152
Memory: 6967796K/12582392K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 5614596K reserved, 0K
cma-reserved)

It shows that the current memory of node 1 is double added.
After this patch, the problem is fixed.

node 0 DMA	spanned:0xfff   present:0xf9e   absent:0x61
node 0 DMA32	spanned:0xff000 present:0xbefe0	absent:0x40020
node 0 Normal	spanned:0	present:0	absent:0
node 0 Movable	spanned:0x40000 present:0x40000 absent:0
On node 0 totalpages(node_present_pages): 1048446
node_spanned_pages:1310719
node 1 DMA	spanned:0	    present:0		absent:0
node 1 DMA32	spanned:0	    present:0		absent:0
node 1 Normal	spanned:0	    present:0		absent:0
node 1 Movable	spanned:0x100000    present:0x100000	absent:0
On node 1 totalpages(node_present_pages): 1048576
node_spanned_pages:1048576
memory: 6967796K/8388088K available (16388K kernel code, 3686K rwdata,
4468K rodata, 2160K init, 10444K bss, 1420292K reserved, 0K
cma-reserved)

Link: http://lkml.kernel.org/r/1554178276-10372-1-git-send-email-fanglinxu@huawei.com
Signed-off-by: Linxu Fang <fanglinxu@huawei.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:00 +02:00
Mike Kravetz
ffaafd27b0 hugetlbfs: on restore reserve error path retain subpool reservation
[ Upstream commit 0919e1b69ab459e06df45d3ba6658d281962db80 ]

When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation.  When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored.  PagePrivate
being set at free huge page time mostly happens on error paths.

When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem.  If so,
pages are also reserved within the filesystem.  The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem.  However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd.  Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.

Filesystem                         Size  Used Avail Use% Mounted on
nodev                              4.0G -4.0M  4.1G    - /opt/hugepool

To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.

I did not cc stable as the problem has been around since reserves were
added to hugetlbfs and nobody has noticed.

Link: http://lkml.kernel.org/r/20190328234704.27083-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:00 +02:00
Jérôme Glisse
85e1a6c4b3 mm/hmm: select mmu notifier when selecting HMM
[ Upstream commit 734fb89968900b5c5f8edd5038bd4cdeab8c61d2 ]

To avoid random config build issue, select mmu notifier when HMM is
selected.  In any cases when HMM get selected it will be by users that
will also wants the mmu notifier.

Link: http://lkml.kernel.org/r/20190403193318.16478-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-15 11:54:00 +02:00
Greg Kroah-Hartman
3f534fa2fc This is the 4.19.49 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlz8soUACgkQONu9yGCS
 aT5o7w//cc+XhbVg3ro68Rq2byYVuntdiKtYDYlEQtv5DORELdX29g6gBlLlO+ma
 J29W/8quC9fi8EERG14TRvhh8MJNlQCv5g+Afvqr42zCOBowUFL4q3o06dON2e3v
 qhKsEhwcvHH/dR+ixtw1zpSNl6bIdG87C5gxYpWvpjCHpVpVDwLHfRK7GbSS/ioe
 Ys2uJwtbB4ZpileCoA4O61alYNLq/zP/gJiKraMbrwfOPfY3dmDIqo6KMniUyFs3
 0E0w6AKp2PalsoVxyl6mD0KGFFChWz0AfSE6b3IdlzXC2qOZkWlAKPWxpQMHWdme
 HVGXHxrW6aDVk+RGOmIk1dKWDdVqTZJ1Cx1RscrtIQsQwuXqR/STgoqu1hbQ/B4W
 /jMak4BICuyfnOzsmuQbIhOWaf8KTIokYfQ92SAEDQl+xRhxqe2VTSCkl0gVE+Yr
 KxMoyzyZTYidpWRedUGJiACNjlgogYp3L9xiJjHz2dlQD9r4GTsHXo2CbzV1jYrL
 HKqmkz6angoHx3Vw6tffCe9117KFKkOxX8MG3xRMZMfRglRbNd3m8Uo+YGOWcscU
 e1mUE+71woj89TquN7cTg5hY5/kRiGwVJBqihSuWzeJsf0LPab9Txv7ipHKuzh84
 i8e3Isi3eoZR55ZsU7Jl8QeDIZ2OSNprUy9ldvaI2K4fg/VxpA4=
 =hIeN
 -----END PGP SIGNATURE-----

Merge 4.19.49 into android-4.19

Changes in 4.19.49
	sparc64: Fix regression in non-hypervisor TLB flush xcall
	include/linux/bitops.h: sanitize rotate primitives
	xhci: update bounce buffer with correct sg num
	xhci: Use %zu for printing size_t type
	xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
	usb: xhci: avoid null pointer deref when bos field is NULL
	usbip: usbip_host: fix BUG: sleeping function called from invalid context
	usbip: usbip_host: fix stub_dev lock context imbalance regression
	USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor
	USB: sisusbvga: fix oops in error path of sisusb_probe
	USB: Add LPM quirk for Surface Dock GigE adapter
	USB: rio500: refuse more than one device at a time
	USB: rio500: fix memory leak in close after disconnect
	media: usb: siano: Fix general protection fault in smsusb
	media: usb: siano: Fix false-positive "uninitialized variable" warning
	media: smsusb: better handle optional alignment
	brcmfmac: fix NULL pointer derefence during USB disconnect
	scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
	scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
	tracing: Avoid memory leak in predicate_parse()
	Btrfs: fix wrong ctime and mtime of a directory after log replay
	Btrfs: fix race updating log root item during fsync
	Btrfs: fix fsync not persisting changed attributes of a directory
	Btrfs: incremental send, fix file corruption when no-holes feature is enabled
	iio: dac: ds4422/ds4424 fix chip verification
	iio: adc: ti-ads8688: fix timestamp is not updated in buffer
	s390/crypto: fix gcm-aes-s390 selftest failures
	s390/crypto: fix possible sleep during spinlock aquired
	KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts
	powerpc/perf: Fix MMCRA corruption by bhrb_filter
	ALSA: line6: Assure canceling delayed work at disconnection
	ALSA: hda/realtek - Set default power save node to 0
	ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops
	KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID
	drm/nouveau/i2c: Disable i2c bus access after ->fini()
	i2c: mlxcpld: Fix wrong initialization order in probe
	i2c: synquacer: fix synquacer_i2c_doxfer() return value
	tty: serial: msm_serial: Fix XON/XOFF
	tty: max310x: Fix external crystal register setup
	memcg: make it work on sparse non-0-node systems
	kernel/signal.c: trace_signal_deliver when signal_group_exit
	arm64: Fix the arm64_personality() syscall wrapper redirection
	docs: Fix conf.py for Sphinx 2.0
	doc: Cope with the deprecation of AutoReporter
	doc: Cope with Sphinx logging deprecations
	ima: show rules with IMA_INMASK correctly
	evm: check hash algorithm passed to init_desc()
	vt/fbcon: deinitialize resources in visual_init() after failed memory allocation
	serial: sh-sci: disable DMA for uart_console
	staging: vc04_services: prevent integer overflow in create_pagelist()
	staging: wlan-ng: fix adapter initialization failure
	cifs: fix memory leak of pneg_inbuf on -EOPNOTSUPP ioctl case
	CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM
	Revert "lockd: Show pid of lockd for remote locks"
	gcc-plugins: Fix build failures under Darwin host
	drm/tegra: gem: Fix CPU-cache maintenance for BO's allocated using get_pages()
	drm/vmwgfx: Don't send drm sysfs hotplug events on initial master set
	drm/sun4i: Fix sun8i HDMI PHY clock initialization
	drm/sun4i: Fix sun8i HDMI PHY configuration for > 148.5 MHz
	drm/rockchip: shutdown drm subsystem on shutdown
	drm/lease: Make sure implicit planes are leased
	Compiler Attributes: add support for __copy (gcc >= 9)
	include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
	Revert "x86/build: Move _etext to actual end of .text"
	Revert "binder: fix handling of misaligned binder object"
	binder: fix race between munmap() and direct reclaim
	x86/ftrace: Do not call function graph from dynamic trampolines
	x86/ftrace: Set trampoline pages as executable
	x86/kprobes: Set instruction page as executable
	scsi: lpfc: Fix backport of faf5a744f4f8 ("scsi: lpfc: avoid uninitialized variable warning")
	of: overlay: validate overlay properties #address-cells and #size-cells
	of: overlay: set node fields from properties when add new overlay node
	media: uvcvideo: Fix uvc_alloc_entity() allocation alignment
	Linux 4.19.49

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-09 09:27:18 +02:00
Jiri Slaby
8b057ad846 memcg: make it work on sparse non-0-node systems
commit 3e8589963773a5c23e2f1fe4bcad0e9a90b7f471 upstream.

We have a single node system with node 0 disabled:
  Scanning NUMA topology in Northbridge 24
  Number of physical nodes 2
  Skipping disabled node 0
  Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
  NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]

This causes crashes in memcg when system boots:
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
...
  RIP: 0010:list_lru_add+0x94/0x170
...
  Call Trace:
   d_lru_add+0x44/0x50
   dput.part.34+0xfc/0x110
   __fput+0x108/0x230
   task_work_run+0x9f/0xc0
   exit_to_usermode_loop+0xf5/0x100

It is reproducible as far as 4.12.  I did not try older kernels.  You have
to have a new enough systemd, e.g.  241 (the reason is unknown -- was not
investigated).  Cannot be reproduced with systemd 234.

The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.

The root cause are list_lru_memcg_aware checks in the list_lru code.  The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.

So fix this by avoiding checks on node 0.  Remember the memcg-awareness by
a bool flag in struct list_lru.

Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-09 09:17:19 +02:00
Greg Kroah-Hartman
50f91435a2 This is the 4.19.45 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzk4CsACgkQONu9yGCS
 aT5Xaw//UWopx4Yqbiv+4HBgW+2ijP4utxI4lBNYITD44jvkyVJnztUtVkWepu5r
 Tkl/7zytXOpxbpuhS0xqpWwG7lL5eT4NCG08KSX4lYQVjIWX4YzVkw9gLe9V2AaK
 IqTzaWtbuagARbnR3UC65TI4kjRGsr9ldY0AbbGGVTM6IwPquHN9Qd9TAzRwRohn
 CxY94Bwp1RcN2sSPkD3nUCUGOSNh97BXyypeM7FyceOzOpyAdQCXoUPc84cPqdNC
 4GBkd5Z1IL/7zX3HDjQeGS0KK6e1enslSmsbSSUVuHI90LCr3CZPJkFF8RFnPnff
 2RA7bdhp8C1JPeLDimr+SNSLEl9yywoH6d4UQAnBwoLDjiFCEITVgjDtYzzd81+1
 ES6lbUAs8v/LXkaCaExq6pNNd1prg6Mj9Fe6cz+G9V/YV1tLUsoAJHdFucu8Sp7w
 rwz/PZ6waCf8VRO4aYFF9b+u7PQ/RFZWQYsz22P7PhAYg0CTajV1FWGk1AYi0+wQ
 5YCmthbWhDo9U5lAFyQ0pVTXv/UNgEu6MfV1/jKtCk5AzsbE77orj1xusKckHq2e
 QojgmELmHMlFFajI0h/ddDo7iwz/5OrPVs9D03RysiOciMzdTKPucPyC0Ah4yEBA
 sJ0cQkaVtqO2Nu3E42lfQTpVIqBgi8NGav+kRwryB1YyKeaXLsM=
 =HJ7O
 -----END PGP SIGNATURE-----

Merge 4.19.45 into android-4.19

Changes in 4.19.45
	locking/rwsem: Prevent decrement of reader count before increment
	x86/speculation/mds: Revert CPU buffer clear on double fault exit
	x86/speculation/mds: Improve CPU buffer clear documentation
	objtool: Fix function fallthrough detection
	arm64: dts: rockchip: Disable DCMDs on RK3399's eMMC controller.
	ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260
	ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3
	mmc: sdhci-of-arasan: Add DTS property to disable DCMDs.
	ARM: exynos: Fix a leaked reference by adding missing of_node_put
	power: supply: axp288_charger: Fix unchecked return value
	power: supply: axp288_fuel_gauge: Add ACEPC T8 and T11 mini PCs to the blacklist
	arm64: mmap: Ensure file offset is treated as unsigned
	arm64: arch_timer: Ensure counter register reads occur with seqlock held
	arm64: compat: Reduce address limit
	arm64: Clear OSDLR_EL1 on CPU boot
	arm64: Save and restore OSDLR_EL1 across suspend/resume
	sched/x86: Save [ER]FLAGS on context switch
	crypto: crypto4xx - fix ctr-aes missing output IV
	crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
	crypto: salsa20 - don't access already-freed walk.iv
	crypto: chacha20poly1305 - set cra_name correctly
	crypto: ccp - Do not free psp_master when PLATFORM_INIT fails
	crypto: vmx - fix copy-paste error in CTR mode
	crypto: skcipher - don't WARN on unprocessed data after slow walk step
	crypto: crct10dif-generic - fix use via crypto_shash_digest()
	crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
	crypto: arm64/gcm-aes-ce - fix no-NEON fallback code
	crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
	crypto: rockchip - update IV buffer to contain the next IV
	crypto: arm/aes-neonbs - don't access already-freed walk.iv
	crypto: arm64/aes-neonbs - don't access already-freed walk.iv
	mmc: core: Fix tag set memory leak
	ALSA: line6: toneport: Fix broken usage of timer for delayed execution
	ALSA: usb-audio: Fix a memory leak bug
	ALSA: hda/hdmi - Read the pin sense from register when repolling
	ALSA: hda/hdmi - Consider eld_valid when reporting jack event
	ALSA: hda/realtek - EAPD turn on later
	ALSA: hdea/realtek - Headset fixup for System76 Gazelle (gaze14)
	ASoC: max98090: Fix restore of DAPM Muxes
	ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
	ASoC: fsl_esai: Fix missing break in switch statement
	ASoC: codec: hdac_hdmi add device_link to card device
	bpf, arm64: remove prefetch insn in xadd mapping
	crypto: ccree - remove special handling of chained sg
	crypto: ccree - fix mem leak on error path
	crypto: ccree - don't map MAC key on stack
	crypto: ccree - use correct internal state sizes for export
	crypto: ccree - don't map AEAD key and IV on stack
	crypto: ccree - pm resume first enable the source clk
	crypto: ccree - HOST_POWER_DOWN_EN should be the last CC access during suspend
	crypto: ccree - add function to handle cryptocell tee fips error
	crypto: ccree - handle tee fips error during power management resume
	mm/mincore.c: make mincore() more conservative
	mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses
	mm/hugetlb.c: don't put_page in lock of hugetlb_lock
	hugetlb: use same fault hash key for shared and private mappings
	ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
	userfaultfd: use RCU to free the task struct when fork fails
	ACPI: PM: Set enable_for_wake for wakeup GPEs during suspend-to-idle
	mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
	mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values
	mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write
	tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0
	tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
	jbd2: check superblock mapped prior to committing
	ext4: make sanity check in mballoc more strict
	ext4: ignore e_value_offs for xattrs with value-in-ea-inode
	ext4: avoid drop reference to iloc.bh twice
	ext4: fix use-after-free race with debug_want_extra_isize
	ext4: actually request zeroing of inode table after grow
	ext4: fix ext4_show_options for file systems w/o journal
	btrfs: Check the first key and level for cached extent buffer
	btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails
	btrfs: Honour FITRIM range constraints during free space trim
	Btrfs: send, flush dellaloc in order to avoid data loss
	Btrfs: do not start a transaction during fiemap
	Btrfs: do not start a transaction at iterate_extent_inodes()
	bcache: fix a race between cache register and cacheset unregister
	bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
	ipmi:ssif: compare block number correctly for multi-part return messages
	crypto: ccm - fix incompatibility between "ccm" and "ccm_base"
	fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
	tty: Don't force RISCV SBI console as preferred console
	ext4: zero out the unused memory region in the extent tree block
	ext4: fix data corruption caused by overlapping unaligned and aligned IO
	ext4: fix use-after-free in dx_release()
	ext4: avoid panic during forced reboot due to aborted journal
	ALSA: hda/realtek - Corrected fixup for System76 Gazelle (gaze14)
	ALSA: hda/realtek - Fixup headphone noise via runtime suspend
	ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
	jbd2: fix potential double free
	KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
	KVM: lapic: Busy wait for timer to expire when using hv_timer
	kbuild: turn auto.conf.cmd into a mandatory include file
	xen/pvh: set xen_domain_type to HVM in xen_pvh_init
	libnvdimm/namespace: Fix label tracking error
	iov_iter: optimize page_copy_sane()
	pstore: Centralize init/exit routines
	pstore: Allocate compression during late_initcall()
	pstore: Refactor compression initialization
	ext4: fix compile error when using BUFFER_TRACE
	ext4: don't update s_rev_level if not required
	Linux 4.19.45

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-22 08:00:39 +02:00
Mike Kravetz
a3ccc156f3 hugetlb: use same fault hash key for shared and private mappings
commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream.

hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently.  The key for shared and private mappings is
different.  Shared keys off address_space and file index.  Private keys
off mm and virtual address.  Consider a private mappings of a populated
hugetlbfs file.  A fault will map the page from the file and if needed
do a COW to map a writable page.

Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages.  It uses the address_space file index key.  However, private
mappings will use a different key and could race with this code to map
the file page.  This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped.  A sample stack is:

page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9

There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914eb
("mm, hugetlb: improve page-fault scalability").

Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings.  This
results in potentially more hash collisions.  However, this should not
be the common case.

Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22 07:37:40 +02:00
Kai Shen
0b16b09a72 mm/hugetlb.c: don't put_page in lock of hugetlb_lock
commit 2bf753e64b4a702e27ce26ff520c59563c62f96b upstream.

spinlock recursion happened when do LTP test:
#!/bin/bash
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &

The dtor returned by get_compound_page_dtor in __put_compound_page may be
the function of free_huge_page which will lock the hugetlb_lock, so don't
put_page in lock of hugetlb_lock.

 BUG: spinlock recursion on CPU#0, hugemmap05/1079
  lock: hugetlb_lock+0x0/0x18, .magic: dead4ead, .owner: hugemmap05/1079, .owner_cpu: 0
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xa4/0xcc
  spin_dump+0x84/0xa8
  do_raw_spin_lock+0xd0/0x108
  _raw_spin_lock+0x20/0x30
  free_huge_page+0x9c/0x260
  __put_compound_page+0x44/0x50
  __put_page+0x2c/0x60
  alloc_surplus_huge_page.constprop.19+0xf0/0x140
  hugetlb_acct_memory+0x104/0x378
  hugetlb_reserve_pages+0xe0/0x250
  hugetlbfs_file_mmap+0xc0/0x140
  mmap_region+0x3e8/0x5b0
  do_mmap+0x280/0x460
  vm_mmap_pgoff+0xf4/0x128
  ksys_mmap_pgoff+0xb4/0x258
  __arm64_sys_mmap+0x34/0x48
  el0_svc_common+0x78/0x130
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc

Link: http://lkml.kernel.org/r/b8ade452-2d6b-0372-32c2-703644032b47@huawei.com
Fixes: 9980d744a0 ("mm, hugetlb: get rid of surplus page accounting tricks")
Signed-off-by: Kai Shen <shenkai8@huawei.com>
Signed-off-by: Feilong Lin <linfeilong@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22 07:37:40 +02:00
Dan Williams
58db381368 mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses
commit fce86ff5802bac3a7b19db171aa1949ef9caac31 upstream.

Starting with c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page
protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls
pmdp_set_access_flags().  That helper enforces a pmd aligned @address
argument via VM_BUG_ON() assertion.

Update the implementation to take a 'struct vm_fault' argument directly
and apply the address alignment fixup internally to fix crash signatures
like:

    kernel BUG at arch/x86/mm/pgtable.c:515!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 51 PID: 43713 Comm: java Tainted: G           OE     4.19.35 #1
    [..]
    RIP: 0010:pmdp_set_access_flags+0x48/0x50
    [..]
    Call Trace:
     vmf_insert_pfn_pmd+0x198/0x350
     dax_iomap_fault+0xe82/0x1190
     ext4_dax_huge_fault+0x103/0x1f0
     ? __switch_to_asm+0x40/0x70
     __handle_mm_fault+0x3f6/0x1370
     ? __switch_to_asm+0x34/0x70
     ? __switch_to_asm+0x40/0x70
     handle_mm_fault+0xda/0x200
     __do_page_fault+0x249/0x4f0
     do_page_fault+0x32/0x110
     ? page_fault+0x8/0x30
     page_fault+0x1e/0x30

Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: c6f3c5ee40c1 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Piotr Balcer <piotr.balcer@intel.com>
Tested-by: Yan Ma <yan.ma@intel.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Chandan Rajendra <chandan@linux.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22 07:37:40 +02:00
Jiri Kosina
f580a54bbd mm/mincore.c: make mincore() more conservative
commit 134fca9063ad4851de767d1768180e5dede9a881 upstream.

The semantics of what mincore() considers to be resident is not
completely clear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache".

That's potentially a problem, as that [in]directly exposes
meta-information about pagecache / memory mapping state even about
memory not strictly belonging to the process executing the syscall,
opening possibilities for sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache
information for non-anonymous mappings that belog to files that the
calling process could (if it tried to) successfully open for writing;
otherwise we'd be including shared non-exclusive mappings, which

 - is the sidechannel

 - is not the usecase for mincore(), as that's primarily used for data,
   not (shared) text

[jkosina@suse.cz: v2]
  Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
[mhocko@suse.com: restructure can_do_mincore() conditions]
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Josh Snyder <joshs@netflix.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally-by: Dominique Martinet <asmadeus@codewreck.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daniel Gruss <daniel@gruss.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22 07:37:40 +02:00
Christoph Hellwig
2bf50b5a8e FROMLIST: mm: don't cast ->readpage to filler_t for do_read_cache_page
We can just pass a NULL filler and do the right thing inside of
do_read_cache_page based on the NULL parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Bug: 133186739
Change-Id: I720afb2e0c28d5649b62ed3852b48ac63dc0d7a8
Link: https://lkml.org/lkml/2019/5/1/294
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2019-05-20 17:46:42 -07:00
Greg Kroah-Hartman
0b63cd6d63 This is the 4.19.44 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzdoMwACgkQONu9yGCS
 aT5H9RAAkvsM0ryV/0BLcS7bAc2qXHz3shwcqMRHeqLSHCI728Ew3vbrenquQ9ou
 I1I2gXVjVr3fT5IrNeEKC/RJg5EvUBD25dm3Iei9cTf8kqnnFaEqQFe+jIvVOMfR
 zr//hEG3lOO5tLtyK1pnZxrW1uKl7B+1O64bVMpHSola+5teV51r/PxqYM5lNKYm
 A0YhhcrTZ4f1ZY1nctgb3g5XousPHJAMhe8ACv6zO7I+eWwn410nUeAJL7WK8SvL
 UqW0xgLAkKsYrriaDCPC9VeOVMUetCHeT9KbYQ8W9p+kb5bMQua4d/YJtCxulnqH
 TAQosuCVSVoq+jQIZS+El4+teCQW/aAkHFGo+08e9ti+trEliz2u2PxiloZoGlxl
 M4iIPwjkp7YJ9WHoVdNjObPiv5vt+2pWDEPhB7p++nsk5/2ArbShqWVDREj09SRp
 RpOJ7UGl+Nckk8IppfQgRUOlVyMW6a+Fw7GX7dxnyh9U1g48w5ebNTVeUvKxQVCJ
 a99U9qqGRgmEsgicnOXs4tRMYuU0gn3j0Qjhd9G2xYijh8ElMLgTJyv+2JiUIavC
 nXmSVW3G54dR7N94aDfWm/z85w/uFmTmT4BiqNuc4/gHVyyk12Q9RbsV4T617xls
 wiCkGoLKlpi9cp6abxB9H+oav9PdZd9ztwZMSB35k1AIgh6oeo0=
 =v14F
 -----END PGP SIGNATURE-----

Merge 4.19.44 into android-4.19

Changes in 4.19.44
	bfq: update internal depth state when queue depth changes
	platform/x86: sony-laptop: Fix unintentional fall-through
	platform/x86: thinkpad_acpi: Disable Bluetooth for some machines
	platform/x86: dell-laptop: fix rfkill functionality
	hwmon: (pwm-fan) Disable PWM if fetching cooling data fails
	kernfs: fix barrier usage in __kernfs_new_node()
	virt: vbox: Sanity-check parameter types for hgcm-calls coming from userspace
	USB: serial: fix unthrottle races
	iio: adc: xilinx: fix potential use-after-free on remove
	iio: adc: xilinx: fix potential use-after-free on probe
	iio: adc: xilinx: prevent touching unclocked h/w on remove
	acpi/nfit: Always dump _DSM output payload
	libnvdimm/namespace: Fix a potential NULL pointer dereference
	HID: input: add mapping for Expose/Overview key
	HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys
	HID: input: add mapping for "Toggle Display" key
	libnvdimm/btt: Fix a kmemdup failure check
	s390/dasd: Fix capacity calculation for large volumes
	mac80211: fix unaligned access in mesh table hash function
	mac80211: Increase MAX_MSG_LEN
	cfg80211: Handle WMM rules in regulatory domain intersection
	mac80211: fix memory accounting with A-MSDU aggregation
	nl80211: Add NL80211_FLAG_CLEAR_SKB flag for other NL commands
	libnvdimm/pmem: fix a possible OOB access when read and write pmem
	s390/3270: fix lockdep false positive on view->lock
	drm/amd/display: extending AUX SW Timeout
	clocksource/drivers/npcm: select TIMER_OF
	clocksource/drivers/oxnas: Fix OX820 compatible
	selftests: fib_tests: Fix 'Command line is not complete' errors
	mISDN: Check address length before reading address family
	vxge: fix return of a free'd memblock on a failed dma mapping
	qede: fix write to free'd pointer error and double free of ptp
	afs: Unlock pages for __pagevec_release()
	drm/amd/display: If one stream full updates, full update all planes
	s390/pkey: add one more argument space for debug feature entry
	x86/build/lto: Fix truncated .bss with -fdata-sections
	x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T
	KVM: fix spectrev1 gadgets
	KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
	tools lib traceevent: Fix missing equality check for strcmp
	ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
	ocelot: Don't sleep in atomic context (irqs_disabled())
	scsi: aic7xxx: fix EISA support
	mm: fix inactive list balancing between NUMA nodes and cgroups
	init: initialize jump labels before command line option parsing
	selftests: netfilter: check icmp pkttoobig errors are set as related
	ipvs: do not schedule icmp errors from tunnels
	netfilter: ctnetlink: don't use conntrack/expect object addresses as id
	netfilter: nf_tables: prevent shift wrap in nft_chain_parse_hook()
	MIPS: perf: ath79: Fix perfcount IRQ assignment
	s390: ctcm: fix ctcm_new_device error return code
	drm/sun4i: Set device driver data at bind time for use in unbind
	drm/sun4i: Fix component unbinding and component master deletion
	selftests/net: correct the return value for run_netsocktests
	netfilter: fix nf_l4proto_log_invalid to log invalid packets
	gpu: ipu-v3: dp: fix CSC handling
	drm/imx: don't skip DP channel disable for background plane
	ARM: 8856/1: NOMMU: Fix CCR register faulty initialization when MPU is disabled
	spi: Micrel eth switch: declare missing of table
	spi: ST ST95HF NFC: declare missing of table
	drm/sun4i: Unbind components before releasing DRM and memory
	Input: synaptics-rmi4 - fix possible double free
	RDMA/hns: Bugfix for mapping user db
	mm/memory_hotplug.c: drop memory device reference after find_memory_block()
	powerpc/smp: Fix NMI IPI timeout
	powerpc/smp: Fix NMI IPI xmon timeout
	net: dsa: mv88e6xxx: fix few issues in mv88e6390x_port_set_cmode
	mm/memory.c: fix modifying of page protection by insert_pfn()
	usb: typec: Fix unchecked return value
	netfilter: nf_tables: use-after-free in dynamic operations
	netfilter: nf_tables: add missing ->release_ops() in error path of newrule()
	net: fec: manage ahb clock in runtime pm
	mlxsw: spectrum_switchdev: Add MDB entries in prepare phase
	mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue
	mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue
	mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue
	net/tls: fix the IV leaks
	net: strparser: partially revert "strparser: Call skb_unclone conditionally"
	NFC: nci: Add some bounds checking in nci_hci_cmd_received()
	nfc: nci: Potential off by one in ->pipes[] array
	x86/kprobes: Avoid kretprobe recursion bug
	cw1200: fix missing unlock on error in cw1200_hw_scan()
	mwl8k: Fix rate_idx underflow
	rtlwifi: rtl8723ae: Fix missing break in switch statement
	Don't jump to compute_result state from check_result state
	um: Don't hardcode path as it is architecture dependent
	powerpc/64s: Include cpu header
	bonding: fix arp_validate toggling in active-backup mode
	bridge: Fix error path for kobject_init_and_add()
	dpaa_eth: fix SG frame cleanup
	fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied
	ipv4: Fix raw socket lookup for local traffic
	net: dsa: Fix error cleanup path in dsa_init_module
	net: ethernet: stmmac: dwmac-sun8i: enable support of unicast filtering
	net: macb: Change interrupt and napi enable order in open
	net: seeq: fix crash caused by not set dev.parent
	net: ucc_geth - fix Oops when changing number of buffers in the ring
	packet: Fix error path in packet_init
	selinux: do not report error on connect(AF_UNSPEC)
	vlan: disable SIOCSHWTSTAMP in container
	vrf: sit mtu should not be updated when vrf netdev is the link
	tuntap: fix dividing by zero in ebpf queue selection
	tuntap: synchronize through tfiles array instead of tun->numqueues
	isdn: bas_gigaset: use usb_fill_int_urb() properly
	tipc: fix hanging clients using poll with EPOLLOUT flag
	drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl
	drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
	powerpc/book3s/64: check for NULL pointer in pgd_alloc()
	powerpc/powernv/idle: Restore IAMR after idle
	powerpc/booke64: set RI in default MSR
	PCI: hv: Fix a memory leak in hv_eject_device_work()
	PCI: hv: Add hv_pci_remove_slots() when we unload the driver
	PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary
	Linux 4.19.44

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-16 19:52:46 +02:00
Jan Kara
6832199422 mm/memory.c: fix modifying of page protection by insert_pfn()
[ Upstream commit cae85cb8add35f678cf487139d05e083ce2f570a ]

Aneesh has reported that PPC triggers the following warning when
excercising DAX code:

  IP set_pte_at+0x3c/0x190
  LR insert_pfn+0x208/0x280
  Call Trace:
     insert_pfn+0x68/0x280
     dax_iomap_pte_fault.isra.7+0x734/0xa40
     __xfs_filemap_fault+0x280/0x2d0
     do_wp_page+0x48c/0xa40
     __handle_mm_fault+0x8d0/0x1fd0
     handle_mm_fault+0x140/0x250
     __do_page_fault+0x300/0xd60
     handle_page_fault+0x18

Now that is WARN_ON in set_pte_at which is

        VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));

The problem is that on some architectures set_pte_at() cannot cope with
a situation where there is already some (different) valid entry present.

Use ptep_set_access_flags() instead to modify the pfn which is built to
deal with modifying existing PTE.

Link: http://lkml.kernel.org/r/20190311084537.16029-1-jack@suse.cz
Fixes: b2770da642 "mm: add vm_insert_mixed_mkwrite()"
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Chandan Rajendra <chandan@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2019-05-16 19:41:26 +02:00
David Hildenbrand
6a60fb62c8 mm/memory_hotplug.c: drop memory device reference after find_memory_block()
[ Upstream commit 89c02e69fc5245f8a2f34b58b42d43a737af1a5e ]

Right now we are using find_memory_block() to get the node id for the
pfn range to online.  We are missing to drop a reference to the memory
block device.  While the device still gets unregistered via
device_unregister(), resulting in no user visible problem, the device is
never released via device_release(), resulting in a memory leak.  Fix
that by properly using a put_device().

Link: http://lkml.kernel.org/r/20190411110955.1430-1-david@redhat.com
Fixes: d0dc12e86b ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pagupta@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16 19:41:25 +02:00
Johannes Weiner
6536de8232 mm: fix inactive list balancing between NUMA nodes and cgroups
[ Upstream commit 3b991208b897f52507168374033771a984b947b1 ]

During !CONFIG_CGROUP reclaim, we expand the inactive list size if it's
thrashing on the node that is about to be reclaimed.  But when cgroups
are enabled, we suddenly ignore the node scope and use the cgroup scope
only.  The result is that pressure bleeds between NUMA nodes depending
on whether cgroups are merely compiled into Linux.  This behavioral
difference is unexpected and undesirable.

When the refault adaptivity of the inactive list was first introduced,
there were no statistics at the lruvec level - the intersection of node
and memcg - so it was better than nothing.

But now that we have that infrastructure, use lruvec_page_state() to
make the list balancing decision always NUMA aware.

[hannes@cmpxchg.org: fix bisection hole]
  Link: http://lkml.kernel.org/r/20190417155241.GB23013@cmpxchg.org
Link: http://lkml.kernel.org/r/20190412144438.2645-1-hannes@cmpxchg.org
Fixes: 2a2e48854d ("mm: vmscan: fix IO/refault regression in cache workingset transition")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16 19:41:23 +02:00
Greg Kroah-Hartman
66aebe2d89 This is the 4.19.42 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzVnqQACgkQONu9yGCS
 aT4iJQ//SEDeqCHsZIblTqIywc87Xn/M4WGvQSr5l7NlFFWXRmIZ2VqGSa8IqY6m
 mbq0WJHIb0N3fJ09rh6/OuoFz8j7L4aq3OngCsAfmj3Esb2mAt8QkpvHgKdO8wEp
 iWstqAUUgCJLaaCO2KN5Z4uvpib+HhpaoOkHNto94w6M5Hc9Ax9iqMWqOk+ugcy9
 q3nPWomhesrmzNjw4gOtG2gyGlkqUZOkbvQDYttXO5jAV2sgfy1puaG3maApRRvT
 uZFOYBJEa13MFx+QgaLvZ7asyhtlBVzQyRx/CcBjecCMRv3oC/ZZtemDmwqEZ4ll
 kj4IaRMzthRhmkpLEEW4SkN/TZDL3C0n114Wp+hRMmLviY7yOOovrbiHs8k/RYzU
 eOKZ2v/zI2FKGYbH0Nsa1HPJdOO1ESAVIxs/XHyk8odOPCpx+KN8NOPdnWILrM1l
 uhVKK62iQlZsLlny+EE7Oy3nqZAg1hHwNkV7WuyLfFrSVDQ/IOLK953QJdNlmgCd
 NbVRCcZ8gtL47jLD4F3uf8bcNQhheg/1m/A95xl860WmTDXrQSvN3FmvVWZ9LUCf
 ftsA76tMcDFlpw/Xu12z+P93VliaATt/I6aRbJyKvL3BBL5dB7Cx0asMDi5ecT6P
 sgaxYPp56xX0m6V10vEtaFCRcyuP2JmR60uU5Zvq9t5CtG+7BL0=
 =SzxN
 -----END PGP SIGNATURE-----

Merge 4.19.42 into android-4.19

Changes in 4.19.42
	net: stmmac: Use bfsize1 in ndesc_init_rx_desc
	scsi: libsas: fix a race condition when smp task timeout
	Drivers: hv: vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup()
	ubsan: Fix nasty -Wbuiltin-declaration-mismatch GCC-9 warnings
	staging: greybus: power_supply: fix prop-descriptor request size
	staging: most: cdev: fix chrdev_region leak in mod_exit
	ASoC: tlv320aic3x: fix reset gpio reference counting
	ASoC: hdmi-codec: fix S/PDIF DAI
	ASoC: stm32: sai: fix iec958 controls indexation
	ASoC: stm32: sai: fix exposed capabilities in spdif mode
	ASoC:soc-pcm:fix a codec fixup issue in TDM case
	ASoC:intel:skl:fix a simultaneous playback & capture issue on hda platform
	ASoC: nau8824: fix the issue of the widget with prefix name
	ASoC: nau8810: fix the issue of widget with prefixed name
	ASoC: samsung: odroid: Fix clock configuration for 44100 sample rate
	ASoC: rt5682: recording has no sound after booting
	ASoC: wm_adsp: Add locking to wm_adsp2_bus_error
	clk: meson-gxbb: round the vdec dividers to closest
	ASoC: stm32: dfsdm: manage multiple prepare
	ASoC: stm32: dfsdm: fix debugfs warnings on entry creation
	ASoC: cs4270: Set auto-increment bit for register writes
	ASoC: dapm: Fix NULL pointer dereference in snd_soc_dapm_free_kcontrol
	drm/omap: hdmi4_cec: Fix CEC clock handling for PM
	IB/hfi1: Eliminate opcode tests on mr deref
	IB/hfi1: Fix the allocation of RSM table
	MIPS: KGDB: fix kgdb support for SMP platforms.
	ASoC: tlv320aic32x4: Fix Common Pins
	drm/mediatek: Fix an error code in mtk_hdmi_dt_parse_pdata()
	perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
	perf/x86/intel: Initialize TFA MSR
	linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
	ASoC: rockchip: pdm: fix regmap_ops hang issue
	drm/amd/display: fix cursor black issue
	ASoC: cs35l35: Disable regulators on driver removal
	objtool: Add rewind_stack_do_exit() to the noreturn list
	slab: fix a crash by reading /proc/slab_allocators
	drm/sun4i: tcon top: Fix NULL/invalid pointer dereference in sun8i_tcon_top_un/bind
	virtio_pci: fix a NULL pointer reference in vp_del_vqs
	RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_remove
	RDMA/hns: Fix bug that caused srq creation to fail
	scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
	drm/mediatek: fix possible object reference leak
	ASoC: Intel: kbl: fix wrong number of channels
	virtio-blk: limit number of hw queues by nr_cpu_ids
	nvme-fc: correct csn initialization and increments on error
	platform/x86: pmc_atom: Drop __initconst on dmi table
	perf/core: Fix perf_event_disable_inatomic() race
	iommu/amd: Set exclusion range correctly
	genirq: Prevent use-after-free and work list corruption
	usb: dwc3: Fix default lpm_nyet_threshold value
	USB: serial: f81232: fix interrupt worker not stop
	USB: cdc-acm: fix unthrottle races
	usb-storage: Set virt_boundary_mask to avoid SG overflows
	intel_th: pci: Add Comet Lake support
	cpufreq: armada-37xx: fix frequency calculation for opp
	soc: sunxi: Fix missing dependency on REGMAP_MMIO
	scsi: lpfc: change snprintf to scnprintf for possible overflow
	scsi: qla2xxx: Fix incorrect region-size setting in optrom SYSFS routines
	scsi: qla2xxx: Fix device staying in blocked state
	Bluetooth: hidp: fix buffer overflow
	Bluetooth: Align minimum encryption key size for LE and BR/EDR connections
	UAS: fix alignment of scatter/gather segments
	ASoC: Intel: avoid Oops if DMA setup fails
	locking/futex: Allow low-level atomic operations to return -EAGAIN
	arm64: futex: Bound number of LDXR/STXR loops in FUTEX_WAKE_OP
	Linux 4.19.42

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-10 18:30:16 +02:00
Qian Cai
78bc98235e slab: fix a crash by reading /proc/slab_allocators
[ Upstream commit fcf88917dd435c6a4cb2830cb086ee58605a1d85 ]

The commit 510ded33e0 ("slab: implement slab_root_caches list")
changes the name of the list node within "struct kmem_cache" from "list"
to "root_caches_node", but leaks_show() still use the "list" which
causes a crash when reading /proc/slab_allocators.

You need to have CONFIG_SLAB=y and CONFIG_MEMCG=y to see the problem,
because without MEMCG all slab caches are root caches, and the "list"
node happens to be the right one.

Fixes: 510ded33e0 ("slab: implement slab_root_caches list")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tobin C. Harding <tobin@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-10 17:54:08 +02:00
Greg Kroah-Hartman
44c5f03127 This is the 4.19.41 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzSZ3MACgkQONu9yGCS
 aT7f2hAAww50MxygeVQXgB0zJ7oSlEBwJAUH1GRCj1ual70pDoK9W8iuacPI4gNu
 +A0V4cD+lndMZ8E+F9/Vmzhmj/xHdxRfCkRmtnpa5IKUJqA/bLr1kdspYO6G354G
 ZBGvizvmFaAikbqNSnoadQZ5AYoT/C4FxxD6t/meQtnwmrK+OhRlh1xCkTddAbQo
 UEPDns9pxqrwHK7tS0GbPXLWgsr4y7BPq92ILdt/GFZGmEBJa0+tf66/DF+RMKjD
 OCmAkiFr+hN8KScUHeZpJKdmI3s7dHEUnQnwkHyC1TwjEXwxhtA0QZgzl9j1F8JI
 f27kbJrpmYuw2QADJyZmCnTiqBo7XglItxEZ2csJKyV7+jFFlzKT4S26iyrHwgXp
 BTdSuTaVe2Ubp/EiULwoSsmxaIyW6fROotQK/x4oB0eDm+OSXq217nnbnF2MRpMb
 eOqzRPCd5CvL8+7T9rc0kOJcDuK/9h95OjzZWBo9rDGz0glxlokw7cJ4fUyrJMPo
 kcw7ZIRRHLtKGTD/ZKX3XrErAqz9I4RAEzBRAXT6W0is7PUi00vFNcOU4ThNSsSK
 ihfuHsp9Jq/SKEUl6/CumU8e91LWv77/gqYxLLuygpuKLFLpMMc/PQOruxXT6ojx
 QC/KIUtcMfTP+xy23Wg2UQ1BXhl+HcfShXzvcEXxHzjgHpglKvc=
 =S4RT
 -----END PGP SIGNATURE-----

Merge 4.19.41 into android-4.19

Changes in 4.19.41
	iwlwifi: fix driver operation for 5350
	mwifiex: Make resume actually do something useful again on SDIO cards
	mac80211: don't attempt to rename ERR_PTR() debugfs dirs
	i2c: synquacer: fix enumeration of slave devices
	i2c: imx: correct the method of getting private data in notifier_call
	i2c: Remove unnecessary call to irq_find_mapping
	i2c: Clear client->irq in i2c_device_remove
	i2c: Allow recovery of the initial IRQ by an I2C client device.
	i2c: Prevent runtime suspend of adapter when Host Notify is required
	ALSA: hda/realtek - Add new Dell platform for headset mode
	ALSA: hda/realtek - Fixed Dell AIO speaker noise
	ALSA: hda/realtek - Apply the fixup for ASUS Q325UAR
	USB: yurex: Fix protection fault after device removal
	USB: w1 ds2490: Fix bug caused by improper use of altsetting array
	USB: dummy-hcd: Fix failure to give back unlinked URBs
	usb: usbip: fix isoc packet num validation in get_pipe
	USB: core: Fix unterminated string returned by usb_string()
	USB: core: Fix bug caused by duplicate interface PM usage counter
	nvme-loop: init nvmet_ctrl fatal_err_work when allocate
	efi: Fix debugobjects warning on 'efi_rts_work'
	arm64: dts: rockchip: fix rk3328-roc-cc gmac2io tx/rx_delay
	HID: logitech: check the return value of create_singlethread_workqueue
	HID: debug: fix race condition with between rdesc_show() and device removal
	rtc: cros-ec: Fail suspend/resume if wake IRQ can't be configured
	rtc: sh: Fix invalid alarm warning for non-enabled alarm
	batman-adv: Reduce claim hash refcnt only for removed entry
	batman-adv: Reduce tt_local hash refcnt only for removed entry
	batman-adv: Reduce tt_global hash refcnt only for removed entry
	batman-adv: fix warning in function batadv_v_elp_get_throughput
	ARM: dts: rockchip: Fix gpu opp node names for rk3288
	reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev
	igb: Fix WARN_ONCE on runtime suspend
	riscv: fix accessing 8-byte variable from RV32
	HID: quirks: Fix keyboard + touchpad on Lenovo Miix 630
	net: hns3: fix compile error
	net/mlx5: E-Switch, Fix esw manager vport indication for more vport commands
	bonding: show full hw address in sysfs for slave entries
	net: stmmac: use correct DMA buffer size in the RX descriptor
	net: stmmac: ratelimit RX error logs
	net: stmmac: don't stop NAPI processing when dropping a packet
	net: stmmac: don't overwrite discard_frame status
	net: stmmac: fix dropping of multi-descriptor RX frames
	net: stmmac: don't log oversized frames
	jffs2: fix use-after-free on symlink traversal
	debugfs: fix use-after-free on symlink traversal
	mfd: twl-core: Disable IRQ while suspended
	block: use blk_free_flush_queue() to free hctx->fq in blk_mq_init_hctx
	rtc: da9063: set uie_unsupported when relevant
	HID: input: add mapping for Assistant key
	vfio/pci: use correct format characters
	scsi: core: add new RDAC LENOVO/DE_Series device
	scsi: storvsc: Fix calculation of sub-channel count
	arm/mach-at91/pm : fix possible object reference leak
	arm64: fix wrong check of on_sdei_stack in nmi context
	net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
	net: hns: Use NAPI_POLL_WEIGHT for hns driver
	net: hns: Fix probabilistic memory overwrite when HNS driver initialized
	net: hns: fix ICMP6 neighbor solicitation messages discard problem
	net: hns: Fix WARNING when remove HNS driver with SMMU enabled
	libcxgb: fix incorrect ppmax calculation
	KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow
	kmemleak: powerpc: skip scanning holes in the .bss section
	hugetlbfs: fix memory leak for resv_map
	sh: fix multiple function definition build errors
	xsysace: Fix error handling in ace_setup
	fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
	ARM: orion: don't use using 64-bit DMA masks
	ARM: iop: don't use using 64-bit DMA masks
	block: pass no-op callback to INIT_WORK().
	perf/x86/amd: Update generic hardware cache events for Family 17h
	Bluetooth: btusb: request wake pin with NOAUTOEN
	Bluetooth: mediatek: fix up an error path to restore bdev->tx_state
	clk: qcom: Add missing freq for usb30_master_clk on 8998
	staging: iio: adt7316: allow adt751x to use internal vref for all dacs
	staging: iio: adt7316: fix the dac read calculation
	staging: iio: adt7316: fix the dac write calculation
	scsi: RDMA/srpt: Fix a credit leak for aborted commands
	ASoC: Intel: bytcr_rt5651: Revert "Fix DMIC map headsetmic mapping"
	ASoC: wm_adsp: Correct handling of compressed streams that restart
	ASoC: stm32: fix sai driver name initialisation
	platform/x86: intel_pmc_core: Fix PCH IP name
	platform/x86: intel_pmc_core: Handle CFL regmap properly
	IB/core: Unregister notifier before freeing MAD security
	IB/core: Fix potential memory leak while creating MAD agents
	IB/core: Destroy QP if XRC QP fails
	Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ
	Input: stmfts - acknowledge that setting brightness is a blocking call
	gpio: mxc: add check to return defer probe if clock tree NOT ready
	selinux: avoid silent denials in permissive mode under RCU walk
	selinux: never allow relabeling on context mounts
	mac80211: Honor SW_CRYPTO_CONTROL for unicast keys in AP VLAN mode
	powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search
	x86/mce: Improve error message when kernel cannot recover, p2
	clk: x86: Add system specific quirk to mark clocks as critical
	x86/mm/KASLR: Fix the size of the direct mapping section
	x86/mm: Fix a crash with kmemleak_scan()
	x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info"
	i2c: i2c-stm32f7: Fix SDADEL minimum formula
	media: v4l2: i2c: ov7670: Fix PLL bypass register values
	ASoC: wm_adsp: Check for buffer in trigger stop
	mm/kmemleak.c: fix unused-function warning
	Linux 4.19.41

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-08 07:39:48 +02:00
Arnd Bergmann
e7c2d06656 mm/kmemleak.c: fix unused-function warning
commit dce5b0bdeec61bdbee56121ceb1d014151d5cab1 upstream.

The only references outside of the #ifdef have been removed, so now we
get a warning in non-SMP configurations:

  mm/kmemleak.c:1404:13: error: unused function 'scan_large_block' [-Werror,-Wunused-function]

Add a new #ifdef around it.

Link: http://lkml.kernel.org/r/20190416123148.3502045-1-arnd@arndb.de
Fixes: 298a32b13208 ("kmemleak: powerpc: skip scanning holes in the .bss section")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-08 07:21:55 +02:00
Catalin Marinas
6a62bbe823 kmemleak: powerpc: skip scanning holes in the .bss section
[ Upstream commit 298a32b132087550d3fa80641ca58323c5dfd4d9 ]

Commit 2d4f567103 ("KVM: PPC: Introduce kvm_tmp framework") adds
kvm_tmp[] into the .bss section and then free the rest of unused spaces
back to the page allocator.

kernel_init
  kvm_guest_init
    kvm_free_tmp
      free_reserved_area
        free_unref_page
          free_unref_page_prepare

With DEBUG_PAGEALLOC=y, it will unmap those pages from kernel.  As the
result, kmemleak scan will trigger a panic when it scans the .bss
section with unmapped pages.

This patch creates dedicated kmemleak objects for the .data, .bss and
potentially .data..ro_after_init sections to allow partial freeing via
the kmemleak_free_part() in the powerpc kvm_free_tmp() function.

Link: http://lkml.kernel.org/r/20190321171917.62049-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Qian Cai <cai@lca.pw>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Qian Cai <cai@lca.pw>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08 07:21:50 +02:00
Greg Kroah-Hartman
dda1dd3ba3 This is the 4.19.39 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzNPTYACgkQONu9yGCS
 aT5+LA//TEZMvKwfq8ASsp17ncB9jNrV9G5gwwTv4Aa+Zv16LN5pLhlTwqJE1JCc
 wJzisFxCGNil0LTZYlpTwGnPvjNB4MPqyMEU5B7GkLsbRCjCJCZUpCEhrsEhR6D+
 finLDnSAHh8+c/kvETIazvzTw9BP8YUDe7j933L54pXONfAHUvYDkz57ri5oEyIE
 9g+Haot6gytgnPH1iU3A+uSjMll2naBHT1ga7F6fqU6EuhIogtEepG1Y6TWT6zfQ
 IDO4HZwMhdi75ITkv06lh/1zdQaMaGWE+R7svKfQ5UoH89eIUvRjKl8Pkdd5JPha
 1LOz5HBBTxCZdzl70UF8n0ZzlNYP71rQ1CT4bo+U4kPNWNvKZb2MxJEOxB7/JqJo
 qNrv1lBE1ERBM5s8NlTSx1P+RSs3G6kLhDIgz+uB8ALl87zDkwUA7ynP13NFsP2c
 QY7E0eianWSYJxrqhkgWXQ/ToYRLsCBARvAHMZgDveVUNe83SyK1NFhZD77weDSC
 WtirUFzKpw0ZtVjs8Ox6zh4HRtAFXsyPvnlPbtrMdaU0GtEox2Skg7SKboEZfyUa
 8nu98ZdEdzqAKKUOs2o/3vVFnq4NKmw1khpo1OEW5Ob76KdDHYGMiiGOYH1vEGY1
 OxqNyNgw8/8OT6drCrAgx7+4X+BTXdiSXORUyp9AP7WpELwMy4Q=
 =35Eg
 -----END PGP SIGNATURE-----

Merge 4.19.39 into android-4.19

Changes in 4.19.39
	selinux: use kernel linux/socket.h for genheaders and mdp
	Revert "ACPICA: Clear status of GPEs before enabling them"
	mm: make page ref count overflow check tighter and more explicit
	mm: add 'try_get_page()' helper function
	mm: prevent get_user_pages() from overflowing page refcount
	fs: prevent page refcount overflow in pipe_buf_get
	ARM: dts: bcm283x: Fix hdmi hpd gpio pull
	s390: limit brk randomization to 32MB
	net: ieee802154: fix a potential NULL pointer dereference
	ieee802154: hwsim: propagate genlmsg_reply return code
	net: stmmac: don't set own bit too early for jumbo frames
	qlcnic: Avoid potential NULL pointer dereference
	xsk: fix umem memory leak on cleanup
	staging: axis-fifo: add CONFIG_OF dependency
	staging, mt7621-pci: fix build without pci support
	netfilter: nft_set_rbtree: check for inactive element after flag mismatch
	netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING
	netfilter: fix NETFILTER_XT_TARGET_TEE dependencies
	netfilter: ip6t_srh: fix NULL pointer dereferences
	s390/qeth: fix race when initializing the IP address table
	ARM: imx51: fix a leaked reference by adding missing of_node_put
	sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
	serial: ar933x_uart: Fix build failure with disabled console
	KVM: arm64: Reset the PMU in preemptible context
	KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory
	KVM: arm/arm64: vgic-its: Take the srcu lock when parsing the memslots
	usb: dwc3: pci: add support for Comet Lake PCH ID
	usb: gadget: net2280: Fix overrun of OUT messages
	usb: gadget: net2280: Fix net2280_dequeue()
	usb: gadget: net2272: Fix net2272_dequeue()
	ARM: dts: pfla02: increase phy reset duration
	i2c: i801: Add support for Intel Comet Lake
	net: ks8851: Dequeue RX packets explicitly
	net: ks8851: Reassert reset pin if chip ID check fails
	net: ks8851: Delay requesting IRQ until opened
	net: ks8851: Set initial carrier state to down
	staging: rtl8188eu: Fix potential NULL pointer dereference of kcalloc
	staging: rtlwifi: rtl8822b: fix to avoid potential NULL pointer dereference
	staging: rtl8712: uninitialized memory in read_bbreg_hdl()
	staging: rtlwifi: Fix potential NULL pointer dereference of kzalloc
	net: macb: Add null check for PCLK and HCLK
	net/sched: don't dereference a->goto_chain to read the chain index
	ARM: dts: imx6qdl: Fix typo in imx6qdl-icore-rqs.dtsi
	drm/tegra: hub: Fix dereference before check
	NFS: Fix a typo in nfs_init_timeout_values()
	net: xilinx: fix possible object reference leak
	net: ibm: fix possible object reference leak
	net: ethernet: ti: fix possible object reference leak
	drm: Fix drm_release() and device unplug
	gpio: aspeed: fix a potential NULL pointer dereference
	drm/meson: Fix invalid pointer in meson_drv_unbind()
	drm/meson: Uninstall IRQ handler
	ARM: davinci: fix build failure with allnoconfig
	scsi: mpt3sas: Fix kernel panic during expander reset
	scsi: aacraid: Insure we don't access PCIe space during AER/EEH
	scsi: qla4xxx: fix a potential NULL pointer dereference
	usb: usb251xb: fix to avoid potential NULL pointer dereference
	leds: trigger: netdev: fix refcnt leak on interface rename
	x86/realmode: Don't leak the trampoline kernel address
	usb: u132-hcd: fix resource leak
	ceph: fix use-after-free on symlink traversal
	scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
	x86/mm: Don't exceed the valid physical address space
	libata: fix using DMA buffers on stack
	gpio: of: Fix of_gpiochip_add() error path
	nvme-multipath: relax ANA state check
	perf machine: Update kernel map address and re-order properly
	kconfig/[mn]conf: handle backspace (^H) key
	iommu/amd: Reserve exclusion range in iova-domain
	ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
	leds: pca9532: fix a potential NULL pointer dereference
	leds: trigger: netdev: use memcpy in device_name_store
	Linux 4.19.39

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-04 09:28:59 +02:00
Linus Torvalds
d972ebbf42 mm: prevent get_user_pages() from overflowing page refcount
commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.

If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it.  One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times.  This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.

Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-04 09:20:11 +02:00
Greg Kroah-Hartman
5e7b4fbe36 This is the 4.19.38 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzKo0YACgkQONu9yGCS
 aT4dbQ//U1bo/8bdBJec+a0aNMy3cxzPF1Ozbrb/vEaHofj1BR87hgo4BODBO7pu
 6ppwloPle9VFrsfT1FYOjsicUBhT4NmieHlsC3msAR4xlBEbHEOBTEbUdu3HinGV
 Jn/uL/NDTrq+wA5rROGOh9sTlQ5w6dqItjHAWvnGkXlerbUJwIgnzbgH5qGBFZhQ
 6SbPmqJv5V+C+qYy3yXNs2CnbtS7+cfulLy26MNnkFMEZGbHTWeNbeu9H41AK6T4
 xtO8INse28RD6lbAPvW/xb//iAXsOHv+7KF1TgtZq89Z1RmlaqLSdPdgTYvCxm+Y
 RhWa8KyIdhADJ8z8sRcPviFI5bR65cfCMUAEgBcFNYYByDv36KCBLsXajn4JbBsF
 OOOtqnGaZyAJBZgMXySfVJIXLAx7cUlt07YD9cIdsOzjl1DCMP76XvypeGXLw5Mk
 ZBXBJ+By+8jwnE7PAtecij/VH6qCDsfn4HqoRELsRLVahFsnFFid5lutVIjsO21j
 QHrwi4hChuYGa89MhD48KyC2ZuaQmbs3rm6F3O0iQ0aipknvlsDoB4jYYp9qRI04
 0FYMlZLlVyg+sNYOM2XvTtpOBFa1PFwFwscqXoyt0CGtig0D+pD3gDYExRONj6Fp
 8h+OUBWbVHWscceMc6G1p/Qu+YcgmQTu8CFAUO8l/X8xq655c1A=
 =isRm
 -----END PGP SIGNATURE-----

Merge 4.19.38 into android-4.19

Changes in 4.19.38
	netfilter: nft_compat: use refcnt_t type for nft_xt reference count
	netfilter: nft_compat: make lists per netns
	netfilter: nf_tables: split set destruction in deactivate and destroy phase
	netfilter: nft_compat: destroy function must not have side effects
	netfilter: nf_tables: warn when expr implements only one of activate/deactivate
	netfilter: nf_tables: unbind set in rule from commit path
	netfilter: nft_compat: don't use refcount_inc on newly allocated entry
	netfilter: nft_compat: use .release_ops and remove list of extension
	netfilter: nf_tables: fix set double-free in abort path
	netfilter: nf_tables: bogus EBUSY when deleting set after flush
	netfilter: nf_tables: bogus EBUSY in helper removal from transaction
	net/ibmvnic: Fix RTNL deadlock during device reset
	net: mvpp2: fix validate for PPv2.1
	ext4: fix some error pointer dereferences
	tipc: handle the err returned from cmd header function
	loop: do not print warn message if partition scan is successful
	drm/rockchip: fix for mailbox read validation.
	vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
	ipvs: fix warning on unused variable
	powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
	ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
	net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
	cifs: fix memory leak in SMB2_read
	cifs: do not attempt cifs operation on smb2+ rename error
	tracing: Fix a memory leak by early error exit in trace_pid_write()
	tracing: Fix buffer_ref pipe ops
	gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC
	zram: pass down the bvec we need to read into in the work struct
	lib/Kconfig.debug: fix build error without CONFIG_BLOCK
	MIPS: scall64-o32: Fix indirect syscall number load
	trace: Fix preempt_enable_no_resched() abuse
	IB/rdmavt: Fix frwr memory registration
	RDMA/mlx5: Do not allow the user to write to the clock page
	sched/numa: Fix a possible divide-by-zero
	ceph: only use d_name directly when parent is locked
	ceph: ensure d_name stability in ceph_dentry_hash()
	ceph: fix ci->i_head_snapc leak
	nfsd: Don't release the callback slot unless it was actually held
	sunrpc: don't mark uninitialised items as VALID.
	perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters
	Input: synaptics-rmi4 - write config register values to the right offset
	vfio/type1: Limit DMA mappings per container
	dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid
	dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status
	ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache
	powerpc/mm/radix: Make Radix require HUGETLB_PAGE
	drm/vc4: Fix memory leak during gpu reset.
	Revert "drm/i915/fbdev: Actually configure untiled displays"
	drm/vc4: Fix compilation error reported by kbuild test bot
	USB: Add new USB LPM helpers
	USB: Consolidate LPM checks to avoid enabling LPM twice
	slip: make slhc_free() silently accept an error pointer
	intel_th: gth: Fix an off-by-one in output unassigning
	fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
	workqueue: Try to catch flush_work() without INIT_WORK().
	binder: fix handling of misaligned binder object
	sched/deadline: Correctly handle active 0-lag timers
	NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
	netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
	fm10k: Fix a potential NULL pointer dereference
	tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
	tipc: check link name with right length in tipc_nl_compat_link_set
	net: netrom: Fix error cleanup path of nr_proto_init
	net/rds: Check address length before reading address family
	rxrpc: fix race condition in rxrpc_input_packet()
	aio: clear IOCB_HIPRI
	aio: use assigned completion handler
	aio: separate out ring reservation from req allocation
	aio: don't zero entire aio_kiocb aio_get_req()
	aio: use iocb_put() instead of open coding it
	aio: split out iocb copy from io_submit_one()
	aio: abstract out io_event filler helper
	aio: initialize kiocb private in case any filesystems expect it.
	aio: simplify - and fix - fget/fput for io_submit()
	pin iocb through aio.
	aio: fold lookup_kiocb() into its sole caller
	aio: keep io_event in aio_kiocb
	aio: store event at final iocb_put()
	Fix aio_poll() races
	x86, retpolines: Raise limit for generating indirect calls from switch-case
	x86/retpolines: Disable switch jump tables when retpolines are enabled
	mm: Fix warning in insert_pfn()
	x86/fpu: Don't export __kernel_fpu_{begin,end}()
	ipv4: add sanity checks in ipv4_link_failure()
	ipv4: set the tcp_min_rtt_wlen range from 0 to one day
	mlxsw: spectrum: Fix autoneg status in ethtool
	net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query
	net: rds: exchange of 8K and 1M pool
	net/rose: fix unbound loop in rose_loopback_timer()
	net: stmmac: move stmmac_check_ether_addr() to driver probe
	net/tls: fix refcount adjustment in fallback
	stmmac: pci: Adjust IOT2000 matching
	team: fix possible recursive locking when add slaves
	net: hns: Fix WARNING when hns modules installed
	mlxsw: pci: Reincrease PCI reset timeout
	mlxsw: spectrum: Put MC TCs into DWRR mode
	net/mlx5e: Fix the max MTU check in case of XDP
	net/mlx5e: Fix use-after-free after xdp_return_frame
	net/tls: avoid potential deadlock in tls_set_device_offload_rx()
	net/tls: don't leak IV and record seq when offload fails
	powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg
	Linux 4.19.38

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-02 10:13:34 +02:00
Jan Kara
423497a96d mm: Fix warning in insert_pfn()
commit f2c57d91b0d96aa13ccff4e3b178038f17b00658 upstream.

In DAX mode a write pagefault can race with write(2) in the following
way:

CPU0                            CPU1
                                write fault for mapped zero page (hole)
dax_iomap_rw()
  iomap_apply()
    xfs_file_iomap_begin()
      - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - invalidates radix tree entries in given range
                                dax_iomap_pte_fault()
                                  grab_mapping_entry()
                                    - no entry found, creates empty
                                  ...
                                  xfs_file_iomap_begin()
                                    - finds already allocated block
                                  ...
                                  vmf_insert_mixed_mkwrite()
                                    - WARNs and does nothing because there
                                      is still zero page mapped in PTE
        unmap_mapping_pages()

This race results in WARN_ON from insert_pfn() and is occasionally
triggered by fstest generic/344. Note that the race is otherwise
harmless as before write(2) on CPU0 is finished, we will invalidate page
tables properly and thus user of mmap will see modified data from
write(2) from that point on. So just restrict the warning only to the
case when the PFN in PTE is not zero page.

Link: http://lkml.kernel.org/r/20180824154542.26872-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02 09:58:59 +02:00
Greg Kroah-Hartman
9bf5904866 This is the 4.19.37 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzEBokACgkQONu9yGCS
 aT7G7w/8C93URGM67H7ynkCHTo8y3hkRE2rUJPckJNdS+IJKuecmOphak4tF0h07
 qPWDPya70Q1S0cNu661TuVAGrhmE5jBx8/xfZaAOeaaU0xtZive+TfSHdAQQaHct
 tDk32O85N1aZ49rDEz9ibr7CGLVFDZtyhxV5gFMYQpjbqA7MzJC61zQg1jHyPSCz
 sKjQzW+uXMuSLru8jXHMvp41K5sFFp5gYdQbAVKlWtt79qPxWdxZPJbLbM0LBbtz
 XHt9E45Ink3ALF9P6tZ4e6gi4zzlNbh9yR92+X5NK5/8AP57yWba4W9JHWIfMBpC
 yyDYTOEAzdxqa2Jrgwr4WTdKH6U7FbQZFmWfTBB4VotbHLBWkVXj0OnF10qxP9eQ
 p5wGDTJAlWezhX1BTCfYroglDsvqhj+gHfwHzDRF1Del1dRgydRMQc0qLD1d9tul
 ovzwOkx1xyJrM2wq05I5gc0FoVyOL6/KCwqMrpVfKa3WKY7Uttjgf56bMqdIIkns
 i/6opzF+wtvwlLlCoXgYPXdm6kbWdgvS+skVHfWcHmZFMuGrFGGzJNwzXb7qnVjK
 T0hD1OestsfTyD/amnDNYkNeCkoOZqtHAi+xYOQR4kGY5cxP1lQJf85MgAy6RZSY
 h+rjys76Qf6+hTCtrowLr8SgksX4ACWxm+UarfAiiNnnDXwGfu8=
 =SrFV
 -----END PGP SIGNATURE-----

Merge 4.19.37 into android-4.19

Changes in 4.19.37
	bonding: fix event handling for stacked bonds
	failover: allow name change on IFF_UP slave interfaces
	net: atm: Fix potential Spectre v1 vulnerabilities
	net: bridge: fix per-port af_packet sockets
	net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
	net: Fix missing meta data in skb with vlan packet
	net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
	tcp: tcp_grow_window() needs to respect tcp_space()
	team: set slave to promisc if team is already in promisc mode
	tipc: missing entries in name table of publications
	vhost: reject zero size iova range
	ipv4: recompile ip options in ipv4_link_failure
	ipv4: ensure rcu_read_lock() in ipv4_link_failure()
	net: thunderx: raise XDP MTU to 1508
	net: thunderx: don't allow jumbo frames with XDP
	net/mlx5: FPGA, tls, hold rcu read lock a bit longer
	net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded()
	net/mlx5: FPGA, tls, idr remove on flow delete
	route: Avoid crash from dereferencing NULL rt->from
	sch_cake: Use tc_skb_protocol() helper for getting packet protocol
	sch_cake: Make sure we can write the IP header before changing DSCP bits
	nfp: flower: replace CFI with vlan present
	nfp: flower: remove vlan CFI bit from push vlan action
	sch_cake: Simplify logic in cake_select_tin()
	net: IP defrag: encapsulate rbtree defrag code into callable functions
	net: IP6 defrag: use rbtrees for IPv6 defrag
	net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
	CIFS: keep FileInfo handle live during oplock break
	cifs: Fix use-after-free in SMB2_write
	cifs: Fix use-after-free in SMB2_read
	cifs: fix handle leak in smb2_query_symlink()
	KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
	KVM: x86: svm: make sure NMI is injected after nmi_singlestep
	Staging: iio: meter: fixed typo
	staging: iio: ad7192: Fix ad7193 channel address
	iio: gyro: mpu3050: fix chip ID reading
	iio/gyro/bmg160: Use millidegrees for temperature scale
	iio:chemical:bme680: Fix, report temperature in millidegrees
	iio:chemical:bme680: Fix SPI read interface
	iio: cros_ec: Fix the maths for gyro scale calculation
	iio: ad_sigma_delta: select channel when reading register
	iio: dac: mcp4725: add missing powerdown bits in store eeprom
	iio: Fix scan mask selection
	iio: adc: at91: disable adc channel interrupt in timeout case
	iio: core: fix a possible circular locking dependency
	io: accel: kxcjk1013: restore the range after resume.
	staging: most: core: use device description as name
	staging: comedi: vmk80xx: Fix use of uninitialized semaphore
	staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
	staging: comedi: ni_usb6501: Fix use of uninitialized mutex
	staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
	ALSA: hda/realtek - add two more pin configuration sets to quirk table
	ALSA: core: Fix card races between register and disconnect
	Input: elan_i2c - add hardware ID for multiple Lenovo laptops
	serial: sh-sci: Fix HSCIF RX sampling point adjustment
	serial: sh-sci: Fix HSCIF RX sampling point calculation
	vt: fix cursor when clearing the screen
	scsi: core: set result when the command cannot be dispatched
	Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO"
	Revert "svm: Fix AVIC incomplete IPI emulation"
	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
	ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
	crypto: x86/poly1305 - fix overflow during partial reduction
	drm/ttm: fix out-of-bounds read in ttm_put_pages() v2
	arm64: futex: Restore oldval initialization to work around buggy compilers
	x86/kprobes: Verify stack frame on kretprobe
	kprobes: Mark ftrace mcount handler functions nokprobe
	kprobes: Fix error check when reusing optimized probes
	rt2x00: do not increment sequence number while re-transmitting
	mac80211: do not call driver wake_tx_queue op during reconfig
	drm/amdgpu/gmc9: fix VM_L2_CNTL3 programming
	perf/x86/amd: Add event map for AMD Family 17h
	x86/cpu/bugs: Use __initconst for 'const' init data
	perf/x86: Fix incorrect PEBS_REGS
	x86/speculation: Prevent deadlock on ssb_state::lock
	timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze()
	nfit/ars: Remove ars_start_flags
	nfit/ars: Introduce scrub_flags
	nfit/ars: Allow root to busy-poll the ARS state machine
	nfit/ars: Avoid stale ARS results
	mmc: sdhci: Fix data command CRC error handling
	mmc: sdhci: Rename SDHCI_ACMD12_ERR and SDHCI_INT_ACMD12ERR
	mmc: sdhci: Handle auto-command errors
	modpost: file2alias: go back to simple devtable lookup
	modpost: file2alias: check prototype of handler
	tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
	tpm: Fix the type of the return value in calc_tpm2_event_size()
	Revert "kbuild: use -Oz instead of -Os when using clang"
	sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
	device_cgroup: fix RCU imbalance in error case
	mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
	ALSA: info: Fix racy addition/deletion of nodes
	percpu: stop printing kernel addresses
	tools include: Adopt linux/bits.h
	ASoC: rockchip: add missing INTERLEAVED PCM attribute
	i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array
	Revert "locking/lockdep: Add debug_locks check in __lock_downgrade()"
	kernel/sysctl.c: fix out-of-bounds access when setting file-max
	Linux 4.19.37

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-30 12:53:00 +02:00
Matteo Croce
6580376fe8 percpu: stop printing kernel addresses
commit 00206a69ee32f03e6f40837684dcbe475ea02266 upstream.

Since commit ad67b74d24 ("printk: hash addresses printed with %p"),
at boot "____ptrval____" is printed instead of actual addresses:

    percpu: Embedded 38 pages/cpu @(____ptrval____) s124376 r0 d31272 u524288

Instead of changing the print to "%px", and leaking kernel addresses,
just remove the print completely, cfr. e.g. commit 071929dbdd
("arm64: Stop printing the virtual memory layout").

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:36:40 +02:00
Konstantin Khlebnikov
1343fd8f96 mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
commit e8277b3b52240ec1caad8e6df278863e4bf42eac upstream.

Commit 58bc4c34d2 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
depends on skipping vmstat entries with empty name introduced in
7aaf772723 ("mm: don't show nr_indirectly_reclaimable in
/proc/vmstat") but reverted in b29940c1abd7 ("mm: rename and change
semantics of nr_indirectly_reclaimable_bytes").

So skipping no longer works and /proc/vmstat has misformatted lines " 0".

This patch simply shows debug counters "nr_tlb_remote_*" for UP.

Link: http://lkml.kernel.org/r/155481488468.467.4295519102880913454.stgit@buzz
Fixes: 58bc4c34d2 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Roman Gushchin <guro@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:36:40 +02:00
Andrea Arcangeli
6ff17bc593 coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:36:37 +02:00
Greg Kroah-Hartman
10f41ccfc7 This is the 4.19.36 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAly6xzUACgkQONu9yGCS
 aT5sIA//b7nAk2zuhmbkonsBfzFq5uBJmqXcCrOgy3XHMs4fE+Q11kLd1wMAV7dx
 U7FNHe4PIJ8Rczxgqr2VP3VmFbV6UuTK+UTclJKfbV3ouIAQiQBuutABBmbDUj2p
 FInc/yAYyhVc9n7gX78czTiUxKnKi4+sisUYDCZPr3hr6jDPcLvm/WVWdyrcXJje
 rYFNmE/2MBH1NofG+MOpq+ILhKHXlf2APN2/spl+I42a8bwodiSl9g+dhuWr7wgT
 Ln2Ocf7BZ6BPCQKoveZdD1Gd56NNR/lJh4ulqpuhaZw4Yp+B/C7GmrBtdPzVSGka
 IwPWoSc9/9VSUl+ooSZHms78VLbqq0rNNclskL2bN6m962u04Eu7sB2Tg/bwUs52
 Wkcw0DY4J/oMJtj/CMHcQOUPsk6vwHxqnjsj+LYJ1ZjHO68tUshnENxXrbAoDc45
 2fuY3TCA+XqFvqNt5HbkLPtFR78u8QmZ1lP/Pkri6xoG/GA6O0EAxhS0Z9hncGK7
 8wJNuxLMd2UX94wlajQ+DF7yyCU4HOFEdeSEOwlHHBid/fckXsGzL2tKJUAbbUPP
 ux3An8kJHni8nQrmUkyy1Nx29ROyAFxBLOQshWGpXgJrV3qRMYLyB2Icv0WYCGFk
 zZCTupPgvb46u81VzqxrLH4RZdy4Ar4uB3BQGPKs596rlYmvnSo=
 =CArs
 -----END PGP SIGNATURE-----

Merge 4.19.36 into android-4.19

Changes in 4.19.36
	ARC: u-boot args: check that magic number is correct
	arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM
	inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch()
	perf/core: Restore mmap record type correctly
	ext4: avoid panic during forced reboot
	ext4: add missing brelse() in add_new_gdb_meta_bg()
	ext4: report real fs size after failed resize
	ALSA: echoaudio: add a check for ioremap_nocache
	ALSA: sb8: add a check for request_region
	auxdisplay: hd44780: Fix memory leak on ->remove()
	drm/udl: use drm_gem_object_put_unlocked.
	IB/mlx4: Fix race condition between catas error reset and aliasguid flows
	i40iw: Avoid panic when handling the inetdev event
	mmc: davinci: remove extraneous __init annotation
	ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
	thermal/intel_powerclamp: fix __percpu declaration of worker_data
	thermal: samsung: Fix incorrect check after code merge
	thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs
	thermal/int340x_thermal: Add additional UUIDs
	thermal/int340x_thermal: fix mode setting
	thermal/intel_powerclamp: fix truncated kthread name
	scsi: iscsi: flush running unbind operations when removing a session
	sched/cpufreq: Fix 32-bit math overflow
	sched/core: Fix buffer overflow in cgroup2 property cpu.max
	x86/mm: Don't leak kernel addresses
	tools/power turbostat: return the exit status of a command
	perf list: Don't forget to drop the reference to the allocated thread_map
	perf config: Fix an error in the config template documentation
	perf config: Fix a memory leak in collect_config()
	perf build-id: Fix memory leak in print_sdt_events()
	perf top: Fix error handling in cmd_top()
	perf hist: Add missing map__put() in error case
	perf evsel: Free evsel->counts in perf_evsel__exit()
	perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
	perf tests: Fix memory leak by expr__find_other() in test__expr()
	perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
	ACPI / utils: Drop reference in test for device presence
	PM / Domains: Avoid a potential deadlock
	blk-iolatency: #include "blk.h"
	drm/exynos/mixer: fix MIXER shadow registry synchronisation code
	irqchip/stm32: Don't clear rising/falling config registers at init
	irqchip/mbigen: Don't clear eventid when freeing an MSI
	x86/hpet: Prevent potential NULL pointer dereference
	x86/hyperv: Prevent potential NULL pointer dereference
	x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
	drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
	iommu/vt-d: Check capability before disabling protected memory
	x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
	fix incorrect error code mapping for OBJECTID_NOT_FOUND
	x86/gart: Exclude GART aperture from kcore
	ext4: prohibit fstrim in norecovery mode
	drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up
	gpio: pxa: handle corner case of unprobed device
	rsi: improve kernel thread handling to fix kernel panic
	f2fs: fix to avoid NULL pointer dereference on se->discard_map
	9p: do not trust pdu content for stat item size
	9p locks: add mount option for lock retry interval
	ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx()
	f2fs: fix to do sanity check with current segment number
	netfilter: xt_cgroup: shrink size of v2 path
	serial: uartps: console_setup() can't be placed to init section
	powerpc/pseries: Remove prrn_work workqueue
	media: au0828: cannot kfree dev before usb disconnect
	Bluetooth: Fix debugfs NULL pointer dereference
	HID: i2c-hid: override HID descriptors for certain devices
	pinctrl: core: make sure strcmp() doesn't get a null parameter
	ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
	usbip: fix vhci_hcd controller counting
	ACPI / SBS: Fix GPE storm on recent MacBookPro's
	HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2
	KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail
	compiler.h: update definition of unreachable()
	netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine
	f2fs: cleanup dirty pages if recover failed
	net: stmmac: Set OWN bit for jumbo frames
	cifs: fallback to older infolevels on findfirst queryinfo retry
	kernel: hung_task.c: disable on suspend
	platform/x86: Add Intel AtomISP2 dummy / power-management driver
	drm/ttm: Fix bo_global and mem_global kfree error
	ALSA: hda: fix front speakers on Huawei MBXP
	ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle
	net/rds: fix warn in rds_message_alloc_sgs
	xfrm: destroy xfrm_state synchronously on net exit path
	crypto: sha256/arm - fix crash bug in Thumb2 build
	crypto: sha512/arm - fix crash bug in Thumb2 build
	net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version
	iommu/dmar: Fix buffer overflow during PCI bus notification
	scsi: core: Avoid that system resume triggers a kernel warning
	soc/tegra: pmc: Drop locking from tegra_powergate_is_powered()
	lkdtm: Print real addresses
	lkdtm: Add tests for NULL pointer dereference
	drm/panel: panel-innolux: set display off in innolux_panel_unprepare
	crypto: axis - fix for recursive locking from bottom half
	Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk"
	coresight: cpu-debug: Support for CA73 CPUs
	PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports
	drm/nouveau/volt/gf117: fix speedo readout register
	ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
	drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI)
	appletalk: Fix use-after-free in atalk_proc_exit
	lib/div64.c: off by one in shift
	rxrpc: Fix client call connect/disconnect race
	f2fs: fix to dirty inode for i_mode recovery
	include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
	bpf: fix use after free in bpf_evict_inode
	IB/hfi1: Failed to drain send queue when QP is put into error state
	mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo
	mm: hide incomplete nr_indirectly_reclaimable in sysfs
	appletalk: Fix compile regression
	Linux 4.19.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-20 15:53:36 +02:00
Roman Gushchin
d49dea545a mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo
[fixed differently upstream, this is a work-around to resolve it for 4.19.y]

Yongqin reported that /proc/zoneinfo format is broken in 4.14
due to commit 7aaf772723 ("mm: don't show nr_indirectly_reclaimable
in /proc/vmstat")

Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 403
      nr_active_anon 89123
      nr_inactive_file 128887
      nr_active_file 47377
      nr_unevictable 2053
      nr_slab_reclaimable 7510
      nr_slab_unreclaimable 10775
      nr_isolated_anon 0
      nr_isolated_file 0
      <...>
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 0
      nr_dirtied   6022
      nr_written   5985
                   74240
      ^^^^^^^^^^
  pages free     131656

The problem is caused by the nr_indirectly_reclaimable counter,
which is hidden from the /proc/vmstat, but not from the
/proc/zoneinfo. Let's fix this inconsistency and hide the
counter from /proc/zoneinfo exactly as from /proc/vmstat.

BTW, in 4.19+ the counter has been renamed and exported by
the commit b29940c1abd7 ("mm: rename and change semantics of
nr_indirectly_reclaimable_bytes"), so there is no such a problem
anymore.

Cc: <stable@vger.kernel.org> # 4.14.x-4.18.x
Fixes: 7aaf772723 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat")
Reported-by: Yongqin Liu <yongqin.liu@linaro.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-20 09:16:05 +02:00
Greg Kroah-Hartman
3e166630b0 This is the 4.19.35 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAly2yf8ACgkQONu9yGCS
 aT5OkA/+Jf3UWnM8MdPoxVUROYg6WRSMThvuKkcspkvXxXqu7Z9a/+meqdIZ0NDr
 j+RdhCfZf7RRNpTpYB9jMqXXuRFhVWniAVM3/me2LxFvwyKkNqURyIz9azh7T0VW
 nDa2NDQ9C0IyLFLW4cyzZncrmWntgW5XKZfVot1TMNpryHB4FPsKP5rInDQpSNIu
 bbJBWuZziUqdEnk3TWuqdQr4ZTXgnjYSBaTbOQBU3F6E88TRmFIatyYovLcJCQh5
 jBmXV3U6mGPZe64I+JXj8dTp32WD74afMX1YSZmjdLL6KgbD9a4k47UzpXzEbKT8
 uMG057zPiYcwOyMCFZOWsCYIH7Yr4oMIAzOEyYcDI1uCuKd4aESdpgiWfUWambR8
 BJO5oGELzLArMVWusWK7xBfm+Et2ePOFWC7V9MRVVQJLdAWkEafYuLoPqrhhqxQX
 Iu3ThoY3UmEoNqi1345gmxQBJfbOqdK7CwQGzKOtiPNs0nyMfg62B9Rcr9O+FE5A
 +womRQF2Tik1cZhXxkJVSD4f+orL+xuOu59ufvMTBe4co9tPVFm9Qp6SeWoOVuwZ
 l8SS7DD5wb7vzDUZOO4KmA7D6p3YQWdBvGeN6jKqjASIFS9t5Sb+fNiKznTo/WCT
 1I68sLm7/rs95+WazKu66ewW8bh91VAo+Gmpfo5zEJjcfajnDxY=
 =F1Te
 -----END PGP SIGNATURE-----

Merge 4.19.35 into android-4.19

Changes in 4.19.35
	kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLT
	drm/i915/gvt: do not let pin count of shadow mm go negative
	powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
	hv_netvsc: Fix unwanted wakeup after tx_disable
	ibmvnic: Fix completion structure initialization
	ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
	ipv6: Fix dangling pointer when ipv6 fragment
	ipv6: sit: reset ip header pointer in ipip6_rcv
	kcm: switch order of device registration to fix a crash
	net: ethtool: not call vzalloc for zero sized memory request
	net-gro: Fix GRO flush when receiving a GSO packet.
	net/mlx5: Decrease default mr cache size
	netns: provide pure entropy for net_hash_mix()
	net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
	net/sched: act_sample: fix divide by zero in the traffic path
	net/sched: fix ->get helper of the matchall cls
	openvswitch: fix flow actions reallocation
	qmi_wwan: add Olicard 600
	r8169: disable ASPM again
	sctp: initialize _pad of sockaddr_in before copying to user memory
	tcp: Ensure DCTCP reacts to losses
	tcp: fix a potential NULL pointer dereference in tcp_sk_exit
	vrf: check accept_source_route on the original netdevice
	net/mlx5e: Fix error handling when refreshing TIRs
	net/mlx5e: Add a lock on tir list
	nfp: validate the return code from dev_queue_xmit()
	nfp: disable netpoll on representors
	bnxt_en: Improve RX consumer index validity check.
	bnxt_en: Reset device on RX buffer errors.
	net: ip_gre: fix possible use-after-free in erspan_rcv
	net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
	net: core: netif_receive_skb_list: unlist skb before passing to pt->func
	r8169: disable default rx interrupt coalescing on RTL8168
	net: mlx5: Add a missing check on idr_find, free buf
	net/mlx5e: Update xoff formula
	net/mlx5e: Update xon formula
	kbuild: deb-pkg: fix bindeb-pkg breakage when O= is used
	kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
	x86/vdso: Drop implicit common-page-size linker flag
	lib/string.c: implement a basic bcmp
	Revert "clk: meson: clean-up clock registration"
	netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr
	netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too
	arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region
	tty: mark Siemens R3964 line discipline as BROKEN
	tty: ldisc: add sysctl to prevent autoloading of ldiscs
	hwmon: (w83773g) Select REGMAP_I2C to fix build error
	ACPICA: Clear status of GPEs before enabling them
	ACPICA: Namespace: remove address node from global list after method termination
	ALSA: seq: Fix OOB-reads from strlcpy
	ALSA: hda/realtek: Enable headset MIC of Acer TravelMate B114-21 with ALC233
	ALSA: hda/realtek - Add quirk for Tuxedo XC 1509
	ALSA: hda - Add two more machines to the power_save_blacklist
	mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
	arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
	parisc: Detect QEMU earlier in boot process
	parisc: regs_return_value() should return gpr28
	parisc: also set iaoq_b in instruction_pointer_set()
	alarmtimer: Return correct remaining time
	drm/i915/gvt: do not deliver a workload if its creation fails
	drm/udl: add a release method and delay modeset teardown
	kvm: svm: fix potential get_num_contig_pages overflow
	include/linux/bitrev.h: fix constant bitrev
	mm: writeback: use exact memcg dirty counts
	ASoC: intel: Fix crash at suspend/resume after failed codec registration
	ASoC: fsl_esai: fix channel swap issue when stream starts
	Btrfs: do not allow trimming when a fs is mounted with the nologreplay option
	btrfs: prop: fix zstd compression parameter validation
	btrfs: prop: fix vanished compression property after failed set
	riscv: Fix syscall_get_arguments() and syscall_set_arguments()
	block: do not leak memory in bio_copy_user_iov()
	block: fix the return errno for direct IO
	genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
	genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
	virtio: Honour 'may_reduce_num' in vring_create_virtqueue
	ARM: dts: rockchip: fix rk3288 cpu opp node reference
	ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
	ARM: dts: am335x-evm: Correct the regulators for the audio codec
	ARM: dts: at91: Fix typo in ISC_D0 on PC9
	arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
	arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
	arm64: backtrace: Don't bother trying to unwind the userspace stack
	xen: Prevent buffer overflow in privcmd ioctl
	sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
	xtensa: fix return_address
	x86/asm: Remove dead __GNUC__ conditionals
	x86/asm: Use stricter assembly constraints in bitops
	x86/perf/amd: Resolve race condition when disabling PMC
	x86/perf/amd: Resolve NMI latency issues for active PMCs
	x86/perf/amd: Remove need to check "running" bit in NMI handler
	PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
	PCI: pciehp: Ignore Link State Changes after powering off a slot
	dm integrity: change memcmp to strncmp in dm_integrity_ctr
	dm: revert 8f50e35815 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
	dm table: propagate BDI_CAP_STABLE_WRITES to fix sporadic checksum errors
	dm integrity: fix deadlock with overlapping I/O
	arm64: dts: rockchip: fix vcc_host1_5v pin assign on rk3328-rock64
	arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64
	ACPICA: AML interpreter: add region addresses in global list during initialization
	KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)
	KVM: x86: nVMX: fix x2APIC VTPR read intercept
	Linux 4.19.35

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-17 11:46:16 +02:00
Greg Thelen
43f47331a4 mm: writeback: use exact memcg dirty counts
commit 0b3d6e6f2dd0a7b697b1aa8c167265908940624b upstream.

Since commit a983b5ebee ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed
as:

 1) per-memcg per-cpu values in range of [-32..32]

 2) per-memcg atomic counter

When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic.  Stat readers only check the atomic.  Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per
cpu.

Assuming 100 cpus:
   4k x86 page_size:  13 MiB error per memcg
  64k ppc page_size: 200 MiB error per memcg

Considering that dirty+writeback are used together for some decisions the
errors double.

This inaccuracy can lead to undeserved oom kills.  One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e.  per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages.  If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom
kill.

It could be argued that tiny containers are not supported, but it's more
subtle.  It's the amount the space available for file lru that matters.
If a container has memory.max-200MiB of non reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.

The following test reliably ooms without this patch.  This patch avoids
oom kills.

  $ cat test
  mount -t cgroup2 none /dev/cgroup
  cd /dev/cgroup
  echo +io +memory > cgroup.subtree_control
  mkdir test
  cd test
  echo 10M > memory.max
  (echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
  (echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)

  $ cat memcg-writeback-stress.c
  /*
   * Dirty pages from all but one cpu.
   * Clean pages from the non dirtying cpu.
   * This is to stress per cpu counter imbalance.
   * On a 100 cpu machine:
   * - per memcg per cpu dirty count is 32 pages for each of 99 cpus
   * - per memcg atomic is -99*32 pages
   * - thus the complete dirty limit: sum of all counters 0
   * - balance_dirty_pages() only sees atomic count -99*32 pages, which
   *   it max()s to 0.
   * - So a workload can dirty -99*32 pages before balance_dirty_pages()
   *   cares.
   */
  #define _GNU_SOURCE
  #include <err.h>
  #include <fcntl.h>
  #include <sched.h>
  #include <stdlib.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/sysinfo.h>
  #include <sys/types.h>
  #include <unistd.h>

  static char *buf;
  static int bufSize;

  static void set_affinity(int cpu)
  {
  	cpu_set_t affinity;

  	CPU_ZERO(&affinity);
  	CPU_SET(cpu, &affinity);
  	if (sched_setaffinity(0, sizeof(affinity), &affinity))
  		err(1, "sched_setaffinity");
  }

  static void dirty_on(int output_fd, int cpu)
  {
  	int i, wrote;

  	set_affinity(cpu);
  	for (i = 0; i < 32; i++) {
  		for (wrote = 0; wrote < bufSize; ) {
  			int ret = write(output_fd, buf+wrote, bufSize-wrote);
  			if (ret == -1)
  				err(1, "write");
  			wrote += ret;
  		}
  	}
  }

  int main(int argc, char **argv)
  {
  	int cpu, flush_cpu = 1, output_fd;
  	const char *output;

  	if (argc != 2)
  		errx(1, "usage: output_file");

  	output = argv[1];
  	bufSize = getpagesize();
  	buf = malloc(getpagesize());
  	if (buf == NULL)
  		errx(1, "malloc failed");

  	output_fd = open(output, O_CREAT|O_RDWR);
  	if (output_fd == -1)
  		err(1, "open(%s)", output);

  	for (cpu = 0; cpu < get_nprocs(); cpu++) {
  		if (cpu != flush_cpu)
  			dirty_on(output_fd, cpu);
  	}

  	set_affinity(flush_cpu);
  	if (fsync(output_fd))
  		err(1, "fsync(%s)", output);
  	if (close(output_fd))
  		err(1, "close(%s)", output);
  	free(buf);
  }

Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters.  This avoids the aforementioned oom
kills.

This does not affect the overhead of memory.stat, which still reads the
single atomic counter.

Why not use percpu_counter? memcg already handles cpus going offline, so
no need for that overhead from percpu_counter.  And the percpu_counter
spinlocks are more heavyweight than is required.

It probably also makes sense to use exact dirty and writeback counters
in memcg oom reports.  But that is saved for later.

Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>	[4.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17 08:38:51 +02:00
Aneesh Kumar K.V
9a62d69114 mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
commit c6f3c5ee40c10bb65725047a220570f718507001 upstream.

With some architectures like ppc64, set_pmd_at() cannot cope with a
situation where there is already some (different) valid entry present.

Use pmdp_set_access_flags() instead to modify the pfn which is built to
deal with modifying existing PMD entries.

This is similar to commit cae85cb8add3 ("mm/memory.c: fix modifying of
page protection by insert_pfn()")

We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't
support pud pfn entries now.

Without this patch we also see the below message in kernel log "BUG:
non-zero pgtables_bytes on freeing mm:"

Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17 08:38:49 +02:00
Greg Kroah-Hartman
d885da678e This is the 4.19.34 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlynu40ACgkQONu9yGCS
 aT5X6g//Wkfm/+qSZ0GhLDQkPniiH1QkvzhOmVrrxu+KB0qsiwsEl8Srw33ZVkJK
 LT8+IPGiG9jEGu9dj+BYXTIfy9ZvfSsEL2N6GhYwDSXP0fok2rUaHbZvv1IB2g4W
 afhGdNwNAUCJ/j1UrUsi+SAFJ+xWbVxFpGstd0cqM9IbKdEV7RIukvuKckHiKOKR
 qI8FxC+G2PAr+BtnETfk5/suPDJ7B3ZicDoMhiWJGxJ6dfFTVmkSmasSoPDaMiHm
 4S3hN2lu+WTeRpRPPB17Dlk4MmIp0k+bGYBKAlaxAMCc/RZxvbT2pRYaMQbId2/L
 mNUfSnOQFGEAhlAPfb7wdbObphnyT34GhlkWfZBTrnhPO0/FomLOvU6xVdcNuakX
 Tv2JKfDzb+2ttcMZ+0T84Ru9RztoswFATSw8uFMVxW8oTS6MVWnHu96Kxfl7QO3J
 PdlIGcyqxSuWNE8OX1QVtdSruGZfwUDNs94S4nQJtkB8BViRwhGJlqaXuy4d9Wp6
 fGlI2W6qhjyosi2wBSMTjh/ytk/jq0vfs+z2XjR2gAYssvB/SOLR/AlSVguWsDnf
 WaoFBkXvCbuPvPlo0TrLpl5RW5WlOtLXHE3Vr3dKp458wLwpf/OZBGoZiknp7DrF
 PzBZs2ie5tmyqTxbAygl7WkbQPJ682pd5R4nf5CY+zvUaOMZv1g=
 =Iuup
 -----END PGP SIGNATURE-----

Merge 4.19.34 into android-4.19

Changes in 4.19.34
	arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals
	ext4: cleanup bh release code in ext4_ind_remove_space()
	tty/serial: atmel: Add is_half_duplex helper
	tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped
	CIFS: fix POSIX lock leak and invalid ptr deref
	h8300: use cc-cross-prefix instead of hardcoding h8300-unknown-linux-
	f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
	f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
	tracing: kdb: Fix ftdump to not sleep
	net/mlx5: Avoid panic when setting vport rate
	net/mlx5: Avoid panic when setting vport mac, getting vport config
	gpio: gpio-omap: fix level interrupt idling
	include/linux/relay.h: fix percpu annotation in struct rchan
	sysctl: handle overflow for file-max
	net: stmmac: Avoid sometimes uninitialized Clang warnings
	enic: fix build warning without CONFIG_CPUMASK_OFFSTACK
	libbpf: force fixdep compilation at the start of the build
	scsi: hisi_sas: Set PHY linkrate when disconnected
	scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
	iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver
	x86/hyperv: Fix kernel panic when kexec on HyperV
	perf c2c: Fix c2c report for empty numa node
	mm/sparse: fix a bad comparison
	mm/cma.c: cma_declare_contiguous: correct err handling
	mm/page_ext.c: fix an imbalance with kmemleak
	mm, swap: bounds check swap_info array accesses to avoid NULL derefs
	mm,oom: don't kill global init via memory.oom.group
	memcg: killed threads should not invoke memcg OOM killer
	mm, mempolicy: fix uninit memory access
	mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
	mm/slab.c: kmemleak no scan alien caches
	ocfs2: fix a panic problem caused by o2cb_ctl
	f2fs: do not use mutex lock in atomic context
	fs/file.c: initialize init_files.resize_wait
	page_poison: play nicely with KASAN
	cifs: use correct format characters
	dm thin: add sanity checks to thin-pool and external snapshot creation
	f2fs: fix to check inline_xattr_size boundary correctly
	cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED
	cifs: Fix NULL pointer dereference of devname
	netfilter: nf_tables: check the result of dereferencing base_chain->stats
	netfilter: conntrack: tcp: only close if RST matches exact sequence
	jbd2: fix invalid descriptor block checksum
	fs: fix guard_bio_eod to check for real EOD errors
	tools lib traceevent: Fix buffer overflow in arg_eval
	PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()
	wil6210: check null pointer in _wil_cfg80211_merge_extra_ies
	mt76: fix a leaked reference by adding a missing of_node_put
	crypto: crypto4xx - add missing of_node_put after of_device_is_available
	crypto: cavium/zip - fix collision with generic cra_driver_name
	usb: chipidea: Grab the (legacy) USB PHY by phandle first
	powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
	scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
	kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
	powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
	coresight: etm4x: Add support to enable ETMv4.2
	serial: 8250_pxa: honor the port number from devicetree
	ARM: 8840/1: use a raw_spinlock_t in unwind
	iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables
	powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
	btrfs: qgroup: Make qgroup async transaction commit more aggressive
	mmc: omap: fix the maximum timeout setting
	net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat
	e1000e: Fix -Wformat-truncation warnings
	mlxsw: spectrum: Avoid -Wformat-truncation warnings
	platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN
	platform/mellanox: mlxreg-hotplug: Fix KASAN warning
	loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
	IB/mlx4: Increase the timeout for CM cache
	clk: fractional-divider: check parent rate only if flag is set
	perf annotate: Fix getting source line failure
	ASoC: qcom: Fix of-node refcount unbalance in qcom_snd_parse_of()
	cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
	efi: cper: Fix possible out-of-bounds access
	s390/ism: ignore some errors during deregistration
	scsi: megaraid_sas: return error when create DMA pool failed
	scsi: fcoe: make use of fip_mode enum complete
	drm/amd/display: Clear stream->mode_changed after commit
	perf test: Fix failure of 'evsel-tp-sched' test on s390
	mwifiex: don't advertise IBSS features without FW support
	perf report: Don't shadow inlined symbol with different addr range
	SoC: imx-sgtl5000: add missing put_device()
	media: ov7740: fix runtime pm initialization
	media: sh_veu: Correct return type for mem2mem buffer helpers
	media: s5p-jpeg: Correct return type for mem2mem buffer helpers
	media: rockchip/rga: Correct return type for mem2mem buffer helpers
	media: s5p-g2d: Correct return type for mem2mem buffer helpers
	media: mx2_emmaprp: Correct return type for mem2mem buffer helpers
	media: mtk-jpeg: Correct return type for mem2mem buffer helpers
	mt76: usb: do not run mt76u_queues_deinit twice
	xen/gntdev: Do not destroy context while dma-bufs are in use
	vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
	HID: intel-ish-hid: avoid binding wrong ishtp_cl_device
	cgroup, rstat: Don't flush subtree root unless necessary
	jbd2: fix race when writing superblock
	leds: lp55xx: fix null deref on firmware load failure
	perf report: Add s390 diagnosic sampling descriptor size
	iwlwifi: pcie: fix emergency path
	ACPI / video: Refactor and fix dmi_is_desktop()
	selftests: skip seccomp get_metadata test if not real root
	kprobes: Prohibit probing on bsearch()
	kprobes: Prohibit probing on RCU debug routine
	netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm
	ARM: 8833/1: Ensure that NEON code always compiles with Clang
	ARM: dts: meson8b: fix the Ethernet data line signals in eth_rgmii_pins
	ALSA: PCM: check if ops are defined before suspending PCM
	ath10k: fix shadow register implementation for WCN3990
	usb: f_fs: Avoid crash due to out-of-scope stack ptr access
	sched/topology: Fix percpu data types in struct sd_data & struct s_data
	bcache: fix input overflow to cache set sysfs file io_error_halflife
	bcache: fix input overflow to sequential_cutoff
	bcache: fix potential div-zero error of writeback_rate_i_term_inverse
	bcache: improve sysfs_strtoul_clamp()
	genirq: Avoid summation loops for /proc/stat
	net: marvell: mvpp2: fix stuck in-band SGMII negotiation
	iw_cxgb4: fix srqidx leak during connection abort
	net: phy: consider latched link-down status in polling mode
	fbdev: fbmem: fix memory access if logo is bigger than the screen
	cdrom: Fix race condition in cdrom_sysctl_register
	drm: rcar-du: add missing of_node_put
	drm/amd/display: Don't re-program planes for DPMS changes
	drm/amd/display: Disconnect mpcc when changing tg
	perf/aux: Make perf_event accessible to setup_aux()
	e1000e: fix cyclic resets at link up with active tx
	e1000e: Exclude device from suspend direct complete optimization
	platform/x86: intel_pmc_core: Fix PCH IP sts reading
	i2c: of: Try to find an I2C adapter matching the parent
	staging: spi: mt7621: Add return code check on device_reset()
	iwlwifi: mvm: fix RFH config command with >=10 CPUs
	ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe
	sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
	efi/memattr: Don't bail on zero VA if it equals the region's PA
	sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
	drm/vkms: Bugfix extra vblank frame
	ARM: dts: lpc32xx: Remove leading 0x and 0s from bindings notation
	efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
	soc: qcom: gsbi: Fix error handling in gsbi_probe()
	mt7601u: bump supported EEPROM version
	ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of
	ARM: avoid Cortex-A9 livelock on tight dmb loops
	block, bfq: fix in-service-queue check for queue merging
	bpf: fix missing prototype warnings
	selftests/bpf: skip verifier tests for unsupported program types
	powerpc/64s: Clear on-stack exception marker upon exception return
	cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
	backlight: pwm_bl: Use gpiod_get_value_cansleep() to get initial state
	tty: increase the default flip buffer limit to 2*640K
	powerpc/pseries: Perform full re-add of CPU for topology update post-migration
	drm/amd/display: Enable vblank interrupt during CRC capture
	ALSA: dice: add support for Solid State Logic Duende Classic/Mini
	usb: dwc3: gadget: Fix OTG events when gadget driver isn't loaded
	platform/x86: intel-hid: Missing power button release on some Dell models
	perf script python: Use PyBytes for attr in trace-event-python
	perf script python: Add trace_context extension module to sys.modules
	media: mt9m111: set initial frame size other than 0x0
	hwrng: virtio - Avoid repeated init of completion
	soc/tegra: fuse: Fix illegal free of IO base address
	HID: intel-ish: ipc: handle PIMR before ish_wakeup also clear PISR busy_clear bit
	f2fs: UBSAN: set boolean value iostat_enable correctly
	hpet: Fix missing '=' character in the __setup() code of hpet_mmap_enable
	cpu/hotplug: Mute hotplug lockdep during init
	dmaengine: imx-dma: fix warning comparison of distinct pointer types
	dmaengine: qcom_hidma: assign channel cookie correctly
	dmaengine: qcom_hidma: initialize tx flags in hidma_prep_dma_*
	netfilter: physdev: relax br_netfilter dependency
	media: rcar-vin: Allow independent VIN link enablement
	media: s5p-jpeg: Check for fmt_ver_flag when doing fmt enumeration
	regulator: act8865: Fix act8600_sudcdc_voltage_ranges setting
	pinctrl: meson: meson8b: add the eth_rxd2 and eth_rxd3 pins
	drm: Auto-set allow_fb_modifiers when given modifiers at plane init
	drm/nouveau: Stop using drm_crtc_force_disable
	x86/build: Specify elf_i386 linker emulation explicitly for i386 objects
	selinux: do not override context on context mounts
	brcmfmac: Use firmware_request_nowarn for the clm_blob
	wlcore: Fix memory leak in case wl12xx_fetch_firmware failure
	x86/build: Mark per-CPU symbols as absolute explicitly for LLD
	drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup
	clk: meson: clean-up clock registration
	clk: rockchip: fix frac settings of GPLL clock for rk3328
	dmaengine: tegra: avoid overflow of byte tracking
	Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device
	drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers
	net: stmmac: Avoid one more sometimes uninitialized Clang warning
	ACPI / video: Extend chassis-type detection with a "Lunch Box" check
	bcache: fix potential div-zero error of writeback_rate_p_term_inverse
	kprobes/x86: Blacklist non-attachable interrupt functions
	Linux 4.19.34

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-05 22:43:09 +02:00
Qian Cai
a6c56bf63e page_poison: play nicely with KASAN
[ Upstream commit 4117992df66a26fa33908b4969e04801534baab1 ]

KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING).
It triggers false positives in the allocation path:

  BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
  Read of size 8 at addr ffff88881f800000 by task swapper/0
  CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
  Call Trace:
   dump_stack+0xe0/0x19a
   print_address_description.cold.2+0x9/0x28b
   kasan_report.cold.3+0x7a/0xb5
   __asan_report_load8_noabort+0x19/0x20
   memchr_inv+0x2ea/0x330
   kernel_poison_pages+0x103/0x3d5
   get_page_from_freelist+0x15e7/0x4d90

because KASAN has not yet unpoisoned the shadow page for allocation
before it checks memchr_inv() but only found a stale poison pattern.

Also, false positives in free path,

  BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
  Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
  CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
  Call Trace:
   dump_stack+0xe0/0x19a
   print_address_description.cold.2+0x9/0x28b
   kasan_report.cold.3+0x7a/0xb5
   check_memory_region+0x22d/0x250
   memset+0x28/0x40
   kernel_poison_pages+0x29e/0x3d5
   __free_pages_ok+0x75f/0x13e0

due to KASAN adds poisoned redzones around slab objects, but the page
poisoning needs to poison the whole page.

Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:59 +02:00
Qian Cai
f09c424cea mm/slab.c: kmemleak no scan alien caches
[ Upstream commit 92d1d07daad65c300c7d0b68bbef8867e9895d54 ]

Kmemleak throws endless warnings during boot due to in
__alloc_alien_cache(),

    alc = kmalloc_node(memsize, gfp, node);
    init_arraycache(&alc->ac, entries, batch);
    kmemleak_no_scan(ac);

Kmemleak does not track the array cache (alc->ac) but the alien cache
(alc) instead, so let it track the latter by lifting kmemleak_no_scan()
out of init_arraycache().

There is another place that calls init_arraycache(), but
alloc_kmem_cache_cpus() uses the percpu allocation where will never be
considered as a leak.

  kmemleak: Found object by alias at 0xffff8007b9aa7e38
  CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
  Call trace:
   dump_backtrace+0x0/0x168
   show_stack+0x24/0x30
   dump_stack+0x88/0xb0
   lookup_object+0x84/0xac
   find_and_get_object+0x84/0xe4
   kmemleak_no_scan+0x74/0xf4
   setup_kmem_cache_node+0x2b4/0x35c
   __do_tune_cpucache+0x250/0x2d4
   do_tune_cpucache+0x4c/0xe4
   enable_cpucache+0xc8/0x110
   setup_cpu_cache+0x40/0x1b8
   __kmem_cache_create+0x240/0x358
   create_cache+0xc0/0x198
   kmem_cache_create_usercopy+0x158/0x20c
   kmem_cache_create+0x50/0x64
   fsnotify_init+0x58/0x6c
   do_one_initcall+0x194/0x388
   kernel_init_freeable+0x668/0x688
   kernel_init+0x18/0x124
   ret_from_fork+0x10/0x18
  kmemleak: Object 0xffff8007b9aa7e00 (size 256):
  kmemleak:   comm "swapper/0", pid 1, jiffies 4294697137
  kmemleak:   min_count = 1
  kmemleak:   count = 0
  kmemleak:   flags = 0x1
  kmemleak:   checksum = 0
  kmemleak:   backtrace:
       kmemleak_alloc+0x84/0xb8
       kmem_cache_alloc_node_trace+0x31c/0x3a0
       __kmalloc_node+0x58/0x78
       setup_kmem_cache_node+0x26c/0x35c
       __do_tune_cpucache+0x250/0x2d4
       do_tune_cpucache+0x4c/0xe4
       enable_cpucache+0xc8/0x110
       setup_cpu_cache+0x40/0x1b8
       __kmem_cache_create+0x240/0x358
       create_cache+0xc0/0x198
       kmem_cache_create_usercopy+0x158/0x20c
       kmem_cache_create+0x50/0x64
       fsnotify_init+0x58/0x6c
       do_one_initcall+0x194/0x388
       kernel_init_freeable+0x668/0x688
       kernel_init+0x18/0x124
  kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38
  CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
  Call trace:
   dump_backtrace+0x0/0x168
   show_stack+0x24/0x30
   dump_stack+0x88/0xb0
   kmemleak_no_scan+0x90/0xf4
   setup_kmem_cache_node+0x2b4/0x35c
   __do_tune_cpucache+0x250/0x2d4
   do_tune_cpucache+0x4c/0xe4
   enable_cpucache+0xc8/0x110
   setup_cpu_cache+0x40/0x1b8
   __kmem_cache_create+0x240/0x358
   create_cache+0xc0/0x198
   kmem_cache_create_usercopy+0x158/0x20c
   kmem_cache_create+0x50/0x64
   fsnotify_init+0x58/0x6c
   do_one_initcall+0x194/0x388
   kernel_init_freeable+0x668/0x688
   kernel_init+0x18/0x124
   ret_from_fork+0x10/0x18

Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw
Fixes: 1fe00d50a9 ("slab: factor out initialization of array cache")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Uladzislau Rezki (Sony)
8a0fc62e33 mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512!
[ Upstream commit afd07389d3f4933c7f7817a92fb5e053d59a3182 ]

One of the vmalloc stress test case triggers the kernel BUG():

  <snip>
  [60.562151] ------------[ cut here ]------------
  [60.562154] kernel BUG at mm/vmalloc.c:512!
  [60.562206] invalid opcode: 0000 [#1] PREEMPT SMP PTI
  [60.562247] CPU: 0 PID: 430 Comm: vmalloc_test/0 Not tainted 4.20.0+ #161
  [60.562293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
  [60.562351] RIP: 0010:alloc_vmap_area+0x36f/0x390
  <snip>

it can happen due to big align request resulting in overflowing of
calculated address, i.e.  it becomes 0 after ALIGN()'s fixup.

Fix it by checking if calculated address is within vstart/vend range.

Link: http://lkml.kernel.org/r/20190124115648.9433-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Vlastimil Babka
67abbb9c54 mm, mempolicy: fix uninit memory access
[ Upstream commit 2e25644e8da4ed3a27e7b8315aaae74660be72dc ]

Syzbot with KMSAN reports (excerpt):

==================================================================
BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:353 [inline]
BUG: KMSAN: uninit-value in mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
CPU: 1 PID: 17420 Comm: syz-executor4 Not tainted 4.20.0-rc7+ #15
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x173/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
  __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:295
  mpol_rebind_policy mm/mempolicy.c:353 [inline]
  mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
  update_tasks_nodemask+0x608/0xca0 kernel/cgroup/cpuset.c:1120
  update_nodemasks_hier kernel/cgroup/cpuset.c:1185 [inline]
  update_nodemask kernel/cgroup/cpuset.c:1253 [inline]
  cpuset_write_resmask+0x2a98/0x34b0 kernel/cgroup/cpuset.c:1728

...

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
  kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
  kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
  kmem_cache_alloc+0x572/0xb90 mm/slub.c:2777
  mpol_new mm/mempolicy.c:276 [inline]
  do_mbind mm/mempolicy.c:1180 [inline]
  kernel_mbind+0x8a7/0x31a0 mm/mempolicy.c:1347
  __do_sys_mbind mm/mempolicy.c:1354 [inline]

As it's difficult to report where exactly the uninit value resides in
the mempolicy object, we have to guess a bit.  mm/mempolicy.c:353
contains this part of mpol_rebind_policy():

        if (!mpol_store_user_nodemask(pol) &&
            nodes_equal(pol->w.cpuset_mems_allowed, *newmask))

"mpol_store_user_nodemask(pol)" is testing pol->flags, which I couldn't
ever see being uninitialized after leaving mpol_new().  So I'll guess
it's actually about accessing pol->w.cpuset_mems_allowed on line 354,
but still part of statement starting on line 353.

For w.cpuset_mems_allowed to be not initialized, and the nodes_equal()
reachable for a mempolicy where mpol_set_nodemask() is called in
do_mbind(), it seems the only possibility is a MPOL_PREFERRED policy
with empty set of nodes, i.e.  MPOL_LOCAL equivalent, with MPOL_F_LOCAL
flag.  Let's exclude such policies from the nodes_equal() check.  Note
the uninit access should be benign anyway, as rebinding this kind of
policy is always a no-op.  Therefore no actual need for stable
inclusion.

Link: http://lkml.kernel.org/r/a71997c3-e8ae-a787-d5ce-3db05768b27c@suse.cz
Link: http://lkml.kernel.org/r/73da3e9c-cc84-509e-17d9-0c434bb9967d@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: syzbot+b19c2dc2c990ea657a71@syzkaller.appspotmail.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Tetsuo Handa
9d785b92cf memcg: killed threads should not invoke memcg OOM killer
[ Upstream commit 7775face207922ea62a4e96b9cd45abfdc7b9840 ]

If a memory cgroup contains a single process with many threads
(including different process group sharing the mm) then it is possible
to trigger a race when the oom killer complains that there are no oom
elible tasks and complain into the log which is both annoying and
confusing because there is no actual problem.  The race looks as
follows:

P1				oom_reaper		P2
try_charge						try_charge
  mem_cgroup_out_of_memory
    mutex_lock(oom_lock)
      out_of_memory
        oom_kill_process(P1,P2)
         wake_oom_reaper
    mutex_unlock(oom_lock)
    				oom_reap_task
							  mutex_lock(oom_lock)
							    select_bad_process # no victim

The problem is more visible with many threads.

Fix this by checking for fatal_signal_pending from
mem_cgroup_out_of_memory when the oom_lock is already held.

The oom bypass is safe because we do the same early in the try_charge
path already.  The situation migh have changed in the mean time.  It
should be safe to check for fatal_signal_pending and tsk_is_oom_victim
but for a better code readability abstract the current charge bypass
condition into should_force_charge and reuse it from that path.  "

Link: http://lkml.kernel.org/r/01370f70-e1f6-ebe4-b95e-0df21a0bc15e@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Tetsuo Handa
eed3ca0a66 mm,oom: don't kill global init via memory.oom.group
[ Upstream commit d342a0b38674867ea67fde47b0e1e60ffe9f17a2 ]

Since setting global init process to some memory cgroup is technically
possible, oom_kill_memcg_member() must check it.

  Tasks in /test1 are going to be killed due to memory.oom.group set
  Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
  oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	static char buffer[10485760];
	static int pipe_fd[2] = { EOF, EOF };
	unsigned int i;
	int fd;
	char buf[64] = { };
	if (pipe(pipe_fd))
		return 1;
	if (chdir("/sys/fs/cgroup/"))
		return 1;
	fd = open("cgroup.subtree_control", O_WRONLY);
	write(fd, "+memory", 7);
	close(fd);
	mkdir("test1", 0755);
	fd = open("test1/memory.oom.group", O_WRONLY);
	write(fd, "1", 1);
	close(fd);
	fd = open("test1/cgroup.procs", O_WRONLY);
	write(fd, "1", 1);
	snprintf(buf, sizeof(buf) - 1, "%d", getpid());
	write(fd, buf, strlen(buf));
	close(fd);
	snprintf(buf, sizeof(buf) - 1, "%lu", sizeof(buffer) * 5);
	fd = open("test1/memory.max", O_WRONLY);
	write(fd, buf, strlen(buf));
	close(fd);
	for (i = 0; i < 10; i++)
		if (fork() == 0) {
			char c;
			close(pipe_fd[1]);
			read(pipe_fd[0], &c, 1);
			memset(buffer, 0, sizeof(buffer));
			sleep(3);
			_exit(0);
		}
	close(pipe_fd[0]);
	close(pipe_fd[1]);
	sleep(3);
	return 0;
}

[   37.052923][ T9185] a.out invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[   37.056169][ T9185] CPU: 4 PID: 9185 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[   37.059205][ T9185] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[   37.062954][ T9185] Call Trace:
[   37.063976][ T9185]  dump_stack+0x67/0x95
[   37.065263][ T9185]  dump_header+0x51/0x570
[   37.066619][ T9185]  ? trace_hardirqs_on+0x3f/0x110
[   37.068171][ T9185]  ? _raw_spin_unlock_irqrestore+0x3d/0x70
[   37.069967][ T9185]  oom_kill_process+0x18d/0x210
[   37.071515][ T9185]  out_of_memory+0x11b/0x380
[   37.072936][ T9185]  mem_cgroup_out_of_memory+0xb6/0xd0
[   37.074601][ T9185]  try_charge+0x790/0x820
[   37.076021][ T9185]  mem_cgroup_try_charge+0x42/0x1d0
[   37.077629][ T9185]  mem_cgroup_try_charge_delay+0x11/0x30
[   37.079370][ T9185]  do_anonymous_page+0x105/0x5e0
[   37.080939][ T9185]  __handle_mm_fault+0x9cb/0x1070
[   37.082485][ T9185]  handle_mm_fault+0x1b2/0x3a0
[   37.083819][ T9185]  ? handle_mm_fault+0x47/0x3a0
[   37.085181][ T9185]  __do_page_fault+0x255/0x4c0
[   37.086529][ T9185]  do_page_fault+0x28/0x260
[   37.087788][ T9185]  ? page_fault+0x8/0x30
[   37.088978][ T9185]  page_fault+0x1e/0x30
[   37.090142][ T9185] RIP: 0033:0x7f8b183aefe0
[   37.091433][ T9185] Code: 20 f3 44 0f 7f 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f 84 00 00 00 00 00 <66> 44 0f 7f 01 66 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41
[   37.096917][ T9185] RSP: 002b:00007fffc5d329e8 EFLAGS: 00010206
[   37.098615][ T9185] RAX: 00000000006010e0 RBX: 0000000000000008 RCX: 0000000000c30000
[   37.100905][ T9185] RDX: 00000000010010c0 RSI: 0000000000000000 RDI: 00000000006010e0
[   37.103349][ T9185] RBP: 0000000000000000 R08: 00007f8b188f4740 R09: 0000000000000000
[   37.105797][ T9185] R10: 00007fffc5d32420 R11: 00007f8b183aef40 R12: 0000000000000005
[   37.108228][ T9185] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
[   37.110840][ T9185] memory: usage 51200kB, limit 51200kB, failcnt 125
[   37.113045][ T9185] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[   37.115808][ T9185] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[   37.117660][ T9185] Memory cgroup stats for /test1: cache:0KB rss:49484KB rss_huge:30720KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:49700KB inactive_file:0KB active_file:0KB unevictable:0KB
[   37.123371][ T9185] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=9188,uid=0
[   37.128158][ T9185] Memory cgroup out of memory: Killed process 9188 (a.out) total-vm:14456kB, anon-rss:10324kB, file-rss:504kB, shmem-rss:0kB
[   37.132710][ T9185] Tasks in /test1 are going to be killed due to memory.oom.group set
[   37.132833][   T54] oom_reaper: reaped process 9188 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.135498][ T9185] Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
[   37.143434][ T9185] Memory cgroup out of memory: Killed process 9182 (a.out) total-vm:14456kB, anon-rss:76kB, file-rss:588kB, shmem-rss:0kB
[   37.144328][   T54] oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.147585][ T9185] Memory cgroup out of memory: Killed process 9183 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[   37.157222][ T9185] Memory cgroup out of memory: Killed process 9184 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:508kB, shmem-rss:0kB
[   37.157259][ T9185] Memory cgroup out of memory: Killed process 9185 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[   37.157291][ T9185] Memory cgroup out of memory: Killed process 9186 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:508kB, shmem-rss:0kB
[   37.157306][   T54] oom_reaper: reaped process 9183 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.157328][ T9185] Memory cgroup out of memory: Killed process 9187 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[   37.157452][ T9185] Memory cgroup out of memory: Killed process 9189 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[   37.158733][ T9185] Memory cgroup out of memory: Killed process 9190 (a.out) total-vm:14456kB, anon-rss:552kB, file-rss:512kB, shmem-rss:0kB
[   37.160083][   T54] oom_reaper: reaped process 9186 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.160187][   T54] oom_reaper: reaped process 9189 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.206941][   T54] oom_reaper: reaped process 9185 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.212300][ T9185] Memory cgroup out of memory: Killed process 9191 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[   37.212317][   T54] oom_reaper: reaped process 9190 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.218860][ T9185] Memory cgroup out of memory: Killed process 9192 (a.out) total-vm:14456kB, anon-rss:1080kB, file-rss:512kB, shmem-rss:0kB
[   37.227667][   T54] oom_reaper: reaped process 9192 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[   37.292323][ T9193] abrt-hook-ccpp (9193) used greatest stack depth: 10480 bytes left
[   37.351843][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[   37.354833][    T1] CPU: 7 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[   37.357876][    T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[   37.361685][    T1] Call Trace:
[   37.363239][    T1]  dump_stack+0x67/0x95
[   37.365010][    T1]  panic+0xfc/0x2b0
[   37.366853][    T1]  do_exit+0xd55/0xd60
[   37.368595][    T1]  do_group_exit+0x47/0xc0
[   37.370415][    T1]  get_signal+0x32a/0x920
[   37.372449][    T1]  ? _raw_spin_unlock_irqrestore+0x3d/0x70
[   37.374596][    T1]  do_signal+0x32/0x6e0
[   37.376430][    T1]  ? exit_to_usermode_loop+0x26/0x9b
[   37.378418][    T1]  ? prepare_exit_to_usermode+0xa8/0xd0
[   37.380571][    T1]  exit_to_usermode_loop+0x3e/0x9b
[   37.382588][    T1]  prepare_exit_to_usermode+0xa8/0xd0
[   37.384594][    T1]  ? page_fault+0x8/0x30
[   37.386453][    T1]  retint_user+0x8/0x18
[   37.388160][    T1] RIP: 0033:0x7f42c06974a8
[   37.389922][    T1] Code: Bad RIP value.
[   37.391788][    T1] RSP: 002b:00007ffc3effd388 EFLAGS: 00010213
[   37.394075][    T1] RAX: 000000000000000e RBX: 00007ffc3effd390 RCX: 0000000000000000
[   37.396963][    T1] RDX: 000000000000002a RSI: 00007ffc3effd390 RDI: 0000000000000004
[   37.399550][    T1] RBP: 00007ffc3effd680 R08: 0000000000000000 R09: 0000000000000000
[   37.402334][    T1] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
[   37.404890][    T1] R13: ffffffffffffffff R14: 0000000000000884 R15: 000056460b1ac3b0

Link: http://lkml.kernel.org/r/201902010336.x113a4EO027170@www262.sakura.ne.jp
Fixes: 3d8b38eb81 ("mm, oom: introduce memory.oom.group")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Daniel Jordan
ed3345a660 mm, swap: bounds check swap_info array accesses to avoid NULL derefs
[ Upstream commit c10d38cc8d3e43f946b6c2bf4602c86791587f30 ]

Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:

  Smatch complains that the NULL checks on "si" aren't consistent.  This
  seems like a real bug because we have not ensured that the type is
  valid and so "si" can be NULL.

Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:

     CPU0                           CPU1
     alloc_swap_info()              if (type >= nr_swapfiles)
       swap_info[type] = p              /* handle invalid entry */
       smp_wmb()                    smp_rmb()
       ++nr_swapfiles               p = swap_info[type]

Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].

Ying Huang noticed other places in swapfile.c don't order these reads
properly.  Introduce swap_type_to_swap_info to encourage correct usage.

Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).

This ordering need not be enforced in places where swap_lock is held
(e.g.  si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.

Link: http://lkml.kernel.org/r/20190131024410.29859-1-daniel.m.jordan@oracle.com
Fixes: ec8acf20af ("swap: add per-partition lock for swapfile")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Qian Cai
4c6d7dc741 mm/page_ext.c: fix an imbalance with kmemleak
[ Upstream commit 0c81585499601acd1d0e1cbf424cabfaee60628c ]

After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining.  At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().

    BUG: unable to handle kernel paging request at ffff888453d00000
    PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
    Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
    RIP: 0010:scan_block+0xb5/0x290
    Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
    RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
    RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
    RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
    RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
    R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
    R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
    FS:  00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
    Call Trace:
     scan_gray_list+0x269/0x430
     kmemleak_scan+0x5a8/0x10f0
     kmemleak_write+0x541/0x6ca
     full_proxy_write+0xf8/0x190
     __vfs_write+0xeb/0x980
     vfs_write+0x15a/0x4f0
     ksys_write+0xd2/0x1b0
     __x64_sys_write+0x73/0xb0
     do_syscall_64+0xeb/0xaaa
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f6c0dad73b8
    Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
    RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
    RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
    RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
    R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
    R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
    Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
    CR2: ffff888453d00000
    ---[ end trace ccf646c7456717c5 ]---
    Kernel panic - not syncing: Fatal exception
    Shutting down cpus with NMI
    Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
    0xffffffff80000000-0xffffffffbfffffff)
    ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Peng Fan
f555b008c5 mm/cma.c: cma_declare_contiguous: correct err handling
[ Upstream commit 0d3bd18a5efd66097ef58622b898d3139790aa9d ]

In case cma_init_reserved_mem failed, need to free the memblock
allocated by memblock_reserve or memblock_alloc_range.

Quote Catalin's comments:
  https://lkml.org/lkml/2019/2/26/482

Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
ignores the memblock_reserve() as a memblock_alloc() implementation
detail. It is, however, tolerant to memblock_free() being called on
a sub-range or just a different range from a previous memblock_alloc().
So the original patch looks fine to me. FWIW:

Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Qian Cai
7b287c47e4 mm/sparse: fix a bad comparison
[ Upstream commit d778015ac95bc036af73342c878ab19250e01fe1 ]

next_present_section_nr() could only return an unsigned number -1, so
just check it specifically where compilers will convert -1 to unsigned
if needed.

  mm/sparse.c: In function 'sparse_init_nid':
  mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
         ((section_nr >= 0) &&    \
                      ^~
  mm/sparse.c:478:2: note: in expansion of macro
  'for_each_present_section_nr'
    for_each_present_section_nr(pnum_begin, pnum) {
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
         ((section_nr >= 0) &&    \
                      ^~
  mm/sparse.c:497:2: note: in expansion of macro
  'for_each_present_section_nr'
    for_each_present_section_nr(pnum_begin, pnum) {
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/sparse.c: In function 'sparse_init':
  mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
         ((section_nr >= 0) &&    \
                      ^~
  mm/sparse.c:520:2: note: in expansion of macro
  'for_each_present_section_nr'
    for_each_present_section_nr(pnum_begin + 1, pnum_end) {
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
Fixes: c4e1be9ec1 ("mm, sparsemem: break out of loops early")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-05 22:32:58 +02:00
Greg Kroah-Hartman
0b065cd568 This is the 4.19.33 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlykNfcACgkQONu9yGCS
 aT40dRAAiCeYjEC1zH8dkAnbFlKo6IZuhKgISfVTgrWlRe9nUTYaenBXqAfGjufH
 EzXHrD1IANRAnFfWg8xt01TNBfTaiEYnYzFmJkWAHFWGKxa5fRU5Kan0MB97r8s9
 NjoSRsnFl8l2oJI88zwFa7k89Itop9ST/zvZIgnrysAr+j8yEZb7BZWaU2UrKK/q
 qfnJxjfCb/jeqAxwVh3OkasXj0gG2JkGR/uEGTw2EARuI6pvKo5OCzYz0tXTN6ZJ
 CSzM4X7dhkGSgLIUw3JOCB28riK9TYbOdPr4MFYMrnoU5VL8+n62tXoKXewobJ1C
 2+Pmg5E54r13Rr35eoGCiHsW2LGQrOyvm8S9TFB/0SmTPtzUjFlHBC62Vs/AW5ut
 HSmwVy+ILM/xTIdts5QT58Gw+5NCmHxw2oEdrgcct+6FtnR9XPOqZYyzH1tw1ZB+
 DL2PqYyT9czuo2bKanWA37M8339q2INDFskXqKRokQ9GNiqUx1E6fNBqtK5SqyDI
 CdohtAs7xSQoPPbKDiITOmt82MM8xvefKmqTIvHN5B7Ns4lT1QC74DCXEFEKtw6M
 l+p64h6Qw4DiKmqna7fsbKPjmg8pg1lVrCOwD5iUF3JXy+Wxi/OKnr2gqAVMChAq
 GSfFFf+MMZhYzJPRQKxOn4GwDBDz5niaWvQmlPcPWLJ5jdbzl+Y=
 =Yyc/
 -----END PGP SIGNATURE-----

Merge 4.19.33 into android-4.19

Changes in 4.19.33
	Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt
	Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer
	ipmi_si: Fix crash when using hard-coded device
	dccp: do not use ipv6 header for ipv4 flow
	genetlink: Fix a memory leak on error path
	gtp: change NET_UDP_TUNNEL dependency to select
	ipv6: make ip6_create_rt_rcu return ip6_null_entry instead of NULL
	mac8390: Fix mmio access size probe
	mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S
	net: aquantia: fix rx checksum offload for UDP/TCP over IPv6
	net: datagram: fix unbounded loop in __skb_try_recv_datagram()
	net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec
	net: phy: meson-gxl: fix interrupt support
	net: rose: fix a possible stack overflow
	net: stmmac: fix memory corruption with large MTUs
	net-sysfs: call dev_hold if kobject_init_and_add success
	packets: Always register packet sk in the same order
	rhashtable: Still do rehash when we get EEXIST
	sctp: get sctphdr by offset in sctp_compute_cksum
	sctp: use memdup_user instead of vmemdup_user
	tcp: do not use ipv6 header for ipv4 flow
	tipc: allow service ranges to be connect()'ed on RDM/DGRAM
	tipc: change to check tipc_own_id to return in tipc_net_stop
	tipc: fix cancellation of topology subscriptions
	tun: properly test for IFF_UP
	vrf: prevent adding upper devices
	vxlan: Don't call gro_cells_destroy() before device is unregistered
	ila: Fix rhashtable walker list corruption
	net: sched: fix cleanup NULL pointer exception in act_mirr
	thunderx: enable page recycling for non-XDP case
	thunderx: eliminate extra calls to put_page() for pages held for recycling
	tun: add a missing rcu_read_unlock() in error path
	powerpc/fsl: Add infrastructure to fixup branch predictor flush
	powerpc/fsl: Add macro to flush the branch predictor
	powerpc/fsl: Emulate SPRN_BUCSR register
	powerpc/fsl: Add nospectre_v2 command line argument
	powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)
	powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)
	powerpc/fsl: Flush branch predictor when entering KVM
	powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used
	powerpc/fsl: Update Spectre v2 reporting
	powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'
	powerpc/fsl: Fix the flush of branch predictor.
	powerpc/security: Fix spectre_v2 reporting
	Btrfs: fix incorrect file size after shrinking truncate and fsync
	btrfs: remove WARN_ON in log_dir_items
	btrfs: don't report readahead errors and don't update statistics
	btrfs: raid56: properly unmap parity page in finish_parity_scrub()
	btrfs: Avoid possible qgroup_rsv_size overflow in btrfs_calculate_inode_block_rsv_size
	Btrfs: fix assertion failure on fsync with NO_HOLES enabled
	ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time
	powerpc: bpf: Fix generation of load/store DW instructions
	vfio: ccw: only free cp on final interrupt
	NFS: fix mount/umount race in nlmclnt.
	NFSv4.1 don't free interrupted slot on open
	net: dsa: qca8k: remove leftover phy accessors
	ALSA: rawmidi: Fix potential Spectre v1 vulnerability
	ALSA: seq: oss: Fix Spectre v1 vulnerability
	ALSA: pcm: Fix possible OOB access in PCM oss plugins
	ALSA: pcm: Don't suspend stream in unrecoverable PCM state
	ALSA: hda/realtek - Add support headset mode for DELL WYSE AIO
	ALSA: hda/realtek - Add support headset mode for New DELL WYSE NB
	ALSA: hda/realtek: Enable headset MIC of Acer AIO with ALC286
	ALSA: hda/realtek: Enable headset MIC of Acer Aspire Z24-890 with ALC286
	ALSA: hda/realtek - Add support for Acer Aspire E5-523G/ES1-432 headset mic
	ALSA: hda/realtek: Enable ASUS X441MB and X705FD headset MIC with ALC256
	ALSA: hda/realtek: Enable headset mic of ASUS P5440FF with ALC256
	ALSA: hda/realtek: Enable headset MIC of ASUS X430UN and X512DK with ALC256
	ALSA: hda/realtek - Fix speakers on Acer Predator Helios 500 Ryzen laptops
	kbuild: modversions: Fix relative CRC byte order interpretation
	fs/open.c: allow opening only regular files during execve()
	ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock
	scsi: sd: Fix a race between closing an sd device and sd I/O
	scsi: sd: Quiesce warning if device does not report optimal I/O size
	scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host
	scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
	drm/rockchip: vop: reset scale mode when win is disabled
	tty: mxs-auart: fix a potential NULL pointer dereference
	tty: atmel_serial: fix a potential NULL pointer dereference
	tty: serial: qcom_geni_serial: Initialize baud in qcom_geni_console_setup
	staging: comedi: ni_mio_common: Fix divide-by-zero for DIO cmdtest
	staging: speakup_soft: Fix alternate speech with other synths
	staging: vt6655: Remove vif check from vnt_interrupt
	staging: vt6655: Fix interrupt race condition on device start up.
	staging: erofs: fix to handle error path of erofs_vmap()
	serial: max310x: Fix to avoid potential NULL pointer dereference
	serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference
	serial: sh-sci: Fix setting SCSCR_TIE while transferring data
	USB: serial: cp210x: add new device id
	USB: serial: ftdi_sio: add additional NovaTech products
	USB: serial: mos7720: fix mos_parport refcount imbalance on error path
	USB: serial: option: set driver_info for SIM5218 and compatibles
	USB: serial: option: add support for Quectel EM12
	USB: serial: option: add Olicard 600
	Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc
	fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
	drm/vgem: fix use-after-free when drm_gem_handle_create() fails
	drm/vkms: fix use-after-free when drm_gem_handle_create() fails
	drm/i915/gvt: Fix MI_FLUSH_DW parsing with correct index check
	gpio: exar: add a check for the return value of ida_simple_get fails
	gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
	phy: sun4i-usb: Support set_mode to USB_HOST for non-OTG PHYs
	usb: mtu3: fix EXTCON dependency
	USB: gadget: f_hid: fix deadlock in f_hidg_write()
	usb: common: Consider only available nodes for dr_mode
	usb: host: xhci-rcar: Add XHCI_TRUST_TX_LENGTH quirk
	xhci: Fix port resume done detection for SS ports with LPM enabled
	usb: xhci: dbc: Don't free all memory with spinlock held
	xhci: Don't let USB3 ports stuck in polling state prevent suspend
	usb: cdc-acm: fix race during wakeup blocking TX traffic
	mm: add support for kmem caches in DMA32 zone
	iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging
	mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
	mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
	perf pmu: Fix parser error for uncore event alias
	perf intel-pt: Fix TSC slip
	objtool: Query pkg-config for libelf location
	powerpc/pseries/energy: Use OF accessor functions to read ibm,drc-indexes
	powerpc/64: Fix memcmp reading past the end of src/dest
	watchdog: Respect watchdog cpumask on CPU hotplug
	cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
	x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
	KVM: Reject device ioctls from processes other than the VM's creator
	KVM: x86: update %rip after emulating IO
	KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
	staging: erofs: fix error handling when failed to read compresssed data
	staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir()
	bpf: do not restore dst_reg when cur_state is freed
	drivers: base: Helpers for adding device connection descriptions
	platform: x86: intel_cht_int33fe: Register all connections at once
	platform: x86: intel_cht_int33fe: Add connection for the DP alt mode
	platform: x86: intel_cht_int33fe: Add connections for the USB Type-C port
	usb: typec: class: Don't use port parent for getting mux handles
	platform: x86: intel_cht_int33fe: Remove the old connections for the muxes
	Linux 4.19.33

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-03 06:53:19 +02:00
Lars Persson
f70ddae24b mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
commit d2b2c6dd227ba5b8a802858748ec9a780cb75b47 upstream.

Our MIPS 1004Kc SoCs were seeing random userspace crashes with SIGILL
and SIGSEGV that could not be traced back to a userspace code bug.  They
had all the magic signs of an I/D cache coherency issue.

Now recently we noticed that the /proc/sys/vm/compact_memory interface
was quite efficient at provoking this class of userspace crashes.

Studying the code in mm/migrate.c there is a distinction made between
migrating a page that is mapped at the instant of migration and one that
is not mapped.  Our problem turned out to be the non-mapped pages.

For the non-mapped page the code performs a copy of the page content and
all relevant meta-data of the page without doing the required D-cache
maintenance.  This leaves dirty data in the D-cache of the CPU and on
the 1004K cores this data is not visible to the I-cache.  A subsequent
page-fault that triggers a mapping of the page will happily serve the
process with potentially stale code.

What about ARM then, this bug should have seen greater exposure? Well
ARM became immune to this flaw back in 2010, see commit c01778001a
("ARM: 6379/1: Assume new page cache pages have dirty D-cache").

My proposed fix moves the D-cache maintenance inside move_to_new_page to
make it common for both cases.

Link: http://lkml.kernel.org/r/20190315083502.11849-1-larper@axis.com
Fixes: 97ee052461 ("flush cache before installing new page at migraton")
Signed-off-by: Lars Persson <larper@axis.com>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03 06:26:28 +02:00
Yang Shi
5966777dd8 mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
commit a7f40cfe3b7ada57af9b62fd28430eeb4a7cfcb7 upstream.

When MPOL_MF_STRICT was specified and an existing page was already on a
node that does not follow the policy, mbind() should return -EIO.  But
commit 6f4576e368 ("mempolicy: apply page table walker on
queue_pages_range()") broke the rule.

And commit c863379849 ("mm: mempolicy: mbind and migrate_pages support
thp migration") didn't return the correct value for THP mbind() too.

If MPOL_MF_STRICT is set, ignore vma_migratable() to make sure it
reaches queue_pages_to_pte_range() or queue_pages_pmd() to check if an
existing page was already on a node that does not follow the policy.
And, non-migratable vma may be used, return -EIO too if MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL was specified.

Tested with https://github.com/metan-ucw/ltp/blob/master/testcases/kernel/syscalls/mbind/mbind02.c

[akpm@linux-foundation.org: tweak code comment]
Link: http://lkml.kernel.org/r/1553020556-38583-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: 6f4576e368 ("mempolicy: apply page table walker on queue_pages_range()")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03 06:26:28 +02:00
Nicolas Boichat
62d342d670 mm: add support for kmem caches in DMA32 zone
commit 6d6ea1e967a246f12cfe2f5fb743b70b2e608d4a upstream.

Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
v6.

This is a followup to the discussion in [1], [2].

IOMMUs using ARMv7 short-descriptor format require page tables (level 1
and 2) to be allocated within the first 4GB of RAM, even on 64-bit
systems.

For L1 tables that are bigger than a page, we can just use
__get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
use GFP_DMA).

For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
 1. This series, adding support for GFP_DMA32 slab caches.
 2. genalloc, which requires pre-allocating the maximum number of L2 page
    tables (4096, so 4MB of memory).
 3. page_frag, which is not very memory-efficient as it is unable to reuse
    freed fragments until the whole page is freed. [3]

This series is the most memory-efficient approach.

stable@ note:
  We confirmed that this is a regression, and IOMMU errors happen on 4.19
  and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
  most likely starts from commit ad67f5a654 ("arm64: replace ZONE_DMA
  with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
  platforms (and maybe others?).

[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
[3] https://patchwork.codeaurora.org/patch/671639/

This patch (of 3):

IOMMUs using ARMv7 short-descriptor format require page tables to be
allocated within the first 4GB of RAM, even on 64-bit systems.  On arm64,
this is done by passing GFP_DMA32 flag to memory allocation functions.

For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
a full page using get_free_pages, so we considered 3 approaches:
 1. This patch, adding support for GFP_DMA32 slab caches.
 2. genalloc, which requires pre-allocating the maximum number of L2
    page tables (4096, so 4MB of memory).
 3. page_frag, which is not very memory-efficient as it is unable
    to reuse freed fragments until the whole page is freed.

This change makes it possible to create a custom cache in DMA32 zone using
kmem_cache_create, then allocate memory using kmem_cache_alloc.

We do not create a DMA32 kmalloc cache array, as there are currently no
users of kmalloc(..., GFP_DMA32).  These calls will continue to trigger a
warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.

This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
unnecessary).

Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.org
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Sasha Levin <Alexander.Levin@microsoft.com>
Cc: Huaisheng Ye <yehs1@lenovo.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Yong Wu <yong.wu@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Tomasz Figa <tfiga@google.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-03 06:26:28 +02:00
Linus Torvalds
1d1d2f0762 UPSTREAM: filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behavior
I thought Josef Bacik's patch to drop the mmap_sem was buggy, because
when looking at the error cases, there was one case where we returned
VM_FAULT_RETRY without actually dropping the mmap_sem.

Josef had to explain to me (using small words) that yes, that's actually
what we're supposed to do, and his patch was correct.  Which not only
convinced me he knew what he was doing and I should stop arguing with
him, but also that I should add a comment to the case I was confused
about.

Bug: 124328118
Change-Id: I28390f4924110ee259c20304753bd64a8f6121f8
Patiently-pointed-out-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8b0f9fa2e02dc95216577c3387b0707c5f60fbaf)
Signed-off-by: Minchan Kim <minchan@google.com>
2019-03-29 16:01:12 +09:00
Josef Bacik
9f5fa1fd7a BACKPORT: filemap: drop the mmap_sem for all blocking operations
Currently we only drop the mmap_sem if there is contention on the page
lock.  The idea is that we issue readahead and then go to lock the page
while it is under IO and we want to not hold the mmap_sem during the IO.

The problem with this is the assumption that the readahead does anything.
In the case that the box is under extreme memory or IO pressure we may end
up not reading anything at all for readahead, which means we will end up
reading in the page under the mmap_sem.

Even if the readahead does something, it could get throttled because of io
pressure on the system and the process is in a lower priority cgroup.

Holding the mmap_sem while doing IO is problematic because it can cause
system-wide priority inversions.  Consider some large company that does a
lot of web traffic.  This large company has load balancing logic in it's
core web server, cause some engineer thought this was a brilliant plan.
This load balancing logic gets statistics from /proc about the system,
which trip over processes mmap_sem for various reasons.  Now the web
server application is in a protected cgroup, but these other processes may
not be, and if they are being throttled while their mmap_sem is held we'll
stall, and cause this nice death spiral.

Instead rework filemap fault path to drop the mmap sem at any point that
we may do IO or block for an extended period of time.  This includes while
issuing readahead, locking the page, or needing to call ->readpage because
readahead did not occur.  Then once we have a fully uptodate page we can
return with VM_FAULT_RETRY and come back again to find our nicely in-cache
page that was gotten outside of the mmap_sem.

This patch also adds a new helper for locking the page with the mmap_sem
dropped.  This doesn't make sense currently as generally speaking if the
page is already locked it'll have been read in (unless there was an error)
before it was unlocked.  However a forthcoming patchset will change this
with the ability to abort read-ahead bio's if necessary, making it more
likely that we could contend for a page lock and still have a not uptodate
page.  This allows us to deal with this case by grabbing the lock and
issuing the IO without the mmap_sem held, and then returning
VM_FAULT_RETRY to come back around.

Test: fio mmap ioengine with 8-thread on 4cpu under memory pressure for
stability check
Test: locking holding latency measure with lockstat for performance
check
Bug: 124328118
Change-Id: I0af79cbe6f2ef9fe3927961e042347490b892684
[josef@toxicpanda.com: v6]
  Link: http://lkml.kernel.org/r/20181212152757.10017-1-josef@toxicpanda.com
[kirill@shutemov.name: fix race in filemap_fault()]
  Link: http://lkml.kernel.org/r/20181228235106.okk3oastsnpxusxs@kshutemo-mobl1
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20181211173801.29535-4-josef@toxicpanda.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: syzbot+b437b5a429d680cf2217@syzkaller.appspotmail.com
Cc: Dave Chinner <david@fromorbit.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6b4c9f4469819a0c1a38a0a4541337e0f9bf6c11)
Signed-off-by: Minchan Kim <minchan@google.com>
2019-03-29 16:01:06 +09:00
Josef Bacik
76b6ee8f38 BACKPORT: filemap: kill page_cache_read usage in filemap_fault
Patch series "drop the mmap_sem when doing IO in the fault path", v6.

Now that we have proper isolation in place with cgroups2 we have started
going through and fixing the various priority inversions.  Most are all
gone now, but this one is sort of weird since it's not necessarily a
priority inversion that happens within the kernel, but rather because of
something userspace does.

We have giant applications that we want to protect, and parts of these
giant applications do things like watch the system state to determine how
healthy the box is for load balancing and such.  This involves running
'ps' or other such utilities.  These utilities will often walk
/proc/<pid>/whatever, and these files can sometimes need to
down_read(&task->mmap_sem).  Not usually a big deal, but we noticed when
we are stress testing that sometimes our protected application has latency
spikes trying to get the mmap_sem for tasks that are in lower priority
cgroups.

This is because any down_write() on a semaphore essentially turns it into
a mutex, so even if we currently have it held for reading, any new readers
will not be allowed on to keep from starving the writer.  This is fine,
except a lower priority task could be stuck doing IO because it has been
throttled to the point that its IO is taking much longer than normal.  But
because a higher priority group depends on this completing it is now stuck
behind lower priority work.

In order to avoid this particular priority inversion we want to use the
existing retry mechanism to stop from holding the mmap_sem at all if we
are going to do IO.  This already exists in the read case sort of, but
needed to be extended for more than just grabbing the page lock.  With
io.latency we throttle at submit_bio() time, so the readahead stuff can
block and even page_cache_read can block, so all these paths need to have
the mmap_sem dropped.

The other big thing is ->page_mkwrite.  btrfs is particularly shitty here
because we have to reserve space for the dirty page, which can be a very
expensive operation.  We use the same retry method as the read path, and
simply cache the page and verify the page is still setup properly the next
pass through ->page_mkwrite().

I've tested these patches with xfstests and there are no regressions.

This patch (of 3):

If we do not have a page at filemap_fault time we'll do this weird forced
page_cache_read thing to populate the page, and then drop it again and
loop around and find it.  This makes for 2 ways we can read a page in
filemap_fault, and it's not really needed.  Instead add a FGP_FOR_MMAP
flag so that pagecache_get_page() will return a unlocked page that's in
pagecache.  Then use the normal page locking and readpage logic already in
filemap_fault.  This simplifies the no page in page cache case
significantly.

Bug: 124328118
Change-Id: I6e04df24ba52bb96c05c9386248983f477ab78e5
[akpm@linux-foundation.org: fix comment text]
[josef@toxicpanda.com: don't unlock null page in FGP_FOR_MMAP case]
  Link: http://lkml.kernel.org/r/20190312201742.22935-1-josef@toxicpanda.com
Link: http://lkml.kernel.org/r/20181211173801.29535-2-josef@toxicpanda.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a75d4c33377277b6034dd1e2663bce444f952c14)
Signed-off-by: Minchan Kim <minchan@google.com>
2019-03-29 16:01:02 +09:00
Josef Bacik
0cb18b7b49 UPSTREAM: filemap: pass vm_fault to the mmap ra helpers
All of the arguments to these functions come from the vmf.

Cut down on the amount of arguments passed by simply passing in the vmf
to these two helpers.

Bug: 124328118
Change-Id: I874f8320d8393f2e39aab030fcf43e51c882e90a
Link: http://lkml.kernel.org/r/20181211173801.29535-3-josef@toxicpanda.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2a1180f1bd389e9d47693e5eb384b95f482d8d19)
Signed-off-by: Minchan Kim <minchan@google.com>
2019-03-29 16:00:56 +09:00
Greg Kroah-Hartman
bb418a146a This is the 4.19.31 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlyWhJcACgkQONu9yGCS
 aT6XzxAAzP2QGzC4SVPgcFH1woF/d8Cz0zQ81mLXzjXtEPm39fZCM2hbBnxkXLu1
 peFyrKNk6/c9541D9gsQCQT6Fu+H6u1bJKcIezlKJ2xyB/MsU1hXkjZrTJYW3RRs
 gimy1EGdood2el1ubEBZiaspazoeRzBqtg1Nsmr4V0l+RT8HwtKKw+0+Nxixfp59
 NoVkqTpPI5mL0FiH2R9ogcfg3SvgMZOsOhOBjdPvSjiJJsbvIWcW48MCs95XSUpF
 R+l/fWn+oiFCcIqBaFheujuqZMvVrUHZHaWAPMuoR/c3Cdf0lTBokdv6UM9c0nv3
 61jX5r5ImRI/dfQANN5mbB1YKcs5xOI+I7QZHQ2q4clsWrWyLapXW4clrAZJ6z5t
 UVeVbuLV2y5PL9GJyBcXpyY0BOf4e2gZURaPY3C5McNwgybNoiR0ZePqKb8ZhZyh
 jYOYRoBjJJpZoVTSt6MNX95NTvGaSAtqKMu1s3IeMfpwCfQKBPMOuBHr/dUqSC6I
 U0xxjk/71C15dSPVcTVJT/lmcKc6TXgoagnfbn8GBtDOAjBNsYyUJLQI+db1ERCe
 9MEB9k1Z87ROQ5jQCQmWsewOVAtFZBEvSszFmpKv3zTe8M2oFpXG56zckdiumwHU
 nSfeZTTeWzsFJd30MioEnGYm3ZwKwZx7wi0x4B4WWvBfSpp20Us=
 =xtLx
 -----END PGP SIGNATURE-----

Merge 4.19.31 into android-4.19

Changes in 4.19.31
	media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
	9p: use inode->i_lock to protect i_size_write() under 32-bit
	9p/net: fix memory leak in p9_client_create
	ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
	ASoC: codecs: pcm186x: fix wrong usage of DECLARE_TLV_DB_SCALE()
	ASoC: codecs: pcm186x: Fix energysense SLEEP bit
	iio: adc: exynos-adc: Fix NULL pointer exception on unbind
	mei: hbm: clean the feature flags on link reset
	mei: bus: move hw module get/put to probe/release
	stm class: Fix an endless loop in channel allocation
	crypto: caam - fix hash context DMA unmap size
	crypto: ccree - fix missing break in switch statement
	crypto: caam - fixed handling of sg list
	crypto: caam - fix DMA mapping of stack memory
	crypto: ccree - fix free of unallocated mlli buffer
	crypto: ccree - unmap buffer before copying IV
	crypto: ccree - don't copy zero size ciphertext
	crypto: cfb - add missing 'chunksize' property
	crypto: cfb - remove bogus memcpy() with src == dest
	crypto: ahash - fix another early termination in hash walk
	crypto: rockchip - fix scatterlist nents error
	crypto: rockchip - update new iv to device in multiple operations
	drm/imx: ignore plane updates on disabled crtcs
	gpu: ipu-v3: Fix i.MX51 CSI control registers offset
	drm/imx: imx-ldb: add missing of_node_puts
	gpu: ipu-v3: Fix CSI offsets for imx53
	ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
	clocksource: timer-ti-dm: Fix pwm dmtimer usage of fck reparenting
	KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock
	arm64: dts: rockchip: fix graph_port warning on rk3399 bob kevin and excavator
	s390/dasd: fix using offset into zero size array error
	Input: pwm-vibra - prevent unbalanced regulator
	Input: pwm-vibra - stop regulator after disabling pwm, not before
	ARM: dts: Configure clock parent for pwm vibra
	ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
	ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
	ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
	KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded
	arm/arm64: KVM: Allow a VCPU to fully reset itself
	arm/arm64: KVM: Don't panic on failure to properly reset system registers
	KVM: arm/arm64: vgic: Always initialize the group of private IRQs
	KVM: arm64: Forbid kprobing of the VHE world-switch code
	ASoC: samsung: Prevent clk_get_rate() calls in atomic context
	ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug
	Input: cap11xx - switch to using set_brightness_blocking()
	Input: ps2-gpio - flush TX work when closing port
	Input: matrix_keypad - use flush_delayed_work()
	mac80211: call drv_ibss_join() on restart
	mac80211: Fix Tx aggregation session tear down with ITXQs
	netfilter: compat: initialize all fields in xt_init
	blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
	ipvs: fix dependency on nf_defrag_ipv6
	floppy: check_events callback should not return a negative number
	xprtrdma: Make sure Send CQ is allocated on an existing compvec
	NFS: Don't use page_file_mapping after removing the page
	mm/gup: fix gup_pmd_range() for dax
	Revert "mm: use early_pfn_to_nid in page_ext_init"
	scsi: qla2xxx: Fix panic from use after free in qla2x00_async_tm_cmd
	net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
	x86/CPU: Add Icelake model number
	mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
	net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
	i2c: cadence: Fix the hold bit setting
	i2c: bcm2835: Clear current buffer pointers and counts after a transfer
	auxdisplay: ht16k33: fix potential user-after-free on module unload
	Input: st-keyscan - fix potential zalloc NULL dereference
	clk: sunxi-ng: v3s: Fix TCON reset de-assert bit
	kallsyms: Handle too long symbols in kallsyms.c
	clk: sunxi: A31: Fix wrong AHB gate number
	esp: Skip TX bytes accounting when sending from a request socket
	ARM: 8824/1: fix a migrating irq bug when hotplug cpu
	bpf: only adjust gso_size on bytestream protocols
	bpf: fix lockdep false positive in stackmap
	af_key: unconditionally clone on broadcast
	ARM: 8835/1: dma-mapping: Clear DMA ops on teardown
	assoc_array: Fix shortcut creation
	keys: Fix dependency loop between construction record and auth key
	scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
	net: systemport: Fix reception of BPDUs
	net: dsa: bcm_sf2: Do not assume DSA master supports WoL
	pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
	qmi_wwan: apply SET_DTR quirk to Sierra WP7607
	net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
	xfrm: Fix inbound traffic via XFRM interfaces across network namespaces
	mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue
	ASoC: topology: free created components in tplg load error
	qed: Fix iWARP buffer size provided for syn packet processing.
	qed: Fix iWARP syn packet mac address validation.
	ARM: dts: armada-xp: fix Armada XP boards NAND description
	arm64: Relax GIC version check during early boot
	ARM: tegra: Restore DT ABI on Tegra124 Chromebooks
	net: marvell: mvneta: fix DMA debug warning
	mm: handle lru_add_drain_all for UP properly
	tmpfs: fix link accounting when a tmpfile is linked in
	ixgbe: fix older devices that do not support IXGBE_MRQC_L3L4TXSWEN
	ARCv2: lib: memcpy: fix doing prefetchw outside of buffer
	ARC: uacces: remove lp_start, lp_end from clobber list
	ARCv2: support manual regfile save on interrupts
	ARCv2: don't assume core 0x54 has dual issue
	phonet: fix building with clang
	mac80211_hwsim: propagate genlmsg_reply return code
	bpf, lpm: fix lookup bug in map_delete_elem
	net: thunderx: make CFG_DONE message to run through generic send-ack sequence
	net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_task
	nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
	nfp: bpf: fix ALU32 high bits clearance bug
	bnxt_en: Fix typo in firmware message timeout logic.
	bnxt_en: Wait longer for the firmware message response to complete.
	net: set static variable an initial value in atl2_probe()
	selftests: fib_tests: sleep after changing carrier. again.
	tmpfs: fix uninitialized return value in shmem_link
	stm class: Prevent division by zero
	nfit: acpi_nfit_ctl(): Check out_obj->type in the right place
	acpi/nfit: Fix bus command validation
	nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot
	nfit/ars: Attempt short-ARS even in the no_init_ars case
	libnvdimm/label: Clear 'updating' flag after label-set update
	libnvdimm, pfn: Fix over-trim in trim_pfn_device()
	libnvdimm/pmem: Honor force_raw for legacy pmem regions
	libnvdimm: Fix altmap reservation size calculation
	fix cgroup_do_mount() handling of failure exits
	crypto: aead - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
	crypto: aegis - fix handling chunked inputs
	crypto: arm/crct10dif - revert to C code for short inputs
	crypto: arm64/aes-neonbs - fix returning final keystream block
	crypto: arm64/crct10dif - revert to C code for short inputs
	crypto: hash - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
	crypto: morus - fix handling chunked inputs
	crypto: pcbc - remove bogus memcpy()s with src == dest
	crypto: skcipher - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
	crypto: testmgr - skip crc32c context test for ahash algorithms
	crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP
	crypto: x86/aesni-gcm - fix crash on empty plaintext
	crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP
	crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
	crypto: arm64/aes-ccm - fix bugs in non-NEON fallback routine
	CIFS: Do not reset lease state to NONE on lease break
	CIFS: Do not skip SMB2 message IDs on send failures
	CIFS: Fix read after write for files with read caching
	tracing: Use strncpy instead of memcpy for string keys in hist triggers
	tracing: Do not free iter->trace in fail path of tracing_open_pipe()
	tracing/perf: Use strndup_user() instead of buggy open-coded version
	xen: fix dom0 boot on huge systems
	ACPI / device_sysfs: Avoid OF modalias creation for removed device
	mmc: sdhci-esdhc-imx: fix HS400 timing issue
	mmc:fix a bug when max_discard is 0
	netfilter: ipt_CLUSTERIP: fix warning unused variable cn
	spi: ti-qspi: Fix mmap read when more than one CS in use
	spi: pxa2xx: Setup maximum supported DMA transfer length
	regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
	regulator: max77620: Initialize values for DT properties
	regulator: s2mpa01: Fix step values for some LDOs
	clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
	clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
	clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
	s390/setup: fix early warning messages
	s390/virtio: handle find on invalid queue gracefully
	scsi: virtio_scsi: don't send sc payload with tmfs
	scsi: aacraid: Fix performance issue on logical drives
	scsi: sd: Optimal I/O size should be a multiple of physical block size
	scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
	scsi: qla2xxx: Fix LUN discovery if loop id is not assigned yet by firmware
	fs/devpts: always delete dcache dentry-s in dput()
	splice: don't merge into linked buffers
	ovl: During copy up, first copy up data and then xattrs
	ovl: Do not lose security.capability xattr over metadata file copy-up
	m68k: Add -ffreestanding to CFLAGS
	Btrfs: setup a nofs context for memory allocation at btrfs_create_tree()
	Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl
	btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
	Btrfs: fix corruption reading shared and compressed extents after hole punching
	soc: qcom: rpmh: Avoid accessing freed memory from batch API
	libertas_tf: don't set URB_ZERO_PACKET on IN USB transfer
	irqchip/gic-v3-its: Avoid parsing _indirect_ twice for Device table
	irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
	x86/kprobes: Prohibit probing on optprobe template code
	cpufreq: kryo: Release OPP tables on module removal
	cpufreq: tegra124: add missing of_node_put()
	cpufreq: pxa2xx: remove incorrect __init annotation
	ext4: fix check of inode in swap_inode_boot_loader
	ext4: cleanup pagecache before swap i_data
	ext4: update quota information while swapping boot loader inode
	ext4: add mask of ext4 flags to swap
	ext4: fix crash during online resizing
	PCI/ASPM: Use LTR if already enabled by platform
	PCI/DPC: Fix print AER status in DPC event handling
	PCI: dwc: skip MSI init if MSIs have been explicitly disabled
	IB/hfi1: Close race condition on user context disable and close
	cxl: Wrap iterations over afu slices inside 'afu_list_lock'
	ext2: Fix underflow in ext2_max_size()
	clk: uniphier: Fix update register for CPU-gear
	clk: clk-twl6040: Fix imprecise external abort for pdmclk
	clk: samsung: exynos5: Fix possible NULL pointer exception on platform_device_alloc() failure
	clk: samsung: exynos5: Fix kfree() of const memory on setting driver_override
	clk: ingenic: Fix round_rate misbehaving with non-integer dividers
	clk: ingenic: Fix doc of ingenic_cgu_div_info
	usb: chipidea: tegra: Fix missed ci_hdrc_remove_device()
	usb: typec: tps6598x: handle block writes separately with plain-I2C adapters
	dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit
	mm: hwpoison: fix thp split handing in soft_offline_in_use_page()
	mm/vmalloc: fix size check for remap_vmalloc_range_partial()
	mm/memory.c: do_fault: avoid usage of stale vm_area_struct
	kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
	device property: Fix the length used in PROPERTY_ENTRY_STRING()
	intel_th: Don't reference unassigned outputs
	parport_pc: fix find_superio io compare code, should use equal test.
	i2c: tegra: fix maximum transfer size
	media: i2c: ov5640: Fix post-reset delay
	gpio: pca953x: Fix dereference of irq data in shutdown
	can: flexcan: FLEXCAN_IFLAG_MB: add () around macro argument
	drm/i915: Relax mmap VMA check
	bpf: only test gso type on gso packets
	serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO
	serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart
	serial: 8250_pci: Fix number of ports for ACCES serial cards
	serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
	jbd2: clear dirty flag when revoking a buffer from an older transaction
	jbd2: fix compile warning when using JBUFFER_TRACE
	selinux: add the missing walk_size + len check in selinux_sctp_bind_connect
	security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock
	powerpc/32: Clear on-stack exception marker upon exception return
	powerpc/wii: properly disable use of BATs when requested.
	powerpc/powernv: Make opal log only readable by root
	powerpc/83xx: Also save/restore SPRG4-7 during suspend
	powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit
	powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest
	powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning
	powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration
	powerpc/traps: fix recoverability of machine check handling on book3s/32
	powerpc/traps: Fix the message printed when stack overflows
	ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
	arm64: Fix HCR.TGE status for NMI contexts
	arm64: debug: Ensure debug handlers check triggering exception level
	arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
	ipmi_si: fix use-after-free of resource->name
	dm: fix to_sector() for 32bit
	dm integrity: limit the rate of error messages
	mfd: sm501: Fix potential NULL pointer dereference
	cpcap-charger: generate events for userspace
	NFS: Fix I/O request leakages
	NFS: Fix an I/O request leakage in nfs_do_recoalesce
	NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
	nfsd: fix performance-limiting session calculation
	nfsd: fix memory corruption caused by readdir
	nfsd: fix wrong check in write_v4_end_grace()
	NFSv4.1: Reinitialise sequence results before retransmitting a request
	svcrpc: fix UDP on servers with lots of threads
	PM / wakeup: Rework wakeup source timer cancellation
	bcache: never writeback a discard operation
	stable-kernel-rules.rst: add link to networking patch queue
	vt: perform safe console erase in the right order
	x86/unwind/orc: Fix ORC unwind table alignment
	perf intel-pt: Fix CYC timestamp calculation after OVF
	perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols
	perf auxtrace: Define auxtrace record alignment
	perf intel-pt: Fix overlap calculation for padding
	perf/x86/intel/uncore: Fix client IMC events return huge result
	perf intel-pt: Fix divide by zero when TSC is not available
	md: Fix failed allocation of md_register_thread
	tpm/tpm_crb: Avoid unaligned reads in crb_recv()
	tpm: Unify the send callback behaviour
	rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
	media: imx: prpencvf: Stop upstream before disabling IDMA channel
	media: lgdt330x: fix lock status reporting
	media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
	media: vimc: Add vimc-streamer for stream control
	media: imx: csi: Disable CSI immediately after last EOF
	media: imx: csi: Stop upstream before disabling IDMA channel
	drm/fb-helper: generic: Fix drm_fbdev_client_restore()
	drm/radeon/evergreen_cs: fix missing break in switch statement
	drm/amd/powerplay: correct power reading on fiji
	drm/amd/display: don't call dm_pp_ function from an fpu block
	KVM: Call kvm_arch_memslots_updated() before updating memslots
	KVM: x86/mmu: Detect MMIO generation wrap in any address space
	KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux
	KVM: nVMX: Sign extend displacements of VMX instr's mem operands
	KVM: nVMX: Apply addr size mask to effective address for VMX instructions
	KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
	bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata
	s390/setup: fix boot crash for machine without EDAT-1
	Linux 4.19.31

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-03-23 21:13:30 +01:00
Jan Stancek
09417dd35e mm/memory.c: do_fault: avoid usage of stale vm_area_struct
commit fc8efd2ddfed3f343c11b693e87140ff358d7ff5 upstream.

LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:

  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>]           (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200

page_table_free() is called with NULL mm parameter, but because "0" is a
valid address on s390 (see S390_lowcore), it keeps going until it
eventually crashes in lockdep's lock_acquire.  This crash is
reproducible at least since 4.14.

Problem is that "vmf->vma" used in do_fault() can become stale.  Because
mmap_sem may be released, other threads can come in, call munmap() and
cause "vma" be returned to kmem cache, and get zeroed/re-initialized and
re-used:

handle_mm_fault                           |
  __handle_mm_fault                       |
    do_fault                              |
      vma = vmf->vma                      |
      do_read_fault                       |
        __do_fault                        |
          vma->vm_ops->fault(vmf);        |
            mmap_sem is released          |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
            <crash>                       |

Cache mm_struct to avoid using potentially stale "vma".

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c

Link: http://lkml.kernel.org/r/5b3fdf19e2a5be460a384b936f5b56e13733f1b8.1551595137.git.jstancek@redhat.com
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 20:10:04 +01:00
Roman Penyaev
c1ddc7b785 mm/vmalloc: fix size check for remap_vmalloc_range_partial()
commit 401592d2e095947344e10ec0623adbcd58934dd4 upstream.

When VM_NO_GUARD is not set area->size includes adjacent guard page,
thus for correct size checking get_vm_area_size() should be used, but
not area->size.

This fixes possible kernel oops when userspace tries to mmap an area on
1 page bigger than was allocated by vmalloc_user() call: the size check
inside remap_vmalloc_range_partial() accounts non-existing guard page
also, so check successfully passes but vmalloc_to_page() returns NULL
(guard page does not physically exist).

The following code pattern example should trigger an oops:

  static int oops_mmap(struct file *file, struct vm_area_struct *vma)
  {
        void *mem;

        mem = vmalloc_user(4096);
        BUG_ON(!mem);
        /* Do not care about mem leak */

        return remap_vmalloc_range(vma, mem, 0);
  }

And userspace simply mmaps size + PAGE_SIZE:

  mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);

Possible candidates for oops which do not have any explicit size
checks:

   *** drivers/media/usb/stkwebcam/stk-webcam.c:
   v4l_stk_mmap[789]   ret = remap_vmalloc_range(vma, sbuf->buffer, 0);

Or the following one:

   *** drivers/video/fbdev/core/fbmem.c
   static int
   fb_mmap(struct file *file, struct vm_area_struct * vma)
        ...
        res = fb->fb_mmap(info, vma);

Where fb_mmap callback calls remap_vmalloc_range() directly without any
explicit checks:

   *** drivers/video/fbdev/vfb.c
   static int vfb_mmap(struct fb_info *info,
             struct vm_area_struct *vma)
   {
       return remap_vmalloc_range(vma, (void *)info->fix.smem_start, vma->vm_pgoff);
   }

Link: http://lkml.kernel.org/r/20190103145954.16942-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 20:10:04 +01:00
zhongjiang
234c0cc982 mm: hwpoison: fix thp split handing in soft_offline_in_use_page()
commit 46612b751c4941c5c0472ddf04027e877ae5990f upstream.

When soft_offline_in_use_page() runs on a thp tail page after pmd is
split, we trigger the following VM_BUG_ON_PAGE():

  Memory failure: 0x3755ff: non anonymous thp
  __get_any_page: 0x3755ff: unknown zero refcount page type 2fffff80000000
  Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
  page:ffffea000d360140 count:0 mapcount:0 mapping:0000000000000000 index:0x1
  flags: 0x2fffff80000000()
  raw: 002fffff80000000 ffffea000d360108 ffffea000d360188 0000000000000000
  raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
  ------------[ cut here ]------------
  kernel BUG at ./include/linux/mm.h:519!

soft_offline_in_use_page() passed refcount and page lock from tail page
to head page, which is not needed because we can pass any subpage to
split_huge_page().

Naoya had fixed a similar issue in c3901e722b ("mm: hwpoison: fix thp
split handling in memory_failure()").  But he missed fixing soft
offline.

Link: http://lkml.kernel.org/r/1551452476-24000-1-git-send-email-zhongjiang@huawei.com
Fixes: 61f5d698cc ("mm: re-enable THP")
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 20:10:04 +01:00
Darrick J. Wong
b3139fbb3b tmpfs: fix uninitialized return value in shmem_link
[ Upstream commit 29b00e609960ae0fcff382f4c7079dd0874a5311 ]

When we made the shmem_reserve_inode call in shmem_link conditional, we
forgot to update the declaration for ret so that it always has a known
value.  Dan Carpenter pointed out this deficiency in the original patch.

Fixes: 1062af920c07 ("tmpfs: fix link accounting when a tmpfile is linked in")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Matej Kupljen <matej.kupljen@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 20:09:52 +01:00
Darrick J. Wong
064a61d3e7 tmpfs: fix link accounting when a tmpfile is linked in
[ Upstream commit 1062af920c07f5b54cf5060fde3339da6df0cf6b ]

tmpfs has a peculiarity of accounting hard links as if they were
separate inodes: so that when the number of inodes is limited, as it is
by default, a user cannot soak up an unlimited amount of unreclaimable
dcache memory just by repeatedly linking a file.

But when v3.11 added O_TMPFILE, and the ability to use linkat() on the
fd, we missed accommodating this new case in tmpfs: "df -i" shows that
an extra "inode" remains accounted after the file is unlinked and the fd
closed and the actual inode evicted.  If a user repeatedly links
tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they
are deleted.

Just skip the extra reservation from shmem_link() in this case: there's
a sense in which this first link of a tmpfile is then cheaper than a
hard link of another file, but the accounting works out, and there's
still good limiting, so no need to do anything more complicated.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils
Fixes: f4e0c30c19 ("allow the temp files created by open() to be linked to")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Matej Kupljen <matej.kupljen@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 20:09:50 +01:00
Michal Hocko
e6e9d6e290 mm: handle lru_add_drain_all for UP properly
[ Upstream commit 6ea183d60c469560e7b08a83c9804299e84ec9eb ]

Since for_each_cpu(cpu, mask) added by commit 2d3854a37e
("cpumask: introduce new API, without changing anything") did not
evaluate the mask argument if NR_CPUS == 1 due to CONFIG_SMP=n,
lru_add_drain_all() is hitting WARN_ON() at __flush_work() added by
commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without
INIT_WORK().") by unconditionally calling flush_work() [1].

Workaround this issue by using CONFIG_SMP=n specific lru_add_drain_all
implementation.  There is no real need to defer the implementation to
the workqueue as the draining is going to happen on the local cpu.  So
alias lru_add_drain_all to lru_add_drain which does all the necessary
work.

[akpm@linux-foundation.org: fix various build warnings]
[1] https://lkml.kernel.org/r/18a30387-6aa5-6123-e67c-57579ecc3f38@roeck-us.net
Link: http://lkml.kernel.org/r/20190213124334.GH4525@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Debugged-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 20:09:50 +01:00
Jann Horn
33e83ea302 mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
[ Upstream commit 2c2ade81741c66082f8211f0b96cf509cc4c0218 ]

The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.

However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.

Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.

To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 20:09:46 +01:00
Qian Cai
53dcaeeff1 Revert "mm: use early_pfn_to_nid in page_ext_init"
[ Upstream commit 2f1ee0913ce58efe7f18fbd518bd54c598559b89 ]

This reverts commit fe53ca5427 ("mm: use early_pfn_to_nid in
page_ext_init").

When booting a system with "page_owner=on",

start_kernel
  page_ext_init
    invoke_init_callbacks
      init_section_page_ext
        init_page_owner
          init_early_allocated_pages
            init_zones_in_node
              init_pages_in_zone
                lookup_page_ext
                  page_to_nid

The issue here is that page_to_nid() will not work since some page flags
have no node information until later in page_alloc_init_late() due to
DEFERRED_STRUCT_PAGE_INIT.  Hence, it could trigger an out-of-bounds
access with an invalid nid.

  UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
  index 7 is out of range for type 'zone [5]'

Also, kernel will panic since flags were poisoned earlier with,

CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

start_kernel
  setup_arch
    pagetable_init
      paging_init
        sparse_init
          sparse_init_nid
            memblock_alloc_try_nid_raw

It did not handle it well in init_pages_in_zone() which ends up calling
page_to_nid().

  page:ffffea0004200000 is uninitialized and poisoned
  raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
  raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
  page_owner info is not active (free page?)
  kernel BUG at include/linux/mm.h:990!
  RIP: 0010:init_page_owner+0x486/0x520

This means that assumptions behind commit fe53ca5427 ("mm: use
early_pfn_to_nid in page_ext_init") are incomplete.  Therefore, revert
the commit for now.  A proper way to move the page_owner initialization
to sooner is to hook into memmap initialization.

Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 20:09:46 +01:00
Yu Zhao
8b1a7762e0 mm/gup: fix gup_pmd_range() for dax
[ Upstream commit 414fd080d125408cb15d04ff4907e1dd8145c8c7 ]

For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true
on x86.  So the function works as long as hugetlb is configured.
However, dax doesn't depend on hugetlb.

Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-23 20:09:46 +01:00
Johannes Weiner
e550f94252 UPSTREAM: psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard
to tell exactly the impact this has on workload productivity, or how close
the system is to lockups and OOM kills.  In particular, when machines work
multiple jobs concurrently, the impact of overcommit in terms of latency
and throughput on the individual job can be enormous.

In order to maximize hardware utilization without sacrificing individual
job health or risk complete machine lockups, this patch implements a way
to quantify resource pressure in the system.

A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that
expose the percentage of time the system is stalled on CPU, memory, or IO,
respectively.  Stall states are aggregate versions of the per-task delay
accounting delays:

       cpu: some tasks are runnable but not executing on a CPU
       memory: tasks are reclaiming, or waiting for swapin or thrashing cache
       io: tasks are waiting for io completions

These percentages of walltime can be thought of as pressure percentages,
and they give a general sense of system health and productivity loss
incurred by resource overcommit.  They can also indicate when the system
is approaching lockup scenarios and OOMs.

To do this, psi keeps track of the task states associated with each CPU
and samples the time they spend in stall states.  Every 2 seconds, the
samples are averaged across CPUs - weighted by the CPUs' non-idle time to
eliminate artifacts from unused CPUs - and translated into percentages of
walltime.  A running average of those percentages is maintained over 10s,
1m, and 5m periods (similar to the loadaverage).

[hannes@cmpxchg.org: doc fixlet, per Randy]
  Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org
[hannes@cmpxchg.org: code optimization]
  Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org
[hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter]
  Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org
[hannes@cmpxchg.org: fix build]
  Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org
Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:27 -07:00
Johannes Weiner
580f26b93e UPSTREAM: delayacct: track delays from thrashing cache pages
Delay accounting already measures the time a task spends in direct reclaim
and waiting for swapin, but in low memory situations tasks spend can spend
a significant amount of their time waiting on thrashing page cache.  This
isn't tracked right now.

To know the full impact of memory contention on an individual task,
measure the delay when waiting for a recently evicted active cache page to
read back into memory.

Also update tools/accounting/getdelays.c:

     [hannes@computer accounting]$ sudo ./getdelays -d -p 1
     print delayacct stats ON
     PID     1

     CPU             count     real total  virtual total    delay total  delay average
                     50318      745000000      847346785      400533713          0.008ms
     IO              count    delay total  delay average
                       435      122601218              0ms
     SWAP            count    delay total  delay average
                         0              0              0ms
     RECLAIM         count    delay total  delay average
                         0              0              0ms
     THRASHING       count    delay total  delay average
                        19       12621439              0ms

Link: http://lkml.kernel.org/r/20180828172258.3185-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit b1d29ba82cf2bc784f4c963ddd6a2cf29e229b33)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I259f693987cf04e6a52ee7e8accf55a17e0de005
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:26 -07:00
Johannes Weiner
b4a56abdf7 UPSTREAM: mm: workingset: tell cache transitions from workingset thrashing
Refaults happen during transitions between workingsets as well as in-place
thrashing.  Knowing the difference between the two has a range of
applications, including measuring the impact of memory shortage on the
system performance, as well as the ability to smarter balance pressure
between the filesystem cache and the swap-backed workingset.

During workingset transitions, inactive cache refaults and pushes out
established active cache.  When that active cache isn't stale, however,
and also ends up refaulting, that's bonafide thrashing.

Introduce a new page flag that tells on eviction whether the page has been
active or not in its lifetime.  This bit is then stored in the shadow
entry, to classify refaults as transitioning or thrashing.

How many page->flags does this leave us with on 32-bit?

	20 bits are always page flags

	21 if you have an MMU

	23 with the zone bits for DMA, Normal, HighMem, Movable

	29 with the sparsemem section bits

	30 if PAE is enabled

	31 with this patch.

So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes.  If
that's not enough, the system can switch to discontigmem and re-gain the 6
or 7 sparsemem section bits.

Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 8508cf3ffad4defa202b303e5b6379efc4cd9054)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I71df060dce5590a3c654f9a0e8e54deeb74b64c2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:26 -07:00
Greg Kroah-Hartman
2e568c979c This is the 4.19.29 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlyJb/EACgkQONu9yGCS
 aT4y0g//b9t9/onhTaXcY/ByPmBAwqNgugi7eYcZqGDBp7aCDOBLF6eOwbhdvvuS
 ZTaZ5eWG3Twz3mZu9vveuskgMci2npDyLPgqBWGzW+Ef5r/xPd40diaI75ZUc68T
 gimWbQ0VANuXKklK6LysBUaVQWE3ilIy6qnnpj0DI3ipNDoE62Ry1LNthuKy+73J
 w6r7uwkb6X/CkXpNB/L4cDdpSy/CvhGQhd6p91lBuE4DfyPqEzslYCokD9aPXp9b
 Fedt/Re+8eULBNcgqPYxkS5pBrbHtqrGf00AMlzC8DkC+GZyDqSP2xjv6AiTfGJd
 uf0/Jvsv2OBnP4aYsbk+uB2z3plzPBgmXxa/1bm+yrGCMvbpi9mMx75HM2joAeVp
 tVN4ZN65kNgJkXCchJTHdQ3s6teOD8Par1czy570HyKBU6l1j3AGArGm+b4WGPWx
 dL+82coojMKxKNdTHfxUXES6QGKp716r3un6mCrKR0xET/SDayzDQMaSM8UOtArK
 ELzNeKzKTc5oBx6i+JfGmY8ZsedpNGCIPpsiuoSYAaon5ZzNbruzOAlDOThs157d
 YezDHZ9XMrx3kN/xYnqZD63x/5egq9REbZGWljeykbNkWcEY74jIkKwNLxqv3P64
 JsLp60owvjzwtzKycjZogNU//GGNTBdb+6pESq4MxJpPTteFWnc=
 =n9iV
 -----END PGP SIGNATURE-----

Merge 4.19.29 into android-4.19

Changes in 4.19.29
	media: uvcvideo: Fix 'type' check leading to overflow
	vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
	perf script: Fix crash with printing mixed trace point and other events
	perf core: Fix perf_proc_update_handler() bug
	perf tools: Handle TOPOLOGY headers with no CPU
	perf script: Fix crash when processing recorded stat data
	IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
	iommu/amd: Call free_iova_fast with pfn in map_sg
	iommu/amd: Unmap all mapped pages in error path of map_sg
	riscv: fixup max_low_pfn with PFN_DOWN.
	ipvs: Fix signed integer overflow when setsockopt timeout
	iommu/amd: Fix IOMMU page flush when detach device from a domain
	clk: ti: Fix error handling in ti_clk_parse_divider_data()
	clk: qcom: gcc: Use active only source for CPUSS clocks
	xtensa: SMP: fix ccount_timer_shutdown
	riscv: Adjust mmap base address at a third of task size
	IB/ipoib: Fix for use-after-free in ipoib_cm_tx_start
	selftests: cpu-hotplug: fix case where CPUs offline > CPUs present
	xtensa: SMP: fix secondary CPU initialization
	xtensa: smp_lx200_defconfig: fix vectors clash
	xtensa: SMP: mark each possible CPU as present
	iomap: get/put the page in iomap_page_create/release()
	iomap: fix a use after free in iomap_dio_rw
	xtensa: SMP: limit number of possible CPUs by NR_CPUS
	net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
	net: hns: Fix for missing of_node_put() after of_parse_phandle()
	net: hns: Restart autoneg need return failed when autoneg off
	net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
	net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
	netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present
	gpio: vf610: Mask all GPIO interrupts
	selftests: net: use LDLIBS instead of LDFLAGS
	selftests: timers: use LDLIBS instead of LDFLAGS
	nfs: Fix NULL pointer dereference of dev_name
	qed: Fix bug in tx promiscuous mode settings
	qed: Fix LACP pdu drops for VFs
	qed: Fix VF probe failure while FLR
	qed: Fix system crash in ll2 xmit
	qed: Fix stack out of bounds bug
	scsi: libfc: free skb when receiving invalid flogi resp
	scsi: scsi_debug: fix write_same with virtual_gb problem
	scsi: bnx2fc: Fix error handling in probe()
	scsi: 53c700: pass correct "dev" to dma_alloc_attrs()
	platform/x86: Fix unmet dependency warning for ACPI_CMPC
	platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
	net: macb: Apply RXUBR workaround only to versions with errata
	x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
	cifs: fix computation for MAX_SMB2_HDR_SIZE
	x86/microcode/amd: Don't falsely trick the late loading mechanism
	arm64: kprobe: Always blacklist the KVM world-switch code
	apparmor: Fix aa_label_build() error handling for failed merges
	x86/kexec: Don't setup EFI info if EFI runtime is not enabled
	proc: fix /proc/net/* after setns(2)
	x86_64: increase stack size for KASAN_EXTRA
	mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
	mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
	lib/test_kmod.c: potential double free in error handling
	fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
	autofs: drop dentry reference only when it is never used
	autofs: fix error return in autofs_fill_super()
	mm, memory_hotplug: fix off-by-one in is_pageblock_removable
	ARM: OMAP: dts: N950/N9: fix onenand timings
	ARM: dts: omap4-droid4: Fix typo in cpcap IRQ flags
	ARM: dts: sun8i: h3: Add ethernet0 alias to Beelink X2
	arm: dts: meson: Fix IRQ trigger type for macirq
	ARM: dts: meson8b: odroidc1: mark the SD card detection GPIO active-low
	ARM: dts: meson8m2: mxiii-plus: mark the SD card detection GPIO active-low
	ARM: dts: imx6sx: correct backward compatible of gpt
	arm64: dts: renesas: r8a7796: Enable DMA for SCIF2
	arm64: dts: renesas: r8a77965: Enable DMA for SCIF2
	soc: fsl: qbman: avoid race in clearing QMan interrupt
	pinctrl: mcp23s08: spi: Fix regmap allocation for mcp23s18
	wlcore: sdio: Fixup power on/off sequence
	bpftool: Fix prog dump by tag
	bpftool: fix percpu maps updating
	bpf: sock recvbuff must be limited by rmem_max in bpf_setsockopt()
	ARM: pxa: ssp: unneeded to free devm_ allocated data
	arm64: dts: add msm8996 compatible to gicv3
	batman-adv: release station info tidstats
	DTS: CI20: Fix bugs in ci20's device tree.
	usb: phy: fix link errors
	irqchip/gic-v4: Fix occasional VLPI drop
	irqchip/gic-v3-its: Gracefully fail on LPI exhaustion
	irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
	drm/amdgpu: Add missing power attribute to APU check
	drm/radeon: check if device is root before getting pci speed caps
	drm/amdgpu: Transfer fences to dmabuf importer
	net: stmmac: Fallback to Platform Data clock in Watchdog conversion
	net: stmmac: Send TSO packets always from Queue 0
	net: stmmac: Disable EEE mode earlier in XMIT callback
	irqchip/gic-v3-its: Fix ITT_entry_size accessor
	relay: check return of create_buf_file() properly
	bpf, selftests: fix handling of sparse CPU allocations
	bpf: fix lockdep false positive in percpu_freelist
	bpf: fix potential deadlock in bpf_prog_register
	bpf: Fix syscall's stackmap lookup potential deadlock
	drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init
	dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
	vsock/virtio: fix kernel panic after device hot-unplug
	vsock/virtio: reset connected sockets on device removal
	dmaengine: dmatest: Abort test in case of mapping error
	selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET
	selftests: netfilter: add simple masq/redirect test cases
	netfilter: nf_nat: skip nat clash resolution for same-origin entries
	s390/qeth: release cmd buffer in error paths
	s390/qeth: fix use-after-free in error path
	s390/qeth: cancel close_dev work before removing a card
	perf symbols: Filter out hidden symbols from labels
	perf trace: Support multiple "vfs_getname" probes
	MIPS: Remove function size check in get_frame_info()
	Revert "scsi: libfc: Add WARN_ON() when deleting rports"
	i2c: omap: Use noirq system sleep pm ops to idle device for suspend
	drm/amdgpu: use spin_lock_irqsave to protect vm_manager.pasid_idr
	nvme: lock NS list changes while handling command effects
	nvme-pci: fix rapid add remove sequence
	fs: ratelimit __find_get_block_slow() failure message.
	qed: Fix EQ full firmware assert.
	qed: Consider TX tcs while deriving the max num_queues for PF.
	qede: Fix system crash on configuring channels.
	blk-iolatency: fix IO hang due to negative inflight counter
	nvme-pci: add missing unlock for reset error
	Input: wacom_serial4 - add support for Wacom ArtPad II tablet
	Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
	iscsi_ibft: Fix missing break in switch statement
	scsi: aacraid: Fix missing break in switch statement
	x86/PCI: Fixup RTIT_BAR of Intel Denverton Trace Hub
	arm64: dts: zcu100-revC: Give wifi some time after power-on
	arm64: dts: hikey: Give wifi some time after power-on
	arm64: dts: hikey: Revert "Enable HS200 mode on eMMC"
	ARM: dts: exynos: Fix pinctrl definition for eMMC RTSN line on Odroid X2/U3
	ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
	ARM: dts: exynos: Fix max voltage for buck8 regulator on Odroid XU3/XU4
	drm: disable uncached DMA optimization for ARM and arm64
	netfilter: xt_TEE: fix wrong interface selection
	netfilter: xt_TEE: add missing code to get interface index in checkentry.
	gfs2: Fix missed wakeups in find_insert_glock
	staging: erofs: add error handling for xattr submodule
	staging: erofs: fix fast symlink w/o xattr when fs xattr is on
	staging: erofs: fix memleak of inode's shared xattr array
	staging: erofs: fix race of initializing xattrs of a inode at the same time
	staging: erofs: keep corrupted fs from crashing kernel in erofs_namei()
	cifs: allow calling SMB2_xxx_free(NULL)
	ath9k: Avoid OF no-EEPROM quirks without qca,no-eeprom
	driver core: Postpone DMA tear-down until after devres release
	perf/x86/intel: Make cpuc allocations consistent
	perf/x86/intel: Generalize dynamic constraint creation
	x86: Add TSX Force Abort CPUID/MSR
	perf/x86/intel: Implement support for TSX Force Abort
	Linux 4.19.29

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-03-13 14:17:29 -07:00
Michal Hocko
2cc84e2ea6 mm, memory_hotplug: fix off-by-one in is_pageblock_removable
[ Upstream commit 891cb2a72d821f930a39d5900cb7a3aa752c1d5b ]

Rong Chen has reported the following boot crash:

    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-gefad4e4 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    RIP: 0010:page_mapping+0x12/0x80
    Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48
    RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202
    RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 000000000000000a
    RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff88801e5c0000
    RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 0000000001b64d13
    R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff88801e640000
    R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 0000000000000001
    FS:  00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000000006a0
    Call Trace:
     __dump_page+0x14/0x2c0
     is_mem_section_removable+0x24c/0x2c0
     removable_show+0x87/0xa0
     dev_attr_show+0x25/0x60
     sysfs_kf_seq_show+0xba/0x110
     seq_read+0x196/0x3f0
     __vfs_read+0x34/0x180
     vfs_read+0xa0/0x150
     ksys_read+0x44/0xb0
     do_syscall_64+0x5e/0x4a0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

and bisected it down to commit efad4e475c31 ("mm, memory_hotplug:
is_mem_section_removable do not pass the end of a zone").

The reason for the crash is that the mapping is garbage for poisoned
(uninitialized) page.  This shouldn't happen as all pages in the zone's
boundary should be initialized.

Later debugging revealed that the actual problem is an off-by-one when
evaluating the end_page.  'start_pfn + nr_pages' resp 'zone_end_pfn'
refers to a pfn after the range and as such it might belong to a
differen memory section.

This along with CONFIG_SPARSEMEM then makes the loop condition
completely bogus because a pointer arithmetic doesn't work for pages
from two different sections in that memory model.

Fix the issue by reworking is_pageblock_removable to be pfn based and
only use struct page where necessary.  This makes the code slightly
easier to follow and we will remove the problematic pointer arithmetic
completely.

Link: http://lkml.kernel.org/r/20190218181544.14616-1-mhocko@kernel.org
Fixes: efad4e475c31 ("mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: <rong.a.chen@intel.com>
Tested-by: <rong.a.chen@intel.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-13 14:02:32 -07:00
Mikhail Zaslonko
71df1c8bc7 mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
[ Upstream commit 24feb47c5fa5b825efb0151f28906dfdad027e61 ]

If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct pages access from
test_pages_in_a_zone() function triggered by memory_hotplug sysfs
handlers.

Here are the the panic examples:
 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   test_pages_in_a_zone+0xde/0x160
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix this by checking whether the pfn to check is within the zone.

[mhocko@suse.com: separated this change from http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Link: http://lkml.kernel.org/r/20190128144506.15603-3-mhocko@kernel.org

[mhocko@suse.com: separated this change from
http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-13 14:02:32 -07:00
Michal Hocko
6027792d6a mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
[ Upstream commit efad4e475c312456edb3c789d0996d12ed744c13 ]

Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.

Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
[1].  I have pushed back on those fixes because I believed that it is
much better to plug the problem at the initialization time rather than
play whack-a-mole all over the hotplug code and find all the places
which expect the full memory section to be initialized.

We have ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
initialize struct pages for the full memory section") merged and cause a
regression [2][3].  The reason is that there might be memory layouts
when two NUMA nodes share the same memory section so the merged fix is
simply incorrect.

In order to plug this hole we really have to be zone range aware in
those handlers.  I have split up the original patch into two.  One is
unchanged (patch 2) and I took a different approach for `removable'
crash.

[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz

This patch (of 2):

Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
removable state of a memory block:

 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   is_mem_section_removable+0xb4/0x190
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

The reason is that the memory block spans the zone boundary and we are
stumbling over an unitialized struct page.  Fix this by enforcing zone
range in is_mem_section_removable so that we never run away from a zone.

Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Debugged-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-13 14:02:32 -07:00
Greg Kroah-Hartman
36d178b3bc This is the 4.19.27 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlx+qs4ACgkQONu9yGCS
 aT4OQQ/8DMkQa61GLVtL1Zm0i2GW+fpgAJj59QP7jRdilolP7BBnrX2Ne26Dx+X0
 bYLJDJw6WexsvLKOkSMol4Y3Q3NsBguC+MasVrrWBHpfntizlaPzsYuTKkyNRG0c
 vjOYyTeAxElWqEh1WaVQhk/juhBbrTtkZIK63h9M60KYb0qHA7TY9oTGokEA8WA0
 11Yuge2aV6cR8zup1coWR9NRvW/uealiENAhr0Jw1ZRtrBqwzQoFyn5psXdbzkjb
 HrwvDa/nkwfJuc/1+grOTF5HfI7D5+giq0iRtvBuChjDwqfJ2IrVRPF/DKSmR0Oz
 Sq1ILUdj61gd2hp4N9zpUE87r2tV2ABg/yZrtDcKFhCiUWnfaaGArn7mm/R7OhCD
 PjRDz7acVvKq3LLenM6xcFsAWUdjOoUWcGk70eO/ZZctW8o+HozxFcVrNmT/75zJ
 LcagplvYLiLB4Gard6niNH8qunmq/jD+GkgdKtF9GFHOjG1QMdQ9d/VxeJWGAjtA
 nE8ZX9EFYgmq14OR+YK0DNOOFBZe10lcR9rMIFp6T9AanY7rVT7LqqG6BUXb226P
 mJmNS9sSJhictucyyf/ej9j0jULBBsAYJA5+fkECHIioewACjFGn1dZfm3BBrDAb
 OoNIwaLmrvjh2iKwsj17OK9Fv8PG6VKwYRFQ/I3OGVouZUX5SEk=
 =edyH
 -----END PGP SIGNATURE-----

Merge 4.19.27 into android-4.19

Changes in 4.19.27
	irq/matrix: Split out the CPU selection code into a helper
	irq/matrix: Spread managed interrupts on allocation
	genirq/matrix: Improve target CPU selection for managed interrupts.
	mac80211: Change default tx_sk_pacing_shift to 7
	scsi: libsas: Fix rphy phy_identifier for PHYs with end devices attached
	drm/msm: Unblock writer if reader closes file
	ASoC: Intel: Haswell/Broadwell: fix setting for .dynamic field
	ALSA: compress: prevent potential divide by zero bugs
	ASoC: Variable "val" in function rt274_i2c_probe() could be uninitialized
	clk: tegra: dfll: Fix a potential Oop in remove()
	clk: sysfs: fix invalid JSON in clk_dump
	clk: vc5: Abort clock configuration without upstream clock
	thermal: int340x_thermal: Fix a NULL vs IS_ERR() check
	usb: dwc3: gadget: synchronize_irq dwc irq in suspend
	usb: dwc3: gadget: Fix the uninitialized link_state when udc starts
	usb: gadget: Potential NULL dereference on allocation error
	selftests: rtc: rtctest: fix alarm tests
	selftests: rtc: rtctest: add alarm test on minute boundary
	genirq: Make sure the initial affinity is not empty
	x86/mm/mem_encrypt: Fix erroneous sizeof()
	ASoC: rt5682: Fix PLL source register definitions
	ASoC: dapm: change snprintf to scnprintf for possible overflow
	ASoC: imx-audmux: change snprintf to scnprintf for possible overflow
	selftests/vm/gup_benchmark.c: match gup struct to kernel
	phy: ath79-usb: Fix the power on error path
	phy: ath79-usb: Fix the main reset name to match the DT binding
	selftests: seccomp: use LDLIBS instead of LDFLAGS
	selftests: gpio-mockup-chardev: Check asprintf() for error
	irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
	ARC: fix __ffs return value to avoid build warnings
	ARC: show_regs: lockdep: avoid page allocator...
	drivers: thermal: int340x_thermal: Fix sysfs race condition
	staging: rtl8723bs: Fix build error with Clang when inlining is disabled
	mac80211: fix miscounting of ttl-dropped frames
	sched/wait: Fix rcuwait_wake_up() ordering
	sched/wake_q: Fix wakeup ordering for wake_q
	futex: Fix (possible) missed wakeup
	locking/rwsem: Fix (possible) missed wakeup
	drm/amd/powerplay: OD setting fix on Vega10
	tty: serial: qcom_geni_serial: Allow mctrl when flow control is disabled
	serial: fsl_lpuart: fix maximum acceptable baud rate with over-sampling
	drm/sun4i: hdmi: Fix usage of TMDS clock
	staging: android: ion: Support cpu access during dma_buf_detach
	direct-io: allow direct writes to empty inodes
	writeback: synchronize sync(2) against cgroup writeback membership switches
	scsi: lpfc: nvme: avoid hang / use-after-free when destroying localport
	scsi: lpfc: nvmet: avoid hang / use-after-free when destroying targetport
	scsi: csiostor: fix NULL pointer dereference in csio_vport_set_state()
	net: altera_tse: fix connect_local_phy error path
	hv_netvsc: Fix ethtool change hash key error
	hv_netvsc: Refactor assignments of struct netvsc_device_info
	hv_netvsc: Fix hash key value reset after other ops
	nvme-rdma: fix timeout handler
	nvme-multipath: drop optimization for static ANA group IDs
	drm/msm: Fix A6XX support for opp-level
	net: usb: asix: ax88772_bind return error when hw_reset fail
	net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
	ibmveth: Do not process frames after calling napi_reschedule
	mac80211: don't initiate TDLS connection if station is not associated to AP
	mac80211: Add attribute aligned(2) to struct 'action'
	cfg80211: extend range deviation for DMG
	svm: Fix AVIC incomplete IPI emulation
	KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting to L1
	kvm: selftests: Fix region overlap check in kvm_util
	mmc: spi: Fix card detection during probe
	mmc: tmio_mmc_core: don't claim spurious interrupts
	mmc: tmio: fix access width of Block Count Register
	mmc: core: Fix NULL ptr crash from mmc_should_fail_request
	mmc: cqhci: fix space allocated for transfer descriptor
	mmc: cqhci: Fix a tiny potential memory leak on error condition
	mmc: sdhci-esdhc-imx: correct the fix of ERR004536
	mm: enforce min addr even if capable() in expand_downwards()
	drm: Block fb changes for async plane updates
	hugetlbfs: fix races and page leaks during migration
	MIPS: fix truncation in __cmpxchg_small for short values
	MIPS: BCM63XX: provide DMA masks for ethernet devices
	MIPS: eBPF: Fix icache flush end address
	x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
	Linux 4.19.27

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-03-05 18:07:53 +01:00
Mike Kravetz
527cabfffb hugetlbfs: fix races and page leaks during migration
commit cb6acd01e2e43fd8bad11155752b7699c3d0fb76 upstream.

hugetlb pages should only be migrated if they are 'active'.  The
routines set/clear_page_huge_active() modify the active state of hugetlb
pages.

When a new hugetlb page is allocated at fault time, set_page_huge_active
is called before the page is locked.  Therefore, another thread could
race and migrate the page while it is being added to page table by the
fault code.  This race is somewhat hard to trigger, but can be seen by
strategically adding udelay to simulate worst case scheduling behavior.
Depending on 'how' the code races, various BUG()s could be triggered.

To address this issue, simply delay the set_page_huge_active call until
after the page is successfully added to the page table.

Hugetlb pages can also be leaked at migration time if the pages are
associated with a file in an explicitly mounted hugetlbfs filesystem.
For example, consider a two node system with 4GB worth of huge pages
available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
then migrates the pages associated with the file from one node to
another.  When the program exits, huge page counts are as follows:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  0       free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

That is as expected.  2G of huge pages are taken from the free_hugepages
counts, and 2G is the size of the file in the explicitly mounted
filesystem.  If the file is then removed, the counts become:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  1024    free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

Note that the filesystem still shows 2G of pages used, while there
actually are no huge pages in use.  The only way to 'fix' the filesystem
accounting is to unmount the filesystem

If a hugetlb page is associated with an explicitly mounted filesystem,
this information in contained in the page_private field.  At migration
time, this information is not preserved.  To fix, simply transfer
page_private from old to new page at migration time if necessary.

There is a related race with removing a huge page from a file and
migration.  When a huge page is removed from the pagecache, the
page_mapping() field is cleared, yet page_private remains set until the
page is actually freed by free_huge_page().  A page could be migrated
while in this state.  However, since page_mapping() is not set the
hugetlbfs specific routine to transfer page_private is not called and we
leak the page count in the filesystem.

To fix that, check for this condition before migrating a huge page.  If
the condition is detected, return EBUSY for the page.

Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
Fixes: bcc5422230 ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
  Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
  Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-05 17:58:53 +01:00
Jann Horn
de04d2973a mm: enforce min addr even if capable() in expand_downwards()
commit 0a1d52994d440e21def1c2174932410b4f2a98a1 upstream.

security_mmap_addr() does a capability check with current_cred(), but
we can reach this code from contexts like a VFS write handler where
current_cred() must not be used.

This can be abused on systems without SMAP to make NULL pointer
dereferences exploitable again.

Fixes: 8869477a49 ("security: protect from stack expansion into low vm addresses")
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-05 17:58:53 +01:00
Tejun Heo
edca54b897 writeback: synchronize sync(2) against cgroup writeback membership switches
[ Upstream commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 ]

sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jiufei Xue <xuejiufei@gmail.com>
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-05 17:58:50 +01:00
Greg Kroah-Hartman
c97d2b535c This is the 4.19.26 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlx2U68ACgkQONu9yGCS
 aT6rHA/+PGrMAAzSp193bme2rap7P/EzDD0JF0aQOLwizdit5A4zIYSZApVVSaw5
 U8KIQg+fcwIjTvzBmkHddDVVJBEv/VdE8Xkf0EOIOidIQNqMoRQN7ctU1bnKfUyC
 20/xZBlc78BEU4TB4JpNiKkzD3bsl/Gsb2A3VenhHe0H8dbj4oJYyNWWY99/433y
 u1SNmTogLJ0p64+1XWMCTqPJA2xkUIOciFSLiO7d51MfpbguMwcW+T9XSwYat3tp
 uTdfvZYKc4iGPa4pZCZmZY7+Lrxne2OjMhp3bUlp0coKrjlNO848X+kG7gsVZ7EN
 MA7YOXUj/Ngjl+E9EilNY5gybY2hK0IzaaV8W9TRVYvPBjgSwe5Mj9/EXYO0L8q0
 Arla5HsJdRhLX7HDJ41w8BIngko87hetAaTZVCct+F+nnMYGrrMsle+BgYYbnd7r
 NYWB8HwZZ3krkWmAZb+Phpleo3Oj7beum5MLoydSH4pXhTVft5D/V8K2fwLmZtgP
 FsIC1J09jDvXJ5qHTFBY4ugTq8l21YZ2oMPdwcB6X8hBIKji4XydQ0HPrdMyAtXB
 tqUkzyLZlP1uzm9T3E+hyATWeM4ue7qLFacp7cX5LokDWIxlWi7tcIqBHqRZez+y
 S7un8/Lmz8ZvzC674JTZl1iXSHuuK6TFLrwI3mw/CVhg2qRvZtk=
 =EDHP
 -----END PGP SIGNATURE-----

Merge 4.19.26 into android-4.19

Changes in 4.19.26
	ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction
	tracing: Fix number of entries in trace header
	MIPS: eBPF: Always return sign extended 32b values
	gpio: MT7621: use a per instance irq_chip structure
	gpio: pxa: avoid attempting to set pin direction via pinctrl on MMP2
	mac80211: Restore vif beacon interval if start ap fails
	mac80211: Use linked list instead of rhashtable walk for mesh tables
	mac80211: Free mpath object when rhashtable insertion fails
	libceph: handle an empty authorize reply
	ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
	numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
	proc, oom: do not report alien mms when setting oom_score_adj
	ALSA: hda/realtek - Headset microphone and internal speaker support for System76 oryp5
	ALSA: hda/realtek: Disable PC beep in passthrough on alc285
	KEYS: allow reaching the keys quotas exactly
	backlight: pwm_bl: Fix devicetree parsing with auto-generated brightness tables
	mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells
	pvcalls-front: read all data before closing the connection
	pvcalls-front: don't try to free unallocated rings
	pvcalls-front: properly allocate sk
	pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
	mfd: twl-core: Fix section annotations on {,un}protect_pm_master
	mfd: db8500-prcmu: Fix some section annotations
	mfd: mt6397: Do not call irq_domain_remove if PMIC unsupported
	mfd: ab8500-core: Return zero in get_register_interruptible()
	mfd: bd9571mwv: Add volatile register to make DVFS work
	mfd: qcom_rpm: write fw_version to CTRL_REG
	mfd: wm5110: Add missing ASRC rate register
	mfd: axp20x: Add AC power supply cell for AXP813
	mfd: axp20x: Re-align MFD cell entries
	mfd: axp20x: Add supported cells for AXP803
	mfd: cros_ec_dev: Add missing mfd_remove_devices() call in remove
	mfd: tps65218: Use devm_regmap_add_irq_chip and clean up error path in probe()
	mfd: mc13xxx: Fix a missing check of a register-read failure
	xen/pvcalls: remove set but not used variable 'intf'
	qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page count
	qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrier
	net: hns: Fix use after free identified by SLUB debug
	bpf: Fix [::] -> [::1] rewrite in sys_sendmsg
	selftests/bpf: Test [::] -> [::1] rewrite in sys_sendmsg in test_sock_addr
	watchdog: mt7621_wdt/rt2880_wdt: Fix compilation problem
	net/mlx4: Get rid of page operation after dma_alloc_coherent
	MIPS: ath79: Enable OF serial ports in the default config
	xprtrdma: Double free in rpcrdma_sendctxs_create()
	mlxsw: spectrum_acl: Add cleanup after C-TCAM update error condition
	selftests: forwarding: Add a test for VLAN deletion
	netfilter: nf_tables: fix leaking object reference count
	scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
	scsi: isci: initialize shost fully before calling scsi_add_host()
	include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
	MIPS: jazz: fix 64bit build
	netfilter: nft_flow_offload: Fix reverse route lookup
	bpf: correctly set initial window on active Fast Open sender
	pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
	bpf: fix panic in stack_map_get_build_id() on i386 and arm32
	netfilter: nft_flow_offload: fix interaction with vrf slave device
	RDMA/mthca: Clear QP objects during their allocation
	powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
	acpi/nfit: Fix race accessing memdev in nfit_get_smbios_id()
	net: stmmac: Fix PCI module removal leak
	net: stmmac: dwxgmac2: Only clear interrupts that are active
	net: stmmac: Check if CBS is supported before configuring
	net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
	net: stmmac: Prevent RX starvation in stmmac_napi_poll()
	isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
	scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes
	scsi: ufs: Fix system suspend status
	scsi: qedi: Add ep_state for login completion on un-reachable targets
	scsi: ufs: Fix geometry descriptor size
	scsi: cxgb4i: add wait_for_completion()
	netfilter: nft_flow_offload: fix checking method of conntrack helper
	always clear the X2APIC_ENABLE bit for PV guest
	drm/meson: add missing of_node_put
	drm/amdkfd: Don't assign dGPUs to APU topology devices
	drm/amd/display: fix PME notification not working in RV desktop
	vhost: return EINVAL if iovecs size does not match the message size
	drm/sun4i: backend: add missing of_node_puts
	pvcalls-front: fix potential null dereference
	selftests: tc-testing: drop test on missing tunnel key id
	selftests: tc-testing: fix tunnel_key failure if dst_port is unspecified
	selftests: tc-testing: fix parsing of ife type
	afs: Don't set vnode->cb_s_break in afs_validate()
	afs: Fix key refcounting in file locking code
	bpf: don't assume build-id length is always 20 bytes
	bpf: zero out build_id for BPF_STACK_BUILD_ID_IP
	selftests/bpf: retry tests that expect build-id
	atm: he: fix sign-extension overflow on large shift
	hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table
	leds: lp5523: fix a missing check of return value of lp55xx_read
	bpf: bpf_setsockopt: reset sock dst on SO_MARK changes
	dpaa_eth: NETIF_F_LLTX requires to do our own update of trans_start
	mlxsw: pci: Return error on PCI reset timeout
	net: bridge: Mark FDB entries that were added by user as such
	mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
	selftests: forwarding: Add a test case for externally learned FDB entries
	net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
	isdn: avm: Fix string plus integer warning from Clang
	batman-adv: fix uninit-value in batadv_interface_tx()
	inet_diag: fix reporting cgroup classid and fallback to priority
	ipv6: propagate genlmsg_reply return code
	net: ena: fix race between link up and device initalization
	net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
	net/mlx5e: Don't overwrite pedit action when multiple pedit used
	net/packet: fix 4gb buffer limit due to overflow check
	net: sfp: do not probe SFP module before we're attached
	sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
	sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
	team: avoid complex list operations in team_nl_cmd_options_set()
	Revert "socket: fix struct ifreq size in compat ioctl"
	Revert "kill dev_ifsioc()"
	net: socket: fix SIOCGIFNAME in compat
	net: socket: make bond ioctls go through compat_ifreq_ioctl()
	geneve: should not call rt6_lookup() when ipv6 was disabled
	sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
	net_sched: fix a race condition in tcindex_destroy()
	net_sched: fix a memory leak in cls_tcindex
	net_sched: fix two more memory leaks in cls_tcindex
	net/mlx5e: XDP, fix redirect resources availability check
	RDMA/srp: Rework SCSI device reset handling
	KEYS: user: Align the payload buffer
	KEYS: always initialize keyring_index_key::desc_len
	parisc: Fix ptrace syscall number modification
	ARCv2: Enable unaligned access in early ASM code
	ARC: U-boot: check arguments paranoidly
	ARC: define ARCH_SLAB_MINALIGN = 8
	drm/amdgpu: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
	gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime
	drm/i915/fbdev: Actually configure untiled displays
	drm/amd/display: Fix MST reboot/poweroff sequence
	mac80211: allocate tailroom for forwarded mesh packets
	kvm: x86: Return LA57 feature based on hardware capability
	net: validate untrusted gso packets without csum offload
	net: avoid false positives in untrusted gso validation
	staging: erofs: fix a bug when appling cache strategy
	staging: erofs: complete error handing of z_erofs_do_read_page
	staging: erofs: replace BUG_ON with DBG_BUGON in data.c
	staging: erofs: drop multiref support temporarily
	staging: erofs: remove the redundant d_rehash() for the root dentry
	staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup
	staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'
	staging: erofs: add a full barrier in erofs_workgroup_unfreeze
	staging: erofs: {dir,inode,super}.c: rectify BUG_ONs
	staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs
	staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs
	Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
	netfilter: nf_tables: fix flush after rule deletion in the same batch
	netfilter: nft_compat: use-after-free when deleting targets
	netfilter: ipv6: Don't preserve original oif for loopback address
	netfilter: nfnetlink_osf: add missing fmatch check
	netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put()
	udlfb: handle unplug properly
	pinctrl: max77620: Use define directive for max77620_pinconf_param values
	net: phylink: avoid resolving link state too early
	Linux 4.19.26

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-27 10:23:18 +01:00
Ralph Campbell
c7ddb2689d numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES
commit 050c17f239fd53adb55aa768d4f41bc76c0fe045 upstream.

The system call, get_mempolicy() [1], passes an unsigned long *nodemask
pointer and an unsigned long maxnode argument which specifies the length
of the user's nodemask array in bits (which is rounded up).  The manual
page says that if the maxnode value is too small, get_mempolicy will
return EINVAL but there is no system call to return this minimum value.
To determine this value, some programs search /proc/<pid>/status for a
line starting with "Mems_allowed:" and use the number of digits in the
mask to determine the minimum value.  A recent change to the way this line
is formatted [2] causes these programs to compute a value less than
MAX_NUMNODES so get_mempolicy() returns EINVAL.

Change get_mempolicy(), the older compat version of get_mempolicy(), and
the copy_nodes_to_user() function to use nr_node_ids instead of
MAX_NUMNODES, thus preserving the defacto method of computing the minimum
size for the nodemask array and the maxnode argument.

[1] http://man7.org/linux/man-pages/man2/get_mempolicy.2.html
[2] https://lore.kernel.org/lkml/1545405631-6808-1-git-send-email-longman@redhat.com

Link: http://lkml.kernel.org/r/20190211180245.22295-1-rcampbell@nvidia.com
Fixes: 4fb8e5b89bcbbbb ("include/linux/nodemask.h: use nr_node_ids (not MAX_NUMNODES) in __nodemask_pr_numnodes()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-27 10:08:50 +01:00
Greg Kroah-Hartman
cca7d2df6d This is the 4.19.24 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxtHSUACgkQONu9yGCS
 aT6WIw//Xdd2P7aIWSDnmS6L+RlEtdz2hnO07O2lMVIhsH1MgetAk0nJbP8b9sYp
 zJY4kW6gEue1/J7vHIiTnDpnhaylxESOxOb49O/E95bnPIdhE9EqR1aiMmo8R5rU
 8F2lH6kZncAawZ/7WGnQdHqukO9GtwFO0hwtsDT5RcwiIPe/fHkIpp1QfN21ntib
 nV7xQOfDFEMVcbyzAFa+59A4UUQSWKTkld9tcJFG62pjcN9DlpPej86M2qy9mG8U
 0d7hEkYGqHthh092TAxgobsJDq5j0jUUVjK80/Fq0TKe/OSvsgQ8MRBryVsgBf8J
 hJBCLHay4Ps3ONtNI3IXDVZu9dgdL8UynBEfbWdgJHlGgKJEvMGxFCckzI8OP3yr
 HJtFm6gG24kySFt+QnvwHg1/zwAho5FDBMfx02RkukJwDiN16DZSbKRimrTVAioV
 bBTmL+JyoCBLYGfUzq4gM1+3L5TRyMBqjgcjLvXWuXF18fdpVdb0exKlNtLHMUf6
 aqzhroF90sn5vUybpw02ouduxmRPWDaDel7ArA3zb16Dp+ODCxYtpkV7dmRBjPVK
 YxwweO604Rd2musacQkVAB5JkshjPdV7WA/FmhJf7Fn1/CaaXliAVrfMNbEGW15a
 NFQvTMrJva2pihXYJY4p/UfhnsBmxXmY30wYlLX2VxRFb+qoy3Y=
 =MO0g
 -----END PGP SIGNATURE-----

Merge 4.19.24 into android-4.19

Changes in 4.19.24
	dt-bindings: eeprom: at24: add "atmel,24c2048" compatible string
	eeprom: at24: add support for 24c2048
	blk-mq: fix a hung issue when fsync
	ARM: 8789/1: signal: copy registers using __copy_to_user()
	ARM: 8790/1: signal: always use __copy_to_user to save iwmmxt context
	ARM: 8791/1: vfp: use __copy_to_user() when saving VFP state
	ARM: 8792/1: oabi-compat: copy oabi events using __copy_to_user()
	ARM: 8793/1: signal: replace __put_user_error with __put_user
	ARM: 8794/1: uaccess: Prevent speculative use of the current addr_limit
	ARM: 8795/1: spectre-v1.1: use put_user() for __put_user()
	ARM: 8796/1: spectre-v1,v1.1: provide helpers for address sanitization
	ARM: 8797/1: spectre-v1.1: harden __copy_to_user
	ARM: 8810/1: vfp: Fix wrong assignement to ufp_exc
	ARM: make lookup_processor_type() non-__init
	ARM: split out processor lookup
	ARM: clean up per-processor check_bugs method call
	ARM: add PROC_VTABLE and PROC_TABLE macros
	ARM: spectre-v2: per-CPU vtables to work around big.Little systems
	ARM: ensure that processor vtables is not lost after boot
	ARM: fix the cockup in the previous patch
	drm/amdgpu/sriov:Correct pfvf exchange logic
	ACPI: NUMA: Use correct type for printing addresses on i386-PAE
	perf report: Fix wrong iteration count in --branch-history
	perf test shell: Use a fallback to get the pathname in vfs_getname
	tools uapi: fix RISC-V 64-bit support
	riscv: fix trace_sys_exit hook
	cpufreq: check if policy is inactive early in __cpufreq_get()
	drm/bridge: tc358767: add bus flags
	drm/bridge: tc358767: add defines for DP1_SRCCTRL & PHY_2LANE
	drm/bridge: tc358767: fix single lane configuration
	drm/bridge: tc358767: fix initial DP0/1_SRCCTRL value
	drm/bridge: tc358767: reject modes which require too much BW
	drm/bridge: tc358767: fix output H/V syncs
	nvme-pci: use the same attributes when freeing host_mem_desc_bufs.
	nvme-pci: fix out of bounds access in nvme_cqe_pending
	nvme-multipath: zero out ANA log buffer
	nvme: pad fake subsys NQN vid and ssvid with zeros
	drm/amdgpu: set WRITE_BURST_LENGTH to 64B to workaround SDMA1 hang
	ARM: dts: da850-evm: Correct the audio codec regulators
	ARM: dts: da850-evm: Correct the sound card name
	ARM: dts: da850-lcdk: Correct the audio codec regulators
	ARM: dts: da850-lcdk: Correct the sound card name
	ARM: dts: kirkwood: Fix polarity of GPIO fan lines
	gpio: pl061: handle failed allocations
	drm/nouveau: Don't disable polling in fallback mode
	drm/nouveau/falcon: avoid touching registers if engine is off
	cifs: Limit memory used by lock request calls to a page
	kvm: sev: Fail KVM_SEV_INIT if already initialized
	CIFS: Do not assume one credit for async responses
	gpio: mxc: move gpio noirq suspend/resume to syscore phase
	Revert "Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G"
	Input: elan_i2c - add ACPI ID for touchpad in Lenovo V330-15ISK
	ARM: OMAP5+: Fix inverted nirq pin interrupts with irq_set_type
	perf/core: Fix impossible ring-buffer sizes warning
	perf/x86: Add check_period PMU callback
	ALSA: hda - Add quirk for HP EliteBook 840 G5
	ALSA: usb-audio: Fix implicit fb endpoint setup by quirk
	ASoC: hdmi-codec: fix oops on re-probe
	tools uapi: fix Alpha support
	riscv: Add pte bit to distinguish swap from invalid
	x86/kvm/nVMX: read from MSR_IA32_VMX_PROCBASED_CTLS2 only when it is available
	kvm: vmx: Fix entry number check for add_atomic_switch_msr()
	mmc: sunxi: Filter out unsupported modes declared in the device tree
	mmc: block: handle complete_work on separate workqueue
	Input: bma150 - register input device after setting private data
	Input: elantech - enable 3rd button support on Fujitsu CELSIUS H780
	Revert "nfsd4: return default lease period"
	Revert "mm: don't reclaim inodes with many attached pages"
	Revert "mm: slowly shrink slabs with a relatively small number of objects"
	alpha: fix page fault handling for r16-r18 targets
	alpha: Fix Eiger NR_IRQS to 128
	s390/zcrypt: fix specification exception on z196 during ap probe
	tracing/uprobes: Fix output for multiple string arguments
	x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls
	scsi: sd: fix entropy gathering for most rotational disks
	signal: Restore the stop PTRACE_EVENT_EXIT
	md/raid1: don't clear bitmap bits on interrupted recovery.
	x86/a.out: Clear the dump structure initially
	dm crypt: don't overallocate the integrity tag space
	dm thin: fix bug where bio that overwrites thin block ignores FUA
	drm: Use array_size() when creating lease
	drm/vkms: Fix license inconsistent
	drm/i915: Block fbdev HPD processing during suspend
	drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set
	mm: proc: smaps_rollup: fix pss_locked calculation
	Linux 4.19.24

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-20 10:37:09 +01:00
Dave Chinner
657fbf79a8 Revert "mm: slowly shrink slabs with a relatively small number of objects"
commit a9a238e83fbb0df31c3b9b67003f8f9d1d1b6c96 upstream.

This reverts commit 172b06c32b ("mm: slowly shrink slabs with a
relatively small number of objects").

This change changes the agressiveness of shrinker reclaim, causing small
cache and low priority reclaim to greatly increase scanning pressure on
small caches.  As a result, light memory pressure has a disproportionate
affect on small caches, and causes large caches to be reclaimed much
faster than previously.

As a result, it greatly perturbs the delicate balance of the VFS caches
(dentry/inode vs file page cache) such that the inode/dentry caches are
reclaimed much, much faster than the page cache and this drives us into
several other caching imbalance related problems.

As such, this is a bad change and needs to be reverted.

[ Needs some massaging to retain the later seekless shrinker
  modifications.]

Link: http://lkml.kernel.org/r/20190130041707.27750-3-david@fromorbit.com
Fixes: 172b06c32b ("mm: slowly shrink slabs with a relatively small number of objects")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-20 10:25:47 +01:00
Greg Kroah-Hartman
6e0411bdc2 This is the 4.19.21 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxjFL8ACgkQONu9yGCS
 aT643xAAk+mnsrOfU6/LBFjcJUUUYohK01UAU+PRvTjy4uH8rrA4G01xvp11ftIu
 jjEikA2582ScR5a2Ww+E4SfqOdKT4z9hOGLTnyI0P4xN9jeVidvu9+C90AYyBYhi
 orHm1osVQIj6n9+OQ5db+DzZYbZLbyfCoqNXbq9EoLvNRS3FUUH0y2VXqcz9Ghcj
 obpdHTVMKRaFkRWdCglo+3hSpoKrncSVpKrwUXR18GCt8jjZjj39kI9t6UoGNfc7
 nN3GOd26U1tpGo6ShZJYu6aPjV+zoYNlsg1o2zn9qJANIdulYe30vqNhWCeJ+/T7
 WcT3EHv4pPEO3Lvgfp+l10Nc6IbYdJEFUpAP3CvfP+MvRfKvz8Vo3Nm/BQlr20+q
 +MUYJb+wxhlHPRLV192XbnYFkEzZg7vzymoMPL034XheAkOkPbOK0IIVo41p5Rai
 LxmOdvhzfAktbtD/VWnLTUbexjs2EJ05bvZRjdPKKIMBNKnAWz4ux3KcHxpdsUF8
 KMCKwpJE8KDM4uiaKVdyfMhFeIg37pmy+7Uv9cUFjWwtsL3K+CiAXg8uaSvnajKr
 bOhwbFIgxoOI9VRBK8M0wvzouphA0miVbxOY81sdfbMeWNpbCLdb968AFz25llta
 rVlWp7bSAUQTsvTkVBv6mrTKPwzO4jnfHYN8yj5gO7pr5fmC2ig=
 =L+Mp
 -----END PGP SIGNATURE-----

Merge 4.19.21 into android-4.19

Changes in 4.19.21
	devres: Align data[] to ARCH_KMALLOC_MINALIGN
	drm/bufs: Fix Spectre v1 vulnerability
	staging: iio: adc: ad7280a: handle error from __ad7280_read32()
	drm/vgem: Fix vgem_init to get drm device available.
	pinctrl: bcm2835: Use raw spinlock for RT compatibility
	ASoC: Intel: mrfld: fix uninitialized variable access
	gpiolib: Fix possible use after free on label
	drm/sun4i: Initialize registers in tcon-top driver
	genirq/affinity: Spread IRQs to all available NUMA nodes
	gpu: ipu-v3: image-convert: Prevent race between run and unprepare
	nds32: Fix gcc 8.0 compiler option incompatible.
	wil6210: fix reset flow for Talyn-mb
	wil6210: fix memory leak in wil_find_tx_bcast_2
	ath10k: assign 'n_cipher_suites' for WCN3990
	ath9k: dynack: use authentication messages for 'late' ack
	scsi: lpfc: Correct LCB RJT handling
	scsi: mpt3sas: Call sas_remove_host before removing the target devices
	scsi: lpfc: Fix LOGO/PLOGI handling when triggerd by ABTS Timeout event
	ARM: 8808/1: kexec:offline panic_smp_self_stop CPU
	clk: boston: fix possible memory leak in clk_boston_setup()
	dlm: Don't swamp the CPU with callbacks queued during recovery
	x86/PCI: Fix Broadcom CNB20LE unintended sign extension (redux)
	powerpc/pseries: add of_node_put() in dlpar_detach_node()
	crypto: aes_ti - disable interrupts while accessing S-box
	drm/vc4: ->x_scaling[1] should never be set to VC4_SCALING_NONE
	serial: fsl_lpuart: clear parity enable bit when disable parity
	ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
	MIPS: Boston: Disable EG20T prefetch
	dpaa2-ptp: defer probe when portal allocation failed
	iwlwifi: fw: do not set sgi bits for HE connection
	staging:iio:ad2s90: Make probe handle spi_setup failure
	fpga: altera-cvp: Fix registration for CvP incapable devices
	Tools: hv: kvp: Fix a warning of buffer overflow with gcc 8.0.1
	fpga: altera-cvp: fix 'bad IO access' on x86_64
	vbox: fix link error with 'gcc -Og'
	platform/chrome: don't report EC_MKBP_EVENT_SENSOR_FIFO as wakeup
	i40e: prevent overlapping tx_timeout recover
	scsi: hisi_sas: change the time of SAS SSP connection
	staging: iio: ad7780: update voltage on read
	usbnet: smsc95xx: fix rx packet alignment
	drm/rockchip: fix for mailbox read size
	ARM: OMAP2+: hwmod: Fix some section annotations
	drm/amd/display: fix gamma not being applied correctly
	drm/amd/display: calculate stream->phy_pix_clk before clock mapping
	bpf: libbpf: retry map creation without the name
	net/mlx5: EQ, Use the right place to store/read IRQ affinity hint
	modpost: validate symbol names also in find_elf_symbol
	perf tools: Add Hygon Dhyana support
	soc/tegra: Don't leak device tree node reference
	media: rc: ensure close() is called on rc_unregister_device
	media: video-i2c: avoid accessing released memory area when removing driver
	media: mtk-vcodec: Release device nodes in mtk_vcodec_init_enc_pm()
	staging: erofs: fix the definition of DBG_BUGON
	clk: meson: meson8b: do not use cpu_div3 for cpu_scale_out_sel
	clk: meson: meson8b: fix the width of the cpu_scale_div clock
	clk: meson: meson8b: mark the CPU clock as CLK_IS_CRITICAL
	ptp: Fix pass zero to ERR_PTR() in ptp_clock_register
	dmaengine: xilinx_dma: Remove __aligned attribute on zynqmp_dma_desc_ll
	powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* sections explicitly
	iio: adc: meson-saradc: check for devm_kasprintf failure
	iio: adc: meson-saradc: fix internal clock names
	iio: accel: kxcjk1013: Add KIOX010A ACPI Hardware-ID
	media: adv*/tc358743/ths8200: fill in min width/height/pixelclock
	ACPI: SPCR: Consider baud rate 0 as preconfigured state
	staging: pi433: fix potential null dereference
	f2fs: move dir data flush to write checkpoint process
	f2fs: fix race between write_checkpoint and write_begin
	f2fs: fix wrong return value of f2fs_acl_create
	i2c: sh_mobile: add support for r8a77990 (R-Car E3)
	arm64: io: Ensure calls to delay routines are ordered against prior readX()
	net: aquantia: return 'err' if set MPI_DEINIT state fails
	sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN
	soc: bcm: brcmstb: Don't leak device tree node reference
	nfsd4: fix crash on writing v4_end_grace before nfsd startup
	drm: Clear state->acquire_ctx before leaving drm_atomic_helper_commit_duplicated_state()
	perf: arm_spe: handle devm_kasprintf() failure
	arm64: io: Ensure value passed to __iormb() is held in a 64-bit register
	Thermal: do not clear passive state during system sleep
	thermal: Fix locking in cooling device sysfs update cur_state
	firmware/efi: Add NULL pointer checks in efivars API functions
	s390/zcrypt: improve special ap message cmd handling
	mt76x0: dfs: fix IBI_R11 configuration on non-radar channels
	arm64: ftrace: don't adjust the LR value
	drm/v3d: Fix prime imports of buffers from other drivers.
	ARM: dts: mmp2: fix TWSI2
	ARM: dts: aspeed: add missing memory unit-address
	x86/fpu: Add might_fault() to user_insn()
	media: i2c: TDA1997x: select CONFIG_HDMI
	media: DaVinci-VPBE: fix error handling in vpbe_initialize()
	smack: fix access permissions for keyring
	xtensa: xtfpga.dtsi: fix dtc warnings about SPI
	usb: dwc3: Correct the logic for checking TRB full in __dwc3_prepare_one_trb()
	usb: dwc2: Disable power down feature on Samsung SoCs
	usb: hub: delay hub autosuspend if USB3 port is still link training
	timekeeping: Use proper seqcount initializer
	usb: mtu3: fix the issue about SetFeature(U1/U2_Enable)
	clk: sunxi-ng: a33: Set CLK_SET_RATE_PARENT for all audio module clocks
	media: imx274: select REGMAP_I2C
	drm/amdgpu/powerplay: fix clock stretcher limits on polaris (v2)
	tipc: fix node keep alive interval calculation
	driver core: Move async_synchronize_full call
	kobject: return error code if writing /sys/.../uevent fails
	IB/hfi1: Unreserve a reserved request when it is completed
	usb: dwc3: trace: add missing break statement to make compiler happy
	gpio: mt7621: report failure of devm_kasprintf()
	gpio: mt7621: pass mediatek_gpio_bank_probe() failure up the stack
	pinctrl: sx150x: handle failure case of devm_kstrdup
	iommu/amd: Fix amd_iommu=force_isolation
	ARM: dts: Fix OMAP4430 SDP Ethernet startup
	mips: bpf: fix encoding bug for mm_srlv32_op
	media: coda: fix H.264 deblocking filter controls
	ARM: dts: Fix up the D-Link DIR-685 MTD partition info
	watchdog: renesas_wdt: don't set divider while watchdog is running
	ARM: dts: imx51-zii-rdu1: Do not specify "power-gpio" for hpa1
	usb: dwc3: gadget: Disable CSP for stream OUT ep
	iommu/arm-smmu-v3: Avoid memory corruption from Hisilicon MSI payloads
	iommu/arm-smmu: Add support for qcom,smmu-v2 variant
	iommu/arm-smmu-v3: Use explicit mb() when moving cons pointer
	sata_rcar: fix deferred probing
	clk: imx6sl: ensure MMDC CH0 handshake is bypassed
	platform/x86: mlx-platform: Fix tachometer registers
	cpuidle: big.LITTLE: fix refcount leak
	OPP: Use opp_table->regulators to verify no regulator case
	tee: optee: avoid possible double list_del()
	drm/msm/dsi: fix dsi clock names in DSI 10nm PLL driver
	drm/msm: dpu: Only check flush register against pending flushes
	lightnvm: pblk: fix resubmission of overwritten write err lbas
	lightnvm: pblk: add lock protection to list operations
	i2c-axxia: check for error conditions first
	phy: sun4i-usb: add support for missing USB PHY index
	mlxsw: spectrum_acl: Limit priority value
	udf: Fix BUG on corrupted inode
	switchtec: Fix SWITCHTEC_IOCTL_EVENT_IDX_ALL flags overwrite
	selftests/bpf: use __bpf_constant_htons in test_prog.c
	ARM: pxa: avoid section mismatch warning
	ASoC: fsl: Fix SND_SOC_EUKREA_TLV320 build error on i.MX8M
	KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
	mmc: bcm2835: Recover from MMC_SEND_EXT_CSD
	mmc: bcm2835: reset host on timeout
	mmc: meson-mx-sdio: check devm_kasprintf for failure
	memstick: Prevent memstick host from getting runtime suspended during card detection
	mmc: sdhci-of-esdhc: Fix timeout checks
	mmc: sdhci-omap: Fix timeout checks
	mmc: sdhci-xenon: Fix timeout checks
	mmc: jz4740: Get CD/WP GPIOs from descriptors
	usb: renesas_usbhs: add support for RZ/G2E
	btrfs: harden agaist duplicate fsid on scanned devices
	serial: sh-sci: Fix locking in sci_submit_rx()
	serial: sh-sci: Resume PIO in sci_rx_interrupt() on DMA failure
	tty: serial: samsung: Properly set flags in autoCTS mode
	perf test: Fix perf_event_attr test failure
	perf dso: Fix unchecked usage of strncpy()
	perf header: Fix unchecked usage of strncpy()
	btrfs: use tagged writepage to mitigate livelock of snapshot
	perf probe: Fix unchecked usage of strncpy()
	i2c: sh_mobile: Add support for r8a774c0 (RZ/G2E)
	bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.
	tools/power/x86/intel_pstate_tracer: Fix non root execution for post processing a trace file
	livepatch: check kzalloc return values
	arm64: KVM: Skip MMIO insn after emulation
	usb: musb: dsps: fix otg state machine
	usb: musb: dsps: fix runtime pm for peripheral mode
	perf header: Fix up argument to ctime()
	perf tools: Cast off_t to s64 to avoid warning on bionic libc
	percpu: convert spin_lock_irq to spin_lock_irqsave.
	net: hns3: fix incomplete uninitialization of IRQ in the hns3_nic_uninit_vector_data()
	drm/amd/display: Add retry to read ddc_clock pin
	Bluetooth: hci_bcm: Handle deferred probing for the clock supply
	drm/amd/display: fix YCbCr420 blank color
	powerpc/uaccess: fix warning/error with access_ok()
	mac80211: fix radiotap vendor presence bitmap handling
	xfrm6_tunnel: Fix spi check in __xfrm6_tunnel_alloc_spi
	mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG
	scsi: smartpqi: correct host serial num for ssa
	scsi: smartpqi: correct volume status
	scsi: smartpqi: increase fw status register read timeout
	cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()
	net: hns3: add max vector number check for pf
	powerpc/perf: Fix thresholding counter data for unknown type
	iwlwifi: mvm: fix setting HE ppe FW config
	powerpc/powernv/ioda: Allocate indirect TCE levels of cached userspace addresses on demand
	mlx5: update timecounter at least twice per counter overflow
	drbd: narrow rcu_read_lock in drbd_sync_handshake
	drbd: disconnect, if the wrong UUIDs are attached on a connected peer
	drbd: skip spurious timeout (ping-timeo) when failing promote
	drbd: Avoid Clang warning about pointless switch statment
	drm/amd/display: validate extended dongle caps
	video: clps711x-fb: release disp device node in probe()
	md: fix raid10 hang issue caused by barrier
	fbdev: fbmem: behave better with small rotated displays and many CPUs
	i40e: define proper net_device::neigh_priv_len
	ice: Do not enable NAPI on q_vectors that have no rings
	igb: Fix an issue that PME is not enabled during runtime suspend
	ACPI/APEI: Clear GHES block_status before panic()
	fbdev: fbcon: Fix unregister crash when more than one framebuffer
	powerpc/mm: Fix reporting of kernel execute faults on the 8xx
	pinctrl: meson: meson8: fix the GPIO function for the GPIOAO pins
	pinctrl: meson: meson8b: fix the GPIO function for the GPIOAO pins
	KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported
	powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
	kvm: Change offset in kvm_write_guest_offset_cached to unsigned
	NFS: nfs_compare_mount_options always compare auth flavors.
	perf build: Don't unconditionally link the libbfd feature test to -liberty and -lz
	hwmon: (lm80) fix a missing check of the status of SMBus read
	hwmon: (lm80) fix a missing check of bus read in lm80 probe
	seq_buf: Make seq_buf_puts() null-terminate the buffer
	crypto: ux500 - Use proper enum in cryp_set_dma_transfer
	crypto: ux500 - Use proper enum in hash_set_dma_transfer
	MIPS: ralink: Select CONFIG_CPU_MIPSR2_IRQ_VI on MT7620/8
	cifs: check ntwrk_buf_start for NULL before dereferencing it
	f2fs: fix use-after-free issue when accessing sbi->stat_info
	um: Avoid marking pages with "changed protection"
	niu: fix missing checks of niu_pci_eeprom_read
	f2fs: fix sbi->extent_list corruption issue
	cgroup: fix parsing empty mount option string
	perf python: Do not force closing original perf descriptor in evlist.get_pollfd()
	scripts/decode_stacktrace: only strip base path when a prefix of the path
	arch/sh/boards/mach-kfr2r09/setup.c: fix struct mtd_oob_ops build warning
	ocfs2: don't clear bh uptodate for block read
	ocfs2: improve ocfs2 Makefile
	mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
	zram: fix lockdep warning of free block handling
	isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw()
	gdrom: fix a memory leak bug
	fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address()
	block/swim3: Fix -EBUSY error when re-opening device after unmount
	thermal: bcm2835: enable hwmon explicitly
	kdb: Don't back trace on a cpu that didn't round up
	PCI: imx: Enable MSI from downstream components
	thermal: generic-adc: Fix adc to temp interpolation
	HID: lenovo: Add checks to fix of_led_classdev_register
	arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
	kernel/hung_task.c: break RCU locks based on jiffies
	proc/sysctl: fix return error for proc_doulongvec_minmax()
	kernel/hung_task.c: force console verbose before panic
	fs/epoll: drop ovflist branch prediction
	exec: load_script: don't blindly truncate shebang string
	kernel/kcov.c: mark write_comp_data() as notrace
	scripts/gdb: fix lx-version string output
	xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
	xfs: cancel COW blocks before swapext
	xfs: Fix error code in 'xfs_ioc_getbmap()'
	xfs: fix overflow in xfs_attr3_leaf_verify
	xfs: fix shared extent data corruption due to missing cow reservation
	xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
	xfs: delalloc -> unwritten COW fork allocation can go wrong
	fs/xfs: fix f_ffree value for statfs when project quota is set
	xfs: fix PAGE_MASK usage in xfs_free_file_space
	xfs: fix inverted return from xfs_btree_sblock_verify_crc
	thermal: hwmon: inline helpers when CONFIG_THERMAL_HWMON is not set
	dccp: fool proof ccid_hc_[rt]x_parse_options()
	enic: fix checksum validation for IPv6
	lib/test_rhashtable: Make test_insert_dup() allocate its hash table dynamically
	net: dp83640: expire old TX-skb
	net: dsa: Fix lockdep false positive splat
	net: dsa: Fix NULL checking in dsa_slave_set_eee()
	net: dsa: mv88e6xxx: Fix counting of ATU violations
	net: dsa: slave: Don't propagate flag changes on down slave interfaces
	net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
	net: systemport: Fix WoL with password after deep sleep
	rds: fix refcount bug in rds_sock_addref
	Revert "net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x"
	rxrpc: bad unlock balance in rxrpc_recvmsg
	sctp: check and update stream->out_curr when allocating stream_out
	sctp: walk the list of asoc safely
	skge: potential memory corruption in skge_get_regs()
	virtio_net: Account for tx bytes and packets on sending xdp_frames
	net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance
	xfs: eof trim writeback mapping as soon as it is cached
	ALSA: compress: Fix stop handling on compressed capture streams
	ALSA: usb-audio: Add support for new T+A USB DAC
	ALSA: hda - Serialize codec registrations
	ALSA: hda/realtek - Fix lose hp_pins for disable auto mute
	ALSA: hda/realtek - Use a common helper for hp pin reference
	ALSA: hda/realtek - Headset microphone support for System76 darp5
	fuse: call pipe_buf_release() under pipe lock
	fuse: decrement NR_WRITEBACK_TEMP on the right page
	fuse: handle zero sized retrieve correctly
	HID: debug: fix the ring buffer implementation
	dmaengine: bcm2835: Fix interrupt race on RT
	dmaengine: bcm2835: Fix abort of transactions
	dmaengine: imx-dma: fix wrong callback invoke
	futex: Handle early deadlock return correctly
	irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
	usb: phy: am335x: fix race condition in _probe
	usb: dwc3: gadget: Handle 0 xfer length for OUT EP
	usb: gadget: udc: net2272: Fix bitwise and boolean operations
	usb: gadget: musb: fix short isoc packets with inventra dma
	staging: speakup: fix tty-operation NULL derefs
	scsi: cxlflash: Prevent deadlock when adapter probe fails
	scsi: aic94xx: fix module loading
	KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
	kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
	KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
	cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
	perf/x86/intel/uncore: Add Node ID mask
	x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
	perf/core: Don't WARN() for impossible ring-buffer sizes
	perf tests evsel-tp-sched: Fix bitwise operator
	serial: fix race between flush_to_ldisc and tty_open
	serial: 8250_pci: Make PCI class test non fatal
	serial: sh-sci: Do not free irqs that have already been freed
	cacheinfo: Keep the old value if of_property_read_u32 fails
	IB/hfi1: Add limit test for RC/UC send via loopback
	perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu()
	ath9k: dynack: make ewma estimation faster
	ath9k: dynack: check da->enabled first in sampling routines
	Linux 4.19.21

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-12 20:37:21 +01:00
Waiman Long
f73c77535f mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
[ Upstream commit 3c0c12cc8f00ca5f81acb010023b8eb13e9a7004 ]

When CONFIG_KASAN is enabled on large memory SMP systems, the deferrred
pages initialization can take a long time.  Below were the reported init
times on a 8-socket 96-core 4TB IvyBridge system.

  1) Non-debug kernel without CONFIG_KASAN
     [    8.764222] node 1 initialised, 132086516 pages in 7027ms

  2) Debug kernel with CONFIG_KASAN
     [  146.288115] node 1 initialised, 132075466 pages in 143052ms

So the page init time in a debug kernel was 20X of the non-debug kernel.
The long init time can be problematic as the page initialization is done
with interrupt disabled.  In this particular case, it caused the
appearance of following warning messages as well as NMI backtraces of all
the cores that were doing the initialization.

[   68.240049] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   68.241000] rcu: 	25-...0: (100 ticks this GP) idle=b72/1/0x4000000000000000 softirq=915/915 fqs=16252
[   68.241000] rcu: 	44-...0: (95 ticks this GP) idle=49a/1/0x4000000000000000 softirq=788/788 fqs=16253
[   68.241000] rcu: 	54-...0: (104 ticks this GP) idle=03a/1/0x4000000000000000 softirq=721/825 fqs=16253
[   68.241000] rcu: 	60-...0: (103 ticks this GP) idle=cbe/1/0x4000000000000000 softirq=637/740 fqs=16253
[   68.241000] rcu: 	72-...0: (105 ticks this GP) idle=786/1/0x4000000000000000 softirq=536/641 fqs=16253
[   68.241000] rcu: 	84-...0: (99 ticks this GP) idle=292/1/0x4000000000000000 softirq=537/537 fqs=16253
[   68.241000] rcu: 	111-...0: (104 ticks this GP) idle=bde/1/0x4000000000000000 softirq=474/476 fqs=16253
[   68.241000] rcu: 	(detected by 13, t=65018 jiffies, g=249, q=2)

The long init time was mainly caused by the call to kasan_free_pages() to
poison the newly initialized pages.  On a 4TB system, we are talking about
almost 500GB of memory probably on the same node.

In reality, we may not need to poison the newly initialized pages before
they are ever allocated.  So KASAN poisoning of freed pages before the
completion of deferred memory initialization is now disabled.  Those pages
will be properly poisoned when they are allocated or freed after deferred
pages initialization is done.

With this change, the new page initialization time became:

[   21.948010] node 1 initialised, 132075466 pages in 18702ms

This was still about double the non-debug kernel time, but was much
better than before.

Link: http://lkml.kernel.org/r/1544459388-8736-1-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:18 +01:00
Dennis Zhou
7c54ba4a5f percpu: convert spin_lock_irq to spin_lock_irqsave.
[ Upstream commit 6ab7d47bcbf0144a8cb81536c2cead4cde18acfe ]

From Michael Cree:
  "Bisection lead to commit b38d08f318 ("percpu: restructure
   locking") as being the cause of lockups at initial boot on
   the kernel built for generic Alpha.

   On a suggestion by Tejun Heo that:

   So, the only thing I can think of is that it's calling
   spin_unlock_irq() while irq handling isn't set up yet.
   Can you please try the followings?

   1. Convert all spin_[un]lock_irq() to
      spin_lock_irqsave/unlock_irqrestore()."

Fixes: b38d08f318 ("percpu: restructure locking")
Reported-and-tested-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12 19:47:12 +01:00
Greg Kroah-Hartman
c28f73fe42 This is the 4.19.20 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxbC5gACgkQONu9yGCS
 aT4DYQ//Uqm/Q63KQuExgd7W+61FoP4NFHlYXZ31B5Rkydryyk2K5P2ONSdVd5n9
 k3wjzRrxvlPvjOwbh9PHv+pLkGxBDqpT1X8IAXPe36bYUkvXoH71BE4YSRPRUJAf
 sdzw/vs7WE/Kx41iT3SXiQih8ok0y3LoACBKmUsEXoLI1cJZCUnnSFpP++QNe1Iz
 B/y04BigL8R7OWR/jow6OPWe9uXOI8iEe9QKVX26g4oaakzly4vkp6OwROSwM31q
 0wut8jF/AtDcZpZXjJLjDCj10k5DRN8jwGcLD7iZeIKqexOabjUrsvfIHfbpUtXr
 e7pJw2aUM8BFb8Ba2lsB7gkqvdHQohqVKQE4Qy59aPyesm2G5miH4gAbncoixjCa
 u3eQV5ACpFLksUFR4RAMKq+10k7swsutyyJr5vG4qdbRpcTCNJirEwAGGqgI6IEP
 SDqtw6u8gMP8+SicwA9p71Wwntcq9RR6fx0gX/3wi2DQp6F8Txem00SqaciE7uQ1
 uIOUrhcpWzIq4m58SGhgTSQcBkm5qBD5S154/xRKIo0mvME+NwBub/x3fIsixN/u
 AzWQmQPXBajHbYXbKGC7t2jNHkU5d9FedZ4iDmJk/+ZZsWyFByY1bH1cg4Qnq89e
 tDxL114YmSujbZD/mFlbGWcqdmGNT355BmyetKDx6w0rNiU/RBU=
 =oprJ
 -----END PGP SIGNATURE-----

Merge 4.19.20 into android-4.19

Changes in 4.19.20
	Fix "net: ipv4: do not handle duplicate fragments as overlapping"
	drm/msm/gpu: fix building without debugfs
	ipv6: Consider sk_bound_dev_if when binding a socket to an address
	ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
	ipvlan, l3mdev: fix broken l3s mode wrt local routes
	l2tp: copy 4 more bytes to linear part if necessary
	l2tp: fix reading optional fields of L2TPv3
	net: ip_gre: always reports o_key to userspace
	net: ip_gre: use erspan key field for tunnel lookup
	net/mlx4_core: Add masking for a few queries on HCA caps
	netrom: switch to sock timer API
	net/rose: fix NULL ax25_cb kernel panic
	net: set default network namespace in init_dummy_netdev()
	ravb: expand rx descriptor data to accommodate hw checksum
	sctp: improve the events for sctp stream reset
	tun: move the call to tun_set_real_num_queues
	ucc_geth: Reset BQL queue when stopping device
	vhost: fix OOB in get_rx_bufs()
	net: ip6_gre: always reports o_key to userspace
	sctp: improve the events for sctp stream adding
	net/mlx5e: Allow MAC invalidation while spoofchk is ON
	ip6mr: Fix notifiers call on mroute_clean_tables()
	Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
	sctp: set chunk transport correctly when it's a new asoc
	sctp: set flow sport from saddr only when it's 0
	virtio_net: Don't enable NAPI when interface is down
	virtio_net: Don't call free_old_xmit_skbs for xdp_frames
	virtio_net: Fix not restoring real_num_rx_queues
	virtio_net: Fix out of bounds access of sq
	virtio_net: Don't process redirected XDP frames when XDP is disabled
	virtio_net: Use xdp_return_frame to free xdp_frames on destroying vqs
	virtio_net: Differentiate sk_buff and xdp_frame on freeing
	CIFS: Do not count -ENODATA as failure for query directory
	CIFS: Fix trace command logging for SMB2 reads and writes
	CIFS: Do not consider -ENODATA as stat failure for reads
	fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb()
	iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
	selftests/seccomp: Enhance per-arch ptrace syscall skip tests
	NFS: Fix up return value on fatal errors in nfs_page_async_flush()
	ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment
	arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
	arm64: Do not issue IPIs for user executable ptes
	arm64: hyp-stub: Forbid kprobing of the hyp-stub
	arm64: hibernate: Clean the __hyp_text to PoC after resume
	gpio: altera-a10sr: Set proper output level for direction_output
	gpiolib: fix line event timestamps for nested irqs
	gpio: pcf857x: Fix interrupts on multiple instances
	gpio: sprd: Fix the incorrect data register
	gpio: sprd: Fix incorrect irq type setting for the async EIC
	gfs2: Revert "Fix loop in gfs2_rbm_find"
	mmc: bcm2835: Fix DMA channel leak on probe error
	mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay
	ALSA: usb-audio: Add Opus #3 to quirks for native DSD support
	ALSA: hda/realtek - Fixed hp_pin no value
	IB/hfi1: Remove overly conservative VM_EXEC flag check
	platform/x86: asus-nb-wmi: Map 0x35 to KEY_SCREENLOCK
	platform/x86: asus-nb-wmi: Drop mapping of 0x33 and 0x34 scan codes
	mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
	Btrfs: fix deadlock when allocating tree block during leaf/node split
	btrfs: On error always free subvol_name in btrfs_mount
	kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
	mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
	oom, oom_reaper: do not enqueue same task twice
	mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
	mm, oom: fix use-after-free in oom_kill_process
	mm: hwpoison: use do_send_sig_info() instead of force_sig()
	mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
	of: Convert to using %pOFn instead of device_node.name
	of: overlay: add tests to validate kfrees from overlay removal
	of: overlay: add missing of_node_get() in __of_attach_node_sysfs
	of: overlay: use prop add changeset entry for property in new nodes
	of: overlay: do not duplicate properties from overlay for new nodes
	md/raid5: fix 'out of memory' during raid cache recovery
	cifs: Always resolve hostname before reconnecting
	Linux 4.19.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-02-07 08:40:17 +01:00
David Hildenbrand
214dea147f mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
commit e0a352fabce61f730341d119fbedf71ffdb8663f upstream.

We had a race in the old balloon compaction code before b1123ea6d3
("mm: balloon: use general non-lru movable page feature") refactored it
that became visible after backporting 195a8c43e9 ("virtio-balloon:
deflate via a page list") without the refactoring.

The bug existed from commit d6d86c0a7f ("mm/balloon_compaction:
redesign ballooned pages management") till b1123ea6d3 ("mm: balloon:
use general non-lru movable page feature").  d6d86c0a7f
("mm/balloon_compaction: redesign ballooned pages management") was
backported to 3.12, so the broken kernels are stable kernels [3.12 -
4.7].

There was a subtle race between dropping the page lock of the newpage in
__unmap_and_move() and checking for __is_movable_balloon_page(newpage).

Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.

This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With 195a8c43e9 ("virtio-balloon: deflate via a page list")
backported, one would suddenly get corrupted lists in
release_pages_balloon():

- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().

__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable().  So the new check (__PageMovable(newpage)) will still
hold even after newpage was dequeued by virtio-balloon.

If anybody would ever change that special handling, the BUG would be
introduced again.  So instead, make it explicit and use the information
of the original isolated page before migration.

This patch can be backported fairly easy to stable kernels (in contrast
to the refactoring).

Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
Fixes: d6d86c0a7f ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>	[3.12 - 4.7]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:15 +01:00
Naoya Horiguchi
ced41d9d6a mm: hwpoison: use do_send_sig_info() instead of force_sig()
commit 6376360ecbe525a9c17b3d081dfd88ba3e4ed65b upstream.

Currently memory_failure() is racy against process's exiting, which
results in kernel crash by null pointer dereference.

The root cause is that memory_failure() uses force_sig() to forcibly
kill asynchronous (meaning not in the current context) processes.  As
discussed in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM
fixes, this is not a right thing to do.  OOM solves this issue by using
do_send_sig_info() as done in commit d2d393099d ("signal:
oom_kill_task: use SEND_SIG_FORCED instead of force_sig()"), so this
patch is suggesting to do the same for hwpoison.  do_send_sig_info()
properly accesses to siglock with lock_task_sighand(), so is free from
the reported race.

I confirmed that the reported bug reproduces with inserting some delay
in kill_procs(), and it never reproduces with this patch.

Note that memory_failure() can send another type of signal using
force_sig_mceerr(), and the reported race shouldn't happen on it because
force_sig_mceerr() is called only for synchronous processes (i.e.
BUS_MCEERR_AR happens only when some process accesses to the corrupted
memory.)

Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:15 +01:00
Shakeel Butt
b6f534ab69 mm, oom: fix use-after-free in oom_kill_process
commit cefc7ef3c87d02fc9307835868ff721ea12cc597 upstream.

Syzbot instance running on upstream kernel found a use-after-free bug in
oom_kill_process.  On further inspection it seems like the process
selected to be oom-killed has exited even before reaching
read_lock(&tasklist_lock) in oom_kill_process().  More specifically the
tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task()
and the put_task_struct within for_each_thread() frees the tsk and
for_each_thread() tries to access the tsk.  The easiest fix is to do
get/put across the for_each_thread() on the selected task.

Now the next question is should we continue with the oom-kill as the
previously selected task has exited? However before adding more
complexity and heuristics, let's answer why we even look at the children
of oom-kill selected task? The select_bad_process() has already selected
the worst process in the system/memcg.  Due to race, the selected
process might not be the worst at the kill time but does that matter?
The userspace can use the oom_score_adj interface to prefer children to
be killed before the parent.  I looked at the history but it seems like
this is there before git history.

Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com
Fixes: 6b0c81b3be ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:15 +01:00
Oscar Salvador
d9f4d88d56 mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
commit eeb0efd071d821a88da3fbd35f2d478f40d3b2ea upstream.

This is the same sort of error we saw in commit 17e2e7d7e1b8 ("mm,
page_alloc: fix has_unmovable_pages for HugePages").

Gigantic hugepages cross several memblocks, so it can be that the page
we get in scan_movable_pages() is a page-tail belonging to a
1G-hugepage.  If that happens, page_hstate()->size_to_hstate() will
return NULL, and we will blow up in hugepage_migration_supported().

The splat is as follows:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 1350 Comm: bash Tainted: G            E     5.0.0-rc1-mm1-1-default+ #27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:__offline_pages+0x6ae/0x900
  Call Trace:
   memory_subsys_offline+0x42/0x60
   device_offline+0x80/0xa0
   state_store+0xab/0xc0
   kernfs_fop_write+0x102/0x180
   __vfs_write+0x26/0x190
   vfs_write+0xad/0x1b0
   ksys_write+0x42/0x90
   do_syscall_64+0x5b/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) bochs_drm(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) drm(E) virtio_net(E) syscopyarea(E) sysfillrect(E) net_failover(E) sysimgblt(E) pcspkr(E) failover(E) i2c_piix4(E) fb_sys_fops(E) parport_pc(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) scsi_mod(E) autofs4(E)

[akpm@linux-foundation.org: fix brace layout, per David.  Reduce indentation]
Link: http://lkml.kernel.org/r/20190122154407.18417-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:15 +01:00
Tetsuo Handa
7e70ddc332 oom, oom_reaper: do not enqueue same task twice
commit 9bcdeb51bd7d2ae9fe65ea4d60643d2aeef5bfe3 upstream.

Arkadiusz reported that enabling memcg's group oom killing causes
strange memcg statistics where there is no task in a memcg despite the
number of tasks in that memcg is not 0.  It turned out that there is a
bug in wake_oom_reaper() which allows enqueuing same task twice which
makes impossible to decrease the number of tasks in that memcg due to a
refcount leak.

This bug existed since the OOM reaper became invokable from
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,

  T1@P1     |T2@P1     |T3@P1     |OOM reaper
  ----------+----------+----------+------------
                                   # Processing an OOM victim in a different memcg domain.
                        try_charge()
                          mem_cgroup_out_of_memory()
                            mutex_lock(&oom_lock)
             try_charge()
               mem_cgroup_out_of_memory()
                 mutex_lock(&oom_lock)
  try_charge()
    mem_cgroup_out_of_memory()
      mutex_lock(&oom_lock)
                            out_of_memory()
                              oom_kill_process(P1)
                                do_send_sig_info(SIGKILL, @P1)
                                mark_oom_victim(T1@P1)
                                wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                            mutex_unlock(&oom_lock)
                 out_of_memory()
                   mark_oom_victim(T2@P1)
                   wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
                 mutex_unlock(&oom_lock)
      out_of_memory()
        mark_oom_victim(T1@P1)
        wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
      mutex_unlock(&oom_lock)
                                   # Completed processing an OOM victim in a different memcg domain.
                                   spin_lock(&oom_reaper_lock)
                                   # T1P1 is dequeued.
                                   spin_unlock(&oom_reaper_lock)

but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.

Fix this bug using an approach used by commit 855b018325 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task").  As a
side effect of this patch, this patch also avoids enqueuing multiple
threads sharing memory via task_will_free_mem(current) path.

Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
Fixes: af8e15cc85 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: Jay Kamat <jgkamat@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:14 +01:00
Andrea Arcangeli
15033ca6bd mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
commit 1ac25013fb9e4ed595cd608a406191e93520881e upstream.

hugetlb needs the same fix as faultin_nopage (which was applied in
commit 96312e6128 ("mm/gup.c: teach get_user_pages_unlocked to handle
FOLL_NOWAIT")) or KVM hangs because it thinks the mmap_sem was already
released by hugetlb_fault() if it returned VM_FAULT_RETRY, but it wasn't
in the FOLL_NOWAIT case.

Link: http://lkml.kernel.org/r/20190109020203.26669-2-aarcange@redhat.com
Fixes: ce53053ce3 ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-06 17:30:14 +01:00
Greg Kroah-Hartman
18ba00a34e This is the 4.19.19 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxSoGgACgkQONu9yGCS
 aT5IShAAmr4efAHaepYtsEM5q7hpYwdIXauhfmVzYS1KITQ0L+FSO6b6r18/X/Xu
 ZnGr87x9s7rzTwbgW9EK+cPBE7tVI3Tlem1prTSAQLneiv2zN/iTvJigWGUmifB/
 +xldQxiM1S8j6dlVTu0lFl9P8voJ/zFA1II1DV4KJYiPX2lfVis77Tolcd/3TUJ4
 V67abXHeAsLc+bU8kcMxamVievyQndwVlMT4XjStJyl6xy1zDozgiNwphyLqT7yc
 GY0H4jCbFNLfhZlMQgpHanvXzHshbJ4VtMNnjmUApplftzrVf864rgi+sRcoHWK/
 Q6ER8LtgFYoqwG1ZjLUIEMChjhs/Xv+FLHWsCvCIkINyzo0PSgubnbYQVccXkBmB
 XSZ62YTh9sdQtdihWUly0Gr53yMvSn9+ndwJjvBDEErjC9b/D9jOLMcY1L/cCDaQ
 J34dp6ES+6YdQjAu0TEwuuJHMdGR+BvtKgGIL7V6ujxZhuU39eaJ0QLZktsmIO12
 qWQyUKaLA450Qqiqza7foTWAVM6nFu9fd9xsZUKZV6lNnFjKYy/vWjhHsMCR8eKi
 aojGiRVRNnrlIwgk9h7H4EiuRkK3CJFc9jyZhA6u05NLMdIjD9YuLaMLHuSN0R+m
 lSgdmM3dVC4gyosKZHGdwzX/ytHTQiA8QRb4SWEnck7piKyWplg=
 =/rR0
 -----END PGP SIGNATURE-----

Merge 4.19.19 into android-4.19

Changes in 4.19.19
	amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs
	net: bridge: Fix ethernet header pointer before check skb forwardable
	net: Fix usage of pskb_trim_rcsum
	net: phy: marvell: Errata for mv88e6390 internal PHYs
	net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
	net/sched: act_tunnel_key: fix memory leak in case of action replace
	net_sched: refetch skb protocol for each filter
	openvswitch: Avoid OOB read when parsing flow nlattrs
	vhost: log dirty page correctly
	mlxsw: pci: Increase PCI SW reset timeout
	net: ipv4: Fix memory leak in network namespace dismantle
	mlxsw: spectrum_fid: Update dummy FID index
	mlxsw: pci: Ring CQ's doorbell before RDQ's
	net/sched: cls_flower: allocate mask dynamically in fl_change()
	udp: with udp_segment release on error path
	ip6_gre: fix tunnel list corruption for x-netns
	erspan: build the header with the right proto according to erspan_ver
	net: phy: marvell: Fix deadlock from wrong locking
	ip6_gre: update version related info when changing link
	tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state
	mei: me: mark LBG devices as having dma support
	mei: me: add denverton innovation engine device IDs
	USB: leds: fix regression in usbport led trigger
	USB: serial: simple: add Motorola Tetra TPG2200 device id
	USB: serial: pl2303: add new PID to support PL2303TB
	ceph: clear inode pointer when snap realm gets dropped by its inode
	ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages
	ASoC: rt5514-spi: Fix potential NULL pointer dereference
	ASoC: tlv320aic32x4: Kernel OOPS while entering DAPM standby mode
	clk: socfpga: stratix10: fix rate calculation for pll clocks
	clk: socfpga: stratix10: fix naming convention for the fixed-clocks
	inotify: Fix fd refcount leak in inotify_add_watch().
	ALSA: hda/realtek - Fix typo for ALC225 model
	ALSA: hda - Add mute LED support for HP ProBook 470 G5
	ARCv2: lib: memeset: fix doing prefetchw outside of buffer
	ARC: adjust memblock_reserve of kernel memory
	ARC: perf: map generic branches to correct hardware condition
	s390/mm: always force a load of the primary ASCE on context switch
	s390/early: improve machine detection
	s390/smp: fix CPU hotplug deadlock with CPU rescan
	misc: ibmvsm: Fix potential NULL pointer dereference
	char/mwave: fix potential Spectre v1 vulnerability
	mmc: dw_mmc-bluefield: : Fix the license information
	mmc: meson-gx: Free irq in release() callback
	staging: rtl8188eu: Add device code for D-Link DWA-121 rev B1
	tty: Handle problem if line discipline does not have receive_buf
	uart: Fix crash in uart_write and uart_put_char
	tty/n_hdlc: fix __might_sleep warning
	hv_balloon: avoid touching uninitialized struct page during tail onlining
	Drivers: hv: vmbus: Check for ring when getting debug info
	vgacon: unconfuse vc_origin when using soft scrollback
	CIFS: Fix possible hang during async MTU reads and writes
	CIFS: Fix credits calculations for reads with errors
	CIFS: Fix credit calculation for encrypted reads with errors
	CIFS: Do not reconnect TCP session in add_credits()
	smb3: add credits we receive from oplock/break PDUs
	Input: xpad - add support for SteelSeries Stratus Duo
	Input: input_event - provide override for sparc64
	Input: uinput - fix undefined behavior in uinput_validate_absinfo()
	acpi/nfit: Block function zero DSMs
	acpi/nfit: Fix command-supported detection
	scsi: ufs: Use explicit access size in ufshcd_dump_regs
	dm thin: fix passdown_double_checking_shared_status()
	dm crypt: fix parsing of extended IV arguments
	drm/amdgpu: Add APTX quirk for Lenovo laptop
	KVM: x86: Fix single-step debugging
	KVM: x86: Fix PV IPIs for 32-bit KVM host
	KVM: x86: WARN_ONCE if sending a PV IPI returns a fatal error
	kvm: x86/vmx: Use kzalloc for cached_vmcs12
	KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
	x86/pkeys: Properly copy pkey state at fork()
	x86/selftests/pkeys: Fork() to check for state being preserved
	x86/kaslr: Fix incorrect i8254 outb() parameters
	x86/entry/64/compat: Fix stack switching for XEN PV
	posix-cpu-timers: Unbreak timer rearming
	net: sun: cassini: Cleanup license conflict
	irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
	can: dev: __can_get_echo_skb(): fix bogous check for non-existing skb by removing it
	can: bcm: check timer values before ktime conversion
	can: flexcan: fix NULL pointer exception during bringup
	vt: make vt_console_print() compatible with the unicode screen buffer
	vt: always call notifier with the console lock held
	vt: invoke notifier on screen size change
	drm/meson: Fix atomic mode switching regression
	bpf: improve verifier branch analysis
	bpf: add per-insn complexity limit
	bpf: move {prev_,}insn_idx into verifier env
	bpf: move tmp variable into ax register in interpreter
	bpf: enable access to ax register also from verifier rewrite
	bpf: restrict map value pointer arithmetic for unprivileged
	bpf: restrict stack pointer arithmetic for unprivileged
	bpf: restrict unknown scalars of mixed signed bounds for unprivileged
	bpf: fix check_map_access smin_value test when pointer contains offset
	bpf: prevent out of bounds speculation on pointer arithmetic
	bpf: fix sanitation of alu op with pointer / scalar type from different paths
	bpf: fix inner map masking to prevent oob under speculation
	s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU
	nvmet-rdma: Add unlikely for response allocated check
	nvmet-rdma: fix null dereference under heavy load
	Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
	usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
	ide: fix a typo in the settings proc file name
	Input: input_event - fix the CONFIG_SPARC64 mixup
	Linux 4.19.19

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-31 08:29:40 +01:00
Michal Hocko
6bab957396 Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
commit 4aa9fc2a435abe95a1e8d7f8c7b3d6356514b37a upstream.

This reverts commit 2830bf6f05fb3e05bc4743274b806c821807a684.

The underlying assumption that one sparse section belongs into a single
numa node doesn't hold really. Robert Shteynfeld has reported a boot
failure. The boot log was not captured but his memory layout is as
follows:

  Early memory node ranges
    node   1: [mem 0x0000000000001000-0x0000000000090fff]
    node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
    node   1: [mem 0x0000000100000000-0x0000001423ffffff]
    node   0: [mem 0x0000001424000000-0x0000002023ffffff]

This means that node0 starts in the middle of a memory section which is
also in node1.  memmap_init_zone tries to initialize padding of a
section even when it is outside of the given pfn range because there are
code paths (e.g.  memory hotplug) which assume that the full worth of
memory section is always initialized.

In this particular case, though, such a range is already intialized and
most likely already managed by the page allocator.  Scribbling over
those pages corrupts the internal state and likely blows up when any of
those pages gets used.

Reported-by: Robert Shteynfeld <robert.shteynfeld@gmail.com>
Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
Cc: stable@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31 08:14:41 +01:00
Greg Kroah-Hartman
26bf816608 This is the 4.19.18 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlxMGy0ACgkQONu9yGCS
 aT5ppQ/8COjyZg1aTrCrd0ttMHYotw3Lb4B6E/SCf2ub4X38SxGz9irhQ7r2FKdK
 w0ZXlLOF2ddqWe6BUnIfWago4Pk1GBpg3bgnp5XyYTjlJbfI2yZ9ggiO0iNYBPaL
 fN2JwM9eze/7cDlpYbhwGpF4+Wz8wTrzh+NIputcvC6n3SQH/cTGmOUa9rlamQju
 uukkvLanAYY3sqDCl4B415Ds44ROU4filqHYIkvZC81jc3Q0YZ8M7cTmpLcDQKGz
 8Z+Veil07jEM9bF2W8iX79nwxMT+edFC62HMuRCoxJKq+1kccw1TVMWpQ8TWbv13
 zeLOqXxNP6VcNaC251q3QzlInRDp1dtr8KtzA/OG0WFnZBTEDng/iChhiL8qZt0R
 9+Sz7n9uZ5pMRK3tr03Ccjg3AneKWRqad2iaTB/kOwAdu7Uqxz8U9qUuRDFPV7OY
 KTMCCfdS8XpMHl/S+Cvg2dqSNiBEkNmowYO6NvQClG0aoN4/6wH+m2TZ0hCl6PVq
 pNFOTJmp7FOaztEZC4rqW8DoOGeGaNo5DP9A2XKKDR20F7EiAE437ApEQ4p5QGVk
 ek4uslZkwJWU/UOzXRl/Hoz0OlI0ixsdZy1vw88HCl7SD1E7xHJpnRUkOjigTT1Q
 nbCt0Nm/A2+c1tKbzU+PVW8FtIbutZhW1BtrqaIbbHr9NBTICR0=
 =Yg+/
 -----END PGP SIGNATURE-----

Merge 4.19.18 into android-4.19

Changes in 4.19.18
	ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
	mlxsw: spectrum: Disable lag port TX before removing it
	mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
	net: dsa: mv88x6xxx: mv88e6390 errata
	net, skbuff: do not prefer skb allocation fails early
	qmi_wwan: add MTU default to qmap network interface
	r8169: Add support for new Realtek Ethernet
	ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
	net: clear skb->tstamp in bridge forwarding path
	netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets
	gpio: pl061: Move irq_chip definition inside struct pl061
	drm/amd/display: Guard against null stream_state in set_crc_source
	drm/amdkfd: fix interrupt spin lock
	ixgbe: allow IPsec Tx offload in VEPA mode
	platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey
	e1000e: allow non-monotonic SYSTIM readings
	usb: typec: tcpm: Do not disconnect link for self powered devices
	selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
	of: overlay: add missing of_node_put() after add new node to changeset
	writeback: don't decrement wb->refcnt if !wb->bdi
	serial: set suppress_bind_attrs flag only if builtin
	bpf: Allow narrow loads with offset > 0
	ALSA: oxfw: add support for APOGEE duet FireWire
	x86/mce: Fix -Wmissing-prototypes warnings
	MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
	crypto: ecc - regularize scalar for scalar multiplication
	arm64: perf: set suppress_bind_attrs flag to true
	drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
	clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
	samples: bpf: fix: error handling regarding kprobe_events
	usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device
	fpga: altera-cvp: fix probing for multiple FPGAs on the bus
	selinux: always allow mounting submounts
	ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
	scsi: qedi: Check for session online before getting iSCSI TLV data.
	drm/amdgpu: Reorder uvd ring init before uvd resume
	rxe: IB_WR_REG_MR does not capture MR's iova field
	efi/libstub: Disable some warnings for x86{,_64}
	jffs2: Fix use of uninitialized delayed_work, lockdep breakage
	clk: imx: make mux parent strings const
	pstore/ram: Do not treat empty buffers as valid
	media: uvcvideo: Refactor teardown of uvc on USB disconnect
	powerpc/xmon: Fix invocation inside lock region
	powerpc/pseries/cpuidle: Fix preempt warning
	media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
	ASoC: use dma_ops of parent device for acp_audio_dma
	media: venus: core: Set dma maximum segment size
	staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
	net: call sk_dst_reset when set SO_DONTROUTE
	scsi: target: use consistent left-aligned ASCII INQUIRY data
	scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough
	selftests: do not macro-expand failed assertion expressions
	arm64: kasan: Increase stack size for KASAN_EXTRA
	clk: imx6q: reset exclusive gates on init
	arm64: Fix minor issues with the dcache_by_line_op macro
	bpf: relax verifier restriction on BPF_MOV | BPF_ALU
	kconfig: fix file name and line number of warn_ignored_character()
	kconfig: fix memory leak when EOF is encountered in quotation
	mmc: atmel-mci: do not assume idle after atmci_request_end
	btrfs: volumes: Make sure there is no overlap of dev extents at mount time
	btrfs: alloc_chunk: fix more DUP stripe size handling
	btrfs: fix use-after-free due to race between replace start and cancel
	btrfs: improve error handling of btrfs_add_link
	tty/serial: do not free trasnmit buffer page under port lock
	perf intel-pt: Fix error with config term "pt=0"
	perf tests ARM: Disable breakpoint tests 32-bit
	perf svghelper: Fix unchecked usage of strncpy()
	perf parse-events: Fix unchecked usage of strncpy()
	perf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX
	netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set
	netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine
	netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
	x86/topology: Use total_cpus for max logical packages calculation
	dm crypt: use u64 instead of sector_t to store iv_offset
	dm kcopyd: Fix bug causing workqueue stalls
	perf stat: Avoid segfaults caused by negated options
	tools lib subcmd: Don't add the kernel sources to the include path
	dm snapshot: Fix excessive memory usage and workqueue stalls
	perf cs-etm: Correct packets swapping in cs_etm__flush()
	perf tools: Add missing sigqueue() prototype for systems lacking it
	perf tools: Add missing open_memstream() prototype for systems lacking it
	quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
	clocksource/drivers/integrator-ap: Add missing of_node_put()
	dm: Check for device sector overflow if CONFIG_LBDAF is not set
	Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029
	ALSA: bebob: fix model-id of unit for Apogee Ensemble
	sysfs: Disable lockdep for driver bind/unbind files
	IB/usnic: Fix potential deadlock
	scsi: mpt3sas: fix memory ordering on 64bit writes
	scsi: smartpqi: correct lun reset issues
	ath10k: fix peer stats null pointer dereference
	scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
	scsi: megaraid: fix out-of-bound array accesses
	iomap: don't search past page end in iomap_is_partially_uptodate
	ocfs2: fix panic due to unrecovered local alloc
	mm/page-writeback.c: don't break integrity writeback on ->writepage() error
	mm/swap: use nr_node_ids for avail_lists in swap_info_struct
	userfaultfd: clear flag if remap event not enabled
	mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
	iwlwifi: mvm: Send LQ command as async when necessary
	Bluetooth: Fix unnecessary error message for HCI request completion
	ipmi: fix use-after-free of user->release_barrier.rda
	ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
	ipmi: Prevent use-after-free in deliver_response
	ipmi:ssif: Fix handling of multi-part return messages
	ipmi: Don't initialize anything in the core until something uses it
	Linux 4.19.18

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-26 11:58:37 +01:00
Aaron Lu
b0cd52e644 mm/swap: use nr_node_ids for avail_lists in swap_info_struct
[ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ]

Since a2468cc9bf ("swap: choose swap device according to numa node"),
avail_lists field of swap_info_struct is changed to an array with
MAX_NUMNODES elements.  This made swap_info_struct size increased to 40KiB
and needs an order-4 page to hold it.

This is not optimal in that:
1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
  is a waste of memory;
2 It could cause swapon failure if the swap device is swapped on
  after system has been running for a while, due to no order-4
  page is available as pointed out by Vasily Averin.

Solve the above two issues by using nr_node_ids(which is the actual
possible node number the running system has) for avail_lists instead of
MAX_NUMNODES.

nr_node_ids is unknown at compile time so can't be directly used when
declaring this array.  What I did here is to declare avail_lists as zero
element array and allocate space for it when allocating space for
swap_info_struct.  The reason why keep using array but not pointer is
plist_for_each_entry needs the field to be part of the struct, so pointer
will not work.

This patch is on top of Vasily Averin's fix commit.  I think the use of
kvzalloc for swap_info_struct is still needed in case nr_node_ids is
really big on some systems.

Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:43 +01:00
Brian Foster
dc15e3fd3f mm/page-writeback.c: don't break integrity writeback on ->writepage() error
[ Upstream commit 3fa750dcf29e8606e3969d13d8e188cc1c0f511d ]

write_cache_pages() is used in both background and integrity writeback
scenarios by various filesystems.  Background writeback is mostly
concerned with cleaning a certain number of dirty pages based on various
mm heuristics.  It may not write the full set of dirty pages or wait for
I/O to complete.  Integrity writeback is responsible for persisting a set
of dirty pages before the writeback job completes.  For example, an
fsync() call must perform integrity writeback to ensure data is on disk
before the call returns.

write_cache_pages() unconditionally breaks out of its processing loop in
the event of a ->writepage() error.  This is fine for background
writeback, which had no strict requirements and will eventually come
around again.  This can cause problems for integrity writeback on
filesystems that might need to clean up state associated with failed page
writeouts.  For example, XFS performs internal delayed allocation
accounting before returning a ->writepage() error, where applicable.  If
the current writeback happens to be associated with an unmount and
write_cache_pages() completes the writeback prematurely due to error, the
filesystem is unmounted in an inconsistent state if dirty+delalloc pages
still exist.

To handle this problem, update write_cache_pages() to always process the
full set of pages for integrity writeback regardless of ->writepage()
errors.  Save the first encountered error and return it to the caller once
complete.  This facilitates XFS (or any other fs that expects integrity
writeback to process the entire set of dirty pages) to clean up its
internal state completely in the event of persistent mapping errors.
Background writeback continues to exit on the first error encountered.

[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:43 +01:00
Greg Kroah-Hartman
976f78d572 This is the 4.19.16 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlw/nGYACgkQONu9yGCS
 aT5/LA//bP/+XrOaB6YIkiM7EfhWTuATY6DOkhwT7kpIgRXMR4FvyTA3o7iEz0DE
 5HfSL2wUpF+UZ0sC8c+zFCaNqhMNTwl95J6w4YI3N4V6IsTWxDYQOosQLU1y11Zw
 w5SV1FXqvKnbPchHehg/toDORs+sryw9QbydTXOPukqEQ1J9Kx8xtcyNivpvccVs
 /Jn+MNwnDZXWgw1gyx4/BcbtSVnu9RgLdtXSyBBUfZmZxy4Tx+e+ckfp+sd0TpE7
 H7QPrMZHZys7EVKfvP1SWOJgStJNGav869Klj8HAZm3rI0R3EhMZBEIxG96HsxFd
 XOqRfn3Yarl0OQHKggRJQi0EbcOAEUAzWgJKxKFaoqBJyYVoQivp3XJvF+2B56Yb
 sg4EISWR2OXdO4ER1eYbPyDL+ZO+P0C5eQ16NRly1PifiUk1iHs1dyGg266GU4Tj
 cHWmdt743nMNCndQ+cUnHAqbJS+UQ6Y/96bOxZlKei93fQfMqynUZBV9FN6DejJt
 mMNqwV0aEEPlTx37rvExrxS30ydYg1lnF9BY7QP8r71RjpdXgB8fjLN3W2S21SWv
 04zMSg9kAKgC3vRDc2vr7nZ9zkeujD/VBVp3HdTLU9gDb1xUL4MqdXNnTiUOzS29
 wBWBi7+uiPhSC282kNM08PE1SDq6WtKU9WixJxLP9jYZccMjJDk=
 =Fi9w
 -----END PGP SIGNATURE-----

Merge 4.19.16 into android-4.19

Changes in 4.19.16
	Btrfs: fix deadlock when using free space tree due to block group creation
	staging: rtl8188eu: Fix module loading from tasklet for CCMP encryption
	staging: rtl8188eu: Fix module loading from tasklet for WEP encryption
	cpufreq: scmi: Fix frequency invariance in slow path
	x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
	ALSA: hda/realtek - Support Dell headset mode for New AIO platform
	ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225
	ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225
	CIFS: Fix adjustment of credits for MTU requests
	CIFS: Do not set credits to 1 if the server didn't grant anything
	CIFS: Do not hide EINTR after sending network packets
	CIFS: Fix credit computation for compounded requests
	cifs: Fix potential OOB access of lock element array
	usb: cdc-acm: send ZLP for Telit 3G Intel based modems
	USB: storage: don't insert sane sense for SPC3+ when bad sense specified
	USB: storage: add quirk for SMI SM3350
	USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB
	slab: alien caches must not be initialized if the allocation of the alien cache failed
	mm/usercopy.c: no check page span for stack objects
	mm, memcg: fix reclaim deadlock with writeback
	ACPI: power: Skip duplicate power resource references in _PRx
	ACPI / PMIC: xpower: Fix TS-pin current-source handling
	ACPI/IORT: Fix rc_dma_get_range()
	i2c: dev: prevent adapter retries and timeout being set as minus value
	mtd: rawnand: qcom: fix memory corruption that causes panic
	vfio/type1: Fix unmap overflow off-by-one
	drm/amdgpu: Add new VegaM pci id
	PCI: dwc: Use interrupt masking instead of disabling
	PCI: dwc: Take lock when ACKing an interrupt
	PCI: dwc: Move interrupt acking into the proper callback
	drm/amd/display: Fix MST dp_blank REG_WAIT timeout
	drm/fb_helper: Allow leaking fbdev smem_start
	drm/fb-helper: Partially bring back workaround for bugs of SDL 1.2
	drm/i915: Unwind failure on pinning the gen7 ppgtt
	drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume()
	drm/amdgpu: Don't fail resume process if resuming atomic state fails
	rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set
	ext4: make sure enough credits are reserved for dioread_nolock writes
	ext4: fix a potential fiemap/page fault deadlock w/ inline_data
	ext4: avoid kernel warning when writing the superblock to a dead device
	ext4: use ext4_write_inode() when fsyncing w/o a journal
	ext4: track writeback errors using the generic tracking infrastructure
	ext4: fix special inode number checks in __ext4_iget()
	mm: page_mapped: don't assume compound page is huge or THP
	sunrpc: use-after-free in svc_process_common()
	KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less
	arm64: compat: Don't pull syscall number from regs in arm_compat_syscall
	Btrfs: fix access to available allocation bits when starting balance
	Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation
	Btrfs: use nofs context when initializing security xattrs to avoid deadlock
	Linux 4.19.16

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-16 22:17:03 +01:00
Jan Stancek
160f79c0a0 mm: page_mapped: don't assume compound page is huge or THP
commit 8ab88c7169b7fba98812ead6524b9d05bc76cf00 upstream.

LTP proc01 testcase has been observed to rarely trigger crashes
on arm64:
    page_mapped+0x78/0xb4
    stable_page_flags+0x27c/0x338
    kpageflags_read+0xfc/0x164
    proc_reg_read+0x7c/0xb8
    __vfs_read+0x58/0x178
    vfs_read+0x90/0x14c
    SyS_read+0x60/0xc0

The issue is that page_mapped() assumes that if compound page is not
huge, then it must be THP.  But if this is 'normal' compound page
(COMPOUND_PAGE_DTOR), then following loop can keep running (for
HPAGE_PMD_NR iterations) until it tries to read from memory that isn't
mapped and triggers a panic:

        for (i = 0; i < hpage_nr_pages(page); i++) {
                if (atomic_read(&page[i]._mapcount) >= 0)
                        return true;
	}

I could replicate this on x86 (v4.20-rc4-98-g60b548237fed) only
with a custom kernel module [1] which:
 - allocates compound page (PAGEC) of order 1
 - allocates 2 normal pages (COPY), which are initialized to 0xff (to
   satisfy _mapcount >= 0)
 - 2 PAGEC page structs are copied to address of first COPY page
 - second page of COPY is marked as not present
 - call to page_mapped(COPY) now triggers fault on access to 2nd COPY
   page at offset 0x30 (_mapcount)

[1] https://github.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c

Fix the loop to iterate for "1 << compound_order" pages.

Kirrill said "IIRC, sound subsystem can producuce custom mapped compound
pages".

Link: http://lkml.kernel.org/r/c440d69879e34209feba21e12d236d06bc0a25db.1543577156.git.jstancek@redhat.com
Fixes: e1534ae950 ("mm: differentiate page_mapped() from page_mapcount() for compound pages")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Debugged-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16 22:04:36 +01:00
Michal Hocko
97b02b6324 mm, memcg: fix reclaim deadlock with writeback
commit 63f3655f950186752236bb88a22f8252c11ce394 upstream.

Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
ext4 writeback

  task1:
    wait_on_page_bit+0x82/0xa0
    shrink_page_list+0x907/0x960
    shrink_inactive_list+0x2c7/0x680
    shrink_node_memcg+0x404/0x830
    shrink_node+0xd8/0x300
    do_try_to_free_pages+0x10d/0x330
    try_to_free_mem_cgroup_pages+0xd5/0x1b0
    try_charge+0x14d/0x720
    memcg_kmem_charge_memcg+0x3c/0xa0
    memcg_kmem_charge+0x7e/0xd0
    __alloc_pages_nodemask+0x178/0x260
    alloc_pages_current+0x95/0x140
    pte_alloc_one+0x17/0x40
    __pte_alloc+0x1e/0x110
    alloc_set_pte+0x5fe/0xc20
    do_fault+0x103/0x970
    handle_mm_fault+0x61e/0xd10
    __do_page_fault+0x252/0x4d0
    do_page_fault+0x30/0x80
    page_fault+0x28/0x30

  task2:
    __lock_page+0x86/0xa0
    mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
    ext4_writepages+0x479/0xd60
    do_writepages+0x1e/0x30
    __writeback_single_inode+0x45/0x320
    writeback_sb_inodes+0x272/0x600
    __writeback_inodes_wb+0x92/0xc0
    wb_writeback+0x268/0x300
    wb_workfn+0xb4/0x390
    process_one_work+0x189/0x420
    worker_thread+0x4e/0x4b0
    kthread+0xe6/0x100
    ret_from_fork+0x41/0x50

He adds
 "task1 is waiting for the PageWriteback bit of the page that task2 has
  collected in mpd->io_submit->io_bio, and tasks2 is waiting for the
  LOCKED bit the page which tasks1 has locked"

More precisely task1 is handling a page fault and it has a page locked
while it charges a new page table to a memcg.  That in turn hits a
memory limit reclaim and the memcg reclaim for legacy controller is
waiting on the writeback but that is never going to finish because the
writeback itself is waiting for the page locked in the #PF path.  So
this is essentially ABBA deadlock:

                                        lock_page(A)
                                        SetPageWriteback(A)
                                        unlock_page(A)
  lock_page(B)
                                        lock_page(B)
  pte_alloc_pne
    shrink_page_list
      wait_on_page_writeback(A)
                                        SetPageWriteback(B)
                                        unlock_page(B)

                                        # flush A, B to clear the writeback

This accumulating of more pages to flush is used by several filesystems
to generate a more optimal IO patterns.

Waiting for the writeback in legacy memcg controller is a workaround for
pre-mature OOM killer invocations because there is no dirty IO
throttling available for the controller.  There is no easy way around
that unfortunately.  Therefore fix this specific issue by pre-allocating
the page table outside of the page lock.  We have that handy
infrastructure for that already so simply reuse the fault-around pattern
which already does this.

There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations
from under a fs page locked but they should be really rare.  I am not
aware of a better solution unfortunately.

[akpm@linux-foundation.org: fix mm/memory.c:__do_fault()]
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@kernel.org: enhance comment, per Johannes]
  Link: http://lkml.kernel.org/r/20181214084948.GA5624@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20181213092221.27270-1-mhocko@kernel.org
Fixes: c3b94f44fc ("memcg: further prevent OOM with too many dirty pages")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Liu Bo <bo.liu@linux.alibaba.com>
Debugged-by: Liu Bo <bo.liu@linux.alibaba.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16 22:04:33 +01:00
Qian Cai
8a4b6e8cb7 mm/usercopy.c: no check page span for stack objects
commit 7bff3c06997374fb9b9991536a547b840549a813 upstream.

It is easy to trigger this with CONFIG_HARDENED_USERCOPY_PAGESPAN=y,

  usercopy: Kernel memory overwrite attempt detected to spans multiple pages (offset 0, size 23)!
  kernel BUG at mm/usercopy.c:102!

For example,

print_worker_info
char name[WQ_NAME_LEN] = { };
char desc[WORKER_DESC_LEN] = { };
  probe_kernel_read(name, wq->name, sizeof(name) - 1);
  probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
    __copy_from_user_inatomic
      check_object_size
        check_heap_object
          check_page_span

This is because on-stack variables could cross PAGE_SIZE boundary, and
failed this check,

if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
	   ((unsigned long)end & (unsigned long)PAGE_MASK)))

ptr = FFFF889007D7EFF8
end = FFFF889007D7F00E

Hence, fix it by checking if it is a stack object first.

[keescook@chromium.org: improve comments after reorder]
  Link: http://lkml.kernel.org/r/20190103165151.GA32845@beast
Link: http://lkml.kernel.org/r/20181231030254.99441-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16 22:04:33 +01:00
Christoph Lameter
f928ca3917 slab: alien caches must not be initialized if the allocation of the alien cache failed
commit 09c2e76ed734a1d36470d257a778aaba28e86531 upstream.

Callers of __alloc_alien() check for NULL.  We must do the same check in
__alloc_alien_cache to avoid NULL pointer dereferences on allocation
failures.

Link: http://lkml.kernel.org/r/010001680f42f192-82b4e12e-1565-4ee0-ae1f-1e98974906aa-000000@email.amazonses.com
Fixes: 49dfc304ba ("slab: use the lock on alien_cache, instead of the lock on array_cache")
Fixes: c8522a3a58 ("Slab: introduce alloc_alien")
Signed-off-by: Christoph Lameter <cl@linux.com>
Reported-by: syzbot+d6ed4ec679652b4fd4e4@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-16 22:04:33 +01:00
Joel Fernandes (Google)
dec031f640 UPSTREAM: mm/memfd: Add an F_SEAL_FUTURE_WRITE seal to memfd
Android uses ashmem for sharing memory regions.  We are looking forward to
migrating all usecases of ashmem to memfd so that we can possibly remove
the ashmem driver in the future from staging while also benefiting from
using memfd and contributing to it.  Note staging drivers are also not ABI
and generally can be removed at anytime.

One of the main usecases Android has is the ability to create a region and
mmap it as writeable, then add protection against making any "future"
writes while keeping the existing already mmap'ed writeable-region active.
This allows us to implement a usecase where receivers of the shared
memory buffer can get a read-only view, while the sender continues to
write to the buffer.  See CursorWindow documentation in Android for more
details:
https://developer.android.com/reference/android/database/CursorWindow

This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active.

A better way to do F_SEAL_FUTURE_WRITE seal was discussed [1] last week
where we don't need to modify core VFS structures to get the same
behavior of the seal. This solves several side-effects pointed by Andy.
self-tests are provided in later patch to verify the expected semantics.

[1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/

[Thanks a lot to Andy for suggestions to improve code]
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: John Stultz <john.stultz@linaro.org>

Change-Id: I6710c045954378f87bfbff6311d372a3b8549064
2019-01-16 09:48:42 -05:00
Joel Fernandes
198aac25b7 Revert "UPSTREAM: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd"
This reverts commit 1dc8ca4429.

Change-Id: I513034073c278e2ae58a53352cc2553a256a7ee0
2019-01-16 09:48:42 -05:00
Joel Fernandes
3a49374afc Revert "UPSTREAM: mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust"
This reverts commit 2e0d7ea44a.

Change-Id: Id8ca9575c75db3eeb06b7aa7217a59c85e55d0ac
2019-01-16 09:48:40 -05:00
Greg Kroah-Hartman
caf54339d3 This is the 4.19.15 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlw6+/8ACgkQONu9yGCS
 aT6VKw/9FUsbfy4MzFMH4XmTn/k9AHhcYdQ+gSEIcJbt/JLT13fU64e/O8QlQ3PF
 5GWNY5ObA+HKlReCufSuW+AuAw5s/FLVaGLn8HZQ/FU27ZgTrGpFjb3vcnYSjsU0
 vurXjstzndiRmpSahNufU6t2X7fkgyd41M94572pyidcT5NcP+ngVICwXtQOsXjH
 QkIaMZHTmr4le0Z1oNvDraNkESJnxo7+D2eJebx5yDReD/Mdm3gAl2q0UkDXpZzk
 qb3tH1oronm7ZfiEBCZYrewxMfz78ugJW3hpOu//JCbrVI2Ja0sBSh3VB6EFceoY
 WI9z8JkZ3xQeLQnCdiabdQ66mGQa9XiLUwj7+sR//P7OduwJEv8HTYpDi8iqA6Vj
 SigQmjEunjSHccqBWaPy1ZMAIXoNWQBC4EJ2erv3pAPyJr2FBw9o2Bmu6JAV18ow
 iX94YnQtllZp8cJsEKEUWEmXZPLcTy6mXLMLoQ922P4p4KRJVQUhde4EeZZLFn27
 6sPwASnrfEW9RS/i1XuxdDPbnMYg6uE0UoRfxp1tAUBKaVArjMglyIAj7t9GA07W
 4480c3AegmDFZ+GxX+w5+duKRZnxBi+sHw8aBbZRi5m9mlxeFCSWSe0hPPRR2LIQ
 fZrFySHmgbl1NtTP4cvZOb7bTxoyfjcIQfiqu7cwNsYGXtbfOuk=
 =A6Ro
 -----END PGP SIGNATURE-----

Merge 4.19.15 into android-4.19

Changes in 4.19.15
	ARM: dts: sun8i: a83t: bananapi-m3: increase vcc-pd voltage to 3.3V
	pinctrl: meson: fix pull enable register calculation
	arm64: dts: mt7622: fix no more console output on rfb1
	powerpc: Fix COFF zImage booting on old powermacs
	powerpc/mm: Fix linux page tables build with some configs
	HID: ite: Add USB id match for another ITE based keyboard rfkill key quirk
	ARM: dts: imx7d-pico: Describe the Wifi clock
	ARM: imx: update the cpu power up timing setting on i.mx6sx
	ARM: dts: imx7d-nitrogen7: Fix the description of the Wifi clock
	IB/mlx5: Block DEVX umem from the non applicable cases
	Input: restore EV_ABS ABS_RESERVED
	powerpc/mm: Fallback to RAM if the altmap is unusable
	drm/amdgpu: Fix DEBUG_LOCKS_WARN_ON(depth <= 0) in amdgpu_ctx.lock
	IB/core: Fix oops in netdev_next_upper_dev_rcu()
	checkstack.pl: fix for aarch64
	xfrm: Fix error return code in xfrm_output_one()
	xfrm: Fix bucket count reported to userspace
	xfrm: Fix NULL pointer dereference in xfrm_input when skb_dst_force clears the dst_entry.
	ieee802154: hwsim: fix off-by-one in parse nested
	netfilter: nf_tables: fix suspicious RCU usage in nft_chain_stats_replace()
	netfilter: seqadj: re-load tcp header pointer after possible head reallocation
	Revert "scsi: qla2xxx: Fix NVMe Target discovery"
	scsi: bnx2fc: Fix NULL dereference in error handling
	Input: omap-keypad - fix idle configuration to not block SoC idle states
	Input: synaptics - enable RMI on ThinkPad T560
	ibmvnic: Convert reset work item mutex to spin lock
	ibmvnic: Fix non-atomic memory allocation in IRQ context
	ieee802154: ca8210: fix possible u8 overflow in ca8210_rx_done
	x86/mm: Fix guard hole handling
	x86/dump_pagetables: Fix LDT remap address marker
	i40e: fix mac filter delete when setting mac address
	ixgbe: Fix race when the VF driver does a reset
	netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel
	netfilter: nat: can't use dst_hold on noref dst
	netfilter: nf_conncount: use rb_link_node_rcu() instead of rb_link_node()
	bnx2x: Clear fip MAC when fcoe offload support is disabled
	bnx2x: Remove configured vlans as part of unload sequence.
	bnx2x: Send update-svid ramrod with retry/poll flags enabled
	scsi: target: iscsi: cxgbit: fix csk leak
	scsi: target: iscsi: cxgbit: add missing spin_lock_init()
	mt76: fix potential NULL pointer dereference in mt76_stop_tx_queues
	x86, hyperv: remove PCI dependency
	drivers: net: xgene: Remove unnecessary forward declarations
	net/tls: Init routines in create_ctx
	w90p910_ether: remove incorrect __init annotation
	net: hns: Incorrect offset address used for some registers.
	net: hns: All ports can not work when insmod hns ko after rmmod.
	net: hns: Some registers use wrong address according to the datasheet.
	net: hns: Fixed bug that netdev was opened twice
	net: hns: Clean rx fbd when ae stopped.
	net: hns: Free irq when exit from abnormal branch
	net: hns: Avoid net reset caused by pause frames storm
	net: hns: Fix ntuple-filters status error.
	net: hns: Add mac pcs config when enable|disable mac
	net: hns: Fix ping failed when use net bridge and send multicast
	mac80211: fix a kernel panic when TXing after TXQ teardown
	SUNRPC: Fix a race with XPRT_CONNECTING
	qed: Fix an error code qed_ll2_start_xmit()
	net: macb: fix random memory corruption on RX with 64-bit DMA
	net: macb: fix dropped RX frames due to a race
	net: macb: add missing barriers when reading descriptors
	lan743x: Expand phy search for LAN7431
	lan78xx: Resolve issue with changing MAC address
	vxge: ensure data0 is initialized in when fetching firmware version information
	nl80211: fix memory leak if validate_pae_over_nl80211() fails
	mac80211: free skb fraglist before freeing the skb
	kbuild: fix false positive warning/error about missing libelf
	m68k: Fix memblock-related crashes
	virtio: fix test build after uio.h change
	lan743x: Remove MAC Reset from initialization
	gpio: mvebu: only fail on missing clk if pwm is actually to be used
	Input: synaptics - enable SMBus for HP EliteBook 840 G4
	net: netxen: fix a missing check and an uninitialized use
	qmi_wwan: Fix qmap header retrieval in qmimux_rx_fixup
	serial/sunsu: fix refcount leak
	auxdisplay: charlcd: fix x/y command parsing
	scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
	scsi: lpfc: do not set queue->page_count to 0 if pc_sli4_params.wqpcnt is invalid
	fork: record start_time late
	zram: fix double free backing device
	hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
	mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
	mm, devm_memremap_pages: kill mapping "System RAM" support
	mm, devm_memremap_pages: fix shutdown handling
	mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support
	mm, hmm: use devm semantics for hmm_devmem_{add, remove}
	mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
	mm, swap: fix swapoff with KSM pages
	memcg, oom: notify on oom killer invocation from the charge path
	sunrpc: fix cache_head leak due to queued request
	sunrpc: use SVC_NET() in svcauth_gss_* functions
	powerpc: remove old GCC version checks
	powerpc: consolidate -mno-sched-epilog into FTRACE flags
	powerpc: avoid -mno-sched-epilog on GCC 4.9 and newer
	powerpc: Disable -Wbuiltin-requires-header when setjmp is used
	kbuild: add -no-integrated-as Clang option unconditionally
	kbuild: consolidate Clang compiler flags
	Makefile: Export clang toolchain variables
	powerpc/boot: Set target when cross-compiling for clang
	raid6/ppc: Fix build for clang
	dma-direct: do not include SME mask in the DMA supported check
	mt76x0: init hw capabilities
	media: cx23885: only reset DMA on problematic CPUs
	ALSA: cs46xx: Potential NULL dereference in probe
	ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
	ALSA: usb-audio: Check mixer unit descriptors more strictly
	ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
	ALSA: usb-audio: Always check descriptor sizes in parser code
	srcu: Lock srcu_data structure in srcu_gp_start()
	driver core: Add missing dev->bus->need_parent_lock checks
	Fix failure path in alloc_pid()
	block: deactivate blk_stat timer in wbt_disable_default()
	block: mq-deadline: Fix write completion handling
	dlm: fixed memory leaks after failed ls_remove_names allocation
	dlm: possible memory leak on error path in create_lkb()
	dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
	dlm: memory leaks on error path in dlm_user_request()
	gfs2: Get rid of potential double-freeing in gfs2_create_inode
	gfs2: Fix loop in gfs2_rbm_find
	b43: Fix error in cordic routine
	selinux: policydb - fix byte order and alignment issues
	PCI / PM: Allow runtime PM without callback functions
	lockd: Show pid of lockd for remote locks
	nfsd4: zero-length WRITE should succeed
	arm64: drop linker script hack to hide __efistub_ symbols
	arm64: relocatable: fix inconsistencies in linker script and options
	leds: pwm: silently error out on EPROBE_DEFER
	Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing"
	powerpc/tm: Set MSR[TS] just prior to recheckpoint
	iio: dac: ad5686: fix bit shift read register
	9p/net: put a lower bound on msize
	rxe: fix error completion wr_id and qp_num
	RDMA/srpt: Fix a use-after-free in the channel release code
	iommu/vt-d: Handle domain agaw being less than iommu agaw
	sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b
	ceph: don't update importing cap's mseq when handing cap export
	video: fbdev: pxafb: Fix "WARNING: invalid free of devm_ allocated data"
	drivers/perf: hisi: Fixup one DDRC PMU register offset
	genwqe: Fix size check
	intel_th: msu: Fix an off-by-one in attribute store
	power: supply: olpc_battery: correct the temperature units
	of: of_node_get()/of_node_put() nodes held in phandle cache
	of: __of_detach_node() - remove node from phandle cache
	lib: fix build failure in CONFIG_DEBUG_VIRTUAL test
	drm/nouveau/drm/nouveau: Check rc from drm_dp_mst_topology_mgr_resume()
	drm/vc4: Set ->is_yuv to false when num_planes == 1
	drm/rockchip: psr: do not dereference encoder before it is null checked.
	drm/amd/display: Fix unintialized max_bpc state values
	bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
	Linux 4.19.15

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-01-13 10:25:45 +01:00
Michal Hocko
1c5e0be35d memcg, oom: notify on oom killer invocation from the charge path
commit 7056d3a37d2c6aaaab10c13e8e69adc67ec1fc65 upstream.

Burt Holzman has noticed that memcg v1 doesn't notify about OOM events via
eventfd anymore.  The reason is that 29ef680ae7 ("memcg, oom: move
out_of_memory back to the charge path") has moved the oom handling back to
the charge path.  While doing so the notification was left behind in
mem_cgroup_oom_synchronize.

Fix the issue by replicating the oom hierarchy locking and the
notification.

Link: http://lkml.kernel.org/r/20181224091107.18354-1-mhocko@kernel.org
Fixes: 29ef680ae7 ("memcg, oom: move out_of_memory back to the charge path")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Burt Holzman <burt@fnal.gov>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:04 +01:00
Huang Ying
8da70752f5 mm, swap: fix swapoff with KSM pages
commit 7af7a8e19f0c5425ff639b0f0d2d244c2a647724 upstream.

KSM pages may be mapped to the multiple VMAs that cannot be reached from
one anon_vma.  So during swapin, a new copy of the page need to be
generated if a different anon_vma is needed, please refer to comments of
ksm_might_need_to_copy() for details.

During swapoff, unuse_vma() uses anon_vma (if available) to locate VMA and
virtual address mapped to the page, so not all mappings to a swapped out
KSM page could be found.  So in try_to_unuse(), even if the swap count of
a swap entry isn't zero, the page needs to be deleted from swap cache, so
that, in the next round a new page could be allocated and swapin for the
other mappings of the swapped out KSM page.

But this contradicts with the THP swap support.  Where the THP could be
deleted from swap cache only after the swap count of every swap entry in
the huge swap cluster backing the THP has reach 0.  So try_to_unuse() is
changed in commit e07098294a ("mm, THP, swap: support to reclaim swap
space for THP swapped out") to check that before delete a page from swap
cache, but this has broken KSM swapoff too.

Fortunately, KSM is for the normal pages only, so the original behavior
for KSM pages could be restored easily via checking PageTransCompound().
That is how this patch works.

The bug is introduced by e07098294a ("mm, THP, swap: support to reclaim
swap space for THP swapped out"), which is merged by v4.14-rc1.  So I
think we should backport the fix to from 4.14 on.  But Hugh thinks it may
be rare for the KSM pages being in the swap device when swapoff, so nobody
reports the bug so far.

Link: http://lkml.kernel.org/r/20181226051522.28442-1-ying.huang@intel.com
Fixes: e07098294a ("mm, THP, swap: support to reclaim swap space for THP swapped out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:04 +01:00
Dan Williams
0f1a62e073 mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
commit 02917e9f8676207a4c577d4d94eae12bf348e9d7 upstream.

At Maintainer Summit, Greg brought up a topic I proposed around
EXPORT_SYMBOL_GPL usage.  The motivation was considerations for when
EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional
step of reclassifying an existing export.  Specifically, I wanted to make
the case that although the line is fuzzy and hard to specify in abstract
terms, it is nonetheless clear that devm_memremap_pages() and HMM
(Heterogeneous Memory Management) have crossed it.  The
devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the
beginning, and HMM as a derivative of that functionality should have
naturally picked up that designation as well.

Contrary to typical rules, the HMM infrastructure was merged upstream with
zero in-tree consumers.  There was a promise at the time that those users
would be merged "soon", but it has been over a year with no drivers
arriving.  While the Nouveau driver is about to belatedly make good on
that promise it is clear that HMM was targeted first and foremost at an
out-of-tree consumer.

HMM is derived from devm_memremap_pages(), a facility Christoph and I
spearheaded to support persistent memory.  It combines a device lifetime
model with a dynamically created 'struct page' / memmap array for any
physical address range.  It enables coordination and control of the many
code paths in the kernel built to interact with memory via 'struct page'
objects.  With HMM the integration goes even deeper by allowing device
drivers to hook and manipulate page fault and page free events.

One interpretation of when EXPORT_SYMBOL is suitable is when it is
exporting stable and generic leaf functionality.  The
devm_memremap_pages() facility continues to see expanding use cases,
peer-to-peer DMA being the most recent, with no clear end date when it
will stop attracting reworks and semantic changes.  It is not suitable to
export devm_memremap_pages() as a stable 3rd party driver API due to the
fact that it is still changing and manipulates core behavior.  Moreover,
it is not in the best interest of the long term development of the core
memory management subsystem to permit any external driver to effectively
define its own system-wide memory management policies with no
encouragement to engage with upstream.

I am also concerned that HMM was designed in a way to minimize further
engagement with the core-MM.  That, with these hooks in place,
device-drivers are free to implement their own policies without much
consideration for whether and how the core-MM could grow to meet that
need.  Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the
core-MM should be allowed the opportunity and stimulus to change and
address these new use cases as first class functionality.

Original changelog:

hmm_devmem_add(), and hmm_devmem_add_resource() duplicated
devm_memremap_pages() and are now simple now wrappers around the core
facility to inject a dev_pagemap instance into the global pgmap_radix and
hook page-idle events.  The devm_memremap_pages() interface is base
infrastructure for HMM.  HMM has more and deeper ties into the kernel
memory management implementation than base ZONE_DEVICE which is itself a
EXPORT_SYMBOL_GPL facility.

Originally, the HMM page structure creation routines copied the
devm_memremap_pages() code and reused ZONE_DEVICE.  A cleanup to unify the
implementations was discussed during the initial review:
http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html Recent work to
extend devm_memremap_pages() for the peer-to-peer-DMA facility enabled
this cleanup to move forward.

In addition to the integration with devm_memremap_pages() HMM depends on
other GPL-only symbols:

    mmu_notifier_unregister_no_release
    percpu_ref
    region_intersects
    __class_create

It goes further to consume / indirectly expose functionality that is not
exported to any other driver:

    alloc_pages_vma
    walk_page_range

HMM is derived from devm_memremap_pages(), and extends deep core-kernel
fundamentals. Similar to devm_memremap_pages(), mark its entry points
EXPORT_SYMBOL_GPL().

[logang@deltatee.com: PCI/P2PDMA: match interface changes to devm_memremap_pages()]
  Link: http://lkml.kernel.org/r/20181130225911.2900-1-logang@deltatee.com
Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>,
Cc: Michal Hocko <mhocko@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:04 +01:00
Dan Williams
e890a86706 mm, hmm: use devm semantics for hmm_devmem_{add, remove}
commit 58ef15b765af0d2cbe6799ec564f1dc485010ab8 upstream.

devm semantics arrange for resources to be torn down when
device-driver-probe fails or when device-driver-release completes.
Similar to devm_memremap_pages() there is no need to support an explicit
remove operation when the users properly adhere to devm semantics.

Note that devm_kzalloc() automatically handles allocating node-local
memory.

Link: http://lkml.kernel.org/r/154275559545.76910.9186690723515469051.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:04 +01:00
Michal Hocko
2c87072a3b hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
commit b15c87263a69272423771118c653e9a1d0672caa upstream.

We have received a bug report that an injected MCE about faulty memory
prevents memory offline to succeed on 4.4 base kernel.  The underlying
reason was that the HWPoison page has an elevated reference count and the
migration keeps failing.  There are two problems with that.  First of all
it is dubious to migrate the poisoned page because we know that accessing
that memory is possible to fail.  Secondly it doesn't make any sense to
migrate a potentially broken content and preserve the memory corruption
over to a new location.

Oscar has found out that 4.4 and the current upstream kernels behave
slightly differently with his simply testcase

===

int main(void)
{
        int ret;
        int i;
        int fd;
        char *array = malloc(4096);
        char *array_locked = malloc(4096);

        fd = open("/tmp/data", O_RDONLY);
        read(fd, array, 4095);

        for (i = 0; i < 4096; i++)
                array_locked[i] = 'd';

        ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
        if (ret)
                perror("mlock");

        sleep (20);

        ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
        if (ret)
                perror("madvise");

        for (i = 0; i < 4096; i++)
                array_locked[i] = 'd';

        return 0;
}
===

+ offline this memory.

In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU
list
kernel:  [<ffffffff81019ac9>] dump_trace+0x59/0x340
kernel:  [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
kernel:  [<ffffffff8101ac71>] show_stack+0x21/0x40
kernel:  [<ffffffff8132bb90>] dump_stack+0x5c/0x7c
kernel:  [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0
kernel:  [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160
kernel:  [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100
kernel:  [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0
kernel:  [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70
kernel:  [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs]
kernel:  [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200
kernel:  [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0
kernel:  [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660
kernel:  [<ffffffff8120e50d>] __vfs_read+0xcd/0x140
kernel:  [<ffffffff8120e9ea>] vfs_read+0x7a/0x120
kernel:  [<ffffffff8121404b>] kernel_read+0x3b/0x50
kernel:  [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0
kernel:  [<ffffffff81215f08>] do_execve+0x28/0x30
kernel:  [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130
kernel:  [<ffffffff8161c045>] ret_from_fork+0x55/0x80

And that latter confuses the hotremove path because an LRU page is
attempted to be migrated and that fails due to an elevated reference
count.  It is quite possible that the reuse of the HWPoisoned page is some
kind of fixed race condition but I am not really sure about that.

With the upstream kernel the failure is slightly different.  The page
doesn't seem to have LRU bit set but isolate_movable_page simply fails and
do_migrate_range simply puts all the isolated pages back to LRU and
therefore no progress is made and scan_movable_pages finds same set of
pages over and over again.

Fix both cases by explicitly checking HWPoisoned pages before we even try
to get reference on the page, try to unmap it if it is still mapped.  As
explained by Naoya:

: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.

Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
HWPoison pages which shouldn't happen but I couldn't convince myself about
that.  Naoya has noted the following:

: Theoretically no such gurantee, because try_to_unmap() doesn't have a
: guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also impli=
: es
: that the target page can still have PageLRU flag.
:
:         /*
:          * Torn down by someone else?
:          */
:         if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) {
:                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
:                 res =3D -EBUSY;
:                 goto out;
:         }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.

Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.com>
Debugged-by: Oscar Salvador <osalvador@suse.com>
Tested-by: Oscar Salvador <osalvador@suse.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13 09:51:04 +01:00
Greg Kroah-Hartman
a872d2d074 This is the 4.19.13 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwnaqgACgkQONu9yGCS
 aT5BPA//TXG7P4K4Nor0eu6CLJ8KaO3wZSneHUp+cV3/zZsPe/6K4pgwz5Kmho7R
 ii82FuXKTwqr+CLegkGlwF01q/HFT7u487Yz1eqdrf3oqvjQjC9+ut/qO//9JePd
 OLZCCrPtFqWT8ClpHhxWA3skYx9UsnBxseUFE+cMCuTVin1/YGQc/xV6CQBgZfs3
 V3dfmv9D1lCZ1nlvgEHh+VMvqlvnBEgUufLYZZEb6yK9GVQuRk+piXMf2rxm1RuN
 aBZHVI4tdHhYkEbhQ46ADaPLBghNeSoa2bIBnHu0G1YO+oRewQlVM/rEvMv+XOdX
 GoRSX1fNYZUjI0u6EsDw0WPBILoJaLmXF8bIH3hTmTkTev4Vslyiuz0SJNwLwrkx
 0Zzg2D+AF9MdvO4EBwoAnqwzO2lM6WkIsHp85NMymggp5+VL1yuuo0kr7OMw51Rl
 U5ReIwcq+7TZp3WtqUQHEGO5TOfPoAdW8sINcQeWTjod6c3EHPxmvrS8EE6KgPI1
 o+jE2j+uxUbgzzeq4ovJvsJj28WKqZ0jCLyMozCN6hpzki+S5qzNHYMYz3quZGQH
 GN82w5cZGrtPFHAm1Ft5hVB+uS9vj6+84jIprFVYwPnBN6f5tK8Rjsz5cJ5Oh7UW
 q5EAuLxcLt+5v2TMYlZRNLg/fzZBS3FnZy0KLx8XSJ+jm6E4LM0=
 =GR64
 -----END PGP SIGNATURE-----

Merge 4.19.13 into android-4.19

Changes in 4.19.13
	iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"
	Revert "vfs: Allow userns root to call mknod on owned filesystems."
	USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data
	xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only
	USB: xhci: fix 'broken_suspend' placement in struct xchi_hcd
	USB: serial: option: add GosunCn ZTE WeLink ME3630
	USB: serial: option: add HP lt4132
	USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode)
	USB: serial: option: add Fibocom NL668 series
	USB: serial: option: add Telit LN940 series
	ubifs: Handle re-linking of inodes correctly while recovery
	scsi: t10-pi: Return correct ref tag when queue has no integrity profile
	scsi: sd: use mempool for discard special page
	mmc: core: Reset HPI enabled state during re-init and in case of errors
	mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support
	mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl
	mmc: omap_hsmmc: fix DMA API warning
	gpio: max7301: fix driver for use with CONFIG_VMAP_STACK
	gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlers
	posix-timers: Fix division by zero bug
	KVM: X86: Fix NULL deref in vcpu_scan_ioapic
	kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs
	KVM: Fix UAF in nested posted interrupt processing
	Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
	futex: Cure exit race
	x86/mtrr: Don't copy uninitialized gentry fields back to userspace
	x86/mm: Fix decoy address handling vs 32-bit builds
	x86/vdso: Pass --eh-frame-hdr to the linker
	x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence
	panic: avoid deadlocks in re-entrant console drivers
	mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
	mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
	mm: introduce mm_[p4d|pud|pmd]_folded
	xfrm_user: fix freeing of xfrm states on acquire
	rtlwifi: Fix leak of skb when processing C2H_BT_INFO
	iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT to old firmwares
	Revert "mwifiex: restructure rx_reorder_tbl_lock usage"
	iwlwifi: add new cards for 9560, 9462, 9461 and killer series
	media: ov5640: Fix set format regression
	mm, memory_hotplug: initialize struct pages for the full memory section
	mm: thp: fix flags for pmd migration when split
	mm, page_alloc: fix has_unmovable_pages for HugePages
	mm: don't miss the last page because of round-off error
	Input: elantech - disable elan-i2c for P52 and P72
	proc/sysctl: don't return ENOMEM on lookup when a table is unregistering
	drm/ioctl: Fix Spectre v1 vulnerabilities
	Linux 4.19.13

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-29 13:46:09 +01:00
Roman Gushchin
a5e8809697 mm: don't miss the last page because of round-off error
commit 68600f623d69da428c6163275f97ca126e1a8ec5 upstream.

I've noticed, that dying memory cgroups are often pinned in memory by a
single pagecache page.  Even under moderate memory pressure they sometimes
stayed in such state for a long time.  That looked strange.

My investigation showed that the problem is caused by applying the LRU
pressure balancing math:

  scan = div64_u64(scan * fraction[lru], denominator),

where

  denominator = fraction[anon] + fraction[file] + 1.

Because fraction[lru] is always less than denominator, if the initial scan
size is 1, the result is always 0.

This means the last page is not scanned and has
no chances to be reclaimed.

Fix this by rounding up the result of the division.

In practice this change significantly improves the speed of dying cgroups
reclaim.

[guro@fb.com: prevent double calculation of DIV64_U64_ROUND_UP() arguments]
  Link: http://lkml.kernel.org/r/20180829213311.GA13501@castle
Link: http://lkml.kernel.org/r/20180827162621.30187-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29 13:37:59 +01:00
Oscar Salvador
e27666dd8f mm, page_alloc: fix has_unmovable_pages for HugePages
commit 17e2e7d7e1b83fa324b3f099bfe426659aa3c2a4 upstream.

While playing with gigantic hugepages and memory_hotplug, I triggered
the following #PF when "cat memoryX/removable":

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 1481 Comm: cat Tainted: G            E     4.20.0-rc6-mm1-1-default+ #18
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:has_unmovable_pages+0x154/0x210
  Call Trace:
   is_mem_section_removable+0x7d/0x100
   removable_show+0x90/0xb0
   dev_attr_show+0x1c/0x50
   sysfs_kf_seq_show+0xca/0x1b0
   seq_read+0x133/0x380
   __vfs_read+0x26/0x180
   vfs_read+0x89/0x140
   ksys_read+0x42/0x90
   do_syscall_64+0x5b/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The reason is we do not pass the Head to page_hstate(), and so, the call
to compound_order() in page_hstate() returns 0, so we end up checking
all hstates's size to match PAGE_SIZE.

Obviously, we do not find any hstate matching that size, and we return
NULL.  Then, we dereference that NULL pointer in
hugepage_migration_supported() and we got the #PF from above.

Fix that by getting the head page before calling page_hstate().

Also, since gigantic pages span several pageblocks, re-adjust the logic
for skipping pages.  While are it, we can also get rid of the
round_up().

[osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal]
  Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29 13:37:59 +01:00
Peter Xu
161a5654cf mm: thp: fix flags for pmd migration when split
commit 2e83ee1d8694a61d0d95a5b694f2e61e8dde8627 upstream.

When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs.  However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or
pmd_yound() while actually they don't make sense at all when it's a
migration entry.  Fix them up.  Since at it, drop the ifdef together as
not needed.

Note that if my understanding is correct about the problem then if
without the patch there is chance to lose some of the dirty bits in the
migrating pmd pages (on x86_64 we're fetching bit 11 which is part of
swap offset instead of bit 2) and it could potentially corrupt the
memory of an userspace program which depends on the dirty bit.

Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29 13:37:58 +01:00
Mikhail Zaslonko
7592dbfaf3 mm, memory_hotplug: initialize struct pages for the full memory section
commit 2830bf6f05fb3e05bc4743274b806c821807a684 upstream.

If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() function triggered
by memory_hotplug sysfs handlers:

Here are the the panic examples:
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_PGFLAGS=y

 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( test_pages_in_a_zone+0xde/0x160)
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

 kernel parameter mem=3075M
 --------------------------
 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ( is_mem_section_removable+0xb4/0x190)
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix the problem by initializing the last memory section of each zone in
memmap_init_zone() till the very end, even if it goes beyond the zone end.

Michal said:

: This has alwways been problem AFAIU.  It just went unnoticed because we
: have zeroed memmaps during allocation before f7f99100d8 ("mm: stop
: zeroing memory during allocation in vmemmap") and so the above test
: would simply skip these ranges as belonging to zone 0 or provided a
: garbage.
:
: So I guess we do care for post f7f99100d8 kernels mostly and
: therefore Fixes: f7f99100d8 ("mm: stop zeroing memory during
: allocation in vmemmap")

Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29 13:37:58 +01:00
Joel Fernandes (Google)
2e0d7ea44a UPSTREAM: mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust
A better way to do F_SEAL_FUTURE_WRITE seal was discussed [1] last week
where we don't need to modify core VFS structures to get the same
behavior of the seal. This solves several side-effects pointed out by
Andy [2].

[1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/
[2] https://lore.kernel.org/lkml/69CE06CC-E47C-4992-848A-66EB23EE6C74@amacapital.net/

Suggested-by: Andy Lutomirski <luto@kernel.org>
Fixes: 5e653c2923fd ("mm: Add an F_SEAL_FUTURE_WRITE seal to memfd")
Change-id: I5d2414cfcf8ac42d3632d0b0dc960c742d490e2f
Verified with test program at: https://lore.kernel.org/patchwork/patch/1008117/
Backport link: https://lore.kernel.org/patchwork/patch/1014892/
Bug: 113362644
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-12-19 11:35:35 -08:00
Joel Fernandes (Google)
1dc8ca4429 UPSTREAM: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.

One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active.  This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow

This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:

 #include <stdio.h>
 #include <errno.h>
 #include <sys/mman.h>
 #include <linux/memfd.h>
 #include <linux/fcntl.h>
 #include <asm/unistd.h>
 #include <unistd.h>
 #define F_SEAL_FUTURE_WRITE 0x0010
 #define REGION_SIZE (5 * 1024 * 1024)

int memfd_create_region(const char *name, size_t size)
{
    int ret;
    int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
    if (fd < 0) return fd;
    ret = ftruncate(fd, size);
    if (ret < 0) { close(fd); return ret; }
    return fd;
}

int main() {
    int ret, fd;
    void *addr, *addr2, *addr3, *addr1;
    ret = memfd_create_region("test_region", REGION_SIZE);
    printf("ret=%d\n", ret);
    fd = ret;

    // Create map
    addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
	    printf("map 0 failed\n");
    else
	    printf("map 0 passed\n");

    if ((ret = write(fd, "test", 4)) != 4)
	    printf("write failed even though no future-write seal "
		   "(ret=%d errno =%d)\n", ret, errno);
    else
	    printf("write passed\n");

    addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr1 == MAP_FAILED)
	    perror("map 1 prot-write failed even though no seal\n");
    else
	    printf("map 1 prot-write passed as expected\n");

    ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
				 F_SEAL_GROW |
				 F_SEAL_SHRINK);
    if (ret == -1)
	    printf("fcntl failed, errno: %d\n", errno);
    else
	    printf("future-write seal now active\n");

    if ((ret = write(fd, "test", 4)) != 4)
	    printf("write failed as expected due to future-write seal\n");
    else
	    printf("write passed (unexpected)\n");

    addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr2 == MAP_FAILED)
	    perror("map 2 prot-write failed as expected due to seal\n");
    else
	    printf("map 2 passed\n");

    addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (addr3 == MAP_FAILED)
	    perror("map 3 failed\n");
    else
	    printf("map 3 prot-read passed as expected\n");
}

The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected

Cc: jreck@google.com
Cc: john.stultz@linaro.org
Cc: tkjos@google.com
Cc: gregkh@linuxfoundation.org
Cc: hch@infradead.org
Reviewed-by: John Stultz <john.stultz@linaro.org>
Reported-by: Jann Horn <jannh@google.com>
Change-id: Ie702c08f1f41ce78e5e8db3f415446ceedb01795
Bug: 113362644
Verified with test program at: https://lore.kernel.org/patchwork/patch/1008117/
Backport link: https://lore.kernel.org/patchwork/patch/1014892/
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-12-19 11:35:19 -08:00
Greg Kroah-Hartman
67319b77a0 This is the 4.19.10 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwXXUsACgkQONu9yGCS
 aT5lHBAAm4DiCe303AjPWGQauwDWZPhXcF2ieF/gx77TSotIonxRa4w4nQdQAxVh
 aIiMzyxihwtgd6bCMNMkCjImWqUw+f189D6RGzKJZYLCB39HCPskJ6oMPvuzNXAL
 yF1+84288ZY+Z9DXxK3T9x8KJUj5qXexjgoMfdS9+lJWku/BsCTPFk8tIjxY5bI9
 hMSIePIfvZqmXWuz7Btw9uykOYwAzk3tqcVv1P1vSeWaUE7dWQts17NUZhnDt5zp
 alSnmUUt7I7w+9CWpORFOHC+ekfltf/7VjIVgzBf9cKTgxGeZ8+htceYGTRIwegg
 kzU4cq8IZGWp+Umfhm9r7vWxf+tjdil42dYkiDWs/XnbKVw5f2UFi8c2rAItmfVw
 vpSZK1hgUFm8dojOFIjbJF2AfhLpDDSqKuZNhw1SIzDmsA6rV8cLNdQx+suL9Xc5
 JoL+b1wH1uvrPnSOloScakF32gjsrU5mReP+yPgl3LNc1Hn/Nu85262i4OEzs+Od
 Kmy/TfaRWYlWWtejH3fydmVGGadJ4owNYqhuB9eYQgBKWbcSShDXZmvJ+VKVdmcs
 k9Nz/Lyt4GxrFYiaWGuQeE0VTG9z87FwQvuikYKJF7FptN4kixBITfzRlKh3JbM4
 sR/nASeAvGiv5WrwszcM6AJ0Ps0yzZJr5JZ1w7wbWX84QH457mo=
 =bF+8
 -----END PGP SIGNATURE-----

Merge 4.19.10 into android-4.19

Changes in 4.19.10
	ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
	ipv6: Check available headroom in ip6_xmit() even without options
	neighbour: Avoid writing before skb->head in neigh_hh_output()
	ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
	net: 8139cp: fix a BUG triggered by changing mtu with network traffic
	net/mlx4_core: Correctly set PFC param if global pause is turned off.
	net/mlx4_en: Change min MTU size to ETH_MIN_MTU
	net: phy: don't allow __set_phy_supported to add unsupported modes
	net: Prevent invalid access to skb->prev in __qdisc_drop_all
	net: use skb_list_del_init() to remove from RX sublists
	Revert "net/ibm/emac: wrong bit is used for STA control"
	rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices
	sctp: kfree_rcu asoc
	tcp: Do not underestimate rwnd_limited
	tcp: fix NULL ref in tail loss probe
	tun: forbid iface creation with rtnl ops
	virtio-net: keep vnet header zeroed after processing XDP
	net: phy: sfp: correct store of detected link modes
	sctp: update frag_point when stream_interleave is set
	net: restore call to netdev_queue_numa_node_write when resetting XPS
	net: fix XPS static_key accounting
	ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup
	ASoC: rsnd: fixup clock start checker
	ASoC: qdsp6: q6afe: Fix wrong MI2S SD line mask
	ASoC: qdsp6: q6afe-dai: Fix the dai widgets
	staging: rtl8723bs: Fix the return value in case of error in 'rtw_wx_read32()'
	ARM: dts: am3517: Fix pinmuxing for CD on MMC1
	ARM: dts: LogicPD Torpedo: Fix mmc3_dat1 interrupt
	ARM: dts: logicpd-somlv: Fix interrupt on mmc3_dat1
	ARM: dts: am3517-som: Fix WL127x Wifi interrupt
	ARM: OMAP1: ams-delta: Fix possible use of uninitialized field
	tools: bpftool: prevent infinite loop in get_fdinfo()
	ASoC: sun8i-codec: fix crash on module removal
	arm64: dts: sdm845-mtp: Reserve reserved gpios
	sysv: return 'err' instead of 0 in __sysv_write_inode
	netfilter: nf_conncount: use spin_lock_bh instead of spin_lock
	netfilter: nf_conncount: fix list_del corruption in conn_free
	netfilter: nf_conncount: fix unexpected permanent node of list.
	netfilter: nf_tables: don't skip inactive chains during update
	selftests: add script to stress-test nft packet path vs. control plane
	perf tools: Fix crash on synthesizing the unit
	netfilter: xt_RATEEST: remove netns exit routine
	netfilter: nf_tables: fix use-after-free when deleting compat expressions
	s390/cio: Fix cleanup of pfn_array alloc failure
	s390/cio: Fix cleanup when unsupported IDA format is used
	hwmon (ina2xx) Fix NULL id pointer in probe()
	hwmon: (raspberrypi) Fix initial notify
	ASoC: rockchip: add missing slave_config setting for I2S
	ASoC: wm_adsp: Fix dma-unsafe read of scratch registers
	ASoC: Intel: Power down links before turning off display audio power
	ASoC: qcom: Set dai_link id to each dai_link
	s390/cpum_cf: Reject request for sampling in event initialization
	hwmon: (ina2xx) Fix current value calculation
	ASoC: omap-abe-twl6040: Fix missing audio card caused by deferred probing
	ASoC: dapm: Recalculate audio map forcely when card instantiated
	spi: omap2-mcspi: Add missing suspend and resume calls
	hwmon: (mlxreg-fan) Fix macros for tacho fault reading
	bpf: allocate local storage buffers using GFP_ATOMIC
	aio: fix failure to put the file pointer
	netfilter: xt_hashlimit: fix a possible memory leak in htable_create()
	hwmon: (w83795) temp4_type has writable permission
	perf tools: Restore proper cwd on return from mnt namespace
	PCI: imx6: Fix link training status detection in link up check
	ASoC: acpi: fix: continue searching when machine is ignored
	objtool: Fix double-free in .cold detection error path
	objtool: Fix segfault in .cold detection with -ffunction-sections
	phy: qcom-qusb2: Use HSTX_TRIM fused value as is
	phy: qcom-qusb2: Fix HSTX_TRIM tuning with fused value for SDM845
	ARM: dts: at91: sama5d2: use the divided clock for SMC
	Btrfs: send, fix infinite loop due to directory rename dependencies
	RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR
	RDMA/core: Add GIDs while changing MAC addr only for registered ndev
	RDMA/bnxt_re: Fix system hang when registration with L2 driver fails
	RDMA/bnxt_re: Avoid accessing the device structure after it is freed
	RDMA/rdmavt: Fix rvt_create_ah function signature
	tools: bpftool: fix potential NULL pointer dereference in do_load
	ASoC: omap-mcbsp: Fix latency value calculation for pm_qos
	ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE
	ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE
	exportfs: do not read dentry after free
	RDMA/hns: Bugfix pbl configuration for rereg mr
	bpf: fix check of allowed specifiers in bpf_trace_printk
	fsi: master-ast-cf: select GENERIC_ALLOCATOR
	ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notf
	USB: omap_udc: use devm_request_irq()
	USB: omap_udc: fix crashes on probe error and module removal
	USB: omap_udc: fix omap_udc_start() on 15xx machines
	USB: omap_udc: fix USB gadget functionality on Palm Tungsten E
	USB: omap_udc: fix rejection of out transfers when DMA is used
	thunderbolt: Prevent root port runtime suspend during NVM upgrade
	drm/meson: add support for 1080p25 mode
	netfilter: ipv6: Preserve link scope traffic original oif
	IB/mlx5: Fix page fault handling for MW
	netfilter: add missing error handling code for register functions
	netfilter: nat: fix double register in masquerade modules
	netfilter: nf_conncount: remove wrong condition check routine
	KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes
	KVM: x86: fix empty-body warnings
	x86/kvm/vmx: fix old-style function declaration
	net: thunderx: fix NULL pointer dereference in nic_remove
	usb: gadget: u_ether: fix unsafe list iteration
	netfilter: nf_tables: deactivate expressions in rule replecement routine
	ALSA: usb-audio: Add vendor and product name for Dell WD19 Dock
	cachefiles: Fix an assertion failure when trying to update a failed object
	fscache: Fix race in fscache_op_complete() due to split atomic_sub & read
	cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active
	igb: fix uninitialized variables
	ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
	net: hisilicon: remove unexpected free_netdev
	drm/amdgpu: Add delay after enable RLC ucode
	drm/ast: fixed reading monitor EDID not stable issue
	xen: xlate_mmu: add missing header to fix 'W=1' warning
	Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"
	pvcalls-front: fixes incorrect error handling
	pstore/ram: Correctly calculate usable PRZ bytes
	afs: Fix validation/callback interaction
	fscache: fix race between enablement and dropping of object
	cachefiles: Explicitly cast enumerated type in put_object
	fscache, cachefiles: remove redundant variable 'cache'
	nvme: warn when finding multi-port subsystems without multipathing enabled
	nvme: flush namespace scanning work just before removing namespaces
	nvme-rdma: fix double freeing of async event data
	ACPI/IORT: Fix iort_get_platform_device_domain() uninitialized pointer value
	ocfs2: fix deadlock caused by ocfs2_defrag_extent()
	mm/page_alloc.c: fix calculation of pgdat->nr_zones
	hfs: do not free node before using
	hfsplus: do not free node before using
	debugobjects: avoid recursive calls with kmemleak
	proc: fixup map_files test on arm
	kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
	initramfs: clean old path before creating a hardlink
	ocfs2: fix potential use after free
	flexfiles: enforce per-mirror stateid only for v4 DSes
	dax: Check page->mapping isn't NULL
	ALSA: fireface: fix reference to wrong register for clock configuration
	ALSA: hda/realtek - Fixed headphone issue for ALC700
	ALSA: hda/realtek: ALC294 mic and headset-mode fixups for ASUS X542UN
	ALSA: hda/realtek: Enable audio jacks of ASUS UX533FD with ALC294
	ALSA: hda/realtek: Enable audio jacks of ASUS UX433FN/UX333FA with ALC294
	ALSA: hda/realtek - Fix the mute LED regresion on Lenovo X1 Carbon
	IB/hfi1: Fix an out-of-bounds access in get_hw_stats
	bpf: fix off-by-one error in adjust_subprog_starts
	tcp: lack of available data can also cause TSO defer
	Linux 4.19.10

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-17 09:39:43 +01:00
Wei Yang
505bc9f389 mm/page_alloc.c: fix calculation of pgdat->nr_zones
[ Upstream commit 8f416836c0d50b198cad1225132e5abebf8980dc ]

init_currently_empty_zone() will adjust pgdat->nr_zones and set it to
'zone_idx(zone) + 1' unconditionally.  This is correct in the normal
case, while not exact in hot-plug situation.

This function is used in two places:

  * free_area_init_core()
  * move_pfn_range_to_zone()

In the first case, we are sure zone index increase monotonically.  While
in the second one, this is under users control.

One way to reproduce this is:
----------------------------

1. create a virtual machine with empty node1

   -m 4G,slots=32,maxmem=32G \
   -smp 4,maxcpus=8          \
   -numa node,nodeid=0,mem=4G,cpus=0-3 \
   -numa node,nodeid=1,mem=0G,cpus=4-7

2. hot-add cpu 3-7

   cpu-add [3-7]

2. hot-add memory to nod1

   object_add memory-backend-ram,id=ram0,size=1G
   device_add pc-dimm,id=dimm0,memdev=ram0,node=1

3. online memory with following order

   echo online_movable > memory47/state
   echo online > memory40/state

After this, node1 will have its nr_zones equals to (ZONE_NORMAL + 1)
instead of (ZONE_MOVABLE + 1).

Michal said:
 "Having an incorrect nr_zones might result in all sorts of problems
  which would be quite hard to debug (e.g. reclaim not considering the
  movable zone). I do not expect many users would suffer from this it
  but still this is trivial and obviously right thing to do so
  backporting to the stable tree shouldn't be harmful (last famous
  words)"

Link: http://lkml.kernel.org/r/20181117022022.9956-1-richard.weiyang@gmail.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-17 09:24:40 +01:00
Greg Kroah-Hartman
49fe708f16 This is the 4.19.8 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwLshEACgkQONu9yGCS
 aT4wJA//V/G9RbjbXaY9kjfMQW/mgySwfPmhvyzS1O9J3ic3b5WVO1J547UkWyd9
 DwjIOUNx8IGDTLiAs15Z92CqKYOxpGp9zy0hbNMLXE3WTLXyyg94K/jlk6jk3vXw
 jCvYGQaQuMyNhPr8chS3Nmkdqx3ZLC1NmmGIBSRJevseWXe2yVowTo4EuKDxnmEL
 dwYsEQAgsbPiZamt1J6gqKvgbcKnBk119cHXSJBFEpdtmSxjxEFz5sJIptO0QCI8
 Ck08bMUA7YaQ5CGsvbOTGJtq8EW5Vakk9DTJWDDwkdk1kZ+Xv6u2992Ey3nesvin
 oKWayd9a+1qYBlkXVyZGiKBSSE9KPN8beZsiYSUidH1qZdT8XoWKLX7cOeaL1kWl
 SHsrXy3je3UWVaz7YEiAdmdEuocjbH9Nfb4q0bfPfCYmdFB5tjrFz4gpUjbdTEpC
 oh31h9gOvuOXWedFfOckh/Ung5CDinxmXLS8zFBNe7WrHA1ZLTypMaHwASuRlsTD
 UMJ9meuMtghHg6tt+jkz5GFEP1SqnP9rCQfBuFslWlR1Y/Y3SJRSeyL7OmXUBa5N
 w/L2iwOO+SK91WRivZXqinOaMMlolYk4OF1dCehlgTFCF5Dfn8olz6mm7G7zd37S
 swAcz1ogWZb+AmQ/EWlxeIzTOjss1I+howbdMjQctpLjkYAKr7g=
 =+hPU
 -----END PGP SIGNATURE-----

Merge 4.19.8 into android-4.19

Changes in 4.19.8
	blk-mq: fix corruption with direct issue
	test_hexdump: use memcpy instead of strncpy
	unifdef: use memcpy instead of strncpy
	iser: set sector for ambiguous mr status errors
	uprobes: Fix handle_swbp() vs. unregister() + register() race once more
	mtd: nand: Fix memory allocation in nanddev_bbt_init()
	arm64: ftrace: Fix to enable syscall events on arm64
	sched, trace: Fix prev_state output in sched_switch tracepoint
	tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique
	MIPS: ralink: Fix mt7620 nd_sd pinmux
	mips: fix mips_get_syscall_arg o32 check
	IB/mlx5: Avoid load failure due to unknown link width
	tracing/fgraph: Fix set_graph_function from showing interrupts
	drm/ast: Fix incorrect free on ioregs
	drm/amd/dm: Don't forget to attach MST encoders
	drm: set is_master to 0 upon drm_new_set_master() failure
	drm/meson: Fixes for drm_crtc_vblank_on/off support
	drm/meson: Enable fast_io in meson_dw_hdmi_regmap_config
	drm/meson: Fix OOB memory accesses in meson_viu_set_osd_lut()
	userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
	userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
	userfaultfd: shmem: add i_size checks
	userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
	kgdboc: Fix restrict error
	kgdboc: Fix warning with module build
	svm: Add mutex_lock to protect apic_access_page_done on AMD systems
	selinux: add support for RTM_NEWCHAIN, RTM_DELCHAIN, and RTM_GETCHAIN
	i40e: Fix deletion of MAC filters
	scsi: lpfc: fix block guard enablement on SLI3 adapters
	Input: xpad - quirk all PDP Xbox One gamepads
	Input: synaptics - add PNP ID for ThinkPad P50 to SMBus
	Input: matrix_keypad - check for errors from of_get_named_gpio()
	Input: cros_ec_keyb - fix button/switch capability reports
	Input: elan_i2c - add ELAN0620 to the ACPI table
	Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15ARR
	Input: elan_i2c - add support for ELAN0621 touchpad
	btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable
	ARC: change defconfig defaults to ARCv2
	arc: [devboards] Add support of NFSv3 ACL
	tipc: use destination length for copy string
	blk-mq: punt failed direct issue to dispatch list
	Linux 4.19.8

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-08 13:24:30 +01:00
Andrea Arcangeli
8f193a716e userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
commit dcf7fe9d89763a28e0f43975b422ff141fe79e43 upstream.

Set the page dirty if VM_WRITE is not set because in such case the pte
won't be marked dirty and the page would be reclaimed without writepage
(i.e.  swapout in the shmem case).

This was found by source review.  Most apps (certainly including QEMU)
only use UFFDIO_COPY on PROT_READ|PROT_WRITE mappings or the app can't
modify the memory in the first place.  This is for correctness and it
could help the non cooperative use case to avoid unexpected data loss.

Link: http://lkml.kernel.org/r/20181126173452.26955-6-aarcange@redhat.com
Reviewed-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 12:59:08 +01:00
Andrea Arcangeli
4ce337622f userfaultfd: shmem: add i_size checks
commit e2a50c1f64145a04959df2442305d57307e5395a upstream.

With MAP_SHARED: recheck the i_size after taking the PT lock, to
serialize against truncate with the PT lock.  Delete the page from the
pagecache if the i_size_read check fails.

With MAP_PRIVATE: check the i_size after the PT lock before mapping
anonymous memory or zeropages into the MAP_PRIVATE shmem mapping.

A mostly irrelevant cleanup: like we do the delete_from_page_cache()
pagecache removal after dropping the PT lock, the PT lock is a spinlock
so drop it before the sleepable page lock.

Link: http://lkml.kernel.org/r/20181126173452.26955-5-aarcange@redhat.com
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 12:59:08 +01:00
Andrea Arcangeli
6e44dd02c9 userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
commit 5b51072e97d587186c2f5390c8c9c1fb7e179505 upstream.

Userfaultfd did not create private memory when UFFDIO_COPY was invoked
on a MAP_PRIVATE shmem mapping.  Instead it wrote to the shmem file,
even when that had not been opened for writing.  Though, fortunately,
that could only happen where there was a hole in the file.

Fix the shmem-backed implementation of UFFDIO_COPY to create private
memory for MAP_PRIVATE mappings.  The hugetlbfs-backed implementation
was already correct.

This change is visible to userland, if userfaultfd has been used in
unintended ways: so it introduces a small risk of incompatibility, but
is necessary in order to respect file permissions.

An app that uses UFFDIO_COPY for anything like postcopy live migration
won't notice the difference, and in fact it'll run faster because there
will be no copy-on-write and memory waste in the tmpfs pagecache
anymore.

Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like
before.

The real zeropage can also be built on a MAP_PRIVATE shmem mapping
through UFFDIO_ZEROPAGE and that's safe because the zeropage pte is
never dirty, in turn even an mprotect upgrading the vma permission from
PROT_READ to PROT_READ|PROT_WRITE won't make the zeropage pte writable.

Link: http://lkml.kernel.org/r/20181126173452.26955-3-aarcange@redhat.com
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 12:59:08 +01:00
Andrea Arcangeli
10f98c134b userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
commit 9e368259ad988356c4c95150fafd1a06af095d98 upstream.

Patch series "userfaultfd shmem updates".

Jann found two bugs in the userfaultfd shmem MAP_SHARED backend: the
lack of the VM_MAYWRITE check and the lack of i_size checks.

Then looking into the above we also fixed the MAP_PRIVATE case.

Hugh by source review also found a data loss source if UFFDIO_COPY is
used on shmem MAP_SHARED PROT_READ mappings (the production usages
incidentally run with PROT_READ|PROT_WRITE, so the data loss couldn't
happen in those production usages like with QEMU).

The whole patchset is marked for stable.

We verified QEMU postcopy live migration with guest running on shmem
MAP_PRIVATE run as well as before after the fix of shmem MAP_PRIVATE.
Regardless if it's shmem or hugetlbfs or MAP_PRIVATE or MAP_SHARED, QEMU
unconditionally invokes a punch hole if the guest mapping is filebacked
and a MADV_DONTNEED too (needed to get rid of the MAP_PRIVATE COWs and
for the anon backend).

This patch (of 5):

We internally used EFAULT to communicate with the caller, switch to
ENOENT, so EFAULT can be used as a non internal retval.

Link: http://lkml.kernel.org/r/20181126173452.26955-2-aarcange@redhat.com
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-08 12:59:08 +01:00
Greg Kroah-Hartman
c454ec1e21 This is the 4.19.7 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwIG48ACgkQONu9yGCS
 aT7g6Q//RkJ8ZWaRkykcCGaWIvwI6QF1tmKalIEWmToPdndDuQdUDGzWVwfE9G7P
 yLcnp3GMlXo4F82BBwG8lFSAm9zaeqaLabnJnXbCc5mZ3xi/2aNqIGHzBY1isNZl
 0fTzzcelnAKzjp0Aa/egRLOeraSLgVt/Cp7Ha3FXMP6RNxUMzs1pbQ2IFZ3m+P4G
 CAD3Iye6geOaZTu/kXiiooUEUGFQFbV4c3AZ4VW7dZDdrG+ekwtF4YHtkEPseWJQ
 Ugtrbr6S0IxYQ91o1Pk77kg4uwUFYo12jrk8Ni4gaPZE6mQCa08tr2Alg2oZkJGw
 PdXnt2ASYGRWFYK2JAuTvKzhHrTEJYhiC323dKYCAx7BgfFaqdo5F20oNzYxXFBB
 gGA3AzDDtLUD3OOO+lxrDxXMhpwXUx92WXsoJVsaSafdqIDAueq14sH19wqm0gUJ
 D1fC2dWTsFrPZKjkU8Z6rJAyO1XZED55h7v1YlqAt2ibjCeDKpjnW3yvUt8Ivpqc
 nlnmp8v/Yl2cdY55XtlgUadpknSc2jApFMwhSWetxAaqDCvha2dLQ28YMyPRJzat
 ZHOkizM/VUntXvlUzFvVTsqLQiX0sfLG6MKcUkzWehPomNKT+B8XL1wtzytv9QXb
 jOY8nRD5PiQo2p35cqdDCskBwqzEwY+WxDe7ji0yHZysBZLxoxQ=
 =OiCf
 -----END PGP SIGNATURE-----

Merge 4.19.7 into android-4.19

Changes in 4.19.7
	mm/huge_memory: rename freeze_page() to unmap_page()
	mm/huge_memory: splitting set mapping+index before unfreeze
	mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
	mm/khugepaged: collapse_shmem() stop if punched or truncated
	mm/khugepaged: fix crashes due to misaccounted holes
	mm/khugepaged: collapse_shmem() remember to clear holes
	mm/khugepaged: minor reorderings in collapse_shmem()
	mm/khugepaged: collapse_shmem() without freezing new_page
	mm/khugepaged: collapse_shmem() do not crash on Compound
	lan743x: Enable driver to work with LAN7431
	lan743x: fix return value for lan743x_tx_napi_poll
	net: don't keep lonely packets forever in the gro hash
	net: gemini: Fix copy/paste error
	net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
	packet: copy user buffers before orphan or clone
	rapidio/rionet: do not free skb before reading its length
	s390/qeth: fix length check in SNMP processing
	usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
	net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
	net: skb_scrub_packet(): Scrub offload_fwd_mark
	virtio-net: disable guest csum during XDP set
	virtio-net: fail XDP set if guest csum is negotiated
	net/dim: Update DIM start sample after each DIM iteration
	tcp: defer SACK compression after DupThresh
	net: phy: add workaround for issue where PHY driver doesn't bind to the device
	tipc: fix lockdep warning during node delete
	x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
	x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
	x86/speculation: Propagate information about RSB filling mitigation to sysfs
	x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
	x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
	x86/retpoline: Remove minimal retpoline support
	x86/speculation: Update the TIF_SSBD comment
	x86/speculation: Clean up spectre_v2_parse_cmdline()
	x86/speculation: Remove unnecessary ret variable in cpu_show_common()
	x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common()
	x86/speculation: Disable STIBP when enhanced IBRS is in use
	x86/speculation: Rename SSBD update functions
	x86/speculation: Reorganize speculation control MSRs update
	sched/smt: Make sched_smt_present track topology
	x86/Kconfig: Select SCHED_SMT if SMP enabled
	sched/smt: Expose sched_smt_present static key
	x86/speculation: Rework SMT state change
	x86/l1tf: Show actual SMT state
	x86/speculation: Reorder the spec_v2 code
	x86/speculation: Mark string arrays const correctly
	x86/speculataion: Mark command line parser data __initdata
	x86/speculation: Unify conditional spectre v2 print functions
	x86/speculation: Add command line control for indirect branch speculation
	x86/speculation: Prepare for per task indirect branch speculation control
	x86/process: Consolidate and simplify switch_to_xtra() code
	x86/speculation: Avoid __switch_to_xtra() calls
	x86/speculation: Prepare for conditional IBPB in switch_mm()
	ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
	x86/speculation: Split out TIF update
	x86/speculation: Prevent stale SPEC_CTRL msr content
	x86/speculation: Prepare arch_smt_update() for PRCTL mode
	x86/speculation: Add prctl() control for indirect branch speculation
	x86/speculation: Enable prctl mode for spectre_v2_user
	x86/speculation: Add seccomp Spectre v2 user space protection mode
	x86/speculation: Provide IBPB always command line options
	userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
	kvm: mmu: Fix race in emulated page table writes
	kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
	KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset
	KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall
	KVM: LAPIC: Fix pv ipis use-before-initialization
	KVM: X86: Fix scan ioapic use-before-initialization
	KVM: VMX: re-add ple_gap module parameter
	xtensa: enable coprocessors that are being flushed
	xtensa: fix coprocessor context offset definitions
	xtensa: fix coprocessor part of ptrace_{get,set}xregs
	udf: Allow mounting volumes with incorrect identification strings
	btrfs: Always try all copies when reading extent buffers
	Btrfs: ensure path name is null terminated at btrfs_control_ioctl
	Btrfs: fix rare chances for data loss when doing a fast fsync
	Btrfs: fix race between enabling quotas and subvolume creation
	btrfs: relocation: set trans to be NULL after ending transaction
	PCI: layerscape: Fix wrong invocation of outbound window disable accessor
	PCI: dwc: Fix MSI-X EP framework address calculation bug
	PCI: Fix incorrect value returned from pcie_get_speed_cap()
	arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
	x86/MCE/AMD: Fix the thresholding machinery initialization order
	x86/fpu: Disable bottom halves while loading FPU registers
	perf/x86/intel: Move branch tracing setup to the Intel-specific source file
	perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()
	perf/x86/intel: Disallow precise_ip on BTS events
	fs: fix lost error code in dio_complete
	ALSA: wss: Fix invalid snd_free_pages() at error path
	ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write
	ALSA: control: Fix race between adding and removing a user element
	ALSA: sparc: Fix invalid snd_free_pages() at error path
	ALSA: hda: Add ASRock N68C-S UCC the power_save blacklist
	ALSA: hda/realtek - Support ALC300
	ALSA: hda/realtek - fix headset mic detection for MSI MS-B171
	ALSA: hda/realtek - fix the pop noise on headphone for lenovo laptops
	ALSA: hda/realtek - Add auto-mute quirk for HP Spectre x360 laptop
	function_graph: Create function_graph_enter() to consolidate architecture code
	ARM: function_graph: Simplify with function_graph_enter()
	microblaze: function_graph: Simplify with function_graph_enter()
	x86/function_graph: Simplify with function_graph_enter()
	nds32: function_graph: Simplify with function_graph_enter()
	powerpc/function_graph: Simplify with function_graph_enter()
	sh/function_graph: Simplify with function_graph_enter()
	sparc/function_graph: Simplify with function_graph_enter()
	parisc: function_graph: Simplify with function_graph_enter()
	riscv/function_graph: Simplify with function_graph_enter()
	s390/function_graph: Simplify with function_graph_enter()
	arm64: function_graph: Simplify with function_graph_enter()
	MIPS: function_graph: Simplify with function_graph_enter()
	function_graph: Make ftrace_push_return_trace() static
	function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
	function_graph: Have profiler use curr_ret_stack and not depth
	function_graph: Move return callback before update of curr_ret_stack
	function_graph: Reverse the order of pushing the ret_stack and the callback
	binder: fix race that allows malicious free of live buffer
	ext2: initialize opts.s_mount_opt as zero before using it
	ext2: fix potential use after free
	ASoC: intel: cht_bsw_max98090_ti: Add quirk for boards using pmc_plt_clk_0
	ASoC: pcm186x: Fix device reset-registers trigger value
	ARM: dts: rockchip: Remove @0 from the veyron memory node
	dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
	dmaengine: at_hdmac: fix module unloading
	staging: most: use format specifier "%s" in snprintf
	staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION
	staging: mt7621-dma: fix potentially dereferencing uninitialized 'tx_desc'
	staging: mt7621-pinctrl: fix uninitialized variable ngroups
	staging: rtl8723bs: Fix incorrect sense of ether_addr_equal
	staging: rtl8723bs: Add missing return for cfg80211_rtw_get_station
	USB: usb-storage: Add new IDs to ums-realtek
	usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series
	Revert "usb: dwc3: gadget: skip Set/Clear Halt when invalid"
	iio/hid-sensors: Fix IIO_CHAN_INFO_RAW returning wrong values for signed numbers
	iio:st_magn: Fix enable device after trigger
	lib/test_kmod.c: fix rmmod double free
	mm: cleancache: fix corruption on missed inode invalidation
	mm: use swp_offset as key in shmem_replace_page()
	Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl()
	misc: mic/scif: fix copy-paste error in scif_create_remote_lookup
	Linux 4.19.7

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-06 10:34:09 +01:00
Greg Kroah-Hartman
635c56d224 This is the 4.19.6 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwCSE8ACgkQONu9yGCS
 aT58lg//YXiTDY8JuG+LX8PJyL28s5gIQZyq7a8aEuxGFXbTfmym0TecN74F2gFM
 7YBJ9j4u/W5xp/u/29VUOUE9OUiRdMa+GJz73ncgslHApp7r3Z5r9PJFJHtW07Xu
 IElCg2GvQLR0pzyNlsa+Nv738pldDr0d9xZDmsOp1Cs0aCfJQAbU1y9P5WNN8j3y
 rHQP19/2+HF0j6LqYxIRmgioSrmeHrEN/nWIDlFpW74+QPyI7d/6aJpr1Tfdy64u
 6BE/48OunHjOPbO6fWcNjFm0FUlTYDKd8jtzkaIHmFKgXpDFb+3yN4AiMd4/ucPS
 SNqVqvzTfU8aKWEtIabTTG1m3AwuqJUrExYUQZwNe32zOhEMIE+rMpmgafSN3SjE
 k0cER70OS1rJ5rs/cqBY8UpqhPxqfTFSwEwHGqn66PeuYgCpjoXHIBVyn/s+I3CZ
 Be8udYwi3KXBYrMGppzFp5PklwkqrUIFFouF2ijtPBjKfZpte9/ZOGWxvZMux6Ev
 rqFaq/zf9DjvQ3BSwHh2QuQKK5WnGQVuwjDWHR/vso4bApErHFhDWvGAIFyFxRsK
 W70DUeUxSScNjNKDgyxzRUV18VF0IN8zMXfh4hCMtoq6+XzDG/DUBt6fBFXaZCun
 kWyCTZk+9sMkGVlL8kAB2UPbAjfuDRAijouwC+u0j0VRMXlsAWM=
 =ju/p
 -----END PGP SIGNATURE-----

Merge 4.19.6 into android-4.19

Changes in 4.19.6
	HID: steam: remove input device when a hid client is running.
	efi/libstub: arm: support building with clang
	usb: core: Fix hub port connection events lost
	usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers
	usb: dwc3: gadget: Properly check last unaligned/zero chain TRB
	usb: dwc3: core: Clean up ULPI device
	usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove()
	xhci: Fix leaking USB3 shared_hcd at xhci removal
	xhci: handle port status events for removed USB3 hcd
	xhci: Add check for invalid byte size error when UAS devices are connected.
	usb: xhci: fix uninitialized completion when USB3 port got wrong status
	usb: xhci: fix timeout for transition from RExit to U0
	xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc
	usb: xhci: Prevent bus suspend if a port connect change or polling state is detected
	ALSA: oss: Use kvzalloc() for local buffer allocations
	MAINTAINERS: Add Sasha as a stable branch maintainer
	Documentation/security-bugs: Clarify treatment of embargoed information
	Documentation/security-bugs: Postpone fix publication in exceptional cases
	mmc: sdhci-pci: Try "cd" for card-detect lookup before using NULL
	mmc: sdhci-pci: Workaround GLK firmware failing to restore the tuning value
	gpio: don't free unallocated ida on gpiochip_add_data_with_key() error path
	iwlwifi: fix wrong WGDS_WIFI_DATA_SIZE
	iwlwifi: mvm: support sta_statistics() even on older firmware
	iwlwifi: mvm: fix regulatory domain update when the firmware starts
	iwlwifi: mvm: don't use SAR Geo if basic SAR is not used
	brcmfmac: fix reporting support for 160 MHz channels
	opp: ti-opp-supply: Dynamically update u_volt_min
	opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call
	tools/power/cpupower: fix compilation with STATIC=true
	v9fs_dir_readdir: fix double-free on p9stat_read error
	selinux: Add __GFP_NOWARN to allocation at str_read()
	Input: synaptics - avoid using uninitialized variable when probing
	bfs: add sanity check at bfs_fill_super()
	sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer
	gfs2: Don't leave s_fs_info pointing to freed memory in init_sbd
	llc: do not use sk_eat_skb()
	mm: don't warn about large allocations for slab
	mm/memory.c: recheck page table entry with page table lock held
	tcp: do not release socket ownership in tcp_close()
	drm/fb-helper: Blacklist writeback when adding connectors to fbdev
	drm/amdgpu: Add missing firmware entry for HAINAN
	drm/vc4: Set ->legacy_cursor_update to false when doing non-async updates
	drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset
	drm/i915: Disable LP3 watermarks on all SNB machines
	drm/ast: change resolution may cause screen blurred
	drm/ast: fixed cursor may disappear sometimes
	drm/ast: Remove existing framebuffers before loading driver
	can: flexcan: Unlock the MB unconditionally
	can: dev: can_get_echo_skb(): factor out non sending code to __can_get_echo_skb()
	can: dev: __can_get_echo_skb(): replace struct can_frame by canfd_frame to access frame length
	can: dev: __can_get_echo_skb(): Don't crash the kernel if can_priv::echo_skb is accessed out of bounds
	can: dev: __can_get_echo_skb(): print error message, if trying to echo non existing skb
	can: rx-offload: introduce can_rx_offload_get_echo_skb() and can_rx_offload_queue_sorted() functions
	can: rx-offload: rename can_rx_offload_irq_queue_err_skb() to can_rx_offload_queue_tail()
	can: flexcan: use can_rx_offload_queue_sorted() for flexcan_irq_bus_*()
	can: flexcan: handle tx-complete CAN frames via rx-offload infrastructure
	can: raw: check for CAN FD capable netdev in raw_sendmsg()
	can: hi311x: Use level-triggered interrupt
	can: flexcan: Always use last mailbox for TX
	can: flexcan: remove not needed struct flexcan_priv::tx_mb and struct flexcan_priv::tx_mb_idx
	ACPICA: AML interpreter: add region addresses in global list during initialization
	IB/hfi1: Eliminate races in the SDMA send error path
	fsnotify: generalize handling of extra event flags
	fanotify: fix handling of events on child sub-directory
	pinctrl: meson: fix pinconf bias disable
	pinctrl: meson: fix gxbb ao pull register bits
	pinctrl: meson: fix gxl ao pull register bits
	pinctrl: meson: fix meson8 ao pull register bits
	pinctrl: meson: fix meson8b ao pull register bits
	tools/testing/nvdimm: Fix the array size for dimm devices.
	scsi: lpfc: fix remoteport access
	scsi: hisi_sas: Remove set but not used variable 'dq_list'
	KVM: PPC: Move and undef TRACE_INCLUDE_PATH/FILE
	cpufreq: imx6q: add return value check for voltage scale
	rtc: cmos: Do not export alarm rtc_ops when we do not support alarms
	rtc: pcf2127: fix a kmemleak caused in pcf2127_i2c_gather_write
	crypto: simd - correctly take reqsize of wrapped skcipher into account
	floppy: fix race condition in __floppy_read_block_0()
	powerpc/io: Fix the IO workarounds code to work with Radix
	sched/fair: Fix cpu_util_wake() for 'execl' type workloads
	perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
	block: copy ioprio in __bio_clone_fast() and bounce
	SUNRPC: Fix a bogus get/put in generic_key_to_expire()
	riscv: add missing vdso_install target
	RISC-V: Silence some module warnings on 32-bit
	drm/amdgpu: fix bug with IH ring setup
	kdb: Use strscpy with destination buffer size
	NFSv4: Fix an Oops during delegation callbacks
	powerpc/numa: Suppress "VPHN is not supported" messages
	efi/arm: Revert deferred unmap of early memmap mapping
	z3fold: fix possible reclaim races
	mm, memory_hotplug: check zone_movable in has_unmovable_pages
	tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
	mm, page_alloc: check for max order in hot path
	dax: Avoid losing wakeup in dax_lock_mapping_entry
	include/linux/pfn_t.h: force '~' to be parsed as an unary operator
	tty: wipe buffer.
	tty: wipe buffer if not echoing data
	gfs2: Fix iomap buffer head reference counting bug
	rcu: Make need_resched() respond to urgent RCU-QS needs
	media: ov5640: Re-work MIPI startup sequence
	media: ov5640: Fix timings setup code
	media: ov5640: fix exposure regression
	media: ov5640: fix auto gain & exposure when changing mode
	media: ov5640: fix wrong binning value in exposure calculation
	media: ov5640: fix auto controls values when switching to manual mode
	Linux 4.19.6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-12-06 09:32:46 +01:00
Yu Zhao
b66375b599 mm: use swp_offset as key in shmem_replace_page()
commit c1cb20d43728aa9b5393bd8d489bc85c142949b2 upstream.

We changed the key of swap cache tree from swp_entry_t.val to
swp_offset.  We need to do so in shmem_replace_page() as well.

Hugh said:
 "shmem_replace_page() has been wrong since the day I wrote it: good
  enough to work on swap "type" 0, which is all most people ever use
  (especially those few who need shmem_replace_page() at all), but
  broken once there are any non-0 swp_type bits set in the higher order
  bits"

Link: http://lkml.kernel.org/r/20181121215442.138545-1-yuzhao@google.com
Fixes: f6ab1f7f6b ("mm, swap: use offset of swap entry as key of swap cache")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>	[4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-05 19:32:14 +01:00
Pavel Tikhomirov
16a2d60224 mm: cleancache: fix corruption on missed inode invalidation
commit 6ff38bd40230af35e446239396e5fc8ebd6a5248 upstream.

If all pages are deleted from the mapping by memory reclaim and also
moved to the cleancache:

__delete_from_page_cache
  (no shadow case)
  unaccount_page_cache_page
    cleancache_put_page
  page_cache_delete
    mapping->nrpages -= nr
    (nrpages becomes 0)

We don't clean the cleancache for an inode after final file truncation
(removal).

truncate_inode_pages_final
  check (nrpages || nrexceptional) is false
    no truncate_inode_pages
      no cleancache_invalidate_inode(mapping)

These way when reading the new file created with same inode we may get
these trash leftover pages from cleancache and see wrong data instead of
the contents of the new file.

Fix it by always doing truncate_inode_pages which is already ready for
nrpages == 0 && nrexceptional == 0 case and just invalidates inode.

[akpm@linux-foundation.org: add comment, per Jan]
Link: http://lkml.kernel.org/r/20181112095734.17979-1-ptikhomirov@virtuozzo.com
Fixes: commit 91b0abe36a ("mm + fs: store shadow entries in page cache")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-05 19:32:13 +01:00
Andrea Arcangeli
34b7a7cc53 userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
commit 29ec90660d68bbdd69507c1c8b4e33aa299278b1 upstream.

After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration.  This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.

The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.

Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
Fixes: ff62a34210 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Reported-by: Jann Horn <jannh@google.com>
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-05 19:32:04 +01:00
Hugh Dickins
8b37c40503 mm/khugepaged: collapse_shmem() do not crash on Compound
commit 06a5e1268a5fb9c2b346a3da6b97e85f2eba0f07 upstream.

collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before
it holds page lock of the first page, racing truncation then extension
might conceivably have inserted a hugepage there already.  Fail with the
SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
otherwise mishandling the unexpected hugepage - though later we might
code up a more constructive way of handling it, with SCAN_SUCCESS.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
af24c01831 mm/khugepaged: collapse_shmem() without freezing new_page
commit 87c460a0bded56195b5eb497d44709777ef7b415 upstream.

khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0).  Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.

Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.

One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.

The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong.  So simply
eliminate the freezing of the new_page.

Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are
all unfrozen on success or failure: just as before.  They could be
unfrozen sooner, but cause no problem once no longer visible to
find_get_entry(), filemap_map_pages() and other speculative lookups.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
3e9646c76c mm/khugepaged: minor reorderings in collapse_shmem()
commit 042a30824871fa3149b0127009074b75cc25863c upstream.

Several cleanups in collapse_shmem(): most of which probably do not
really matter, beyond doing things in a more familiar and reassuring
order.  Simplify the failure gotos in the main loop, and on success
update stats while interrupts still disabled from the last iteration.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261526400.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
ee13d69bc1 mm/khugepaged: collapse_shmem() remember to clear holes
commit 2af8ff291848cc4b1cce24b6c943394eb2c761e8 upstream.

Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp
flags khugepaged uses to allocate a huge page - in all common cases it
would just be a waste of effort - so collapse_shmem() must remember to
clear out any holes that it instantiates.

The obvious place to do so, where they are put into the page cache tree,
is not a good choice: because interrupts are disabled there.  Leave it
until further down, once success is assured, where the other pages are
copied (before setting PageUptodate).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
78141aabfb mm/khugepaged: fix crashes due to misaccounted holes
commit aaa52e340073b7f4593b3c4ddafcafa70cf838b5 upstream.

Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent
hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by
clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later
closed and unlinked.

khugepaged's collapse_shmem() was forgetting to update mapping->nrpages
on the rollback path, after it had added but then needs to undo some
holes.

There is indeed an irritating asymmetry between shmem_charge(), whose
callers want it to increment nrpages after successfully accounting
blocks, and shmem_uncharge(), when __delete_from_page_cache() already
decremented nrpages itself: oh well, just add a comment on that to them
both.

And shmem_recalc_inode() is supposed to be called when the accounting is
expected to be in balance (so it can deduce from imbalance that reclaim
discarded some pages): so change shmem_charge() to update nrpages
earlier (though it's rare for the difference to matter at all).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
8797f2f4fe mm/khugepaged: collapse_shmem() stop if punched or truncated
commit 701270fa193aadf00bdcf607738f64997275d4c7 upstream.

Huge tmpfs testing showed that although collapse_shmem() recognizes a
concurrently truncated or hole-punched page correctly, its handling of
holes was liable to refill an emptied extent.  Add check to stop that.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
d31ff4722f mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
commit 006d3ff27e884f80bd7d306b041afc415f63598f upstream.

Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that
__split_huge_page() was using i_size_read() while holding the irq-safe
lru_lock and page tree lock, but the 32-bit i_size_read() uses an
irq-unsafe seqlock which should not be nested inside them.

Instead, read the i_size earlier in split_huge_page_to_list(), and pass
the end offset down to __split_huge_page(): all while holding head page
lock, which is enough to prevent truncation of that extent before the
page tree lock has been taken.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261520070.2275@eggly.anvils
Fixes: baa355fd33 ("thp: file pages support for split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
7e18656c9a mm/huge_memory: splitting set mapping+index before unfreeze
commit 173d9d9fd3ddae84c110fea8aedf1f26af6be9ec upstream.

Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s
VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).

Move the setting of mapping and index up before the page_ref_unfreeze()
in __split_huge_page_tail() to fix this: so that a page cache lookup
cannot get a reference while the tail's mapping and index are unstable.

In fact, might as well move them up before the smp_wmb(): I don't see an
actual need for that, but if I'm missing something, this way round is
safer than the other, and no less efficient.

You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is
misplaced, and should be left until after the trylock_page(); but left as
is has not crashed since, and gives more stringent assurance.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261516380.2275@eggly.anvils
Fixes: e9b61f1985 ("thp: reintroduce split_huge_page()")
Requires: 605ca5ede7 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Hugh Dickins
69697e6a61 mm/huge_memory: rename freeze_page() to unmap_page()
commit 906f9cdfc2a0800f13683f9e4ebdfd08c12ee81b upstream.

The term "freeze" is used in several ways in the kernel, and in mm it
has the particular meaning of forcing page refcount temporarily to 0.
freeze_page() is just too confusing a name for a function that unmaps a
page: rename it unmap_page(), and rename unfreeze_page() remap_page().

Went to change the mention of freeze_page() added later in mm/rmap.c,
but found it to be incorrect: ordinary page reclaim reaches there too;
but the substance of the comment still seems correct, so edit it down.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
Fixes: e9b61f1985 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-05 19:31:57 +01:00
Rik van Riel
fe0ea3084c ANDROID: add extra free kbytes tunable
Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.

This is useful for realtime applications that call system
calls and have a bound on the number of allocations that happen
in any short time period.  In this application, extra_free_kbytes
would be left at an amount equal to or larger than than the
maximum number of allocations that happen in any burst.

It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.

[ccross]
Revived for use on old kernels where no other solution exists.
The tunable will be removed on kernels that do better at avoiding
direct reclaim.

[surenb]
Will be reverted as soon as Android framework is reworked to
use upstream-supported watermark_scale_factor instead of
extra_free_kbytes.

Bug: 86445363
Bug: 109664768
Bug: 120445732
Change-Id: I765a42be8e964bfd3e2886d1ca85a29d60c3bb3e
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2018-12-05 09:48:12 -08:00
Colin Cross
533e4ed309 ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory.  When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.

This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma.  vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.

The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas.  If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".

The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen.  This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging.  The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.

Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list.  Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.

Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
        5b56d49fc3 ("mm: add locked parameter to get_user_pages_remote()")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-12-05 09:48:11 -08:00
Michal Hocko
9dec38554a mm, page_alloc: check for max order in hot path
[ Upstream commit c63ae43ba53bc432b414fd73dd5f4b01fcb1ab43 ]

Konstantin has noticed that kvmalloc might trigger the following
warning:

  WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
  [...]
  Call Trace:
   fragmentation_index+0x76/0x90
   compaction_suitable+0x4f/0xf0
   shrink_node+0x295/0x310
   node_reclaim+0x205/0x250
   get_page_from_freelist+0x649/0xad0
   __alloc_pages_nodemask+0x12a/0x2a0
   kmalloc_large_node+0x47/0x90
   __kmalloc_node+0x22b/0x2e0
   kvmalloc_node+0x3e/0x70
   xt_alloc_table_info+0x3a/0x80 [x_tables]
   do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
   nf_setsockopt+0x44/0x60
   SyS_setsockopt+0x6f/0xc0
   do_syscall_64+0x67/0x120
   entry_SYSCALL_64_after_hwframe+0x3d/0xa2

the problem is that we only check for an out of bound order in the slow
path and the node reclaim might happen from the fast path already.  This
is fixable by making sure that kvmalloc doesn't ever use kmalloc for
requests that are larger than KMALLOC_MAX_SIZE but this also shows that
the code is rather fragile.  A recent UBSAN report just underlines that
by the following report

  UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
  shift exponent 51 is too large for 32-bit type 'int'
  CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0xd2/0x148 lib/dump_stack.c:113
   ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
   __ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
   __zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
   zone_watermark_fast mm/page_alloc.c:3216 [inline]
   get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
   __alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
   alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
   alloc_pages include/linux/gfp.h:509 [inline]
   __get_free_pages+0x12/0x60 mm/page_alloc.c:4414
   dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
   raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
   raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
   fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
   fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
   __blkdev_driver_ioctl block/ioctl.c:303 [inline]
   blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
   block_ioctl+0x105/0x150 fs/block_dev.c:1883
   vfs_ioctl fs/ioctl.c:46 [inline]
   do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
   ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
   __do_sys_ioctl fs/ioctl.c:709 [inline]
   __se_sys_ioctl fs/ioctl.c:707 [inline]
   __x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
   do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Note that this is not a kvmalloc path.  It is just that the fast path
really depends on having sanitzed order as well.  Therefore move the
order check to the fast path.

Link: http://lkml.kernel.org/r/20181113094305.GM15120@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Byoungyoung Lee <lifeasageek@gmail.com>
Cc: "Dae R. Jeong" <threeearcat@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-01 09:37:34 +01:00
Yufen Yu
db89fc007b tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
[ Upstream commit 1a413646931cb14442065cfc17561e50f5b5bb44 ]

Other filesystems such as ext4, f2fs and ubifs all return ENXIO when
lseek (SEEK_DATA or SEEK_HOLE) requests a negative offset.

man 2 lseek says

:      EINVAL whence  is  not  valid.   Or: the resulting file offset would be
:             negative, or beyond the end of a seekable device.
:
:      ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is  beyond
:             the end of the file.

Make tmpfs return ENXIO under these circumstances as well.  After this,
tmpfs also passes xfstests's generic/448.

[akpm@linux-foundation.org: rewrite changelog]
Link: http://lkml.kernel.org/r/1540434176-14349-1-git-send-email-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-01 09:37:34 +01:00
Michal Hocko
b44fd1268b mm, memory_hotplug: check zone_movable in has_unmovable_pages
[ Upstream commit 9d7899999c62c1a81129b76d2a6ecbc4655e1597 ]

Page state checks are racy.  Under a heavy memory workload (e.g.  stress
-m 200 -t 2h) it is quite easy to hit a race window when the page is
allocated but its state is not fully populated yet.  A debugging patch to
dump the struct page state shows

  has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0
  page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1
  flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)

Note that the state has been checked for both PageLRU and PageSwapBacked
already.  Closing this race completely would require some sort of retry
logic.  This can be tricky and error prone (think of potential endless
or long taking loops).

Workaround this problem for movable zones at least.  Such a zone should
only contain movable pages.  Commit 15c30bc090 ("mm, memory_hotplug:
make has_unmovable_pages more robust") has told us that this is not
strictly true though.  Bootmem pages should be marked reserved though so
we can move the original check after the PageReserved check.  Pages from
other zones are still prone to races but we even do not pretend that
memory hotremove works for those so pre-mature failure doesn't hurt that
much.

Link: http://lkml.kernel.org/r/20181106095524.14629-1-mhocko@kernel.org
Fixes: 15c30bc090 ("mm, memory_hotplug: make has_unmovable_pages more robust")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Baoquan He <bhe@redhat.com>
Tested-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-01 09:37:33 +01:00
Vitaly Wool
510066729b z3fold: fix possible reclaim races
[ Upstream commit ca0246bb97c23da9d267c2107c07fb77e38205c9 ]

Reclaim and free can race on an object which is basically fine but in
order for reclaim to be able to map "freed" object we need to encode
object length in the handle.  handle_to_chunks() is then introduced to
extract object length from a handle and use it during mapping.

Moreover, to avoid racing on a z3fold "headless" page release, we should
not try to free that page in z3fold_free() if the reclaim bit is set.
Also, in the unlikely case of trying to reclaim a page being freed, we
should not proceed with that page.

While at it, fix the page accounting in reclaim function.

This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".

Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
Reported-by-by: Jongseok Kim <ks77sj@gmail.com>
Reviewed-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-01 09:37:33 +01:00
Aneesh Kumar K.V
5999609a93 mm/memory.c: recheck page table entry with page table lock held
commit ff09d7ec9786be4ad7589aa987d7dc66e2dd9160 upstream.

We clear the pte temporarily during read/modify/write update of the pte.
If we take a page fault while the pte is cleared, the application can get
SIGBUS.  One such case is with remap_pfn_range without a backing
vm_ops->fault callback.  do_fault will return SIGBUS in that case.

cpu 0		 				cpu1
mprotect()
ptep_modify_prot_start()/pte cleared.
.
.						page fault.
.
.
prep_modify_prot_commit()

Fix this by taking page table lock and rechecking for pte_none.

[aneesh.kumar@linux.ibm.com: fix crash observed with syzkaller run]
  Link: http://lkml.kernel.org/r/87va6bwlfg.fsf@linux.ibm.com
Link: http://lkml.kernel.org/r/20180926031858.9692-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01 09:37:28 +01:00
Dmitry Vyukov
3996e891ec mm: don't warn about large allocations for slab
commit 61448479a9f2c954cde0cfe778cb6bec5d0a748d upstream.

Slub does not call kmalloc_slab() for sizes > KMALLOC_MAX_CACHE_SIZE,
instead it falls back to kmalloc_large().

For slab KMALLOC_MAX_CACHE_SIZE == KMALLOC_MAX_SIZE and it calls
kmalloc_slab() for all allocations relying on NULL return value for
over-sized allocations.

This inconsistency leads to unwanted warnings from kmalloc_slab() for
over-sized allocations for slab.  Returning NULL for failed allocations is
the expected behavior.

Make slub and slab code consistent by checking size >
KMALLOC_MAX_CACHE_SIZE in slab before calling kmalloc_slab().

While we are here also fix the check in kmalloc_slab().  We should check
against KMALLOC_MAX_CACHE_SIZE rather than KMALLOC_MAX_SIZE.  It all kinda
worked because for slab the constants are the same, and slub always checks
the size against KMALLOC_MAX_CACHE_SIZE before kmalloc_slab().  But if we
get there with size > KMALLOC_MAX_CACHE_SIZE anyhow bad things will
happen.  For example, in case of a newly introduced bug in slub code.

Also move the check in kmalloc_slab() from function entry to the size >
192 case.  This partially compensates for the additional check in slab
code and makes slub code a bit faster (at least theoretically).

Also drop __GFP_NOWARN in the warning check.  This warning means a bug in
slab code itself, user-passed flags have nothing to do with it.

Nothing of this affects slob.

Link: http://lkml.kernel.org/r/20180927171502.226522-1-dvyukov@gmail.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+87829a10073277282ad1@syzkaller.appspotmail.com
Reported-by: syzbot+ef4e8fc3a06e9019bb40@syzkaller.appspotmail.com
Reported-by: syzbot+6e438f4036df52cbb863@syzkaller.appspotmail.com
Reported-by: syzbot+8574471d8734457d98aa@syzkaller.appspotmail.com
Reported-by: syzbot+af1504df0807a083dbd9@syzkaller.appspotmail.com
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01 09:37:28 +01:00