kernel-fxtec-pro1x/mm
David Hildenbrand 0a69047d82 mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
[ Upstream commit e822969cab48b786b64246aad1a3ba2a774f5d23 ]

Patch series "mm: fix max_pfn not falling on section boundary", v2.

Playing with different memory sizes for a x86-64 guest, I discovered that
some memmaps (highest section if max_mem does not fall on the section
boundary) are marked as being valid and online, but contain garbage.  We
have to properly initialize these memmaps.

Looking at /proc/kpageflags and friends, I found some more issues,
partially related to this.

This patch (of 3):

If max_pfn is not aligned to a section boundary, we can easily run into
BUGs.  This can e.g., be triggered on x86-64 under QEMU by specifying a
memory size that is not a multiple of 128MB (e.g., 4097MB, but also
4160MB).  I was told that on real HW, we can easily have this scenario
(esp., one of the main reasons sub-section hotadd of devmem was added).

The issue is, that we have a valid memmap (pfn_valid()) for the whole
section, and the whole section will be marked "online".
pfn_to_online_page() will succeed, but the memmap contains garbage.

E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
4160M" - (see tools/vm/page-types.c):

[  200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
[  200.477500] #PF: supervisor read access in kernel mode
[  200.478334] #PF: error_code(0x0000) - not-present page
[  200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
[  200.479557] Oops: 0000 [#4] SMP NOPTI
[  200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G      D W         5.5.0-rc1-next-20191209 #93
[  200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[  200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
[  200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[  200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
[  200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
[  200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
[  200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[  200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
[  200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
[  200.487130] FS:  00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
[  200.487804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
[  200.488897] Call Trace:
[  200.489115]  kpageflags_read+0xe9/0x140
[  200.489447]  proc_reg_read+0x3c/0x60
[  200.489755]  vfs_read+0xc2/0x170
[  200.490037]  ksys_pread64+0x65/0xa0
[  200.490352]  do_syscall_64+0x5c/0xa0
[  200.490665]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

But it can be triggered much easier via "cat /proc/kpageflags > /dev/null"
after cold/hot plugging a DIMM to such a system:

[root@localhost ~]# cat /proc/kpageflags > /dev/null
[  111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
[  111.517907] #PF: supervisor read access in kernel mode
[  111.518333] #PF: error_code(0x0000) - not-present page
[  111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0

This patch fixes that by at least zero-ing out that memmap (so e.g.,
page_to_pfn() will not crash).  Commit 907ec5fca3dc ("mm: zero remaining
unavailable struct pages") tried to fix a similar issue, but forgot to
consider this special case.

After this patch, there are still problems to solve.  E.g., not all of
these pages falling into a memory hole will actually get initialized later
and set PageReserved - they are only zeroed out - but at least the
immediate crashes are gone.  A follow-up patch will take care of this.

Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>	[4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:18 -08:00
..
kasan
backing-dev.c
balloon_compaction.c
bootmem.c
cleancache.c
cma.c mm/cma.c: fail if fixed declaration can't be honored 2019-08-06 19:06:51 +02:00
cma.h
cma_debug.c
compaction.c mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone 2019-10-05 13:10:13 +02:00
debug.c
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap.c: don't initiate writeback if mapping has no dirty pages 2019-11-12 19:21:20 +01:00
frame_vector.c
frontswap.c
gup.c mm/gup.c: remove some BUG_ONs from get_gate_page() 2019-07-31 07:27:08 +02:00
gup_benchmark.c mm/gup_benchmark.c: prevent integer overflow in ioctl 2019-12-01 09:17:07 +01:00
highmem.c
hmm.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
huge_memory.c mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment 2020-01-23 08:21:32 +01:00
hugetlb.c hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic() 2019-10-29 09:19:59 +01:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 18:45:20 +01:00
hwpoison-inject.c
init-mm.c
internal.h vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n 2019-12-05 09:20:57 +01:00
interval_tree.c
Kconfig
Kconfig.debug
khugepaged.c
kmemleak-test.c
kmemleak.c Revert "kmemleak: allow to coexist with fault injection" 2019-08-25 10:47:58 +02:00
ksm.c mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() 2019-12-01 09:16:11 +01:00
list_lru.c
maccess.c
madvise.c
Makefile
memblock.c mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() 2019-12-05 09:20:58 +01:00
memcontrol.c mm: handle no memcg case in memcg_kmem_charge() properly 2019-12-01 09:17:14 +01:00
memfd.c memfd: Use radix_tree_deref_slot_protected to avoid the warning. 2019-11-20 18:47:53 +01:00
memory-failure.c mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once 2019-10-29 09:19:59 +01:00
memory.c mm, thp, proc: report THP eligibility for each vma 2019-12-17 20:35:45 +01:00
memory_hotplug.c mm/memory_hotplug: fix remove_memory() lockdep splat 2020-02-11 04:33:56 -08:00
mempolicy.c mm/mempolicy.c: fix out of bounds write in mpol_parse_str() 2020-02-05 14:43:36 +00:00
mempool.c
memtest.c
migrate.c mm: move_pages: report the number of non-attempted pages 2020-02-11 04:33:56 -08:00
mincore.c
mlock.c mm/mlock.c: change count_mm_mlocked_page_nr return type 2019-07-10 09:53:40 +02:00
mm_init.c
mmap.c arm64: Revert support for execute-only user mappings 2020-01-09 10:19:03 +01:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-31 07:27:08 +02:00
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c mm: use down_read_killable for locking mmap_sem in access_remote_vm 2019-07-31 07:27:09 +02:00
oom_kill.c memcg, oom: don't require __GFP_FS when invoking memcg OOM killer 2019-10-05 13:10:07 +02:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:21:31 +01:00
page_alloc.c mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section 2020-02-11 04:34:18 -08:00
page_counter.c
page_ext.c
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-07-03 13:14:45 +02:00
page_io.c mm/page_io.c: do not free shared swap slots 2019-12-01 09:17:35 +01:00
page_isolation.c
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-29 09:19:58 +01:00
page_poison.c
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm/hmm: fix bad subpage pointer in try_to_unmap_one 2019-08-25 10:47:43 +02:00
rodata_test.c
shmem.c mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment 2020-01-23 08:21:30 +01:00
slab.c
slab.h
slab_common.c mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid 2020-01-23 08:21:30 +01:00
slob.c
slub.c mm/slub: fix a deadlock in show_slab_objects() 2019-10-29 09:19:58 +01:00
sparse-vmemmap.c
sparse.c mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section 2020-01-29 16:43:26 +01:00
swap.c mm/swap: fix release_pages() when releasing devmap pages 2019-07-31 07:27:03 +02:00
swap_cgroup.c
swap_slots.c
swap_state.c
swapfile.c
truncate.c
usercopy.c usercopy: Avoid HIGHMEM pfn warning 2019-10-11 18:20:58 +02:00
userfaultfd.c
util.c
vmacache.c
vmalloc.c mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() 2019-08-16 10:12:40 +02:00
vmpressure.c mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() 2019-10-17 13:45:19 -07:00
vmscan.c mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker 2019-08-06 19:06:54 +02:00
vmstat.c mm/vmstat.c: fix NUMA statistics updates 2019-12-13 08:51:27 +01:00
workingset.c
z3fold.c
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: fix the migrated zspage statistics. 2020-01-09 10:19:00 +01:00
zswap.c