kernel-fxtec-pro1x/mm
Suren Baghdasaryan a3d0ceee71 mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
[ Upstream commit 67197a4f28d28d0b073ab0427b03cb2ee5382578 ]

Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm.  This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.

Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important.  Such operation can happen frequently.  We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression.  Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us.  Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.

Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK).  Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes.  To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex.  Its scope is changed to global.

The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork().  To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified.  Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare.  Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.

With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.

[surenb@google.com: v3]
  Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com

Fixes: 44a70adec9 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj")
Reported-by: Tim Murray <timmurray@google.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Christian Kellner <christian@kellner.me>
Cc: Adrian Reber <areber@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: https://lkml.kernel.org/r/20200824153036.3201505-1-surenb@google.com
Debugged-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29 09:55:15 +01:00
..
kasan
backing-dev.c
balloon_compaction.c
bootmem.c
cleancache.c
cma.c
cma.h
cma_debug.c
compaction.c
debug.c
debug_page_ref.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap.c: clear page error before actual read 2020-10-01 13:14:41 +02:00
frame_vector.c
frontswap.c
gup.c
gup_benchmark.c mm/gup_benchmark.c: prevent integer overflow in ioctl 2019-12-01 09:17:07 +01:00
highmem.c
hmm.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
huge_memory.c mm/thp: fix __split_huge_pmd_locked() for migration PMD 2020-09-26 18:01:28 +02:00
hugetlb.c mm/hugetlb: fix a race between hugetlb sysctl handlers 2020-09-09 19:04:32 +02:00
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 18:45:20 +01:00
hwpoison-inject.c
init-mm.c
internal.h vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n 2019-12-05 09:20:57 +01:00
interval_tree.c
Kconfig
Kconfig.debug
khugepaged.c mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged 2020-10-14 10:31:26 +02:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: use address-of operator on section symbols 2020-10-01 13:14:41 +02:00
ksm.c mm/ksm: fix NULL pointer dereference when KSM zero page is enabled 2020-04-29 16:31:28 +02:00
list_lru.c
maccess.c uaccess: Add non-pagefault user-space write function 2020-09-09 19:04:29 +02:00
madvise.c
Makefile
memblock.c mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() 2019-12-05 09:20:58 +01:00
memcontrol.c mm/memcg: fix device private memcg accounting 2020-10-29 09:55:15 +01:00
memfd.c memfd: Use radix_tree_deref_slot_protected to avoid the warning. 2019-11-20 18:47:53 +01:00
memory-failure.c mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once 2019-10-29 09:19:59 +01:00
memory.c mm: avoid data corruption on CoW fault into PFN-mapped VMA 2020-10-01 13:14:36 +02:00
memory_hotplug.c mm: don't rely on system state to detect hot-plug operations 2020-10-07 08:00:08 +02:00
mempolicy.c mm: mempolicy: require at least one nodeid for MPOL_PREFERRED 2020-04-13 10:45:06 +02:00
mempool.c
memtest.c
migrate.c mm: move_pages: report the number of non-attempted pages 2020-02-11 04:33:56 -08:00
mincore.c
mlock.c
mm_init.c
mmap.c mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area 2020-10-01 13:14:42 +02:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa 2020-03-11 14:15:00 +01:00
mremap.c mm: Fix mremap not considering huge pmd devmap 2020-06-07 13:17:53 +02:00
msync.c
nobootmem.c
nommu.c x86/mm: split vmalloc_sync_all() 2020-03-25 08:06:13 +01:00
oom_kill.c mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-29 09:55:15 +01:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:21:31 +01:00
page_alloc.c mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged 2020-10-14 10:31:26 +02:00
page_counter.c mm/page_counter.c: fix protection usage propagation 2020-08-21 11:05:33 +02:00
page_ext.c
page_idle.c
page_io.c mm/page_io.c: do not free shared swap slots 2019-12-01 09:17:35 +01:00
page_isolation.c
page_owner.c mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo 2019-10-29 09:19:58 +01:00
page_poison.c
page_vma_mapped.c
pagewalk.c mm: pagewalk: fix termination condition in walk_pte_range() 2020-10-01 13:14:32 +02:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c percpu: fix first chunk size calculation for populated bitmap 2020-09-23 12:11:01 +02:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c
rodata_test.c
shmem.c shmem: fix possible deadlocks on shmlock_user_lock 2020-05-20 08:18:32 +02:00
slab.c
slab.h
slab_common.c mm: memcg/slab: fix memory leak at non-root kmem_cache destroy 2020-07-29 10:16:57 +02:00
slob.c
slub.c mm: slub: fix conversion of freelist_corrupted() 2020-09-09 19:04:31 +02:00
sparse-vmemmap.c
sparse.c mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section 2020-01-29 16:43:26 +01:00
swap.c
swap_cgroup.c
swap_slots.c
swap_state.c mm/swap_state: fix a data race in swapin_nr_pages 2020-10-01 13:14:47 +02:00
swapfile.c mm, THP, swap: fix allocating cluster for swapfile by mistake 2020-10-01 13:14:53 +02:00
truncate.c
usercopy.c
userfaultfd.c
util.c mm: add kvfree_sensitive() for freeing sensitive data objects 2020-06-22 09:05:01 +02:00
vmacache.c
vmalloc.c mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap() 2020-06-03 08:19:49 +02:00
vmpressure.c
vmscan.c mm/vmscan.c: fix data races using kswapd_classzone_idx 2020-10-01 13:14:41 +02:00
vmstat.c mm/vmstat.c: fix NUMA statistics updates 2019-12-13 08:51:27 +01:00
workingset.c
z3fold.c
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc.c: fix the migrated zspage statistics. 2020-01-09 10:19:00 +01:00
zswap.c