kernel-fxtec-pro1x/mm
Tejun Heo 1f7dd3e5a6 cgroup: fix handling of multi-destination migration from subtree_control enabling
Consider the following v2 hierarchy.

  P0 (+memory) --- P1 (-memory) --- A
                                 \- B
       
P0 has memory enabled in its subtree_control while P1 doesn't.  If
both A and B contain processes, they would belong to the memory css of
P1.  Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter.  IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses.  pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

 WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
 Modules linked in:
 CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
 ...
  ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
  ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
  ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
 Call Trace:
  [<ffffffff81551ffc>] dump_stack+0x4e/0x82
  [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
  [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
  [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
  [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
  [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
  [<ffffffff81189016>] cgroup_attach_task+0x176/0x200
  [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
  [<ffffffff81189684>] cgroup_procs_write+0x14/0x20
  [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
  [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
  [<ffffffff81265f88>] __vfs_write+0x28/0xe0
  [<ffffffff812666fc>] vfs_write+0xac/0x1a0
  [<ffffffff81267019>] SyS_write+0x49/0xb0
  [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing @css parameter from the three
migration methods, ->can_attach, ->cancel_attach() and ->attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated.  All controllers are
updated accordingly.

* Controllers which don't care whether there are one or multiple
  target csses can be converted trivially.  cpu, io, freezer, perf,
  netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's single source
  and destination and thus doesn't support v2 hierarchy already.  The
  only change made by this patchset is how that single destination css
  is obtained.

* memory migration path already doesn't do anything on v2.  How the
  single destination css is obtained is updated and the prep stage of
  mem_cgroup_can_attach() is reordered to accomodate the change.

* pids is the only controller which was affected by this bug.  It now
  correctly handles multi-destination migrations and no longer causes
  counter underflow from incorrect accounting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-12-03 10:18:21 -05:00
..
kasan kasan: always taint kernel on report 2015-11-05 19:34:48 -08:00
backing-dev.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
balloon_compaction.c mm: page migration trylock newpage at same level as oldpage 2015-11-05 19:34:48 -08:00
bootmem.c bootmem: avoid freeing to bootmem after bootmem is done 2015-09-08 15:35:28 -07:00
cleancache.c cleancache: remove limit on the number of cleancache enabled filesystems 2015-04-14 16:49:03 -07:00
cma.c mm/cma.c: suppress warning 2015-11-05 19:34:48 -08:00
cma.h mm: cma: mark cma_bitmap_maxno() inline in header 2015-08-14 15:56:32 -07:00
cma_debug.c mm/cma_debug: correct size input to bitmap function 2015-07-17 16:39:54 -07:00
compaction.c mm, compaction: distinguish contended status in tracepoints 2015-11-05 19:34:48 -08:00
debug-pagealloc.c mm/debug-pagealloc: make debug-pagealloc boottime configurable 2014-12-13 12:42:48 -08:00
debug.c mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
dmapool.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
early_ioremap.c mm/early_ioremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
fadvise.c writeback: implement and use inode_congested() 2015-06-02 08:33:35 -06:00
failslab.c mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM 2015-11-06 17:50:42 -08:00
filemap.c mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
frame_vector.c mm: fix docbook comment for get_vaddr_frames() 2015-11-05 19:34:48 -08:00
frontswap.c frontswap: allow multiple backends 2015-06-24 17:49:45 -07:00
gup.c mm: introduce VM_LOCKONFAULT 2015-11-05 19:34:48 -08:00
highmem.c mm/highmem: make kmap cache coloring aware 2014-08-06 18:01:22 -07:00
huge_memory.c Merge branch 'akpm' (patches from Andrew) 2015-11-07 14:32:45 -08:00
hugetlb.c hugetlb: trivial comment fix 2015-11-10 16:32:11 -08:00
hugetlb_cgroup.c mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
hwpoison-inject.c hwpoison: use page_cgroup_ino for filtering by memcg 2015-09-10 13:29:01 -07:00
init-mm.c
internal.h mm: use 'unsigned int' for page order 2015-11-06 17:50:42 -08:00
interval_tree.c mm: replace vma->sharead.linear with vma->shared 2015-02-10 14:30:31 -08:00
Kconfig mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
Kconfig.debug mm/debug_pagealloc: remove obsolete Kconfig options 2015-01-08 15:10:52 -08:00
kmemcheck.c mm/slab_common: move kmem_cache definition to internal header 2014-10-09 22:25:50 -04:00
kmemleak-test.c mm/kmemleak-test.c: use pr_fmt for logging 2014-06-06 16:08:18 -07:00
kmemleak.c mm/kmemleak.c: remove unneeded initialization of object to NULL 2015-11-05 19:34:48 -08:00
ksm.c ksm: unstable_tree_search_insert error checking cleanup 2015-11-05 19:34:48 -08:00
list_lru.c memcg: simplify and inline __mem_cgroup_from_kmem 2015-11-05 19:34:48 -08:00
maccess.c mm/maccess.c: actually return -EFAULT from strncpy_from_unsafe 2015-11-05 19:34:48 -08:00
madvise.c mm: madvise allow remove operation for hugetlbfs 2015-09-08 15:35:28 -07:00
Makefile media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memblock.c mm/memblock: make memblock_remove_range() static 2015-11-05 19:34:48 -08:00
memcontrol.c cgroup: fix handling of multi-destination migration from subtree_control enabling 2015-12-03 10:18:21 -05:00
memory-failure.c mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
memory.c mm, dax: fix DAX deadlocks 2015-10-16 11:42:28 -07:00
memory_hotplug.c mm/page_alloc: remove unused parameter in init_currently_empty_zone() 2015-11-05 19:34:48 -08:00
mempolicy.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
mempool.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
memtest.c memtest: remove unused header files 2015-09-08 15:35:28 -07:00
migrate.c mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM 2015-11-06 17:50:42 -08:00
mincore.c mm/mincore: use offset_in_page macro 2015-11-05 19:34:48 -08:00
mlock.c mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage 2015-11-05 19:34:48 -08:00
mm_init.c mm: meminit: remove mminit_verify_page_links 2015-06-30 19:44:56 -07:00
mmap.c mm: introduce VM_LOCKONFAULT 2015-11-05 19:34:48 -08:00
mmu_context.c sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm 2014-02-21 08:50:17 +01:00
mmu_notifier.c mmu-notifier: add clear_young callback 2015-09-10 13:29:01 -07:00
mmzone.c mm: microoptimize zonelist operations 2015-02-11 17:06:02 -08:00
mprotect.c userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx 2015-09-04 16:54:41 -07:00
mremap.c mm/mremap: use offset_in_page macro 2015-11-05 19:34:48 -08:00
msync.c mm/msync: use offset_in_page macro 2015-11-05 19:34:48 -08:00
nobootmem.c mm: page_alloc: pass PFN to __free_pages_bootmem 2015-06-30 19:44:55 -07:00
nommu.c mm/nommu.c: drop unlikely inside BUG_ON() 2015-11-05 19:34:48 -08:00
oom_kill.c mm/oom_kill.c: introduce is_sysrq_oom helper 2015-11-06 17:50:42 -08:00
page-writeback.c writeback: fix incorrect calculation of available memory for memcg domains 2015-10-12 10:31:13 -06:00
page_alloc.c Fix alloc_node_mem_map() to work on ia64 again 2015-11-10 14:44:26 -08:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_idle.c mm: introduce idle page tracking 2015-09-10 13:29:01 -07:00
page_io.c fs: use helper bio_add_page() instead of open coding on bi_io_vec 2015-08-13 12:32:00 -06:00
page_isolation.c mm, page_isolation: make set/unset_migratetype_isolate() file-local 2015-09-08 15:35:28 -07:00
page_owner.c mm/page_owner: set correct gfp_mask on page_owner 2015-07-17 16:39:54 -07:00
pagewalk.c mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callers 2015-03-25 16:20:30 -07:00
percpu-km.c percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated 2014-09-02 14:46:05 -04:00
percpu-vm.c percpu: move region iterations out of pcpu_[de]populate_chunk() 2014-09-02 14:46:02 -04:00
percpu.c mm/percpu: use offset_in_page macro 2015-11-05 19:34:48 -08:00
pgtable-generic.c mm,thp: introduce flush_pmd_tlb_range 2015-10-17 17:48:20 +05:30
process_vm_access.c process_vm_access: switch to {compat_,}import_iovec() 2015-04-11 22:27:12 -04:00
quicklist.c
readahead.c mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
rmap.c mm: page migration use migration entry for swapcache too 2015-11-05 19:34:48 -08:00
shmem.c mm: page_alloc: hide some GFP internals and document the bits and flag combinations 2015-11-06 17:50:42 -08:00
slab.c slab, slub: use page->rcu_head instead of page->lru plus cast 2015-11-06 17:50:42 -08:00
slab.h memcg: unify slab and other kmem pages charging 2015-11-05 19:34:48 -08:00
slab_common.c mm/slab_common.c: initialize kmem_cache pointer to NULL 2015-11-05 19:34:48 -08:00
slob.c mm: rename alloc_pages_exact_node() to __alloc_pages_node() 2015-09-08 15:35:28 -07:00
slub.c slab, slub: use page->rcu_head instead of page->lru plus cast 2015-11-06 17:50:42 -08:00
sparse-vmemmap.c mm/sparse: use memblock apis for early memory allocations 2014-01-21 16:19:47 -08:00
sparse.c mm: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:35:54 -07:00
swap.c mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
swap_cgroup.c mm: page_cgroup: rename file to mm/swap_cgroup.c 2014-12-10 17:41:09 -08:00
swap_state.c mm: swap: zswap: maybe_preload & refactoring 2015-09-08 15:35:28 -07:00
swapfile.c mm: /proc/pid/smaps:: show proportional swap share of the mapping 2015-09-08 15:35:28 -07:00
truncate.c memcg: add per cgroup dirty page accounting 2015-06-02 08:33:33 -06:00
userfaultfd.c userfaultfd: avoid mmap_sem read recursion in mcopy_atomic 2015-09-04 16:54:41 -07:00
util.c mm/util: use offset_in_page macro 2015-11-05 19:34:48 -08:00
vmacache.c mm/vmacache: inline vmacache_valid_mm() 2015-11-05 19:34:48 -08:00
vmalloc.c mm: page_alloc: hide some GFP internals and document the bits and flag combinations 2015-11-06 17:50:42 -08:00
vmpressure.c mm/vmpressure.c: fix race in vmpressure_work_fn() 2014-12-02 17:32:07 -08:00
vmscan.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
vmstat.c mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand 2015-11-06 17:50:42 -08:00
workingset.c list_lru: add helpers to isolate items 2015-02-12 18:54:10 -08:00
zbud.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: use page->private instead of page->first_page 2015-11-06 17:50:42 -08:00
zswap.c zswap: use charp for zswap param strings 2015-11-06 17:50:42 -08:00