memcg: document cgroup dirty memory interfaces

Document cgroup dirty memory interfaces and statistics.

[akpm@linux-foundation.org: fix use_hierarchy description]
Signed-off-by: Andrea Righi <arighi@develer.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen 2011-01-13 15:47:36 -08:00 committed by Linus Torvalds
parent db16d5ec1f
commit ece72400c2

@ -385,6 +385,10 @@ mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
pgpgin - # of pages paged in (equivalent to # of charging events).
pgpgout - # of pages paged out (equivalent to # of uncharging events).
swap - # of bytes of swap usage
dirty - # of bytes that are waiting to get written back to the disk.
writeback - # of bytes that are actively being written back to the disk.
nfs_unstable - # of bytes sent to the NFS server, but not yet committed to
the actual storage.
inactive_anon - # of bytes of anonymous memory and swap cache memory on
LRU list.
active_anon - # of bytes of anonymous and swap cache memory on active
@ -406,6 +410,9 @@ total_mapped_file - sum of all children's "cache"
total_pgpgin - sum of all children's "pgpgin"
total_pgpgout - sum of all children's "pgpgout"
total_swap - sum of all children's "swap"
total_dirty - sum of all children's "dirty"
total_writeback - sum of all children's "writeback"
total_nfs_unstable - sum of all children's "nfs_unstable"
total_inactive_anon - sum of all children's "inactive_anon"
total_active_anon - sum of all children's "active_anon"
total_inactive_file - sum of all children's "inactive_file"
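
The new counters appear in memory.stat alongside the existing ones, one "name value" pair per line, with values in bytes. The C sketch below shows one way a user-space tool might total the memory still waiting to reach storage from that text. stat_value() and pending_write_bytes() are hypothetical helper names, and reading the file into a buffer is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of counter `name` from the text
 * of a memory.stat file ("name value" pairs, one per line), or -1 if the
 * counter is not present.
 */
static long long stat_value(const char *stat, const char *name)
{
    size_t len = strlen(name);
    const char *line = stat;

    while (line != NULL && *line != '\0') {
        /* Require an exact key match: "dirty" must not match "total_dirty". */
        if (strncmp(line, name, len) == 0 && line[len] == ' ') {
            long long value;
            if (sscanf(line + len + 1, "%lld", &value) == 1)
                return value;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;            /* move to the start of the next line */
    }
    return -1;
}

/* Bytes dirty, under writeback, or sent to an NFS server but not committed. */
static long long pending_write_bytes(const char *stat)
{
    static const char *const keys[] = { "dirty", "writeback", "nfs_unstable" };
    long long total = 0;

    for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
        long long value = stat_value(stat, keys[i]);
        if (value > 0)
            total += value;
    }
    return total;
}
```

The exact-key check matters because the hierarchical total_* counters share suffixes with the per-cgroup ones.
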
@ -453,6 +460,73 @@ memory under it will be reclaimed.
You can reset failcnt by writing 0 to the failcnt file.
# echo 0 > .../memory.failcnt

5.5 dirty memory

Control the maximum amount of dirty pages a cgroup can have at any given time.
Limiting dirty memory caps the amount of dirty (hard to reclaim) page cache a
cgroup can use. With multiple cgroups writing concurrently, none of them can
consume more than its designated share of dirty pages, and a writer that
crosses its cgroup's limit is forced to perform write-out itself.

The interface mirrors the system-wide procfs interface, /proc/sys/vm/dirty_*.
Separate limits can be configured to trigger direct writeback (performed by the
process generating dirty pages) and background writeback (performed by the
per-bdi flusher threads). The root cgroup's memory.dirty_* control files are
read-only and match the contents of the /proc/sys/vm/dirty_* files.
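
Each cgroup therefore has two thresholds: crossing the background threshold wakes the flusher threads, while crossing the (usually higher) direct threshold makes the dirtying process write out pages itself. A minimal C model of that decision, assuming both thresholds have already been converted to bytes; enum wb_action and classify_dirty() are illustrative names, not kernel interfaces.

```c
#include <stdint.h>

/* Which kind of writeback a given amount of dirty memory should trigger. */
enum wb_action {
    WB_NONE,        /* below both thresholds: nothing to do */
    WB_BACKGROUND,  /* wake the per-bdi flusher threads */
    WB_DIRECT,      /* the dirtying process writes out pages itself */
};

/*
 * Hypothetical model: compare a cgroup's dirty memory (in bytes) against
 * its background and direct writeback thresholds (also in bytes).
 * The direct threshold is checked first, so it wins if both are crossed.
 */
static enum wb_action classify_dirty(uint64_t dirty,
                                     uint64_t background_thresh,
                                     uint64_t direct_thresh)
{
    if (dirty > direct_thresh)
        return WB_DIRECT;
    if (dirty > background_thresh)
        return WB_BACKGROUND;
    return WB_NONE;
}
```
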
Per-cgroup dirty limits can be set using the following files in cgroupfs:
- memory.dirty_ratio: the amount of dirty memory (expressed as a percentage of
  cgroup memory) at which a process generating dirty pages will itself start
  writing out dirty data.
- memory.dirty_limit_in_bytes: the amount of dirty memory (expressed in bytes)
  in the cgroup at which a process generating dirty pages will itself start
  writing out dirty data. A suffix (k, K, m, M, g, or G) can be used to
  indicate that the value is in kilobytes, megabytes, or gigabytes.
  Note: memory.dirty_limit_in_bytes is the counterpart of memory.dirty_ratio.
  Only one of them may be specified at a time. When one is written, it takes
  effect immediately in evaluating the dirty memory limits, and the other
  reads back as 0.
- memory.dirty_background_ratio: the amount of dirty memory of the cgroup
  (expressed as a percentage of cgroup memory) at which background writeback
  kernel threads will start writing out dirty data.
- memory.dirty_background_limit_in_bytes: the amount of dirty memory (expressed
  in bytes) in the cgroup at which background writeback kernel threads will
  start writing out dirty data. A suffix (k, K, m, M, g, or G) can be used to
  indicate that the value is in kilobytes, megabytes, or gigabytes.
  Note: memory.dirty_background_limit_in_bytes is the counterpart of
  memory.dirty_background_ratio. Only one of them may be specified at a time.
  When one is written, it takes effect immediately in evaluating the dirty
  memory limits, and the other reads back as 0.
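
The rules for each ratio/bytes pair (an optional size suffix, and writing one file clearing its counterpart) can be modelled in a few lines of C. This is a sketch under stated assumptions: struct dirty_param and the function names are hypothetical, the suffixes are assumed to be binary multiples (as the kernel's memparse() treats them), and the ratio is applied to a caller-supplied cgroup memory size.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical settings for one ratio/bytes pair; 0 means "not set". */
struct dirty_param {
    unsigned int ratio;   /* percent of cgroup memory */
    uint64_t bytes;       /* absolute limit in bytes */
};

/*
 * Parse a value such as "512M" into bytes. Accepts an optional k/K, m/M
 * or g/G suffix, assumed here to mean powers of 1024. Returns 0 on
 * success, -1 on malformed input. Overflow is not checked.
 */
static int parse_bytes(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    if (!isdigit((unsigned char)*s))
        return -1;
    v = strtoull(s, &end, 10);
    switch (*end) {
    case 'k': case 'K': v <<= 10; end++; break;
    case 'm': case 'M': v <<= 20; end++; break;
    case 'g': case 'G': v <<= 30; end++; break;
    default: break;
    }
    if (*end != '\0')
        return -1;        /* trailing garbage */
    *out = v;
    return 0;
}

/* Writing one member clears its counterpart, so only one is ever set. */
static void set_ratio(struct dirty_param *p, unsigned int ratio)
{
    p->ratio = ratio;
    p->bytes = 0;
}

static void set_bytes(struct dirty_param *p, uint64_t bytes)
{
    p->bytes = bytes;
    p->ratio = 0;
}

/* Effective limit in bytes for a cgroup whose memory size is mem_bytes. */
static uint64_t effective_limit(const struct dirty_param *p, uint64_t mem_bytes)
{
    if (p->bytes != 0)
        return p->bytes;
    return mem_bytes * p->ratio / 100;
}
```
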
A cgroup may contain more dirty memory than its dirty limit. This is possible
because a page is charged to the first cgroup that touches it, and subsequent
page accounting events (dirty, writeback, nfs_unstable) are also counted
against that originally charged cgroup.

Example: If a page is allocated by a task in cgroup A, the page is charged to
cgroup A. If the page is later dirtied by a task in cgroup B, the dirty count
of cgroup A is incremented. If cgroup A is over its dirty limit but cgroup B is
not, then a cgroup B task dirtying a cgroup A page may push cgroup A further
over its dirty limit without being throttled.
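
The cross-cgroup case in the example can be traced with a toy model in which each page remembers the cgroup it was first charged to. The structures and functions here are hypothetical; in particular, the throttling decision is assumed to look only at the dirtying task's own cgroup, which is what lets the cgroup B task proceed while cgroup A goes further over its limit.

```c
#include <stdint.h>

/* Hypothetical model of one memory cgroup's dirty accounting. */
struct sim_memcg {
    uint64_t dirty_pages;   /* dirty pages charged to this cgroup */
    uint64_t dirty_limit;   /* dirty limit, in pages */
};

/* A page remembers the cgroup it was first charged to. */
struct sim_page {
    struct sim_memcg *owner;
    int dirty;
};

/* The first cgroup to touch (allocate) a page is charged for it. */
static void charge_page(struct sim_page *pg, struct sim_memcg *allocator)
{
    pg->owner = allocator;
    pg->dirty = 0;
}

/*
 * Dirtying a page increments the owner's counter, whichever cgroup the
 * dirtying task belongs to. Returns nonzero if the dirtying task would be
 * throttled; by assumption only the dirtier's own cgroup is checked.
 */
static int mark_dirty(struct sim_page *pg, const struct sim_memcg *dirtier)
{
    if (!pg->dirty) {
        pg->dirty = 1;
        pg->owner->dirty_pages++;
    }
    return dirtier->dirty_pages > dirtier->dirty_limit;
}
```
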
When use_hierarchy=0, each cgroup has its own dirty memory usage and limits.
System-wide dirty limits are also consulted: dirty memory consumption is
checked against both the system-wide and the per-cgroup dirty limits.

The current implementation does not enforce per-cgroup dirty limits when
use_hierarchy=1; processes in such cgroups are subject only to the system-wide
dirty limits. Reading a memory.dirty_* file returns the system-wide value, and
writing one returns an error. An enhanced implementation would need to check
the chain of parent cgroups to ensure that none of their dirty limits is
exceeded.
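
The two regimes, plus the parent-chain check that an enhanced implementation would need, can be sketched as follows. struct dirty_state and both functions are hypothetical; over_any_ancestor_limit() models the proposed enhancement, not current behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dirty usage and limit for the system or one cgroup. */
struct dirty_state {
    uint64_t usage;                     /* dirty bytes */
    uint64_t limit;                     /* dirty limit in bytes */
    const struct dirty_state *parent;   /* NULL at the root */
};

/*
 * Current behavior as described: the system-wide limit always applies;
 * the per-cgroup limit is checked only when use_hierarchy=0.
 */
static bool over_dirty_limit(const struct dirty_state *system,
                             const struct dirty_state *cg,
                             bool use_hierarchy)
{
    if (system->usage > system->limit)
        return true;
    if (use_hierarchy)
        return false;   /* per-cgroup limits not enforced */
    return cg->usage > cg->limit;
}

/* Enhanced check: no cgroup on the path to the root may exceed its limit. */
static bool over_any_ancestor_limit(const struct dirty_state *cg)
{
    for (; cg != NULL; cg = cg->parent)
        if (cg->usage > cg->limit)
            return true;
    return false;
}
```
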
6. Hierarchy support

The memory controller supports a deep hierarchy and hierarchical accounting.