Document lowmem_reserve_ratio

Though the lower_zone_protection was changed to lowmem_reserve_ratio, the
document has been not changed.  The lowmem_reserve_ratio seems quite hard
to estimate, but there is no guidance.  This patch is to change document
for it.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Yasunori Goto 2008-02-04 22:29:32 -08:00 committed by Linus Torvalds
parent b5beb1caff
commit 7786fa9ac5

View file

@ -1336,7 +1336,7 @@ legacy_va_layout
If non-zero, this sysctl disables the new 32-bit mmap mmap layout - the kernel
will use the legacy (2.4) layout for all processes.
lower_zone_protection
lowmem_reserve_ratio
---------------------
For some specialised workloads on highmem machines it is dangerous for
@ -1356,25 +1356,71 @@ captured into pinned user memory.
mechanism will also defend that region from allocations which could use
highmem or lowmem).
The `lower_zone_protection' tunable determines how aggressive the kernel is
in defending these lower zones. The default value is zero - no
protection at all.
The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is
in defending these lower zones.
If you have a machine which uses highmem or ISA DMA and your
applications are using mlock(), or if you are running with no swap then
you probably should increase the lower_zone_protection setting.
you probably should change the lowmem_reserve_ratio setting.
The units of this tunable are fairly vague. It is approximately equal
to "megabytes," so setting lower_zone_protection=100 will protect around 100
megabytes of the lowmem zone from user allocations. It will also make
those 100 megabytes unavailable for use by applications and by
pagecache, so there is a cost.
The lowmem_reserve_ratio is an array. You can see them by reading this file.
-
% cat /proc/sys/vm/lowmem_reserve_ratio
256 256 32
-
Note: # of this elements is one fewer than number of zones. Because the highest
zone's value is not necessary for following calculation.
The effects of this tunable may be observed by monitoring
/proc/meminfo:LowFree. Write a single huge file and observe the point
at which LowFree ceases to fall.
But, these values are not used directly. The kernel calculates # of protection
pages for each zones from them. These are shown as array of protection pages
in /proc/zoneinfo like followings. (This is an example of x86-64 box).
Each zone has an array of protection pages like this.
A reasonable value for lower_zone_protection is 100.
-
Node 0, zone DMA
pages free 1355
min 3
low 3
high 4
:
:
numa_other 0
protection: (0, 2004, 2004, 2004)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pagesets
cpu: 0 pcp: 0
:
-
These protections are added to score to judge whether this zone should be used
for page allocation or should be reclaimed.
In this example, if normal pages (index=2) are required to this DMA zone and
pages_high is used for watermark, the kernel judges this zone should not be
used because pages_free(1355) is smaller than watermark + protection[2]
(4 + 2004 = 2008). If this protection value is 0, this zone would be used for
normal page requirement. If requirement is DMA zone(index=0), protection[0]
(=0) is used.
zone[i]'s protection[j] is calculated by following exprssion.
(i < j):
zone[i]->protection[j]
= (total sums of present_pages from zone[i+1] to zone[j] on the node)
/ lowmem_reserve_ratio[i];
(i = j):
(should not be protected. = 0;
(i > j):
(not necessary, but looks 0)
The default values of lowmem_reserve_ratio[i] are
256 (if zone[i] means DMA or DMA32 zone)
32 (others).
As above expression, they are reciprocal number of ratio.
256 means 1/256. # of protection pages becomes about "0.39%" of total present
pages of higher zones on the node.
If you would like to protect more pages, smaller values are effective.
The minimum value is 1 (1/1 -> 100%).
page-cluster
------------