7339ff8302
Anything that writes into a tmpfs filesystem is liable to disproportionately decrease the available memory on a particular node. Since there's no telling what sort of application (e.g. dd/cp/cat) might be dropping large files there, this lets the admin choose the appropriate default behavior for their site's situation. Introduce a tmpfs mount option which allows specifying a memory policy and a second option to specify the nodelist for that policy. With the default policy, tmpfs will behave as it does today. This patch adds support for preferred, bind, and interleave policies. The default policy will cause pages to be added to tmpfs files on the node which is doing the writing. Some jobs expect a single process to create and manage the tmpfs files. This results in a node which has a significantly reduced number of free pages. With this patch, the administrator can specify the policy and nodes for that policy where they would prefer allocations. This patch was originally written by Brent Casavant and Hugh Dickins. I added support for the bind and preferred policies and the mpol_nodelist mount option. Signed-off-by: Brent Casavant <bcasavan@sgi.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
112 lines
4.6 KiB
Text
112 lines
4.6 KiB
Text
Tmpfs is a file system which keeps all files in virtual memory.
|
|
|
|
|
|
Everything in tmpfs is temporary in the sense that no files will be
|
|
created on your hard drive. If you unmount a tmpfs instance,
|
|
everything stored therein is lost.
|
|
|
|
tmpfs puts everything into the kernel internal caches and grows and
|
|
shrinks to accommodate the files it contains and is able to swap
|
|
unneeded pages out to swap space. It has maximum size limits which can
|
|
be adjusted on the fly via 'mount -o remount ...'
|
|
|
|
If you compare it to ramfs (which was the template to create tmpfs)
|
|
you gain swapping and limit checking. Another similar thing is the RAM
|
|
disk (/dev/ram*), which simulates a fixed size hard disk in physical
|
|
RAM, where you have to create an ordinary filesystem on top. Ramdisks
|
|
cannot swap and you do not have the possibility to resize them.
|
|
|
|
Since tmpfs lives completely in the page cache and on swap, all tmpfs
|
|
pages currently in memory will show up as cached. It will not show up
|
|
as shared or something like that. Further on you can check the actual
|
|
RAM+swap use of a tmpfs instance with df(1) and du(1).
|
|
|
|
|
|
tmpfs has the following uses:
|
|
|
|
1) There is always a kernel internal mount which you will not see at
|
|
all. This is used for shared anonymous mappings and SYSV shared
|
|
memory.
|
|
|
|
This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
|
|
set, the user visible part of tmpfs is not build. But the internal
|
|
mechanisms are always present.
|
|
|
|
2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
|
|
POSIX shared memory (shm_open, shm_unlink). Adding the following
|
|
line to /etc/fstab should take care of this:
|
|
|
|
tmpfs /dev/shm tmpfs defaults 0 0
|
|
|
|
Remember to create the directory that you intend to mount tmpfs on
|
|
if necessary (/dev/shm is automagically created if you use devfs).
|
|
|
|
This mount is _not_ needed for SYSV shared memory. The internal
|
|
mount is used for that. (In the 2.3 kernel versions it was
|
|
necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
|
|
shared memory)
|
|
|
|
3) Some people (including me) find it very convenient to mount it
|
|
e.g. on /tmp and /var/tmp and have a big swap partition. And now
|
|
loop mounts of tmpfs files do work, so mkinitrd shipped by most
|
|
distributions should succeed with a tmpfs /tmp.
|
|
|
|
4) And probably a lot more I do not know about :-)
|
|
|
|
|
|
tmpfs has three mount options for sizing:
|
|
|
|
size: The limit of allocated bytes for this tmpfs instance. The
|
|
default is half of your physical RAM without swap. If you
|
|
oversize your tmpfs instances the machine will deadlock
|
|
since the OOM handler will not be able to free that memory.
|
|
nr_blocks: The same as size, but in blocks of PAGE_CACHE_SIZE.
|
|
nr_inodes: The maximum number of inodes for this instance. The default
|
|
is half of the number of your physical RAM pages, or (on a
|
|
a machine with highmem) the number of lowmem RAM pages,
|
|
whichever is the lower.
|
|
|
|
These parameters accept a suffix k, m or g for kilo, mega and giga and
|
|
can be changed on remount. The size parameter also accepts a suffix %
|
|
to limit this tmpfs instance to that percentage of your physical RAM:
|
|
the default, when neither size nor nr_blocks is specified, is size=50%
|
|
|
|
If nr_blocks=0 (or size=0), blocks will not be limited in that instance;
|
|
if nr_inodes=0, inodes will not be limited. It is generally unwise to
|
|
mount with such options, since it allows any user with write access to
|
|
use up all the memory on the machine; but enhances the scalability of
|
|
that instance in a system with many cpus making intensive use of it.
|
|
|
|
|
|
tmpfs has a mount option to set the NUMA memory allocation policy for
|
|
all files in that instance:
|
|
mpol=interleave prefers to allocate memory from each node in turn
|
|
mpol=default prefers to allocate memory from the local node
|
|
mpol=bind prefers to allocate from mpol_nodelist
|
|
mpol=preferred prefers to allocate from first node in mpol_nodelist
|
|
|
|
The following mount option is used in conjunction with mpol=interleave,
|
|
mpol=bind or mpol=preferred:
|
|
mpol_nodelist: nodelist suitable for parsing with nodelist_parse.
|
|
|
|
|
|
To specify the initial root directory you can use the following mount
|
|
options:
|
|
|
|
mode: The permissions as an octal number
|
|
uid: The user id
|
|
gid: The group id
|
|
|
|
These options do not have any effect on remount. You can change these
|
|
parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
|
|
|
|
|
|
So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
|
|
will give you tmpfs instance on /mytmpfs which can allocate 10GB
|
|
RAM/SWAP in 10240 inodes and it is only accessible by root.
|
|
|
|
|
|
Author:
|
|
Christoph Rohland <cr@sap.com>, 1.12.01
|
|
Updated:
|
|
Hugh Dickins <hugh@veritas.com>, 13 March 2005
|