[PATCH] ramfs, rootfs, and initramfs docs
Docs for ramfs, rootfs, and initramfs. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This commit is contained in:
parent
cbf8f0f36a
commit
7f46a240b0
1 changed files with 195 additions and 0 deletions
195
Documentation/filesystems/ramfs-rootfs-initramfs.txt
Normal file
195
Documentation/filesystems/ramfs-rootfs-initramfs.txt
Normal file
|
@ -0,0 +1,195 @@
|
|||
ramfs, rootfs and initramfs
|
||||
October 17, 2005
|
||||
Rob Landley <rob@landley.net>
|
||||
=============================
|
||||
|
||||
What is ramfs?
|
||||
--------------
|
||||
|
||||
Ramfs is a very simple filesystem that exports Linux's disk caching
|
||||
mechanisms (the page cache and dentry cache) as a dynamically resizable
|
||||
ram-based filesystem.
|
||||
|
||||
Normally all files are cached in memory by Linux. Pages of data read from
|
||||
backing store (usually the block device the filesystem is mounted on) are kept
|
||||
around in case it's needed again, but marked as clean (freeable) in case the
|
||||
Virtual Memory system needs the memory for something else. Similarly, data
|
||||
written to files is marked clean as soon as it has been written to backing
|
||||
store, but kept around for caching purposes until the VM reallocates the
|
||||
memory. A similar mechanism (the dentry cache) greatly speeds up access to
|
||||
directories.
|
||||
|
||||
With ramfs, there is no backing store. Files written into ramfs allocate
|
||||
dentries and page cache as usual, but there's nowhere to write them to.
|
||||
This means the pages are never marked clean, so they can't be freed by the
|
||||
VM when it's looking to recycle memory.
|
||||
|
||||
The amount of code required to implement ramfs is tiny, because all the
|
||||
work is done by the existing Linux caching infrastructure. Basically,
|
||||
you're mounting the disk cache as a filesystem. Because of this, ramfs is not
|
||||
an optional component removable via menuconfig, since there would be negligible
|
||||
space savings.
|
||||
|
||||
ramfs and ramdisk:
|
||||
------------------
|
||||
|
||||
The older "ram disk" mechanism created a synthetic block device out of
|
||||
an area of ram and used it as backing store for a filesystem. This block
|
||||
device was of fixed size, so the filesystem mounted on it was of fixed
|
||||
size. Using a ram disk also required unnecessarily copying memory from the
|
||||
fake block device into the page cache (and copying changes back out), as well
|
||||
as creating and destroying dentries. Plus it needed a filesystem driver
|
||||
(such as ext2) to format and interpret this data.
|
||||
|
||||
Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
|
||||
unnecessary work for the CPU, and pollutes the CPU caches. (There are tricks
|
||||
to avoid this copying by playing with the page tables, but they're unpleasantly
|
||||
complicated and turn out to be about as expensive as the copying anyway.)
|
||||
More to the point, all the work ramfs is doing has to happen _anyway_,
|
||||
since all file access goes through the page and dentry caches. The ram
|
||||
disk is simply unnecessary, ramfs is internally much simpler.
|
||||
|
||||
Another reason ramdisks are semi-obsolete is that the introduction of
|
||||
loopback devices offered a more flexible and convenient way to create
|
||||
synthetic block devices, now from files instead of from chunks of memory.
|
||||
See losetup (8) for details.
|
||||
|
||||
ramfs and tmpfs:
|
||||
----------------
|
||||
|
||||
One downside of ramfs is you can keep writing data into it until you fill
|
||||
up all memory, and the VM can't free it because the VM thinks that files
|
||||
should get written to backing store (rather than swap space), but ramfs hasn't
|
||||
got any backing store. Because of this, only root (or a trusted user) should
|
||||
be allowed write access to a ramfs mount.
|
||||
|
||||
A ramfs derivative called tmpfs was created to add size limits, and the ability
|
||||
to write the data to swap space. Normal users can be allowed write access to
|
||||
tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
|
||||
|
||||
What is rootfs?
|
||||
---------------
|
||||
|
||||
Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
|
||||
(It's used internally as the starting and stopping point for searches of the
|
||||
kernel's doubly-linked list of mount points.)
|
||||
|
||||
Most systems just mount another filesystem over it and ignore it. The
|
||||
amount of space an empty instance of ramfs takes up is tiny.
|
||||
|
||||
What is initramfs?
|
||||
------------------
|
||||
|
||||
All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
|
||||
extracted into rootfs when the kernel boots up. After extracting, the kernel
|
||||
checks to see if rootfs contains a file "init", and if so it executes it as PID
|
||||
1. If found, this init process is responsible for bringing the system the
|
||||
rest of the way up, including locating and mounting the real root device (if
|
||||
any). If rootfs does not contain an init program after the embedded cpio
|
||||
archive is extracted into it, the kernel will fall through to the older code
|
||||
to locate and mount a root partition, then exec some variant of /sbin/init
|
||||
out of that.
|
||||
|
||||
All this differs from the old initrd in several ways:
|
||||
|
||||
- The old initrd was a separate file, while the initramfs archive is linked
|
||||
into the linux kernel image. (The directory linux-*/usr is devoted to
|
||||
generating this archive during the build.)
|
||||
|
||||
- The old initrd file was a gzipped filesystem image (in some file format,
|
||||
such as ext2, that had to be built into the kernel), while the new
|
||||
initramfs archive is a gzipped cpio archive (like tar only simpler,
|
||||
see cpio(1) and Documentation/early-userspace/buffer-format.txt).
|
||||
|
||||
- The program run by the old initrd (which was called /initrd, not /init) did
|
||||
some setup and then returned to the kernel, while the init program from
|
||||
initramfs is not expected to return to the kernel. (If /init needs to hand
|
||||
off control it can overmount / with a new root device and exec another init
|
||||
program. See the switch_root utility, below.)
|
||||
|
||||
- When switching another root device, initrd would pivot_root and then
|
||||
umount the ramdisk. But initramfs is rootfs: you can neither pivot_root
|
||||
rootfs, nor unmount it. Instead delete everything out of rootfs to
|
||||
free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
|
||||
with the new root (cd /newmount; mount --move . /; chroot .), attach
|
||||
stdin/stdout/stderr to the new /dev/console, and exec the new init.
|
||||
|
||||
Since this is a remarkably persnickity process (and involves deleting
|
||||
commands before you can run them), the klibc package introduced a helper
|
||||
program (utils/run_init.c) to do all this for you. Most other packages
|
||||
(such as busybox) have named this command "switch_root".
|
||||
|
||||
Populating initramfs:
|
||||
---------------------
|
||||
|
||||
The 2.6 kernel build process always creates a gzipped cpio format initramfs
|
||||
archive and links it into the resulting kernel binary. By default, this
|
||||
archive is empty (consuming 134 bytes on x86). The config option
|
||||
CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
|
||||
in menuconfig, and living in usr/Kconfig) can be used to specify a source for
|
||||
the initramfs archive, which will automatically be incorporated into the
|
||||
resulting binary. This option can point to an existing gzipped cpio archive, a
|
||||
directory containing files to be archived, or a text file specification such
|
||||
as the following example:
|
||||
|
||||
dir /dev 755 0 0
|
||||
nod /dev/console 644 0 0 c 5 1
|
||||
nod /dev/loop0 644 0 0 b 7 0
|
||||
dir /bin 755 1000 1000
|
||||
slink /bin/sh busybox 777 0 0
|
||||
file /bin/busybox initramfs/busybox 755 0 0
|
||||
dir /proc 755 0 0
|
||||
dir /sys 755 0 0
|
||||
dir /mnt 755 0 0
|
||||
file /init initramfs/init.sh 755 0 0
|
||||
|
||||
One advantage of the text file is that root access is not required to
|
||||
set permissions or create device nodes in the new archive. (Note that those
|
||||
two example "file" entries expect to find files named "init.sh" and "busybox" in
|
||||
a directory called "initramfs", under the linux-2.6.* directory. See
|
||||
Documentation/early-userspace/README for more details.)
|
||||
|
||||
If you don't already understand what shared libraries, devices, and paths
|
||||
you need to get a minimal root filesystem up and running, here are some
|
||||
references:
|
||||
http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
|
||||
http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
|
||||
http://www.linuxfromscratch.org/lfs/view/stable/
|
||||
|
||||
The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
|
||||
designed to be a tiny C library to statically link early userspace
|
||||
code against, along with some related utilities. It is BSD licensed.
|
||||
|
||||
I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
|
||||
myself. These are LGPL and GPL, respectively.
|
||||
|
||||
In theory you could use glibc, but that's not well suited for small embedded
|
||||
uses like this. (A "hello world" program statically linked against glibc is
|
||||
over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
|
||||
name lookups, even when otherwise statically linked.)
|
||||
|
||||
Future directions:
|
||||
------------------
|
||||
|
||||
Today (2.6.14), initramfs is always compiled in, but not always used. The
|
||||
kernel falls back to legacy boot code that is reached only if initramfs does
|
||||
not contain an /init program. The fallback is legacy code, there to ensure a
|
||||
smooth transition and allowing early boot functionality to gradually move to
|
||||
"early userspace" (I.E. initramfs).
|
||||
|
||||
The move to early userspace is necessary because finding and mounting the real
|
||||
root device is complex. Root partitions can span multiple devices (raid or
|
||||
separate journal). They can be out on the network (requiring dhcp, setting a
|
||||
specific mac address, logging into a server, etc). They can live on removable
|
||||
media, with dynamically allocated major/minor numbers and persistent naming
|
||||
issues requiring a full udev implementation to sort out. They can be
|
||||
compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
|
||||
and so on.
|
||||
|
||||
This kind of complexity (which inevitably includes policy) is rightly handled
|
||||
in userspace. Both klibc and busybox/uClibc are working on simple initramfs
|
||||
packages to drop into a kernel build, and when standard solutions are ready
|
||||
and widely deployed, the kernel's legacy early boot code will become obsolete
|
||||
and a candidate for the feature removal schedule.
|
||||
|
||||
But that's a while off yet.
|
Loading…
Reference in a new issue