Change the page cache allocation calls to support cpuset memory spreading.
See the previous patch, cpuset_mem_spread, for an explanation of cpuset memory
spreading.
On systems without cpusets configured in the kernel, this is no change.
On systems with cpusets configured in the kernel, but the "memory_spread"
cpuset option not enabled for the current tasks cpuset, this adds a call to a
cpuset routine and failed bit test of the processor state flag PF_SPREAD_PAGE.
On tasks in cpusets with "memory_spread" enabled, this adds a call to a cpuset
routine that computes which of the tasks mems_allowed nodes should be
preferred for this allocation.
If memory spreading applies to a particular allocation, then any other NUMA
mempolicy does not apply.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the implementation and cpuset interface for an alternative
memory allocation policy that can be applied to certain kinds of memory
allocations, such as the page cache (file system buffers) and some slab caches
(such as inode caches).
The policy is called "memory spreading." If enabled, it spreads out these
kinds of memory allocations over all the nodes allowed to a task, instead of
preferring to place them on the node where the task is executing.
All other kinds of allocations, including anonymous pages for a tasks stack
and data regions, are not affected by this policy choice, and continue to be
allocated preferring the node local to execution, as modified by the NUMA
mempolicy.
There are two boolean flag files per cpuset that control where the kernel
allocates pages for the file system buffers and related in kernel data
structures. They are called 'memory_spread_page' and 'memory_spread_slab'.
If the per-cpuset boolean flag file 'memory_spread_page' is set, then the
kernel will spread the file system buffers (page cache) evenly over all the
nodes that the faulting task is allowed to use, instead of preferring to put
those pages on the node where the task is running.
If the per-cpuset boolean flag file 'memory_spread_slab' is set, then the
kernel will spread some file system related slab caches, such as for inodes
and dentries evenly over all the nodes that the faulting task is allowed to
use, instead of preferring to put those pages on the node where the task is
running.
The implementation is simple. Setting the cpuset flags 'memory_spread_page'
or 'memory_spread_cache' turns on the per-process flags PF_SPREAD_PAGE or
PF_SPREAD_SLAB, respectively, for each task that is in the cpuset or
subsequently joins that cpuset. In subsequent patches, the page allocation
calls for the affected page cache and slab caches are modified to perform an
inline check for these flags, and if set, a call to a new routine
cpuset_mem_spread_node() returns the node to prefer for the allocation.
The cpuset_mem_spread_node() routine is also simple. It uses the value of a
per-task rotor cpuset_mem_spread_rotor to select the next node in the current
tasks mems_allowed to prefer for the allocation.
This policy can provide substantial improvements for jobs that need to place
thread local data on the corresponding node, but that need to access large
file system data sets that need to be spread across the several nodes in the
jobs cpuset in order to fit. Without this patch, especially for jobs that
might have one thread reading in the data set, the memory allocation across
the nodes in the jobs cpuset can become very uneven.
A couple of Copyright year ranges are updated as well. And a couple of email
addresses that can be found in the MAINTAINERS file are removed.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.
This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
- parisc
- sparc64
- make the needlessly global function static:
- arm
- h8300
- m68k
- m68knommu
- s390
- v850
- x86_64
- add a prototype in asm/system.h:
- cris
- i386
- ia64
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make that the internal values for:
/proc/sys/vm/dirty_writeback_centisecs
/proc/sys/vm/dirty_expire_centisecs
are stored as jiffies instead of centiseconds. Let the sysctl interface do
the conversions with full precision using clock_t_to_jiffies, instead of
doing overflow-sensitive on-the-fly conversions every time the values are
used.
Cons: apparent precision loss if HZ is not a multiple of 100, because of
conversion back and forth. This is a common problem for all sysctl values
that use proc_dointvec_userhz_jiffies. (There is only one other in-tree
use, in net/core/neighbour.c.)
Signed-off-by: Bart Samwel <bart@samwel.tk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mundt <lethal@linux-sh.org> says:
This patch set implements a number of patches to clean up and restructure the
bitmap region code, in addition to extending the interface to support
multiword spanning allocations.
The current implementation (before this patch set) is limited by only being
able to allocate pages <= BITS_PER_LONG, as noted by the strategically
positioned BUG_ON() at lib/bitmap.c:752:
/* We don't do regions of pages > BITS_PER_LONG. The
* algorithm would be a simple look for multiple zeros in the
* array, but there's no driver today that needs this. If you
* trip this BUG(), you get to code it... */
BUG_ON(pages > BITS_PER_LONG);
As I seem to have been the first person to trigger this, the result ends up
being the following patch set with the help of Paul Jackson.
The final patch in the series eliminates quite a bit of code duplication, so
the bitmap code size ends up being smaller than the current implementation as
an added bonus.
After these are applied, it should already be possible to do multiword
allocations with dma_alloc_coherent() out of ranges established by
dma_declare_coherent_memory() on x86 without having to change any of the code,
and the SH store queue API will follow up on this as the other user that needs
support for this.
This patch:
Some code cleanup on the lib/bitmap.c bitmap_*_region() routines:
* spacing
* variable names
* comments
Has no change to code function.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The meaning of MS_VERBOSE is backwards; if the bit is set, it really means,
"don't be verbose". This is confusing and counter-intuitive.
In addition, there is also no way to set the MS_VERBOSE flag in the
mount(8) program in util-linux, but interesting, it does define options
which would do the right thing if MS_SILENT were defined, which
unfortunately we do not:
#ifdef MS_SILENT
{ "quiet", 0, 0, MS_SILENT }, /* be quiet */
{ "loud", 0, 1, MS_SILENT }, /* print out messages. */
#endif
So the obvious fix is to deprecate the use of MS_VERBOSE and replace it
with MS_SILENT.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All architecture independent system calls should be declared
in syscalls.h, add the one that is missing.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'blktrace' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
[PATCH] relay: consolidate sendfile() and read() code
[PATCH] relay: add sendfile() support
[PATCH] relay: migrate from relayfs to a generic relay API
Add missing 'const' to pci_request_region[s] 'res_name' arg,
since we pass it directly to __request_region(), whose 'name' arg
is also const.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out AMD 8131 quirk only affects MSI for devices behind the 8131 bridge.
Handle this by adding a flags field in pci_bus, inherited from parent to child.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change the semantics of this call to return the max reserved
bus number instead of just the max assigned bus number.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add Broadcom HT-1000 south bridge's PCI ID to i2c-piix driver. Note
that at least on Supermicro H8SSL it uses non-standard SMBHSTCFG = 3
and standard values like 0 or 9 causes hangup.
Signed-off-by: Martin Devera <devik@cdi.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Drop the i2c-frodo bus driver. It isn't referenced by the build
system, and depends on code which was never included in 2.6 kernels.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
semaphore to mutex conversion.
the conversion was generated via scripts, and the result was validated
automatically via a script as well.
build tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactors SENSOR_DEVICE_ATTR_2 macro, following pattern set by
SENSOR_ATTR. First it creates a new macro SENSOR_ATTR_2() which expands
to an initialization expression, then it uses that in SENSOR_DEVICE_ATTR_2,
which declares and initializes a struct sensor_device_attribute_2.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactors SENSOR_DEVICE_ATTR macro. First it creates a new
macro SENSOR_ATTR() which expands to an initialization expression, then
it uses that in SENSOR_DEVICE_ATTR, which declares and initializes a
struct sensor_device_attribute.
IOW, SENSOR_ATTR() imitates __ATTR() in include/linux/device.h.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate a handful of cache references by keeping current in a register
instead of reloading (helps x86) and avoiding the overhead of a function
call. Inlining eventpoll_init_file() saves 24 bytes. Also reorder file
initialization to make writes occur more sequentially.
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes two needlessly global structs static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.
If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.
The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes all occurances of _INLINE_ in the kernel.
With the exception of tty_flip.h, I've simply removed the inline's since
gcc should know best which functions to be inlined.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Declarations use struct notifier_block on both legs of the ifdef, so move the
notifier_block forward declaration outside the ifdef.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The fat code uses the fat_lock always in a mutex way (taking and releasing
the lock in the same function), the patch below converts it into the new
mutex primitive. Please consider this patch for the code.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ext3's truncate_sem is always released in the same function it's taken
and it otherwise is a mutex as well..
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Seems like needless clutter having a bunch of #if defined(CONFIG_$ARCH) in
include/linux/cache.h. Move the per architecture section definition to
asm/cache.h, and keep the if-not-defined dummy case in linux/cache.h to
catch architectures which don't implement the section.
Verified that symbols still go in .data.read_mostly on parisc,
and the compile doesn't break.
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since early 2.4.x all cdrom drivers implement the block_device methods
themselves, so they can handle additional ioctls directly instead of going
through the cdrom layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
platforms, lowering kmalloc() allocated space by 50%.
2) Reduce the size of (files_struct), using a special 32 bits (or
64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
close_on_exec_init and open_fds_init fields. This save some ram (248
bytes per task) as most tasks dont open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.
3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.
UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus points out that ext3_readdir's readahead only cuts in when
ext3_readdir() is operating at the very start of the directory. So for large
directories we end up performing no readahead at all and we suck.
So take it all out and use the core VM's page_cache_readahead(). This means
that ext3 directory reads will use all of readahead's dynamic sizing goop.
Note that we're using the directory's filp->f_ra to hold the readahead state,
but readahead is actually being performed against the underlying blockdev's
address_space. Fortunately the readahead code is all set up to handle this.
Tested with printk. It works. I was struggling to find a real workload which
actually cared.
(The patch also exports page_cache_readahead() to GPL modules)
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>