Optimize select and poll by using stack space for small fd sets
This brings back an old optimization from Linux 2.0. Using the stack is
faster than kmalloc. On an Intel P4 system it speeds up a select of a
single pty fd by about 13% (~4000 cycles -> ~3500).
It also saves memory because a daemon hanging in select or poll will
usually use one or two fewer pages. This can add up - e.g. if you have 10
daemons blocking in poll/select you save 40KB of memory.
I did a patch for this long ago, but it was never applied. This version is
a reimplementation of the old patch that tries to be less intrusive. I
only did the minimal changes needed for the stack allocation.
The cut-off point before external memory is allocated is currently at
832 bytes. The system calls always allocate this much memory on the stack.
These 832 bytes are divided into 256 bytes of frontend data (for the select
bitmaps or the pollfds) and the rest of the space for the wait queues used
by the low level drivers. There are some extreme cases where this won't
work out for select and it falls back to allocating memory too early -
especially with very sparse large select bitmaps - but the majority of
processes that only have a small number of file descriptors should be ok.
[TBD: 832/256 might not be the best split for select or poll]
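The stack-first allocation described above can be sketched in user-space C as follows. This is only an illustration of the pattern, not the kernel code. The name scratch_demo() is hypothetical, malloc() stands in for kmalloc(), and the single 832-byte buffer ignores the 256-byte frontend/wait-queue split.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed cut-off, mirroring the 832-byte figure from the text. */
#define STACK_ALLOC_BYTES 832

/*
 * Hypothetical helper: obtain nbytes of scratch space, preferring a
 * fixed on-stack buffer. Returns 1 if the stack buffer was used,
 * 0 if the heap was needed, -1 if the heap allocation failed.
 */
static int scratch_demo(size_t nbytes)
{
    char stack_buf[STACK_ALLOC_BYTES];
    char *buf = stack_buf;
    int on_stack = 1;

    if (nbytes > sizeof(stack_buf)) {
        buf = malloc(nbytes);       /* stands in for kmalloc() */
        if (!buf)
            return -1;
        on_stack = 0;
    }

    memset(buf, 0, nbytes);         /* ... use the buffer ... */

    if (buf != stack_buf)           /* only free what we allocated */
        free(buf);
    return on_stack;
}
```

Small requests never touch the allocator; only requests above the cut-off pay for an allocation and a free.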
I suspect more optimizations might be possible, but they would be more
complicated. One way would be to cache the select/poll context over
multiple system calls because typically the input values should be similar.
The problem, though, is when to flush the file descriptors out.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
drivers/ide/pci/generic.c:45: warning: `ide_generic_all_on' defined but not used
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some quick backport bits from the libata PATA work to fix things found in
the sis driver. The piix driver needs some fixes too but those are way too
large and need someone working on old IDE with time to do them.
This patch fixes the case where random bits get loaded into SIS timing
registers according to the description of the correct behaviour from
Vojtech Pavlik. It also adds the SiS5517 ATA16 chipset which is not
currently supported by the driver. Thanks to Conrad Harriss for loaning me
the machine with the 5517 chipset.
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From http://marc.theaimsgroup.com/?l=linux-kernel&m=110304128900342&w=2
AMD756 doesn't support host side cable detection. Do disk side only and
don't advise obsolete options.
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a proper prototype for autofs4_dentry_release() to autofs_i.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A size_t can't be < 0.
(akpm: and rw_verify_area() already did that check)
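A minimal user-space sketch of why a `< 0` test on a size_t is dead code: a negative value converted to size_t wraps to a very large positive one, so only an upper-bound check can catch it. The helper names as_size() and length_ok() are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* A negative length converted to size_t wraps to a huge positive value. */
static size_t as_size(long n)
{
    return (size_t)n;
}

/* A check that can actually fire: compare against an upper bound. */
static int length_ok(size_t count, size_t max)
{
    return count <= max;
}
```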
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add proper prototypes for fat_cache_init() and fat_cache_destroy() in
msdos_fs.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The Coverity checker found this off-by-one error.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since dash2underscore() just operates on and returns chars, I guess it's safe
to change the return value to a char. With my .config, this reduces its
size by 5 bytes.
   text    data     bss     dec     hex filename
   4155     152       0    4307    10d3 params.o.orig
   4150     152       0    4302    10ce params.o
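For illustration, a helper of this kind with a char return type might look like the sketch below. It is written from the description above (a function that takes and returns a single character) and is not copied from kernel/params.c.

```c
/* Map '-' to '_' and leave every other character unchanged. */
static char dash2underscore(char c)
{
    if (c == '-')
        return '_';
    return c;
}
```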
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
(akpm: I don't do comment typos patches. This one snuck through by accident)
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Renumber the recently-added POLLREMOVE and POLLRDHUP to line up with the other
architectures.
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On ppc64 we look at a profiling register to work out the sample address and
whether it was in userspace or the kernel.
The backtrace interface oprofile_add_sample does not allow this. Create
oprofile_add_ext_sample and make oprofile_add_sample use it too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
gcc-4.2:
kernel/module.c: In function '__find_symbol':
kernel/module.c:158: warning: the address of '__start___kcrctab', will always evaluate as 'true'
kernel/module.c:165: warning: the address of '__start___kcrctab_gpl', will always evaluate as 'true'
kernel/module.c:182: warning: the address of '__start___kcrctab_gpl_future', will always evaluate as 'true'
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the user specified `major=0' (odd thing to do), capi.c will use dynamic
allocation. We need to pick up that major for subsequent unregister_chrdev().
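The pattern can be sketched in user-space C as below. fake_register_chrdev() is a hypothetical stand-in that only mimics the return convention of register_chrdev() (the assigned major when 0 is requested, 0 on success for a fixed major). The value 254 is an arbitrary assumption.

```c
/* Hypothetical stand-in: with major == 0 a free number is picked and
 * returned; with a fixed major, 0 is returned on success. */
static int fake_register_chrdev(int major)
{
    return major == 0 ? 254 : 0;
}

/* Returns the major that must later be passed to the unregister call,
 * or a negative error code. */
static int fake_setup(int requested_major)
{
    int major = requested_major;
    int rc = fake_register_chrdev(major);

    if (rc < 0)
        return rc;
    if (major == 0)
        major = rc;     /* remember the dynamically assigned major */
    return major;
}
```

Without the `major = rc` step, the later unregister would be called with 0 rather than the number that was actually registered.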
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the user specified `major=0' (odd thing to do), pt.c will use dynamic
allocation. We need to pick up that major for subsequent unregister_chrdev().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If the user specified `major=0' (odd thing to do), pg.c will use dynamic
allocation. We need to pick up that major for subsequent unregister_chrdev().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's purely cosmetic, but with the patch there's no longer a
BLK_DEV_RAM_COUNT setting in the .config if BLK_DEV_RAM=n.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove code in async receive handling that serves no purpose with new tty
receive buffering. Previously this code tried to free up receive buffer
space, but now does nothing useful while making expensive calls.
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add driver support for general purpose I/O feature of the Synclink GT
adapters.
Signed-off-by: Paul Fulghum <paulkf@micrgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove dead code from synclink driver. This was used previously when the
write method had a from_user flag, which has been removed.
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that Christoph Lameter's atomic_long_t support is merged in mainline,
might as well convert asm-generic/local.h to use it, so the same code can
be used for both sizes of 32 and 64-bit unsigned longs.
akpm sayeth:
Q:
Is there any particular reason why these routines weren't simply
implemented with local_save/restore_flags, if they are only meant to
guarantee atomicity to the local cpu? I'm sure on most platforms this
would be more efficient than using an atomic...
A:
The whole _point_ of local_t is to avoid local_irq_disable(). It's
designed to exploit the fact that many CPUs can do incs and decs in a way
which is atomic wrt local interrupts, but not atomic wrt SMP.
But this patch makes sense, because asm-generic/local.h is just a fallback
implementation for architectures which either cannot perform these
local-irq-atomic operations, or whose maintainers haven't yet got around to
implementing them.
We need more work done on local_t in the 2.6.17 timeframe - they're defined as
unsigned long, but some architectures implement them as signed long.
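As a rough user-space analogue of such a fallback, the sketch below wraps a long-sized atomic so that the same code serves both 32-bit and 64-bit longs. It uses C11 <stdatomic.h> in place of the kernel's atomic_long_t, and all demo_* names are hypothetical.

```c
#include <stdatomic.h>

/* Stand-in for a fallback local_t: a thin wrapper around an atomic long. */
typedef struct {
    atomic_long a;
} demo_local_t;

static void demo_local_init(demo_local_t *l, long v)
{
    atomic_init(&l->a, v);
}

static long demo_local_read(demo_local_t *l)
{
    return atomic_load(&l->a);
}

static void demo_local_inc(demo_local_t *l)
{
    atomic_fetch_add(&l->a, 1);
}

static void demo_local_add(long i, demo_local_t *l)
{
    atomic_fetch_add(&l->a, i);
}
```

A fully atomic operation is stronger than local_t requires, which is why this is only suitable as a fallback.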
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix up some RTC whitespace and style
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The sync may still be needed for CPU clock calibration but we don't sync in
the regular case.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move real_year inside the read loop and move the spinlock up as well
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reading the CMOS clock on x86 and some other arches currently takes up to one
second because it synchronizes with the CMOS second tick-over. This delay
shows up at boot time as well as at resume time.
This is currently the most substantial boot time delay for machines that
are working towards instant-on capability. Also, a quick back of the envelope
calculation (.5sec * 2M users * 1 boot a day * 10 years) suggests it has cost
Linux users in the neighborhood of a million man-hours.
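The back-of-the-envelope figure can be checked with a few lines of C. The function name lost_hours() and the 365-day year are assumptions made for this sketch.

```c
/* Total time lost, in hours:
 * delay * users * boots per day * days per year * years / 3600. */
static double lost_hours(double delay_sec, double users,
                         double boots_per_day, double years)
{
    double seconds = delay_sec * users * boots_per_day * 365.0 * years;

    return seconds / 3600.0;
}
```

With the inputs quoted above (0.5 s, 2M users, 1 boot a day, 10 years) this gives 3.65e9 seconds, or roughly 1.01 million hours.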
An earlier thread on this topic is here:
http://groups.google.com/group/linux.kernel/browse_frm/thread/8a24255215ff6151/2aa97e66a977653d?hl=en&lr=&ie=UTF-8&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D1To2R-2S7-11%40gated-at.bofh.it#2aa97e66a977653d
..from which the consensus seems to be that it's no longer desirable.
In my view, there are basically four cases to consider:
1) networked, need precise walltime: use NTP
2) networked, don't need precise walltime: use NTP anyway
3) not networked, don't need sub-second precision walltime: don't care
4) not networked, need sub-second precision walltime:
get a network or a radio time source because RTC isn't good enough anyway
So this patch series simply removes the synchronization in favor of a simple
seqlock-like approach using the seconds value.
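The idea can be sketched as follows: read the seconds, read the other fields, then read the seconds again and retry if they changed. This is a simulation under stated assumptions, not the patch itself. struct demo_rtc, demo_read() and the tick_at hook are hypothetical, and only seconds and minutes are modelled.

```c
enum { REG_SEC, REG_MIN };

/* Simulated clock: a tick-over happens on the tick_at-th register read
 * (tick_at == 0 means the clock never ticks during the test). */
struct demo_rtc {
    int reg[2];
    int reads;
    int tick_at;
};

static int demo_read(struct demo_rtc *r, int which)
{
    if (++r->reads == r->tick_at) {     /* simulate a tick-over */
        if (++r->reg[REG_SEC] == 60) {
            r->reg[REG_SEC] = 0;
            r->reg[REG_MIN]++;          /* hour rollover not modelled */
        }
    }
    return r->reg[which];
}

/* Returns minutes * 60 + seconds, read without waiting for a tick. */
static int demo_get_seconds(struct demo_rtc *r)
{
    int sec, min;

    do {
        sec = demo_read(r, REG_SEC);
        min = demo_read(r, REG_MIN);
        /* If the seconds moved while we were reading, the fields may
         * be inconsistent, so read everything again. */
    } while (sec != demo_read(r, REG_SEC));

    return min * 60 + sec;
}
```

In the common case the loop runs once, so the read costs a handful of register accesses instead of up to a second of waiting.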
Note that for purposes of timer accuracy on wakeup, this patch will cause us
to fire timers up to one second late. But as the current timer resume code
will already sync once (or more!), it's no worse for short timers.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET]: drop duplicate assignment in request_sock
[IPSEC]: Fix tunnel error handling in ipcomp6
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] Don't make debugfs depend on DEBUG_KERNEL
[PATCH] Fix blktrace compile with sysfs not defined
[PATCH] unused label in drivers/block/cciss.
[BLOCK] increase size of disk stat counters
[PATCH] blk_execute_rq_nowait-speedup
[PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure
[BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
[PATCH] kzalloc() conversion in drivers/block
[PATCH] update max_sectors documentation
... being careful that mutex_trylock is inverted wrt down_trylock
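The inversion is in the return convention: down_trylock() returns 0 when the semaphore was acquired, while mutex_trylock() returns 1 when the mutex was acquired. The single-threaded stand-ins below are hypothetical and only mimic those return values, to show how a converted call site has to flip its test.

```c
struct demo_lock {
    int held;
};

/* Semaphore style: 0 means acquired, non-zero means busy. */
static int demo_down_trylock(struct demo_lock *l)
{
    if (l->held)
        return 1;
    l->held = 1;
    return 0;
}

/* Mutex style: 1 means acquired, 0 means busy. */
static int demo_mutex_trylock(struct demo_lock *l)
{
    if (l->held)
        return 0;
    l->held = 1;
    return 1;
}

/* Before conversion: a non-zero result means "busy". */
static int demo_is_busy_sem(struct demo_lock *l)
{
    return demo_down_trylock(l) != 0;
}

/* After conversion: the test must be negated. */
static int demo_is_busy_mutex(struct demo_lock *l)
{
    return !demo_mutex_trylock(l);
}
```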
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When retrying a write due to barrier failure, we don't reset 'remaining', so
it goes negative and never hits 0 again.
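A toy model of the bug, assuming a counter that each completed write decrements and that the caller waits for it to reach 0. demo_write_all() and its parameters are hypothetical and do not reflect the md code.

```c
/*
 * Issue nwrites writes; every completion decrements 'remaining'.
 * If the first attempt fails, all writes are issued a second time.
 * Returns the final value of 'remaining', which should be 0.
 */
static int demo_write_all(int nwrites, int first_attempt_fails,
                          int reset_on_retry)
{
    int remaining = nwrites;
    int attempt, i;

    for (attempt = 0; attempt < 2; attempt++) {
        if (attempt > 0 && reset_on_retry)
            remaining = nwrites;    /* the fix: start counting afresh */

        for (i = 0; i < nwrites; i++)
            remaining--;            /* each completion counts down */

        if (!(attempt == 0 && first_attempt_fails))
            break;                  /* success: no retry needed */
    }
    return remaining;
}
```

Without the reset, the retry decrements a counter that is already 0, so it ends up negative and a waiter for 0 would never be satisfied.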
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
An md array can be asked to change the amount of each device that it is using,
and in particular can be asked to use the maximum available space. This
currently only works if the first device is not larger than the rest, because
'size' gets changed and so 'fit' becomes wrong. So check if a 'fit' is
required early and don't corrupt it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
raid5 overloads bi_phys_segments to count the number of blocks that the
request was broken into so that it knows when the bio is completely handled.
Accessing this must always be done under a spinlock. In one case we also call
bi_end_io under that spinlock, which probably isn't ideal as bi_end_io could
be expensive (even though it isn't allowed to sleep).
So we reduce the range of the spinlock to just accessing bi_phys_segments.
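The narrowing can be sketched as below: only the counter update happens under the lock, and the completion callback runs after the lock is dropped. All names are hypothetical, and the "lock" is a plain flag used solely so the example can record whether the callback ran while it was held.

```c
static int demo_lock_held;          /* toy lock, single-threaded only */

static void demo_lock(void)   { demo_lock_held = 1; }
static void demo_unlock(void) { demo_lock_held = 0; }

struct demo_bio {
    int segments;                   /* outstanding pieces, lock-protected */
    int completed;                  /* set by the completion callback */
    int completed_under_lock;       /* was the lock held at that point? */
};

/* Stands in for a potentially expensive completion callback. */
static void demo_end_io(struct demo_bio *b)
{
    b->completed = 1;
    b->completed_under_lock = demo_lock_held;
}

static void demo_piece_done(struct demo_bio *b)
{
    int last;

    demo_lock();
    last = (--b->segments == 0);    /* only the counter needs the lock */
    demo_unlock();

    if (last)
        demo_end_io(b);             /* run the callback unlocked */
}
```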
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
wait_event_lock_irq puts a ';' after its usage of the 4th arg, so we don't
need to.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows user-space to access data safely. This is needed for raid5
reshape as user-space needs to take a backup of the first few stripes before
allowing reshape to commence.
It will also be useful in cluster-aware raid1 configurations so that all
cluster members can leave a section of the array untouched while a
resync/recovery happens.
A 'start' and 'end' of the suspended range are written to 2 sysfs attributes.
Note that only one range can be suspended at a time.
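A minimal sketch of the check such a suspended range implies: I/O that overlaps the range has to wait, and everything else proceeds. The struct, the function name and the half-open [lo, hi) convention are assumptions for illustration. The text above does not specify them.

```c
/* Single suspended range, assumed half-open: [lo, hi). */
struct demo_range {
    unsigned long long lo;
    unsigned long long hi;
};

/* Returns 1 if I/O covering [start, start + len) overlaps the range. */
static int demo_must_wait(const struct demo_range *s,
                          unsigned long long start,
                          unsigned long long len)
{
    return start < s->hi && start + len > s->lo;
}
```

Setting lo equal to hi makes the range empty, so no I/O is held back.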
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This allows reshape to be triggered via sysfs (which is the only way to start
it happening).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>