Commit graph

222 commits

Author SHA1 Message Date
Heiko Carstens
642fe301c3 [SPARC64]: Fix sys_newfstatat syscall table entry for 64-bit.
The sparc64 64 bit syscall table seems to be broken as it has
compat_sys_newfstatat in its syscall table instead of sys_newfstatat.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 15:09:15 -08:00
David S. Miller
0fc9b55606 [SPARC64]: Update defconfig.
Do not enable CONFIG_LOCALVERSION_AUTO by default.
When doing kernel development it just leaves a ton
of crap around.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-07 18:12:34 -08:00
David S. Miller
1b9a428901 [SPARC]: Wire up sys_unshare().
Also, the Solaris syscall table is sized differrently,
and does not go beyond entry 255, so trim off the excess
entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-07 18:11:24 -08:00
David S. Miller
7f7ff6bf02 [SPARC64]: Update defconfig.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-05 00:15:12 -08:00
David S. Miller
3794689dba [SPARC64]: Add .gitignore file for sparc64 boot images.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-05 00:15:11 -08:00
David S. Miller
cddfc12e25 [SPARC64]: Kill compat_sys_clock_settime sign extension stub.
It's wrong and totally unneeded.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-30 01:31:09 -08:00
David S. Miller
4415863773 [SPARC]: Increase NR_SYSCALLS to 299
To let new syscalls through.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-22 12:12:01 -08:00
David S. Miller
3ee68c4af3 [SPARC64]: Use compat_sys_futimesat in 32-bit syscall table.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-20 01:49:15 -08:00
David S. Miller
2d7d5f0511 [SPARC]: Add support for *at(), ppoll, and pselect syscalls.
This also includes by necessity _TIF_RESTORE_SIGMASK support,
which actually resulted in a lot of cleanups.

The sparc signal handling code is quite a mess and I should
clean it up some day.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-19 02:42:49 -08:00
Ulrich Drepper
5590ff0d55 [PATCH] vfs: *at functions: core
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name.  These functions, openat etc, have been discussed on numerous
occasions.  They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.

We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic.  But this code is rather expensive.  Here are some
results (similar to what Jim Meyering posted before).

The test creates a deep directory hierarchy on a tmpfs filesystem.  Then
rm -fr is used to remove all directories.  Without syscall support I get
this:

real    0m31.921s
user    0m0.688s
sys     0m31.234s

With syscall support the results are much better:

real    0m20.699s
user    0m0.536s
sys     0m20.149s

The interfaces are for obvious reasons currently not much used.  But they'll
be used.  coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them.  I expect a patch to make follow soon.  Every program which is walking
the filesystem tree will benefit.

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:29 -08:00
Linus Torvalds
3da38566df Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2006-01-18 15:08:02 -08:00
David S. Miller
959a85ada5 [SPARC64]: Fix build with CONFIG_COMPAT disabled.
Based upon a report and preliminary patch from Jim Gifford.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18 14:58:05 -08:00
Eddie C. Dost
c126cf80d4 [SPARC64]: Serial Console for E250 Patch
From: Eddie C. Dost <ecd@brainaid.de>

I have the following patch for serial console over the RSC
(remote system controller) on my E250 machine. It basically adds
support for input-device=rsc and output-device=rsc from OBP, and
allows 115200,8,n,1,- serial mode setting.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18 14:54:31 -08:00
David S. Miller
c07a8475dd [SPARC64]: Update defconfig.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18 13:41:36 -08:00
Jesse Brandeburg
35ec56bb78 [PATCH] e1000: Added disable packet split capability
Adds the ability to disability packet split at compile time and use the legacy receive path on PCI express hardware.  Made this a CONFIG option and modified the Kconfig, to reflect the new option.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-18 16:17:57 -05:00
Richard Mortimer
9eb3394bf2 [SPARC64]: Eliminate race condition reading Hummingbird STICK register
Ensure a consistent value is read from the STICK register by ensuring
that both high and low are read without high changing due to a roll
over of the low register.

Various Debian/SPARC users (myself include) have noticed problems with
Hummingbird based systems. The symptoms are that the system time is
seen to jump forward 3 days, 6 hours, 11 minutes give or take a few
seconds. In many cases the system then hangs some time afterwards.

I've spotted a race condition in the code to read the STICK register.
I could not work out why 3d, 6h, 11m is important but guess that it is
due to the 2^32 jump of STICK (forwards on one read and then the next
read will seem to be backwards) during a timer interrupt. I'm guessing
that a change of -2^32 will get converted to a large unsigned
increment after the arithmetic manipulation between STICK,
nanoseconds, jiffies etc.

I did a test where I modified __hbird_read_stick to artificially
inject rollover faults forcefully every few seconds. With this I saw
the clock jump over 6 times in 12 hours compared to once every month
or so.

Signed-off-by: Richard Mortimer <richm@oldelvet.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 15:21:01 -08:00
Al Viro
26ecbdea4b [PATCH] sparc64: task_pt_regs()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:52 -08:00
Al Viro
ee3eea165e [PATCH] sparc64: task_stack_page()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:52 -08:00
Al Viro
f3169641c1 [PATCH] sparc64: task_thread_info()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:52 -08:00
Randy Dunlap
a941564458 [PATCH] capable/capability.h (arch/)
arch: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:14 -08:00
Keshavamurthy Anil S
eb3a72921c [PATCH] kprobes: fix race in recovery of reentrant probe
There is a window where a probe gets removed right after the probe is hit
on some different cpu.  In this case probe handlers can't find a matching
probe instance related to break address.  In this case we need to read the
original instruction at break address to see if that is not a break/int3
instruction and recover safely.

Previous code had a bug where we were not checking for the above race in
case of reentrant probes and the below patch fixes this race.

Tested on IA64, Powerpc, x86_64.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:12 -08:00
Adrian Bunk
61943c5015 [SPARC64] arch/sparc64/Kconfig: fix HUGETLB_PAGE_SIZE_64K dependencies
This patch fixes a typo in the dependencies of HUGETLB_PAGE_SIZE_64K.

It might be more logical to rename the HUGETLB_PAGE_SIZE_*K
dependencies to HUGETLB_PAGE_SIZE_*KB, but let's fix this bug first.

This bug was reported by Jean-Luc Leger <reiga@dspnet.fr.eu.org>.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 15:55:23 -08:00
Linus Torvalds
dd49f96777 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2006-01-10 08:28:53 -08:00
Anil S Keshavamurthy
e597c2984c [PATCH] kprobes: arch_remove_kprobe
Currently arch_remove_kprobes() is only implemented/required for x86_64 and
powerpc.  All other architecture like IA64, i386 and sparc64 implementes a
dummy function which is being called from arch independent kprobes.c file.

This patch removes the dummy functions and replaces it with
#define arch_remove_kprobe(p, s)	do { } while(0)

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:40 -08:00
Anil S Keshavamurthy
49a2a1b83b [PATCH] kprobes: changed from using spinlock to mutex
Since Kprobes runtime exception handlers is now lock free as this code path is
now using RCU to walk through the list, there is no need for the
register/unregister{_kprobe} to use spin_{lock/unlock}_isr{save/restore}.  The
serialization during registration/unregistration is now possible using just a
mutex.

In the above process, this patch also fixes a minor memory leak for x86_64 and
powerpc.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:40 -08:00
Christoph Hellwig
e6a6d2efcb [PATCH] sanitize building of fs/compat_ioctl.c
Now that all these entries in the arch ioctl32.c files are gone [1], we can
build fs/compat_ioctl.c as a normal object and kill tons of cruft.  We need a
special do_ioctl32_pointer handler for s390 so the compat_ptr call is done.
This is not needed but harmless on all other architectures.  Also remove some
superflous includes in fs/compat_ioctl.c

Tested on ppc64.

[1] parisc still had it's PPP handler left, which is not fully correct
    for ppp and besides that ppp uses the generic SIOCPRIV ioctl so it'd
    kick in for all netdevice users.  We can introduce a proper handler
    in one of the next patch series by adding a compat_ioctl method to
    struct net_device but for now let's just kill it - parisc doesn't
    compile in mainline anyway and I don't want this to block this
    patchset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:33 -08:00
Christoph Hellwig
3a0f69d59b [PATCH] common compat_sys_timer_create
The comment in compat.c is wrong, every architecture provides a
get_compat_sigevent() for the IPC compat code already.

This basically moves the x86_64 version to common code and removes all the
others.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:32 -08:00
akpm@osdl.org
df2e71fb91 [PATCH] dump_thread() cleanup
)

From: Adrian Bunk <bunk@stusta.de>

- create one common dump_thread() prototype in kernel.h

- dump_thread() is only used in fs/binfmt_aout.c and can therefore be
  removed on all architectures where CONFIG_BINFMT_AOUT is not
  available

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:25 -08:00
David S. Miller
cdade109d3 [SPARC64]: Fix sys_fstat64() entry in 64-bit syscall table.
Noticed by Jakub Jelinek.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 20:43:57 -08:00
Linus Torvalds
977127174a Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 2006-01-09 18:41:42 -08:00
David S. Miller
3ebc284d52 [SPARC64]: Update defconfig.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:36:49 -08:00
Richard Mortimer
695ca07bf1 [SPARC64]: Fix ptrace/strace
Don't clobber register %l0 while checking TI_SYS_NOERROR value in
syscall return path.  This bug was introduced by:

db7d9a4eb7

Problem narrowed down by Luis F. Ortiz and Richard Mortimer.

I tried using %l2 as suggested by Luis and that works for me.

Looking at the code I wonder if it makes sense to simplify the code
a little bit. The following works for me but I'm not sure how to
exercise the "NOERROR" codepath.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:35:50 -08:00
David S. Miller
90bf811664 [SPARC64]: Add needed pm_power_off symbol.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:12:50 -08:00
Jiri Slaby
d08fa1a22e [PATCH] PCI: pci_find_device remove (sparc64/kernel/ebus.c)
Signed-off-by: Jiri Slaby <xslaby@fi.muni.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-09 12:13:15 -08:00
Matt Mackall
e585e47031 [PATCH] tiny: Make *[ug]id16 support optional
Configurable 16-bit UID and friends support

This allows turning off the legacy 16 bit UID interfaces on embedded platforms.

   text    data     bss     dec     hex filename
3330172  529036  190556 4049764  3dcb64 vmlinux-baseline
3328268  529040  190556 4047864  3dc3f8 vmlinux

From: Adrian Bunk <bunk@stusta.de>

    UID16 was accidentially disabled for !EMBEDDED.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:11 -08:00
Christoph Hellwig
6b9c7ed848 [PATCH] use ptrace_get_task_struct in various places
The ptrace_get_task_struct() helper that I added as part of the ptrace
consolidation is useful in variety of places that currently opencode it.
Switch them to the common helpers.

Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
the ptrace_get_task_struct() interface.  We don't need the request argument
now, and we return the task_struct directly, using ERR_PTR() for error
returns.  It's a bit more code in the callers, but we have two sane routines
that do one thing well now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:51 -08:00
David S. Miller
d5784b57d2 [SPARC]: Use STABS_DEBUG and DWARF_DEBUG macros in vmlinux.lds.S
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-28 13:22:54 -08:00
David S. Miller
597d1f0622 [SPARC]: Kill CHILD_MAX.
It's definition is wrong (-1 means "no limit" not 999),
only the Sparc SunOS/Solaris compat code uses it, so
let's just kill it off completely from limits.h and
all referencing code.

Noticed by Ulrich Drepper.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-22 23:10:03 -08:00
Adrian Bunk
0b57ee9e55 [SPARC]: introduce a SPARC Kconfig symbol
Introduce a Kconfig symbol SPARC that is defined on both the sparc and
sparc64 architectures.

This symbol makes some dependencies more readable.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-22 23:09:54 -08:00
David S. Miller
a9c9dff1bc [SPARC64]: Stop putting -finline-limit=XXX into CFLAGS
It was a stupid workaround for the "static inline" vs.
"extern inline" issues of long ago, and it is what causes
schedule() to be inlined like crazy into kernel/sched.c
when -Os is specified.

MIPS and S390 should probably do the same.

Now CC_OPTIMIZE_FOR_SIZE can be safely used on sparc64
once more.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-20 14:53:05 -08:00
Keshavamurthy Anil S
bf8d5c52c3 [PATCH] kprobes: increment kprobe missed count for multiprobes
When multiple probes are registered at the same address and if due to some
recursion (probe getting triggered within a probe handler), we skip calling
pre_handlers and just increment nmissed field.

The below patch make sure it walks the list for multiple probes case.
Without the below patch we get incorrect results of nmissed count for
multiple probe case.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 08:57:45 -08:00
David S. Miller
cab3f16feb [SPARC64]: Fix >8K I/O mappings.
Increment the PFN field of the PTE so that the tests
on vm_pfn in mm/memory.c match up.  The TLB ignores these
lower bits for larger page sizes, so it's OK to set things
like this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 13:59:03 -08:00
David S. Miller
5cd9194a1b [PATCH] sparc: convert IO remapping to VM_PFNMAP
Here are the Sparc bits.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:35:36 -08:00
Hugh Dickins
f3d48f0373 [PATCH] unpaged: fix sound Bad page states
Earlier I unifdefed PageCompound, so that snd_pcm_mmap_control_nopage and
others can give out a 0-order component of a higher-order page, which won't
be mistakenly freed when zap_pte_range unmaps it.  But many Bad page states
reported a PG_reserved was freed after all: I had missed that we need to
say __GFP_COMP to get compound page behaviour.

Some of these higher-order pages are allocated by snd_malloc_pages, some by
snd_malloc_dev_pages; or if SBUS, by sbus_alloc_consistent - but that has
no gfp arg, so add __GFP_COMP into its sparc32/64 implementations.

I'm still rather puzzled that DRM seems not to need a similar change.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:13:43 -08:00
Hugh Dickins
0b14c179a4 [PATCH] unpaged: VM_UNPAGED
Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
drivers set VM_RESERVED on areas which are then populated by nopage.  The
PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
zap_pte_range, without changing those drivers not to set it: so their pages
just leak away.

Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
to flag the special areas where the ptes may have no struct page, or if they
have then it's not to be touched.  Replace most instances of VM_RESERVED in
core mm by VM_UNPAGED.  Force it on in remap_pfn_range, and the sparc and
sparc64 io_remap_pfn_range.

Revert addition of VM_RESERVED to powerpc vdso, it's not needed there.  Is it
needed anywhere?  It still governs the mm->reserved_vm statistic, and special
vmas not to be merged, and areas not to be core dumped; but could probably be
eliminated later (the drivers are probably specifying it because in 2.4 it
kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
don't get on).

Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
purpose whatsoever, and should be removed from drivers when we clean up.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:13:42 -08:00
Christoph Hellwig
9ffb83bcc5 [SBUSFB]: implement ->compat_ioctl
This patch adds a new function, sbusfb_compat_ioctl() to
drivers/video/sbuslib.c and uses it as compat_ioctl in all sbus fb
drivers

This remove the last per-arch compat ioctl bits in
arch/sparc64/kernel/ioctl32.c so it would be nice if people could test
if this actually copiles and works and if yes apply it :)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-12 12:11:12 -08:00
David S. Miller
4d45cbacb8 [SPARC64]: Restore 2.4.x /proc/cpuinfo behavior for "ncpus probed" field.
Noticed by Tom 'spot' Callaway.

Even on uniprocessor we always reported the number of physical
cpus in the system via /proc/cpuinfo.  But when this got changed
to use num_possible_cpus() it always reads as "1" on uniprocessor.
This change was unintentional.

So scan the firmware device tree and count the number of cpu
nodes, and report that, as we always did.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 12:48:56 -08:00
Christoph Hellwig
8ca2bdc7a9 [SPARC] sbus rtc: implement ->compat_ioctl
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 12:07:18 -08:00
Tobias Klauser
84c1a13a30 [SPARC64]: Use ARRAY_SIZE macro
Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE which is never used anyways.

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 12:03:42 -08:00
Nick Piggin
64c7c8f885 [PATCH] sched: resched and cpu_idle rework
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid.  Improves efficiency of
resched_task and some cpu_idle routines.

* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
  and as we hold it during resched_task, then there is no need for an
  atomic test and set there. The only other time this should be set is
  when the task's quantum expires, in the timer interrupt - this is
  protected against because the rq lock is irq-safe.

- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
  won't get unset until the task get's schedule()d off.

- If we are running on the same CPU as the task we resched, then set
  TIF_NEED_RESCHED and no further action is required.

- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
  after TIF_NEED_RESCHED has been set, then we need to send an IPI.

Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.

* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
  becomes true, explicitly call schedule(). This makes things a bit clearer
  (IMO), but haven't updated all architectures yet.

- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
  to the resched_task rules, this isn't needed (and actually breaks the
  assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
  held). So remove that. Generally one less locked memory op when switching
  to the idle thread.

- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
  most polling idle loops. The above resched_task semantics allow it to be
  set until before the last time need_resched() is checked before going into
  a halt requiring interrupt wakeup.

  Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
  can be always left set, completely eliminating resched IPIs when rescheduling
  the idle task.

  POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:33 -08:00