Commit graph

155 commits

Author SHA1 Message Date
Nicolas Pitre
99595d0237 [ARM] 3308/1: old ABI compat: struct sockaddr_un
Patch from Nicolas Pitre

struct sockaddr_un loses its padding with EABI.  Since the size of the
structure is used as a validation test in unix_mkname(), we need to
change the length argument to 110 whenever it is 112.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-02-08 21:19:36 +00:00
Al Viro
fa1b4f91d6 [ARM] safer handling of syscall table padding
ARM entry-common.S needs to know syscall table size; in itself that would
not be a problem, but there's an additional constraint - some of the
instructions using it want a constant that would be a multiple of 4.
So we have to pad syscall table with sys_ni_syscall and that's where
the trouble begins.  .rept pseudo-op wants a constant expression for
number of repetitions and subtraction of two labels (before and after
syscall table) doesn't always get simplified to constant early enough
for .rept.  If labels end up in different frags, we lose.  And while
the frag size is large enough (slightly below 4Kb), the syscall table
is about 1/3 of that.  We used to get away with that, but the recent
changes had been enough to trigger the breakage.

Proper fix is simple: have a macro (CALL(x)) to populate the table
instead of using explicit .long x and the first time we include calls.S
have it defined to .equ NR_syscalls,NR_syscalls+1.  Then we can find
the proper amount of padding on the first inclusion simply by looking
at NR_syscalls at that time.  And that will be constant, no matter what.

Moreover, the same trick kills the need of having an estimate of padded
NR_syscalls - it will be calculated for free at the same time.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-19 12:57:01 +00:00
Nicolas Pitre
5e0974459d [ARM] 3271/1: ARM EABI: fix calling of cmpxchg syscall emulation
Patch from Nicolas Pitre

This is kernel provided user space code.

Since a syscall is used, it has to be updated to work with EABI.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18 22:38:49 +00:00
Nicolas Pitre
fcca538b83 [ARM] 3270/1: ARM EABI: fix sigreturn and rt_sigreturn
Patch from Nicolas Pitre

The signal return path consists of user code provided by the kernel.
Since a syscall is used, it has to be updated to work with EABI.

Noticed by Daniel Jacobowitz.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-18 22:38:47 +00:00
Linus Torvalds
8d5c315059 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2006-01-14 19:43:21 -08:00
Nicolas Pitre
3f471126ee [ARM] 3262/4: allow ptraced syscalls to be overriden
Patch from Nicolas Pitre

This is needed by strace to properly handle the tracing of some system
calls. It could be useful for other applications as well.

Based on an earlier patch from Daniel Jacobowitz.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Daniel Jacobowitz <dan@debian.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 19:30:04 +00:00
Nicolas Pitre
dd35afc22b [ARM] 3110/5: old ABI compat: multi-ABI syscall entry support
Patch from Nicolas Pitre

This patch adds the required code to support both user space ABIs at
the same time. A second syscall table is created to include legacy ABI
syscalls that need an ABI compat wrapper.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 16:36:12 +00:00
Nicolas Pitre
687ad01914 [ARM] 3109/1: old ABI compat: syscall wrappers for ABI impedance matching
Patch from Nicolas Pitre

The difference between EABI and the legacy ABI may affect either
structure member alignment and/or argument register selection.

The patch has the details.

Included are wrappers for the following syscalls:

  sys_stat64
  sys_lstat64
  sys_fstat64
  sys_fcntl64
  sys_epoll_ctl
  sys_epoll_wait
  sys_ipc
  sys_semop
  sys_semtimedop
  sys_pread64
  sys_pwrite64
  sys_truncate64
  sys_ftruncate64
  sys_readahead

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 16:35:31 +00:00
Nicolas Pitre
713c481519 [ARM] 3108/2: old ABI compat: statfs64 and fstatfs64
Patch from Nicolas Pitre

struct statfs64 has extra padding with EABI growing its size from 84 to
88. This struct is now __attribute__((packed,aligned(4))) with a small
assembly wrapper to force the sz argument to 84 if it is 88 to avoid
copying the extra padding over user space memory unexpecting it.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 16:35:03 +00:00
Nicolas Pitre
3f2829a315 [ARM] 3105/4: ARM EABI: new syscall entry convention
Patch from Nicolas Pitre

For a while we wanted to change the way syscalls were called on ARM.
Instead of encoding the syscall number in the swi instruction which
requires reading back the instruction from memory to extract that number
and polluting the data cache, it was decided that simply storing the
syscall number into r7 would be more efficient. Since this represents
an ABI change then making that change at the same time as EABI support
is the right thing to do.

It is now expected that EABI user space binaries put the syscall number
into r7 and use "swi 0" to call the kernel. Syscall register argument
are also expected to have "EABI arrangement" i.e. 64-bit arguments
should be put in a pair of registers from an even register number.

Example with long ftruncate64(unsigned int fd, loff_t length):

	legacy ABI:
	- put fd into r0
	- put length into r1-r2
	- use "swi #(0x900000 + 194)" to call the kernel

	new ARM EABI:
	- put fd into r0
	- put length into r2-r3 (skipping over r1)
	- put 194 into r7
	- use "swi 0" to call the kernel

Note that it is important to use 0 for the swi argument as backward
compatibility with legacy ABI user space relies on this.
The syscall macros in asm-arm/unistd.h were also updated to support
both ABIs and implement the right call method automatically.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 16:31:29 +00:00
Nicolas Pitre
ba95e4e4a0 [ARM] 3104/1: ARM EABI: new helper function names
Patch from Nicolas Pitre

The ARM EABI defines new names for GCC helper functions.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 16:18:29 +00:00
Nicolas Pitre
499b2ea11f [ARM] 3103/1: ARM EABI: stack pointer must be 64-bit aligned (part 2)
Patch from Nicolas Pitre

We must make sure that assembly code that modifies the stack pointer
before calling a C function does it so it remains 64-bit aligned.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 16:18:09 +00:00
Nicolas Pitre
2dede2d8e9 [ARM] 3102/1: ARM EABI: stack pointer must be 64-bit aligned after a CPU exception
Patch from Nicolas Pitre

The ARM EABI says that the stack pointer has to be 64-bit aligned for
reasons already mentioned in patch #3101 when calling C functions.

We therefore must verify and adjust sp accordingly when taking an
exception from kernel mode since sp might not necessarily be 64-bit
aligned if the exception occurs in the middle of a kernel function.

If the exception occurs while in user mode then no sp fixup is needed as
long as sizeof(struct pt_regs) as well as any additional syscall data
stack space remain multiples of 8.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-14 16:18:08 +00:00
Hyok S. Choi
afeb90ca08 [ARM] Support register switch in nommu mode
This patch adds register switch support in nommu mode.

Signed-off-by: Hyok S. Choi <hyok.choi@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-13 21:05:25 +00:00
Nicolas Pitre
2df96b34aa [ARM] 3259/1: remove phys_ram from struct machine_desc (part 1)
Patch from Nicolas Pitre

This field is redundent since it must be equal to PHYS_OFFSET anyway.

First, let's  use PHYS_OFFSET directly instead.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-13 20:51:46 +00:00
Russell King
e08b754161 [PATCH] Add ecard_bus_type probe/remove/shutdown methods
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-13 11:26:05 -08:00
Linus Torvalds
bf785ee0ae Merge master.kernel.org:/home/rmk/linux-2.6-arm 2006-01-12 12:23:49 -08:00
Arjan van de Ven
00431707be [ARM] Convert some arm semaphores to mutexes
The arm clock semaphores are strict mutexes, convert them to the new
mutex implementation

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-12 18:42:23 +00:00
Al Viro
32d39a9355 [PATCH] arm: task_stack_page()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:56 -08:00
Al Viro
5520582392 [PATCH] arm: end_of_stack()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:56 -08:00
Al Viro
815d5ec86e [PATCH] arm: task_pt_regs()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:55 -08:00
Al Viro
e7c1b32fd3 [PATCH] arm: task_thread_info()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12 09:08:55 -08:00
Catalin Marinas
90303b1023 [ARM] 3256/1: Make the function-returning ldm's use sp as the base register
Patch from Catalin Marinas

If the low interrupt latency mode is enabled for the CPU (from ARMv6
onwards), the ldm/stm instructions are no longer atomic. An ldm instruction
restoring the sp and pc registers can be interrupted immediately after sp
was updated but before the pc. If this happens, the CPU restores the base
register to the value before the ldm instruction but if the base register
is not sp, the interrupt routine will corrupt the stack and the restarted
ldm instruction will load garbage.

Note that future ARM cores might always run in the low interrupt latency
mode.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-12 16:53:51 +00:00
Catalin Marinas
6b090a25fe [ARM] 3234/1: Update cpu_architecture() to deal with the new ID format
Patch from Catalin Marinas

Since ARM1176, the CPU ID format has changed and it will also be used for
future ARM architectures.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-12 16:28:16 +00:00
Randy Dunlap
a941564458 [PATCH] capable/capability.h (arch/)
arch: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:14 -08:00
Russell King
16ed926eee [ARM] Only call set_type method in setup_irq if it's defined
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-09 19:19:18 +00:00
Andrew Morton
a136564702 [PATCH] remove gcc-2 checks
Remove various things which were checking for gcc-1.x and gcc-2.x compilers.

From: Adrian Bunk <bunk@stusta.de>

    Some documentation updates and removes some code paths for gcc < 3.2.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:02 -08:00
Russell King
9ded96f24c [PATCH] IRQ type flags
Some ARM platforms have the ability to program the interrupt controller to
detect various interrupt edges and/or levels.  For some platforms, this is
critical to setup correctly, particularly those which the setting is dependent
on the device.

Currently, ARM drivers do (eg) the following:

	err = request_irq(irq, ...);

	set_irq_type(irq, IRQT_RISING);

However, if the interrupt has previously been programmed to be level sensitive
(for whatever reason) then this will cause an interrupt storm.

Hence, if we combine set_irq_type() with request_irq(), we can then safely set
the type prior to unmasking the interrupt.  The unfortunate problem is that in
order to support this, these flags need to be visible outside of the ARM
architecture - drivers such as smc91x need these flags and they're
cross-architecture.

Finally, the SA_TRIGGER_* flag passed to request_irq() should reflect the
property that the device would like.  The IRQ controller code should do its
best to select the most appropriate supported mode.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:46 -08:00
Richard Purdie
53b7c2b243 [ARM] 3229/1: Remove uneeded ARM apm dependency on PM_LEGACY
Patch from Richard Purdie

ARM doesn't use ACPI so ARM's apm implementation has no need to depend
on PM_LEGACY. This patch removes that dependency.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-05 20:44:55 +00:00
Russell King
d7b4a75677 [ARM] Move DMA exports to be next to each function
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-04 15:52:45 +00:00
Russell King
95ba9fb06b [ARM] Remove definition of MAX_DMA_CHANNELS to zero
Since we now only build arch/arm/kernel/dma.c on machine types
which set ISA_DMA_API, we don't need to define MAX_DMA_CHANNELS
to 0 to indicate this - this definition becomes superfluous.
Remove it.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-04 15:51:51 +00:00
Russell King
065909b915 [ARM] Refine selection of ISA_DMA_API and generic dma.c code
ISA_DMA_API tells the rest of the kernel if the ISA DMA API is
available.  Select this symbol only on machine types which make
use of the ISA DMA API.

Make building of arch/arm/kernel/dma.c depend on this symbol -
if a machine does not support the ISA DMA API, it's pointless
building this file.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-04 15:44:16 +00:00
Russell King
6842b92992 [ARM] Use core_initcall() to initialise ARM DMA
There's no need to have DMA initialised at the same time as
interrupts.  Move it to a core_initcall().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-04 15:17:08 +00:00
Russell King
7cdad48297 [ARM] Remove '__address' from scatterlist and convert to DMA API
The old __address element in struct scatterlist remained from older
kernels because the ARM DMA emulation code made use of it.  Move
this field into struct dma_struct, and convert DMA emulation code
to setup a SG entry as required.

Also, convert DMA emulation code to use the new DMA API rather
than the PCI DMA API.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-04 15:08:30 +00:00
Russell King
333c9624b7 [ARM] Move ISA DMA bus_to_virt() out of set_dma_addr()
Allow the compiler to optimise the bus_to_virt(virt_to_bus())
transformation in the ARM ISA DMA interface.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-04 14:41:29 +00:00
Russell King
78ff18a412 [ARM] Cleanup ARM includes
arch/arm/kernel/entry-armv.S has contained a comment suggesting
that asm/hardware.h and asm/arch/irqs.h should be moved into the
asm/arch/entry-macro.S include.  So move the includes to these
two files as required.

Add missing includes (asm/hardware.h, asm/io.h) to asm/arch/system.h
includes which use those facilities, and remove asm/io.h from
kernel/process.c.

Remove other unnecessary includes from arch/arm/kernel, arch/arm/mm
and arch/arm/mach-footbridge.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-03 17:39:34 +00:00
Russell King
9d4f13e531 [ARM] Make kernel link address depend on PAGE_OFFSET
We are coding the kernel link address into the makefiles, which is
invisibly dependent on PAGE_OFFSET.  If PAGE_OFFSET is changed, the
makefiles also need to be changed.

Make adjustments such that the makefiles encode just the offset from
PAGE_OFFSET for the kernel link address, and use PAGE_OFFSET in the
linker scripts directly.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-01-03 17:28:33 +00:00
Nicolas Pitre
7c612bfd4e [ARM] 3210/1: add missing memory barrier helper for NPTL support
Patch from Nicolas Pitre

Strictly speaking, the NPTL kernel helpers are required for pre ARMv6
only.  They are available on ARMv6+ as well for obvious compatibility
reasons.  However there are cases where extra memory barriers are needed
when using an SMP ARMv6 machine but not on pre-ARMv6.

This patch adds a memory barrier kernel helper that glibc can use as
needed for pre-ARMv6 binaries to be forward compatible with an SMP
kernel on ARMv6, as well as the necessary dmb instructions to the
cmpxchg helper.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Daniel Jacobowitz <dan@codesourcery.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-12-19 22:20:51 +00:00
Russell King
567bd98017 [ARM] Fix sys_sendto and sys_recvfrom 6-arg syscalls
Rather than providing more wrappers for 6-arg syscalls, arrange for
them to be supported as standard.  This just means that we always
store the 6th argument on the stack, rather than in the wrappers.

This means we eliminate the wrappers for:
* sys_futex
* sys_arm_fadvise64_64
* sys_mbind
* sys_ipc

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-12-17 15:25:42 +00:00
Daniel Jacobowitz
c2e2611425 [ARM] 3205/1: Handle new EABI relocations when loading kernel modules.
Patch from Daniel Jacobowitz

Handle new EABI relocations when loading kernel modules.  This is
necessary for CONFIG_AEABI kernels, and also for some broken
(since fixed) old ABI toolchains.

Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-12-14 22:04:22 +00:00
Nikola Valerjev
22f975f4ff [ARM] 3200/1: Singlestep over ARM BX and BLX instructions using ptrace fix
Patch from Nikola Valerjev

Single stepping an application using ptrace() fails over ARM instructions BX and BLX.

Steps to reproduce:

Compile and link the following files

main.c
-----
void foo();
int main() {
    foo();
    return 0;
}

foo.s
-----
	.text
	.globl foo
foo:
	BX LR

Using ptrace() functionality, run to main(), and start singlestepping.
Singlestep over \"BX LR\" instruction won\'t transfer the control back
to main, but run the code to completion.

This problems seems to be in the function get_branch_address() in
arch/arm/kernel/ptrace.c. The function doesn\'t seem to recognize BX
and BLX instructions as branches. BX and BLX instructions can be used
to convert from ARM to Thumb mode if the target address has the low
bit set. However, they are also perfectly legal in the ARM only mode.
Although other things in the kernel seem to indicate that only ARM
mode is accepted (and not Thumb), many compilers will generate BX
and BLX instructions even when generating ARM only code.

Signed-off-by: Nikola Valerjev <nikola@ghs.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-12-10 11:59:15 +00:00
Russell King
3c0bdac387 [ARM] Remove mach-types.h from head.S
We don't really need to check whether the machine type is Netwinder
or CATS before setting up the PCI IO mapping for debugging.  This
allows us to eliminate asm/mach-types.h from head.S

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-11-25 15:43:22 +00:00
Daniel Jacobowitz
a6c61e9dfd [ARM] 3168/1: Update ARM signal delivery and masking
Patch from Daniel Jacobowitz

After delivering a signal (creating its stack frame) we must check for
additional pending unblocked signals before returning to userspace.
Otherwise signals may be delayed past the next syscall or reschedule.

Once that was fixed it became obvious that the ARM signal mask manipulation
was broken.  It was a little bit broken before the recent SA_NODEFER
changes, and then very broken after them.  We must block the requested
signals before starting the handler or the same signal can be delivered
again before the handler even gets a chance to run.

Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-11-19 10:01:07 +00:00
Russell King
d2c5b69099 [ARM] Fix get_user when passed a const pointer
Unfortunately, later gcc versions error out when our get_user is passed
a const pointer, since we write to a temporary variable declared as
typeof(*(p)) which propagates the const-ness.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-11-18 14:22:03 +00:00
Russell King
728f5c076a [ARM] Improve comment about ASSERT()s in vmlinux.lds.S
Provide folk with an idea what to do if the ASSERT statements fail
with their linker.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-11-17 16:43:14 +00:00
Linus Torvalds
70ac551651 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-11-13 18:17:54 -08:00
Jeff Garzik
bca73e4bf8 [PATCH] move pm_register/etc. to CONFIG_PM_LEGACY, pm_legacy.h
Since few people need the support anymore, this moves the legacy
pm_xxx functions to CONFIG_PM_LEGACY, and include/linux/pm_legacy.h.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13 18:14:10 -08:00
Russell King
da2660d2c4 [ARM] Restore apparant pointless change in arch/arm/kernel/smp.c
Restore smp.c back to how it used to be.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2005-11-12 17:21:47 +00:00
Linus Torvalds
ac111bfaa6 Merge master.kernel.org:/home/rmk/linux-2.6-arm 2005-11-09 08:55:53 -08:00
Nick Piggin
64c7c8f885 [PATCH] sched: resched and cpu_idle rework
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
confusion, and make their semantics rigid.  Improves efficiency of
resched_task and some cpu_idle routines.

* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
  and as we hold it during resched_task, then there is no need for an
  atomic test and set there. The only other time this should be set is
  when the task's quantum expires, in the timer interrupt - this is
  protected against because the rq lock is irq-safe.

- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
  won't get unset until the task get's schedule()d off.

- If we are running on the same CPU as the task we resched, then set
  TIF_NEED_RESCHED and no further action is required.

- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
  after TIF_NEED_RESCHED has been set, then we need to send an IPI.

Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.

* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
  becomes true, explicitly call schedule(). This makes things a bit clearer
  (IMO), but haven't updated all architectures yet.

- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
  to the resched_task rules, this isn't needed (and actually breaks the
  assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
  held). So remove that. Generally one less locked memory op when switching
  to the idle thread.

- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
  most polling idle loops. The above resched_task semantics allow it to be
  set until before the last time need_resched() is checked before going into
  a halt requiring interrupt wakeup.

  Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
  can be always left set, completely eliminating resched IPIs when rescheduling
  the idle task.

  POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:33 -08:00