Commit graph

968 commits

Author SHA1 Message Date
David S. Miller
44266215e3 sparc: Implement irq_of_parse_and_map() and irq_dispose_mapping().
This allows more OF layer code to be shared between powerpc and
sparc.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-24 20:33:55 -07:00
David S. Miller
2481d76615 sparc: Add mutex for set property calls.
On some platforms, the I2C controller is shared between the OS and
OBP.  OBP uses this I2C controller to access the EEPROM, and thus is
programmed when the kernel calls prom_setprop().

Wrap such calls with the new of_set_property_mutex.

Relevant I2C bus drivers can grab this mutex around top-level I2C
operations to provide the proper protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-24 20:33:55 -07:00
David S. Miller
6f63e781ea sparc64: Handle stack trace attempts before irqstacks are setup.
Things like lockdep can try to do stack backtraces before
the irqstack blocks have been setup.  So don't try to match
their ranges so early on.

Also, remove unused variable in save_stack_trace().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-13 17:20:04 -07:00
David S. Miller
4f70f7a91b sparc64: Implement IRQ stacks.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-12 18:33:56 -07:00
David S. Miller
b6b7922fbd sparc64: Don't MAGIC_SYSRQ ifdef smp_fetch_global_regs and support code.
Based upon a report and initial patch by Friedrich Oslage.

The intention is to provide this facility for
__trigger_all_cpu_backtrace even if MAGIC_SYSRQ is not set.

The only part that should have MAGIC_SYSRQ ifdef protection is the
sparc_globalreg_op sysrq regitration and immediate code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-09 16:25:26 -07:00
David S. Miller
433c5f7068 sparc64: Fix end-of-stack checking in save_stack_trace().
Bug reported by Alexander Beregalov.

Before we dereference the stack frame or try to peek at the
pt_regs magic value, make sure the entire object is within
the kernel stack bounds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-07 23:04:37 -07:00
Stephen Rothwell
764f2579d9 sparc: don't use asm/of_device.h
Use linux/of_device.h instead.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-07 15:33:36 -07:00
David S. Miller
ea771bd51c sparc64: Use kernel/uid16.c helpers instead of own copy.
Noticed by Adrian Bunk.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-06 23:11:08 -07:00
David S. Miller
ae583885bf sparc64: Remove all cpumask_t local variables in xcall dispatch.
All of the xcall delivery implementation is cpumask agnostic, so
we can pass around pointers to const cpumask_t objects everywhere.

The sad remaining case is the argument to arch_send_call_function_ipi().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 16:56:15 -07:00
David S. Miller
ed4d9c66eb sparc64: Kill error_mask from hypervisor_xcall_deliver().
It can eat up a lot of stack space when NR_CPUS is large.
We retain some of it's functionality by reporting at least one
of the cpu's which are seen in error state.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 16:47:57 -07:00
David S. Miller
90f7ae8a55 sparc64: Build cpu list and mondo block at top-level xcall_deliver().
Then modify all of the xcall dispatch implementations get passed and
use this information.

Now all of the xcall dispatch implementations do not need to be mindful
of details such as "is current cpu in the list?" and "is cpu online?"

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 16:42:58 -07:00
David S. Miller
c02a5119e8 sparc64: Disable local interrupts around xcall_deliver_impl() invocation.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 16:18:40 -07:00
David S. Miller
deb16999e4 sparc64: Make all xcall_deliver's go through common helper function.
This just facilitates the next changeset where we'll be building
the cpu list and mondo block in this helper function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 16:16:20 -07:00
David S. Miller
43f589235e sparc64: Always allocate the send mondo blocks, even on non-sun4v.
The idea is that we'll use this cpu list array and mondo block
even for non-hypervisor platforms.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 16:13:51 -07:00
David S. Miller
91a4231cc2 sparc64: Make smp_cross_call_masked() take a cpumask_t pointer.
Ideally this could be simplified further such that we could pass
the pointer down directly into the xcall_deliver() implementation.

But if we do that we need to do the "cpu_online(cpu)" and
"cpu != self" checks down in those functions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 13:51:40 -07:00
David S. Miller
24445a4ac9 sparc64: Directly call xcall_deliver() in smp_start_sync_tick_client.
We know the cpu is online and not the current cpu here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 13:51:40 -07:00
David S. Miller
1992663053 sparc64: Call xcall_deliver() directly in some cases.
For these cases the callers make sure:

1) The cpus indicated are online.

2) The current cpu is not in the list of indicated cpus.

Therefore we can pass a pointer to the mask directly.

One of the motivations in this transformation is to make use of
"&cpumask_of_cpu(cpu)" which evaluates to a pointer to constant
data in the kernel and thus takes up no stack space.

Hopefully someone in the future will change the interface of
arch_send_call_function_ipi() such that it passes a const cpumask_t
pointer so that this will optimize ever further.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 13:51:39 -07:00
David S. Miller
cd5bc89deb sparc64: Use cpumask_t pointers and for_each_cpu_mask_nr() in xcall_deliver.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 13:51:38 -07:00
David S. Miller
622824dbb5 sparc64: Use xcall_deliver() consistently.
There remained some spots still vectoring to the appropriate
*_xcall_deliver() function manually.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 13:51:37 -07:00
David S. Miller
5e0797e5b8 sparc64: Use function pointer for cross-call sending.
Initialize it using the smp_setup_processor_id() hook.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 13:51:37 -07:00
Huang Weiyi
abd9e69828 arch/sparc64/kernel/signal.c: removed duplicated #include
Removed duplicated #include <linux/tracehook.h> in
arch/sparc64/kernel/signal.c.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-08-04 13:51:36 -07:00
David S. Miller
0a4949c441 sparc64: Do not clobber %g7 in setcontext() trap.
That's the userland thread register, so we should never try to change
it like this.

Based upon glibc bug nptl/6577 and suggestions by Jakub Jelinek.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 20:40:46 -07:00
David S. Miller
dbf3e95067 sparc64: Kill __show_regs().
The story is that what we used to do when we actually used
smp_report_regs() is that if you specifically only wanted to have the
current cpu's registers dumped you would call "__show_regs()"
otherwise you would call show_regs() which also invoked
smp_report_regs().

Now that we killed off smp_report_regs() there is no longer any
reason to have these two routines, just show_regs() is sufficient.

Also kill off a stray declaration of show_regs() in sparc64_ksym.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 20:33:43 -07:00
David S. Miller
9c636e30a3 sparc64: Kill smp_report_regs().
All the call sites are #if 0'd out and we have a much more
useful global cpu dumping facility these days.  smp_report_regs()
is way too verbose to be usable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 01:06:02 -07:00
David S. Miller
a014821340 sparc64: Kill VERBOSE_SHOWREGS code.
It just clutters everything up and even though I wrote that hack I
can't remember having used it in the last 5 years or so.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-31 00:58:35 -07:00
David S. Miller
09ee167cbf sparc64: Hook up trigger_all_cpu_backtrace().
We already have code that does this, but it is only currently attached
to sysrq-'y'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-30 22:35:00 -07:00
David S. Miller
5afe27380b sparc64: Make global reg dumping even more useful.
Record one more level of stack frame program counter.

Particularly when lockdep and all sorts of spinlock debugging is
enabled, figuring out the caller of spin_lock() is difficult when the
cpu is stuck on the lock.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-30 21:57:59 -07:00
David S. Miller
71fc324b5b sparc64: Kill isa_bus_type.
I forgot to delete this when I removed the ISA bus layer
from the sparc ports.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-29 23:47:17 -07:00
David S. Miller
17b6f586b8 sparc64: Fix global reg snapshotting on self-cpu.
We were picking %i7 out of the wrong register window
stack slot.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-28 00:44:29 -07:00
Roland McGrath
95698466cf sparc64: tracehook_signal_handler
Call the standard hook after setting up signal handlers.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-27 17:32:35 -07:00
Roland McGrath
e35a8925e0 sparc64: tracehook: TIF_NOTIFY_RESUME
This adds TIF_NOTIFY_RESUME support for sparc64.
When set, we call tracehook_notify_resume() on the way to user mode.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-27 17:32:19 -07:00
Roland McGrath
73ccefab8a sparc64: tracehook syscall
This changes sparc64 syscall tracing to use the new tracehook.h entry
points.

[ Add assembly changes to force an immediate -ENOSYS return from
  the system call when syscall_trace() returns non-zero at syscall
  entry.  -DaveM ]

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-27 17:28:55 -07:00
Sam Ravnborg
a439fe51a1 sparc, sparc64: use arch/sparc/include
The majority of this patch was created by the following script:

***
ASM=arch/sparc/include/asm
mkdir -p $ASM
git mv include/asm-sparc64/ftrace.h $ASM
git rm include/asm-sparc64/*
git mv include/asm-sparc/* $ASM
sed -ie 's/asm-sparc64/asm/g' $ASM/*
sed -ie 's/asm-sparc/asm/g' $ASM/*
***

The rest was an update of the top-level Makefile to use sparc
for header files when sparc64 is being build.
And a small fixlet to pick up the correct unistd.h from
sparc64 code.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-27 23:00:59 +02:00
Linus Torvalds
7b35fa86e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc: Wire up new system calls.
2008-07-25 17:33:34 -07:00
David S. Miller
f1373da87b sparc: Wire up new system calls.
This wires up the recently added Wire up signalfd4, eventfd2,
epoll_create1, dup3, pipe2, and inotify_init1 system calls.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-25 15:18:31 -07:00
Srinivasa D S
ef53d9c5e4 kprobes: improve kretprobe scalability with hashed locking
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table.  We have one
global kretprobe lock to serialise the access to these lists.  This causes
only one kretprobe handler to execute at a time.  Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).

Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.

Solution:

 1) Instead of having one global lock to protect kretprobe instances
    present in kretprobe object and kretprobe hash table.  We will have
    two locks, one lock for protecting kretprobe hash table and another
    lock for kretporbe object.

 2) We hold lock present in kretprobe object while we modify kretprobe
    instance in kretprobe object and we hold per-hash-list lock while
    modifying kretprobe instances present in that hash list.  To prevent
    deadlock, we never grab a per-hash-list lock while holding a kretprobe
    lock.

 3) We can remove used_instances from struct kretprobe, as we can
    track used instances of kretprobe instances using kretprobe hash
    table.

Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.

cacheline              non-cacheline             Un-patched kernel
aligned patch 	       aligned patch
===============================================================================
real    9m46.784s       9m54.412s                  10m2.450s
user    40m5.715s       40m7.142s                  40m4.273s
sys     2m57.754s       2m58.583s                  3m17.430s
===========================================================

Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real    9m26.389s
user    40m8.775s
sys     2m7.283s
=========================

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
Linus Torvalds
ecc8b655b3 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
  nohz: prevent tick stop outside of the idle loop
2008-07-24 12:55:01 -07:00
Linus Torvalds
4378dcca85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix cpufreq notifier registry.
  sparc64: Fix lockdep issues in LDC protocol layer.
2008-07-24 12:15:16 -07:00
Ulrich Drepper
ed8cae8ba0 flag parameters: pipe
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value.  This patch implements
the handling of O_CLOEXEC for the flag.  I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation.  I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.

The implementation introduces do_pipe_flags.  I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code.  To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_pipe2
# ifdef __x86_64__
#  define __NR_pipe2 293
# elif defined __i386__
#  define __NR_pipe2 331
# else
#  error "need __NR_pipe2"
# endif
#endif

int
main (void)
{
  int fd[2];
  if (syscall (__NR_pipe2, fd, 0) != 0)
    {
      puts ("pipe2(0) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (coe & FD_CLOEXEC)
        {
          printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
    {
      puts ("pipe2(O_CLOEXEC) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((coe & FD_CLOEXEC) == 0)
        {
          printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:28 -07:00
Andrea Righi
27ac792ca0 PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:

	u64 val = PAGE_ALIGN(size);

always returns a value < 4GB even if size is greater than 4GB.

The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):

#define PAGE_SHIFT      12
#define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)

The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.

Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.

See also lkml discussion: http://lkml.org/lkml/2008/6/11/237

[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:21 -07:00
David S. Miller
7ae93f51d7 sparc64: Fix cpufreq notifier registry.
Based upon a report by Daniel Smolik.

We do it too early, which triggers a BUG in
cpufreq_register_notifier().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-23 16:21:07 -07:00
David S. Miller
b7c2a75725 sparc64: Fix lockdep issues in LDC protocol layer.
We're calling request_irq() with a IRQs disabled.

No straightforward fix exists because we want to
enable these IRQs and setup state atomically before
getting into the IRQ handler the first time.

What happens now is that we mark the VIRQ to not be
automatically enabled by request_irq().  Then we
make explicit enable_irq() calls when we grab the
LDC channel.

This way we don't need to call request_irq() illegally
under the LDC channel lock any more.

Bump LDC version and release date.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-22 22:34:29 -07:00
Linus Torvalds
6eaaaac974 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  remove CONFIG_KMOD from core kernel code
  remove CONFIG_KMOD from lib
  remove CONFIG_KMOD from sparc64
  rework try_then_request_module to do less in non-modular kernels
  remove mention of CONFIG_KMOD from documentation
  make CONFIG_KMOD invisible
  modules: Take a shortcut for checking if an address is in a module
  module: turn longs into ints for module sizes
  Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
  module: reorder struct module to save space on 64 bit builds
  module: generic each_symbol iterator function
  module: don't use stop_machine for waiting rmmod
2008-07-22 13:17:15 -07:00
Johannes Berg
184b6c7682 remove CONFIG_KMOD from sparc64
One place is just a comment, the other a conditional, unused
inclusion of linux/kmod.h.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22 19:24:30 +10:00
Greg Kroah-Hartman
2222c313e9 sparc64: fix up bus_id changes in sparc core code
This converts all instances of bus_id in the sparc core kernel to use
either dev_set_name(), or dev_name() depending on the need.

This is done in anticipation of removing the bus_id field from struct
driver.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:03 -07:00
Andi Kleen
4a0b2b4dbe sysdev: Pass the attribute to the low level sysdev show/store function
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.

I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.

I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.

Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:02 -07:00
Kay Sievers
aab0de2451 driver core: remove KOBJ_NAME_LEN define
Kobjects do not have a limit in name size since a while, so stop
pretending that they do.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:52 -07:00
Ingo Molnar
9b610fda0d Merge branch 'linus' into timers/nohz 2008-07-18 19:53:16 +02:00
Thomas Gleixner
b8f8c3cf0a nohz: prevent tick stop outside of the idle loop
Jack Ren and Eric Miao tracked down the following long standing
problem in the NOHZ code:

	scheduler switch to idle task
	enable interrupts

Window starts here

	----> interrupt happens (does not set NEED_RESCHED)
	      	irq_exit() stops the tick

	----> interrupt happens (does set NEED_RESCHED)

	return from schedule()
	
	cpu_idle(): preempt_disable();

Window ends here

The interrupts can happen at any point inside the race window. The
first interrupt stops the tick, the second one causes the scheduler to
rerun and switch away from idle again and we end up with the tick
disabled.

The fact that it needs two interrupts where the first one does not set
NEED_RESCHED and the second one does made the bug obscure and extremly
hard to reproduce and analyse. Kudos to Jack and Eric.

Solution: Limit the NOHZ functionality to the idle loop to make sure
that we can not run into such a situation ever again.

cpu_idle()
{
	preempt_disable();

	while(1) {
		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
		 			          are in the idle loop

		 while (!need_resched())
		       halt();

		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
		 preempt_enable_no_resched();
		 schedule();
		 preempt_disable();
	}
}

In hindsight we should have done this forever, but ... 

/me grabs a large brown paperbag.

Debugged-by: Jack Ren <jack.ren@marvell.com>, 
Debugged-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-18 18:10:28 +02:00
David S. Miller
432e8765f0 sparc64: Add missing hypervisor service group numbers.
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18 00:43:52 -07:00