Fix up arch-specific work items where possible to use the new work_struct and
delayed_work structs.
Three places that enqueue bits of their stack and then return have been marked
with #error as this is not permitted.
Signed-Off-By: David Howells <dhowells@redhat.com>
Use generic_handle_irq() to handle mixed-type irq handling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
`typename' is going away and is usually uninitialised anwyay.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Correct definition of handle_IPI
[IA64] move SAL_CACHE_FLUSH check later in boot
[IA64] MCA recovery: Montecito support
[IA64] cpu-hotplug: Fixing confliction between CPU hot-add and IPI
[IA64] don't double >> PAGE_SHIFT pointer for /dev/kmem access
The declaration of handle_IPI in arch/ia64/kernel/smp.c was changed but
not the definition of this function. Remove struct pt_regs from
handle_IPI().
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The check to see if the firmware drops interrupts during a
SAL_CACHE_FLUSH is done to early in the boot. SAL_CACHE_FLUSH expects
to be able to make PAL calls in virtual mode, on some cell based
machines a fault occurs causing a MCA. This patch moves the check
after mmu_context_init so the TLB and VHPT are properly setup.
Signed-off-by Troy Heber <troy.heber@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The information in MCA records is filled in slightly differently on
Montecito than on Madison/McKinley. Usually, the cache check and bus
check target identifiers have the same address. On Montecito the
cache check and bus check target identifiers can be different if
a corrected error (ie SBE or unconsumed poison data) was encountered and
then an uncorrected error (ie DBE) was consumed. In that case, the
cache check target identifier is the physical address of the DBE (that
caused the MCA to surface) while the bus check target identifier is the
physical address of the SBE. This patch correctly finds the target
identifier that triggered the MCA.
If there are multiple valid cache target identifiers in the same
error record then use the one with the lowest cache level.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add a vmlinux.lds.h helper macro for defining the eight-level initcall table,
teach all the architectures to use it.
This is a prerequisite for a patch which performs initcall synchronisation for
multithreaded-probing.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Added AVR32 as well ]
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Count the number of "resched" interrupts that each cpu receives.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reformat to fit in 80 columns. Fix a couple typos. Remove
a couple unused labels.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linux maps PAL instructions with an ITR, but uses a DTC for PAL data.
Section 11.10.2.1.3, "Making PAL Procedures Calls in Physical or Virtual
Mode," of the SDM (rev 2.2), says we must therefore make all PAL calls
with PSR.ic = 1 so that Linux can handle any TLB faults.
PAL_CALL_IC_OFF is currently unused, and as long as we use the ITR + DTC
strategy, we can't use it. So remove it. I also removed the code in
ia64_pal_call_static() that conditionally cleared PSR.ic.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Allow pending IPIs to interrupt a timer interrupt that is looping
in the do_timer() "while" loop in timer_interrupt(). (Interrupts are
allowed at only 1 spot in the code).
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Missed one piece of ia64 fallout from the global IRQ patch
7d12e780e0
Perfmon interrupt handler needs to use get_irq_regs() too.
Acked-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
This is just a few makefile tweaks and some file renames.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The last in-kernel user of errno is gone, so we should remove the definition
and everything referring to it. This also removes the now-unused lib/execve.c
file that was introduced earlier.
Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the
kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some architectures provide an execve function that does not set errno, but
instead returns the result code directly. Rename these to kernel_execve to
get the right semantics there. Moreover, there is no reasone for any of these
architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
remove these right away.
[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This
avoids all arches having to be updated. Compiles and boots on s390.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a nsproxy structure to the task struct. Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.
The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.
[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpumask: ensure that node_to_cpumask() is available to modules for all
supported combinations of architecture and CONFIG_NUMA.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kprobe_flush_task() possibly calls kfree function during holding
kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
incur spinlock deadlock. This patch moves kfree function out scope of
kretprobe_lock.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Whitespace is used to indent, this patch cleans up these sentences by
kernel coding style.
Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
asm/serial.h is supposed to contain the definitions for the architecture
specific 8250 ports for the 8250 driver. It may also define BASE_BAUD,
but this is the base baud for the architecture specific ports _only_.
Therefore, nothing other than the 8250 driver should be including this
header file. In order to move towards this goal, here is a patch which
removes some of the more obvious incorrect includes of the file.
Acked-by: Paul Fulghum <paulkf@microgate.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
So we can kill wall_jiffies completely.
This is just a cleanup and logically should not change any real behavior
except for one thing: RTC updating code in (old) ppc and xtensa use a
condition "jiffies - wall_jiffies == 1". This condition is never met so I
suppose it is just a bug. I just remove that condition only instead of
kill the whole "if" block.
[heiko.carstens@de.ibm.com: s390 build fix and cleanup]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390
timer interrupt handler with this change.
Currently update_times() calculates ticks by "jiffies - wall_jiffies", but
callers of do_timer() should know how many ticks to update. Passing ticks
get rid of this redundant calculation. Also there are another redundancy
pointed out by Martin Schwidefsky.
This cleanup make a barrier added by
5aee405c66 needless. So this patch removes
it.
As a bonus, this cleanup make wall_jiffies can be removed easily, since now
wall_jiffies is always synced with jiffies. (This patch does not really
remove wall_jiffies. It would be another cleanup patch)
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] minor reformatting to vmlinux.lds.S
[IA64] CMC/CPE: Reverse the order of fetching log and checking poll threshold
[IA64] PAL calls need physical mode, stacked
[IA64] ar.fpsr not set on MCA/INIT kernel entry
[IA64] printing support for MCA/INIT
[IA64] trim output of show_mem()
[IA64] show_mem() printk levels
[IA64] Make gp value point to Region 5 in mca handler
Revert "[IA64] Unwire set/get_robust_list"
[IA64] Implement futex primitives
[IA64-SGI] Do not request DMA memory for BTE
[IA64] Move perfmon tables from thread_struct to pfm_context
[IA64] Add interface so modules can discover whether multithreading is on.
[IA64] kprobes: fixup the pagefault exception caused by probehandlers
[IA64] kprobe opcode 16 bytes alignment on IA64
[IA64] esi-support
[IA64] Add "model name" to /proc/cpuinfo
Fix build error introduced by 3212fe1594
Non-NUMA case should be handled.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Minor reformatting to vmlinux.lds.S to make it 80-column usable,
in accordance with Linux coding style.
Signed-off-by: Al Stone <ahs3@fc.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch reverses the order of fetching log from SAL and
checking poll threshold. This will fix following trivial issues:
- If SAL_GET_SATE_INFO is unbelievably slow (due to huge system
or just its silly implementation) and if it takes more than
1/5 sec, CMCI/CPEI will never switch to CMCP/CPEP.
- Assuming terrible flood of interrupt (continuous corrected
errors let all CPUs enter to handler at once and bind them
in it), CPUs will be serialized by IA64_LOG_LOCK(*).
Now we check the poll threshold after the lock and log fetch,
so we need to call SAL_GET_STATE_INFO (num_online_cpus() + 4)
times in the worst case.
if we can check the threshold before the lock, we can shut up
interrupts quickly without waiting preceding log fetches, and
the number of times will be reduced to (num_online_cpus()) in
the same situation.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When entering the kernel due to an MCA or INIT, ar.fpsr (ar40)
was not getting set to the kernel default value (remaining
at the user value). The effect depends on the user setting
of ar.fpsr. In the test case, the effect was addresses
printing with strange hex values.
Setting ar.fpsr in ia64_set_kernel_registers sets it for both
the MCA and INIT paths. The user value of ar.fpsr is correctly
saved (in ia64_state_save) and restored (in ia64_state_restore).
Below is an example of output with very strange hex values.
Anyone know the value of hex 'g'? :-)
Processes interrupted by INIT - 0 (cpu 14 task 0xdfffg55g7a4c6gA)
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Printing message to console from MCA/INIT handler is useful,
however doing oops_in_progress = 1 in them exactly makes
something in kernel wrong. Especially it sounds ugly if
system goes wrong after returning from recoverable MCA.
This patch adds ia64_mca_printk() function that collects
messages into temporary-not-so-large message buffer during
in MCA/INIT environment and print them out later, after
returning to normal context or when handlers determine to
down the system.
Also this print function is exported for use in extensional
MCA handler. It would be useful to describe detail about
recovery.
NOTE:
I don't think it is sane thing if temporary message buffer
is enlarged enough to hold whole stack dumps from INIT, so
buffering is disabled during stack dump from INIT-monarch
(= default_monarch_init_process). please fix it in future.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
MCA dispatch code take physical address of GP passed from SAL, then call
DATA_PA_TO_VA twice on GP before call into C code. The first time is
in ia64_set_kernel_register, the second time is in VIRTUAL_MODE_ENTER.
The gp is changed to a virtual address in region 7 because DATA_PA_TO_VA
is implemented by dep instruction.
However when notify blocks were called from MCA handler code, because
notify blocks are supported by callback function pointers, gp value
value was switched to region 5 again.
The patch set gp register to kernel gp of region 5 at entry of MCA
dispatch.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This reverts commit 2636255488.
Jakub Jelinek provided the missing futex_atomic_cmpxchg_inatomic()
function, so now it should be safe to re-enable these syscalls.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch renders thread_struct->pmcs[] and thread_struct->pmds[]
OBSOLETE. The actual table is moved to pfm_context structure which
saves space in thread_struct (in turn saving space in task_struct
which frees up more space for kernel stacks).
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add is_multithreading_enabled() to check whether multi-threading
is enabled independently of which cpu is currently online
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If the user-specified kprobe handler causes the page fault when accessing
user space address, fixup this fault since do_page_fault() should not
continue as the kprobe handler are run with preemption disabled.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On IA64 instruction opcode must be 16 bytes alignment, in kprobe structure
there is one element to save original instruction, currently saved opcode
is not statically allocated in kprobe structure, that can not assure
16 bytes alignment. This patch dynamically allocated kprobe instruction
opcode to assure 16 bytes alignment.
Signed-off-by: bibo mao <bibo.mao@intel.com>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If we're going to implement smp_call_function_single() on three architecture
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In many places we will need to use the same combination of flags. Specify
a single GFP_THISNODE definition for ease of use in gfp.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The uncached allocator manages per node pools. Specify __GFP_THISNODE in
order to force allocation on the indicated node or fail. The uncached
allocator has already logic to deal with failing allocations.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Assume that a cpu is *physically* offlined at boot time...
Because smpboot.c::smp_boot_cpu_map() canoot find cpu's sapicid,
numa.c::build_cpu_to_node_map() cannot build cpu<->node map for
offlined cpu.
For such cpus, cpu_to_node map should be fixed at cpu-hot-add.
This mapping should be done before cpu onlining.
This patch also handles cpu hotremove case.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Problem description:
We have additional_cpus= option for allocating possible_cpus. But nid
for possible cpus are not fixed at boot time. cpus which is offlined at
boot or cpus which is not on SRAT is not tied to its node. This will
cause panic at cpu onlining.
Usually, pxm_to_nid() mapping is fixed at boot time by SRAT.
But, unfortunately, some system (my system!) do not include
full SRAT table for possible cpus. (Then, I use
additiona_cpus= option.)
For such possible cpus, pxm<->nid should be fixed at
hot-add. We now have acpi_map_pxm_to_node() which is also
used at boot. It's suitable here.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The SN PROM uses the register stack in the slave loop. The contents
must be preserved for the OS to return to the slave loop via offlining
a cpu or for kexec. A 'flushrs" is needed to force the stack to be written
to memory prior to changing bspstore.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The syscalls set/get_robust_list must not be wired up until
futex_atomic_cmpxchg_inatomic is implemented. Otherwise the kernel will
hang in handle_futex_death.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a bug in sys_perfmonctl() whereby it was not correctly
decrementing the file descriptor reference count.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This prevents cross-region mappings on IA64 and SPARC which could lead
to system crash. They were correctly trapped for normal mmap() calls,
but not for the kernel internal calls generated by executable loading.
This code just moves the architecture-specific cross-region checks into
an arch-specific "arch_mmap_check()" macro, and defines that for the
architectures that needed it (ia64, sparc and sparc64).
Architectures that don't have any special requirements can just ignore
the new cross-region check, since the mmap() code will just notice on
its own when the macro isn't defined.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Cleaned up to not affect architectures that don't need it ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Increase default nodes shift to 10, nr_cpus to 1024
[IA64] remove redundant local_irq_save() calls from sn_sal.h
[IA64] panic if topology_init kzalloc fails
[IA64-SGI] Silent data corruption caused by XPC V2.
There really is no sense trying to continue if the kzalloc of sysfs_cpus[]
fails in ia64 topology_init. The code calling into here doesn't check
errors very well, and one ends up with a nonobvious boot failure that
wastes peoples time debugging.
See for example the lkml thread at:
http://lkml.org/lkml/2006/3/2/215
Since the system is totally dead when this kzalloc fails, not having yet
even booted, might as well announce one's death boldly and plainly.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ACPI 3.0 appended a variable length UID string to the LAPIC structure
as part of support for > 256 processors. So the BAD_MADT_ENTRY() sanity
check can no longer compare for equality with a fixed structure length.
Signed-off-by: Alexey Y Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path. However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception. On his suggestion, this
patch changes the message to simply "Fatal exception". A suitable oops
message should already have been displayed.
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The uncached allocator has a function, uncached_get_new_chunk(), that needs
to be serialized on a per node basis. It also has a global variable,
allocated_granules, which should be defined on a per node basis and protected
by that serialization. Additionally, all error returns from functions called
(like ia64_pal_mc_drain()) should be handled appropriately.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Acked-by: Jes Sorenson <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
CONFIG_MD_RAID5 became CONFIG_MD_RAID456 in drivers/md/Kconfig. Make
the same change in arch/ia64
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Aron Griffis <aron@hp.com>
Acked-by: Jes Sorenson <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I think ia64_switch_mode_phys and ia64_switch_mode_virt
does not need to alloc an empty frame.
An empty frame is required by loadrs but flushrs
does not need that.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We found an issue in pal.S.
According to the software runtime SPEC,
The caller's output registers do not need to be preserved for
caller. The callee may reuse input registers for any other
purpose within the procedure.
in ia64_pal_call_phys_stacked,
input registers are copied to output registers before call
into ia64_switch_mode_phys, then used to call into PAL. This
assumes output registers are preserved in ia64_switch_mode_phys,
which may not be true.
In this particular case, ia64_switch_mode_phys alloc a null frame
, and mask off psr.i.
If an interrupt comes at this small window,
or an MCA comes inside the procedure, output registers
maybe changed,
then the pal call may got some staled input registers.
This patch moves the copies from input to output
after ia64_switch_mode_phys to follow the software
runtime convention.
It also removed some unused labels in
ia64_pal_call_phys_stacked.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix some sparse warnings on ia64. Large constants that should be long
instead of int. Use NULL instead of 0. Add some missing __iomem
casts. Replace a non-C99 structure assignment.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The latest toolchains can produce a new ELF section in DSOs and
dynamically-linked executables. The new section ".gnu.hash" replaces
".hash", and allows for more efficient runtime symbol lookups by the
dynamic linker. The new ld option --hash-style={sysv|gnu|both} controls
whether to produce the old ".hash", the new ".gnu.hash", or both. In some
new systems such as Fedora Core 6, gcc by default passes --hash-style=gnu
to the linker, so that a standard invocation of "gcc -shared" results in
producing a DSO with only ".gnu.hash". The new ".gnu.hash" sections need
to be dealt with the same way as ".hash" sections in all respects; only the
dynamic linker cares about their contents. To work with older dynamic
linkers (i.e. preexisting releases of glibc), a binary must have the old
".hash" section. The --hash-style=both option produces binaries that a new
dynamic linker can use more efficiently, but an old dynamic linker can
still handle.
The new section runs afoul of the custom linker scripts used to build vDSO
images for the kernel. On ia64, the failure mode for this is a boot-time
panic because the vDSO's PT_IA_64_UNWIND segment winds up ill-formed.
This patch addresses the problem in two ways.
First, it mentions ".gnu.hash" in all the linker scripts alongside ".hash".
This produces correct vDSO images with --hash-style=sysv (or old tools),
with --hash-style=gnu, or with --hash-style=both.
Second, it passes the --hash-style=sysv option when building the vDSO
images, so that ".gnu.hash" is not actually produced. This is the most
conservative choice for compatibility with any old userland. There is some
concern that some ancient glibc builds (though not any known old production
system) might choke on --hash-style=both binaries. The optimizations
provided by the new style of hash section do not really matter for a DSO
with a tiny number of symbols, as the vDSO has. If someone wants to use
=gnu or =both for their vDSO builds and worry less about that
compatibility, just change the option and the linker script changes will
make any choice work fine.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use hotplug version of register_cpu_notifier in late init functions.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is part of an effort to unify the panic_on_oops behaviour across
all architectures that implement it.
It was pointed out to me by Andi Kleen that if an oops has occured in
interrupt context, then calling sleep() in the oops path will only cause a
panic, and that it would be really better for it not to be in the path at
all.
This patch removes the ssleep() call and reworks the console message
accordinly. I have a slght concern that the resulting console message is
too long, feedback welcome.
For powerpc it also unifies the 32bit and 64bit behaviour.
Fror x86_64, this patch only updates the console message, as ssleep() is
already not present.
Signed-off-by: Horms <horms@verge.net.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kprobe inserts breakpoint instruction in probepoint and then jumps to
instruction slot when breakpoint is hit, the instruction slot icache must
be consistent with dcache. Here is the patch which invalidates instruction
slot icache area.
Without this patch, in some machines there will be fault when executing
instruction slot where icache content is inconsistent with dcache.
Signed-off-by: bibo,mao <bibo.mao@intel.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
/proc/pal/*/version_info is a bit confusing. HP firmware, at least,
reports 07.31 instead of 0.7.31. Also, the comment is out of place;
it's an internal detail about the implementation of ia64_pal_version.
Since the 2.2 revision of the SDM still states that PAL_VERSION can
be called in virtual mode, correct the comment to be more accurate.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Newer ARMs have a 40 bit physical address space, but mapping physical
memory above 4G needs a special page table format which we (currently?) do
not use for userspace mappings, so what happens instead is that mapping an
address >= 4G will happily discard the upper bits and wrap.
There is a valid_mmap_phys_addr_range() arch hook where we could check for
>= 4G addresses and deny the mapping, but this hook takes an unsigned long
address:
static inline int valid_mmap_phys_addr_range(unsigned long addr, size_t size);
And drivers/char/mem.c:mmap_mem() calls it like this:
static int mmap_mem(struct file * file, struct vm_area_struct * vma)
{
size_t size = vma->vm_end - vma->vm_start;
if (!valid_mmap_phys_addr_range(vma->vm_pgoff << PAGE_SHIFT, size))
So that's not much help either.
This patch makes the hook take a pfn instead of a phys address.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
screen_info.h doesn't have anything to do with the tty layer and shouldn't be
included by tty.h. This patches removes the include and modifies all users to
directly include screen_info.h. struct screen_info is mainly used to
communicate with the console drivers in drivers/video/console. Note that this
patch touches every arch and I have no way of testing it. If there is a
mistake the worst thing that will happen is a compile error.
[akpm@osdl.org: fix arm build]
[akpm@osdl.org: fix alpha build]
Signed-off-by: Jon Smirl <jonsmir@gmail.com>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.
Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the new IRQF_ constants and remove the SA_INTERRUPT define
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow to tie upper bits of syscall bitmap in audit rules to kernel-defined
sets of syscalls. Infrastructure, a couple of classes (with 32bit counterparts
for biarch targets) and actual tie-in on i386, amd64 and ia64.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations.
(Most architectures had it defined to NOP anyway.)
NOTE: ia64 needs testing. i386 and x86_64 tested.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cleanup: remove irq_descp() - explicit use of irq_desc[] is shorter and more
readable.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the
irq_desc[NR_IRQS].affinity field.
[akpm@osdl.org: sparc64 build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch-queue improves the generic IRQ layer to be truly generic, by adding
various abstractions and features to it, without impacting existing
functionality.
While the queue can be best described as "fix and improve everything in the
generic IRQ layer that we could think of", and thus it consists of many
smaller features and lots of cleanups, the one feature that stands out most is
the new 'irq chip' abstraction.
The irq-chip abstraction is about describing and coding and IRQ controller
driver by mapping its raw hardware capabilities [and quirks, if needed] in a
straightforward way, without having to think about "IRQ flow"
(level/edge/etc.) type of details.
This stands in contrast with the current 'irq-type' model of genirq
architectures, which 'mixes' raw hardware capabilities with 'flow' details.
The patchset supports both types of irq controller designs at once, and
converts i386 and x86_64 to the new irq-chip design.
As a bonus side-effect of the irq-chip approach, chained interrupt controllers
(master/slave PIC constructs, etc.) are now supported by design as well.
The end result of this patchset intends to be simpler architecture-level code
and more consolidation between architectures.
We reused many bits of code and many concepts from Russell King's ARM IRQ
layer, the merging of which was one of the motivations for this patchset.
This patch:
rename desc->handler to desc->chip.
Originally i did not want to do this, because it's a big patch. But having
both "desc->handler", "desc->handle_irq" and "action->handler" caused a
large degree of confusion and made the code appear alot less clean than it
truly is.
I have also attempted a dual approach as well by introducing a
desc->chip alias - but that just wasnt robust enough and broke
frequently.
So lets get over with this quickly. The conversion was done automatically
via scripts and converts all the code in the kernel.
This renaming patch is the first one amongst the patches, so that the
remaining patches can stay flexible and can be merged and split up
without having some big monolithic patch act as a merge barrier.
[akpm@osdl.org: build fix]
[akpm@osdl.org: another build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chandra Seetharaman missed one place in commit:
65edc68c34
[but it only shows up when building the ski simulator configuration
of ia64, so thats understandable]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make use the of newly defined hotplug version of cpu_notifier functionality
wherever appropriate.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make notifier_blocks associated with cpu_notifier as __cpuinitdata.
__cpuinitdata makes sure that the data is init time only unless
CONFIG_HOTPLUG_CPU is defined.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, i undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).
We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).
If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.
This patch reverts the notifier_call changes made in 2.6.17
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With Goto-san's patch, we can add new pgdat/node at runtime. I'm now
considering node-hot-add with cpu + memory on ACPI.
I found acpi container, which describes node, could evaluate cpu before
memory. This means cpu-hot-add occurs before memory hot add.
In most part, cpu-hot-add doesn't depend on node hot add. But register_cpu(),
which creates symbolic link from node to cpu, requires that node should be
onlined before register_cpu(). When a node is onlined, its pgdat should be
there.
This patch-set holds off creating symbolic link from node to cpu
until node is onlined.
This removes node arguments from register_cpu().
Now, register_cpu() requires 'struct node' as its argument. But the array of
struct node is now unified in driver/base/node.c now (By Goto's node hotplug
patch). We can get struct node in generic way. So, this argument is not
necessary now.
This patch also guarantees add cpu under node only when node is onlined. It
is necessary for node-hot-add vs. cpu-hot-add patch following this.
Moreover, register_cpu calculates cpu->node_id by cpu_to_node() without regard
to its 'struct node *root' argument. This patch removes it.
Also modify callers of register_cpu()/unregister_cpu, whose args are changed
by register-cpu-remove-node-struct patch.
[Brice.Goglin@ens-lyon.org: fix it]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When new node becomes enable by hot-add, new sysfs file must be created for
new node. So, if new node is enabled by add_memory(), register_one_node() is
called to create it. In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().
This is tested by Tiger4(IPF) with node hot-plug emulation.
Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During some profiling I noticed that default_idle causes a lot of
memory traffic. I think that is caused by the atomic operations
to clear/set the polling flag in thread_info. There is actually
no reason to make this atomic - only the idle thread does it
to itself, other CPUs only read it. So I moved it into ti->status.
Converted i386/x86-64/ia64 for now because that was the easiest
way to fix ACPI which also manipulates these flags in its idle
function.
Cc: Nick Piggin <npiggin@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
ACPI: suppress power button event on S3 resume
ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
ACPI: use for_each_possible_cpu() instead of for_each_cpu()
ACPI: delete newly added debugging macros in processor_perflib.c
ACPI: UP build fix for bugzilla-5737
Enable P-state software coordination via _PDC
P-state software coordination for speedstep-centrino
P-state software coordination for acpi-cpufreq
P-state software coordination for ACPI core
ACPI: create acpi_thermal_resume()
ACPI: create acpi_fan_suspend()/acpi_fan_resume()
ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
ACPI: create acpi_device_suspend()/acpi_device_resume()
ACPI: replace spin_lock_irq with mutex for ec poll mode
ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
sem2mutex: acpi, acpi_link_lock
ACPI: delete unused acpi_bus_drivers_lock
sem2mutex: drivers/acpi/processor_perflib.c
ACPI add ia64 exports to build acpi_memhotplug as a module
ACPI: asus_acpi_init(): propagate correct return value
...
Manual resolve of conflicts in:
arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
include/acpi/processor.h
Pass the POSIX lock owner ID to the flush operation.
This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally. FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.
Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move_pages() is used to move individual pages of a process. The function can
be used to determine the location of pages and to move them onto the desired
node. move_pages() returns status information for each page.
long move_pages(pid, number_of_pages_to_move,
addresses_of_pages[],
nodes[] or NULL,
status[],
flags);
The addresses of pages is an array of void * pointing to the
pages to be moved.
The nodes array contains the node numbers that the pages should be moved
to. If a NULL is passed instead of an array then no pages are moved but
the status array is updated. The status request may be used to determine
the page state before issuing another move_pages() to move pages.
The status array will contain the state of all individual page migration
attempts when the function terminates. The status array is only valid if
move_pages() completed successfullly.
Possible page states in status[]:
0..MAX_NUMNODES The page is now on the indicated node.
-ENOENT Page is not present
-EACCES Page is mapped by multiple processes and can only
be moved if MPOL_MF_MOVE_ALL is specified.
-EPERM The page has been mlocked by a process/driver and
cannot be moved.
-EBUSY Page is busy and cannot be moved. Try again later.
-EFAULT Invalid address (no VMA or zero page).
-ENOMEM Unable to allocate memory on target node.
-EIO Unable to write back page. The page must be written
back in order to move it since the page is dirty and the
filesystem does not provide a migration function that
would allow the moving of dirty pages.
-EINVAL A dirty page cannot be moved. The filesystem does not provide
a migration function and has no ability to write back pages.
The flags parameter indicates what types of pages to move:
MPOL_MF_MOVE Move pages that are only mapped by the process.
MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
Requires sufficient capabilities.
Possible return codes from move_pages()
-ENOENT No pages found that would require moving. All pages
are either already on the target node, not present, had an
invalid address or could not be moved because they were
mapped by multiple processes.
-EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
to migrate pages in a kernel thread.
-EPERM MPOL_MF_MOVE_ALL specified without sufficient priviledges.
or an attempt to move a process belonging to another user.
-EACCES One of the target nodes is not allowed by the current cpuset.
-ENODEV One of the target nodes is not online.
-ESRCH Process does not exist.
-E2BIG Too many pages to move.
-ENOMEM Not enough memory to allocate control array.
-EFAULT Parameters could not be accessed.
A test program for move_pages() may be found with the patches
on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
From: Christoph Lameter <clameter@sgi.com>
Detailed results for sys_move_pages()
Pass a pointer to an integer to get_new_page() that may be used to
indicate where the completion status of a migration operation should be
placed. This allows sys_move_pags() to report back exactly what happened to
each page.
Wish there would be a better way to do this. Looks a bit hacky.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>