The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers. It fails to properly fixup bus/device
resources, and it fails to properly enable EEH. Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
While testing kexec and kdump we hit problems where the new kernel would
freeze or instantly reboot. The easiest way to trigger it was to kexec a
kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7
instead would work fine.
The patch fixes a few problems with the kexec inline asm.
Signed-off-by: Chris Mason <mason@suse.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We fixed this:
arch/powerpc/platforms/pseries/eeh.c: In function `eeh_add_device_tree_late':
arch/powerpc/platforms/pseries/eeh.c:901: warning: implicit declaration of function `eeh_add_device_late'
arch/powerpc/platforms/pseries/eeh.c: At top level:
arch/powerpc/platforms/pseries/eeh.c:918: error: conflicting types for 'eeh_add_device_late'
arch/powerpc/platforms/pseries/eeh.c:901: error: previous implicit declaration of 'eeh_add_device_late' was here
make[2]: *** [arch/powerpc/platforms/pseries/eeh.o] Error 1
But we forgot the !CONFIG_EEH stub.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A careful reading of the recent changes to the system call entry/exit
paths revealed several problems, plus some things that could be
simplified and improved:
* 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit
path, so it was only doing anything with it once it saw some other
bit being set. In other words, the noerror behaviour would apply to
the next system call where we had to reschedule or deliver a signal,
which is not necessarily the current system call.
* 32-bit wasn't doing the call to ptrace_notify in the syscall exit
path when the _TIF_SINGLESTEP bit was set.
* _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
_TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set
by system calls. I took it out of _TIF_USER_WORK_MASK.
* On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers
to be restored (unless perhaps a signal was delivered or the syscall
was traced or single-stepped). Thus the non-volatile registers
weren't restored on exit from a signal handler. We probably got
away with it mostly because signal handlers written in C wouldn't
alter the non-volatile registers.
* On 32-bit I simplified the code and made it more like 64-bit by
making the syscall exit path jump to ret_from_except to handle
preemption and signal delivery.
* 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
set - but I think because of that 32-bit was actually restoring the
non-volatile registers on exit from a signal handler.
* I changed the order of enabling interrupts and saving the
non-volatile registers before calling do_syscall_trace_leave; now we
enable interrupts first.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc pud_ERROR() function misleadingly prints a message
indicating a pmd error. This patch fixes it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch makes userland aware of the icache snoop capability of the
POWER5 (and possibly others in the future) and of SMT capabilities.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch should fixe a problem with eeh_add_device_late() not being
defined in the ppc64 build process, causing the build to break.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some hotplug driver functions were migrated to the kernel for use by EEH
in commit 2bf6a8fa21.
Previously, the PCI Hotplug module had been changed to use the new
OFDT-based PCI probe when appropriate:
5fa80fcdca
When rpaphp_pci_config_slot() was moved from the rpaphp driver to the
new kernel function pcibios_add_pci_devices(), the OFDT-based probe
stuff was dropped. This patch restores it.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Do atomic_add_unless natively instead of using cmpxchg.
Improved register allocation idea from Joel Schopp.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a newline at the end of the ISYNC_ON_SMP string.
Needed for a subsequent patch.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently ARCH=powerpc will not compile when STRICT_MM_TYPECHECKS is
turned on and CONFIG_64K_PAGES is turned off. This corrects the
problem.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This implements accurate task and cpu time accounting for 64-bit
powerpc kernels. Instead of accounting a whole jiffy of time to a
task on a timer interrupt because that task happened to be running at
the time, we now account time in units of timebase ticks according to
the actual time spent by the task in user mode and kernel mode. We
also count the time spent processing hardware and software interrupts
accurately. This is conditional on CONFIG_VIRT_CPU_ACCOUNTING. If
that is not set, we do tick-based approximate accounting as before.
To get this accurate information, we read either the PURR (processor
utilization of resources register) on POWER5 machines, or the timebase
on other machines on
* each entry to the kernel from usermode
* each exit to usermode
* transitions between process context, hard irq context and soft irq
context in kernel mode
* context switches.
On POWER5 systems with shared-processor logical partitioning we also
read both the PURR and the timebase at each timer interrupt and
context switch in order to determine how much time has been taken by
the hypervisor to run other partitions ("steal" time). Unfortunately,
since we need values of the PURR on both threads at the same time to
accurately calculate the steal time, and since we can only calculate
steal time on a per-core basis, the apportioning of the steal time
between idle time (time which we ceded to the hypervisor in the idle
loop) and actual stolen time is somewhat approximate at the moment.
This is all based quite heavily on what s390 does, and it uses the
generic interfaces that were added by the s390 developers,
i.e. account_system_time(), account_user_time(), etc.
This patch doesn't add any new interfaces between the kernel and
userspace, and doesn't change the units in which time is reported to
userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(),
times(), etc. Internally the various task and cpu times are stored in
timebase units, but they are converted to USER_HZ units (1/100th of a
second) when reported to userspace. Some precision is therefore lost
but there should not be any accumulating error, since the internal
accumulation is at full precision.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The runlatch SPR can take a lot of time to write. My original runlatch
code would set it on every exception entry even though most of the time
this was not required. It would also continually set it in the idle
loop, which is an issue on an SMT capable processor.
Now we cache the runlatch value in a threadinfo bit, and only check for
it in decrementer and hardware interrupt exceptions as well as the idle
loop. Boot on POWER3, POWER5 and iseries, and compile tested on pmac32.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On the 83xx platform to ensure the PCI inbound memory is handled properly we
have to turn on coherency for all pages in the MMU. Otherwise we see
corruption if inbound "prefetching/streaming" is enabled on the PCI controller.
Signed-off-by: Randy Vinson <rvinson@mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For kexec we need to know the size of the MMU hash table.
Currently we calculate the size once in the htab code, and then twice more in
the kexec code, once using htab_hash_mask and once using ppc64_pft_size.
On some machines the ppc64_pft_size calculation is broken because
ppc64_pft_size is not set.
So we need to fix the second calculation, but better still we should just
calculate the size once and use it everywhere else.
Tested on Power5 LPAR, Power4 non-LPAR and Power3.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
One of the parameters to the __pud_free_tlb() macro for powerpc is
incorrect (see patch) . We get away with it by accident, because the one
place the macro is called, the second parameter is a variable named "pud".
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make new MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK consistent across all
arches. The idea is to make it possible to use them portably even before
distros include them in libc headers.
Move common flags to asm-generic/mman.h
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, copy-on-write may change the physical address of a page even if the
user requested that the page is pinned in memory (either by mlock or by
get_user_pages). This happens if the process forks meanwhile, and the parent
writes to that page. As a result, the page is orphaned: in case of
get_user_pages, the application will never see any data hardware DMA's into
this page after the COW. In case of mlock'd memory, the parent is not getting
the realtime/security benefits of mlock.
In particular, this affects the Infiniband modules which do DMA from and into
user pages all the time.
This patch adds madvise options to control whether memory range is inherited
across fork. Useful e.g. for when hardware is doing DMA from/into these
pages. Could also be useful to an application wanting to speed up its forks
by cutting large areas out of consideration.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree. I think this accomplises
everything wanted, though there might be a few references I missed.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we have some stuff in firmware.h and kernel/firmware.c that is
#ifdef CONFIG_PPC_PSERIES. Move it all into platforms/pseries.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Registers system call for the powerpc architecture.
Signed-off-by: Janak Desai <janak@us.ibm.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds a windfarm module, windfarm_pm112, for the dual core G5s
(both 2 and 4 core models), keeping the machine from getting into
vacuum-cleaner mode ;) For proper credits, the patch was initially
written by Paul Mackerras, and slightly reworked by me to add overtemp
handling among others. The patch also removes the sysfs attributes from
windfarm_pm81 and windfarm_pm91 and instead adds code to the windfarm
core to automagically expose attributes for sensor & controls.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early). Removed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Prototypes aren't so useful without parameter names, add them to lmb.h based
on the names in lmb.c
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
LMB_ALLOC_ANYWHERE doesn't need to be part of the API, it's only used in
lmb.c - so move it out of the header file.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently most callers of lmb_alloc() don't check if it worked or not, if it
ever does weird bad things will probably happen. The few callers who do check
just panic or BUG_ON.
So make lmb_alloc() panic internally, to catch bugs at the source. The few
callers who did check the result no longer need to.
The only caller that did anything interesting with the return result was
careful_allocation(). For it we create __lmb_alloc_base() which _doesn't_ panic
automatically, a little messy, but passable.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It's possible for prom_init to allocate the flat device tree inside the
kdump crash kernel region. If this happens, when we load the kdump kernel we
overwrite the flattened device tree, which is bad.
We could make prom_init try and avoid allocating inside the crash kernel
region, but then we run into issues if the crash kernel region uses all the
space inside the RMO. The easiest solution is to move the flat device tree
once we're running in the kernel.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
- kexec.h is included from assembly code, thus C code must be properly
protected.
- (embedded) ppc32 systems use machine_kexec_simple whose declaration
vanished during a recent powerpc merge change.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Cc: <fastboot@osdl.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the platform function interrupt functions actually work. Calls
irq_enable() for the first in the list, and irq_disable() for the last.
Added *func to struct irq_client so the the user can pass just that to
pmf_unregister_irq_client().
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement the TIF_RESTORE_SIGMASK flag in the new arch/powerpc kernel, for
both 32-bit and 64-bit system call paths.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of
sys_rt_sigsuspend() instead of duplicating it for each architecture. This
provides such an implementation and makes arch/powerpc use it.
It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sparse can't parse a struct definition in include/asm-powerpc/lppaca.h,
even though gcc can accept it. The form looks like this:
struct __attribute__((whatever)) foo { };
An equivalent that both gcc and sparse can handle is
struct foo { } __attribute__((whatever));
This is the only definition of this type in the tree, and fixing it is
easier than fixing sparse.
Signed-off-by: Bryan O'Sullivan <bos@serpentine.com>
[ Side note: fixing sparse wouldn't be hard, but the "attribute at the
end" version is the canonical one, and the one that makes sense. So
let's just fix the kernel instead. Luc Van Oostenryck already sent
out a sparse patch to the sparse mailing list in case anybody cares.
-- Linus ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- This contains the arch specific changes for the following the
kdump generic fixes which were already accepted in the upstream.
. Capturing CPU registers (for the case of 'panic' and invoking
the dump using 'sysrq-trigger') from a function (stack frame) which will
be not be available during the kdump boot. Hence, might result in
invalid stack trace.
. Dynamically allocating per cpu ELF notes section instead of
statically for NR_CPUS.
- Fix the compiler warning in prom_init.c.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In 2.6.15-git6 a change was commited in the oprofile support in
the powerpc architecture. It introduced the powerpc_oprofile_type
which contains the define G4. This causes a name clash with the
existing wacom usb tablet driver.
CC [M] drivers/usb/input/wacom.o
drivers/usb/input/wacom.c:98: error: conflicting types for `G4'
include/asm/cputable.h:37: error: previous declaration of `G4'
CC [M] drivers/usb/mon/mon_text.o
make[3]: *** [drivers/usb/input/wacom.o] Error 1
make[2]: *** [drivers/usb/input] Error 2
The elements of an enum declared in global scope are effectivly
global identifiers themselves. As such we need to ensure the names
are unique. This patch updates the later oprofile support to use
unique names.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The glibc folks want to use AT_PLATFORM to select between possible
alternative versions of shared libraries. This commit makes the kernel
supply an AT_PLATFORM string that indicates what class of processor
we are running on. Processors with the same set of user-level
instructions and roughly the same instruction scheduling characteristics
are given the same AT_PLATFORM value; for example, 821, 823 and 860
are all reported as "ppc823", and 7447, 7447A, 7448, 7450, 7451, 7455
are all called "ppc7450".
The intention is that the AT_PLATFORM values match the values that
gcc accepts for the -mcpu= option. For values which are numeric
(e.g. -mcpu=750), "ppc" has been prepended.
This also adds a PPC_FEATURE_BOOKE bit to the AT_HWCAP value and sets
it for the 440 family and the Freescale 85xx family.
Signed-off-by: Paul Mackerras <paulus@samba.org>
eieio is only a store - store ordering. When used to order an unlock
operation loads may leak out of the critical region. This is potentially
buggy, one example is if a user wants to atomically read a couple of
values.
We can solve this with an lwsync which orders everything except store - load.
I removed the (now unused) EIEIO_ON_SMP macros and the c versions
isync_on_smp and eieio_on_smp now we dont use them. I also removed some
old comments that were used to identify inline spinlocks in assembly,
they dont make sense now our locks are out of line.
Another interesting thing was that read_unlock was using an eieio even
though the rest of the spinlock code had already been converted to
use lwsync.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present the lppaca - the structure shared with the iSeries
hypervisor and phyp - is contained within the PACA, our own low-level
per-cpu structure. This doesn't have to be so, the patch below
removes it, making a separate array of lppaca structures.
This saves approximately 500*NR_CPUS bytes of image size and kernel
memory, because we don't need aligning gap between the Linux and
hypervisor portions of every PACA. On the other hand it means an
extra level of dereference in many accesses to the lppaca.
The patch also gets rid of several places where we assign the paca
address to a local variable for no particular reason.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch consolidates the variety of macros used for loading 32 or
64-bit constants in assembler (LOADADDR, LOADBASE, SET_REG_TO_*). The
idea is to make the set of macros consistent across 32 and 64 bit and
to make it more obvious which is the appropriate one to use in a given
situation. The new macros and their semantics are described in the
comments in ppc_asm.h.
In the process, we change several places that were unnecessarily using
immediate loads on ppc64 to use the GOT/TOC. Likewise we cleanup a
couple of places where we were clumsily subtracting PAGE_OFFSET with
asm instructions to use assemble-time arithmetic or the toreal() macro
instead.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add an of_find_property function that returns a struct property
given a property name. Then change the get_property function to
use that routine internally.
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add support for updating and removing device tree
properties. Since we hand out pointers to properties with gay
abandon, we can't just free the property storage. Instead we
move deleted, or the old copy of an updated property, to a
"dead properties" list.
Also note, its not feasable to kref device tree properties.
we call get_property() all over the kernel in a wild variety
of contexts.
One consequence of this change is that we now take a
read_lock(&devtree_lock) when doing get_property().
Signed-off-by: Dave Boutcher <sleddog@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
{get,put}_thread_info() were introduced in 2.5.4 and never
had been called by anything in the tree.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Ingo Molnar <mingo@elte.hu>
This is the latest version of the scheduler cache-hot-auto-tune patch.
The first problem was that detection time scaled with O(N^2), which is
unacceptable on larger SMP and NUMA systems. To solve this:
- I've added a 'domain distance' function, which is used to cache
measurement results. Each distance is only measured once. This means
that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
distances 0 and 1, and on SMP distance 0 is measured. The code walks
the domain tree to determine the distance, so it automatically follows
whatever hierarchy an architecture sets up. This cuts down on the boot
time significantly and removes the O(N^2) limit. The only assumption
is that migration costs can be expressed as a function of domain
distance - this covers the overwhelming majority of existing systems,
and is a good guess even for more assymetric systems.
[ People hacking systems that have assymetries that break this
assumption (e.g. different CPU speeds) should experiment a bit with
the cpu_distance() function. Adding a ->migration_distance factor to
the domain structure would be one possible solution - but lets first
see the problem systems, if they exist at all. Lets not overdesign. ]
Another problem was that only a single cache-size was used for measuring
the cost of migration, and most architectures didnt set that variable
up. Furthermore, a single cache-size does not fit NUMA hierarchies with
L3 caches and does not fit HT setups, where different CPUs will often
have different 'effective cache sizes'. To solve this problem:
- Instead of relying on a single cache-size provided by the platform and
sticking to it, the code now auto-detects the 'effective migration
cost' between two measured CPUs, via iterating through a wide range of
cachesizes. The code searches for the maximum migration cost, which
occurs when the working set of the test-workload falls just below the
'effective cache size'. I.e. real-life optimized search is done for
the maximum migration cost, between two real CPUs.
This, amongst other things, has the positive effect hat if e.g. two
CPUs share a L2/L3 cache, a different (and accurate) migration cost
will be found than between two CPUs on the same system that dont share
any caches.
(The reliable measurement of migration costs is tricky - see the source
for details.)
Furthermore i've added various boot-time options to override/tune
migration behavior.
Firstly, there's a blanket override for autodetection:
migration_cost=1000,2000,3000
will override the depth 0/1/2 values with 1msec/2msec/3msec values.
Secondly, there's a global factor that can be used to increase (or
decrease) the autodetected values:
migration_factor=120
will increase the autodetected values by 20%. This option is useful to
tune things in a workload-dependent way - e.g. if a workload is
cache-insensitive then CPU utilization can be maximized by specifying
migration_factor=0.
I've tested the autodetection code quite extensively on x86, on 3
P3/Xeon/2MB, and the autodetected values look pretty good:
Dual Celeron (128K L2 cache):
---------------------
migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
---------------------
[00] [01]
[00]: - 1.7(1)
[01]: 1.7(1) -
---------------------
cacheflush times [2]: 0.0 (0) 1.7 (1784008)
---------------------
Here the slow memory subsystem dominates system performance, and even
though caches are small, the migration cost is 1.7 msecs.
Dual HT P4 (512K L2 cache):
---------------------
migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
---------------------
[00] [01] [02] [03]
[00]: - 0.4(1) 0.0(0) 0.4(1)
[01]: 0.4(1) - 0.4(1) 0.0(0)
[02]: 0.0(0) 0.4(1) - 0.4(1)
[03]: 0.4(1) 0.0(0) 0.4(1) -
---------------------
cacheflush times [2]: 0.0 (33900) 0.4 (448514)
---------------------
Here it can be seen that there is no migration cost between two HT
siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
8-way P3/Xeon [2MB L2 cache]:
---------------------
migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
[00] [01] [02] [03] [04] [05] [06] [07]
[00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1)
[04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1)
[05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1)
[06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1)
[07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) -
---------------------
cacheflush times [2]: 0.0 (0) 19.2 (19281756)
---------------------
This one has huge caches and a relatively slow memory subsystem - so the
migration cost is 19 msecs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add per-arch sched_cacheflush() which is a write-back cacheflush used by
the migration-cost calibration code at bootup time.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pcibios_claim_one_bus is not needed on iSeries and phbs_remap_io can be
mode static.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There was a function declared for CONFIG_PSERIES which no longer exists
and the two function declarations for CONFIG_ISERIES have been moved
into an include file in platforms/iseries since they are defined and
used only there.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts part of "ppc64 iSeries: allow build with no PCI"
(145d01e428) which affected generic code
and applies a fix in the arch specific code.
Commit "partly merge iseries do_IRQ"
(5fee9b3b39eb55c7e3619a3b36ceeabffeb8f144) introduced iSeries_get_irq
which was only available if CONFIG_PCI is set.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Mainly just removing file names from the comments.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also does some comment cleanups and removal of unnecessary
variables.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Heikki Lindholm pointed out that there was a potential race with the
lazy CPU state (FP, VR, EVR) stuff if preempt is enabled. The race
is that in the process of restoring FP state on sigreturn, the task
gets preempted by a user task that wants to use the FPU. It will take
an FP unavailable exception, which will write the current FPU state
to the thread_struct, overwriting the values which sigreturn has
stored. Note that this can only happen on UP since we don't implement
lazy CPU state on SMP.
The fix is to flush the lazy CPU state before updating the
thread_struct. To do this we re-use the flush_lazy_cpu_state()
function from process.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
PPC32 is still using asm-generic/4level-fixup.h, but asm-powerpc/page.h
was defining pud_t and pgd_t. Depending on the order in which files
got included, this could result in a compilation error. Tweak the ifdef
so that page.h doesn't try to define pud_t on ppc32 (which uses 2-level
page tables).
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current ppc64 per cpu data implementation is quite slow. eg:
lhz 11,18(13) /* smp_processor_id() */
ld 9,.LC63-.LCTOC1(30) /* per_cpu__variable_name */
ld 8,.LC61-.LCTOC1(30) /* __per_cpu_offset */
sldi 11,11,3 /* form index into __per_cpu_offset */
mr 10,9
ldx 9,11,8 /* __per_cpu_offset[smp_processor_id()] */
ldx 0,10,9 /* load per cpu data */
5 loads for something that is supposed to be fast, pretty awful. One
reason for the large number of loads is that we have to synthesize 2
64bit constants (per_cpu__variable_name and __per_cpu_offset).
By putting __per_cpu_offset into the paca we can avoid the 2 loads
associated with it:
ld 11,56(13) /* paca->data_offset */
ld 9,.LC59-.LCTOC1(30) /* per_cpu__variable_name */
ldx 0,9,11 /* load per cpu data
Longer term we can should be able to do even better than 3 loads.
If per_cpu__variable_name wasnt a 64bit constant and paca->data_offset
was in a register we could cut it down to one load. A suggestion from
Rusty is to use gcc's __thread extension here. In order to do this we
would need to free up r13 (the __thread register and where the paca
currently is). So far Ive had a few unsuccessful attempts at doing that :)
The patch also allocates per cpu memory node local on NUMA machines.
This patch from Rusty has been sitting in my queue _forever_ but stalled
when I hit the compiler bug. Sorry about that.
Finally I also only allocate per cpu data for possible cpus, which comes
straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128)
and 4 possible cpus we see some nice gains:
total used free shared buffers cached
Mem: 4012228 212860 3799368 0 0 162424
total used free shared buffers cached
Mem: 4016200 212984 3803216 0 0 162424
A saving of 3.75MB. Quite nice for smaller machines. Note: we now have
to be careful of per cpu users that touch data for !possible cpus.
At this stage it might be worth making the NUMA and possible cpu
optimisations generic, but per cpu init is done so early we have to be
careful that all architectures have their possible map setup correctly.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This stops parport from accessing nonexistent parallel ports.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds Kconfig entries to control the early debugging options,
currently in setup_64.c.
Doing this via Kconfig rather than #defines means you can have one source tree,
which is buildable for multiple platforms - and you can enable the correct
early debug option for each platform via .config.
I made udbg_early_init() a static inline because otherwise GCC is to daft to
optimise it away when debugging is off.
Now that we have udbg_init_rtas() we can make call_rtas_display_status* static.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The following patch (against 2.6.15-rc5-mm3) fixes a kprobes build break
due to changes introduced in the kprobe locking in 2.6.15-rc5-mm3. In
addition, the patch reverts back the open-coding of kprobe_mutex.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently arch_remove_kprobes() is only implemented/required for x86_64 and
powerpc. All other architecture like IA64, i386 and sparc64 implementes a
dummy function which is being called from arch independent kprobes.c file.
This patch removes the dummy functions and replaces it with
#define arch_remove_kprobe(p, s) do { } while(0)
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The arch specific kprobes.h files never gets included when CONFIG_KPROBES is
turned off. Hence check for CONFIG_KPROBES is not appropriate here in this
arch specific kprobes.h files.
Also the below defined function kprobes_exception_notify() is not needed when
CONFIG_KPROBES is off.
Compile tested for both CONFIG_KPROBES=y and N.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Kernel/kprobes.c defines get_insn_slot() and free_insn_slot() which are
currently required _only_ for x86_64 and powerpc (which has no-exec support).
FYI, get{free}_insn_slot() functions manages the memory page which is mapped
as executable, required for instruction emulation.
This patch moves those two functions under __ARCH_WANT_KPROBES_INSN_SLOT and
defines __ARCH_WANT_KPROBES_INSN_SLOT in arch specific kprobes.h file.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
)
From: Vivek Goyal <vgoyal@in.ibm.com>
crash_setup_regs() is an architecture dependent function which is called in
architecture independent section. So every architecture supporting kexec
should at least provide a dummy definition of crash_setup_regs() even if
crash dumping is not implemented yet, to avoid build failures.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- In case of system crash, current state of cpu registers is saved in memory
in elf note format. So far memory for storing elf notes was being allocated
statically for NR_CPUS.
- This patch introduces dynamic allocation of memory for storing elf notes.
It uses alloc_percpu() interface. This should lead to better memory usage.
- Introduced based on Andi Kleen's and Eric W. Biederman's suggestions.
- This patch also moves memory allocation for elf notes from architecture
dependent portion to architecture independent portion. Now crash_notes is
architecture independent. The whole idea is that size of memory to be
allocated per cpu (MAX_NOTE_BYTES) can be architecture dependent and
allocation of this memory can be architecture independent.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The previous change by Kumar Gala in this area led to legacy_serial.c
and udbg_16550.c being built as modules when CONFIG_SERIAL_8250=m.
Fix this by introducing a new symbol, CONFIG_PPC_UDBG_16550, to
control whether these files get built, and arrange for it to be selected
for those platforms that need it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
241-eeh-save-bars-earlier.patch
Save the PCI device bars *before* any PCI probing is done.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 76c902b919098860f3d4e125f847abcc4cb1782a commit)
238-eeh-stop-if-reset_failed.patch
If the firmware is unable to reset the PCI slot for some reason, then
don't attempt any further recovery steps after that point. Instead,
mark the device as permanently failed.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)
234-eeh-find-pe.patch
The find_device_pe() routine is duplicated in two files. Remove one of
the two copies, declare the other extern.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)
26-eeh-partition-endpoint.patch
New versions of firmware introduce a new method by which the
"partitionable endpoint" (the point at which the pci bus is cut)
should be located. This code adds the support for this (mandatory)
new feature.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 9fcfb5d35b5294659f9299aa9cae6fd16325c07e commit)
25-pci-address-cache.patch
The core EEH file is rather large. This patch splits out a self-contained
chunk of it into its own file. This is the chunk that performes the
caching and lookup of pci devices based on the i/o addresses of thier
resoures. This code is almos architecture-independent and could be
used by any system that wanted to find a pci device based only on
the i/o address used by the device.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from b0b291d59906d4a9a89ed9e34d9fd684c7188924 commit)
Various PCI bus errors can be signaled by newer PCI controllers. The
core error recovery routines are architecture dependent. This patch adds
a recovery infrastructure for the PPC64 pSeries systems.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)
add the per-arch mutex.h files for the remaining architectures.
We default to asm-generic/mutex-dec.h, because that performs
quite well on most arches. Arches that do not have atomic
decrement/increment instructions should switch to mutex-xchg.h
instead. Arches can also provide their own implementation for
the mutex fastpath primitives.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add atomic_xchg() to all the architectures. Needed by the new mutex code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
My recent changes to oprofile broke it when built as a module. Fix it by
using an enum instead of a function pointer. This way we still retain
the oprofile configuration in the cputable.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is the platform function interpreter itself along with the backends
for UniN/U3/U4, mac-io, GPIOs and i2c. It adds the ability to execute
those do-platform-* scripts in the device-tree (at least for most
devices for which a backend is provided). This should replace the clock
spreading hacks properly. It might also have an impact on all sort of
machines since some of the scripts marked "at init" will now be executed
on boot (or some other on sleep/wakeup), those will possibly do things
that the kernel didn't do at all, like setting some values into some i2c
devices (changing thermal sensor calibration or conversion rate) etc...
Thus regression testing is MUCH welcome. Also loook for errors in dmesg.
That's also why I've left rather verbose debugging enabled in this
version of the patch.
(I do expect some Windtunnel G4s to show some errors as they have an i2c
clock chip on the PMU bus that uses some primitives that the i2c backend
doesn't implement yet. I really need users that have one of those
machine to come back to me so we can get that done right, though the
errors themselves should be harmless, I suspect the machine might not
run at full speed).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is the continuation of the previous patch. This one removes the old
PowerMac i2c drivers (i2c-keywest and i2c-pmac-smu) and replaces them
both with a single stub driver that uses the new PowerMac low i2c layer.
Now that i2c-keywest is gone, the low-i2c code is extended to support
interrupt driver transfers. All i2c busses now appear as platform
devices. Compatibility with existing drivers should be maintained as the
i2c bus names have been kept identical, except for the SMU bus but in
that later case, all users has been fixed.
With that patch added, matching a device node to an i2c_adapter becomes
trivial.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This is the first part of a rework of the PowerMac i2c code. It
completely reworks the "low_i2c" layer. It is now more flexible,
supports KeyWest, SMU and PMU i2c busses, and provides functions to
match device nodes to i2c busses and adapters.
This patch also extends & fix some bugs in the SMU driver related to i2c
support and removes the clock spreading hacks from the pmac feature code
rather than adapting them to the new API since they'll be replaced by
the platform function code completely in patch 3/5
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For far, all SPU triggered interrupts always end up on
the first SMT thread, which is a bad solution.
This patch implements setting the affinity to the
CPU that was running last when entering execution on
an SPU. This should result in a significant reduction
in IPI calls and better cache locality for SPE thread
specific data.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The size of the local store is architecture defined
and independent from the page size, so it should
not be defined in terms of pages in the first place.
This mistake broke a few places when building for
64kb pages.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In a hypervisor based setup, direct access to the first
priviledged register space can typically not be allowed
to the kernel and has to be implemented through hypervisor
calls.
As suggested by Masato Noguchi, let's abstract the register
access trough a number of function calls. Since there is
currently no public specification of actual hypervisor
calls to implement this, I only provide a place that
makes it easier to hook into.
Cc: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Cc: Geoff Levand <geoff.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
checking bits manually might not be synchonized with
the use of set_bit/clear_bit. Make sure we always use
the correct bitops by removing the unnecessary
identifiers.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch enables support for pause(0) power management state
for the Cell Broadband Processor, which is import for power efficient
operation. The pervasive infrastructure will in the future enable
us to introduce more functionality specific to the Cell's
pervasive unit.
From: Maximino Aguilar <maguilar@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Undefined symbols (eeh_add_device_tree_early and eeh_remove_bus_device)
when EEH is not enabled. This small patch will fix this.
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Added a common udbg_progress for use by ppc_md.progress()
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>