* 'core-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
plist: Fix grammar mistake, and c-style mistake
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kprobes: Add mcount to the kprobes blacklist
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86_64: Print modules like i386 does
* 'x86-doc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Put 'nopat' in kernel-parameters
* 'x86-gpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: Allow fbdev primary video code
* 'x86-rlimit-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Use helpers for rlimits
Now that both Xen and VMI disable allocations of PTE pages from high
memory this paravirt op serves no further purpose.
This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping
highpte pages".
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-3-git-send-email-ian.campbell@citrix.com>
Acked-by: Alok Kataria <akataria@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Remove the name field from the arch_hw_breakpoint. We never deal
with target symbols in the arch level, neither do we need to ever
store it. It's a legacy for the previous version of the x86
breakpoint backend.
Let's remove it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Enable NMI on all cpus in UV system and add an NMI handler
to dump_stack on each cpu.
By default on x86 all the cpus except the boot cpu have NMI
masked off. This patch enables NMI on all cpus in UV system
and adds an NMI handler to dump_stack on each cpu. This
way if a system hangs we can NMI the machine and get a
backtrace from all the cpus.
Version 2: Use x86_platform driver mechanism for nmi init, per
Ingo's suggestion.
Version 3: Clean up Ingo's nits.
Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20100226164912.GA24439@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Replace the #ifdef'ed OLPC-specific init functions by a conditional
x86_init function. If the function returns 0 we leave pci_arch_init,
otherwise we continue.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andres Salomon <dilinger@collabora.co.uk>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE89@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Added an abstraction function for arch specific init calls.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE84@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Introduce x86 arch-specific optimization code, which supports
both of x86-32 and x86-64.
This code also supports safety checking, which decodes whole of
a function in which probe is inserted, and checks following
conditions before optimization:
- The optimized instructions which will be replaced by a jump instruction
don't straddle the function boundary.
- There is no indirect jump instruction, because it will jumps into
the address range which is replaced by jump operand.
- There is no jump/loop instruction which jumps into the address range
which is replaced by jump operand.
- Don't optimize kprobes if it is in functions into which fixup code will
jumps.
This uses text_poke_multibyte() which doesn't support modifying
code on NMI/MCE handler. However, since kprobes itself doesn't
support NMI/MCE code probing, it's not a problem.
Changes in v9:
- Use *_text_reserved() for checking the probe can be optimized.
- Verify jump address range is in 2G range when preparing slot.
- Backup original code when switching optimized buffer, instead of
preparing buffer, because there can be int3 of other probes in
preparing phase.
- Check kprobe is disabled in arch_check_optimized_kprobe().
- Strictly check indirect jump opcodes (ff /4, ff /5).
Changes in v6:
- Split stop_machine-based jump patching code.
- Update comments and coding style.
Changes in v5:
- Introduce stop_machine-based jump replacing.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133446.6725.78994.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add generic text_poke_smp for SMP which uses stop_machine()
to synchronize modifying code.
This stop_machine() method is officially described at "7.1.3
Handling Self- and Cross-Modifying Code" on the intel's
software developer's manual 3A.
Since stop_machine() can't protect code against NMI/MCE, this
function can not modify those handlers. And also, this function
is basically for modifying multibyte-single-instruction. For
modifying multibyte-multi-instructions, we need another special
trap & detour code.
This code originaly comes from immediate values with
stop_machine() version. Thanks Jason and Mathieu!
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133438.6725.80273.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
enable CONFIG_HIGHPTE for any x86 configuration which has highmem
enabled. This means that the overhead applies even to machines which
have a fairly modest amount of high memory and which therefore do not
really benefit from allocating PTEs in high memory but still pay the
price of the additional mapping operations.
Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
no actual highptes being allocated there was a reduction in system
time used from 59.737s to 55.9s.
With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 175.396 (0.238914)
User Time 515.983 (5.85019)
System Time 59.737 (1.26727)
Percent CPU 263.8 (71.6796)
Context Switches 39989.7 (4672.64)
Sleeps 42617.7 (246.307)
With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 174.278 (0.831968)
User Time 515.659 (6.07012)
System Time 55.9 (1.07799)
Percent CPU 263.8 (71.266)
Context Switches 39929.6 (4485.13)
Sleeps 42583.7 (373.039)
This patch allows the user to control the allocation of PTEs in
highmem from the command line ("userpte=nohigh") but retains the
status-quo as the default.
It is possible that some simple heuristic could be developed which
allows auto-tuning of this option however I don't have a sufficiently
large machine available to me to perform any particularly meaningful
experiments. We could probably handwave up an argument for a threshold
at 16G of total RAM.
Assuming 768M of lowmem we have 196608 potential lowmem PTE
pages. Each page can map 2M of RAM in a PAE-enabled configuration,
meaning a maximum of 384G of RAM could potentially be mapped using
lowmem PTEs.
Even allowing generous factor of 10 to account for other required
lowmem allocations, generous slop to account for page sharing (which
reduces the total amount of RAM mappable by a given number of PT
pages) and other innacuracies in the estimations it would seem that
even a 32G machine would not have a particularly pressing need for
highmem PTEs. I think 32G could be considered to be at the upper bound
of what might be sensible on a 32 bit machine (although I think in
practice 64G is still supported).
It's seems questionable if HIGHPTE is even a win for any amount of RAM
you would sensibly run a 32 bit kernel on rather than going 64 bit.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
nr_legacy_irqs and its ilk have moved to legacy_pic.
-v2: there is one in ioapic_.c
Singed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B84AAC4.2020204@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Moorestown platform does not have PIT or HPET platform timers. Instead it
has a bank of eight APB timers. The number of available timers to the os
is exposed via SFI mtmr tables. All APB timer interrupts are routed via
ioapic rtes and delivered as MSI.
Currently, we use timer 0 and 1 for per cpu clockevent devices, timer 2
for clocksource.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D2@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
vRTC information is obtained from SFI tables on Moorestown, this patch parses
these tables and assign the information.
Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0D@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Moorestown platform timer information is obtained from SFI FW tables.
This patch parses SFI table then assign the irq information to mp_irqs.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0B@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch added Moorestown platform specific PCI init functions.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0A@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Some ioapic extern functions are used when CONFIG_X86_IO_APIC is not
defined. We need the dummy functions to avoid a compile time error.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318DA07@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Moorestown platform needs apic ready early for the system timer irq
which is delievered via ioapic. Should not impact other platforms.
In the longer term, once ioapic setup is moved before late time init,
we will not need this patch to do early apic enabling.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D07@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The main benefit of using ACPI host bridge window information is that
we can do better resource allocation in systems with multiple host bridges,
e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183
Sometimes we need _CRS information even if we only have one host bridge,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681
Most of these systems are relatively new, so this patch turns on
"pci=use_crs" only on machines with a BIOS date of 2008 or newer.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Merge reason:
Conflicts in arch/x86/kernel/apic/io_apic.c
Resolved Conflicts:
arch/x86/kernel/apic/io_apic.c
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Merge reason: conflict in arch/x86/kernel/apic/io_apic.c
Resolved Conflicts:
arch/x86/kernel/apic/io_apic.c
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies. We do this via make_coherent() by making the pages
uncacheable.
This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().
Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():
On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
to construct a pointer to the pte again. Passing a pte_t * is much
more elegant. Maybe we might even replace the pte argument with the
pte_t?
Ben Herrenschmidt would also like the pte pointer for PowerPC:
Passing the ptep in there is exactly what I want. I want that
-instead- of the PTE value, because I have issue on some ppc cases,
for I$/D$ coherency, where set_pte_at() may decide to mask out the
_PAGE_EXEC.
So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.
Includes a fix from Stephen Rothwell:
sparc: fix fallout from update_mmu_cache API change
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The ioapic_disable_legacy() call is no longer needed for platforms do
not have legacy pic. the legacy pic abstraction has taken care it
automatically.
This patch also initialize irq-related static variables based on
information obtained from legacy_pic.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A30A7660@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch replaces legacy PIC-related global variable and functions
with the new legacy_pic abstraction.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D04@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch makes i8259A like legacy programmable interrupt controller
code into a driver so that legacy pic functions can be selected at
runtime based on platform information, such as HW subarchitecure ID.
Default structure of legacy_pic maintains the current code path for
x86pc.
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D03@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Platforms like Moorestown want to override the pcibios_fixup_irqs
default function. Add it to x86_init.pci.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D00@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Moorestown wants to reuse pcibios_init_irq but needs to provide its
own implementation of pci_enable_irq. After we distangled the init we
can move the init_irq call to x86_init and remove the pci_enable_irq
!= NULL check in pcibios_init_irq. pci_enable_irq is compile time
initialized to pirq_enable_irq and the special cases which override it
(visws and acpi) set the x86_init function pointer to noop. That
allows MSRT to override pci_enable_irq and otherwise run
pcibios_init_irq unmodified.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFF@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The PCI initialization in pci_subsys_init() is a mess. pci_numaq_init,
pci_acpi_init, pci_visws_init and pci_legacy_init are called and each
implementation checks and eventually modifies the global variable
pcibios_scanned.
x86_init functions allow us to do this more elegant. The pci.init
function pointer is preset to pci_legacy_init. numaq, acpi and visws
can modify the pointer in their early setup functions. The functions
return 0 when they did the full initialization including bus scan. A
non zero return value indicates that pci_legacy_init needs to be
called either because the selected function failed or wants the
generic bus scan in pci_legacy_init to happen (e.g. visws).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFE@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When the user enables breakpoints through dr7, he can choose
between "local" or "global" enable bits but given how linux is
implemented, both have the same effect.
That said we don't keep track how the user enabled the breakpoints
so when the user requests the dr7 value, we only translate the
"enabled" status using the global enabled bits. It means that if
the user enabled a breakpoint using the local enabled bit, reading
back dr7 will set the global bit and clear the local one.
Apps like Wine expect a full dr7 POKEUSER/PEEKUSER match for emulated
softwares that implement old reverse engineering protection schemes.
We fix that by keeping track of the whole dr7 value given by the user
in the thread structure to drop this bug. We'll think about
something more proper later.
This fixes a 2.6.32 - 2.6.33-x ptrace regression.
Reported-and-tested-by: Michael Stefaniuc <mstefani@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Maneesh Soni <maneesh@linux.vnet.ibm.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
x86/mm is on 32-rc4 and missing the spinlock namespace changes which
are needed for further commits into this topic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Most implementations of arch_syscall_addr() are the same, so create a
default version in common code and move the one piece that differs (the
syscall table) to asm/syscall.h. New arch ports don't have to waste
time copying & pasting this simple function.
The s390/sparc versions need to be different, so document why.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1264498803-17278-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
This makes the range reservation feature available to other
architectures.
-v2: add get_max_mapped, max_pfn_mapped only defined in x86...
to fix PPC compiling
-v3: according to hpa, add CONFIG_HAVE_EARLY_RES
-v4: fix typo about EARLY_RES in config
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B7B5723.4070009@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
For some reason the 64-bit tree was doing this differently and
I can't see why it would need to.
This correct behaviour when you have two GPUs plugged in and
32-bit put the console in one place and 64-bit in another.
Signed-off-by: Dave Airlie <airlied@redhat.com>
LKML-Reference: <1262847894-27498-1-git-send-email-airlied@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The 64-bit version of ELF_PLAT_INIT() clears TIF_IA32, but at this point
it has already been cleared by SET_PERSONALITY == set_personality_64bit.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The atomic ops emulation for 32bit legacy CPUs floods the tracer with
irq off/on entries. The irq disabled regions are short and therefor
not interesting when chasing long irq disabled latencies. Mark them
raw and keep them out of the trace.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
numa=fake=N specifies the number of fake nodes, N, to partition the
system into and then allocates them by interleaving over physical nodes.
This requires knowledge of the system capacity when attempting to
allocate nodes of a certain size: either very large nodes to benchmark
scalability of code that operates on individual nodes, or very small
nodes to find bugs in the VM.
This patch introduces numa=fake=<size>[MG] so it is possible to specify
the size of each node to allocate. When used, nodes of the size
specified will be allocated and interleaved over the set of physical
nodes.
FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
include/asm/numa_64.h.
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch adds code to cpu initialization path to detect
the extended virtualization features of AMD cpus to show
them in /proc/cpuinfo.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <1260792521-15212-1-git-send-email-joerg.roedel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The Intel Architecture Optimization Reference Manual states that a short
load that follows a long store to the same object will suffer a store
forwading penalty, particularly if the two accesses use different addresses.
Trivially, a long load that follows a short store will also suffer a penalty.
__downgrade_write() in rwsem incurs both penalties: the increment operation
will not be able to reuse a recently-loaded rwsem value, and its result will
not be reused by any recently-following rwsem operation.
A comment in the code states that this is because 64-bit immediates are
special and expensive; but while they are slightly special (only a single
instruction allows them), they aren't expensive: a test shows that two loops,
one loading a 32-bit immediate and one loading a 64-bit immediate, both take
1.5 cycles per iteration.
Fix this by changing __downgrade_write to use the same add instruction on
i386 and on x86_64, so that it uses the same operand size as all the other
rwsem functions.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... so we can move early_res up.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-27-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... in preparation of moving early_res to kernel/.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-26-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Makes early_res.c more clean, so later could move it to /kernel.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-23-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
... to make e820.c smaller.
-v2: fix 32bit compiling with MAX_DMA32_PFN
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-21-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Finally we can use early_res to replace bootmem for x86_64 now.
Still can use CONFIG_NO_BOOTMEM to enable it or not.
-v2: fix 32bit compiling about MAX_DMA32_PFN
-v3: folded bug fix from LKML message below
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B747239.4070907@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add the xstate regset support which helps extend the kernel ptrace and the
core-dump interfaces to support AVX state etc.
This regset interface is designed to support all the future state that gets
supported using xsave/xrstor infrastructure.
Looking at the memory layout saved by "xsave", one can't say which state
is represented in the memory layout. This is because if a particular state is
in init state, in the xsave hdr it can be represented by bit '0'. And hence
we can't really say by the xsave header wether a state is in init state or
the state is not saved in the memory layout.
And hence the xsave memory layout available through this regset
interface uses SW usable bytes [464..511] to convey what state is represented
in the memory layout.
First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate
mask(which is same as the 64bit mask returned by the xgetbv's xCR0).
The note NT_X86_XSTATE represents the extended state information in the
core file, using the above mentioned memory layout.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100211195614.802495327@sbs-t61.sc.intel.com>
Signed-off-by: Hongjiu Lu <hjl.tools@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
64bit NUMA already make enough space under 4G with new early_node_mem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-16-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Thomas Renninger <trenn@suse.de> reported on IBM x3330
booting a latest kernel on this machine results in:
PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1
PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0
ACPI: SCI (IRQ30) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161)
ACPI: Unable to start the ACPI Interpreter
Later all kind of devices fail...
and bisect it down to this commit:
commit b9c61b7007
x86/pci: update pirq_enable_irq() to setup io apic routing
it turns out we need to set irq routing for the sci on ioapic1 early.
-v2: make it work without sparseirq too.
-v3: fix checkpatch.pl warning, and cc to stable
Reported-by: Thomas Renninger <trenn@suse.de>
Bisected-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-2-git-send-email-yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Both x86-32 and x86-64 with 32-bit compat use ARCH_DLINFO_IA32,
which defines two saved_auxv entries. But system.h only defines
AT_VECTOR_SIZE_ARCH as 2 for CONFIG_IA32_EMULATION, not for
CONFIG_X86_32. Fix that.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
LKML-Reference: <20100209023502.GA15408@us.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add function to direct Legacy VGA I/O traffic to correct I/O Hub.
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <201002022238.o12McEbi018727@imap1.linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
io_32.h and io_64.h are now identical. Merge them into io.h.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-8-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Always make it an inline instead of using a macro for the no-op case.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Iomem has no special significance on x86. Use the standard mem*
functions instead of trying to call other versions. Some fixups
are needed to match the function prototypes.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Copied from io_32.h.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Make it more similar to io_32.h. No real code changes.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
These were leftover from the numaq support that was removed in commit
1fba38703d.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The XQUAD stuff is part of the NUMAQ architecture, so move it there.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Introducing *_text_reserved functions for checking the text
address range is partially reserved or not. This patch provides
checking routines for x86 smp alternatives and dynamic ftrace.
Since both functions modify fixed pieces of kernel text, they
should reserve and protect those from other dynamic text
modifier, like kprobes.
This will also be extended when introducing other subsystems
which modify fixed pieces of kernel text. Dynamic text modifiers
should avoid those.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100202214911.4694.16587.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The generic resource based page_is_ram() works better with memory
hotplug/hotremove. So switch the x86 e820map based code to it.
CC: Andi Kleen <andi@firstfloor.org>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
LKML-Reference: <20100122033004.470767217@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/agp: Fix agp_amd64_init regression
x86: Add quirk for Intel DG45FC board to avoid low memory corruption
x86: Add Dell OptiPlex 760 reboot quirk
x86, UV: Fix RTC latency bug by reading replicated cachelines
oprofile/x86: add Xeon 7500 series support
oprofile/x86: fix crash when profiling more than 28 events
lib/dma-debug.c: mark file-local struct symbol static.
x86/amd-iommu: Fix deassignment of a device from the pt_domain
x86/amd-iommu: Fix IOMMU-API initialization for iommu=pt
x86/amd-iommu: Fix NULL pointer dereference in __detach_device()
x86/amd-iommu: Fix possible integer overflow
Now that the previous commit made it possible to do the personality
setting at the point of no return, we do just that for ELF binaries.
And suddenly all the reasons for that insane TIF_ABI_PENDING bit go
away, and we can just make SET_PERSONALITY() just do the obvious thing
for a 32-bit compat process.
Everything becomes much more straightforward this way.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since constraints are specified on the event number, not number
and unit mask shorten the constraint masks so that we'll
actually match something.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <20100127221121.967610372@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch improves event scheduling by maximizing the use of PMU
registers regardless of the order in which events are created in a group.
The algorithm takes into account the list of counter constraints for each
event. It assigns events to counters from the most constrained, i.e.,
works on only one counter, to the least constrained, i.e., works on any
counter.
Intel Fixed counter events and the BTS special event are also handled via
this algorithm which is designed to be fairly generic.
The patch also updates the validation of an event to use the scheduling
algorithm. This will cause early failure in perf_event_open().
The 2nd version of this patch follows the model used by PPC, by running
the scheduling algorithm and the actual assignment separately. Actual
assignment takes place in hw_perf_enable() whereas scheduling is
implemented in hw_perf_group_sched_in() and x86_pmu_enable().
Signed-off-by: Stephane Eranian <eranian@google.com>
[ fixup whitespace and style nits as well as adding is_x86_event() ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4b5430c6.0f975e0a.1bf9.ffff85fe@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Clear the reserved bits from the stored copy of debug status
register (DR6).
This will help easy bitwise operations such as quick testing
of a debug event origin.
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20100128111401.GB13935@in.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
CONFIG_X86_CPU_DEBUG, which provides some parsed versions of the x86
CPU configuration via debugfs, has caused boot failures on real
hardware. The value of this feature has been marginal at best, as all
this information is already available to userspace via generic
interfaces.
Causes crashes that have not been fixed + minimal utility -> remove.
See the referenced LKML thread for more information.
Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <alpine.LFD.2.00.1001221755320.13231@localhost.localdomain>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@kernel.org>
Commit d1c84f79a6
leads to a regression when microcode_amd.c is compiled into the kernel.
It causes a big boot delay because the firmware is not available.
See http://marc.info/?l=linux-kernel&m=126267290920060
It also renders the reload sysfs attribute useless.
Fixing this is too intrusive for an -rc5 kernel.
Thus I'd like to restore the microcode loading behaviour of kernel
2.6.32.
CC: Gene Heskett <gene.heskett@verizon.net>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100122203456.GB13792@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
side-effects on floppy DMA. Do not use HPET MSI on such platforms.
Original problem report from Mark Hounschell
http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html
[ This patch needs to go to stable as well. But, there are some
conflicts that prevents the patch from going as is. I can
rebase/resubmit to stable once the patch goes upstream.
hpa: still Cc:'ing stable@ as an FYI. ]
Tested-by: Mark Hounschell <markh@compro.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add wbinvd_on_cpu and wbinvd_on_all_cpus stubs for executing wbinvd on a
particular CPU.
[ hpa: renamed lib/smp.c to lib/cache-smp.c ]
[ hpa: wbinvd_on_all_cpus() returns int, but wbinvd() returns
void. Thus, the former cannot be a macro for the latter,
replace with an inline function. ]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1264172467-25155-2-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch moves the initialization of the iommu-api out of
the dma-ops initialization code. This ensures that the
iommu-api is initialized even with iommu=pt.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Propagate the ANY bit into the fixed counter config for v3 and higher.
Signed-off-by: Stephane Eranian <eranian@google.com>
[a.p.zijlstra@chello.nl: split from larger patch]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4b5430c6.0f975e0a.1bf9.ffff85fe@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently IRQ0..IRQ15 are assigned to IRQ0_VECTOR..IRQ15_VECTOR's on
all the cpu's.
If these IRQ's are handled by legacy pic controller, then the kernel
handles them only on cpu 0. So there is no need to block this vector
space on all cpu's.
Similarly if these IRQ's are handled by IO-APIC, then the IRQ affinity
will determine on which cpu's we need allocate the vector resource for
that particular IRQ. This can be done dynamically and here also there
is no need to block 16 vectors for IRQ0..IRQ15 on all cpu's.
Fix this by initially assigning IRQ0..IRQ15 to IRQ0_VECTOR..IRQ15_VECTOR's only
on cpu 0. If the legacy controllers like pic handles these irq's, then
this configuration will be fixed. If more modern controllers like IO-APIC
handle these IRQ's, then we start with this configuration and as IRQ's
migrate, vectors (/and cpu's) associated with these IRQ's change dynamically.
This will freeup the block of 16 vectors on other cpu's which don't handle
IRQ0..IRQ15, which can now be used for other IRQ's that the particular cpu
handle.
[ hpa: this also an architectural cleanup for future legacy-PIC-free
configurations. ]
[ hpa: fixed typo NR_LEGACY_IRQS -> NR_IRQS_LEGACY ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1263932453.2814.52.camel@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
For x86-64, 32767 threads really is not enough. Change rwsem_count_t
to a signed long, so that it is 64 bits on x86-64.
This required the following changes to the assembly code:
a) %z0 doesn't work on all versions of gcc! At least gcc 4.4.2 as
shipped with Fedora 12 emits "ll" not "q" for 64 bits, even for
integer operands. Newer gccs apparently do this correctly, but
avoid this problem by using the _ASM_ macros instead of %z.
b) 64 bits immediates are only allowed in "movq $imm,%reg"
constructs... no others. Change some of the constraints to "e",
and fix the one case where we would have had to use an invalid
immediate -- in that case, we only care about the upper half
anyway, so just access the upper half.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <tip-bafaecd11df15ad5b1e598adc7736afcd38ee13d@git.kernel.org>
After talking to some more folks inside intel (Peter Anvin, Asit Mallick),
the safest option (for future compatibility etc) seen was to use vector 0x20
for IRQ_MOVE_CLEANUP_VECTOR instead of using vector 0x1f (which is documented as
reserved vector in the Intel IA32 manuals).
Also we don't need to reserve the entire privilege level (all 16 vectors in
the priority bucket that IRQ_MOVE_CLEANUP_VECTOR falls into), as the
x86 architecture (section 10.9.3 in SDM Vol3a) specifies that with in the
priority level, the higher the vector number the higher the priority.
And hence we don't need to reserve the complete priority level 0x20-0x2f for
the IRQ migration cleanup logic.
So change the IRQ_MOVE_CLEANUP_VECTOR to 0x20 and allow 0x21-0x2f to be used
for device interrupts. 0x30-0x3f will be used for ISA interrupts (these
also can be migrated in the context of IOAPIC and hence need to be at a higher
priority level than IRQ_MOVE_CLEANUP_VECTOR).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100114002118.521826763@sbs-t61.sc.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, uv: Ensure hub revision set for all ACPI modes.
x86, uv: Add function retrieving node controller revision number
x86: xen: 64-bit kernel RPL should be 0
x86: kernel_thread() -- initialize SS to a known state
x86/agp: Fix agp_amd64_init and agp_amd64_cleanup
x86: SGI UV: Fix mapping of MMIO registers
x86: mce.h: Fix warning in header checks
Add function for determining the revision id of the SGI UV
node controller chip (HUB). This function is needed in a
subsequent patch.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100112210904.GA24546@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The fast version of the rwsems (the code that uses xadd) has
traditionally only worked on x86-32, and as a result it mixes different
kinds of types wildly - they just all happen to be 32-bit. We have
"long", we have "__s32", and we have "int".
To make it work on x86-64, the types suddenly matter a lot more. It can
be either a 32-bit or 64-bit signed type, and both work (with the caveat
that a 32-bit counter will only have 15 bits of effective write
counters, so it's limited to 32767 users). But whatever type you
choose, it needs to be used consistently.
This makes a new 'rwsem_counter_t', that is a 32-bit signed type. For a
64-bit type, you'd need to also update the BIAS values.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121755220.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Someone isn't reading their build output: Move the definition
out of the exported header.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: linux-kernel@vger.kernelorg
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Because of dropping function argument syntax from kprobe-tracer,
we don't need this API anymore.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@ozlabs.org
LKML-Reference: <20100105224656.19431.92588.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The check that ignores the debug and nmi stack frames is useless
now that we have a frame pointer that makes us start at the
right place. We don't anymore have to deal with these.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1262235183-5320-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes gcc use the right register names and instruction operand sizes
automatically for the rwsem inline asm statements.
So instead of using "(%%eax)" to specify the memory address that is the
semaphore, we use "(%1)" or similar. And instead of forcing the operation
to always be 32-bit, we use "%z0", taking the size from the actual
semaphore data structure itself.
This doesn't actually matter on x86-32, but if we want to use the same
inline asm for x86-64, we'll need to have the compiler generate the proper
64-bit names for the registers (%rax instead of %eax), and if we want to
use a 64-bit counter too (in order to avoid the 15-bit limit on the
write counter that limits concurrent users to 32767 threads), we'll need
to be able to generate instructions with "q" accesses rather than "l".
Since this header currently isn't enabled on x86-64, none of that matters,
but we do want to use the xadd version of the semaphores rather than have
to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
when you have lots of threads all taking page faults, and the fallback
rwsem code that uses a spinlock performs abysmally badly in that case.
[ hpa: modified the patch to skip size suffixes entirely when they are
redundant due to register operands. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86, irq: Check move_in_progress before freeing the vector mapping
x86: copy_from_user() should not return -EFAULT
Revert "x86: Side-step lguest problem by only building cmpxchg8b_emu for pre-Pentium"
x86/pci: Intel ioh bus num reg accessing fix
x86: Fix size for ex trampoline with 32bit
Fix bug in uv_global_gru_mmr_address macro. Macro failed
to cast an int value to a long prior to a left shift > 32.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100107161240.GA2610@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Merge the now identical code from asm/atomic_32.h and asm/atomic_64.h
into asm/atomic.h.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Prepare for merging into asm/atomic.h.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Split atomic64_t functions out into separate headers, since they will
not be practical to merge between 32 and 64 bits.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Callers of copy_from_user() expect it to return the number of bytes
it could not copy. In no case it is supposed to return -EFAULT.
In case of a detected buffer overflow just return the requested
length. In addition one could think of a memset that would clear
the size of the target object.
[ hpa: code is not in .32 so not needed for -stable ]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20100105131911.GC5480@osiris.boeblingen.de.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Optimize code generated for percpu access by checking for increment and
decrements.
tj: fix incorrect usage of __builtin_constant_p() and restructure
percpu_add_op() macro.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
We want to use a vector-assignment sequence that avoids stumbling onto
0x80 earlier in the sequence, in order to improve the spread of
vectors across priority levels on machines with a small number of
interrupt sources. Right now, this is done by simply making the first
vector (0x31 or 0x41) completely unusable. This is unnecessary; all
we need is to start assignment at a +1 offset, we don't actually need
to prohibit the usage of this vector once we have wrapped around.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4B426550.6000209@kernel.org>
Reclaim 16 IDT vectors and make them available for general allocation.
Reclaim vectors 0x20-0x2f by reallocating the IRQ_MOVE_CLEANUP_VECTOR
to vector 0x1f. This is in the range of vector numbers that is
officially reserved for the CPU (for exceptions), however, the use of
the APIC to generate any vector 0x10 or above is documented, and the
CPU internally can receive any vector number (the legacy BIOS uses INT
0x08-0x0f for interrupts, as messed up as that is.)
Since IRQ_MOVE_CLEANUP_VECTOR has to be alone in the lowest-numbered
priority level (block of 16), this effectively enables us to reclaim
an otherwise-unusable APIC priority level and put it to use.
Since this is a transient kernel-only allocation we can change it at
any time, and if/when there is an exception at vector 0x1f this
assignment needs to be changed as part of OS enabling that new feature.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B4284C6.9030107@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/agp: Fix agp_amd64_init() initialization with CONFIG_GART_IOMMU enabled
x86: SGI UV: Fix writes to led registers on remote uv hubs
x86, kmemcheck: Use KERN_WARNING for error reporting
x86: Use KERN_DEFAULT log-level in __show_regs()
x86, compress: Force i386 instructions for the decompressor
x86/amd-iommu: Fix initialization failure panic
dma-debug: Do not add notifier when dma debugging is disabled.
x86: Fix objdump version check in chkobjdump.awk for different formats.
Trivial conflicts in arch/x86/include/asm/uv/uv_hub.h due to me having
applied an earlier version of an SGI UV fix.