Impact: bogus irq_cpustat field removed
idle_timestamp is left over from the removed irqbalance code.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
pte_flags() was introduced as a new pvop in order to extract just the
flags portion of a pte, which is a potentially cheaper operation than
extracting the page number as well. It turns out this operation is
not needed, because simply using a mask to extract the flags from a
pte is sufficient for all current users.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Fix debugobjects warning
debugobject enabled kernels spit out a warning in hpet code due to a
workqueue which is initialized on stack.
Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it
in hpet.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: Fixes crashes with misconfigured BIOSes on XSAVE hardware
Avuton Olrich reported early boot crashes with v2.6.28 and
bisected it down to dc1e35c6e9
("x86, xsave: enable xsave/xrstor on cpus with xsave support").
If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to
make all CPUID information available. This is required for some
features to work, in particular XSAVE.
Reported-and-bisected-by: Avuton Olrich <avuton@gmail.com>
Tested-by: Avuton Olrich <avuton@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Make X86 SGI Ultraviolet support configurable. Saves about 13K of text size
on my modest config.
text data bss dec hex filename
6770537 1158680 694356 8623573 8395d5 vmlinux
6757492 1157664 694228 8609384 835e68 vmlinux.nouv
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
while looking at:
http://bugzilla.kernel.org/show_bug.cgi?id=11541
I realized that the mtrr.show param cannot work, because
the code is processed much too early.
This patch:
- Declares mtrr.show as early_param
- Stays consistent with the previous param (which I doubt
that it ever worked), so mtrr.show=1 would still work
- Declares mtrr_show as initdata
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Now that it's unified, move the (SMP) TLB flushing code from arch/x86/kernel/
to arch/x86/mm/, where it belongs logically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/x86/kernel/tlb_32.c
Merge it here because both the cpumask changes and the ongoing percpu
work is touching the TLB code. The percpu changes take precedence, as
they eliminate tlb_32.c altogether.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 4217458daf.
Justin Madru bisected this commit, it was causing weird Firefox
crashes.
The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
the calling frame, which corrupts the syscall return pt_regs state and
thus corrupts user-space register state.
So we go back to the slightly less clean but more optimization-safe
method of getting to pt_regs. Also add a comment to explain this.
Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505
Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
Tested-by: Justin Madru <jdm64@gawab.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: less contention when issuing invalidate IPI, cleanup
Make x86_32 use the same tlb code as 64bit. The 64bit code uses
multiple IPI vectors for tlb shootdown to reduce contention. This
patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
code paths.
Note that the usage of asmlinkage is inconsistent for x86_32 and 64
and calls for further cleanup. This has been noted with a FIXME
comment in tlb_64.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: clean up, ipi vector number reordering for x86_32
Make the following changes to prepare for tlb merge.
* reorder x86_32 ip vectors
* adjust tlb_32.c and tlb_64.c such that their logics coincide exactly
- on spurious invalidate ipi, tlb_32 acks the irq
- tlb_64 now has proper memory barriers around clearing
flush_cpumask (no change in generated code)
* unexport flush_tlb_page from tlb_32.c, there's no user
* use unsigned int for cpu id
* drop unnecessary includes from tlb_64.c
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following uv related cleanups.
* collect visible uv related definitions and interfaces into uv/uv.h
and use it. this cleans up the messy situation where on 64bit, uv
is defined properly, on 32bit generic it's dummy and on the rest
undefined. after this clean up, uv is defined on 64 and dummy on
32.
* update uv_flush_tlb_others() such that it takes cpumask of
to-be-flushed cpus as argument, instead of that minus self, and
returns yet-to-be-flushed cpumask, instead of modifying the passed
in parameter. this interface change will ease dummy implementation
of uv_flush_tlb_others() and makes uv tlb flush related stuff
defined in tlb_uv proper.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup, better irq_regs code generation for x86_64
Make 64-bit use the same optimizations as 32-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
%fs is currently set to __KERNEL_DS at boot, and conditionally
switched to __KERNEL_PERCPU for secondary cpus. Instead, initialize
GDT_ENTRY_PERCPU to the same attributes as GDT_ENTRY_KERNEL_DS and
set %fs to __KERNEL_PERCPU unconditionally.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup && more compact percpu area layout with future changes
Move 64-bit GDT to page-aligned section and clean up comment
formatting.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Remove byte locks implementation, which was introduced by Jeremy in
8efcbab6 ("paravirt: introduce a "lock-byte" spinlock implementation"),
but turned out to be dead code that is not used by any in-kernel
virtualization guest (Xen uses its own variant of spinlocks implementation
and KVM is not planning to move to byte locks).
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: x86_64 percpu area layout change, irq_stack now at the beginning
Now that the PDA is empty except for the stack canary, it can be removed.
The irqstack is moved to the start of the per-cpu section. If the stack
protector is enabled, the canary overlaps the bottom 48 bytes of the irqstack.
tj: * updated subject
* dropped asm relocation of irq_stack_ptr
* updated comments a bit
* rebased on top of stack canary changes
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Use cpu_number to determine if the adjustment is necessary.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Copy the code to cpu_init() to satisfy the requirement that the cpu
be reinitialized. Remove all other calls, since the segments are
already initialized in head_64.S.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Make the following cleanups.
* remove duplicate comment from boot_init_stack_canary() which fits
better in the other place - cpu_idle().
* move stack_canary offset check from __switch_to() to
boot_init_stack_canary().
Signed-off-by: Tejun Heo <tj@kernel.org>
-tip testing found this crash:
> [ 35.258515] calling acpi_cpufreq_init+0x0/0x127 @ 1
> [ 35.264127] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73
> [ 35.267554] PGD 0
> [ 35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no
allocation of the variable mask, so we pass in an uninitialized cmd.mask
field to drv_read(), which then passes it to the scheduler which then
crashes ...
Switch it over to the much simpler constant-cpumask-pointers approach.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use new work_on_cpu function to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().
Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.
Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
that the work_on_cpu() function is more stable.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Dieter Ries <clip2@gmx.de>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since commit 75c46fa, "x64, x2apic/intr-remap: MSI and MSI-X
support for interrupt remapping infrastructure", x86 has had an
implementation of arch_setup_msi_irqs().
That implementation does not call arch_setup_msi_irq(), instead it calls
setup_irq(). No other x86 code calls arch_setup_msi_irq().
That leaves only arch_setup_msi_irqs() in drivers/pci/msi.c, but that
routine is overridden by the x86 version of arch_setup_msi_irqs().
So arch_setup_msi_irq() is dead code, remove it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The function setup_cpu_local_masks() has been marked __init, in
order to remove the following section mismatch messages:
WARNING: vmlinux.o(.text+0x3c2c7): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2d3): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2df): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
WARNING: vmlinux.o(.text+0x3c2eb): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
The function setup_cpu_local_masks() references
the function __init alloc_bootmem_cpumask_var().
This is often because setup_cpu_local_masks lacks a __init
annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add debug warning
Fire off one message if two apic's discovered with different
apic versions. (this code is only called during CPU init)
The goal of this is to pave the way of the removal of the apic_version[]
array. We dont expect any apic version incompatibilities in the x86
landscape of systems [if so we dont handle them very well and probably
never will handle deep apic version assymetries well], but it's prudent
to have a debug check for one kernel cycle nevertheless.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
tj: * in asm-offsets_64.c, pda.h inclusion shouldn't be removed as pda
is still referenced in the file
* s/oldrsp/old_rsp/
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Also clean up PER_CPU_VAR usage in xen-asm_64.S
tj: * remove now unused stack_thread_info()
* s/kernelstack/kernel_stack/
* added FIXME comment in xen-asm_64.S
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
tj: moved cpu_number definition out of CONFIG_HAVE_SETUP_PER_CPU_AREA
for voyager.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Move the exception stacks to per-cpu, removing specific allocation code.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Move the irqstackptr variable from the PDA to per-cpu. Make the
stacks themselves per-cpu, removing some specific allocation code.
Add a seperate flag (is_boot_cpu) to simplify the per-cpu boot
adjustments.
tj: * sprinkle some underbars around.
* irq_stack_ptr is not used till traps_init(), no reason to
initialize it early. On SMP, just leaving it NULL till proper
initialization in setup_per_cpu_areas() works. Dropped
is_boot_cpu and early irq_stack_ptr initialization.
* do DECLARE/DEFINE_PER_CPU(char[IRQ_STACK_SIZE], irq_stack)
instead of (char, irq_stack[IRQ_STACK_SIZE]).
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: use new work_on_cpu function to reduce stack usage
Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
a work_on_cpu function for drv_read() and drv_write().
Basically converts do_drv_{read,write} into "work_on_cpu" functions that
are now called by drv_read and drv_write.
Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
that the work_on_cpu() function is more stable.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Dieter Ries <clip2@gmx.de>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <cpufreq@vger.kernel.org>
Check CONFIG_FREEZER instead of CONFIG_PM because kprobe booster
depends on freeze_processes() and thaw_processes() when CONFIG_PREEMPT=y.
This fixes a linkage error which occurs when CONFIG_PREEMPT=y, CONFIG_PM=y
and CONFIG_FREEZER=n.
Reported-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Len Brown <len.brown@intel.com>
On x86_64, if get_per_cpu_var() is used before per cpu area is setup
(if lockdep is turned on, it happens), it needs this_cpu_off to point
to __per_cpu_load. Initialize accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
It is an optimization and a cleanup, and adds the following new
generic percpu methods:
percpu_read()
percpu_write()
percpu_add()
percpu_sub()
percpu_and()
percpu_or()
percpu_xor()
and implements support for them on x86. (other architectures will fall
back to a default implementation)
The advantage is that for example to read a local percpu variable,
instead of this sequence:
return __get_cpu_var(var);
ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
ffffffff8102ca32: 81
ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax
We can get a single instruction by using the optimized variants:
return percpu_read(var);
ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax
I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.
tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
* added percpu_and() for completeness's sake
* made generic percpu ops atomic against preemption
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Do the following cleanups:
* kill x86_64_init_pda() which now is equivalent to pda_init()
* use per_cpu_offset() instead of cpu_pda() when initializing
initial_gs
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pda is now a percpu variable and there's no reason it can't use plain
x86 percpu accessors. Add x86_test_and_clear_bit_percpu() and replace
pda op implementations with wrappers around x86 percpu accessors.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
As pda is now allocated in percpu area, it can easily be made a proper
percpu variable. Make it so by defining per cpu symbol from linker
script and declaring it in C code for SMP and simply defining it for
UP. This change cleans up code and brings SMP and UP closer a bit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that pda is allocated as part of percpu, percpu doesn't need to be
accessed through pda. Unify x86_64 SMP percpu access with x86_32 SMP
one. Other than the segment register, operand size and the base of
percpu symbols, they behave identical now.
This patch replaces now unnecessary pda->data_offset with a dummy
field which is necessary to keep stack_canary at its place. This
patch also moves per_cpu_offset initialization out of init_gdt() into
setup_per_cpu_areas(). Note that this change also necessitates
explicit per_cpu_offset initializations in voyager_smp.c.
With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
x86_32 and also x86_64 can use assembly PER_CPU macros.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
Currently pdas and percpu areas are allocated separately. %gs points
to local pda and percpu area can be reached using pda->data_offset.
This patch folds pda into percpu area.
Due to strange gcc requirement, pda needs to be at the beginning of
the percpu area so that pda->stack_canary is at %gs:40. To achieve
this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
added and used to reserve pda sized chunk at the start of the percpu
area.
After this change, for boot cpu, %gs first points to pda in the
data.init area and later during setup_per_cpu_areas() gets updated to
point to the actual pda. This means that setup_per_cpu_areas() need
to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
already has modified it when control reaches setup_per_cpu_areas().
This patch also removes now unnecessary get_local_pda() and its call
sites.
A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
_cpu_pda array first uses statically allocated storage in data.init
and then switches to allocated bootmem to conserve space. However,
after folding pda area into percpu area, _cpu_pda array will be
removed completely. Drop the reallocation part to simplify the code
for soon-to-follow changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
CPU startup code in head_64.S loaded address of a zero page into %gs
for temporary use till pda is loaded but address to the actual pda is
available at the point. Load the real address directly instead.
This will help unifying percpu and pda handling later on.
This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
per cpu area" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
This patch makes percpu symbols zerobased on x86_64 SMP by adding
PERCPU_VADDR() to vmlinux.lds.h which helps setting explicit vaddr on
the percpu output section and using it in vmlinux_64.lds.S. A new
PHDR is added as existing ones cannot contain sections near address
zero. PERCPU_VADDR() also adds a new symbol __per_cpu_load which
always points to the vaddr of the loaded percpu data.init region.
The following adjustments have been made to accomodate the address
change.
* code to locate percpu gdt_page in head_64.S is updated to add the
load address to the gdt_page offset.
* __per_cpu_load is used in places where access to the init data area
is necessary.
* pda->data_offset is initialized soon after C code is entered as zero
value doesn't work anymore.
This patch is mostly taken from Mike Travis' "x86_64: Base percpu
variables at zero" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make vmlinux_32.lds.S use the generic PERCPU() macro instead of open
coding it. This will ease future changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Based on original patch from Christoph Lameter and Mike Travis. ]
* Ruggedize some calls in setup_percpu.c to prevent mishaps
in early calls, particularly for non-critical functions.
* Cleanup DEBUG_PER_CPU_MAPS usages and some comments.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make early_per_cpu() a lvalue as per_cpu() is and use it where
applicable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The function uv_wait_completion() spins on reads of a memory-mapped
register, waiting for completion of BAU hardware replies.
It should call "cpu_relax()" between those reads to improve performance
on hyperthreaded configurations.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-tip testing found this crash:
> [ 35.258515] calling acpi_cpufreq_init+0x0/0x127 @ 1
> [ 35.264127] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73
> [ 35.267554] PGD 0
> [ 35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no
allocation of the variable mask, so we pass in an uninitialized cmd.mask
field to drv_read(), which then passes it to the scheduler which then
crashes ...
Switch it over to the much simpler constant-cpumask-pointers approach.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: widen the effect of the 'nolapic' boot parameter
"nolapic" should not only suppress SMP and use of the LAPIC, but it
also ought to have the effect of disabling all IO-APIC related activity
as well as PCI MSI and HT-IRQs.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: micro-optimization, memory reduction
On x86_64 flush tlb data is stored in per_cpu variables. This is
unnecessary because only the first NUM_INVALIDATE_TLB_VECTORS entries
are accessed.
This patch aims at making the code less confusing (there's nothing
really "per_cpu") by using a plain array. It also would save some memory
on most distros out there (Ubuntu x86_64 has NR_CPUS=64 by default).
[ Ravikiran G Thirumalai also pointed out that the correct alignment
is ____cacheline_internodealigned_in_smp, so that there's no
bouncing on vsmp. ]
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This commit broke flush_tlb_others_ipi() causing boot hangs on a
16 logical cpu system:
> commit 4595f9620c
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date: Sat Jan 10 21:58:09 2009 -0800
>
> x86: change flush_tlb_others to take a const struct cpumask
This change resulted in sending the invalidate tlb vector to the
sender itself causing the hang. flush_tlb_others_ipi() should exclude
the sender itself from the destination list.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix potential boot crash on MAXSMP
Remove code left over by:
50c668d: Revert "cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read
That cmd.cpumask is not allocated anymore. No impact on default !MAXSMP
kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit e0c7317557.
This patch was wrong, as lockdep (and thus the irq state tracer)
aren't nmi safe. People are already seeing lockdep warnings due
to this.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 7503bfbae8.
Dieter Ries reported bootup soft-hangs and bisected it back to
this commit, and reverting this commit gave him a working system.
The commit introduces work_on_cpu() use into the cpufreq code,
but that is subtly problematic from a lock hierarchy POV: the
hotplug-cpu lock is an highlevel lock that is taken before
lowlevel locks, and in this codepath we are called with the
policy lock taken.
Dieter did not have lockdep enabled so we dont have a nice stack
trace proof for this, but using work_on_cpu() in such a lowlevel
place certainly looks wrong, so we revert the patch.
work_on_cpu() needs to be reworked to be more generally usable.
Reported-by: Dieter Ries <clip2@gmx.de>
Tested-by: Dieter Ries <clip2@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this by reintroducing asm/smp.h include in apic.c - later on
I will fix this by removing non-smp data from smp.h
Also fix the __inquire_remote_apic() prototype/inline.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix this by reintroducing asm/smp.h include in mpparse.c - later on
I will fix this by removing non-smp data from smp.h.
Reported-by: Petr Titera <P.Titera@century.cz>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alexander Beregalov reported that this warning is caused by the HPET code:
> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 comparators, 64-bit 14.318180 MHz counter
> ODEBUG: object is on stack, but not annotated
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:251 __debug_object_init+0x2a4/0x352()
> Bisected down to 26afe5f2fb
> (x86: HPET_MSI Initialise per-cpu HPET timers)
The commit is fine - but the on-stack workqueue entry needs annotation.
Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Fix:
ERROR: trailing whitespace
ERROR: code indent should use tabs where possible
WARNING: %Ld/%Lu are not-standard C, use %lld/%llu
WARNING: printk() should include KERN_ facility level
ERROR: spaces required around that '=' (ctx:VxW)
total: 13 errors, 2 warnings
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preparation, cleanup, add KERN_INFO printk
Modify references from NR_IRQS to nr_irqs as the later will become
variable-sized based on nr_cpu_ids when CONFIG_SPARSE_IRQS=y.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: reduce stack usage.
init_intel_cacheinfo() does not use the cpumask so define a subset
of struct _cpuid4_info (_cpuid4_info_regs) that can be used instead.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: Reduce memory usage, use new cpumask API.
Use cpumask_var_t for 'cpus' cpumask in struct threshold_bank and update
remaining old cpumask_t functions to new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: Improve tlb flush performance for UV
Calling alloc_cpumask_var a zillion times a second does affect
performance. Replace with static cpumask.
Note: when CONFIG_X86_UV is defined, this extra PER_CPU memory
will be optimized out for non-UV configs as is_uv_system() will
then return a constant 0.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: reduce stack usage, use new cpumask API.
This is made a little more tricky by uv_flush_tlb_others which
actually alters its argument, for an IPI to be sent to the remaining
cpus in the mask.
I solve this by allocating a cpumask_var_t for this case and falling back
to IPI should this fail.
To eliminate temporaries in the caller, all flush_tlb_others implementations
now do the this-cpu-elimination step themselves.
Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
which has been there since pre-git and yet f->flush_cpumask is always zero
at this point.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: reduce memory usage, use new cpumask API.
Replace the affinity and pending_masks with cpumask_var_t's. This adds
to the significant size reduction done with the SPARSE_IRQS changes.
The added functions (init_alloc_desc_masks & init_copy_desc_masks) are
in the include file so they can be inlined (and optimized out for the
!CONFIG_CPUMASKS_OFFSTACK case.) [Naming chosen to be consistent with
the other init*irq functions, as well as the backwards arg declaration
of "from, to" instead of the more common "to, from" standard.]
Includes a slight change to the declaration of struct irq_desc to embed
the pending_mask within ifdef(CONFIG_SMP) to be consistent with other
references, and some small changes to Xen.
Tested: sparse/non-sparse/cpumask_offstack/non-cpumask_offstack/nonuma/nosmp on x86_64
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: virtualization@lists.osdl.org
Cc: xen-devel@lists.xensource.com
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Impact: micro-optimization
The patch below removes an unnecessary locked instruction from
switch_to(). TIF_FORK is only ever set in copy_thread() on initial
process creation, and gets cleared during the first scheduling of the
process. As such, it is safe to use an unlocked test for the flag
within switch_to().
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
x86: fix section mismatch warnings in mcheck/mce_amd_64.c
x86: offer frame pointers in all build modes
x86: remove duplicated #include's
x86: k8 numa register active regions later
x86: update Alan Cox's email addresses
x86: rename all fields of mpc_table mpc_X to X
x86: rename all fields of mpc_oemtable oem_X to X
x86: rename all fields of mpc_bus mpc_X to X
x86: rename all fields of mpc_cpu mpc_X to X
x86: rename all fields of mpc_intsrc mpc_X to X
x86: rename all fields of mpc_lintsrc mpc_X to X
x86: rename all fields of mpc_iopic mpc_X to X
x86: irqinit_64.c init_ISA_irqs should be static
Documentation/x86/boot.txt: payload length was changed to payload_length
x86: setup_percpu.c fix style problems
x86: irqinit_64.c fix style problems
x86: irqinit_32.c fix style problems
x86: i8259.c fix style problems
x86: irq_32.c fix style problems
x86: ioport.c fix style problems
...
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
[IA64] fix typo in cpumask_of_pcibus()
x86: fix x86_32 builds for summit and es7000 arch's
cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
cpumask: use cpumask_var_t in acpi-cpufreq.c
cpumask: use work_on_cpu in acpi/cstate.c
cpumask: convert struct cpufreq_policy to cpumask_var_t
cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
x86: cleanup remaining cpumask_t ops in smpboot code
cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
cpumask: update local_cpus_show to use new cpumask API
ia64: cpumask fix for is_affinity_mask_valid()
We found a situation on Linus' machine that the Nvidia timer quirk hit on
a Intel chipset system. The problem is that the system has a fancy Nvidia
card with an own PCI bridge, and the early-quirks code looking for any
NVidia bridge triggered on it incorrectly. This didn't lead a boot
failure by luck, but the timer routing code selecting the wrong timer
first and some ugly messages. It might lead to real problems on other
systems.
I checked all the devices which are currently checked for by early_quirks
and it turns out they are all located in the root bus zero.
So change the early-quirks loop to only scan bus 0. This incidently also
saves quite some unnecessary scanning work, because early_quirks doesn't
go through all the non root busses.
The graphics card is not on bus 0, so it is not matched anymore.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>