Use NodeId MSR to get NodeId and number of nodes per processor.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20091216144355.GB28798@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix kprobes build with non-gawk awk
x86: Split swiotlb initialization into two stages
x86: Regex support and known-movable symbols for relocs, fix _end
x86, msr: Remove incorrect, duplicated code in the MSR driver
x86: Merge kernel_thread()
x86: Sync 32/64-bit kernel_thread
x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper
x86, 64-bit: Use user_mode() to determine new stack pointer in copy_thread()
x86, 64-bit: Move kernel_thread to C
x86-64, paravirt: Call set_iopl_mask() on 64 bits
x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2
x86: Merge sys_clone
x86, 32-bit: Convert sys_vm86 & sys_vm86old
x86: Merge sys_sigaltstack
x86: Merge sys_execve
x86: Merge sys_iopl
x86-32: Add new pt_regs stubs
cpumask: Use modern cpumask style in arch/x86/kernel/cpu/mcheck/mce-inject.c
* 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
modpost: fix segfault with short symbol names
module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y
Kbuild: clear marker out of modpost
module: make MODULE_SYMBOL_PREFIX into a CONFIG option
ARM: unexport symbols used to implement floating point emulation
ARM: use unified discard definition in linker script
x86: don't export inline function
sparc64: don't export static inline pci_ functions
Use bitmap library and kill some unused iommu helper functions.
1. s/iommu_area_free/bitmap_clear/
2. s/iommu_area_reserve/bitmap_set/
3. Use bitmap_find_next_zero_area instead of find_next_zero_area
This cannot be simple substitution because find_next_zero_area
doesn't check the last bit of the limit in bitmap
4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The UV BIOS has moved the location of some of their pointers to the
"partition reserved page" from memory into a uv hub MMR. The GRU does not
support bcopy operations from MMR space so we need to special case the MMR
addresses using VLOAD operations.
Additionally, the BIOS call for registering a message queue watchlist has
removed the 'blade' value and eliminated the structure that was being
passed in. This is also reflected in this patch.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested by Roland.
Unlike powepc, x86 always calls tracehook_report_syscall_exit(step) with
step = 0, and sends the trap by hand.
This results in unnecessary SIGTRAP when PTRACE_SINGLESTEP follows the
syscall-exit stop.
Change syscall_trace_leave() to pass the correct "step" argument to
tracehook and remove the send_sigtrap() logic.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested by Roland.
Implement user_single_step_siginfo() for x86. Extract this code from
send_sigtrap().
Since x86 calls tracehook_report_syscall_exit(step => 0) the new helper is
not used yet.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
register_chrdev() hardcodes registering 256 minors, presumably to
avoid breaking old drivers. However, we need to register enough
minors so that we have all possible CPUs.
checkpatch warns on this patch, but the patch is correct: NR_CPUS here
is a static *upper bound* on the *maximum CPU index* (not *number of
CPUs!*) and that is what we want.
Reported-and-tested-by: Russ Anderson <rja@sgi.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@git.kernel.org>
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
clockevents: Convert to raw_spinlock
clockevents: Make tick_device_lock static
debugobjects: Convert to raw_spinlocks
perf_event: Convert to raw_spinlock
hrtimers: Convert to raw_spinlocks
genirq: Convert irq_desc.lock to raw_spinlock
smp: Convert smplocks to raw_spinlocks
rtmutes: Convert rtmutex.lock to raw_spinlock
sched: Convert pi_lock to raw_spinlock
sched: Convert cpupri lock to raw_spinlock
sched: Convert rt_runtime_lock to raw_spinlock
sched: Convert rq->lock to raw_spinlock
plist: Make plist debugging raw_spinlock aware
bkl: Fixup core_lock fallout
locking: Cleanup the name space completely
locking: Further name space cleanups
alpha: Fix fallout from locking changes
locking: Implement new raw_spinlock
locking: Convert raw_rwlock functions to arch_rwlock
locking: Convert raw_rwlock to arch_rwlock
...
Makes use of skip_spaces() defined in lib/string.c for removing leading
spaces from strings all over the tree.
It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
text data bss dec hex filename
64688 584 592 65864 10148 (TOTALS-BEFORE)
64641 584 592 65817 10119 (TOTALS-AFTER)
Also, while at it, if we see (*str && isspace(*str)), we can be sure to
remove the first condition (*str) as the second one (isspace(*str)) also
evaluates to 0 whenever *str == 0, making it redundant. In other words,
"a char equals zero is never a space".
Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
and found occurrences of this pattern on 3 more files:
drivers/leds/led-class.c
drivers/leds/ledtrig-timer.c
drivers/video/output.c
@@
expression str;
@@
( // ignore skip_spaces cases
while (*str && isspace(*str)) { \(str++;\|++str;\) }
|
- *str &&
isspace(*str)
)
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Julia Lawall <julia@diku.dk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With generic modular drivers handling all of this stuff, the
geode-specific code can go away. The cs5535-gpio, cs5535-mfgpt, and
cs5535-clockevt drivers now handle this.
Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only thing that uses this is the reboot_fixups code.
Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commit f4780ca005 moves
swiotlb initialization before dma32_free_bootmem(). It's
supposed to fix a bug that the commit
75f1cdf1dd introduced, we
initialize SWIOTLB right after dma32_free_bootmem so we wrongly
steal memory area allocated for GART with broken BIOS earlier.
However, the above commit introduced another problem, which
likely breaks machines with huge amount of memory. Such a box
use the majority of DMA32_ZONE so there is no memory for
swiotlb.
With this patch, the x86 IOMMU initialization sequence are:
1. We set swiotlb to 1 in the case of (max_pfn > MAX_DMA32_PFN
&& !no_iommu). If swiotlb usage is forced by the boot option,
we go to the step 3 and finish (we don't try to detect IOMMUs).
2. We call the detection functions of all the IOMMUs. The
detection function sets x86_init.iommu.iommu_init to the IOMMU
initialization function (so we can avoid calling the
initialization functions of all the IOMMUs needlessly).
3. We initialize swiotlb (and set dma_ops to swiotlb_dma_ops) if
swiotlb is set to 1.
4. If the IOMMU initialization function doesn't need swiotlb
(e.g. the initialization is sucessful) then sets swiotlb to zero.
5. If we find that swiotlb is set to zero, we free swiotlb
resource.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <20091215204729A.fujita.tomonori@lab.ntt.co.jp>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For CONFIG_PARAVIRT, load_gs_index is an inline function (it's #defined
to native_load_gs_index otherwise).
Exporting an inline function breaks the new assembler-based alphabetical
sorted symbol list:
Today's linux-next build (x86_64 allmodconfig) failed like this:
.tmp_exports-asm.o: In function `__ksymtab_load_gs_index':
(__ksymtab_sorted+0x5b40): undefined reference to `load_gs_index'
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: x86@kernel.org
Cc: alan-jenkins@tuffmail.co.uk
Currently, ARB_DISABLE is a NOP on all of the recent Intel platforms.
For such platforms, reduce contention on c3_lock by skipping the fake
ARB_DISABLE.
The cpu model id on one laptop is 14. If we disable ARB_DISABLE on this box,
the box can't be booted correctly. But if we still enable ARB_DISABLE on this
box, the box can be booted correctly.
So we still use the ARB_DISABLE for the cpu which mode id is less than 0x0f.
http://bugzilla.kernel.org/show_bug.cgi?id=14700
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
cc: stable@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Further name space cleanup. No functional change
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.
Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
This adds a new category of symbols to the relocs program: symbols
which are known to be relative, even though the linker emits them as
absolute; this is the case for symbols that live in the linker script,
which currently applies to _end.
Unfortunately the previous workaround of putting _end in its own empty
section was defeated by newer binutils, which remove empty sections
completely.
This patch also changes the symbol matching to use regular expressions
instead of hardcoded C for specific patterns.
This is a decidedly non-minimal patch: a modified version of the
relocs program is used as part of the Syslinux build, and this is
basically a backport to Linux of some of those changes; they have
thus been well tested.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4AF86211.3070103@zytor.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: Clean up thermal init by introducing intel_thermal_supported()
x86, mce: Thermal monitoring depends on APIC being enabled
x86: Gart: fix breakage due to IOMMU initialization cleanup
x86: Move swiotlb initialization before dma32_free_bootmem
x86: Fix build warning in arch/x86/mm/mmio-mod.c
x86: Remove usedac in feature-removal-schedule.txt
x86: Fix duplicated UV BAU interrupt vector
nvram: Fix write beyond end condition; prove to gcc copy is safe
mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe
x86: Limit the number of processor bootup messages
x86: Remove enabling x2apic message for every CPU
doc: Add documentation for bootloader_{type,version}
x86, msr: Add support for non-contiguous cpumasks
x86: Use find_e820() instead of hard coded trampoline address
x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks
Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c
The MSR driver would compute the values for cpu and c at declaration,
and then again in the body of the function. This isn't merely
redundant, but unsafe, since cpu might not refer to a valid CPU at
that point.
Remove the unnecessary and dangerous references in the declarations.
This code now matches the equivalent code in the CPUID driver.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
It looks better to have a common function. No change in functionality.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4B25FDDC.407@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Add check if APIC is not disabled since thermal
monitoring depends on it. As only apic gets disabled
we should not try to install "thermal monitor" vector,
print out that thermal monitoring is enabled and etc...
Note that "Intel Correct Machine Check Interrupts" already
has such a check.
Also I decided to not add cpu_has_apic check into
mcheck_intel_therm_init since even if it'll call apic_read on
disabled apic -- it's safe here and allow us to save a few code
bytes.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This fixes the following breakage of the commit
75f1cdf1dd:
- GART systems that don't AGP with broken BIOS and more than 4GB
memory are forced to use swiotlb. They can allocate aperture by
hand and use GART.
- GART systems without GAP must disable GART on shutdown.
- swiotlb usage is forced by the boot option,
gart_iommu_hole_init() is not called, so we disable GART
early_gart_iommu_check().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1260759135-6450-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The commit 75f1cdf1dd introduced a
bug that we initialize SWIOTLB right after dma32_free_bootmem so
we wrongly steal memory area allocated for GART with broken BIOS
earlier.
This moves swiotlb initialization before dma32_free_bootmem().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: yinghai@kernel.org
LKML-Reference: <1260759135-6450-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Fix PCI hotplug with passthrough mode
x86/amd-iommu: Fix passthrough mode
x86: mmio-mod.c: Use pr_fmt
x86: kmmio.c: Add and use pr_fmt(fmt)
x86: i8254.c: Add pr_fmt(fmt)
x86: setup_percpu.c: Use pr_<level> and add pr_fmt(fmt)
x86: es7000_32.c: Use pr_<level> and add pr_fmt(fmt)
x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs()
x86: Factor duplicated code out of __show_regs() into show_regs_common()
arch/x86/kernel/microcode*: Use pr_fmt() and remove duplicated KERN_ERR prefix
x86, mce: fix confusion between bank attributes and mce attributes
x86/mce: Set up timer unconditionally
x86: Fix bogus warning in apic_noop.apic_write()
x86: Fix typo in arch/x86/mm/kmmio.c
x86: ASUS P4S800 reboot=bios quirk
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
x86, perf events: Check if we have APIC enabled
perf_event: Fix variable initialization in other codepaths
perf kmem: Fix unused argument build warning
perf symbols: perf_header__read_build_ids() offset'n'size should be u64
perf symbols: dsos__read_build_ids() should read both user and kernel buildids
perf tools: Align long options which have no short forms
perf kmem: Show usage if no option is specified
sched: Mark sched_clock() as notrace
perf sched: Add max delay time snapshot
perf tools: Correct size given to memset
perf_event: Fix perf_swevent_hrtimer() variable initialization
perf sched: Fix for getting task's execution time
tracing/kprobes: Fix field creation's bad error handling
perf_event: Cleanup for cpu_clock_perf_event_update()
perf_event: Allocate children's perf_event_ctxp at the right time
perf_event: Clean up __perf_event_init_context()
hw-breakpoints: Modify breakpoints without unregistering them
perf probe: Update perf-probe document
perf probe: Support --del option
trace-kprobe: Support delete probe syntax
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface
[CPUFREQ] make internal cpufreq_add_dev_* static
[CPUFREQ] use an enum for speedstep processor identification
[CPUFREQ] Document units for transition latency
[CPUFREQ] Use global sysfs cpufreq structure for conservative governor tunings
[CPUFREQ] Documentation: ABI: /sys/devices/system/cpu/cpu#/cpufreq/
[CPUFREQ] powernow-k6: set transition latency value so ondemand governor can be used
[CPUFREQ] cpumask: don't put a cpumask on the stack in x86...cpufreq/powernow-k8.c
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: Always process the whole breakpoint list on activate or deactivate
kgdb: continue and warn on signal passing from gdb
kgdb,x86: do not set kgdb_single_step on x86
kgdb: allow for cpu switch when single stepping
kgdb,i386: Fix corner case access to ss with NMI watch dog exception
kgdb: Replace strstr() by strchr() for single-character needles
kgdbts: Read buffer overflow
kgdb: Read buffer overflow
kgdb,x86: remove redundant test
When there are a large number of processors in a system, there
is an excessive amount of messages sent to the system console.
It's estimated that with 4096 processors in a system, and the
console baudrate set to 56K, the startup messages will take
about 84 minutes to clear the serial port.
This set of patches limits the number of repetitious messages
which contain no additional information. Much of this information
is obtainable from the /proc and /sysfs. Some of the messages
are also sent to the kernel log buffer as KERN_DEBUG messages so
dmesg can be used to examine more closely any details specific to
a problem.
The new cpu bootup sequence for system_state == SYSTEM_BOOTING:
Booting Node 0, Processors #1#2#3#4#5#6#7 Ok.
Booting Node 1, Processors #8#9#10#11#12#13#14#15 Ok.
...
Booting Node 3, Processors #56#57#58#59#60#61#62#63 Ok.
Brought up 64 CPUs
After the system is running, a single line boot message is displayed
when CPU's are hotplugged on:
Booting Node %d Processor %d APIC 0x%x
Status of the following lines:
CPU: Physical Processor ID: printed once (for boot cpu)
CPU: Processor Core ID: printed once (for boot cpu)
CPU: Hyper-Threading is disabled printed once (for boot cpu)
CPU: Thermal monitoring enabled printed once (for boot cpu)
CPU %d/0x%x -> Node %d: removed
CPU %d is now offline: only if system_state == RUNNING
Initializing CPU#%d: KERN_DEBUG
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <4B219E28.8080601@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Print only once that the system is supporting x2apic mode.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4B226E92.5080904@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/mmap:
Add missing alignment check in arch/score sys_mmap()
fix broken aliasing checks for MAP_FIXED on sparc32, mips, arm and sh
Get rid of open-coding in ia64_brk()
sparc_brk() is not needed anymore
switch do_brk() to get_unmapped_area()
Take arch_mmap_check() into get_unmapped_area()
fix a struct file leak in do_mmap_pgoff()
Unify sys_mmap*
Cut hugetlb case early for 32bit on ia64
arch_mmap_check() on mn10300
Kill ancient crap in s390 compat mmap
arm: add arch_mmap_check(), get rid of sys_arm_mremap()
file ->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()
kill useless checks in sparc mremap variants
fix pgoff in "have to relocate" case of mremap()
fix the arch checks in MREMAP_FIXED case
fix checks for expand-in-place mremap
do_mremap() untangling, part 3
do_mremap() untangling, part 2
untangling do_mremap(), part 1
On an SMP system the kgdb_single_step flag has the possibility to
indefinitely hang the system in the case. Consider the case where,
CPU 1 has the schedule lock and CPU 0 is set to single step, there is
no way for CPU 0 to run another task.
The easy way to observe the problem is to make 2 cpus busy, and run
the kgdb test suite. You will see that it hangs the system very
quickly.
while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
echo V1 > /sys/module/kgdbts/parameters/kgdbts
The side effect of this patch is that there is the possibility
to miss a breakpoint in the case that a single step operation
was executed to step over a breakpoint in common code.
The trade off of the missed breakpoint is preferred to
hanging the kernel. This can be fixed in the future by
using kprobes or another strategy to step over planted
breakpoints with out of line execution.
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
It is possible for the user_mode_vm(regs) check to return true on the
i368 arch for a non master kgdb cpu or when the master kgdb cpu
handles the NMI watch dog exception.
The solution is simply to select the correct gdb_ss location
based on the check to user_mode_vm(regs).
CC: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
The for loop starts with a breakno of 0, and ends when it's 4. so
this test is always true.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
New helper - sys_mmap_pgoff(); switch syscalls to using it.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jens found the following crash/regression:
[ 0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80
[ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page
and
[ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE
and bisected it to b24c2a9 ("x86: Move find_smp_config()
earlier and avoid bootmem usage").
It turns out the BIOS is using the first 64k for mptable,
without reserving it.
So try to find good range for the real-mode trampoline instead of
hard coding it, in case some bios tries to use that range for sth.
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4B21630A.6000308@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The per_cpu cpuid4_info shared_map can contain stale data when CPUs are added
and removed.
The stale data can lead to a NULL pointer derefernce panic on a remove of a
CPU that has had siblings previously removed.
This patch resolves the panic by verifying a cpu is actually online before
adding it to the shared_cpu_map, only examining cpus that are part of
the same lower level cache, and by updating other siblings lowest level cache
maps when a cpu is added.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
LKML-Reference: <20091209183336.17855.98708.sendpatchset@prarit.bos.redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The arg should be in %eax, but that is clobbered by the return value
of clone. The function pointer can be in any register. Also, don't
push args onto the stack, since regparm(3) is the normal calling
convention now.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Use user_mode() instead of a magic value for sp to determine when returning
to kernel mode.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Prepare for merging with 32-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The device change notifier is initialized in the dma_ops
initialization path. But this path is never executed for
iommu=pt. Move the notifier initialization to IOMMU hardware
init code to fix this.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The data structure changes to use dev->archdata.iommu field
broke the iommu=pt mode because in this case the
dev->archdata.iommu was left uninitialized. This moves the
inititalization of the devices into the main init function
and fixes the problem.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPICA: Update version to 20091112.
ACPICA: Add additional module-level code support
ACPICA: Deploy new create integer interface where appropriate
ACPICA: New internal utility function to create Integer objects
ACPICA: Add repair for predefined methods that must return sorted lists
ACPICA: Fix possible fault if return Package objects contain NULL elements
ACPICA: Add post-order callback to acpi_walk_namespace
ACPICA: Change package length error message to an info message
ACPICA: Reduce severity of predefined repair messages, Warning to Info
ACPICA: Update version to 20091013
ACPICA: Fix possible memory leak for Scope ASL operator
ACPICA: Remove possibility of executing _REG methods twice
ACPICA: Add repair for bad _MAT buffers
ACPICA: Add repair for bad _BIF/_BIX packages
set_iopl_mask() is a no-op on 64 bits, but it is also a paravirt hook,
so call it even on 64 bits.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-3-git-send-email-brgerst@gmail.com>
In the PTREGSCALL1 and 2 macros, we can trivially avoid an unnecessary
pipeline serialization, so do so.
In PTREGSCALLS3 this is much less clear-cut since we have to push a
new value to the stack. Leave it alone for now assuming it is as good
as it is going to be; may want to check on Atom or another in-order
x86 to see if we can do better.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-2-git-send-email-brgerst@gmail.com>
Change 32-bit sys_clone to new PTREGSCALL stub, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Convert these to new PTREGSCALL stubs.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Change 32-bit sys_sigaltstack to PTREGSCALL2, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Change 32-bit sys_execve to PTREGSCALL3, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Change 32-bit sys_iopl to PTREGSCALL1, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add new stubs which add the pt_regs pointer as the last arg, matching
64-bit. This will allow these syscalls to be easily merged.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Robert Hancock observes that DMI_BOARD_NAME is often more useful
than DMI_PRODUCT_NAME, especially on standalone motherboards.
So, print both.
Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208083021.GB27174@hexapodia.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unify x86_32 and x86_64 implementations of __show_regs() header,
standardizing on the x86_64 format string in the process. Also,
32-bit will now call print_modules.
Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208082942.GA27174@hexapodia.org>
[ v2: resolved conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, when ptrace needs to modify a breakpoint, like disabling
it, changing its address, type or len, it calls
modify_user_hw_breakpoint(). This latter will perform the heavy and
racy task of unregistering the old breakpoint and registering a new
one.
This is racy as someone else might steal the reserved breakpoint
slot under us, which is undesired as the breakpoint is only
supposed to be modified, sometimes in the middle of a debugging
workflow. We don't want our slot to be stolen in the middle.
So instead of unregistering/registering the breakpoint, just
disable it while we modify its breakpoint fields and re-enable it
after if necessary.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1260347148-5519-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Use #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- Remove "microcode: " prefix from each pr_<level>
- Fix duplicated KERN_ERR prefix
- Coalesce pr_<level> format strings
- Add a space after an exclamation point
No other change in output.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
LKML-Reference: <1260340250.27677.191.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers, init: Limit the number of per cpu calibration bootup messages
posix-cpu-timers: optimize and document timer_create callback
clockevents: Add missing include to pacify sparse
x86: vmiclock: Fix printk format
x86: Fix printk format due to variable type change
sparc: fix printk for change of variable type
clocksource/events: Fix fallout of generic code changes
nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
nohz: Track last do_timer() cpu
nohz: Prevent clocksource wrapping during idle
nohz: Type cast printk argument
mips: Use generic mult/shift factor calculation for clocks
clocksource: Provide a generic mult/shift factor calculation
clockevents: Use u32 for mult and shift factors
nohz: Introduce arch_needs_cpu
nohz: Reuse ktime in sub-functions of tick_check_idle.
time: Remove xtime_cache
time: Implement logarithmic time accumulation
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: hpet: Make WARN_ON understandable
x86: arch specific support for remapping HPET MSIs
intr-remap: generic support for remapping HPET MSIs
x86, hpet: Simplify the HPET code
x86, hpet: Disable per-cpu hpet timer if ARAT is supported
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: don't restart timer if disabled
x86: Use -maccumulate-outgoing-args for sane mcount prologues
x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage
x86: AMD Northbridge: Verify NB's node is online
x86 VSDO: Fix Kconfig help
x86: Fix typo in Intel CPU cache size descriptor
x86: Add new Intel CPU cache size descriptors
* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/reboot: Add pci_dev_put in reboot_fixup_32.c for consistency
* 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: merge the standard and compat start_thread() functions
x86-64: make compat_start_thread() match start_thread()
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
x86, mm: Correct the implementation of is_untracked_pat_range()
x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
x86, mtrr: Fix sorting of mtrr after subtracting
x86: Move find_smp_config() earlier and avoid bootmem usage
x86, platform: Change is_untracked_pat_range() to bool; cleanup init
x86: Change is_ISA_range() into an inline function
x86, mm: is_untracked_pat_range() takes a normal semiclosed range
x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
x86: UV SGI: Don't track GRU space in PAT
x86: SGI UV: Fix BAU initialization
x86, numa: Use near(er) online node instead of roundrobin for NUMA
x86, numa, bootmem: Only free bootmem on NUMA failure path
x86: Change crash kernel to reserve via reserve_early()
x86: Eliminate redundant/contradicting cache line size config options
x86: When cleaning MTRRs, do not fold WP into UC
x86: remove "extern" from function prototypes in <asm/proto.h>
x86, mm: Report state of NX protections during boot
x86, mm: Clean up and simplify NX enablement
x86, pageattr: Make set_memory_(x|nx) aware of NX support
x86, sleep: Always save the value of EFER
...
Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range)
to 'struct x86_platform_ops') in
arch/x86/include/asm/x86_init.h
arch/x86/kernel/x86_init.c
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: ucode-amd: Move family check to microcde_amd.c's init function
x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle
x86: ucode-amd: Convert printk(KERN_*...) to pr_*(...)
x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
x86: ucode-amd: Load ucode-patches once and not separately of each CPU
x86, amd-ucode: Remove needless log messages
Commit cebe182033 had an unnecessary,
wrong change: &mce_banks[i].attr is equivalent to the former
bank_attrs[i], not to mce_attrs[i].
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <4B1E05CC.4040703@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
KVM: VMX: Fix comparison of guest efer with stale host value
KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
KVM: Drop user return notifier when disabling virtualization on a cpu
KVM: VMX: Disable unrestricted guest when EPT disabled
KVM: x86 emulator: limit instructions to 15 bytes
KVM: s390: Make psw available on all exits, not just a subset
KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
KVM: VMX: Report unexpected simultaneous exceptions as internal errors
KVM: Allow internal errors reported to userspace to carry extra data
KVM: Reorder IOCTLs in main kvm.h
KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
KVM: only clear irq_source_id if irqchip is present
KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
KVM: VMX: Remove vmx->msr_offset_efer
KVM: MMU: update invlpg handler comment
KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
KVM: remove duplicated task_switch check
KVM: powerpc: Fix BUILD_BUG_ON condition
KVM: VMX: Use shared msr infrastructure
...
Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
mce_timer must be passed to setup_timer() in all cases, no
matter whether it is going to be actually used. Otherwise, when
the CPU gets brought down, its call to del_timer_sync() will
never return, as the timer won't have a base associated, and
hence lock_timer_base() will loop infinitely.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: <stable@kernel.org>
LKML-Reference: <4B1DB831.2030801@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
apic_noop is used to provide dummy apic functions. It's installed
when the CPU has no APIC or when the APIC is disabled on the kernel
command line.
The apic_noop implementation of apic_write() warns when the CPU has
an APIC or when the APIC is not disabled.
That's bogus. The warning should only happen when the CPU has an
APIC _AND_ the APIC is not disabled. apic_noop.apic_read() has the
correct check.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: <stable@kernel.org> # in <= .32 this typo resides in native_apic_write_dummy()
LKML-Reference: <alpine.LFD.2.00.0912071255420.3089@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we enter in irq, two things can happen to preserve the link
to the previous frame pointer:
- If we were in an irq already, we don't switch to the irq stack
as we are inside. We just need to save the previous frame
pointer and to link the new one to the previous.
- Otherwise we need another level of indirection. We enter the irq with
the previous stack. We save the previous bp inside and make bp
pointing to its saved address. Then we switch to the irq stack and
push bp another time but to the new stack. This makes two levels to
dereference instead of one.
In the second case, the current stacktrace code omits the second level
and loses the frame pointer accuracy. The stack that follows will then
be considered as unreliable.
Handling that makes the perf callchain happier.
Before:
43.94% [k] _raw_read_lock
|
--- _read_lock
|
|--60.53%-- send_sigio
| __kill_fasync
| kill_fasync
| evdev_pass_event
| evdev_event
| input_pass_event
| input_handle_event
| input_event
| synaptics_process_byte
| psmouse_handle_byte
| psmouse_interrupt
| serio_interrupt
| i8042_interrupt
| handle_IRQ_event
| handle_edge_irq
| handle_irq
| __irqentry_text_start
| ret_from_intr
| |
| |--30.43%-- __select
| |
| |--17.39%-- 0x454f15
| |
| |--13.04%-- __read
| |
| |--13.04%-- vread_hpet
| |
| |--13.04%-- _xcb_lock_io
| |
| --13.04%-- 0x7f630878ce8
After:
50.00% [k] _raw_read_lock
|
--- _read_lock
|
|--98.97%-- send_sigio
| __kill_fasync
| kill_fasync
| evdev_pass_event
| evdev_event
| input_pass_event
| input_handle_event
| input_event
| |
| |--96.88%-- synaptics_process_byte
| | psmouse_handle_byte
| | psmouse_interrupt
| | serio_interrupt
| | i8042_interrupt
| | handle_IRQ_event
| | handle_edge_irq
| | handle_irq
| | __irqentry_text_start
| | ret_from_intr
| | |
| | |--39.78%-- __const_udelay
| | | |
| | | |--91.89%-- ath5k_hw_register_timeout
| | | | ath5k_hw_noise_floor_calibration
| | | | ath5k_hw_reset
| | | | ath5k_reset
| | | | ath5k_config
| | | | ieee80211_hw_config
| | | | |
| | | | |--88.24%-- ieee80211_scan_work
| | | | | worker_thread
| | | | | kthread
| | | | | child_rip
| | | | |
| | | | --11.76%-- ieee80211_scan_completed
| | | | ieee80211_scan_work
| | | | worker_thread
| | | | kthread
| | | | child_rip
| | | |
| | | --8.11%-- ath5k_hw_noise_floor_calibration
| | | ath5k_hw_reset
| | | ath5k_reset
| | | ath5k_config
Note: This does not only affect perf events but also x86-64
stacktraces. They were considered as unreliable once we quit
the irq stack frame.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
While dumping a stacktrace, the end of the exception stack won't link
the frame pointer to the previous stack.
The interrupted stack will then be considered as unreliable and ignored
by perf, as the frame pointer is unreliable itself.
This happens because we overwrite the frame pointer that links to the
interrupted frame with the address of the exception stack. This is
done in order to reserve space inside.
But rbp has been chosen here only because it is not a scratch register,
so that the address of the exception stack remains in rbp after calling
do_debug(), we can then release the exception stack space without the
need to retrieve its address again.
But we can pick another non-scratch register to do that, so that we
preserve the link to the interrupted stack frame in the stacktraces.
Just randomly choose r12. Every registers are saved just before and
restored just after calling do_debug(). And r12 is not used in the
middle, which makes it a perfect candidate.
Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw
Before:
44.18% [k] _raw_read_lock
|
|
--- |--6.31%-- waitid
|
|--4.26%-- writev
|
|--3.63%-- __select
|
|--3.15%-- __waitpid
| |
| |--28.57%-- 0x8b52e00000139f
| |
| |--28.57%-- 0x8b52e0000013c6
| |
| |--14.29%-- 0x7fde786dc000
| |
| |--14.29%-- 0x62696c2f7273752f
| |
| --14.29%-- 0x1ea9df800000000
|
|--3.00%-- __poll
After:
43.94% [k] _raw_read_lock
|
--- _read_lock
|
|--60.53%-- send_sigio
| __kill_fasync
| kill_fasync
| evdev_pass_event
| evdev_event
| input_pass_event
| input_handle_event
| input_event
| synaptics_process_byte
| psmouse_handle_byte
| psmouse_interrupt
| serio_interrupt
| i8042_interrupt
| handle_IRQ_event
| handle_edge_irq
| handle_irq
| __irqentry_text_start
| ret_from_intr
| |
| |--30.43%-- __select
| |
| |--17.39%-- 0x454f15
| |
| |--13.04%-- __read
| |
| |--13.04%-- vread_hpet
| |
| |--13.04%-- _xcb_lock_io
| |
| --13.04%-- 0x7f630878ce87
Note: it does not only affect perf events but also other stacktraces in
x86-64. They were considered as unreliable once we quit the debug
stack frame.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Dumping the callchains from breakpoint events with perf gives strange
results:
3.75% perf [kernel] [k] _raw_read_unlock
|
--- _raw_read_unlock
perf_callchain
perf_prepare_sample
__perf_event_overflow
perf_swevent_overflow
perf_swevent_add
perf_bp_event
hw_breakpoint_exceptions_notify
notifier_call_chain
__atomic_notifier_call_chain
atomic_notifier_call_chain
notify_die
do_debug
debug
munmap
We are infected with all the debug stack. Like the nmi stack, the debug
stack is undesired as it is part of the profiling path, not helpful for
the user.
Ignore it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
struct perf_event::event callback was called when a breakpoint
triggers. But this is a rather opaque callback, pretty
tied-only to the breakpoint API and not really integrated into perf
as it triggers even when we don't overflow.
We prefer to use overflow_handler() as it fits into the perf events
rules, being called only when we overflow.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Drop the callback and task parameters from modify_user_hw_breakpoint().
For now we have no user that need to modify a breakpoint to the point
of changing its handler or its task context.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Limit number of per cpu TSC sync messages
x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks
x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details
x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes
x86: Suppress stack overrun message for init_task
x86: Fix cpu_devs[] initialization in early_cpu_init()
x86: Remove CPU cache size output for non-Intel too
x86: Minimise printk spew from per-vendor init code
x86: Remove the CPU cache size printk's
cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c
x86: Make sure we also print a Code: line for show_regs()
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t
x86, cpuid: Simplify the code in cpuid_open
x86, cpuid: Remove the bkl from cpuid_open()
x86, msr: Remove the bkl from msr_open()
x86: AMD Geode LX optimizations
x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix a section mismatch in arch/x86/kernel/setup.c
x86: Fixup last users of irq_chip->typename
x86: Remove BKL from apm_32
x86: Remove BKL from microcode
x86: use kernel_stack_pointer() in kprobes.c
x86: use kernel_stack_pointer() in kgdb.c
x86: use kernel_stack_pointer() in dumpstack.c
x86: use kernel_stack_pointer() in process_32.c