Move all CPU definitions to Kconfig.cpu
Always define X86_MINIMUM_CPU_FAMILY and do the
obvious code cleanup in boot/cpucheck.c
Comments from: Adrian Bunk <bunk@kernel.org> incorporated.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Brian Gerst <bgerst@didntduck.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
This step introduces the file arch/x86/Kconfig
which contains all the menu's from "Power Management"
and below.
The main part of the new Kconfig file is shared
and the remaining i386/x86_64 specific symbols
are covered by dependencies.
A x86_64 allmodconfig build did not show any differences.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Merge the two Kconfig files to a single file.
Checked using make allmodconfig for x86_64.
No changes in build.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
restore sigcontext is taking a DNA exception while restoring FP context
from the user stack, during the sigreturn. Appended patch fixes it by
doing clts() if the app doesn't touch FP during the signal handler
execution. This will stop generating a DNA, during the fxrstor in the
sigreturn.
This improves 64-bit lat_sig numbers by ~30% on my core2 platform.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ jdike - Pushing Chuck's patch - see
http://lkml.org/lkml/2005/9/16/261 for some history and a test
program. UML is also broken without this patch - its processes get
SIGBUS from the corrupt 6th argument to mmap being interpretted as a
file offset ]
When the 32-bit vDSO is used to make a system call, the %ebp register for
the 6th syscall arg has to be loaded from the user stack (where it's pushed
by the vDSO user code). The native i386 kernel always does this before
stopping for syscall tracing, so %ebp can be seen and modified via ptrace
to access the 6th syscall argument. The x86-64 kernel fails to do this,
presenting the stack address to ptrace instead. This makes the %rbp value
seen by 64-bit ptrace of a 32-bit process, and the %ebp value seen by a
32-bit caller of ptrace, both differ from the native i386 behavior.
This patch fixes the problem by putting the word loaded from the user stack
into %rbp before calling syscall_trace_enter, and reloading the 6th syscall
argument from there afterwards (so ptrace can change it). This makes the
behavior match that of i386 kernels.
Original-Patch-By: Roland McGrath <roland@redhat.com>
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The addr argument to PTRACE_GET_THREAD_AREA and PTRACE_SET_THREAD_AREA is
not a magic constant. It's derived from the segment register values being
used, which are computed originally from the index used with set_thread_area.
The value does not need to match what a native i386 kernel would accept.
It needs to match the segment selectors that can actually be in use in this
32-bit process. The 64-bit ptrace support for PTRACE_GET_THREAD_AREA
(normally used only on 32-bit processes) is correct, but the 32-bit emulation
of ptrace is broken.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
prepare for up_smp_call_function() to ensure that the 'func'
pointer is unused. (which is related to a KVM build fix)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest:
lguest: tidy up documentation
kernel/futex.c: make 3 functions static
unexport access_process_vm
lguest: make async_hcall() static
After Adrian Bunk's "make async_hcall static" moved things around, update
comments to match (aka "make Guest").
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In accordance with the newly formalized 32-bit boot protocol, set
%ebx == %ebp == %edi == 0 in order to support future extensions to the
protocol.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
No reason I can think of of making them default y Most people don't have
the hardware and with default y they just pollute lots of configs during
make oldconfig.
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jeff Garzik <jeff@garzik.org>
Acked-by: "Nelson, Shannon" <shannon.nelson@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
node0_bdata and paddr_to_nid() can become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes the unused EXPORT_SYMBOL(machine_id).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames the 4 symbols iommu_hole_init(), iommu_aperture,
iommu_aperture_allowed, iommu_aperture_disabled. All these symbols are only
used for the GART implementation of IOMMUs.
It adds and additional gart_ prefix to them.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch makes some functions and variables static in pci-gart_64.c which are
not used somewhere else.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames the IOMMU config option to GART_IOMMU because in fact it
means the GART and not general support for an IOMMU on x86.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch renames the include file asm-x86/iommu.h to asm-x86/gart.h to make
clear to which IOMMU implementation it belongs. The patch also adds "GART" to
the Kconfig line.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Additional CPUID strings (sse4_1, sse4_2, sse5, skinit, wdt); fix the
positioning of the AMD ecx strings (cr8_legacy was duplicated under
two different names, so the alignment of all the other strings were
off by one.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 2e1c49db4c.
First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jobes. So the
commit will likely be reverted in the 2.6.23 stable kernels.
Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Martin Ebourne <fedora@ebourne.me.uk>
Cc: Zou Nan hai <nanhai.zou@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
blk_rq_map_sg doesn't initialize sg->dma_address/length to zero
anymore. Some low level drivers reuse sg lists without initializing so
IOMMUs might get non-zero dma_address/length. If map_sg fails, we need
pass the number of the mapped entries to gart_unmap_sg.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds the symbol "init_level4_pgt" to the vmcoreinfo data so
that makedumpfile (dump filtering command) supports x86_64 sparsemem
kernel of linux-2.6.24.
makedumpfile creates a small dumpfile by excluding unnecessary pages for
the analysis. It checks attributes in page structures and distinguishes
necessary pages and unnecessary ones. To check them, makedumpfile gets
the vmcoreinfo data which has the minimum debugging information only for
dump filtering.
For older x86_64 kernel (linux-2.6.23 or before), makedumpfile translates
the virtual address of page structure into physical address by subtracting
PAGE_OFFSET from virtual address, but this translation isn't effective for
linux-2.6.24 sparsemem kernel, because its page structures are in virtual
memmap area. makedumpfile should translate their virtual address by 4-levels
paging and it needs the symbol "init_level4_pgt".
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fix this warning:
arch/x86/kernel/early-quirks.c:40: warning: nvidia_hpet_check defined but not used
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix !CONFIG_SMP warning:
arch/x86/kernel/acpi/processor.c: In function arch_acpi_processor_init_pdc:
arch/x86/kernel/acpi/processor.c:65: warning: unused variable cpu
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel only ever supports 1 version of the boot protocol
so there is no need to check the boot protocol revision to
see if a feature is supported.
Both x86 and x86_64 support the same boot protocol so we need
to implement the KEEP_SEGMENTS on x86_64 as well. It isn't
just paravirt bootloaders that could use this functionality.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There were two problems. Firstly, someone forgot the struct keyword in
front of cpuinfo_x86, so I take it this wasn't even compile checked.
Secondly, the actual definition has this as a SHARED_ALIGNED, so the
definitions mismatch.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
KVM uses smp_call_function_mask and therefor need smp_ops to be exported.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 6442eea937.
The patch breaks smp_ops and needs to be reverted. The solution to
allow modular build of KVM is to export smp_ops instead.
Pointed-out-by: James Bottomley
<jejb> tglx, so write out 100 times "voyager is a useful architecture" ...
<tglx> yes, Sir
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup:
x86 setup: sizeof() is unsigned, unbreak comparisons
x86 setup: handle boot loaders which set up the stack incorrectly
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86:
x86: kill the old i386 and x86_64 directories
x86: move i386 and x86_64 Kconfig files to x86 directory
kconfig: small code refactoring in kconfig Makefile
x86: unification of i386 and x86_64 Kconfig.debug
x86: move defconfig files for i386 and x86_64 to x86
x86: move i386 and x86_64 Makefiles to arch/x86
We use signed values for limit checking since the values can go
negative under certain circumstances. However, sizeof() is unsigned
and forces the comparison to be unsigned, so move the comparison into
the heap_free() macros so we can ensure it is a signed comparison.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Apparently some specific versions of LILO enter the kernel with a
stack pointer that doesn't match the rest of the segments. Make our
best attempt at untangling the resulting mess.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'sg' of git://git.kernel.dk/linux-2.6-block:
fix sg_phys to use dma_addr_t
ub: add sg_init_table for sense and read capacity commands
x86: pci-gart fix
blackfin: fix sg fallout
xtensa: dma-mapping.h is using linux/scatterlist.h functions, so include it
SG: audit of drivers that use blk_rq_map_sg()
arch/um/drivers/ubd_kern.c: fix a building error
SG: Change sg_set_page() to take length and offset argument
AVR32: Fix sg_page breakage
mmc: sg fallout
m68k: sg fallout
More SG build fixes
sg: add missing sg_init_table calls to zfcp
SG build fix
Adding proper dependencies so the two Kconfig.debug files
are now identical and move the result file to x86.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
With some small changes to kconfig makefile we can now
locate the defconfig files for i386 and x86_64 in
the configs/ subdirectory under x86.
make ARCH=i386 defconfig and make defconfig
works as expected also after this change.
But arch maintainers shall now update a defconfig file in
the configs/ directory.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Moving the ARCH specific Makefiles for i386 and x86_64
required a litle bit tweaking in the top-lvel Makefile.
SRCARCH is now set in the top-level Makefile
because we need this info to include the correct
arch Makefile.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Ensure we fixup the IRQ state before we hit any locking code.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
map_sg could copy the last sg element to another position (if merging
some elements). It breaks sg chaining. This copies only
dma_address/length instead of the whole sg element.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Went through the documentation doing typo and content fixes. This
patch contains only comment and whitespace changes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix this error (i386 !SMP build)
arch/x86/lguest/boot.c: In function ‘lguest_init’:
arch/x86/lguest/boot.c:1059: error: ‘pm_power_off’ undeclared (first use in this function)
by including linux/pm.h.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix this error (i386 !SMP build):
arch/x86/lguest/boot.c: In function lguest_init:
arch/x86/lguest/boot.c:1059: error: pm_power_off undeclared (first use in this function)
by including linux/pm.h.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'irq-upstream' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
[SPARC, XEN, NET/CXGB3] use irq_handler_t where appropriate
drivers/char/riscom8: clean up irq handling
isdn/sc: irq handler clean
isdn/act2000: fix major bug. clean irq handler.
char/pcmcia/synclink_cs: trim trailing whitespace
drivers/char/ip2: separate polling and irq-driven work entry points
drivers/char/ip2: split out irq core logic into separate function
[NETDRVR] lib82596, netxen: delete pointless tests from irq handler
Eliminate pointless casts from void* in a few driver irq handlers.
[PARPORT] Remove unused 'irq' argument from parport irq functions
[PARPORT] Kill useful 'irq' arg from parport_{generic_irq,ieee1284_interrupt}
[PARPORT] Consolidate code copies into a single generic irq handler
Rather than hand-rolling our own prototype, make the code more
future-proof by using the standard irq_handler_t typedef.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add support to force_hpet for all known MCP55 (nForce 5) chipset
LPC bridges.
These are the untested nForce 5 chips (taken from Mikko's original
patch, and checked against pci.ids).
Signed-off-by: Carlos Corbacho <cathectic@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/quirks.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
This patch adds a quirk from LinuxBIOS to force enable HPET on
the nVidia CK804 (nForce 4) chipset.
This quirk can very likely support more than just nForce 4
(LinuxBIOS use the same code for nForce 5), and possibly nForce 3,
but I don't have those chipsets, so cannot add and test them.
Tested on an Abit KN9 (CK804).
Signed-off-by: Carlos Corbacho <cathectic@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Documentation/kernel-parameters.txt | 3 +-
arch/x86/kernel/quirks.c | 37 +++++++++++++++++++++++++++++++++++-
2 files changed, 38 insertions(+), 2 deletions(-)
Make <asm/setup.h> usable by the boot code.
Clean up vestiges of the old command-line protocol from setup.h and
head_32.S (it is still supported from the boot loader point of
view, since it is converted to the new command-line protocol by the
boot code.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
During hibernation and suspend on x86_64 save CPU registers in the saved_context
structure rather than in a handful of separate variables.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The x86_64 arch/x86/kernel/Makefile uses references into
arch/x86/kernel/cpu/... to use code from there.
Unifiy it with the nicely structured i386 way and reuse the existing
subdirectory make rules.
Also move the machine check related source into ...kernel/cpu/mcheck,
where the other machine check related code is.
No code change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move mce.c to mce_32.c to allow the later move of the x86_64 mce.c
from arch/x86/kernel/ to ...kernel/cpu/mcheck
No code change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Prepare the makefiles in x86/kernel/cpu and x86/kernel/cpu/mcheck to
be used by the x86_64 build as well.
No code change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Most of contents in crash are same.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The previous patch wasn't correctly handling the 'count' variable. If
a CPU gave bad results on the 1st or 2nd run but good results on the
3rd, it wouldn't do the correct thing. No idea if any such CPU
exists, but the patch below handles that case by discarding the bad
runs.
If a bad result (too quick, or too slow) occurs on any of the 3 runs
it will be discarded.
Also updated some comments to explain what's going on.
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I ran into this problem on a system that was unable to obtain NTP sync
because the clock was running very slow (over 10000ppm slow). ntpd had
declared all of its peers 'reject' with 'peer_dist' reason.
On investigation, the tsc_khz variable was significantly incorrect
causing xtime to run slow. After a reboot tsc_khz was correct so I
did a reboot test to see how often the problem occurred:
Test was done on a 2000 Mhz Xeon system. Of 689 reboots, 8 of them
had unacceptable tsc_khz values (>500ppm):
range of tsc_khz # of boots % of boots
---------------- ---------- ----------
< 1999750 0 0.000%
1999750 - 1999800 21 3.048%
1999800 - 1999850 166 24.128%
1999850 - 1999900 241 35.029%
1999900 - 1999950 211 30.669%
1999950 - 2000000 42 6.105%
2000000 - 2000000 0 0.000%
2000050 - 2000100 0 0.000%
[...]
2000100 - 2015000 1 0.145% << BAD
2015000 - 2030000 6 0.872% << BAD
2030000 - 2045000 1 0.145% << BAD
2045000 < 0 0.000%
The worst boot was 2032.577 Mhz, over 1.5% off!
It appears that on rare occasions, mach_countup() is taking longer to
complete than necessary.
I suspect that this is caused by the CPU taking a periodic SMI
interrupt right at the end of the 30ms calibration loop. This would
cause the loop to delay while the SMI BIOS hander runs. The resulting
TSC value is beyond what it actually should be resulting in a higher
tsc_khz.
The below patch makes native_calculate_cpu_khz() take the best
(shortest duration, lowest khz) run of it's 3 calibration loops. If a
SMI goes off causing a bad result (long duration, higher khz) it will
be discarded.
With the patch applied, 300 boots of the same system produce good
results:
range of tsc_khz # of boots % of boots
---------------- ---------- ----------
< 1999750 0 0.000%
1999750 - 1999800 30 10.000%
1999800 - 1999850 166 55.333%
1999850 - 1999900 89 29.667%
1999900 - 1999950 15 5.000%
1999950 < 0 0.000%
Problem was found and tested against 2.6.18. Patch is against 2.6.22.
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It seems commit 09cadedbdc was incomplete
due to a clash with the x86 architecture merge.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Version 2.07 of the boot protocol uses 0x23C for the hardware_subarch
field, that for lguest is "1". This allows us to use the standard
boot entry point rather than the "GenuineLguest" string hack.
The standard entry point also clears the BSS and copies the boot parameters
and commandline for us, saving more code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes lguest able to use the virtio devices.
We change the device descriptor page from a simple array to a variable
length "type, config_len, status, config data..." format, and
implement virtio_config_ops to read from that config data.
We use the virtio ring implementation for an efficient Guest <-> Host
virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the
host when it changes.
We also use LHCALL_NOTIFY on kernel addresses for very very early
console output. We could have another hypercall, but this hack works
quite well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This gets rid of the lguest bus, drivers and DMA mechanism, to make
way for a generic virtio mechanism.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio. Unlike the previous lguest
implementation:
1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
cacheline.
5) We do a modulo on a variable. We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.
Users need only get the ring pages and provide a notify hook (KVM
wants the guest to allocate the rings, lguest does it sanely).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
1) This allows us to get alot closer to booting bzImages.
2) It means we don't have to know page_offset.
3) The Guest needs to modify the boot pagetables to create the
PAGE_OFFSET mapping before jumping to C code.
4) guest_pa() walks the page tables rather than using page_offset.
5) We don't use page_offset to figure out whether to emulate: it was
always kinda quesationable, and won't work for instructions done
before remapping (bzImage unpacking in particular).
6) We still want the kernel address for tlb flushing: have the initial
hypercall give us that, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).
This patch allows Guests to specify what system call vector they want,
and we try to reserve it. We only allow one non-Linux system call
vector, to try to avoid DoS on the Host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Clean up the hypercall code to make the code in hypercalls.c
architecture independent. First process the common hypercalls and
then call lguest_arch_do_hcall() if the call hasn't been handled.
Rename struct hcall_ring to hcall_args.
This patch requires the previous patch which reorganize the layout of
struct lguest_regs on i386 so they match the layout of struct
hcall_args.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Separate i386 architecture specific from core.c and move it to
x86/core.c and add x86/lguest.h header file to match.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Lguest has two sides: host support (to launch guests) and guest
support (replacement boot path and paravirt_ops). This moves the
guest side to arch/x86/lguest where it's closer to related code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST
menu.
2) Make those options select CONFIG_PARAVIRT, as suggested by Andi.
3) Make kconfig help titles consistent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
* 'sg' of git://git.kernel.dk/linux-2.6-block:
Add CONFIG_DEBUG_SG sg validation
Change table chaining layout
Update arch/ to use sg helpers
Update swiotlb to use sg helpers
Update net/ to use sg helpers
Update fs/ to use sg helpers
[SG] Update drivers to use sg helpers
[SG] Update crypto/ to sg helpers
[SG] Update block layer to use sg helpers
[SG] Add helpers for manipulating SG entries
This patch should apply cleanly to the 2.6.23-git7 kernel. It changes the
powernow-k8 driver code that deals with 3rd generation Opteron, Phenom,
and later processors to match the architectural pstate driver described
in the AMD64 Architecture Programmer's Manual Volume 2 Chapter 18. The
initial implementation of the hardware pstate driver for PowerNow!
used some processor-version specific features, and would not be
maintainable in the long term as the processor features changed.
This architectural driver should work on all future AMD processors.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Add the BSS to the resource tree just as kernel text and kernel data are in
the resource tree. The main reason behind this is to avoid crashkernel
reservation in that area.
While it's not strictly necessary to have the BSS in the resource tree (the
actual collision detection is done in the reserve_bootmem() function before),
the usage of the BSS resource should be presented to the user in /proc/iomem
just as Kernel data and Kernel code.
Note: The patch currently is only implemented for x86 and ia64 (because
efi_initialize_iomem_resources() has the same signature on i386 and ia64).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we fix all the opensource gfx drivers to use the DMA api's, at that time
we can yank this config options out.
[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MSI interrupt handler registrations and fault handling support for Intel-IOMMU
hadrware.
This patch enables the MSI interrupts for the DMA remapping units and in the
interrupt handler read the fault cause and outputs the same on to the console.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Actual intel IOMMU driver. Hardware spec can be found at:
http://www.intel.com/technology/virtualization
This driver sets X86_64 'dma_ops', so hook into standard DMA APIs. In this
way, PCI driver will get virtual DMA address. This change is transparent to
PCI drivers.
[akpm@linux-foundation.org: remove unneeded cast]
[akpm@linux-foundation.org: build fix]
[bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses the updated boot protocol to do paravirtualized boot.
If the boot version is >= 2.07, then it will do two things:
1. Check the bootparams loadflags to see if we should reload the
segment registers and clear interrupts. This is appropriate
for normal native boot and some paravirtualized environments, but
inapproprate for others.
2. Check the hardware architecture, and dispatch to the appropriate
kernel entrypoint. If the bootloader doesn't set this, then we
simply do the normal boot sequence.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updates for version 2.07 of the boot protocol. This includes:
load_flags.KEEP_SEGMENTS- flag to request/inhibit segment reloads
hardware_subarch - what subarchitecture we're booting under
hardware_subarch_data - per-architecture data
The intention of these changes is to make booting a paravirtualized
kernel work via the normal Linux boot protocol.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
fix do_sys_open() prototype
sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
Documentation: Fix typo in SubmitChecklist.
Typo: depricated -> deprecated
Add missing profile=kvm option to Documentation/kernel-parameters.txt
fix typo about TBI in e1000 comment
proc.txt: Add /proc/stat field
small documentation fixes
Fix compiler warning in smount example program from sharedsubtree.txt
docs/sysfs: add missing word to sysfs attribute explanation
documentation/ext3: grammar fixes
Documentation/java.txt: typo and grammar fixes
Documentation/filesystems/vfs.txt: typo fix
include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
trivial copy_data_pages() tidy up
Fix typo in arch/x86/kernel/tsc_32.c
file link fix for Pegasus USB net driver help
remove unused return within void return function
Typo fixes retrun -> return
x86 hpet.h: remove broken links
...
a rather obvious fix given the opening of the function:
...
static __init int no_ipi_broadcast(char *str)
{
get_option(&str, &no_broadcast);
...
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (33 commits)
x86: convert cpuinfo_x86 array to a per_cpu array
x86: introduce frame_pointer() and stack_pointer()
x86 & generic: change to __builtin_prefetch()
i386: do not BUG_ON() when MSR is unknown
x86: acpi use cpu_physical_id
x86: convert cpu_llc_id to be a per cpu variable
x86: convert cpu_to_apicid to be a per cpu variable
i386: introduce "used_vectors" bitmap which can be used to reserve vectors.
x86: use raw locks during oopses
x86: honor _PAGE_PSE bit on page walks
i386: do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
x86: implement missing x86_64 function smp_call_function_mask()
x86: use descriptor's functions instead of inline assembly
i386: consolidate show_regs and show_registers for i386
i386: make callgraph use dump_trace() on i386/x86_64
x86: enable iommu_merge by default
i386: i386 add AMD64 Barcelona PMU MSR definitions to msr.h
x86: Unify i386 and x86-64 early quirks
x86: enable HPET on ICH3 and ICH4
x86: force enable HPET on VT8235/8237 chipsets
...
Manually fix trivial conflict with task pid container helper changes in
arch/x86/kernel/process_32.c
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
This patch removes the crashkernel parsing from
arch/x86_64/kernel/machine_kexec.c and calls the generic function, introduced
in the last patch, in setup_bootmem_allocator().
This is necessary because the amount of System RAM must be known in this
function now because of the new syntax.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the crashkernel parsing from
arch/i386/kernel/machine_kexec.c and calls the generic function, introduced in
the last patch, in setup_bootmem_allocator().
This is necessary because the amount of System RAM must be known in this
function now because of the new syntax.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
copy_oldmem_page should not return leaving a page frame from the
previous kernel mapped.
Signed-off-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cpu_data is currently an array defined using NR_CPUS. This means that
we overallocate since we will rarely really use maximum configured cpus.
When NR_CPU count is raised to 4096 the size of cpu_data becomes
3,145,728 bytes.
These changes were adopted from the sparc64 (and ia64) code. An
additional field was added to cpuinfo_x86 to be a non-ambiguous cpu
index. This corresponds to the index into a cpumask_t as well as the
per_cpu index. It's used in various places like show_cpuinfo().
cpu_data is defined to be the boot_cpu_data structure for the NON-SMP
case.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch defines frame_pointer() and stack_pointer() similar to the
already defined instruction_pointer(). Thus the oprofile code can be
written in a more readable fashion.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Here is a small patch to change the behavior of the PMU msr allocator
to avoid BUG_ON() when the MSR is unknwon. Instead, it now returns
ok, which means "I do not manage". The current allocator is not
yet managing the full set of PMU registers (e.g., GLOBAL_* on Core 2).
[watchdog] do not BUG_ON() in the MSR allocator if MSR is unknown, return ok
instead
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is from an earlier message from Christoph Lameter:
processor_core.c currently tries to determine the apicid by special casing
for IA64 and x86. The desired information is readily available via
cpu_physical_id()
on IA64, i386 and x86_64.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Additionally, boot_cpu_id needed to be exported to fix compile errors in
dma code when !CONFIG_SMP.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert cpu_llc_id from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpu_llc_id) * NR unused cpus. Access is
mostly from startup and CPU HOTPLUG functions.
Note there's an additional change of the type of cpu_llc_id from int to
u8 for ARCH i386 to correspond with the same type in ARCH x86_64.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch converts the x86_cpu_to_apicid array to be a per cpu
variable. This saves sizeof(apicid) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
MP_processor_info() is one of the functions that require access to the
x86_cpu_to_apicid array before the per_cpu data area is setup. For this
case, a pointer to the __initdata array is initialized in setup_arch()
and removed in smp_prepare_cpus() after the per_cpu data area is
initialized.
A second change is included to change the initial array value of ARCH
i386 from 0xff to BAD_APICID to be consistent with ARCH x86_64.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This simplifies the io_apic.c __assign_irq_vector() logic and removes
the explicit SYSCALL_VECTOR check, and also allows for vectors to be
reserved by other mechanisms (ie. lguest).
[ tglx: arch/x86 adaptation ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Don't want any lockdep or other fragile machinery to run during oopses.
Use raw spinlocks directly for oops locking.
Also disables irq flag tracing there.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch defines the missing function smp_call_function_mask() for x86_64,
this is more or less a cut&paste of i386 function. It removes also some
duplicate code.
This function is needed by KVM to execute a function on some CPUs.
AK: Fixed description
AK: Moved WARN_ON(irqs_disabled) one level up to not warn in the panic case.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch provides a new set of functions for managing the descriptor
tables that can be used instead of putting the raw assembly in .c files.
Remodeling of store_tr() suggested by Frederik Deweerdt.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both functions printk the same information, except for CRx and
debug registers in the show_registers() one and a bit different
manner. So move the common code into one place. This is already
done for x86_64, so I think it's worth having the same on i386.
This saves 100 bytes of .rodata section :) ...
but only 8 from .text :(
[ tglx: arch/x86 adaptation ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch improves oprofile callgraphs for i386/x86_64. The old
backtracing code was unable to produce kernel backtraces if the
kernel wasn't compiled with framepointers. The code now uses
dump_trace().
[ tglx: arch/x86 adaptation ]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They were already very similar; just use the same file now.
[ tglx: arch/x86 adaptation ]
Cc: lenb@kernel.org
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ICH3 and ICH4 have undocumented HPET capabilities. This patch enables
HPET for platforms based around these ICHs.
Tested on various ICH3 and ICH4 platforms.
Because HPET is not officially documented for ICH3/4 and may not have
been validated by chipset folks, we're on thin ice here. I'd recommend
testing this patch in -hrt or -mm for a while and wait for
success/failure reports before feeding it upstream.
tglx: depends on the force_hpet command line option !
Signed-off-by: Udo A. Steinberg <us15@os.inf.tu-dresden.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds quirks to force enable HPET on Via VT8235 and
VT8237 chipsets. The datasheet for 8237 documents HPET
functionality (although wrongly) whereas HPET is undocumented
for 8235.
Tested on A7V880 (8237) and K7VT4A+ (8235) boards.
tglx: depends on the force_hept commandline option
Signed-off-by: Udo A. Steinberg <us15@os.inf.tu-dresden.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
add force_hpet boot option.
(this will be useful to make the forced-enable quirks depend on.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
don't zero pad addresses in segfault message. Matches the other trap
messages. This leaves some more space for the new file name.
[ mingo: arch/x86 adaptation ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This makes x86-64's ia32 code use the new linux/elfcore-compat.h, reducing
some hand-copied duplication.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Old debugging code that is not really needed anymore. If someone
wants it it would be better replaced with a systemtap script or
kprobe.
This avoids a potential cache miss during page fault processing.
[ mingo: arch/x86 adaptation ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Simple cosmetic update for the cs5536 reboot fixup; define the MSR that's used
for rebooting in geode.h, and use the define.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 NUMA kernels crash in the scheduler setup code if "nosmp" or
"maxcpus=0" is passed on the boot command line:
| Brought up 1 CPUs
| BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
| printing eip: c011f0b5 *pde = 00000000
| Oops: 0000 [#1] SMP
|
| Pid: 1, comm: swapper Not tainted (2.6.23 #67)
| EIP: 0060:[<c011f0b5>] EFLAGS: 00010246 CPU: 0
| EIP is at sd_degenerate+0x35/0x40
the reason is sloppy spaghetti code in smpboot_32.c that resulted in a
missing map_cpu_to_logical_apicid() call - which also had the side-effect
of setting up the cpu_2_node[] entry for the lone CPU. That resulted in
node_to_cpumask(0) resulting in 00000000 - confusing the sched-domains
setup code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Appended patch fixes an oops while changing the vsyscall sysctl.
I am sure no one tested this code before integrating into mainline :(
BTW, using ioremap() in vsyscall_sysctl_change() to get the virtual
address of a kernel symbol sounds like an over kill.. I wonder if we
can define a simple __va_vsymbol() which will return directly the
kernel direct mapping. comments in the code which says gcc has trouble
with __va(__pa()) sounds bogus to me. __pa() on a vsyscall address will
not work anyhow :(
And also, the whole nop out syscall in vsyscall page infrastructure
(vsyscall_sysctl_change()) is added to make some attacks difficult,
and yet I don't see this nop out being done by default. This area
requires more cleanups?
Fix an oops with __pa_vsymbol(). VSYSCALL_FIRST_PAGE is a fixmap index.
We want the starting virtual address of the vsyscall page and not the index.
[ mingo: arch/x86 adaptation ]
Reported-by: Yanmin Zhang <yanmin.zhang@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
While we were reviewing pageattr_32/64.c for unification,
Thomas Gleixner noticed the following serious SMP bug in
global_flush_tlb():
down_read(&init_mm.mmap_sem);
list_replace_init(&deferred_pages, &l);
up_read(&init_mm.mmap_sem);
this is SMP-unsafe because list_replace_init() done on two CPUs in
parallel can corrupt the list.
This bug has been introduced about a year ago in the 64-bit tree:
commit ea7322decb
Author: Andi Kleen <ak@suse.de>
Date: Thu Dec 7 02:14:05 2006 +0100
[PATCH] x86-64: Speed and clean up cache flushing in change_page_attr
down_read(&init_mm.mmap_sem);
- dpage = xchg(&deferred_pages, NULL);
+ list_replace_init(&deferred_pages, &l);
up_read(&init_mm.mmap_sem);
the xchg() based version was SMP-safe, but list_replace_init() is not.
So this "cleanup" introduced a nasty bug.
why this bug never become prominent is a mystery - it can probably be
explained with the (still) relative obscurity of the x86_64 architecture.
the safe fix for now is to write-lock init_mm.mmap_sem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Get rid of sparse related warnings from places that use integer as NULL
pointer.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
msr_class_cpu_callback() can be marked __cpuinit, being the notifier callback
for a __cpuinitdata notifier_block. So can be marked msr_device_create() too,
called only from the newly-__cpuinit msr_class_cpu_callback() or from
__init-marked msr_init().
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix resource leakage in error case within detect_cache_attributes()
- Don't register hotcpu notifier when cache_add_dev() returns error
- Introduce cache_dev_map cpumask to track whether cache interface for
CPU is successfully added by cache_add_dev() or not.
cache_add_dev() may fail with out of memory error. In order to
avoid cache_remove_dev() with that uninitialized cache interface when
CPU_DEAD event is delivered we need to have the cache_dev_map cpumask.
(We cannot change cache_add_dev() from CPU_ONLINE event handler
to CPU_UP_PREPARE event handler. Because cache_add_dev() needs
to do cpuid and store the results with its CPU online.)
[nix.or.die@googlemail.com: fix a section mismatch warning]
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Clear kobject in percpu device_mce before calling sysdev_register() with
Because mce_create_device() may fail and it leaves kobject filled with
junk. It will be the problem when mce_create_device() will be called
next time.
- Fix error handling in mce_create_device()
Error handling should not do sysdev_remove_file() with not yet added
attributes.
- Don't register hotcpu notifier when mce_create_device() returns error
- Do mce_create_device() in CPU_UP_PREPARE instead of CPU_ONLINE
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
include powerpc), it is possible for the vdso to get out of sync if a user
calls (admittedly unusual) settimeofday(NULL, ptr).
This patch adds a hook for architectures that set
CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also
updatee their copy in the vdso.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use temporary page tables for the kernel text mapping during hibernation
restore on x86_64.
Without the patch, the original boot kernel's page tables that represent the
kernel text mapping are used while the core of the image kernel is being
restored. However, in principle, if the boot kernel is not identical to the
image kernel, the location of these page tables in the image kernel need not
be the same, so we should create a safe copy of the kernel text mapping prior
to restoring the core of the image kernel.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we already pass the address of restore_registers() in the image header,
we can also pass the value of the CR3 register from before the hibernation in
the same way. This will allow us to avoid using init_level4_pgt page tables
during the restore.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to restore a hibernation image on x86_64 with the help of a
kernel different from the one in the image.
The idea is to split the core restoration code into two separate parts and to
place each of them in a different page. The first part belongs to the boot
kernel and is executed as the last step of the image kernel's memory
restoration procedure. Before being executed, it is relocated to a safe page
that won't be overwritten while copying the image kernel pages.
The final operation performed by it is a jump to the second part of the core
restoration code that belongs to the image kernel and has just been restored.
This code makes the CPU switch to the image kernel's page tables and restores
the state of general purpose registers (including the stack pointer) from
before the hibernation.
The main issue with this idea is that in order to jump to the second part of
the core restoration code the boot kernel needs to know its address.
However, this address may be passed to it in the image header. Namely, the
part of the image header previously used for checking if the version of the
image kernel is correct can be replaced with some architecture specific data
that will allow the boot kernel to jump to the right address within the image
kernel. These data should also be used for checking if the image kernel is
compatible with the boot kernel (as far as the memory restroration procedure
is concerned). It can be done, for example, with the help of a "magic" value
that has to be equal in both kernels, so that they can be regarded as
compatible.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes old debugging stuff, that should be no longer neccessary. It
accessed VGA hardware (which may not be ready at this point), and used LEDs
at port 80 for debugging.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
make clean failed to delete a few files in
x86/kernel. This is because kbuild does not
see the correct/full kernel/Makefile.
As a workaround until the Makefiles are merged specify
the files to be deleted in the common Makefile.
Reported by Mike Galbraith <efault@gmx.de>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix rebuild of kernel when there is no changes.
This happened for i386.
Using make V=2 hinted that the output files were
not assigned to targets - fixed by this patch.
Reported by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch:
- makes .gitignore files visible to git
- makes arch/x86/kernel/vsyscall_32.lds and arch/i386/boot invisible
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I can't see the reason ". = VDSO_PRELINK + 0x900;" was ever there in
the linker script for the x86_64 vDSO. I can't find anything that
depends on this magic offset, or that should care at all about the
particular location of of the .data section (all from vvar.c) in the
vDSO image. If it is really desireable to place .data at 0x900, then it
should be after all the other sections so they fill in the space up to
0x900.
This removes the 0x900 magic and cleans up the output sections generally
in the vDSO linker script. This saves a few hundred bytes in the size
of the vDSO file, bringing it back well under 4kb total so that its vma
only needs one page.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>