Commit graph

1941 commits

Author SHA1 Message Date
Linus Torvalds
7019b1b500 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix 2.6.27rc1 cannot boot more than 8CPUs
  x86: make "apic" an early_param() on 32-bit, NULL check
  EFI, x86: fix function prototype
  x86, pci-calgary: fix function declaration
  x86: work around gcc 3.4.x bug
  x86: make "apic" an early_param() on 32-bit
  x86, debug: tone down arch/x86/kernel/mpparse.c debugging printk
  x86_64: restore the proper NR_IRQS define so larger systems work.
  x86: Restore proper vector locking during cpu hotplug
  x86: Fix broken VMI in 2.6.27-rc..
  x86: fdiv bug detection fix
2008-08-11 16:44:35 -07:00
Randy Dunlap
b0fbaa6b59 EFI, x86: fix function prototype
Fix function prototype in header file to match source code:

linux-next-20080807/arch/x86/kernel/efi_64.c💯14: error: symbol 'efi_ioremap' redeclared with different type (originally declared at include2/asm/efi.h:89) - different address spaces

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11 18:50:19 +02:00
Cyrill Gorcunov
2ae111cdd8 x86: apic interrupts - move assignments to irqinit_32.c, v2
64bit mode APIC interrupt handlers are set within irqinit_64.c.
Lets do tha same for 32bit mode which would help in furter code merging.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11 16:43:09 +02:00
Ingo Molnar
6de9c70882 Merge branch 'linus' into x86/cleanups 2008-08-11 12:57:01 +02:00
Ingo Molnar
8067794bec Merge branch 'linus' into x86/x2apic
Conflicts:

	arch/x86/kernel/genapic_64.c

Manual merge:

	arch/x86/kernel/genx2apic_uv_x.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11 11:19:20 +02:00
Eric W. Biederman
3c7569b284 x86_64: restore the proper NR_IRQS define so larger systems work.
As pointed out and tracked by Yinghai Lu <yhlu.kernel@gmail.com>:

 Dhaval Giani got:
 kernel BUG at arch/x86/kernel/io_apic_64.c:357!
 invalid opcode: 0000 [1] SMP
 CPU 24
 ...

his system (x3950) has 8 ioapic, irq > 256

This was caused by:

       commit 9b7dc567d0
       Author: Thomas Gleixner <tglx@linutronix.de>
       Date:   Fri May 2 20:10:09 2008 +0200

          x86: unify interrupt vector defines

          The interrupt vector defines are copied 4 times around with minimal
          differences. Move them all into asm-x86/irq_vectors.h

It appears that Thomas did not notice that x86_64 does something
completely different when he merge irq_vectors.h

We can solve this for 2.6.27 by simply reintroducing the old heuristic
for setting NR_IRQS on x86_64 to a usable value, which trivially removes
the regression.

Long term it would be nice to harmonize the handling of ioapic interrupts
of x86_32 and x86_64 so we don't have this kind of confusion.

Dhaval Giani <dhaval@linux.vnet.ibm.com> tested an earlier version of
this patch by YH which confirms simply increasing NR_IRQS fixes the
problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11 10:39:04 +02:00
Eric W. Biederman
d388e5fdc4 x86: Restore proper vector locking during cpu hotplug
Having cpu_online_map change during assign_irq_vector can result
in some really nasty and weird things happening.  The one that
bit me last time was accessing non existent per cpu memory for non
existent cpus.

This locking was removed in a sloppy x86_64 and x86_32 merge patch.

Guys can we please try and avoid subtly breaking x86 when we are
merging files together?

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-11 10:37:34 +02:00
Marcin Slusarz
bafc1dae82 x86: mmconf: fix section mismatch warning
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi()
The function __cpuinit init_amd() references
a function __init check_enable_amd_mmconf_dmi().
If check_enable_amd_mmconf_dmi is only used by init_amd then
annotate check_enable_amd_mmconf_dmi with a matching annotation.

check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-10 21:13:08 -07:00
Linus Torvalds
84ff7a0012 Merge branch 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.27' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: s390: Fix kvm on IBM System z10
  KVM: Advertise synchronized mmu support to userspace
  KVM: Synchronize guest physical memory map to host virtual memory map
  KVM: Allow browsing memslots with mmu_lock
  KVM: Allow reading aliases with mmu_lock
2008-08-01 12:48:16 -07:00
Ingo Molnar
4b336b0625 Merge branch 'x86/urgent' into x86/xen 2008-07-31 12:41:34 +02:00
Ingo Molnar
5fbf24659b Merge branch 'linus' into x86/xen 2008-07-31 12:38:04 +02:00
H. Peter Anvin
6152e4b1c9 x86, xsave: keep the XSAVE feature mask as an u64
The XSAVE feature mask is a 64-bit number; keep it that way, in order
to avoid the mistake done with rdmsr/wrmsr.  Use the xsetbv() function
provided in the previous patch.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:50:35 +02:00
H. Peter Anvin
b4a091a62c x86, xsave: add <asm/xcr.h> header file for XCR registers
Add <asm-x86/xcr.h> header file for the XCR registers and their access
functions.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:36 +02:00
Suresh Siddha
c37b5efea4 x86, xsave: save/restore the extended state context in sigframe
On cpu's supporting xsave/xrstor, fpstate pointer in the sigcontext, will
include the extended state information along with fpstate information. Presence
of extended state information is indicated by the presence
of FP_XSTATE_MAGIC1 at fpstate.sw_reserved.magic1 and FP_XSTATE_MAGIC2
at fpstate + (fpstate.sw_reserved.extended_size - FP_XSTATE_MAGIC2_SIZE).

Extended feature bit mask that is saved in the memory layout is represented
by the fpstate.sw_reserved.xstate_bv

For RT signal frames, UC_FP_XSTATE in the uc_flags also indicate the
presence of extended state information in the sigcontext's fpstate
pointer.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:27 +02:00
Suresh Siddha
bdd8caba5e x86, xsave: struct _fpstate extensions to include extended state information
Bytes 464..511 in the current 512byte layout of fxsave/fxrstor
frame, are reserved for SW usage. On cpu's supporting xsave/xrstor, these bytes
are used to extended the fpstate pointer in the sigcontext, which now includes
the extended state information along with fpstate information.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:27 +02:00
Suresh Siddha
9dc89c0f96 x86, xsave: xsave/xrstor specific routines
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:26 +02:00
Suresh Siddha
ab5137015f x86, xsave: reorganization of signal save/restore fpstate code layout
move 64bit routines that saves/restores fpstate in/from user stack from
signal_64.c to xsave.c

restore_i387_xstate() now handles the condition when user passes
NULL fpstate.

Other misc changes for prepartion of xsave/xrstor sigcontext support.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:26 +02:00
Suresh Siddha
3c1c7f1014 x86, xsave: dynamically allocate sigframes fpstate instead of static allocation
dynamically allocate fpstate on the stack, instead of static allocation
in the current sigframe layout on the user stack. This will allow the
fpstate structure to grow in the future, which includes extended state
information supporting xsave/xrstor.

signal handlers will be able to access the fpstate pointer from the
sigcontext structure asusual, with no change. For the non RT sigframe's
(which are supported only for 32bit apps), current static fpstate layout
in the sigframe will be unused(so that we don't change the extramask[]
offset in the sigframe and thus prevent breaking app's which modify
extramask[]).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:25 +02:00
Suresh Siddha
b359e8a434 x86, xsave: context switch support using xsave/xrstor
Uses xsave/xrstor (instead of traditional fxsave/fxrstor) in context switch
when available.

Introduces TS_XSAVE flag, which determine the need to use xsave/xrstor
instructions during context switch instead of the legacy fxsave/fxrstor
instructions. Thread-synchronous status word is already in L1 cache during
this code patch and thus minimizes the performance penality compared to
(cpu_has_xsave) checks.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:24 +02:00
Suresh Siddha
dc1e35c6e9 x86, xsave: enable xsave/xrstor on cpus with xsave support
Enables xsave/xrstor by turning on cr4.osxsave on cpu's which have
the xsave support. For now, features that OS supports/enabled are
FP and SSE.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:24 +02:00
Suresh Siddha
a648bf4632 x86, xsave: xsave cpuid feature bits
Add xsave CPU feature bits.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:49:23 +02:00
Ingo Molnar
15dd859cac Merge commit 'v2.6.27-rc1' into x86/core
Conflicts:

	include/asm-x86/dma-mapping.h
	include/asm-x86/namei.h
	include/asm-x86/uaccess.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-30 19:33:48 +02:00
Ingo Molnar
b2d9d33412 Merge branch 'x86/fpu' into x86/core 2008-07-30 19:32:39 +02:00
FUJITA Tomonori
8978b74253 generic, x86: fix add iommu_num_pages helper function
This IOMMU helper function doesn't work for some architectures:

  http://marc.info/?l=linux-kernel&m=121699304403202&w=2

It also breaks POWER and SPARC builds:

  http://marc.info/?l=linux-kernel&m=121730388001890&w=2

Currently, only x86 IOMMUs use this so let's move it to x86 for
now.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-29 12:12:48 +02:00
Ingo Molnar
3825c9e8d0 Merge commit 'v2.6.27-rc1' into x86/microcode
Conflicts:

	arch/x86/kernel/microcode.c

Manual resolutions:

	arch/x86/kernel/microcode_amd.c
	arch/x86/kernel/microcode_intel.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-29 11:54:24 +02:00
Andrea Arcangeli
e930bffe95 KVM: Synchronize guest physical memory map to host virtual memory map
Synchronize changes to host virtual addresses which are part of
a KVM memory slot to the KVM shadow mmu.  This allows pte operations
like swapping, page migration, and madvise() to transparently work
with KVM.

Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29 12:33:53 +03:00
Ingo Molnar
cb28a1bbdb Merge branch 'linus' into core/generic-dma-coherent
Conflicts:

	arch/x86/Kconfig

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-29 00:07:55 +02:00
Peter Oruba
8d86f390d9 x86: major refactoring
Refactored code by introducing a two-module solution.

There is one general module in which vendor specific modules can hook into.
However, that is exclusive, there is only one vendor specific module
allowed at a time. A CPU vendor check makes sure only the correct
module for the underlying system gets called.

Functinally in terms of patch loading itself there are no changes. This
refactoring provides a basis for future implementations of other vendors'
patch loaders.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 19:57:57 +02:00
Peter Oruba
26bf7a48c3 x86: first step of refactoring, introducing microcode_ops
Refactoring with the goal of having one general module and separate
vendor specific modules that hook into the general one.

Microcode_ops is a function pointer structure in which vendor
specific modules will enter all functions that differ between
vendors and that need to be accessed from the general module.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 19:57:57 +02:00
Peter Oruba
9835fd4ad9 x86: add AMD specific declarations
Added AMD specific declarations to header file.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 19:57:56 +02:00
Peter Oruba
d4ee366868 x86: structure declaration renaming
Renamed common structures to vendor specific naming scheme
so other vendors will be able to use the same naming
convention.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 19:57:55 +02:00
Peter Oruba
c3b71bcec0 x86: move per CPU microcode structure declaration to header file
This structure will be later used by other modules as well and
needs therfore to be moved out to a header file.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 19:57:54 +02:00
Peter Oruba
8e61028dfd x86: typedef removal
Removed typedefs. No functional changes to the code.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 19:57:53 +02:00
Peter Oruba
9a56a0f80b x86: moved Intel microcode patch loader declarations to seperate header file
Intel specific microcode declarations have been moved to a seperate header file.
There are no code changes to the code itself and no side effects to other parts.

Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 19:57:52 +02:00
Ingo Molnar
71998e83c5 Merge branch 'x86-tracehook' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace into x86/tracehook 2008-07-28 17:03:43 +02:00
Ingo Molnar
b7d0b67845 Merge branch 'linus' into x86/cpu
Conflicts:

	arch/x86/kernel/cpu/intel_cacheinfo.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-28 16:26:31 +02:00
Ingo Molnar
a403e45c3b Merge core/lib: pick up memparse() change.
Merge branch 'core/lib' into x86/xen
2008-07-28 15:07:54 +02:00
Jeremy Fitzhardinge
64f53a0492 x86: fix initialization of 'l' bit in ldt descriptors
Make sure that fill_ldt() initializes the 'l' bit in the descriptor.
It always sets it to 0, ignoring 'lm' in user_desc, preserving original
x86_64 behaviour.

Previously it was leaving 'l' uninitialized.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Glauber de Oliveira Costa <gcosta@redhat.com>
2008-07-28 14:26:26 +02:00
Joerg Roedel
5f4cb662a0 KVM: SVM: allow enabling/disabling NPT by reloading only the architecture module
If NPT is enabled after loading both KVM modules on AMD and it should be
disabled, both KVM modules must be reloaded. If only the architecture module is
reloaded the behavior is undefined. With this patch it is possible to disable
NPT only by reloading the kvm_amd module.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-27 11:34:09 +03:00
Al Viro
7f2da1e7d0 [PATCH] kill altroot
long overdue...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:20 -04:00
Roland McGrath
59e52130f0 x86: tracehook: TIF_NOTIFY_RESUME
This adds TIF_NOTIFY_RESUME support for x86, both 64-bit and 32-bit.
When set, we call tracehook_notify_resume() on the way to user mode.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-26 14:38:05 -07:00
Roland McGrath
68bd0f4ef7 x86: tracehook: asm/syscall.h
Add asm/syscall.h for x86 with all the required entry points.
This will allow arch-independent tracing code for system calls.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-26 14:38:01 -07:00
Linus Torvalds
fb3b806144 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, AMD IOMMU: include amd_iommu_last_bdf in device initialization
  x86: fix IBM Summit based systems' phys_cpu_present_map on 32-bit kernels
  x86, RDC321x: remove gpio.h complications
  x86, RDC321x: add to mach-default
  crashdump: fix undefined reference to `elfcorehdr_addr'
  flag parameters: fix compile error of sys_epoll_create1
2008-07-26 13:25:05 -07:00
Nick Piggin
8174c430e4 x86: lockless get_user_pages_fast()
Implement get_user_pages_fast without locking in the fastpath on x86.

Do an optimistic lockless pagetable walk, without taking mmap_sem or any
page table locks or even mmap_sem.  Page table existence is guaranteed by
turning interrupts off (combined with the fact that we're always looking
up the current mm, means we can do the lockless page table walk within the
constraints of the TLB shootdown design).  Basically we can do this
lockless pagetable walk in a similar manner to the way the CPU's pagetable
walker does not have to take any locks to find present ptes.

This patch (combined with the subsequent ones to convert direct IO to use
it) was found to give about 10% performance improvement on a 2 socket 8
core Intel Xeon system running an OLTP workload on DB2 v9.5

 "To test the effects of the patch, an OLTP workload was run on an IBM
  x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
  2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel.  Comparing
  runs with and without the patch resulted in an overall performance
  benefit of ~9.8%.  Correspondingly, oprofiles showed that samples from
  __up_read and __down_read routines that is seen during thread contention
  for system resources was reduced from 2.8% down to .05%.  Monitoring the
  /proc/vmstat output from the patched run showed that the counter for
  fast_gup contained a very high number while the fast_gup_slow value was
  zero."

(fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
counter we had for the number of times the slowpath was invoked).

The main reason for the improvement is that DB2 has multiple threads each
issuing direct-IO.  Direct-IO uses get_user_pages, and thus the threads
contend the mmap_sem cacheline, and can also contend on page table locks.

I would anticipate larger performance gains on larger systems, however I
think DB2 uses an adaptive mix of threads and processes, so it could be
that thread contention remains pretty constant as machine size increases.
In which case, we stuck with "only" a 10% gain.

The downside of using get_user_pages_fast is that if there is not a pte
with the correct permissions for the access, we end up falling back to
get_user_pages and so the get_user_pages_fast is a bit of extra work.
However this should not be the common case in most performance critical
code.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
Nick Piggin
a0a8f5364a x86: implement pte_special
Implement the pte_special bit for x86.  This is required to support
lockless get_user_pages, because we need to know whether or not we can
refcount a particular page given only its pte (and no vma).

[hugh@veritas.com: fix a BUG]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:05 -07:00
Huang Ying
3ab8352137 kexec jump
This patch provides an enhancement to kexec/kdump.  It implements the
following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off).  This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
   line can be as follow:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory.  One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page).  When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped.  Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Alexis Bruemmer
1956a96de4 x86 calgary: fix handling of devices that aren't behind the Calgary
The calgary code can give drivers addresses above 4GB which is very bad
for hardware that is only 32bit DMA addressable.

With this patch, the calgary code sets the global dma_ops to swiotlb or
nommu properly, and the dma_ops of devices behind the Calgary/CalIOC2
to calgary_dma_ops.  So the calgary code can handle devices safely that
aren't behind the Calgary/CalIOC2.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexis Bruemmer <alexisb@us.ibm.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
FUJITA Tomonori
8d8bb39b9e dma-mapping: add the device argument to dma_mapping_error()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
Ingo Molnar
c3cc99ff5d Merge branch 'linus' into x86/xen 2008-07-26 17:48:49 +02:00
Suresh Siddha
6ffac1e90a x64, fpu: fix possible FPU leakage in error conditions
On Thu, Jul 24, 2008 at 03:43:44PM -0700, Linus Torvalds wrote:
> So how about this patch as a starting point? This is the RightThing(tm) to
> do regardless, and if it then makes it easier to do some other cleanups,
> we should do it first. What do you think?

restore_fpu_checking() calls init_fpu() in error conditions.

While this is wrong(as our main intention is to clear the fpu state of
the thread), this was benign before commit 92d140e21f ("x86: fix taking
DNA during 64bit sigreturn").

Post commit 92d140e21f, live FPU registers may not belong to this
process at this error scenario.

In the error condition for restore_fpu_checking() (especially during the
64bit signal return), we are doing init_fpu(), which saves the live FPU
register state (possibly belonging to some other process context) into
the thread struct (through unlazy_fpu() in init_fpu()). This is wrong
and can leak the FPU data.

For the signal handler restore error condition in restore_i387(), clear
the fpu state present in the thread struct(before ultimately sending a
SIGSEGV for badframe).

For the paranoid error condition check in math_state_restore(), send a
SIGSEGV, if we fail to restore the state.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-26 16:37:04 +02:00