Commit graph

3481 commits

Author SHA1 Message Date
Roland McGrath
15e8f348db x86_64: remove bogus optimization in sysret_signal
This short-circuit path in sysret_signal looks wrong to me.
AFAICT, in practice the branch is never taken--and if it were,
it would go wrong.  To wit, try loading a module whose init
function does set_thread_flag(TIF_IRET), and see insmod crash
(presumably with a wrong user stack pointer).

This is because the FIXUP_TOP_OF_STACK work hasn't been done yet
when we jump around the call to ptregscall_common and get to
int_with_check--where it expects the user RSP,SS,CS and EFLAGS to
have been stored by FIXUP_TOP_OF_STACK.

I don't think it's normally possible to get to sysret_signal with no
_TIF_DO_NOTIFY_MASK bits set anyway, so these two instructions are
already superfluous.  If it ever did happen, it is harmless to call
do_notify_resume with nothing for it to do.

Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-23 17:43:36 -07:00
Linus Torvalds
0988c37c24 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix crash due to missing debugctlmsr on AMD K6-3
  x86: add PTE_FLAGS_MASK
  x86: rename PTE_MASK to PTE_PFN_MASK
  x86: fix pte_flags() to only return flags, fix lguest (updated)
  x86: use setup_clear_cpu_cap with disable_apic, fix
  x86: move the last Dprintk instance to pr_debug()
2008-07-22 13:40:24 -07:00
Jan Kratochvil
d536b1f865 x86: fix crash due to missing debugctlmsr on AMD K6-3
currently if you use PTRACE_SINGLEBLOCK on AMD K6-3 (i586) it will crash.
Kernel now wrongly assumes existing DEBUGCTLMSR MSR register there.

Removed the assumption also for some other non-K6 CPUs but I am not sure there
(but it can only bring small inefficiency there if my assumption is wrong).

Based on info from Roland McGrath, Chuck Ebbert and Mikulas Patocka.
More info at:
	https://bugzilla.redhat.com/show_bug.cgi?id=456175

Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22 14:16:37 +02:00
Jeremy Fitzhardinge
77be1fabd0 x86: add PTE_FLAGS_MASK
PTE_PFN_MASK was getting lonely, so I made it a friend.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22 10:43:45 +02:00
Jeremy Fitzhardinge
59438c9fc4 x86: rename PTE_MASK to PTE_PFN_MASK
Rusty, in his peevish way, complained that macros defining constants
should have a name which somewhat accurately reflects the actual
purpose of the constant.

Aside from the fact that PTE_MASK gives no clue as to what's actually
being masked, and is misleadingly similar to the functionally entirely
different PMD_MASK, PUD_MASK and PGD_MASK, I don't really see what the
problem is.

But if this patch silences the incessent noise, then it will have
achieved its goal (TODO: write test-case).

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22 10:43:44 +02:00
Rusty Russell
c2e3277f87 x86: fix pte_flags() to only return flags, fix lguest (updated)
(Jeremy said:
	rusty: use PTE_MASK
	rusty: use PTE_MASK
	rusty: use PTE_MASK
 When I asked:
	jsgf: does that include the NX flag?
 He responded eloquently:
	rusty: use PTE_MASK
	rusty: use PTE_MASK
	yes, it's the official constant of masking flags out of ptes
)

Change a15af1c9ea 'x86/paravirt: add
pte_flags to just get pte flags' removed lguest's private pte_flags()
in favor of a generic one.

Unfortunately, the generic one doesn't filter out the non-flags bits:
this results in lguest creating corrupt shadow page tables and blowing
up host memory.

Since noone is supposed to use the pfn part of pte_flags(), it seems
safest to always do the filtering.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-and-morning-tea-spilled-by: Ingo Molnar <mingo@elte.hu>
2008-07-22 10:41:18 +02:00
Yinghai Lu
988781dc3e x86: use setup_clear_cpu_cap with disable_apic, fix
beauty fix: /proc/cpuinfo will still show apic feature even if
we booted up with it disabled.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22 09:05:45 +02:00
Andi Kleen
d95d62c018 sysdev: Convert the x86 mce tolerant sysdev attribute to generic attribute
Use the new generic int attribute accessors for the x86 mce tolerant
attribute. Simple example to illustrate the new macros.

There are much more places all over the tree that could be converted
like this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:02 -07:00
Andi Kleen
4a0b2b4dbe sysdev: Pass the attribute to the low level sysdev show/store function
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.

I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.

I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.

Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:55:02 -07:00
Greg Kroah-Hartman
fc3a8828b1 driver core: fix a lot of printk usages of bus_id
We have the dev_printk() variants for this kind of thing, use them
instead of directly trying to access the bus_id field of struct device.

This is done in order to remove bus_id entirely.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:53 -07:00
Greg Kroah-Hartman
3bfd49c8ab device create: x86: convert device_create to device_create_drvdata
device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:46 -07:00
Linus Torvalds
6d52dcbe56 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] cpufreq: remove CVS keywords
  [CPUFREQ] change cpu freq arrays to per_cpu variables
2008-07-21 15:10:37 -07:00
Linus Torvalds
f2d0f1dea4 x86: Fix help message for STRICT_DEVMEM config option
The message talked about "left on" when it meant to say disabled.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-21 13:04:08 -07:00
Thomas Gleixner
5171c3047d x86: move the last Dprintk instance to pr_debug()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-21 21:58:34 +02:00
Thomas Gleixner
cfc1b9a6a6 x86: convert Dprintk to pr_debug
There are a couple of places where (P)Dprintk is used which is an old
compile time enabled printk wrapper. Convert it to the generic
pr_debug().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-07-21 21:35:38 +02:00
Ingo Molnar
2e2dcc7631 Merge branch 'x86/paravirt-spinlocks' into x86/for-linus 2008-07-21 16:45:56 +02:00
Ingo Molnar
acee709cab Merge branches 'x86/urgent', 'x86/amd-iommu', 'x86/apic', 'x86/cleanups', 'x86/core', 'x86/cpu', 'x86/fixmap', 'x86/gart', 'x86/kprobes', 'x86/memtest', 'x86/modules', 'x86/nmi', 'x86/pat', 'x86/reboot', 'x86/setup', 'x86/step', 'x86/unify-pci', 'x86/uv', 'x86/xen' and 'xen-64bit' into x86/for-linus 2008-07-21 16:37:17 +02:00
Ingo Molnar
e66d90fb4a Merge branch 'linus' into xen-64bit 2008-07-21 15:06:09 +02:00
Ingo Molnar
1c29dd9a9e Merge branch 'linus' into x86/paravirt-spinlocks 2008-07-21 15:05:58 +02:00
Yinghai Lu
7edf8891ad x86: remove extra calling to get ext cpuid level
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-21 13:03:13 +02:00
Yinghai Lu
9175fc06ae x86: use setup_clear_cpu_cap() when disabling the lapic
... so don't need to call clear_cpu_cap again in early_identify_cpu,
and could use cleared_cpu_caps like other places.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-21 13:03:12 +02:00
Ingo Molnar
e27772b48d Merge branch 'linus' into x86/urgent 2008-07-21 11:02:45 +02:00
Avi Kivity
722c05f219 KVM: MMU: Fix potential race setting upper shadow ptes on nonpae hosts
The direct mapped shadow code (used for real mode and two dimensional paging)
sets upper-level ptes using direct assignment rather than calling
set_shadow_pte().  A nonpae host will split this into two writes, which opens
up a race if another vcpu accesses the same memory area.

Fix by calling set_shadow_pte() instead of assigning directly.

Noticed by Izik Eidus.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:40 +03:00
Glauber Costa
2a7c5b8b55 KVM: x86 emulator: emulate clflush
If the guest issues a clflush in a mmio address, the instruction
can trap into the hypervisor. Currently, we do not decode clflush
properly, causing the guest to hang. This patch fixes this emulating
clflush (opcode 0f ae).

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:40 +03:00
Marcelo Tosatti
376c53c2b3 KVM: MMU: improve invalid shadow root page handling
Harden kvm_mmu_zap_page() against invalid root pages that
had been shadowed from memslots that are gone.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:40 +03:00
Marcelo Tosatti
34d4cb8fca KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction
Flush the shadow mmu before removing regions to avoid stale entries.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:40 +03:00
Avi Kivity
d6e88aec07 KVM: Prefix some x86 low level function with kvm_, to avoid namespace issues
Fixes compilation with CONFIG_VMI enabled.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:39 +03:00
Ben-Ami Yassour
c65bbfa1d6 KVM: check injected pic irq within valid pic irqs
Check that an injected pic irq is between 0 and 15.

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:39 +03:00
Mohammed Gamal
19fdfa0d13 KVM: x86 emulator: Fix HLT instruction
This patch fixes issue encountered with HLT instruction
under FreeDOS's HIMEM XMS Driver.

The HLT instruction jumped directly to the done label and
skips updating the EIP value, therefore causing the guest
to spin endlessly on the same instruction.

The patch changes the instruction so that it writes back
the updated EIP value.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:38 +03:00
Avi Kivity
ac9f6dc0db KVM: Apply the kernel sigmask to vcpus blocked due to being uninitialized
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:38 +03:00
Sheng Yang
4e1096d27f KVM: VMX: Add ept_sync_context in flush_tlb
Fix a potention issue caused by kvm_mmu_slot_remove_write_access(). The
old behavior don't sync EPT TLB with modified EPT entry, which result
in inconsistent content of EPT TLB and EPT table.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:38 +03:00
Marcelo Tosatti
5a4c928804 KVM: mmu_shrink: kvm_mmu_zap_page requires slots_lock to be held
kvm_mmu_zap_page() needs slots lock held (rmap_remove->gfn_to_memslot,
for example).

Since kvm_lock spinlock is held in mmu_shrink(), do a non-blocking
down_read_trylock().

Untested.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:38 +03:00
Adrian Bunk
7e37c2998a x86: KVM guest: make kvm_smp_prepare_boot_cpu() static
This patch makes the needlessly global kvm_smp_prepare_boot_cpu() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:37 +03:00
Joerg Roedel
0da1db75a2 KVM: SVM: fix suspend/resume support
On suspend the svm_hardware_disable function is called which frees all svm_data
variables. On resume they are not re-allocated. This patch removes the
deallocation of svm_data from the hardware_disable function to the
hardware_unsetup function which is not called on suspend.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:37 +03:00
Marcelo Tosatti
f8b78fa3d4 KVM: move slots_lock acquision down to vapic_exit
There is no need to grab slots_lock if the vapic_page will not
be touched.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:36 +03:00
Chris Lalancette
efa67e0d1f KVM: VMX: Fake emulate Intel perfctr MSRs
Older linux guests (in this case, 2.6.9) can attempt to
access the performance counter MSRs without a fixup section, and injecting
a GPF kills the guest.  Work around by allowing the guest to write those MSRs.

Tested by me on RHEL-4 i386 and x86_64 guests, as well as F-9 guests.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:36 +03:00
Sheng Yang
65267ea1b3 KVM: VMX: Fix a wrong usage of vmcs_config
The function ept_update_paging_mode_cr0() write to
CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's
wrong because the variable may not consistent with the content in the
CPU_BASE_VM_EXEC_CONTROL MSR.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:36 +03:00
Avi Kivity
db475c39ec KVM: MMU: Fix printk format
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:35 +03:00
Avi Kivity
6ada8cca79 KVM: MMU: When debug is enabled, make it a run-time parameter
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:35 +03:00
Avi Kivity
7a5b56dfd3 KVM: x86 emulator: lazily evaluate segment registers
Instead of prefetching all segment bases before emulation, read them at the
last moment.  Since most of them are unneeded, we save some cycles on
Intel machines where this is a bit expensive.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:35 +03:00
Avi Kivity
0adc8675d6 KVM: x86 emulator: avoid segment base adjust for lea
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:34 +03:00
Avi Kivity
f5b4edcd52 KVM: x86 emulator: simplify rip relative decoding
rip relative decoding is relative to the instruction pointer of the next
instruction; by moving address adjustment until after decoding is complete,
we remove the need to determine the instruction size.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:34 +03:00
Avi Kivity
84411d85da KVM: x86 emulator: simplify r/m decoding
Consolidate the duplicated code when not in any special case.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:33 +03:00
Avi Kivity
dc71d0f162 KVM: x86 emulator: simplify sib decoding
Instead of using sparse switches, use simpler if/else sequences.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:33 +03:00
Avi Kivity
8684c0af0b KVM: x86 emulator: handle undecoded rex.b with r/m = 5 in certain cases
x86_64 does not decode rex.b in certain cases, where the r/m field = 5.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:33 +03:00
Mohammed Gamal
b13354f8f0 KVM: x86 emulator: emulate nop and xchg reg, acc (opcodes 0x90 - 0x97)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:33 +03:00
Avi Kivity
f76c710d75 KVM: Use printk_rlimit() instead of reporting emulation failures just once
Emulation failure reports are useful, so allow more than one per the lifetime
of the module.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:32 +03:00
Glauber Costa
25be46080f KVM: Do not calculate linear rip in emulation failure report
If we're not gonna do anything (case in which failure is already
reported), we do not need to even bother with calculating the linear rip.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:32 +03:00
Marcelo Tosatti
622395a9e6 KVM: only abort guest entry if timer count goes from 0->1
Only abort guest entry if the timer count went from 0->1, since for 1->2
or larger the bit will either be set already or a timer irq will have
been injected.

Using atomic_inc_and_test() for it also introduces an SMP barrier
to the LAPIC version (thought it was unecessary because of timer
migration, but guest can be scheduled to a different pCPU between exit
and kvm_vcpu_block(), so there is the possibility for a race).

Noticed by Avi.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:32 +03:00
Laurent Vivier
542472b53e KVM: Add coalesced MMIO support (x86 part)
This patch enables coalesced MMIO for x86 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:31 +03:00