Commit graph

448 commits

Author SHA1 Message Date
Izik Eidus
82ce2c9683 KVM: Allow dynamic allocation of the mmu shadow cache size
The user is now able to set how many mmu pages will be allocated to the guest.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Izik Eidus
195aefde9c KVM: Add general accessors to read and write guest memory
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Izik Eidus
290fc38da8 KVM: Remove the usage of page->private field by rmap
When kvm uses user-allocated pages in the future for the guest, we won't
be able to use page->private for rmap, since page->rmap is reserved for
the filesystem.  So we move the rmap base pointers to the memory slot.

A side effect of this is that we need to store the gfn of each gpte in
the shadow pages, since the memory slot is addressed by gfn, instead of
hfn like struct page.

Signed-off-by: Izik Eidus <izik@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Avi Kivity
f566e09fc2 KVM: VMX: Simplify vcpu_clear()
Now that smp_call_function_single() knows how to call a function on the
current cpu, there's no need to check explicitly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Avi Kivity
eae5ecb5b9 KVM: VMX: Don't clear the vmcs if the vcpu is not loaded on any processor
Noted by Eddie Dong.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier
b4c6abfef4 KVM: x86 emulator: Any legacy prefix after a REX prefix nullifies its effect
This patch modifies the management of REX prefix according behavior
I saw in Xen 3.1.  In Xen, this modification has been introduced by
Jan Beulich.

http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.html

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier
a22436b7b8 KVM: Purify x86_decode_insn() error case management
The only valid case is on protected page access, other cases are errors.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Qing He
e4f8e03956 KVM: x86_emulator: no writeback for bt
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier
a01af5ec51 KVM: x86 emulator: Remove no_wb, use dst.type = OP_NONE instead
Remove no_wb, use dst.type = OP_NONE instead, idea stollen from xen-3.1

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier
05f086f87e KVM: x86 emulator: remove _eflags and use directly ctxt->eflags.
Remove _eflags and use directly ctxt->eflags. Caching eflags is not needed as
it is restored to vcpu by kvm_main.c:emulate_instruction() from ctxt->eflags
only if emulation doesn't fail.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Laurent Vivier
8cdbd2c9bf KVM: x86 emulator: split some decoding into functions for readability
To improve readability, move push, writeback, and grp 1a/2/3/4/5/9 emulation
parts into functions.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Ryan Harper
217648638c KVM: MMU: Ignore reserved bits in cr3 in non-pae mode
This patch removes the fault injected when the guest attempts to set reserved
bits in cr3.  X86 hardware doesn't generate a fault when setting reserved bits.
The result of this patch is that vmware-server, running within a kvm guest,
boots and runs memtest from an iso.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Avi Kivity
12b7d28fc1 KVM: MMU: Make flooding detection work when guest page faults are bypassed
When we allow guest page faults to reach the guests directly, we lose
the fault tracking which allows us to detect demand paging.  So we provide
an alternate mechnism by clearing the accessed bit when we set a pte, and
checking it later to see if the guest actually used it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Avi Kivity
c7addb9020 KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
 - host page faults, where the fault is needed to allow kvm to install
   the shadow pte or update the guest accessed and dirty bits
 - guest page faults, where the guest has faulted and kvm simply injects
   the fault back into the guest to handle

The second class, guest page faults, is pure overhead.  We can eliminate
some of it on vmx using the following evil trick:
 - when we set up a shadow page table entry, if the corresponding guest pte
   is not present, set up the shadow pte as not present
 - if the guest pte _is_ present, mark the shadow pte as present but also
   set one of the reserved bits in the shadow pte
 - tell the vmx hardware not to trap faults which have the present bit clear

With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.

Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code.  It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Avi Kivity
51c6cf662b KVM: VMX: Further reduce efer reloads
KVM avoids reloading the efer msr when the difference between the guest
and host values consist of the long mode bits (which are switched by
hardware) and the NX bit (which is emulated by the KVM MMU).

This patch also allows KVM to ignore SCE (syscall enable) when the guest
is running in 32-bit mode.  This is because the syscall instruction is
not available in 32-bit mode on Intel processors, so the SCE bit is
effectively meaningless.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier
3427318fd2 KVM: Call x86_decode_insn() only when needed
Move emulate_ctxt to kvm_vcpu to keep emulate context when we exit from kvm
module. Call x86_decode_insn() only when needed. Modify x86_emulate_insn() to
not modify the context if it must be re-entered.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier
1be3aa4718 KVM: emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn()
emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn().
x86_emulate_insn() is x86_emulate_memop() without the decoding part.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier
8b4caf6650 KVM: x86 emulator: move all decoding process to function x86_decode_insn()
Split the decoding process into a new function x86_decode_insn().

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier
e4e03deda8 KVM: x86 emulator: move all x86_emulate_memop() to a structure
Move all x86_emulate_memop() common variables between decode and execute to a
structure decode_cache.  This will help in later separating decode and
emulate.

            struct decode_cache {
                u8 twobyte;
                u8 b;
                u8 lock_prefix;
                u8 rep_prefix;
                u8 op_bytes;
                u8 ad_bytes;
                struct operand src;
                struct operand dst;
                unsigned long *override_base;
                unsigned int d;
                unsigned long regs[NR_VCPU_REGS];
                unsigned long eip;
                /* modrm */
                u8 modrm;
                u8 modrm_mod;
                u8 modrm_reg;
                u8 modrm_rm;
                u8 use_modrm_ea;
                unsigned long modrm_ea;
                unsigned long modrm_val;
           };

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier
a7ddce3afc KVM: x86 emulator: remove unused functions
Remove #ifdef functions never used

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:46 +02:00
Anthony Liguori
7aa81cc047 KVM: Refactor hypercall infrastructure (v3)
This patch refactors the current hypercall infrastructure to better
support live migration and SMP.  It eliminates the hypercall page by
trapping the UD exception that would occur if you used the wrong hypercall
instruction for the underlying architecture and replacing it with the right
one lazily.

A fall-out of this patch is that the unhandled hypercalls no longer trap to
userspace.  There is very little reason though to use a hypercall to
communicate with userspace as PIO or MMIO can be used.  There is no code
in tree that uses userspace hypercalls.

[avi: fix #ud injection on vmx]

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:46 +02:00
Anthony Liguori
aca7f96600 KVM: x86 emulator: Add vmmcall/vmcall to x86_emulate (v3)
Add vmmcall/vmcall to x86_emulate.  Future patch will implement functionality
for these instructions.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:46 +02:00
Glauber de Oliveira Costa
053de04441 x86: get rid of _MASK flags
There's no need for the *_MASK flags (TF_MASK, IF_MASK, etc), found in
processor.h (both _32 and _64). They have a one-to-one mapping with the
EFLAGS value. This patch removes the definitions, and use the already
existent X86_EFLAGS_ version when applicable.

[ roland@redhat.com: KVM build fixes. ]

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:27 +01:00
Glauber de Oliveira Costa
6b68f01baa x86: unify struct desc_ptr
This patch unifies struct desc_ptr between i386 and x86_64.
They can be expressed in the exact same way in C code, only
having to change the name of one of them. As Xgt_desc_struct
is ugly and big, this is the one that goes away.

There's also a padding field in i386, but it is not really
needed in the C structure definition.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:12 +01:00
Kay Sievers
af5ca3f4ec Driver core: change sysdev classes to use dynamic kobject names
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Amit Shah
404fb881b8 KVM: SVM: Fix FPU leak while emulating clts
The clts code didn't use set_cr0 properly, so our lazy FPU
processing wasn't being done by the clts instruction at all.

(this isn't called on Intel as the hardware does the decode for us)

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-27 15:38:18 +02:00
Avi Kivity
8d379a7c06 KVM: SVM: Unload guest fpu on vcpu_put()
Not unloading the guest fpu can cause fpu leaks from guest to guest (or host
to guest).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-27 15:33:10 +02:00
Amit Shah
00b2ef475d KVM: x86 emulator: Use emulator_write_emulated and not emulator_write_std
emulator_write_std() is not implemented, and calling write_emulated should
work just as well in place of write_std.

Fixes emulator failures with the push r/m instruction.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-27 15:28:29 +02:00
Izik Eidus
2a738e20a1 KVM: x86 emulator: fix the saving of of the eip value
this make sure that no matter what is the operand size,
all the value of the eip will be saved

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-27 15:10:45 +02:00
Izik Eidus
e826ec9ae2 KVM: x86 emulator: fix JMP_REL
Change JMP_REL to call to register_address_increment(): the operands size
should not effect the calculation of the eip, instead the ad_bytes should
affect it.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-27 15:08:22 +02:00
Avi Kivity
cf5a94d133 KVM: SVM: Intercept the 'invd' and 'wbinvd' instructions
'invd' can destroy host data, and 'wbinvd' allows the guest to induce
long (milliseconds) latencies.

Noted by Ben Serebrin.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-08 12:05:45 +02:00
Avi Kivity
651a3e29b3 KVM: x86 emulator: invd instruction
Emulate the 'invd' instruction (opcode 0f 08).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-08 12:05:44 +02:00
Avi Kivity
56ba47ddbd KVM: SVM: Defer nmi processing until switch to host state is complete
If we stgi() too soon, nmis can reach the processor even though interrupts
are disabled, catching it in a half-switched state.  Delay the stgi() until
we're done switching.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-08 12:05:43 +02:00
Avi Kivity
70433389cc KVM: SVM: Fix SMP with kernel apic
AP processor needs to reset to the SIPI vector, not normal INIT.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-08 12:05:36 +02:00
Avi Kivity
1e35d3c4a7 KVM: x86 emulator: fix 'push imm8' emulation
'push imm8' found itself in the wrong switch somehow, so it is never executed.

This fixes Windows 2003 installation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-08 10:42:04 +02:00
Rusty Russell
9525ca0286 Consolidate host virtualization support under Virtualization menu
Move lguest under the virtualization menu.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Avi Kivity <avi@qumranet.com>
2007-10-23 15:49:47 +10:00
Laurent Vivier
49d3bd7e2b KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs()
In kvm_flush_remote_tlbs(), replace a loop using smp_call_function_single()
by a single call to smp_call_function_mask() (which is new for x86_64).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 17:21:54 +02:00
Kevin Pedretti
9da8f4e83a KVM: Improve local apic timer wraparound handling
Better handle wrap-around cases when reading the APIC CCR
(current count register).  Also, if ICR is 0, CCR should also
be 0... previously reading CCR before setting ICR would result
in a large kinda-random number.

Signed-off-by: Kevin Pedretti <kevin.pedretti@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:29 +02:00
Kevin Pedretti
b33ac88b4c KVM: Fix local apic timer divide by zero
kvm_lapic_reset() was initializing apic->timer.divide_count to 0,
which could potentially lead to a divide by zero error in
apic_get_tmcct().  Any guest that reads the APIC's CCR (current count)
register before setting DCR (divide configuration) would trigger a divide
by zero exception in the host kernel, leading to a host-OS crash.

This patch results in apic->timer.divide_count being initialized to
2 at reset, eliminating the bug (DCR=0 at reset, meaning divide by 2).

Signed-off-by: Kevin Pedretti <kevin.pedretti@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:29 +02:00
Laurent Vivier
0552f73b9a KVM: Move kvm_guest_exit() after local_irq_enable()
We need to make sure that the timer interrupt happens before we clear
PF_VCPU, so the accounting code actually sees guest mode.

http://lkml.org/lkml/2007/10/15/114

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:29 +02:00
Aurelien Jarno
4e62417bf3 KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3
The patch belows changes the access type to register from memory for
instructions that are declared as SrcMem or DstMem, but have a
ModR/M byte with Mod = 3.

It fixes (at least) the lmsw and smsw instructions on an AMD64 CPU,
which are needed for FreeBSD.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:29 +02:00
Avi Kivity
78f7826868 KVM: VMX: Force vm86 mode if setting flags during real mode
When resetting from userspace, we need to handle the flags being cleared
even after we are in real mode.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:29 +02:00
Sheng Yang
a012e65aee KVM: x86 emulator: implement 'movnti mem, reg'
Implement emulation of instruction:
    movnti m32/m64, r32/r64
    opcode: 0x0f 0xc3

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:29 +02:00
Eddie Dong
8668a3c468 KVM: VMX: Reset mmu context when entering real mode
Resetting an SMP guest will force AP enter real mode (RESET) with
paging enabled in protected mode. While current enter_rmode() can
only handle mode switch from nonpaging mode to real mode which leads
to SMP reboot failure.

Fix by reloading the mmu context on entering real mode.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:28 +02:00
Avi Kivity
1b6269db3f KVM: VMX: Handle NMIs before enabling interrupts and preemption
This makes sure we handle NMI on the current cpu, and that we don't service
maskable interrupts before non-maskable ones.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:28 +02:00
Izik Eidus
7f2145ad6f KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte()
Setting shadow page table entry should be set atomicly using set_shadow_pte().

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:28 +02:00
Laurent Vivier
ae6200baea KVM: x86 emulator: fix repne/repnz decoding
The repnz/repne instructions must set rep_prefix to 1 like rep/repe/repz.

This patch correct the disk probe problem met with OpenBSD.

This issue appears with commit e70669abd4
because before it, the decoding was done internally to kvm and after it
is done by x86_emulate.c (which doesn't do it correctly).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:28 +02:00
Nitin A Kamble
1a52e05136 KVM: x86 emulator: fix merge screwup due to emulator split
This code has gone to wrong place in the file. Moving it back to
right location.

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 12:03:28 +02:00
Laurent Vivier
d172fcd3ae sched: guest CPU accounting: maintain guest state in KVM
Modify KVM to update guest time accounting.

[ mingo@elte.hu: ported to 2.6.24 KVM. ]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Avi Kivity
0967b7bf1c KVM: Skip pio instruction when it is emulated, not executed
If we defer updating rip until pio instructions are executed, we have a
problem with reset:  a pio reset updates rip, and when the instruction
completes we skip the emulated instruction, pointing rip somewhere completely
unrelated.

Fix by updating rip when we see decode the instruction, not after emulation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13 10:18:29 +02:00