KVM updates for the 3.6 merge window
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iQIcBAABAgAGBQJQDRDNAAoJEI7yEDeUysxlkl8P/3C2AHx2webOU8sVzhfU6ONZ
ZoGevwBjyZIeJEmiWVpFTTEew1l0PXtpyOocXGNUXIddVnhXTQOKr/Scj4uFbmx8
ROqgK8NSX9+xOGrBPCoN7SlJkmp+m6uYtwYkl2SGnsEVLWMKkc7J7oqmszCcTQvN
UXMf7G47/Ul2NUSBdv4Yvizhl4kpvWxluiweDw3E/hIQKN0uyP7CY58qcAztw8nG
csZBAnnuPFwIAWxHXW3eBBv4UP138HbNDqJ/dujjocM6GnOxmXJmcZ6b57gh+Y64
3+w9IR4qrRWnsErb/I8inKLJ1Jdcf7yV2FmxYqR4pIXay2Yzo1BsvFd6EB+JavUv
pJpixrFiDDFoQyXlh4tGpsjpqdXNMLqyG4YpqzSZ46C8naVv9gKE7SXqlXnjyDlb
Llx3hb9Fop8O5ykYEGHi+gIISAK5eETiQl4yw9RUBDpxydH4qJtqGIbLiDy8y9wi
Xyi8PBlNl+biJFsK805lxURqTp/SJTC3+Zb7A7CzYEQm5xZw3W/CKZx1ZYBfpaa/
pWaP6tB7JwgLIVXi4HQayLWqMVwH0soZIn9yazpOEFv6qO8d5QH5RAxAW2VXE3n5
JDlrajar/lGIdiBVWfwTJLb86gv3QDZtIWoR9mZuLKeKWE/6PRLe7HQpG1pJovsm
2AsN5bS0BWq+aqPpZHa5
=pECD
-----END PGP SIGNATURE-----

Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Avi Kivity:
 "Highlights include
  - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
  - relatively small ppc and s390 updates
  - PCID/INVPCID support in guests
  - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on
    interrupt-intensive workloads
  - Lockless write faults during live migration
  - EPT accessed/dirty bits support for new Intel processors"

Fix up conflicts in:
 - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added
   next to each other.
 - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the
   KVM fixes
 - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits
   through the kvm tree and the s390 tree, with subsequent edits in the
   KVM tree.

* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  KVM: fix race with level interrupts
  x86, hyper: fix build with !CONFIG_KVM_GUEST
  Revert "apic: fix kvm build on UP without IOAPIC"
  KVM guest: switch to apic_set_eoi_write, apic_write
  apic: add apic_set_eoi_write for PV use
  KVM: VMX: Implement PCID/INVPCID for guests with EPT
  KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
  KVM: PPC: Critical interrupt emulation support
  KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
  KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
  KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
  KVM: PPC: bookehv64: Add support for std/ld emulation.
  booke: Added crit/mc exception handler for e500v2
  booke/bookehv: Add host crit-watchdog exception support
  KVM: MMU: document mmu-lock and fast page fault
  KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
  KVM: MMU: trace fast page fault
  KVM: MMU: fast path of handling guest page fault
  KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
  KVM: MMU: fold tlb flush judgement into mmu_spte_update
  ...
commit 5fecc9d8f5

71 changed files with 1913 additions and 518 deletions
@@ -1946,6 +1946,40 @@ the guest using the specified gsi pin. The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

4.76 KVM_PPC_ALLOCATE_HTAB

Capability: KVM_CAP_PPC_ALLOC_HTAB
Architectures: powerpc
Type: vm ioctl
Parameters: Pointer to u32 containing hash table order (in/out)
Returns: 0 on success, -1 on error

This requests the host kernel to allocate an MMU hash table for a
guest using the PAPR paravirtualization interface. This only does
anything if the kernel is configured to use the Book 3S HV style of
virtualization. Otherwise the capability doesn't exist and the ioctl
returns an ENOTTY error. The rest of this description assumes Book 3S
HV.

There must be no vcpus running when this ioctl is called; if there
are, it will do nothing and return an EBUSY error.

The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46. On successful return from the
ioctl, it will have been updated with the order of the hash table that
was allocated.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).

If this ioctl is called when a hash table has already been allocated,
the kernel will clear out the existing hash table (zero all HPTEs) and
return the hash table order in the parameter. (If the guest is using
the virtualized real-mode area (VRMA) facility, the kernel will
re-create the VRMA HPTEs on the next KVM_RUN of any vcpu.)

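As a concrete illustration (not part of this patch), userspace might drive
the new ioctl roughly as follows. The sketch assumes a <linux/kvm.h> that
already exports KVM_PPC_ALLOCATE_HTAB and KVM_CAP_PPC_ALLOC_HTAB, and that
kvm_fd/vm_fd came from open("/dev/kvm") and KVM_CREATE_VM:

/* Illustrative only: request a guest HPT before any vcpu runs. */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int allocate_guest_htab(int kvm_fd, int vm_fd, uint32_t order)
{
        /* order = log2 of the requested HPT size; must be in [18, 46]. */
        if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_ALLOC_HTAB) <= 0)
                return -1;              /* not a Book 3S HV host */

        /* Must be issued while no vcpus are running, otherwise EBUSY. */
        if (ioctl(vm_fd, KVM_PPC_ALLOCATE_HTAB, &order) < 0) {
                perror("KVM_PPC_ALLOCATE_HTAB");
                return -1;
        }

        /* The kernel writes back the order it actually granted. */
        printf("HPT allocated, order %u (%llu bytes)\n",
               order, 1ULL << order);
        return 0;
}
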

5. The kvm_run structure
------------------------

@@ -6,7 +6,129 @@ KVM Lock Overview

(to be written)

2. Reference
2: Exception
------------

Fast page fault:

Fast page fault is the fast path which fixes the guest page fault out of
the mmu-lock on x86. Currently, a page fault can be fast only if the
shadow page table is present and the fault is caused by write-protection,
which means we just need to change the W bit of the spte.

What we use to avoid all the races are the SPTE_HOST_WRITEABLE bit and the
SPTE_MMU_WRITEABLE bit on the spte:
- SPTE_HOST_WRITEABLE means the gfn is writable on the host.
- SPTE_MMU_WRITEABLE means the gfn is writable on the guest mmu. The bit is
  set when the gfn is writable on the guest mmu and it is not
  write-protected by shadow page write-protection.

On the fast page fault path, we use cmpxchg to atomically set the spte W
bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_MMU_WRITEABLE = 1; this
is safe because any change to these bits is detected by the cmpxchg.

But we need to carefully check these cases:

1): The mapping from gfn to pfn
The mapping from gfn to pfn may be changed, since we can only ensure the
pfn is not changed during the cmpxchg. This is an ABA problem; for example,
the following can happen:

At the beginning:
gpte = gfn1
gfn1 is mapped to pfn1 on the host
spte is the shadow page table entry corresponding with gpte and
spte = pfn1

   VCPU 0                           VCPU0
on fast page fault path:

   old_spte = *spte;
                                    pfn1 is swapped out:
                                       spte = 0;

                                    pfn1 is re-allocated for gfn2.

                                    gpte is changed to point to
                                    gfn2 by the guest:
                                       spte = pfn1;

   if (cmpxchg(spte, old_spte, old_spte+W))
      mark_page_dirty(vcpu->kvm, gfn1)
           OOPS!!!

We dirty-log for gfn1, which means gfn2 is lost from the dirty bitmap.

For a direct sp, we can easily avoid this since the spte of a direct sp is
fixed to the gfn. For an indirect sp, before we do the cmpxchg, we call
gfn_to_pfn_atomic() to pin the gfn to the pfn, because after
gfn_to_pfn_atomic():
- We hold a refcount on the pfn, which means the pfn cannot be freed and
  reused for another gfn.
- The pfn is writable, which means it cannot be shared between different
  gfns by KSM.

Then we can ensure the dirty bitmap is correctly set for a gfn.

Currently, to simplify things, we disable fast page fault for indirect
shadow pages.

2): Dirty bit tracking
In the original code, the spte can be fast-updated (non-atomically) if the
spte is read-only and the Accessed bit has already been set, since the
Accessed bit and Dirty bit can not be lost.

But this is no longer true with fast page fault, since the spte can be
marked writable between reading and updating the spte, as in the following
case:

At the beginning:
spte.W = 0
spte.Accessed = 1

   VCPU 0                                       VCPU0
   In mmu_spte_clear_track_bits():

   old_spte = *spte;

   /* 'if' condition is satisfied. */
   if (old_spte.Accessed == 1 &&
        old_spte.W == 0)
      spte = 0ull;
                                         on fast page fault path:
                                             spte.W = 1
                                         memory write on the spte:
                                             spte.Dirty = 1


   else
      old_spte = xchg(spte, 0ull)


   if (old_spte.Accessed == 1)
      kvm_set_pfn_accessed(spte.pfn);
   if (old_spte.Dirty == 1)
      kvm_set_pfn_dirty(spte.pfn);
      OOPS!!!

The Dirty bit is lost in this case.

In order to avoid this kind of issue, we always treat the spte as "volatile"
if it can be updated out of mmu-lock; see spte_has_volatile_bits(). This
means the spte is always updated atomically in this case.

3): Flush TLBs due to spte updates
If the spte is updated from writable to read-only, we should flush all
TLBs, otherwise rmap_write_protect will find a read-only spte, even though
the writable spte might be cached in a CPU's TLB.

As mentioned before, the spte can be updated to writable out of mmu-lock on
the fast page fault path. In order to easily audit the path, we check
whether TLBs need to be flushed for this reason in mmu_spte_update(), since
it is a common function for updating the spte (present -> present).

Since the spte is "volatile" if it can be updated out of mmu-lock, we
always update it atomically, so the race caused by fast page fault can be
avoided; see the comments in spte_has_volatile_bits() and mmu_spte_update().

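To make the flow above concrete, here is a minimal sketch of the lockless
update being described. It is illustrative only: the helper name is
invented here, and the real code in arch/x86/kvm/mmu.c handles more cases
(retry loops, accessed/dirty tracking, indirect sp checks).

/* Sketch, not the actual mmu.c implementation. */
static bool fast_pf_fix_spte(struct kvm *kvm, gfn_t gfn, u64 *sptep)
{
        u64 old_spte = *sptep;                  /* read outside mmu-lock */
        u64 new_spte = old_spte | PT_WRITABLE_MASK;

        /*
         * Only write-protected sptes that are writable on both the host
         * and the guest mmu may be fixed on this path.
         */
        if (!(old_spte & SPTE_HOST_WRITEABLE) ||
            !(old_spte & SPTE_MMU_WRITEABLE))
                return false;

        /*
         * cmpxchg fails if anything in the spte changed since we read it,
         * which closes the races discussed in cases 1) and 2) above.
         */
        if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
                return false;

        /* The gfn can now be written through this spte: dirty-log it. */
        mark_page_dirty(kvm, gfn);
        return true;
}
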

3. Reference
------------

Name:		kvm_lock
@@ -23,3 +145,9 @@ Arch: x86
Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
		- tsc offset in vmcb
Comment:	'raw' because updating the tsc offsets must not be preempted.

Name:		kvm->mmu_lock
Type:		spinlock_t
Arch:		any
Protects:	-shadow page/shadow tlb entry
Comment:	it is a spinlock since it is used in mmu notifier.

@@ -223,3 +223,36 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
	steal: the amount of time in which this vCPU did not run, in
	nanoseconds. Time during which the vcpu is idle will not be
	reported as steal time.

MSR_KVM_EOI_EN: 0x4b564d04
	data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
	when disabled. Bit 1 is reserved and must be zero. When PV end of
	interrupt is enabled (bit 0 set), bits 63-2 hold a 4-byte aligned
	physical address of a 4 byte memory area which must be in guest RAM
	and must be zeroed.

	The least significant bit of the 4 byte memory location will be
	written to by the hypervisor, typically at the time of interrupt
	injection. A value of 1 means that the guest can skip writing EOI
	to the apic (using an MSR or MMIO write); instead, it is sufficient
	to signal EOI by clearing the bit in guest memory - this location
	will later be polled by the hypervisor. A value of 0 means that the
	EOI write is required.

	It is always safe for the guest to ignore the optimization and
	perform the APIC EOI write anyway.

	The hypervisor is guaranteed to only modify this least significant
	bit while in the current VCPU context; this means that the guest
	does not need to use either a lock prefix or memory ordering
	primitives to synchronise with the hypervisor.

	However, the hypervisor can set and clear this memory bit at any
	time. Therefore, to make sure the hypervisor does not interrupt the
	guest and clear the least significant bit in the window between the
	guest testing it (to detect whether it can skip the EOI apic write)
	and clearing it (to signal EOI to the hypervisor), the guest must
	both read the least significant bit in the memory area and clear it
	using a single CPU instruction, such as test-and-clear or
	compare-and-exchange.

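For reference, the guest side of this protocol can be sketched as below.
This is illustrative only: the per-cpu variable name is made up here, but
the single-instruction test-and-clear is exactly what the x86 guest patches
in this pull rely on.

/* Sketch of a PV-EOI aware EOI write on the guest side. */
static DEFINE_PER_CPU(unsigned long, kvm_pv_eoi_word);

static void kvm_pv_guest_eoi(void)
{
        /*
         * Bit 0 must be read and cleared in one instruction so the
         * hypervisor can never observe the window between the test and
         * the clear.  __test_and_clear_bit is atomic with respect to the
         * local CPU, which is all the protocol requires.
         */
        if (__test_and_clear_bit(0, &__get_cpu_var(kvm_pv_eoi_word)))
                return;                         /* host will poll the bit */

        apic_write(APIC_EOI, APIC_EOI_ACK);     /* fall back to a real EOI */
}
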
@@ -109,8 +109,6 @@ The following bits are safe to be set inside the guest:

  MSR_EE
  MSR_RI
  MSR_CR
  MSR_ME

If any other bit changes in the MSR, please still use mtmsr(d).

@ -4002,8 +4002,8 @@ F: arch/ia64/include/asm/kvm*
|
|||
F: arch/ia64/kvm/
|
||||
|
||||
KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
|
||||
M: Carsten Otte <cotte@de.ibm.com>
|
||||
M: Christian Borntraeger <borntraeger@de.ibm.com>
|
||||
M: Cornelia Huck <cornelia.huck@de.ibm.com>
|
||||
M: linux390@de.ibm.com
|
||||
L: linux-s390@vger.kernel.org
|
||||
W: http://www.ibm.com/developerworks/linux/linux390/
|
||||
|
|
|
@ -26,6 +26,7 @@
|
|||
|
||||
/* Select x86 specific features in <linux/kvm.h> */
|
||||
#define __KVM_HAVE_IOAPIC
|
||||
#define __KVM_HAVE_IRQ_LINE
|
||||
#define __KVM_HAVE_DEVICE_ASSIGNMENT
|
||||
|
||||
/* Architectural interrupt line count. */
|
||||
|
|
|
@ -19,6 +19,7 @@ if VIRTUALIZATION
|
|||
|
||||
config KVM
|
||||
tristate "Kernel-based Virtual Machine (KVM) support"
|
||||
depends on BROKEN
|
||||
depends on HAVE_KVM && MODULES && EXPERIMENTAL
|
||||
# for device assignment:
|
||||
depends on PCI
|
||||
|
|
|
@ -153,6 +153,8 @@
|
|||
#define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5"
|
||||
#define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4"
|
||||
|
||||
extern bool epapr_paravirt_enabled;
|
||||
extern u32 epapr_hypercall_start[];
|
||||
|
||||
/*
|
||||
* We use "uintptr_t" to define a register because it's guaranteed to be a
|
||||
|
|
|
@ -34,6 +34,8 @@ extern void __replay_interrupt(unsigned int vector);
|
|||
|
||||
extern void timer_interrupt(struct pt_regs *);
|
||||
extern void performance_monitor_exception(struct pt_regs *regs);
|
||||
extern void WatchdogException(struct pt_regs *regs);
|
||||
extern void unknown_exception(struct pt_regs *regs);
|
||||
|
||||
#ifdef CONFIG_PPC64
|
||||
#include <asm/paca.h>
|
||||
|
|
|
@ -36,11 +36,8 @@ static inline void svcpu_put(struct kvmppc_book3s_shadow_vcpu *svcpu)
|
|||
#define SPAPR_TCE_SHIFT 12
|
||||
|
||||
#ifdef CONFIG_KVM_BOOK3S_64_HV
|
||||
/* For now use fixed-size 16MB page table */
|
||||
#define HPT_ORDER 24
|
||||
#define HPT_NPTEG (1ul << (HPT_ORDER - 7)) /* 128B per pteg */
|
||||
#define HPT_NPTE (HPT_NPTEG << 3) /* 8 PTEs per PTEG */
|
||||
#define HPT_HASH_MASK (HPT_NPTEG - 1)
|
||||
#define KVM_DEFAULT_HPT_ORDER 24 /* 16MB HPT by default */
|
||||
extern int kvm_hpt_order; /* order of preallocated HPTs */
|
||||
#endif
|
||||
|
||||
#define VRMA_VSID 0x1ffffffUL /* 1TB VSID reserved for VRMA */
|
||||
|
|
|
@ -237,6 +237,10 @@ struct kvm_arch {
|
|||
unsigned long vrma_slb_v;
|
||||
int rma_setup_done;
|
||||
int using_mmu_notifiers;
|
||||
u32 hpt_order;
|
||||
atomic_t vcpus_running;
|
||||
unsigned long hpt_npte;
|
||||
unsigned long hpt_mask;
|
||||
spinlock_t slot_phys_lock;
|
||||
unsigned long *slot_phys[KVM_MEM_SLOTS_NUM];
|
||||
int slot_npages[KVM_MEM_SLOTS_NUM];
|
||||
|
@ -414,7 +418,9 @@ struct kvm_vcpu_arch {
|
|||
ulong mcsrr1;
|
||||
ulong mcsr;
|
||||
u32 dec;
|
||||
#ifdef CONFIG_BOOKE
|
||||
u32 decar;
|
||||
#endif
|
||||
u32 tbl;
|
||||
u32 tbu;
|
||||
u32 tcr;
|
||||
|
|
|
@ -119,7 +119,8 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
|
|||
extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
|
||||
extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);
|
||||
|
||||
extern long kvmppc_alloc_hpt(struct kvm *kvm);
|
||||
extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
|
||||
extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
|
||||
extern void kvmppc_free_hpt(struct kvm *kvm);
|
||||
extern long kvmppc_prepare_vrma(struct kvm *kvm,
|
||||
struct kvm_userspace_memory_region *mem);
|
||||
|
|
|
@ -128,6 +128,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
|
|||
obj-y += ppc_save_regs.o
|
||||
endif
|
||||
|
||||
obj-$(CONFIG_EPAPR_PARAVIRT) += epapr_paravirt.o epapr_hcalls.o
|
||||
obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o
|
||||
|
||||
# Disable GCOV in odd or sensitive code
|
||||
|
|
25
arch/powerpc/kernel/epapr_hcalls.S
Normal file
25
arch/powerpc/kernel/epapr_hcalls.S
Normal file
|
@ -0,0 +1,25 @@
|
|||
/*
|
||||
* Copyright (C) 2012 Freescale Semiconductor, Inc.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU General Public License
|
||||
* as published by the Free Software Foundation; either version
|
||||
* 2 of the License, or (at your option) any later version.
|
||||
*/
|
||||
|
||||
#include <linux/threads.h>
|
||||
#include <asm/reg.h>
|
||||
#include <asm/page.h>
|
||||
#include <asm/cputable.h>
|
||||
#include <asm/thread_info.h>
|
||||
#include <asm/ppc_asm.h>
|
||||
#include <asm/asm-offsets.h>
|
||||
|
||||
/* Hypercall entry point. Will be patched with device tree instructions. */
|
||||
.global epapr_hypercall_start
|
||||
epapr_hypercall_start:
|
||||
li r3, -1
|
||||
nop
|
||||
nop
|
||||
nop
|
||||
blr
|
52
arch/powerpc/kernel/epapr_paravirt.c
Normal file
52
arch/powerpc/kernel/epapr_paravirt.c
Normal file
|
@ -0,0 +1,52 @@
|
|||
/*
|
||||
* ePAPR para-virtualization support.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify
|
||||
* it under the terms of the GNU General Public License, version 2, as
|
||||
* published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope that it will be useful,
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
* GNU General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public License
|
||||
* along with this program; if not, write to the Free Software
|
||||
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
|
||||
*
|
||||
* Copyright (C) 2012 Freescale Semiconductor, Inc.
|
||||
*/
|
||||
|
||||
#include <linux/of.h>
|
||||
#include <asm/epapr_hcalls.h>
|
||||
#include <asm/cacheflush.h>
|
||||
#include <asm/code-patching.h>
|
||||
|
||||
bool epapr_paravirt_enabled;
|
||||
|
||||
static int __init epapr_paravirt_init(void)
|
||||
{
|
||||
struct device_node *hyper_node;
|
||||
const u32 *insts;
|
||||
int len, i;
|
||||
|
||||
hyper_node = of_find_node_by_path("/hypervisor");
|
||||
if (!hyper_node)
|
||||
return -ENODEV;
|
||||
|
||||
insts = of_get_property(hyper_node, "hcall-instructions", &len);
|
||||
if (!insts)
|
||||
return -ENODEV;
|
||||
|
||||
if (len % 4 || len > (4 * 4))
|
||||
return -ENODEV;
|
||||
|
||||
for (i = 0; i < (len / 4); i++)
|
||||
patch_instruction(epapr_hypercall_start + i, insts[i]);
|
||||
|
||||
epapr_paravirt_enabled = true;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
early_initcall(epapr_paravirt_init);
|
|
@ -31,6 +31,7 @@
|
|||
#include <asm/cacheflush.h>
|
||||
#include <asm/disassemble.h>
|
||||
#include <asm/ppc-opcode.h>
|
||||
#include <asm/epapr_hcalls.h>
|
||||
|
||||
#define KVM_MAGIC_PAGE (-4096L)
|
||||
#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
|
||||
|
@ -726,7 +727,7 @@ unsigned long kvm_hypercall(unsigned long *in,
|
|||
unsigned long register r11 asm("r11") = nr;
|
||||
unsigned long register r12 asm("r12");
|
||||
|
||||
asm volatile("bl kvm_hypercall_start"
|
||||
asm volatile("bl epapr_hypercall_start"
|
||||
: "=r"(r0), "=r"(r3), "=r"(r4), "=r"(r5), "=r"(r6),
|
||||
"=r"(r7), "=r"(r8), "=r"(r9), "=r"(r10), "=r"(r11),
|
||||
"=r"(r12)
|
||||
|
@ -747,29 +748,6 @@ unsigned long kvm_hypercall(unsigned long *in,
|
|||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_hypercall);
|
||||
|
||||
static int kvm_para_setup(void)
|
||||
{
|
||||
extern u32 kvm_hypercall_start;
|
||||
struct device_node *hyper_node;
|
||||
u32 *insts;
|
||||
int len, i;
|
||||
|
||||
hyper_node = of_find_node_by_path("/hypervisor");
|
||||
if (!hyper_node)
|
||||
return -1;
|
||||
|
||||
insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len);
|
||||
if (len % 4)
|
||||
return -1;
|
||||
if (len > (4 * 4))
|
||||
return -1;
|
||||
|
||||
for (i = 0; i < (len / 4); i++)
|
||||
kvm_patch_ins(&(&kvm_hypercall_start)[i], insts[i]);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static __init void kvm_free_tmp(void)
|
||||
{
|
||||
unsigned long start, end;
|
||||
|
@ -791,7 +769,7 @@ static int __init kvm_guest_init(void)
|
|||
if (!kvm_para_available())
|
||||
goto free_tmp;
|
||||
|
||||
if (kvm_para_setup())
|
||||
if (!epapr_paravirt_enabled)
|
||||
goto free_tmp;
|
||||
|
||||
if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
|
||||
|
|
|
@ -24,16 +24,6 @@
|
|||
#include <asm/page.h>
|
||||
#include <asm/asm-offsets.h>
|
||||
|
||||
/* Hypercall entry point. Will be patched with device tree instructions. */
|
||||
|
||||
.global kvm_hypercall_start
|
||||
kvm_hypercall_start:
|
||||
li r3, -1
|
||||
nop
|
||||
nop
|
||||
nop
|
||||
blr
|
||||
|
||||
#define KVM_MAGIC_PAGE (-4096)
|
||||
|
||||
#ifdef CONFIG_64BIT
|
||||
|
@ -132,7 +122,7 @@ kvm_emulate_mtmsrd_len:
|
|||
.long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4
|
||||
|
||||
|
||||
#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI)
|
||||
#define MSR_SAFE_BITS (MSR_EE | MSR_RI)
|
||||
#define MSR_CRITICAL_BITS ~MSR_SAFE_BITS
|
||||
|
||||
.global kvm_emulate_mtmsr
|
||||
|
|
|
@ -37,56 +37,121 @@
|
|||
/* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
|
||||
#define MAX_LPID_970 63
|
||||
|
||||
long kvmppc_alloc_hpt(struct kvm *kvm)
|
||||
/* Power architecture requires HPT is at least 256kB */
|
||||
#define PPC_MIN_HPT_ORDER 18
|
||||
|
||||
long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
|
||||
{
|
||||
unsigned long hpt;
|
||||
long lpid;
|
||||
struct revmap_entry *rev;
|
||||
struct kvmppc_linear_info *li;
|
||||
long order = kvm_hpt_order;
|
||||
|
||||
/* Allocate guest's hashed page table */
|
||||
li = kvm_alloc_hpt();
|
||||
if (li) {
|
||||
/* using preallocated memory */
|
||||
hpt = (ulong)li->base_virt;
|
||||
kvm->arch.hpt_li = li;
|
||||
} else {
|
||||
/* using dynamic memory */
|
||||
if (htab_orderp) {
|
||||
order = *htab_orderp;
|
||||
if (order < PPC_MIN_HPT_ORDER)
|
||||
order = PPC_MIN_HPT_ORDER;
|
||||
}
|
||||
|
||||
/*
|
||||
* If the user wants a different size from default,
|
||||
* try first to allocate it from the kernel page allocator.
|
||||
*/
|
||||
hpt = 0;
|
||||
if (order != kvm_hpt_order) {
|
||||
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
|
||||
__GFP_NOWARN, HPT_ORDER - PAGE_SHIFT);
|
||||
__GFP_NOWARN, order - PAGE_SHIFT);
|
||||
if (!hpt)
|
||||
--order;
|
||||
}
|
||||
|
||||
/* Next try to allocate from the preallocated pool */
|
||||
if (!hpt) {
|
||||
pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
|
||||
return -ENOMEM;
|
||||
li = kvm_alloc_hpt();
|
||||
if (li) {
|
||||
hpt = (ulong)li->base_virt;
|
||||
kvm->arch.hpt_li = li;
|
||||
order = kvm_hpt_order;
|
||||
}
|
||||
}
|
||||
|
||||
/* Lastly try successively smaller sizes from the page allocator */
|
||||
while (!hpt && order > PPC_MIN_HPT_ORDER) {
|
||||
hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
|
||||
__GFP_NOWARN, order - PAGE_SHIFT);
|
||||
if (!hpt)
|
||||
--order;
|
||||
}
|
||||
|
||||
if (!hpt)
|
||||
return -ENOMEM;
|
||||
|
||||
kvm->arch.hpt_virt = hpt;
|
||||
kvm->arch.hpt_order = order;
|
||||
/* HPTEs are 2**4 bytes long */
|
||||
kvm->arch.hpt_npte = 1ul << (order - 4);
|
||||
/* 128 (2**7) bytes in each HPTEG */
|
||||
kvm->arch.hpt_mask = (1ul << (order - 7)) - 1;
|
||||
|
||||
/* Allocate reverse map array */
|
||||
rev = vmalloc(sizeof(struct revmap_entry) * HPT_NPTE);
|
||||
rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte);
|
||||
if (!rev) {
|
||||
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
|
||||
goto out_freehpt;
|
||||
}
|
||||
kvm->arch.revmap = rev;
|
||||
kvm->arch.sdr1 = __pa(hpt) | (order - 18);
|
||||
|
||||
lpid = kvmppc_alloc_lpid();
|
||||
if (lpid < 0)
|
||||
goto out_freeboth;
|
||||
pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
|
||||
hpt, order, kvm->arch.lpid);
|
||||
|
||||
kvm->arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18);
|
||||
kvm->arch.lpid = lpid;
|
||||
|
||||
pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
|
||||
if (htab_orderp)
|
||||
*htab_orderp = order;
|
||||
return 0;
|
||||
|
||||
out_freeboth:
|
||||
vfree(rev);
|
||||
out_freehpt:
|
||||
free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
|
||||
if (kvm->arch.hpt_li)
|
||||
kvm_release_hpt(kvm->arch.hpt_li);
|
||||
else
|
||||
free_pages(hpt, order - PAGE_SHIFT);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
|
||||
{
|
||||
long err = -EBUSY;
|
||||
long order;
|
||||
|
||||
mutex_lock(&kvm->lock);
|
||||
if (kvm->arch.rma_setup_done) {
|
||||
kvm->arch.rma_setup_done = 0;
|
||||
/* order rma_setup_done vs. vcpus_running */
|
||||
smp_mb();
|
||||
if (atomic_read(&kvm->arch.vcpus_running)) {
|
||||
kvm->arch.rma_setup_done = 1;
|
||||
goto out;
|
||||
}
|
||||
}
|
||||
if (kvm->arch.hpt_virt) {
|
||||
order = kvm->arch.hpt_order;
|
||||
/* Set the entire HPT to 0, i.e. invalid HPTEs */
|
||||
memset((void *)kvm->arch.hpt_virt, 0, 1ul << order);
|
||||
/*
|
||||
* Set the whole last_vcpu array to an invalid vcpu number.
|
||||
* This ensures that each vcpu will flush its TLB on next entry.
|
||||
*/
|
||||
memset(kvm->arch.last_vcpu, 0xff, sizeof(kvm->arch.last_vcpu));
|
||||
*htab_orderp = order;
|
||||
err = 0;
|
||||
} else {
|
||||
err = kvmppc_alloc_hpt(kvm, htab_orderp);
|
||||
order = *htab_orderp;
|
||||
}
|
||||
out:
|
||||
mutex_unlock(&kvm->lock);
|
||||
return err;
|
||||
}
|
||||
|
||||
void kvmppc_free_hpt(struct kvm *kvm)
|
||||
{
|
||||
kvmppc_free_lpid(kvm->arch.lpid);
|
||||
|
@ -94,7 +159,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
|
|||
if (kvm->arch.hpt_li)
|
||||
kvm_release_hpt(kvm->arch.hpt_li);
|
||||
else
|
||||
free_pages(kvm->arch.hpt_virt, HPT_ORDER - PAGE_SHIFT);
|
||||
free_pages(kvm->arch.hpt_virt,
|
||||
kvm->arch.hpt_order - PAGE_SHIFT);
|
||||
}
|
||||
|
||||
/* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
|
||||
|
@ -119,6 +185,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
|
|||
unsigned long psize;
|
||||
unsigned long hp0, hp1;
|
||||
long ret;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
|
||||
psize = 1ul << porder;
|
||||
npages = memslot->npages >> (porder - PAGE_SHIFT);
|
||||
|
@ -127,8 +194,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
|
|||
if (npages > 1ul << (40 - porder))
|
||||
npages = 1ul << (40 - porder);
|
||||
/* Can't use more than 1 HPTE per HPTEG */
|
||||
if (npages > HPT_NPTEG)
|
||||
npages = HPT_NPTEG;
|
||||
if (npages > kvm->arch.hpt_mask + 1)
|
||||
npages = kvm->arch.hpt_mask + 1;
|
||||
|
||||
hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
|
||||
HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
|
||||
|
@ -138,7 +205,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
|
|||
for (i = 0; i < npages; ++i) {
|
||||
addr = i << porder;
|
||||
/* can't use hpt_hash since va > 64 bits */
|
||||
hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;
|
||||
hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt_mask;
|
||||
/*
|
||||
* We assume that the hash table is empty and no
|
||||
* vcpus are using it at this stage. Since we create
|
||||
|
|
|
@ -56,7 +56,7 @@
|
|||
/* #define EXIT_DEBUG_INT */
|
||||
|
||||
static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
|
||||
static int kvmppc_hv_setup_rma(struct kvm_vcpu *vcpu);
|
||||
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
|
||||
|
||||
void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
|
||||
{
|
||||
|
@ -1104,11 +1104,15 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
|
|||
return -EINTR;
|
||||
}
|
||||
|
||||
/* On the first time here, set up VRMA or RMA */
|
||||
atomic_inc(&vcpu->kvm->arch.vcpus_running);
|
||||
/* Order vcpus_running vs. rma_setup_done, see kvmppc_alloc_reset_hpt */
|
||||
smp_mb();
|
||||
|
||||
/* On the first time here, set up HTAB and VRMA or RMA */
|
||||
if (!vcpu->kvm->arch.rma_setup_done) {
|
||||
r = kvmppc_hv_setup_rma(vcpu);
|
||||
r = kvmppc_hv_setup_htab_rma(vcpu);
|
||||
if (r)
|
||||
return r;
|
||||
goto out;
|
||||
}
|
||||
|
||||
flush_fp_to_thread(current);
|
||||
|
@ -1126,6 +1130,9 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
|
|||
kvmppc_core_prepare_to_enter(vcpu);
|
||||
}
|
||||
} while (r == RESUME_GUEST);
|
||||
|
||||
out:
|
||||
atomic_dec(&vcpu->kvm->arch.vcpus_running);
|
||||
return r;
|
||||
}
|
||||
|
||||
|
@ -1341,7 +1348,7 @@ void kvmppc_core_commit_memory_region(struct kvm *kvm,
|
|||
{
|
||||
}
|
||||
|
||||
static int kvmppc_hv_setup_rma(struct kvm_vcpu *vcpu)
|
||||
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
int err = 0;
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
|
@ -1360,6 +1367,15 @@ static int kvmppc_hv_setup_rma(struct kvm_vcpu *vcpu)
|
|||
if (kvm->arch.rma_setup_done)
|
||||
goto out; /* another vcpu beat us to it */
|
||||
|
||||
/* Allocate hashed page table (if not done already) and reset it */
|
||||
if (!kvm->arch.hpt_virt) {
|
||||
err = kvmppc_alloc_hpt(kvm, NULL);
|
||||
if (err) {
|
||||
pr_err("KVM: Couldn't alloc HPT\n");
|
||||
goto out;
|
||||
}
|
||||
}
|
||||
|
||||
/* Look up the memslot for guest physical address 0 */
|
||||
memslot = gfn_to_memslot(kvm, 0);
|
||||
|
||||
|
@ -1471,13 +1487,14 @@ static int kvmppc_hv_setup_rma(struct kvm_vcpu *vcpu)
|
|||
|
||||
int kvmppc_core_init_vm(struct kvm *kvm)
|
||||
{
|
||||
long r;
|
||||
unsigned long lpcr;
|
||||
unsigned long lpcr, lpid;
|
||||
|
||||
/* Allocate hashed page table */
|
||||
r = kvmppc_alloc_hpt(kvm);
|
||||
if (r)
|
||||
return r;
|
||||
/* Allocate the guest's logical partition ID */
|
||||
|
||||
lpid = kvmppc_alloc_lpid();
|
||||
if (lpid < 0)
|
||||
return -ENOMEM;
|
||||
kvm->arch.lpid = lpid;
|
||||
|
||||
INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
|
||||
|
||||
|
@ -1487,7 +1504,6 @@ int kvmppc_core_init_vm(struct kvm *kvm)
|
|||
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_201)) {
|
||||
/* PPC970; HID4 is effectively the LPCR */
|
||||
unsigned long lpid = kvm->arch.lpid;
|
||||
kvm->arch.host_lpid = 0;
|
||||
kvm->arch.host_lpcr = lpcr = mfspr(SPRN_HID4);
|
||||
lpcr &= ~((3 << HID4_LPID1_SH) | (0xful << HID4_LPID5_SH));
|
||||
|
|
|
@ -25,6 +25,9 @@ static void __init kvm_linear_init_one(ulong size, int count, int type);
|
|||
static struct kvmppc_linear_info *kvm_alloc_linear(int type);
|
||||
static void kvm_release_linear(struct kvmppc_linear_info *ri);
|
||||
|
||||
int kvm_hpt_order = KVM_DEFAULT_HPT_ORDER;
|
||||
EXPORT_SYMBOL_GPL(kvm_hpt_order);
|
||||
|
||||
/*************** RMA *************/
|
||||
|
||||
/*
|
||||
|
@ -209,7 +212,7 @@ static void kvm_release_linear(struct kvmppc_linear_info *ri)
|
|||
void __init kvm_linear_init(void)
|
||||
{
|
||||
/* HPT */
|
||||
kvm_linear_init_one(1 << HPT_ORDER, kvm_hpt_count, KVM_LINEAR_HPT);
|
||||
kvm_linear_init_one(1 << kvm_hpt_order, kvm_hpt_count, KVM_LINEAR_HPT);
|
||||
|
||||
/* RMA */
|
||||
/* Only do this on PPC970 in HV mode */
|
||||
|
|
|
@ -237,7 +237,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
|
|||
|
||||
/* Find and lock the HPTEG slot to use */
|
||||
do_insert:
|
||||
if (pte_index >= HPT_NPTE)
|
||||
if (pte_index >= kvm->arch.hpt_npte)
|
||||
return H_PARAMETER;
|
||||
if (likely((flags & H_EXACT) == 0)) {
|
||||
pte_index &= ~7UL;
|
||||
|
@ -352,7 +352,7 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
|
|||
unsigned long v, r, rb;
|
||||
struct revmap_entry *rev;
|
||||
|
||||
if (pte_index >= HPT_NPTE)
|
||||
if (pte_index >= kvm->arch.hpt_npte)
|
||||
return H_PARAMETER;
|
||||
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
|
||||
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
|
||||
|
@ -419,7 +419,8 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
|
|||
i = 4;
|
||||
break;
|
||||
}
|
||||
if (req != 1 || flags == 3 || pte_index >= HPT_NPTE) {
|
||||
if (req != 1 || flags == 3 ||
|
||||
pte_index >= kvm->arch.hpt_npte) {
|
||||
/* parameter error */
|
||||
args[j] = ((0xa0 | flags) << 56) + pte_index;
|
||||
ret = H_PARAMETER;
|
||||
|
@ -521,7 +522,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
|
|||
struct revmap_entry *rev;
|
||||
unsigned long v, r, rb, mask, bits;
|
||||
|
||||
if (pte_index >= HPT_NPTE)
|
||||
if (pte_index >= kvm->arch.hpt_npte)
|
||||
return H_PARAMETER;
|
||||
|
||||
hpte = (unsigned long *)(kvm->arch.hpt_virt + (pte_index << 4));
|
||||
|
@ -583,7 +584,7 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
|
|||
int i, n = 1;
|
||||
struct revmap_entry *rev = NULL;
|
||||
|
||||
if (pte_index >= HPT_NPTE)
|
||||
if (pte_index >= kvm->arch.hpt_npte)
|
||||
return H_PARAMETER;
|
||||
if (flags & H_READ_4) {
|
||||
pte_index &= ~3;
|
||||
|
@ -678,7 +679,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
|
|||
somask = (1UL << 28) - 1;
|
||||
vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
|
||||
}
|
||||
hash = (vsid ^ ((eaddr & somask) >> pshift)) & HPT_HASH_MASK;
|
||||
hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvm->arch.hpt_mask;
|
||||
avpn = slb_v & ~(somask >> 16); /* also includes B */
|
||||
avpn |= (eaddr & somask) >> 16;
|
||||
|
||||
|
@ -723,7 +724,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
|
|||
if (val & HPTE_V_SECONDARY)
|
||||
break;
|
||||
val |= HPTE_V_SECONDARY;
|
||||
hash = hash ^ HPT_HASH_MASK;
|
||||
hash = hash ^ kvm->arch.hpt_mask;
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
|
|
@ -612,6 +612,12 @@ static void kvmppc_fill_pt_regs(struct pt_regs *regs)
|
|||
regs->link = lr;
|
||||
}
|
||||
|
||||
/*
|
||||
* For interrupts needed to be handled by host interrupt handlers,
|
||||
* corresponding host handler are called from here in similar way
|
||||
* (but not exact) as they are called from low level handler
|
||||
* (such as from arch/powerpc/kernel/head_fsl_booke.S).
|
||||
*/
|
||||
static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
|
||||
unsigned int exit_nr)
|
||||
{
|
||||
|
@ -639,6 +645,17 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
|
|||
kvmppc_fill_pt_regs(®s);
|
||||
performance_monitor_exception(®s);
|
||||
break;
|
||||
case BOOKE_INTERRUPT_WATCHDOG:
|
||||
kvmppc_fill_pt_regs(®s);
|
||||
#ifdef CONFIG_BOOKE_WDT
|
||||
WatchdogException(®s);
|
||||
#else
|
||||
unknown_exception(®s);
|
||||
#endif
|
||||
break;
|
||||
case BOOKE_INTERRUPT_CRITICAL:
|
||||
unknown_exception(®s);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -683,6 +700,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
|||
r = RESUME_GUEST;
|
||||
break;
|
||||
|
||||
case BOOKE_INTERRUPT_WATCHDOG:
|
||||
r = RESUME_GUEST;
|
||||
break;
|
||||
|
||||
case BOOKE_INTERRUPT_DOORBELL:
|
||||
kvmppc_account_exit(vcpu, DBELL_EXITS);
|
||||
r = RESUME_GUEST;
|
||||
|
@ -1267,6 +1288,11 @@ void kvmppc_decrementer_func(unsigned long data)
|
|||
{
|
||||
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
|
||||
|
||||
if (vcpu->arch.tcr & TCR_ARE) {
|
||||
vcpu->arch.dec = vcpu->arch.decar;
|
||||
kvmppc_emulate_dec(vcpu);
|
||||
}
|
||||
|
||||
kvmppc_set_tsr_bits(vcpu, TSR_DIS);
|
||||
}
|
||||
|
||||
|
|
|
@ -24,6 +24,7 @@
|
|||
#include "booke.h"
|
||||
|
||||
#define OP_19_XOP_RFI 50
|
||||
#define OP_19_XOP_RFCI 51
|
||||
|
||||
#define OP_31_XOP_MFMSR 83
|
||||
#define OP_31_XOP_WRTEE 131
|
||||
|
@ -36,6 +37,12 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
|
|||
kvmppc_set_msr(vcpu, vcpu->arch.shared->srr1);
|
||||
}
|
||||
|
||||
static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
vcpu->arch.pc = vcpu->arch.csrr0;
|
||||
kvmppc_set_msr(vcpu, vcpu->arch.csrr1);
|
||||
}
|
||||
|
||||
int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
unsigned int inst, int *advance)
|
||||
{
|
||||
|
@ -52,6 +59,12 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
|||
*advance = 0;
|
||||
break;
|
||||
|
||||
case OP_19_XOP_RFCI:
|
||||
kvmppc_emul_rfci(vcpu);
|
||||
kvmppc_set_exit_type(vcpu, EMULATED_RFCI_EXITS);
|
||||
*advance = 0;
|
||||
break;
|
||||
|
||||
default:
|
||||
emulated = EMULATE_FAIL;
|
||||
break;
|
||||
|
@ -113,6 +126,12 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
|
|||
case SPRN_ESR:
|
||||
vcpu->arch.shared->esr = spr_val;
|
||||
break;
|
||||
case SPRN_CSRR0:
|
||||
vcpu->arch.csrr0 = spr_val;
|
||||
break;
|
||||
case SPRN_CSRR1:
|
||||
vcpu->arch.csrr1 = spr_val;
|
||||
break;
|
||||
case SPRN_DBCR0:
|
||||
vcpu->arch.dbcr0 = spr_val;
|
||||
break;
|
||||
|
@ -129,6 +148,9 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
|
|||
kvmppc_set_tcr(vcpu, spr_val);
|
||||
break;
|
||||
|
||||
case SPRN_DECAR:
|
||||
vcpu->arch.decar = spr_val;
|
||||
break;
|
||||
/*
|
||||
* Note: SPRG4-7 are user-readable.
|
||||
* These values are loaded into the real SPRGs when resuming the
|
||||
|
@ -229,6 +251,12 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
|
|||
case SPRN_ESR:
|
||||
*spr_val = vcpu->arch.shared->esr;
|
||||
break;
|
||||
case SPRN_CSRR0:
|
||||
*spr_val = vcpu->arch.csrr0;
|
||||
break;
|
||||
case SPRN_CSRR1:
|
||||
*spr_val = vcpu->arch.csrr1;
|
||||
break;
|
||||
case SPRN_DBCR0:
|
||||
*spr_val = vcpu->arch.dbcr0;
|
||||
break;
|
||||
|
|
|
@ -52,16 +52,21 @@
|
|||
(1<<BOOKE_INTERRUPT_PROGRAM) | \
|
||||
(1<<BOOKE_INTERRUPT_DTLB_MISS))
|
||||
|
||||
.macro KVM_HANDLER ivor_nr
|
||||
.macro KVM_HANDLER ivor_nr scratch srr0
|
||||
_GLOBAL(kvmppc_handler_\ivor_nr)
|
||||
/* Get pointer to vcpu and record exit number. */
|
||||
mtspr SPRN_SPRG_WSCRATCH0, r4
|
||||
mtspr \scratch , r4
|
||||
mfspr r4, SPRN_SPRG_RVCPU
|
||||
stw r3, VCPU_GPR(R3)(r4)
|
||||
stw r5, VCPU_GPR(R5)(r4)
|
||||
stw r6, VCPU_GPR(R6)(r4)
|
||||
mfspr r3, \scratch
|
||||
mfctr r5
|
||||
lis r6, kvmppc_resume_host@h
|
||||
stw r3, VCPU_GPR(R4)(r4)
|
||||
stw r5, VCPU_CTR(r4)
|
||||
mfspr r3, \srr0
|
||||
lis r6, kvmppc_resume_host@h
|
||||
stw r3, VCPU_PC(r4)
|
||||
li r5, \ivor_nr
|
||||
ori r6, r6, kvmppc_resume_host@l
|
||||
mtctr r6
|
||||
|
@ -69,37 +74,35 @@ _GLOBAL(kvmppc_handler_\ivor_nr)
|
|||
.endm
|
||||
|
||||
_GLOBAL(kvmppc_handlers_start)
|
||||
KVM_HANDLER BOOKE_INTERRUPT_CRITICAL
|
||||
KVM_HANDLER BOOKE_INTERRUPT_MACHINE_CHECK
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DATA_STORAGE
|
||||
KVM_HANDLER BOOKE_INTERRUPT_INST_STORAGE
|
||||
KVM_HANDLER BOOKE_INTERRUPT_EXTERNAL
|
||||
KVM_HANDLER BOOKE_INTERRUPT_ALIGNMENT
|
||||
KVM_HANDLER BOOKE_INTERRUPT_PROGRAM
|
||||
KVM_HANDLER BOOKE_INTERRUPT_FP_UNAVAIL
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SYSCALL
|
||||
KVM_HANDLER BOOKE_INTERRUPT_AP_UNAVAIL
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DECREMENTER
|
||||
KVM_HANDLER BOOKE_INTERRUPT_FIT
|
||||
KVM_HANDLER BOOKE_INTERRUPT_WATCHDOG
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DTLB_MISS
|
||||
KVM_HANDLER BOOKE_INTERRUPT_ITLB_MISS
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DEBUG
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SPE_UNAVAIL
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SPE_FP_DATA
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SPE_FP_ROUND
|
||||
KVM_HANDLER BOOKE_INTERRUPT_CRITICAL SPRN_SPRG_RSCRATCH_CRIT SPRN_CSRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_MACHINE_CHECK SPRN_SPRG_RSCRATCH_MC SPRN_MCSRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DATA_STORAGE SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_INST_STORAGE SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_EXTERNAL SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_ALIGNMENT SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_PROGRAM SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_FP_UNAVAIL SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SYSCALL SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_AP_UNAVAIL SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DECREMENTER SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_FIT SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_WATCHDOG SPRN_SPRG_RSCRATCH_CRIT SPRN_CSRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DTLB_MISS SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_ITLB_MISS SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_DEBUG SPRN_SPRG_RSCRATCH_CRIT SPRN_CSRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SPE_UNAVAIL SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SPE_FP_DATA SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
KVM_HANDLER BOOKE_INTERRUPT_SPE_FP_ROUND SPRN_SPRG_RSCRATCH0 SPRN_SRR0
|
||||
|
||||
_GLOBAL(kvmppc_handler_len)
|
||||
.long kvmppc_handler_1 - kvmppc_handler_0
|
||||
|
||||
|
||||
/* Registers:
|
||||
* SPRG_SCRATCH0: guest r4
|
||||
* r4: vcpu pointer
|
||||
* r5: KVM exit number
|
||||
*/
|
||||
_GLOBAL(kvmppc_resume_host)
|
||||
stw r3, VCPU_GPR(R3)(r4)
|
||||
mfcr r3
|
||||
stw r3, VCPU_CR(r4)
|
||||
stw r7, VCPU_GPR(R7)(r4)
|
||||
|
@ -180,10 +183,6 @@ _GLOBAL(kvmppc_resume_host)
|
|||
stw r3, VCPU_LR(r4)
|
||||
mfxer r3
|
||||
stw r3, VCPU_XER(r4)
|
||||
mfspr r3, SPRN_SPRG_RSCRATCH0
|
||||
stw r3, VCPU_GPR(R4)(r4)
|
||||
mfspr r3, SPRN_SRR0
|
||||
stw r3, VCPU_PC(r4)
|
||||
|
||||
/* Restore host stack pointer and PID before IVPR, since the host
|
||||
* exception handlers use them. */
|
||||
|
|
|
@ -262,7 +262,7 @@ kvm_lvl_handler BOOKE_INTERRUPT_CRITICAL, \
|
|||
kvm_lvl_handler BOOKE_INTERRUPT_MACHINE_CHECK, \
|
||||
SPRN_SPRG_RSCRATCH_MC, SPRN_MCSRR0, SPRN_MCSRR1, 0
|
||||
kvm_handler BOOKE_INTERRUPT_DATA_STORAGE, \
|
||||
SPRN_SRR0, SPRN_SRR1, (NEED_EMU | NEED_DEAR)
|
||||
SPRN_SRR0, SPRN_SRR1, (NEED_EMU | NEED_DEAR | NEED_ESR)
|
||||
kvm_handler BOOKE_INTERRUPT_INST_STORAGE, SPRN_SRR0, SPRN_SRR1, NEED_ESR
|
||||
kvm_handler BOOKE_INTERRUPT_EXTERNAL, SPRN_SRR0, SPRN_SRR1, 0
|
||||
kvm_handler BOOKE_INTERRUPT_ALIGNMENT, \
|
||||
|
|
|
@ -269,6 +269,9 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
|
|||
*spr_val = vcpu->arch.shared->mas7_3 >> 32;
|
||||
break;
|
||||
#endif
|
||||
case SPRN_DECAR:
|
||||
*spr_val = vcpu->arch.decar;
|
||||
break;
|
||||
case SPRN_TLB0CFG:
|
||||
*spr_val = vcpu->arch.tlbcfg[0];
|
||||
break;
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
/*
|
||||
* Copyright (C) 2010 Freescale Semiconductor, Inc. All rights reserved.
|
||||
* Copyright (C) 2010,2012 Freescale Semiconductor, Inc. All rights reserved.
|
||||
*
|
||||
* Author: Varun Sethi, <varun.sethi@freescale.com>
|
||||
*
|
||||
|
@ -57,7 +57,8 @@ void kvmppc_e500_tlbil_one(struct kvmppc_vcpu_e500 *vcpu_e500,
|
|||
struct kvm_book3e_206_tlb_entry *gtlbe)
|
||||
{
|
||||
unsigned int tid, ts;
|
||||
u32 val, eaddr, lpid;
|
||||
gva_t eaddr;
|
||||
u32 val, lpid;
|
||||
unsigned long flags;
|
||||
|
||||
ts = get_tlb_ts(gtlbe);
|
||||
|
@ -183,6 +184,9 @@ int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu)
|
|||
|
||||
vcpu->arch.shadow_epcr = SPRN_EPCR_DSIGS | SPRN_EPCR_DGTMI | \
|
||||
SPRN_EPCR_DUVD;
|
||||
#ifdef CONFIG_64BIT
|
||||
vcpu->arch.shadow_epcr |= SPRN_EPCR_ICM;
|
||||
#endif
|
||||
vcpu->arch.shadow_msrp = MSRP_UCLEP | MSRP_DEP | MSRP_PMMP;
|
||||
vcpu->arch.eplc = EPC_EGS | (vcpu->kvm->arch.lpid << EPC_ELPID_SHIFT);
|
||||
vcpu->arch.epsc = vcpu->arch.eplc;
|
||||
|
|
|
@ -59,11 +59,13 @@
|
|||
#define OP_31_XOP_STHBRX 918
|
||||
|
||||
#define OP_LWZ 32
|
||||
#define OP_LD 58
|
||||
#define OP_LWZU 33
|
||||
#define OP_LBZ 34
|
||||
#define OP_LBZU 35
|
||||
#define OP_STW 36
|
||||
#define OP_STWU 37
|
||||
#define OP_STD 62
|
||||
#define OP_STB 38
|
||||
#define OP_STBU 39
|
||||
#define OP_LHZ 40
|
||||
|
@ -392,6 +394,12 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
|
|||
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
|
||||
break;
|
||||
|
||||
/* TBD: Add support for other 64 bit load variants like ldu, ldux, ldx etc. */
|
||||
case OP_LD:
|
||||
rt = get_rt(inst);
|
||||
emulated = kvmppc_handle_load(run, vcpu, rt, 8, 1);
|
||||
break;
|
||||
|
||||
case OP_LWZU:
|
||||
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
|
||||
kvmppc_set_gpr(vcpu, ra, vcpu->arch.vaddr_accessed);
|
||||
|
@ -412,6 +420,14 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
|
|||
4, 1);
|
||||
break;
|
||||
|
||||
/* TBD: Add support for other 64 bit store variants like stdu, stdux, stdx etc. */
|
||||
case OP_STD:
|
||||
rs = get_rs(inst);
|
||||
emulated = kvmppc_handle_store(run, vcpu,
|
||||
kvmppc_get_gpr(vcpu, rs),
|
||||
8, 1);
|
||||
break;
|
||||
|
||||
case OP_STWU:
|
||||
emulated = kvmppc_handle_store(run, vcpu,
|
||||
kvmppc_get_gpr(vcpu, rs),
|
||||
|
|
|
@ -246,6 +246,7 @@ int kvm_dev_ioctl_check_extension(long ext)
|
|||
#endif
|
||||
#ifdef CONFIG_PPC_BOOK3S_64
|
||||
case KVM_CAP_SPAPR_TCE:
|
||||
case KVM_CAP_PPC_ALLOC_HTAB:
|
||||
r = 1;
|
||||
break;
|
||||
#endif /* CONFIG_PPC_BOOK3S_64 */
|
||||
|
@ -802,6 +803,23 @@ long kvm_arch_vm_ioctl(struct file *filp,
|
|||
r = -EFAULT;
|
||||
break;
|
||||
}
|
||||
|
||||
case KVM_PPC_ALLOCATE_HTAB: {
|
||||
struct kvm *kvm = filp->private_data;
|
||||
u32 htab_order;
|
||||
|
||||
r = -EFAULT;
|
||||
if (get_user(htab_order, (u32 __user *)argp))
|
||||
break;
|
||||
r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
|
||||
if (r)
|
||||
break;
|
||||
r = -EFAULT;
|
||||
if (put_user(htab_order, (u32 __user *)argp))
|
||||
break;
|
||||
r = 0;
|
||||
break;
|
||||
}
|
||||
#endif /* CONFIG_KVM_BOOK3S_64_HV */
|
||||
|
||||
#ifdef CONFIG_PPC_BOOK3S_64
|
||||
|
|
|
@ -25,6 +25,7 @@ source "arch/powerpc/platforms/wsp/Kconfig"
|
|||
config KVM_GUEST
|
||||
bool "KVM Guest support"
|
||||
default n
|
||||
select EPAPR_PARAVIRT
|
||||
---help---
|
||||
This option enables various optimizations for running under the KVM
|
||||
hypervisor. Overhead for the kernel when not running inside KVM should
|
||||
|
@ -32,6 +33,14 @@ config KVM_GUEST
|
|||
|
||||
In case of doubt, say Y
|
||||
|
||||
config EPAPR_PARAVIRT
|
||||
bool "ePAPR para-virtualization support"
|
||||
default n
|
||||
help
|
||||
Enables ePAPR para-virtualization support for guests.
|
||||
|
||||
In case of doubt, say Y
|
||||
|
||||
config PPC_NATIVE
|
||||
bool
|
||||
depends on 6xx || PPC64
|
||||
|
|
|
@ -53,5 +53,7 @@ int sclp_chp_configure(struct chp_id chpid);
|
|||
int sclp_chp_deconfigure(struct chp_id chpid);
|
||||
int sclp_chp_read_info(struct sclp_chp_info *info);
|
||||
void sclp_get_ipl_info(struct sclp_ipl_info *info);
|
||||
bool sclp_has_linemode(void);
|
||||
bool sclp_has_vt220(void);
|
||||
|
||||
#endif /* _ASM_S390_SCLP_H */
|
||||
|
|
|
@ -24,6 +24,7 @@
|
|||
|
||||
#define SIGP_STATUS_CHECK_STOP 0x00000010UL
|
||||
#define SIGP_STATUS_STOPPED 0x00000040UL
|
||||
#define SIGP_STATUS_EXT_CALL_PENDING 0x00000080UL
|
||||
#define SIGP_STATUS_INVALID_PARAMETER 0x00000100UL
|
||||
#define SIGP_STATUS_INCORRECT_STATE 0x00000200UL
|
||||
#define SIGP_STATUS_NOT_RUNNING 0x00000400UL
|
||||
|
|
|
@ -61,6 +61,7 @@
|
|||
#include <asm/kvm_virtio.h>
|
||||
#include <asm/diag.h>
|
||||
#include <asm/os_info.h>
|
||||
#include <asm/sclp.h>
|
||||
#include "entry.h"
|
||||
|
||||
long psw_kernel_bits = PSW_DEFAULT_KEY | PSW_MASK_BASE | PSW_ASC_PRIMARY |
|
||||
|
@ -136,9 +137,14 @@ __setup("condev=", condev_setup);
|
|||
|
||||
static void __init set_preferred_console(void)
|
||||
{
|
||||
if (MACHINE_IS_KVM)
|
||||
add_preferred_console("hvc", 0, NULL);
|
||||
else if (CONSOLE_IS_3215 || CONSOLE_IS_SCLP)
|
||||
if (MACHINE_IS_KVM) {
|
||||
if (sclp_has_vt220())
|
||||
add_preferred_console("ttyS", 1, NULL);
|
||||
else if (sclp_has_linemode())
|
||||
add_preferred_console("ttyS", 0, NULL);
|
||||
else
|
||||
add_preferred_console("hvc", 0, NULL);
|
||||
} else if (CONSOLE_IS_3215 || CONSOLE_IS_SCLP)
|
||||
add_preferred_console("ttyS", 0, NULL);
|
||||
else if (CONSOLE_IS_3270)
|
||||
add_preferred_console("tty3270", 0, NULL);
|
||||
|
|
|
@ -347,6 +347,7 @@ static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
|
|||
vcpu->arch.guest_fpregs.fpc = 0;
|
||||
asm volatile("lfpc %0" : : "Q" (vcpu->arch.guest_fpregs.fpc));
|
||||
vcpu->arch.sie_block->gbea = 1;
|
||||
atomic_set_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
|
||||
}
|
||||
|
||||
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
|
||||
|
|
|
@ -26,19 +26,23 @@ static int __sigp_sense(struct kvm_vcpu *vcpu, u16 cpu_addr,
|
|||
int rc;
|
||||
|
||||
if (cpu_addr >= KVM_MAX_VCPUS)
|
||||
return 3; /* not operational */
|
||||
return SIGP_CC_NOT_OPERATIONAL;
|
||||
|
||||
spin_lock(&fi->lock);
|
||||
if (fi->local_int[cpu_addr] == NULL)
|
||||
rc = 3; /* not operational */
|
||||
rc = SIGP_CC_NOT_OPERATIONAL;
|
||||
else if (!(atomic_read(fi->local_int[cpu_addr]->cpuflags)
|
||||
& CPUSTAT_STOPPED)) {
|
||||
& (CPUSTAT_ECALL_PEND | CPUSTAT_STOPPED)))
|
||||
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
else {
|
||||
*reg &= 0xffffffff00000000UL;
|
||||
rc = 1; /* status stored */
|
||||
} else {
|
||||
*reg &= 0xffffffff00000000UL;
|
||||
*reg |= SIGP_STATUS_STOPPED;
|
||||
rc = 1; /* status stored */
|
||||
if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
|
||||
& CPUSTAT_ECALL_PEND)
|
||||
*reg |= SIGP_STATUS_EXT_CALL_PENDING;
|
||||
if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
|
||||
& CPUSTAT_STOPPED)
|
||||
*reg |= SIGP_STATUS_STOPPED;
|
||||
rc = SIGP_CC_STATUS_STORED;
|
||||
}
|
||||
spin_unlock(&fi->lock);
|
||||
|
||||
|
@ -54,7 +58,7 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
|
|||
int rc;
|
||||
|
||||
if (cpu_addr >= KVM_MAX_VCPUS)
|
||||
return 3; /* not operational */
|
||||
return SIGP_CC_NOT_OPERATIONAL;
|
||||
|
||||
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
|
||||
if (!inti)
|
||||
|
@ -66,7 +70,7 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
|
|||
spin_lock(&fi->lock);
|
||||
li = fi->local_int[cpu_addr];
|
||||
if (li == NULL) {
|
||||
rc = 3; /* not operational */
|
||||
rc = SIGP_CC_NOT_OPERATIONAL;
|
||||
kfree(inti);
|
||||
goto unlock;
|
||||
}
|
||||
|
@ -77,7 +81,7 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
|
|||
if (waitqueue_active(&li->wq))
|
||||
wake_up_interruptible(&li->wq);
|
||||
spin_unlock_bh(&li->lock);
|
||||
rc = 0; /* order accepted */
|
||||
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr);
|
||||
unlock:
|
||||
spin_unlock(&fi->lock);
|
||||
|
@ -92,7 +96,7 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
|
|||
int rc;
|
||||
|
||||
if (cpu_addr >= KVM_MAX_VCPUS)
|
||||
return 3; /* not operational */
|
||||
return SIGP_CC_NOT_OPERATIONAL;
|
||||
|
||||
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
|
||||
if (!inti)
|
||||
|
@ -104,7 +108,7 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
|
|||
spin_lock(&fi->lock);
|
||||
li = fi->local_int[cpu_addr];
|
||||
if (li == NULL) {
|
||||
rc = 3; /* not operational */
|
||||
rc = SIGP_CC_NOT_OPERATIONAL;
|
||||
kfree(inti);
|
||||
goto unlock;
|
||||
}
|
||||
|
@ -115,7 +119,7 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
|
|||
if (waitqueue_active(&li->wq))
|
||||
wake_up_interruptible(&li->wq);
|
||||
spin_unlock_bh(&li->lock);
|
||||
rc = 0; /* order accepted */
|
||||
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr);
|
||||
unlock:
|
||||
spin_unlock(&fi->lock);
|
||||
|
@ -143,7 +147,7 @@ static int __inject_sigp_stop(struct kvm_s390_local_interrupt *li, int action)
|
|||
out:
|
||||
spin_unlock_bh(&li->lock);
|
||||
|
||||
return 0; /* order accepted */
|
||||
return SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
}
|
||||
|
||||
static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action)
|
||||
|
@ -153,12 +157,12 @@ static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action)
|
|||
int rc;
|
||||
|
||||
if (cpu_addr >= KVM_MAX_VCPUS)
|
||||
return 3; /* not operational */
|
||||
return SIGP_CC_NOT_OPERATIONAL;
|
||||
|
||||
spin_lock(&fi->lock);
|
||||
li = fi->local_int[cpu_addr];
|
||||
if (li == NULL) {
|
||||
rc = 3; /* not operational */
|
||||
rc = SIGP_CC_NOT_OPERATIONAL;
|
||||
goto unlock;
|
||||
}
|
||||
|
||||
|
@ -182,11 +186,11 @@ static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
|
|||
|
||||
switch (parameter & 0xff) {
|
||||
case 0:
|
||||
rc = 3; /* not operational */
|
||||
rc = SIGP_CC_NOT_OPERATIONAL;
|
||||
break;
|
||||
case 1:
|
||||
case 2:
|
||||
rc = 0; /* order accepted */
|
||||
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
break;
|
||||
default:
|
||||
rc = -EOPNOTSUPP;
|
||||
|
@ -207,21 +211,23 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
|
|||
address = address & 0x7fffe000u;
|
||||
if (copy_from_guest_absolute(vcpu, &tmp, address, 1) ||
|
||||
copy_from_guest_absolute(vcpu, &tmp, address + PAGE_SIZE, 1)) {
|
||||
*reg &= 0xffffffff00000000UL;
|
||||
*reg |= SIGP_STATUS_INVALID_PARAMETER;
|
||||
return 1; /* invalid parameter */
|
||||
return SIGP_CC_STATUS_STORED;
|
||||
}
|
||||
|
||||
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
|
||||
if (!inti)
|
||||
return 2; /* busy */
|
||||
return SIGP_CC_BUSY;
|
||||
|
||||
spin_lock(&fi->lock);
|
||||
if (cpu_addr < KVM_MAX_VCPUS)
|
||||
li = fi->local_int[cpu_addr];
|
||||
|
||||
if (li == NULL) {
|
||||
rc = 1; /* incorrect state */
|
||||
*reg &= SIGP_STATUS_INCORRECT_STATE;
|
||||
*reg &= 0xffffffff00000000UL;
|
||||
*reg |= SIGP_STATUS_INCORRECT_STATE;
|
||||
rc = SIGP_CC_STATUS_STORED;
|
||||
kfree(inti);
|
||||
goto out_fi;
|
||||
}
|
||||
|
@ -229,8 +235,9 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
|
|||
spin_lock_bh(&li->lock);
|
||||
/* cpu must be in stopped state */
|
||||
if (!(atomic_read(li->cpuflags) & CPUSTAT_STOPPED)) {
|
||||
rc = 1; /* incorrect state */
|
||||
*reg &= SIGP_STATUS_INCORRECT_STATE;
|
||||
*reg &= 0xffffffff00000000UL;
|
||||
*reg |= SIGP_STATUS_INCORRECT_STATE;
|
||||
rc = SIGP_CC_STATUS_STORED;
|
||||
kfree(inti);
|
||||
goto out_li;
|
||||
}
|
||||
|
@ -242,7 +249,7 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
|
|||
atomic_set(&li->active, 1);
|
||||
if (waitqueue_active(&li->wq))
|
||||
wake_up_interruptible(&li->wq);
|
||||
rc = 0; /* order accepted */
|
||||
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
|
||||
VCPU_EVENT(vcpu, 4, "set prefix of cpu %02x to %x", cpu_addr, address);
|
||||
out_li:
|
||||
|
@ -259,21 +266,21 @@ static int __sigp_sense_running(struct kvm_vcpu *vcpu, u16 cpu_addr,
|
|||
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
|
||||
|
||||
if (cpu_addr >= KVM_MAX_VCPUS)
|
||||
return 3; /* not operational */
|
||||
return SIGP_CC_NOT_OPERATIONAL;
|
||||
|
||||
spin_lock(&fi->lock);
|
||||
if (fi->local_int[cpu_addr] == NULL)
|
||||
rc = 3; /* not operational */
|
||||
rc = SIGP_CC_NOT_OPERATIONAL;
|
||||
else {
|
||||
if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
|
||||
& CPUSTAT_RUNNING) {
|
||||
/* running */
|
||||
rc = 1;
|
||||
rc = SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
} else {
|
||||
/* not running */
|
||||
*reg &= 0xffffffff00000000UL;
|
||||
*reg |= SIGP_STATUS_NOT_RUNNING;
|
||||
rc = 0;
|
||||
rc = SIGP_CC_STATUS_STORED;
|
||||
}
|
||||
}
|
||||
spin_unlock(&fi->lock);
|
||||
|
@ -286,23 +293,23 @@ static int __sigp_sense_running(struct kvm_vcpu *vcpu, u16 cpu_addr,
|
|||
|
||||
static int __sigp_restart(struct kvm_vcpu *vcpu, u16 cpu_addr)
|
||||
{
|
||||
int rc = 0;
|
||||
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
|
||||
struct kvm_s390_local_interrupt *li;
|
||||
int rc = SIGP_CC_ORDER_CODE_ACCEPTED;
|
||||
|
||||
if (cpu_addr >= KVM_MAX_VCPUS)
|
||||
return 3; /* not operational */
|
||||
return SIGP_CC_NOT_OPERATIONAL;
|
||||
|
||||
spin_lock(&fi->lock);
|
||||
li = fi->local_int[cpu_addr];
|
||||
if (li == NULL) {
|
||||
rc = 3; /* not operational */
|
||||
rc = SIGP_CC_NOT_OPERATIONAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
spin_lock_bh(&li->lock);
|
||||
if (li->action_bits & ACTION_STOP_ON_STOP)
|
||||
rc = 2; /* busy */
|
||||
rc = SIGP_CC_BUSY;
|
||||
else
|
||||
VCPU_EVENT(vcpu, 4, "sigp restart %x to handle userspace",
|
||||
cpu_addr);
|
||||
|
@ -377,7 +384,7 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu)
|
|||
case SIGP_RESTART:
|
||||
vcpu->stat.instruction_sigp_restart++;
|
||||
rc = __sigp_restart(vcpu, cpu_addr);
|
||||
if (rc == 2) /* busy */
|
||||
if (rc == SIGP_CC_BUSY)
|
||||
break;
|
||||
/* user space must know about restart */
|
||||
default:
@@ -465,6 +465,8 @@ static inline u32 safe_apic_wait_icr_idle(void)
|
|||
return apic->safe_wait_icr_idle();
|
||||
}
|
||||
|
||||
extern void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v));
|
||||
|
||||
#else /* CONFIG_X86_LOCAL_APIC */
|
||||
|
||||
static inline u32 apic_read(u32 reg) { return 0; }
|
||||
|
@ -474,6 +476,7 @@ static inline u64 apic_icr_read(void) { return 0; }
|
|||
static inline void apic_icr_write(u32 low, u32 high) { }
|
||||
static inline void apic_wait_icr_idle(void) { }
|
||||
static inline u32 safe_apic_wait_icr_idle(void) { return 0; }
|
||||
static inline void apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v)) {}
|
||||
|
||||
#endif /* CONFIG_X86_LOCAL_APIC */
|
||||
|
||||
|
|
|
@ -264,6 +264,13 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
|
|||
* This operation is non-atomic and can be reordered.
|
||||
* If two examples of this operation race, one can appear to succeed
|
||||
* but actually fail. You must protect multiple accesses with a lock.
|
||||
*
|
||||
* Note: the operation is performed atomically with respect to
|
||||
* the local CPU, but not other CPUs. Portable code should not
|
||||
* rely on this behaviour.
|
||||
* KVM relies on this behaviour on x86 for modifying memory that is also
|
||||
* accessed from a hypervisor on the same CPU if running in a VM: don't change
|
||||
* this without also updating arch/x86/kernel/kvm.c
|
||||
*/
|
||||
static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
|
||||
{
|
||||
|
|
|
@ -49,6 +49,7 @@ extern const struct hypervisor_x86 *x86_hyper;
|
|||
extern const struct hypervisor_x86 x86_hyper_vmware;
|
||||
extern const struct hypervisor_x86 x86_hyper_ms_hyperv;
|
||||
extern const struct hypervisor_x86 x86_hyper_xen_hvm;
|
||||
extern const struct hypervisor_x86 x86_hyper_kvm;
|
||||
|
||||
static inline bool hypervisor_x2apic_available(void)
|
||||
{
|
||||
|
|
|
@ -12,6 +12,7 @@
|
|||
/* Select x86 specific features in <linux/kvm.h> */
|
||||
#define __KVM_HAVE_PIT
|
||||
#define __KVM_HAVE_IOAPIC
|
||||
#define __KVM_HAVE_IRQ_LINE
|
||||
#define __KVM_HAVE_DEVICE_ASSIGNMENT
|
||||
#define __KVM_HAVE_MSI
|
||||
#define __KVM_HAVE_USER_NMI
|
||||
|
|
|
@ -192,8 +192,8 @@ struct x86_emulate_ops {
|
|||
struct x86_instruction_info *info,
|
||||
enum x86_intercept_stage stage);
|
||||
|
||||
bool (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
|
||||
u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
|
||||
void (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
|
||||
u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
|
||||
};
|
||||
|
||||
typedef u32 __attribute__((vector_size(16))) sse128_t;
|
||||
|
@ -280,9 +280,9 @@ struct x86_emulate_ctxt {
|
|||
u8 modrm_seg;
|
||||
bool rip_relative;
|
||||
unsigned long _eip;
|
||||
struct operand memop;
|
||||
/* Fields above regs are cleared together. */
|
||||
unsigned long regs[NR_VCPU_REGS];
|
||||
struct operand memop;
|
||||
struct operand *memopp;
|
||||
struct fetch_cache fetch;
|
||||
struct read_cache io_read;
|
||||
|
|
|
@ -48,12 +48,13 @@
|
|||
|
||||
#define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1)
|
||||
#define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD))
|
||||
#define CR3_PCID_ENABLED_RESERVED_BITS 0xFFFFFF0000000000ULL
|
||||
#define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS | \
|
||||
0xFFFFFF0000000000ULL)
|
||||
#define CR4_RESERVED_BITS \
|
||||
(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
|
||||
| X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
|
||||
| X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \
|
||||
| X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
|
||||
| X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_RDWRGSFS \
|
||||
| X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
|
||||
|
||||
|
@ -175,6 +176,13 @@ enum {
|
|||
|
||||
/* apic attention bits */
|
||||
#define KVM_APIC_CHECK_VAPIC 0
|
||||
/*
|
||||
* The following bit is set with PV-EOI, unset on EOI.
|
||||
* We detect PV-EOI changes by guest by comparing
|
||||
* this bit with PV-EOI in guest memory.
|
||||
* See the implementation in apic_update_pv_eoi.
|
||||
*/
|
||||
#define KVM_APIC_PV_EOI_PENDING 1
|
||||
|
||||
/*
|
||||
* We don't want allocation failures within the mmu code, so we preallocate
|
||||
|
@ -484,6 +492,11 @@ struct kvm_vcpu_arch {
|
|||
u64 length;
|
||||
u64 status;
|
||||
} osvw;
|
||||
|
||||
struct {
|
||||
u64 msr_val;
|
||||
struct gfn_to_hva_cache data;
|
||||
} pv_eoi;
|
||||
};
|
||||
|
||||
struct kvm_lpage_info {
|
||||
|
@ -661,6 +674,7 @@ struct kvm_x86_ops {
|
|||
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
|
||||
int (*get_lpage_level)(void);
|
||||
bool (*rdtscp_supported)(void);
|
||||
bool (*invpcid_supported)(void);
|
||||
void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
|
||||
|
||||
void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
|
||||
|
@ -802,7 +816,20 @@ int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
|
|||
void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
|
||||
bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
|
||||
|
||||
int kvm_pic_set_irq(void *opaque, int irq, int level);
|
||||
static inline int __kvm_irq_line_state(unsigned long *irq_state,
|
||||
int irq_source_id, int level)
|
||||
{
|
||||
/* Logical OR for level trig interrupt */
|
||||
if (level)
|
||||
__set_bit(irq_source_id, irq_state);
|
||||
else
|
||||
__clear_bit(irq_source_id, irq_state);
|
||||
|
||||
return !!(*irq_state);
|
||||
}
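The helper above folds several interrupt sources into one wire: the reported level stays high as long as any source bit is still set, which is what lets a level-triggered line be shared. A minimal userspace sketch of the same idea (names and the plain bit operations are invented for illustration, not kernel API):

/*
 * Hedged, standalone sketch (not kernel code): models how a shared
 * level-triggered line stays asserted while *any* source still drives it,
 * mirroring the set/clear-then-OR pattern in __kvm_irq_line_state above.
 */
#include <stdio.h>

static unsigned long line_state;            /* one bit per irq source */

static int set_irq_source(int source_id, int level)
{
        if (level)
                line_state |= 1UL << source_id;
        else
                line_state &= ~(1UL << source_id);
        return line_state != 0;             /* resulting line level */
}

int main(void)
{
        printf("%d\n", set_irq_source(0, 1)); /* source 0 asserts -> 1 */
        printf("%d\n", set_irq_source(1, 1)); /* source 1 asserts -> 1 */
        printf("%d\n", set_irq_source(0, 0)); /* source 0 drops   -> 1 */
        printf("%d\n", set_irq_source(1, 0)); /* source 1 drops   -> 0 */
        return 0;
}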
|
||||
|
||||
int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level);
|
||||
void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
|
||||
|
||||
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
|
||||
|
||||
|
|
|
@ -22,6 +22,7 @@
|
|||
#define KVM_FEATURE_CLOCKSOURCE2 3
|
||||
#define KVM_FEATURE_ASYNC_PF 4
|
||||
#define KVM_FEATURE_STEAL_TIME 5
|
||||
#define KVM_FEATURE_PV_EOI 6
|
||||
|
||||
/* The last 8 bits are used to indicate how to interpret the flags field
|
||||
* in pvclock structure. If no bits are set, all flags are ignored.
|
||||
|
@ -37,6 +38,7 @@
|
|||
#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
|
||||
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
|
||||
#define MSR_KVM_STEAL_TIME 0x4b564d03
|
||||
#define MSR_KVM_PV_EOI_EN 0x4b564d04
|
||||
|
||||
struct kvm_steal_time {
|
||||
__u64 steal;
|
||||
|
@ -89,6 +91,11 @@ struct kvm_vcpu_pv_apf_data {
|
|||
__u32 enabled;
|
||||
};
|
||||
|
||||
#define KVM_PV_EOI_BIT 0
|
||||
#define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
|
||||
#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
|
||||
#define KVM_PV_EOI_DISABLED 0x0
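Going by how these bits are used elsewhere in this series (the guest writes the physical address of the flag word OR-ed with KVM_MSR_ENABLED into MSR_KVM_PV_EOI_EN, and kvm_lapic_enable_pv_eoi rejects addresses that are not 4-byte aligned), a hedged sketch of that encoding looks like the following; the names below are invented stand-ins, not kernel definitions:

/*
 * Sketch under stated assumptions, not kernel code: build the value a guest
 * would write to the PV EOI enable MSR from a guest-physical address.
 */
#include <stdint.h>
#include <stdio.h>

#define PV_EOI_ENABLE_BIT 0x1ULL        /* stands in for KVM_MSR_ENABLED */

static int encode_pv_eoi_msr(uint64_t gpa, uint64_t *msr_val)
{
        if (gpa & 0x3)                  /* flag word must be 4-byte aligned */
                return -1;
        *msr_val = gpa | PV_EOI_ENABLE_BIT;
        return 0;
}

int main(void)
{
        uint64_t val;

        if (encode_pv_eoi_msr(0x12340, &val) == 0)
                printf("MSR value: 0x%llx\n", (unsigned long long)val);
        return 0;
}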
|
||||
|
||||
#ifdef __KERNEL__
|
||||
#include <asm/processor.h>
|
||||
|
||||
|
|
|
@ -44,6 +44,7 @@
|
|||
*/
|
||||
#define X86_CR3_PWT 0x00000008 /* Page Write Through */
|
||||
#define X86_CR3_PCD 0x00000010 /* Page Cache Disable */
|
||||
#define X86_CR3_PCID_MASK 0x00000fff /* PCID Mask */
|
||||
|
||||
/*
|
||||
* Intel CPU features in CR4
|
||||
|
@ -61,6 +62,7 @@
|
|||
#define X86_CR4_OSXMMEXCPT 0x00000400 /* enable unmasked SSE exceptions */
|
||||
#define X86_CR4_VMXE 0x00002000 /* enable VMX virtualization */
|
||||
#define X86_CR4_RDWRGSFS 0x00010000 /* enable RDWRGSFS support */
|
||||
#define X86_CR4_PCIDE 0x00020000 /* enable PCID support */
|
||||
#define X86_CR4_OSXSAVE 0x00040000 /* enable xsave and xrestore */
|
||||
#define X86_CR4_SMEP 0x00100000 /* enable SMEP support */
|
||||
|
||||
|
|
|
@ -60,6 +60,7 @@
|
|||
#define SECONDARY_EXEC_WBINVD_EXITING 0x00000040
|
||||
#define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080
|
||||
#define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400
|
||||
#define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000
|
||||
|
||||
|
||||
#define PIN_BASED_EXT_INTR_MASK 0x00000001
|
||||
|
@ -281,6 +282,7 @@ enum vmcs_field {
|
|||
#define EXIT_REASON_EPT_MISCONFIG 49
|
||||
#define EXIT_REASON_WBINVD 54
|
||||
#define EXIT_REASON_XSETBV 55
|
||||
#define EXIT_REASON_INVPCID 58
|
||||
|
||||
/*
|
||||
* Interruption-information format
|
||||
|
@ -404,6 +406,7 @@ enum vmcs_field {
|
|||
#define VMX_EPTP_WB_BIT (1ull << 14)
|
||||
#define VMX_EPT_2MB_PAGE_BIT (1ull << 16)
|
||||
#define VMX_EPT_1GB_PAGE_BIT (1ull << 17)
|
||||
#define VMX_EPT_AD_BIT (1ull << 21)
|
||||
#define VMX_EPT_EXTENT_INDIVIDUAL_BIT (1ull << 24)
|
||||
#define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
|
||||
#define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26)
|
||||
|
@ -415,11 +418,14 @@ enum vmcs_field {
|
|||
#define VMX_EPT_MAX_GAW 0x4
|
||||
#define VMX_EPT_MT_EPTE_SHIFT 3
|
||||
#define VMX_EPT_GAW_EPTP_SHIFT 3
|
||||
#define VMX_EPT_AD_ENABLE_BIT (1ull << 6)
|
||||
#define VMX_EPT_DEFAULT_MT 0x6ull
|
||||
#define VMX_EPT_READABLE_MASK 0x1ull
|
||||
#define VMX_EPT_WRITABLE_MASK 0x2ull
|
||||
#define VMX_EPT_EXECUTABLE_MASK 0x4ull
|
||||
#define VMX_EPT_IPAT_BIT (1ull << 6)
|
||||
#define VMX_EPT_ACCESS_BIT (1ull << 8)
|
||||
#define VMX_EPT_DIRTY_BIT (1ull << 9)
|
||||
|
||||
#define VMX_EPT_IDENTITY_PAGETABLE_ADDR 0xfffbc000ul
|
||||
|
||||
|
|
|
@ -2142,6 +2142,23 @@ int default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
|
|||
return -EINVAL;
|
||||
}
|
||||
|
||||
/*
|
||||
* Override the generic EOI implementation with an optimized version.
|
||||
* Only called during early boot when only one CPU is active and with
|
||||
* interrupts disabled, so we know this does not race with actual APIC driver
|
||||
* use.
|
||||
*/
|
||||
void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v))
|
||||
{
|
||||
struct apic **drv;
|
||||
|
||||
for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
|
||||
/* Should happen once for each apic */
|
||||
WARN_ON((*drv)->eoi_write == eoi_write);
|
||||
(*drv)->eoi_write = eoi_write;
|
||||
}
|
||||
}
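A sketch of the pattern used above, with invented driver names: walk every registered descriptor once during early boot and swap one callback for an optimized version. The kernel variant can skip locking because it runs before other users of the drivers exist; the sketch makes the same assumption.

/* Standalone illustration only; struct and function names are made up. */
#include <stdio.h>

struct fake_apic {
        const char *name;
        void (*eoi_write)(unsigned reg, unsigned val);
};

static void default_eoi_write(unsigned reg, unsigned val)
{
        printf("MMIO write reg=%#x val=%#x\n", reg, val);
}

static void pv_eoi_write(unsigned reg, unsigned val)
{
        printf("paravirtual EOI, reg=%#x val=%#x\n", reg, val);
}

static struct fake_apic drivers[] = {
        { "apic_flat",   default_eoi_write },
        { "apic_x2apic", default_eoi_write },
};

int main(void)
{
        for (unsigned i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++)
                drivers[i].eoi_write = pv_eoi_write;   /* patch every driver */

        drivers[0].eoi_write(0xb0, 0);                 /* now takes the PV path */
        return 0;
}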
|
||||
|
||||
/*
|
||||
* Power management
|
||||
*/
|
||||
|
|
|
@ -37,6 +37,9 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] =
|
|||
#endif
|
||||
&x86_hyper_vmware,
|
||||
&x86_hyper_ms_hyperv,
|
||||
#ifdef CONFIG_KVM_GUEST
|
||||
&x86_hyper_kvm,
|
||||
#endif
|
||||
};
|
||||
|
||||
const struct hypervisor_x86 *x86_hyper;
|
||||
|
|
|
@ -39,6 +39,9 @@
|
|||
#include <asm/desc.h>
|
||||
#include <asm/tlbflush.h>
|
||||
#include <asm/idle.h>
|
||||
#include <asm/apic.h>
|
||||
#include <asm/apicdef.h>
|
||||
#include <asm/hypervisor.h>
|
||||
|
||||
static int kvmapf = 1;
|
||||
|
||||
|
@ -283,6 +286,22 @@ static void kvm_register_steal_time(void)
|
|||
cpu, __pa(st));
|
||||
}
|
||||
|
||||
static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
|
||||
|
||||
static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
|
||||
{
|
||||
/**
|
||||
* This relies on __test_and_clear_bit to modify the memory
|
||||
* in a way that is atomic with respect to the local CPU.
|
||||
* The hypervisor only accesses this memory from the local CPU so
|
||||
* there's no need for lock or memory barriers.
|
||||
* An optimization barrier is implied in apic write.
|
||||
*/
|
||||
if (__test_and_clear_bit(KVM_PV_EOI_BIT, &__get_cpu_var(kvm_apic_eoi)))
|
||||
return;
|
||||
apic_write(APIC_EOI, APIC_EOI_ACK);
|
||||
}
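A standalone sketch of the fast path above (assumed names, plain C instead of percpu accessors): the hypervisor arms a flag in shared memory before injecting an interrupt, and the guest completes the EOI by clearing that flag, falling back to the real APIC_EOI write only when the flag was not set.

/* Illustration only; not the kernel implementation. */
#include <stdbool.h>
#include <stdio.h>

static unsigned long shared_pv_eoi;        /* stands in for kvm_apic_eoi */

static bool test_and_clear(unsigned long *word, int bit)
{
        bool was_set = (*word >> bit) & 1;

        *word &= ~(1UL << bit);
        return was_set;
}

static void guest_eoi(void)
{
        if (test_and_clear(&shared_pv_eoi, 0)) {
                printf("EOI completed via shared memory, no APIC access\n");
                return;
        }
        printf("fallback: write the APIC_EOI register as usual\n");
}

int main(void)
{
        shared_pv_eoi = 1;      /* "host" armed PV EOI before injection */
        guest_eoi();            /* fast path */
        guest_eoi();            /* flag already clear -> slow path */
        return 0;
}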
|
||||
|
||||
void __cpuinit kvm_guest_cpu_init(void)
|
||||
{
|
||||
if (!kvm_para_available())
|
||||
|
@ -300,11 +319,20 @@ void __cpuinit kvm_guest_cpu_init(void)
|
|||
smp_processor_id());
|
||||
}
|
||||
|
||||
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
|
||||
unsigned long pa;
|
||||
/* Size alignment is implied but just to make it explicit. */
|
||||
BUILD_BUG_ON(__alignof__(kvm_apic_eoi) < 4);
|
||||
__get_cpu_var(kvm_apic_eoi) = 0;
|
||||
pa = __pa(&__get_cpu_var(kvm_apic_eoi)) | KVM_MSR_ENABLED;
|
||||
wrmsrl(MSR_KVM_PV_EOI_EN, pa);
|
||||
}
|
||||
|
||||
if (has_steal_clock)
|
||||
kvm_register_steal_time();
|
||||
}
|
||||
|
||||
static void kvm_pv_disable_apf(void *unused)
|
||||
static void kvm_pv_disable_apf(void)
|
||||
{
|
||||
if (!__get_cpu_var(apf_reason).enabled)
|
||||
return;
|
||||
|
@ -316,11 +344,23 @@ static void kvm_pv_disable_apf(void *unused)
|
|||
smp_processor_id());
|
||||
}
|
||||
|
||||
static void kvm_pv_guest_cpu_reboot(void *unused)
|
||||
{
|
||||
/*
|
||||
* We disable PV EOI before we load a new kernel by kexec,
|
||||
* since MSR_KVM_PV_EOI_EN stores a pointer into old kernel's memory.
|
||||
* New kernel can re-enable when it boots.
|
||||
*/
|
||||
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
|
||||
wrmsrl(MSR_KVM_PV_EOI_EN, 0);
|
||||
kvm_pv_disable_apf();
|
||||
}
|
||||
|
||||
static int kvm_pv_reboot_notify(struct notifier_block *nb,
|
||||
unsigned long code, void *unused)
|
||||
{
|
||||
if (code == SYS_RESTART)
|
||||
on_each_cpu(kvm_pv_disable_apf, NULL, 1);
|
||||
on_each_cpu(kvm_pv_guest_cpu_reboot, NULL, 1);
|
||||
return NOTIFY_DONE;
|
||||
}
|
||||
|
||||
|
@ -371,7 +411,9 @@ static void __cpuinit kvm_guest_cpu_online(void *dummy)
|
|||
static void kvm_guest_cpu_offline(void *dummy)
|
||||
{
|
||||
kvm_disable_steal_time();
|
||||
kvm_pv_disable_apf(NULL);
|
||||
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
|
||||
wrmsrl(MSR_KVM_PV_EOI_EN, 0);
|
||||
kvm_pv_disable_apf();
|
||||
apf_task_wake_all();
|
||||
}
|
||||
|
||||
|
@ -424,6 +466,9 @@ void __init kvm_guest_init(void)
|
|||
pv_time_ops.steal_clock = kvm_steal_clock;
|
||||
}
|
||||
|
||||
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
|
||||
apic_set_eoi_write(kvm_guest_apic_eoi_write);
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
|
||||
register_cpu_notifier(&kvm_cpu_notifier);
|
||||
|
@ -432,6 +477,19 @@ void __init kvm_guest_init(void)
|
|||
#endif
|
||||
}
|
||||
|
||||
static bool __init kvm_detect(void)
|
||||
{
|
||||
if (!kvm_para_available())
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
const struct hypervisor_x86 x86_hyper_kvm __refconst = {
|
||||
.name = "KVM",
|
||||
.detect = kvm_detect,
|
||||
};
|
||||
EXPORT_SYMBOL_GPL(x86_hyper_kvm);
|
||||
|
||||
static __init int activate_jump_labels(void)
|
||||
{
|
||||
if (has_steal_clock) {
|
||||
|
|
|
@ -201,6 +201,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
|
|||
unsigned f_lm = 0;
|
||||
#endif
|
||||
unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
|
||||
unsigned f_invpcid = kvm_x86_ops->invpcid_supported() ? F(INVPCID) : 0;
|
||||
|
||||
/* cpuid 1.edx */
|
||||
const u32 kvm_supported_word0_x86_features =
|
||||
|
@ -228,7 +229,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
|
|||
0 /* DS-CPL, VMX, SMX, EST */ |
|
||||
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
|
||||
F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
|
||||
0 /* Reserved, DCA */ | F(XMM4_1) |
|
||||
F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
|
||||
F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
|
||||
0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
|
||||
F(F16C) | F(RDRAND);
|
||||
|
@ -248,7 +249,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
|
|||
/* cpuid 7.0.ebx */
|
||||
const u32 kvm_supported_word9_x86_features =
|
||||
F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
|
||||
F(BMI2) | F(ERMS) | F(RTM);
|
||||
F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
|
||||
|
||||
/* all calls to cpuid_count() should be made on the same cpu */
|
||||
get_cpu();
|
||||
|
@ -409,6 +410,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
|
|||
(1 << KVM_FEATURE_NOP_IO_DELAY) |
|
||||
(1 << KVM_FEATURE_CLOCKSOURCE2) |
|
||||
(1 << KVM_FEATURE_ASYNC_PF) |
|
||||
(1 << KVM_FEATURE_PV_EOI) |
|
||||
(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
|
||||
|
||||
if (sched_info_on())
|
||||
|
@ -639,33 +641,37 @@ static struct kvm_cpuid_entry2* check_cpuid_limit(struct kvm_vcpu *vcpu,
|
|||
return kvm_find_cpuid_entry(vcpu, maxlevel->eax, index);
|
||||
}
|
||||
|
||||
void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
|
||||
void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
|
||||
{
|
||||
u32 function, index;
|
||||
u32 function = *eax, index = *ecx;
|
||||
struct kvm_cpuid_entry2 *best;
|
||||
|
||||
function = kvm_register_read(vcpu, VCPU_REGS_RAX);
|
||||
index = kvm_register_read(vcpu, VCPU_REGS_RCX);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RAX, 0);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RBX, 0);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RCX, 0);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RDX, 0);
|
||||
best = kvm_find_cpuid_entry(vcpu, function, index);
|
||||
|
||||
if (!best)
|
||||
best = check_cpuid_limit(vcpu, function, index);
|
||||
|
||||
if (best) {
|
||||
kvm_register_write(vcpu, VCPU_REGS_RAX, best->eax);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RBX, best->ebx);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RCX, best->ecx);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RDX, best->edx);
|
||||
}
|
||||
*eax = best->eax;
|
||||
*ebx = best->ebx;
|
||||
*ecx = best->ecx;
|
||||
*edx = best->edx;
|
||||
} else
|
||||
*eax = *ebx = *ecx = *edx = 0;
|
||||
}
|
||||
|
||||
void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
u32 function, eax, ebx, ecx, edx;
|
||||
|
||||
function = eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
|
||||
ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
|
||||
kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
|
||||
kvm_x86_ops->skip_emulated_instruction(vcpu);
|
||||
trace_kvm_cpuid(function,
|
||||
kvm_register_read(vcpu, VCPU_REGS_RAX),
|
||||
kvm_register_read(vcpu, VCPU_REGS_RBX),
|
||||
kvm_register_read(vcpu, VCPU_REGS_RCX),
|
||||
kvm_register_read(vcpu, VCPU_REGS_RDX));
|
||||
trace_kvm_cpuid(function, eax, ebx, ecx, edx);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
|
||||
|
|
|
@ -17,6 +17,7 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
|
|||
int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
|
||||
struct kvm_cpuid2 *cpuid,
|
||||
struct kvm_cpuid_entry2 __user *entries);
|
||||
void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
|
||||
|
||||
|
||||
static inline bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu)
|
||||
|
@ -51,4 +52,12 @@ static inline bool guest_cpuid_has_osvw(struct kvm_vcpu *vcpu)
|
|||
return best && (best->ecx & bit(X86_FEATURE_OSVW));
|
||||
}
|
||||
|
||||
static inline bool guest_cpuid_has_pcid(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_cpuid_entry2 *best;
|
||||
|
||||
best = kvm_find_cpuid_entry(vcpu, 1, 0);
|
||||
return best && (best->ecx & bit(X86_FEATURE_PCID));
|
||||
}
|
||||
|
||||
#endif
|
||||
|
|
|
@ -433,11 +433,32 @@ static int emulator_check_intercept(struct x86_emulate_ctxt *ctxt,
|
|||
return ctxt->ops->intercept(ctxt, &info, stage);
|
||||
}
|
||||
|
||||
static void assign_masked(ulong *dest, ulong src, ulong mask)
|
||||
{
|
||||
*dest = (*dest & ~mask) | (src & mask);
|
||||
}
|
||||
|
||||
static inline unsigned long ad_mask(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
return (1UL << (ctxt->ad_bytes << 3)) - 1;
|
||||
}
|
||||
|
||||
static ulong stack_mask(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
u16 sel;
|
||||
struct desc_struct ss;
|
||||
|
||||
if (ctxt->mode == X86EMUL_MODE_PROT64)
|
||||
return ~0UL;
|
||||
ctxt->ops->get_segment(ctxt, &sel, &ss, NULL, VCPU_SREG_SS);
|
||||
return ~0U >> ((ss.d ^ 1) * 16); /* d=0: 0xffff; d=1: 0xffffffff */
|
||||
}
|
||||
|
||||
static int stack_size(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
return (__fls(stack_mask(ctxt)) + 1) >> 3;
|
||||
}
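A small sketch of the two helpers above, under the assumption that only the 16-bit and 32-bit cases matter at this point (64-bit mode already returned an all-ones mask): the D/B bit of SS selects the mask width, and stack_size turns that mask back into a byte count.

/* Standalone illustration; ss_d stands in for the SS descriptor D/B bit. */
#include <stdio.h>

static unsigned long stack_mask(int ss_d)
{
        return ~0U >> ((ss_d ^ 1) * 16);   /* d=0: 0xffff, d=1: 0xffffffff */
}

static int stack_size(int ss_d)
{
        unsigned long mask = stack_mask(ss_d);
        /* highest set bit + 1, converted from bits to bytes (mask != 0) */
        int fls = (int)(8 * sizeof(mask)) - 1 - __builtin_clzl(mask);

        return (fls + 1) >> 3;
}

int main(void)
{
        printf("d=0: mask=%#lx size=%d\n", stack_mask(0), stack_size(0)); /* 2 bytes  */
        printf("d=1: mask=%#lx size=%d\n", stack_mask(1), stack_size(1)); /* 4 bytes  */
        return 0;
}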
|
||||
|
||||
/* Access/update address held in a register, based on addressing mode. */
|
||||
static inline unsigned long
|
||||
address_mask(struct x86_emulate_ctxt *ctxt, unsigned long reg)
|
||||
|
@ -958,6 +979,12 @@ static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
|
|||
op->orig_val = op->val;
|
||||
}
|
||||
|
||||
static void adjust_modrm_seg(struct x86_emulate_ctxt *ctxt, int base_reg)
|
||||
{
|
||||
if (base_reg == VCPU_REGS_RSP || base_reg == VCPU_REGS_RBP)
|
||||
ctxt->modrm_seg = VCPU_SREG_SS;
|
||||
}
|
||||
|
||||
static int decode_modrm(struct x86_emulate_ctxt *ctxt,
|
||||
struct operand *op)
|
||||
{
|
||||
|
@ -1061,15 +1088,20 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
|
|||
|
||||
if ((base_reg & 7) == 5 && ctxt->modrm_mod == 0)
|
||||
modrm_ea += insn_fetch(s32, ctxt);
|
||||
else
|
||||
else {
|
||||
modrm_ea += ctxt->regs[base_reg];
|
||||
adjust_modrm_seg(ctxt, base_reg);
|
||||
}
|
||||
if (index_reg != 4)
|
||||
modrm_ea += ctxt->regs[index_reg] << scale;
|
||||
} else if ((ctxt->modrm_rm & 7) == 5 && ctxt->modrm_mod == 0) {
|
||||
if (ctxt->mode == X86EMUL_MODE_PROT64)
|
||||
ctxt->rip_relative = 1;
|
||||
} else
|
||||
modrm_ea += ctxt->regs[ctxt->modrm_rm];
|
||||
} else {
|
||||
base_reg = ctxt->modrm_rm;
|
||||
modrm_ea += ctxt->regs[base_reg];
|
||||
adjust_modrm_seg(ctxt, base_reg);
|
||||
}
|
||||
switch (ctxt->modrm_mod) {
|
||||
case 0:
|
||||
if (ctxt->modrm_rm == 5)
|
||||
|
@ -1264,7 +1296,8 @@ static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
|
|||
|
||||
/* allowed just for 8 bytes segments */
|
||||
static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
||||
u16 selector, struct desc_struct *desc)
|
||||
u16 selector, struct desc_struct *desc,
|
||||
ulong *desc_addr_p)
|
||||
{
|
||||
struct desc_ptr dt;
|
||||
u16 index = selector >> 3;
|
||||
|
@ -1275,7 +1308,7 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
|||
if (dt.size < index * 8 + 7)
|
||||
return emulate_gp(ctxt, selector & 0xfffc);
|
||||
|
||||
addr = dt.address + index * 8;
|
||||
*desc_addr_p = addr = dt.address + index * 8;
|
||||
return ctxt->ops->read_std(ctxt, addr, desc, sizeof *desc,
|
||||
&ctxt->exception);
|
||||
}
|
||||
|
@ -1302,11 +1335,12 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
|||
static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
||||
u16 selector, int seg)
|
||||
{
|
||||
struct desc_struct seg_desc;
|
||||
struct desc_struct seg_desc, old_desc;
|
||||
u8 dpl, rpl, cpl;
|
||||
unsigned err_vec = GP_VECTOR;
|
||||
u32 err_code = 0;
|
||||
bool null_selector = !(selector & ~0x3); /* 0000-0003 are null */
|
||||
ulong desc_addr;
|
||||
int ret;
|
||||
|
||||
memset(&seg_desc, 0, sizeof seg_desc);
|
||||
|
@ -1324,8 +1358,14 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
|||
goto load;
|
||||
}
|
||||
|
||||
/* NULL selector is not valid for TR, CS and SS */
|
||||
if ((seg == VCPU_SREG_CS || seg == VCPU_SREG_SS || seg == VCPU_SREG_TR)
|
||||
rpl = selector & 3;
|
||||
cpl = ctxt->ops->cpl(ctxt);
|
||||
|
||||
/* NULL selector is not valid for TR, CS and SS (except for long mode) */
|
||||
if ((seg == VCPU_SREG_CS
|
||||
|| (seg == VCPU_SREG_SS
|
||||
&& (ctxt->mode != X86EMUL_MODE_PROT64 || rpl != cpl))
|
||||
|| seg == VCPU_SREG_TR)
|
||||
&& null_selector)
|
||||
goto exception;
|
||||
|
||||
|
@ -1336,7 +1376,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
|||
if (null_selector) /* for NULL selector skip all following checks */
|
||||
goto load;
|
||||
|
||||
ret = read_segment_descriptor(ctxt, selector, &seg_desc);
|
||||
ret = read_segment_descriptor(ctxt, selector, &seg_desc, &desc_addr);
|
||||
if (ret != X86EMUL_CONTINUE)
|
||||
return ret;
|
||||
|
||||
|
@ -1352,9 +1392,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
|||
goto exception;
|
||||
}
|
||||
|
||||
rpl = selector & 3;
|
||||
dpl = seg_desc.dpl;
|
||||
cpl = ctxt->ops->cpl(ctxt);
|
||||
|
||||
switch (seg) {
|
||||
case VCPU_SREG_SS:
|
||||
|
@ -1384,6 +1422,12 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
|
|||
case VCPU_SREG_TR:
|
||||
if (seg_desc.s || (seg_desc.type != 1 && seg_desc.type != 9))
|
||||
goto exception;
|
||||
old_desc = seg_desc;
|
||||
seg_desc.type |= 2; /* busy */
|
||||
ret = ctxt->ops->cmpxchg_emulated(ctxt, desc_addr, &old_desc, &seg_desc,
|
||||
sizeof(seg_desc), &ctxt->exception);
|
||||
if (ret != X86EMUL_CONTINUE)
|
||||
return ret;
|
||||
break;
|
||||
case VCPU_SREG_LDTR:
|
||||
if (seg_desc.s || seg_desc.type != 2)
|
||||
|
@ -1474,17 +1518,22 @@ static int writeback(struct x86_emulate_ctxt *ctxt)
|
|||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static int em_push(struct x86_emulate_ctxt *ctxt)
|
||||
static int push(struct x86_emulate_ctxt *ctxt, void *data, int bytes)
|
||||
{
|
||||
struct segmented_address addr;
|
||||
|
||||
register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RSP], -ctxt->op_bytes);
|
||||
register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RSP], -bytes);
|
||||
addr.ea = register_address(ctxt, ctxt->regs[VCPU_REGS_RSP]);
|
||||
addr.seg = VCPU_SREG_SS;
|
||||
|
||||
return segmented_write(ctxt, addr, data, bytes);
|
||||
}
|
||||
|
||||
static int em_push(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
/* Disable writeback. */
|
||||
ctxt->dst.type = OP_NONE;
|
||||
return segmented_write(ctxt, addr, &ctxt->src.val, ctxt->op_bytes);
|
||||
return push(ctxt, &ctxt->src.val, ctxt->op_bytes);
|
||||
}
|
||||
|
||||
static int emulate_pop(struct x86_emulate_ctxt *ctxt,
|
||||
|
@ -1556,6 +1605,33 @@ static int em_popf(struct x86_emulate_ctxt *ctxt)
|
|||
return emulate_popf(ctxt, &ctxt->dst.val, ctxt->op_bytes);
|
||||
}
|
||||
|
||||
static int em_enter(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
int rc;
|
||||
unsigned frame_size = ctxt->src.val;
|
||||
unsigned nesting_level = ctxt->src2.val & 31;
|
||||
|
||||
if (nesting_level)
|
||||
return X86EMUL_UNHANDLEABLE;
|
||||
|
||||
rc = push(ctxt, &ctxt->regs[VCPU_REGS_RBP], stack_size(ctxt));
|
||||
if (rc != X86EMUL_CONTINUE)
|
||||
return rc;
|
||||
assign_masked(&ctxt->regs[VCPU_REGS_RBP], ctxt->regs[VCPU_REGS_RSP],
|
||||
stack_mask(ctxt));
|
||||
assign_masked(&ctxt->regs[VCPU_REGS_RSP],
|
||||
ctxt->regs[VCPU_REGS_RSP] - frame_size,
|
||||
stack_mask(ctxt));
|
||||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static int em_leave(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
assign_masked(&ctxt->regs[VCPU_REGS_RSP], ctxt->regs[VCPU_REGS_RBP],
|
||||
stack_mask(ctxt));
|
||||
return emulate_pop(ctxt, &ctxt->regs[VCPU_REGS_RBP], ctxt->op_bytes);
|
||||
}
|
||||
|
||||
static int em_push_sreg(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
int seg = ctxt->src2.val;
|
||||
|
@ -1993,8 +2069,8 @@ static bool vendor_intel(struct x86_emulate_ctxt *ctxt)
|
|||
u32 eax, ebx, ecx, edx;
|
||||
|
||||
eax = ecx = 0;
|
||||
return ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx)
|
||||
&& ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx
|
||||
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
|
||||
return ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx
|
||||
&& ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx
|
||||
&& edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx;
|
||||
}
|
||||
|
@ -2013,32 +2089,31 @@ static bool em_syscall_is_enabled(struct x86_emulate_ctxt *ctxt)
|
|||
|
||||
eax = 0x00000000;
|
||||
ecx = 0x00000000;
|
||||
if (ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx)) {
|
||||
/*
|
||||
* Intel ("GenuineIntel")
|
||||
* remark: Intel CPUs only support "syscall" in 64bit
|
||||
* longmode. Also an 64bit guest with a
|
||||
* 32bit compat-app running will #UD !! While this
|
||||
* behaviour can be fixed (by emulating) into AMD
|
||||
* response - CPUs of AMD can't behave like Intel.
|
||||
*/
|
||||
if (ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx &&
|
||||
ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx &&
|
||||
edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx)
|
||||
return false;
|
||||
ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
|
||||
/*
|
||||
* Intel ("GenuineIntel")
|
||||
* remark: Intel CPUs only support "syscall" in 64bit
|
||||
* longmode. Also an 64bit guest with a
|
||||
* 32bit compat-app running will #UD !! While this
|
||||
* behaviour can be fixed (by emulating) into AMD
|
||||
* response - CPUs of AMD can't behave like Intel.
|
||||
*/
|
||||
if (ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx &&
|
||||
ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx &&
|
||||
edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx)
|
||||
return false;
|
||||
|
||||
/* AMD ("AuthenticAMD") */
|
||||
if (ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx &&
|
||||
ecx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx &&
|
||||
edx == X86EMUL_CPUID_VENDOR_AuthenticAMD_edx)
|
||||
return true;
|
||||
/* AMD ("AuthenticAMD") */
|
||||
if (ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx &&
|
||||
ecx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx &&
|
||||
edx == X86EMUL_CPUID_VENDOR_AuthenticAMD_edx)
|
||||
return true;
|
||||
|
||||
/* AMD ("AMDisbetter!") */
|
||||
if (ebx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx &&
|
||||
ecx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx &&
|
||||
edx == X86EMUL_CPUID_VENDOR_AMDisbetterI_edx)
|
||||
return true;
|
||||
}
|
||||
/* AMD ("AMDisbetter!") */
|
||||
if (ebx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx &&
|
||||
ecx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx &&
|
||||
edx == X86EMUL_CPUID_VENDOR_AMDisbetterI_edx)
|
||||
return true;
|
||||
|
||||
/* default: (not Intel, not AMD), apply Intel's stricter rules... */
|
||||
return false;
|
||||
|
@ -2547,13 +2622,14 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
|
|||
ulong old_tss_base =
|
||||
ops->get_cached_segment_base(ctxt, VCPU_SREG_TR);
|
||||
u32 desc_limit;
|
||||
ulong desc_addr;
|
||||
|
||||
/* FIXME: old_tss_base == ~0 ? */
|
||||
|
||||
ret = read_segment_descriptor(ctxt, tss_selector, &next_tss_desc);
|
||||
ret = read_segment_descriptor(ctxt, tss_selector, &next_tss_desc, &desc_addr);
|
||||
if (ret != X86EMUL_CONTINUE)
|
||||
return ret;
|
||||
ret = read_segment_descriptor(ctxt, old_tss_sel, &curr_tss_desc);
|
||||
ret = read_segment_descriptor(ctxt, old_tss_sel, &curr_tss_desc, &desc_addr);
|
||||
if (ret != X86EMUL_CONTINUE)
|
||||
return ret;
|
||||
|
||||
|
@ -2948,6 +3024,24 @@ static int em_mov_sreg_rm(struct x86_emulate_ctxt *ctxt)
|
|||
return load_segment_descriptor(ctxt, sel, ctxt->modrm_reg);
|
||||
}
|
||||
|
||||
static int em_lldt(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
u16 sel = ctxt->src.val;
|
||||
|
||||
/* Disable writeback. */
|
||||
ctxt->dst.type = OP_NONE;
|
||||
return load_segment_descriptor(ctxt, sel, VCPU_SREG_LDTR);
|
||||
}
|
||||
|
||||
static int em_ltr(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
u16 sel = ctxt->src.val;
|
||||
|
||||
/* Disable writeback. */
|
||||
ctxt->dst.type = OP_NONE;
|
||||
return load_segment_descriptor(ctxt, sel, VCPU_SREG_TR);
|
||||
}
|
||||
|
||||
static int em_invlpg(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
int rc;
|
||||
|
@ -2989,11 +3083,42 @@ static int em_vmcall(struct x86_emulate_ctxt *ctxt)
|
|||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static int emulate_store_desc_ptr(struct x86_emulate_ctxt *ctxt,
|
||||
void (*get)(struct x86_emulate_ctxt *ctxt,
|
||||
struct desc_ptr *ptr))
|
||||
{
|
||||
struct desc_ptr desc_ptr;
|
||||
|
||||
if (ctxt->mode == X86EMUL_MODE_PROT64)
|
||||
ctxt->op_bytes = 8;
|
||||
get(ctxt, &desc_ptr);
|
||||
if (ctxt->op_bytes == 2) {
|
||||
ctxt->op_bytes = 4;
|
||||
desc_ptr.address &= 0x00ffffff;
|
||||
}
|
||||
/* Disable writeback. */
|
||||
ctxt->dst.type = OP_NONE;
|
||||
return segmented_write(ctxt, ctxt->dst.addr.mem,
|
||||
&desc_ptr, 2 + ctxt->op_bytes);
|
||||
}
|
||||
|
||||
static int em_sgdt(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
return emulate_store_desc_ptr(ctxt, ctxt->ops->get_gdt);
|
||||
}
|
||||
|
||||
static int em_sidt(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
return emulate_store_desc_ptr(ctxt, ctxt->ops->get_idt);
|
||||
}
|
||||
|
||||
static int em_lgdt(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
struct desc_ptr desc_ptr;
|
||||
int rc;
|
||||
|
||||
if (ctxt->mode == X86EMUL_MODE_PROT64)
|
||||
ctxt->op_bytes = 8;
|
||||
rc = read_descriptor(ctxt, ctxt->src.addr.mem,
|
||||
&desc_ptr.size, &desc_ptr.address,
|
||||
ctxt->op_bytes);
|
||||
|
@ -3021,6 +3146,8 @@ static int em_lidt(struct x86_emulate_ctxt *ctxt)
|
|||
struct desc_ptr desc_ptr;
|
||||
int rc;
|
||||
|
||||
if (ctxt->mode == X86EMUL_MODE_PROT64)
|
||||
ctxt->op_bytes = 8;
|
||||
rc = read_descriptor(ctxt, ctxt->src.addr.mem,
|
||||
&desc_ptr.size, &desc_ptr.address,
|
||||
ctxt->op_bytes);
|
||||
|
@ -3143,6 +3270,42 @@ static int em_bsr(struct x86_emulate_ctxt *ctxt)
|
|||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static int em_cpuid(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
u32 eax, ebx, ecx, edx;
|
||||
|
||||
eax = ctxt->regs[VCPU_REGS_RAX];
|
||||
ecx = ctxt->regs[VCPU_REGS_RCX];
|
||||
ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
|
||||
ctxt->regs[VCPU_REGS_RAX] = eax;
|
||||
ctxt->regs[VCPU_REGS_RBX] = ebx;
|
||||
ctxt->regs[VCPU_REGS_RCX] = ecx;
|
||||
ctxt->regs[VCPU_REGS_RDX] = edx;
|
||||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static int em_lahf(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
ctxt->regs[VCPU_REGS_RAX] &= ~0xff00UL;
|
||||
ctxt->regs[VCPU_REGS_RAX] |= (ctxt->eflags & 0xff) << 8;
|
||||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static int em_bswap(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
switch (ctxt->op_bytes) {
|
||||
#ifdef CONFIG_X86_64
|
||||
case 8:
|
||||
asm("bswap %0" : "+r"(ctxt->dst.val));
|
||||
break;
|
||||
#endif
|
||||
default:
|
||||
asm("bswap %0" : "+r"(*(u32 *)&ctxt->dst.val));
|
||||
break;
|
||||
}
|
||||
return X86EMUL_CONTINUE;
|
||||
}
|
||||
|
||||
static bool valid_cr(int nr)
|
||||
{
|
||||
switch (nr) {
|
||||
|
@ -3424,14 +3587,14 @@ static struct opcode group5[] = {
|
|||
static struct opcode group6[] = {
|
||||
DI(Prot, sldt),
|
||||
DI(Prot, str),
|
||||
DI(Prot | Priv, lldt),
|
||||
DI(Prot | Priv, ltr),
|
||||
II(Prot | Priv | SrcMem16, em_lldt, lldt),
|
||||
II(Prot | Priv | SrcMem16, em_ltr, ltr),
|
||||
N, N, N, N,
|
||||
};
|
||||
|
||||
static struct group_dual group7 = { {
|
||||
DI(Mov | DstMem | Priv, sgdt),
|
||||
DI(Mov | DstMem | Priv, sidt),
|
||||
II(Mov | DstMem | Priv, em_sgdt, sgdt),
|
||||
II(Mov | DstMem | Priv, em_sidt, sidt),
|
||||
II(SrcMem | Priv, em_lgdt, lgdt),
|
||||
II(SrcMem | Priv, em_lidt, lidt),
|
||||
II(SrcNone | DstMem | Mov, em_smsw, smsw), N,
|
||||
|
@ -3538,7 +3701,7 @@ static struct opcode opcode_table[256] = {
|
|||
D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
|
||||
I(SrcImmFAddr | No64, em_call_far), N,
|
||||
II(ImplicitOps | Stack, em_pushf, pushf),
|
||||
II(ImplicitOps | Stack, em_popf, popf), N, N,
|
||||
II(ImplicitOps | Stack, em_popf, popf), N, I(ImplicitOps, em_lahf),
|
||||
/* 0xA0 - 0xA7 */
|
||||
I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
|
||||
I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
|
||||
|
@ -3561,7 +3724,8 @@ static struct opcode opcode_table[256] = {
|
|||
I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
|
||||
G(ByteOp, group11), G(0, group11),
|
||||
/* 0xC8 - 0xCF */
|
||||
N, N, N, I(ImplicitOps | Stack, em_ret_far),
|
||||
I(Stack | SrcImmU16 | Src2ImmByte, em_enter), I(Stack, em_leave),
|
||||
N, I(ImplicitOps | Stack, em_ret_far),
|
||||
D(ImplicitOps), DI(SrcImmByte, intn),
|
||||
D(ImplicitOps | No64), II(ImplicitOps, em_iret, iret),
|
||||
/* 0xD0 - 0xD7 */
|
||||
|
@ -3635,7 +3799,7 @@ static struct opcode twobyte_table[256] = {
|
|||
X16(D(ByteOp | DstMem | SrcNone | ModRM| Mov)),
|
||||
/* 0xA0 - 0xA7 */
|
||||
I(Stack | Src2FS, em_push_sreg), I(Stack | Src2FS, em_pop_sreg),
|
||||
DI(ImplicitOps, cpuid), I(DstMem | SrcReg | ModRM | BitOp, em_bt),
|
||||
II(ImplicitOps, em_cpuid, cpuid), I(DstMem | SrcReg | ModRM | BitOp, em_bt),
|
||||
D(DstMem | SrcReg | Src2ImmByte | ModRM),
|
||||
D(DstMem | SrcReg | Src2CL | ModRM), N, N,
|
||||
/* 0xA8 - 0xAF */
|
||||
|
@ -3658,11 +3822,12 @@ static struct opcode twobyte_table[256] = {
|
|||
I(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_btc),
|
||||
I(DstReg | SrcMem | ModRM, em_bsf), I(DstReg | SrcMem | ModRM, em_bsr),
|
||||
D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
|
||||
/* 0xC0 - 0xCF */
|
||||
/* 0xC0 - 0xC7 */
|
||||
D2bv(DstMem | SrcReg | ModRM | Lock),
|
||||
N, D(DstMem | SrcReg | ModRM | Mov),
|
||||
N, N, N, GD(0, &group9),
|
||||
N, N, N, N, N, N, N, N,
|
||||
/* 0xC8 - 0xCF */
|
||||
X8(I(DstReg, em_bswap)),
|
||||
/* 0xD0 - 0xDF */
|
||||
N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N,
|
||||
/* 0xE0 - 0xEF */
|
||||
|
@ -4426,12 +4591,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
|
|||
break;
|
||||
case 0xb6 ... 0xb7: /* movzx */
|
||||
ctxt->dst.bytes = ctxt->op_bytes;
|
||||
ctxt->dst.val = (ctxt->d & ByteOp) ? (u8) ctxt->src.val
|
||||
ctxt->dst.val = (ctxt->src.bytes == 1) ? (u8) ctxt->src.val
|
||||
: (u16) ctxt->src.val;
|
||||
break;
|
||||
case 0xbe ... 0xbf: /* movsx */
|
||||
ctxt->dst.bytes = ctxt->op_bytes;
|
||||
ctxt->dst.val = (ctxt->d & ByteOp) ? (s8) ctxt->src.val :
|
||||
ctxt->dst.val = (ctxt->src.bytes == 1) ? (s8) ctxt->src.val :
|
||||
(s16) ctxt->src.val;
|
||||
break;
|
||||
case 0xc0 ... 0xc1: /* xadd */
|
||||
|
|
|
@ -188,14 +188,15 @@ void kvm_pic_update_irq(struct kvm_pic *s)
|
|||
pic_unlock(s);
|
||||
}
|
||||
|
||||
int kvm_pic_set_irq(void *opaque, int irq, int level)
|
||||
int kvm_pic_set_irq(struct kvm_pic *s, int irq, int irq_source_id, int level)
|
||||
{
|
||||
struct kvm_pic *s = opaque;
|
||||
int ret = -1;
|
||||
|
||||
pic_lock(s);
|
||||
if (irq >= 0 && irq < PIC_NUM_PINS) {
|
||||
ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, level);
|
||||
int irq_level = __kvm_irq_line_state(&s->irq_states[irq],
|
||||
irq_source_id, level);
|
||||
ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, irq_level);
|
||||
pic_update_irq(s);
|
||||
trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr,
|
||||
s->pics[irq >> 3].imr, ret == 0);
|
||||
|
@ -205,6 +206,16 @@ int kvm_pic_set_irq(void *opaque, int irq, int level)
|
|||
return ret;
|
||||
}
|
||||
|
||||
void kvm_pic_clear_all(struct kvm_pic *s, int irq_source_id)
|
||||
{
|
||||
int i;
|
||||
|
||||
pic_lock(s);
|
||||
for (i = 0; i < PIC_NUM_PINS; i++)
|
||||
__clear_bit(irq_source_id, &s->irq_states[i]);
|
||||
pic_unlock(s);
|
||||
}
|
||||
|
||||
/*
|
||||
* acknowledge interrupt 'irq'
|
||||
*/
|
||||
|
|
|
@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
|
|||
clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
|
||||
}
|
||||
|
||||
static inline int __apic_test_and_set_vector(int vec, void *bitmap)
|
||||
{
|
||||
return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
|
||||
}
|
||||
|
||||
static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
|
||||
{
|
||||
return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
|
||||
}
|
||||
|
||||
static inline int apic_hw_enabled(struct kvm_lapic *apic)
|
||||
{
|
||||
return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
|
||||
|
@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
|
|||
return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
|
||||
}
|
||||
|
||||
static u8 count_vectors(void *bitmap)
|
||||
{
|
||||
u32 *word = bitmap;
|
||||
int word_offset;
|
||||
u8 count = 0;
|
||||
for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
|
||||
count += hweight32(word[word_offset << 2]);
|
||||
return count;
|
||||
}
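The `word_offset << 2` stride reflects how the APIC keeps each 32-bit ISR/IRR register on its own 16-byte slot in the register page, so a u32 pointer advances by four per register. A hedged standalone sketch of the same counting, using a mock page and a compiler popcount builtin in place of hweight32:

/* Illustration only, with mock data; not kernel code. */
#include <stdint.h>
#include <stdio.h>

#define NR_REGS 8                          /* 256 vectors / 32 bits */

static unsigned count_vectors(const uint32_t *page)
{
        unsigned count = 0;

        for (int i = 0; i < NR_REGS; i++)
                count += (unsigned)__builtin_popcount(page[i << 2]);
        return count;
}

int main(void)
{
        uint32_t page[NR_REGS * 4] = {0};  /* 8 registers, 16 bytes apart */

        page[0 << 2] = 0x3;                /* vectors 0 and 1 */
        page[7 << 2] = 0x80000000u;        /* vector 255 */
        printf("%u vectors set\n", count_vectors(page));  /* prints 3 */
        return 0;
}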
|
||||
|
||||
static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
|
||||
{
|
||||
apic->irr_pending = true;
|
||||
|
@ -242,6 +262,27 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
|
|||
apic->irr_pending = true;
|
||||
}
|
||||
|
||||
static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
|
||||
{
|
||||
if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
|
||||
++apic->isr_count;
|
||||
BUG_ON(apic->isr_count > MAX_APIC_VECTOR);
|
||||
/*
|
||||
* ISR (in service register) bit is set when injecting an interrupt.
|
||||
* The highest vector is injected. Thus the latest bit set matches
|
||||
* the highest bit in ISR.
|
||||
*/
|
||||
apic->highest_isr_cache = vec;
|
||||
}
|
||||
|
||||
static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
|
||||
{
|
||||
if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
|
||||
--apic->isr_count;
|
||||
BUG_ON(apic->isr_count < 0);
|
||||
apic->highest_isr_cache = -1;
|
||||
}
|
||||
|
||||
int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
@ -270,9 +311,61 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
|
|||
irq->level, irq->trig_mode);
|
||||
}
|
||||
|
||||
static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
|
||||
{
|
||||
|
||||
return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
|
||||
sizeof(val));
|
||||
}
|
||||
|
||||
static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
|
||||
{
|
||||
|
||||
return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
|
||||
sizeof(*val));
|
||||
}
|
||||
|
||||
static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
|
||||
}
|
||||
|
||||
static bool pv_eoi_get_pending(struct kvm_vcpu *vcpu)
{
u8 val;
if (pv_eoi_get_user(vcpu, &val) < 0)
apic_debug("Can't read EOI MSR value: 0x%llx\n",
(unsigned long long)vcpu->arch.pv_eoi.msr_val);
return val & 0x1;
}

static void pv_eoi_set_pending(struct kvm_vcpu *vcpu)
{
if (pv_eoi_put_user(vcpu, KVM_PV_EOI_ENABLED) < 0) {
apic_debug("Can't set EOI MSR value: 0x%llx\n",
(unsigned long long)vcpu->arch.pv_eoi.msr_val);
return;
}
__set_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
}

static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
{
if (pv_eoi_put_user(vcpu, KVM_PV_EOI_DISABLED) < 0) {
apic_debug("Can't clear EOI MSR value: 0x%llx\n",
(unsigned long long)vcpu->arch.pv_eoi.msr_val);
return;
}
__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
}
|
||||
|
||||
static inline int apic_find_highest_isr(struct kvm_lapic *apic)
|
||||
{
|
||||
int result;
|
||||
if (!apic->isr_count)
|
||||
return -1;
|
||||
if (likely(apic->highest_isr_cache != -1))
|
||||
return apic->highest_isr_cache;
|
||||
|
||||
result = find_highest_vector(apic->regs + APIC_ISR);
|
||||
ASSERT(result == -1 || result >= 16);
|
||||
|
@ -482,17 +575,20 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
|
|||
return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
|
||||
}
|
||||
|
||||
static void apic_set_eoi(struct kvm_lapic *apic)
|
||||
static int apic_set_eoi(struct kvm_lapic *apic)
|
||||
{
|
||||
int vector = apic_find_highest_isr(apic);
|
||||
|
||||
trace_kvm_eoi(apic, vector);
|
||||
|
||||
/*
* Not every EOI write will have a corresponding ISR bit,
* one example is when Kernel check timer on setup_IO_APIC
*/
if (vector == -1)
return;
return vector;
|
||||
|
||||
apic_clear_vector(vector, apic->regs + APIC_ISR);
|
||||
apic_clear_isr(vector, apic);
|
||||
apic_update_ppr(apic);
|
||||
|
||||
if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
|
||||
|
@ -505,6 +601,7 @@ static void apic_set_eoi(struct kvm_lapic *apic)
|
|||
kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
|
||||
}
|
||||
kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
|
||||
return vector;
|
||||
}
|
||||
|
||||
static void apic_send_ipi(struct kvm_lapic *apic)
|
||||
|
@ -1081,10 +1178,13 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
|
|||
apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
|
||||
}
|
||||
apic->irr_pending = false;
|
||||
apic->isr_count = 0;
|
||||
apic->highest_isr_cache = -1;
|
||||
update_divide_count(apic);
|
||||
atomic_set(&apic->lapic_timer.pending, 0);
|
||||
if (kvm_vcpu_is_bsp(vcpu))
|
||||
vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
|
||||
vcpu->arch.pv_eoi.msr_val = 0;
|
||||
apic_update_ppr(apic);
|
||||
|
||||
vcpu->arch.apic_arb_prio = 0;
|
||||
|
@ -1248,7 +1348,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
|
|||
if (vector == -1)
|
||||
return -1;
|
||||
|
||||
apic_set_vector(vector, apic->regs + APIC_ISR);
|
||||
apic_set_isr(vector, apic);
|
||||
apic_update_ppr(apic);
|
||||
apic_clear_irr(vector, apic);
|
||||
return vector;
|
||||
|
@ -1267,6 +1367,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
|
|||
update_divide_count(apic);
|
||||
start_apic_timer(apic);
|
||||
apic->irr_pending = true;
|
||||
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
|
||||
apic->highest_isr_cache = -1;
|
||||
kvm_make_request(KVM_REQ_EVENT, vcpu);
|
||||
}
|
||||
|
||||
|
@ -1283,11 +1385,51 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
|
|||
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
|
||||
}
|
||||
|
||||
/*
|
||||
* apic_sync_pv_eoi_from_guest - called on vmexit or cancel interrupt
|
||||
*
|
||||
* Detect whether guest triggered PV EOI since the
|
||||
* last entry. If yes, set EOI on guests's behalf.
|
||||
* Clear PV EOI in guest memory in any case.
|
||||
*/
|
||||
static void apic_sync_pv_eoi_from_guest(struct kvm_vcpu *vcpu,
|
||||
struct kvm_lapic *apic)
|
||||
{
|
||||
bool pending;
|
||||
int vector;
|
||||
/*
|
||||
* PV EOI state is derived from KVM_APIC_PV_EOI_PENDING in host
|
||||
* and KVM_PV_EOI_ENABLED in guest memory as follows:
|
||||
*
|
||||
* KVM_APIC_PV_EOI_PENDING is unset:
|
||||
* -> host disabled PV EOI.
|
||||
* KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is set:
|
||||
* -> host enabled PV EOI, guest did not execute EOI yet.
|
||||
* KVM_APIC_PV_EOI_PENDING is set, KVM_PV_EOI_ENABLED is unset:
|
||||
* -> host enabled PV EOI, guest executed EOI.
|
||||
*/
|
||||
BUG_ON(!pv_eoi_enabled(vcpu));
|
||||
pending = pv_eoi_get_pending(vcpu);
|
||||
/*
|
||||
* Clear pending bit in any case: it will be set again on vmentry.
|
||||
* While this might not be ideal from performance point of view,
|
||||
* this makes sure pv eoi is only enabled when we know it's safe.
|
||||
*/
|
||||
pv_eoi_clr_pending(vcpu);
|
||||
if (pending)
|
||||
return;
|
||||
vector = apic_set_eoi(apic);
|
||||
trace_kvm_pv_eoi(apic, vector);
|
||||
}
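The state table in the comment above can be read as a tiny decision function; the sketch below only restates it with invented names and is not part of the patch:

/* Illustration of the PV EOI state table; names are made up. */
#include <stdbool.h>
#include <stdio.h>

static const char *pv_eoi_state(bool host_pending, bool guest_flag)
{
        if (!host_pending)
                return "PV EOI not armed by host";
        return guest_flag ? "armed, guest has not issued EOI yet"
                          : "armed, guest already issued EOI";
}

int main(void)
{
        printf("%s\n", pv_eoi_state(false, false));
        printf("%s\n", pv_eoi_state(true, true));
        printf("%s\n", pv_eoi_state(true, false));
        return 0;
}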
|
||||
|
||||
void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
u32 data;
|
||||
void *vapic;
|
||||
|
||||
if (test_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention))
|
||||
apic_sync_pv_eoi_from_guest(vcpu, vcpu->arch.apic);
|
||||
|
||||
if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
|
||||
return;
|
||||
|
||||
|
@ -1298,17 +1440,44 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
|
|||
apic_set_tpr(vcpu->arch.apic, data & 0xff);
|
||||
}
|
||||
|
||||
/*
|
||||
* apic_sync_pv_eoi_to_guest - called before vmentry
|
||||
*
|
||||
* Detect whether it's safe to enable PV EOI and
|
||||
* if yes do so.
|
||||
*/
|
||||
static void apic_sync_pv_eoi_to_guest(struct kvm_vcpu *vcpu,
|
||||
struct kvm_lapic *apic)
|
||||
{
|
||||
if (!pv_eoi_enabled(vcpu) ||
|
||||
/* IRR set or many bits in ISR: could be nested. */
|
||||
apic->irr_pending ||
|
||||
/* Cache not set: could be safe but we don't bother. */
|
||||
apic->highest_isr_cache == -1 ||
|
||||
/* Need EOI to update ioapic. */
|
||||
kvm_ioapic_handles_vector(vcpu->kvm, apic->highest_isr_cache)) {
|
||||
/*
|
||||
* PV EOI was disabled by apic_sync_pv_eoi_from_guest
|
||||
* so we need not do anything here.
|
||||
*/
|
||||
return;
|
||||
}
|
||||
|
||||
pv_eoi_set_pending(apic->vcpu);
|
||||
}
|
||||
|
||||
void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
u32 data, tpr;
|
||||
int max_irr, max_isr;
|
||||
struct kvm_lapic *apic;
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
void *vapic;
|
||||
|
||||
apic_sync_pv_eoi_to_guest(vcpu, apic);
|
||||
|
||||
if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
|
||||
return;
|
||||
|
||||
apic = vcpu->arch.apic;
|
||||
tpr = apic_get_reg(apic, APIC_TASKPRI) & 0xff;
|
||||
max_irr = apic_find_highest_irr(apic);
|
||||
if (max_irr < 0)
|
||||
|
@ -1394,3 +1563,16 @@ int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
|
|||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
|
||||
{
|
||||
u64 addr = data & ~KVM_MSR_ENABLED;
|
||||
if (!IS_ALIGNED(addr, 4))
|
||||
return 1;
|
||||
|
||||
vcpu->arch.pv_eoi.msr_val = data;
|
||||
if (!pv_eoi_enabled(vcpu))
|
||||
return 0;
|
||||
return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
|
||||
addr);
|
||||
}
|
||||
|
|
|
@ -13,6 +13,15 @@ struct kvm_lapic {
|
|||
u32 divide_count;
|
||||
struct kvm_vcpu *vcpu;
|
||||
bool irr_pending;
|
||||
/* Number of bits set in ISR. */
|
||||
s16 isr_count;
|
||||
/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
|
||||
int highest_isr_cache;
|
||||
/**
|
||||
* APIC register page. The layout matches the register layout seen by
|
||||
* the guest 1:1, because it is accessed by the vmx microcode.
|
||||
* Note: Only one register, the TPR, is used by the microcode.
|
||||
*/
|
||||
void *regs;
|
||||
gpa_t vapic_addr;
|
||||
struct page *vapic_page;
|
||||
|
@ -60,4 +69,6 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct kvm_vcpu *vcpu)
|
|||
{
|
||||
return vcpu->arch.hv_vapic & HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE;
|
||||
}
|
||||
|
||||
int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
|
||||
#endif
|
||||
|
|
|
@ -90,7 +90,7 @@ module_param(dbg, bool, 0644);
|
|||
|
||||
#define PTE_PREFETCH_NUM 8
|
||||
|
||||
#define PT_FIRST_AVAIL_BITS_SHIFT 9
|
||||
#define PT_FIRST_AVAIL_BITS_SHIFT 10
|
||||
#define PT64_SECOND_AVAIL_BITS_SHIFT 52
|
||||
|
||||
#define PT64_LEVEL_BITS 9
|
||||
|
@ -145,7 +145,8 @@ module_param(dbg, bool, 0644);
|
|||
#define CREATE_TRACE_POINTS
|
||||
#include "mmutrace.h"
|
||||
|
||||
#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
|
||||
#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
|
||||
#define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
|
||||
|
||||
#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
|
||||
|
||||
|
@ -188,6 +189,7 @@ static u64 __read_mostly shadow_dirty_mask;
|
|||
static u64 __read_mostly shadow_mmio_mask;
|
||||
|
||||
static void mmu_spte_set(u64 *sptep, u64 spte);
|
||||
static void mmu_free_roots(struct kvm_vcpu *vcpu);
|
||||
|
||||
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
|
||||
{
|
||||
|
@ -444,8 +446,22 @@ static bool __check_direct_spte_mmio_pf(u64 spte)
|
|||
}
|
||||
#endif
|
||||
|
||||
static bool spte_is_locklessly_modifiable(u64 spte)
|
||||
{
|
||||
return !(~spte & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE));
|
||||
}
|
||||
|
||||
static bool spte_has_volatile_bits(u64 spte)
{
/*
* Always atomically update spte if it can be updated
* out of mmu-lock, it can ensure dirty bit is not lost,
* also, it can help us to get a stable is_writable_pte()
* to ensure tlb flush is not missed.
*/
if (spte_is_locklessly_modifiable(spte))
return true;
|
||||
|
||||
if (!shadow_accessed_mask)
|
||||
return false;
|
||||
|
||||
|
@ -478,34 +494,47 @@ static void mmu_spte_set(u64 *sptep, u64 new_spte)
|
|||
|
||||
/* Rules for using mmu_spte_update:
* Update the state bits, it means the mapped pfn is not changed.
*
* Whenever we overwrite a writable spte with a read-only one we
* should flush remote TLBs. Otherwise rmap_write_protect
* will find a read-only spte, even though the writable spte
* might be cached on a CPU's TLB, the return value indicates this
* case.
*/
|
||||
static void mmu_spte_update(u64 *sptep, u64 new_spte)
|
||||
static bool mmu_spte_update(u64 *sptep, u64 new_spte)
|
||||
{
|
||||
u64 mask, old_spte = *sptep;
|
||||
u64 old_spte = *sptep;
|
||||
bool ret = false;
|
||||
|
||||
WARN_ON(!is_rmap_spte(new_spte));
|
||||
|
||||
if (!is_shadow_present_pte(old_spte))
|
||||
return mmu_spte_set(sptep, new_spte);
|
||||
if (!is_shadow_present_pte(old_spte)) {
|
||||
mmu_spte_set(sptep, new_spte);
|
||||
return ret;
|
||||
}
|
||||
|
||||
new_spte |= old_spte & shadow_dirty_mask;
|
||||
|
||||
mask = shadow_accessed_mask;
|
||||
if (is_writable_pte(old_spte))
|
||||
mask |= shadow_dirty_mask;
|
||||
|
||||
if (!spte_has_volatile_bits(old_spte) || (new_spte & mask) == mask)
|
||||
if (!spte_has_volatile_bits(old_spte))
|
||||
__update_clear_spte_fast(sptep, new_spte);
|
||||
else
|
||||
old_spte = __update_clear_spte_slow(sptep, new_spte);
|
||||
|
||||
/*
|
||||
* For the spte updated out of mmu-lock is safe, since
|
||||
* we always atomicly update it, see the comments in
|
||||
* spte_has_volatile_bits().
|
||||
*/
|
||||
if (is_writable_pte(old_spte) && !is_writable_pte(new_spte))
|
||||
ret = true;
|
||||
|
||||
if (!shadow_accessed_mask)
|
||||
return;
|
||||
return ret;
|
||||
|
||||
if (spte_is_bit_cleared(old_spte, new_spte, shadow_accessed_mask))
|
||||
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
|
||||
if (spte_is_bit_cleared(old_spte, new_spte, shadow_dirty_mask))
|
||||
kvm_set_pfn_dirty(spte_to_pfn(old_spte));
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -652,8 +681,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
|
|||
mmu_page_header_cache);
|
||||
}
|
||||
|
||||
static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
|
||||
size_t size)
|
||||
static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
|
||||
{
|
||||
void *p;
|
||||
|
||||
|
@ -664,8 +692,7 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
|
|||
|
||||
static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache,
|
||||
sizeof(struct pte_list_desc));
|
||||
return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
|
||||
}
|
||||
|
||||
static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
|
||||
|
@ -1051,35 +1078,82 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
|
|||
rmap_remove(kvm, sptep);
|
||||
}
|
||||
|
||||
static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level)
|
||||
|
||||
static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
|
||||
{
|
||||
if (is_large_pte(*sptep)) {
|
||||
WARN_ON(page_header(__pa(sptep))->role.level ==
|
||||
PT_PAGE_TABLE_LEVEL);
|
||||
drop_spte(kvm, sptep);
|
||||
--kvm->stat.lpages;
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
|
||||
{
|
||||
if (__drop_large_spte(vcpu->kvm, sptep))
|
||||
kvm_flush_remote_tlbs(vcpu->kvm);
|
||||
}
|
||||
|
||||
/*
* Write-protect on the specified @sptep, @pt_protect indicates whether
* spte write-protection is caused by protecting shadow page table.
* @flush indicates whether the tlb needs to be flushed.
*
* Note: write protection is different between dirty logging and spte
* protection:
* - for dirty logging, the spte can be set to writable at anytime if
* its dirty bitmap is properly set.
* - for spte protection, the spte can be writable only after unsync-ing
* shadow page.
*
* Return true if the spte is dropped.
*/
|
||||
static bool
|
||||
spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
|
||||
{
|
||||
u64 spte = *sptep;
|
||||
|
||||
if (!is_writable_pte(spte) &&
|
||||
!(pt_protect && spte_is_locklessly_modifiable(spte)))
|
||||
return false;
|
||||
|
||||
rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
|
||||
|
||||
if (__drop_large_spte(kvm, sptep)) {
|
||||
*flush |= true;
|
||||
return true;
|
||||
}
|
||||
|
||||
if (pt_protect)
|
||||
spte &= ~SPTE_MMU_WRITEABLE;
|
||||
spte = spte & ~PT_WRITABLE_MASK;
|
||||
|
||||
*flush |= mmu_spte_update(sptep, spte);
|
||||
return false;
|
||||
}
|
||||
|
||||
static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
|
||||
int level, bool pt_protect)
|
||||
{
|
||||
u64 *sptep;
|
||||
struct rmap_iterator iter;
|
||||
int write_protected = 0;
|
||||
bool flush = false;
|
||||
|
||||
for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
|
||||
BUG_ON(!(*sptep & PT_PRESENT_MASK));
|
||||
rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
|
||||
|
||||
if (!is_writable_pte(*sptep)) {
|
||||
sptep = rmap_get_next(&iter);
|
||||
if (spte_write_protect(kvm, sptep, &flush, pt_protect)) {
|
||||
sptep = rmap_get_first(*rmapp, &iter);
|
||||
continue;
|
||||
}
|
||||
|
||||
if (level == PT_PAGE_TABLE_LEVEL) {
|
||||
mmu_spte_update(sptep, *sptep & ~PT_WRITABLE_MASK);
|
||||
sptep = rmap_get_next(&iter);
|
||||
} else {
|
||||
BUG_ON(!is_large_pte(*sptep));
|
||||
drop_spte(kvm, sptep);
|
||||
--kvm->stat.lpages;
|
||||
sptep = rmap_get_first(*rmapp, &iter);
|
||||
}
|
||||
|
||||
write_protected = 1;
|
||||
sptep = rmap_get_next(&iter);
|
||||
}
|
||||
|
||||
return write_protected;
|
||||
return flush;
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -1100,26 +1174,26 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
|
|||
|
||||
while (mask) {
|
||||
rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
|
||||
__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
|
||||
__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL, false);
|
||||
|
||||
/* clear the first set bit */
|
||||
mask &= mask - 1;
|
||||
}
|
||||
}
|
||||
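A stand-alone sketch of the bit-walk in kvm_mmu_write_protect_pt_masked above, assuming __builtin_ctzl as a user-space stand-in for the kernel's __ffs() and reducing the per-gfn write-protect step to a printf:

/* Visit each set bit of a dirty mask lowest-first, clearing it as we go. */
#include <stdio.h>

static void walk_dirty_mask(unsigned long gfn_offset, unsigned long mask)
{
    while (mask) {
        unsigned long gfn = gfn_offset + __builtin_ctzl(mask);
        printf("write-protect gfn %#lx\n", gfn);
        mask &= mask - 1;   /* clear the lowest set bit */
    }
}

int main(void)
{
    walk_dirty_mask(0x100, 0x29);   /* bits 0, 3, 5 -> gfns 0x100, 0x103, 0x105 */
    return 0;
}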
|
||||
static int rmap_write_protect(struct kvm *kvm, u64 gfn)
|
||||
static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
|
||||
{
|
||||
struct kvm_memory_slot *slot;
|
||||
unsigned long *rmapp;
|
||||
int i;
|
||||
int write_protected = 0;
|
||||
bool write_protected = false;
|
||||
|
||||
slot = gfn_to_memslot(kvm, gfn);
|
||||
|
||||
for (i = PT_PAGE_TABLE_LEVEL;
|
||||
i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
|
||||
rmapp = __gfn_to_rmap(gfn, i, slot);
|
||||
write_protected |= __rmap_write_protect(kvm, rmapp, i);
|
||||
write_protected |= __rmap_write_protect(kvm, rmapp, i, true);
|
||||
}
|
||||
|
||||
return write_protected;
|
||||
|
@ -1238,11 +1312,12 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
|
|||
unsigned long data)
|
||||
{
|
||||
u64 *sptep;
|
||||
struct rmap_iterator iter;
|
||||
struct rmap_iterator uninitialized_var(iter);
|
||||
int young = 0;
|
||||
|
||||
/*
|
||||
* Emulate the accessed bit for EPT, by checking if this page has
|
||||
* In case of absence of EPT Access and Dirty Bits supports,
|
||||
* emulate the accessed bit for EPT, by checking if this page has
|
||||
* an EPT mapping, and clearing it if it does. On the next access,
|
||||
* a new EPT mapping will be established.
|
||||
* This has some overhead, but not as much as the cost of swapping
|
||||
|
@ -1253,11 +1328,12 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
|
|||
|
||||
for (sptep = rmap_get_first(*rmapp, &iter); sptep;
|
||||
sptep = rmap_get_next(&iter)) {
|
||||
BUG_ON(!(*sptep & PT_PRESENT_MASK));
|
||||
BUG_ON(!is_shadow_present_pte(*sptep));
|
||||
|
||||
if (*sptep & PT_ACCESSED_MASK) {
|
||||
if (*sptep & shadow_accessed_mask) {
|
||||
young = 1;
|
||||
clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)sptep);
|
||||
clear_bit((ffs(shadow_accessed_mask) - 1),
|
||||
(unsigned long *)sptep);
|
||||
}
|
||||
}
|
||||
|
||||
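A sketch of the accessed-bit aging above with a configurable mask, using plain C in place of the kernel's atomic clear_bit() and ffs(); the bit position and spte value are made up for illustration:

#include <stdio.h>
#include <strings.h>    /* ffs() */

/* Return 1 and clear the accessed bit if the page was touched since the last
 * aging pass; the kernel does the clearing with an atomic clear_bit(). */
static int test_and_age(unsigned long *spte, unsigned long accessed_mask)
{
    if (!(*spte & accessed_mask))
        return 0;
    *spte &= ~(1UL << (ffs(accessed_mask) - 1));
    return 1;
}

int main(void)
{
    unsigned long spte = 1UL << 8;  /* pretend the EPT accessed bit is bit 8 */
    printf("young=%d spte=%#lx\n", test_and_age(&spte, 1UL << 8), spte);
    printf("young=%d spte=%#lx\n", test_and_age(&spte, 1UL << 8), spte);
    return 0;
}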
|
@ -1281,9 +1357,9 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
|
|||
|
||||
for (sptep = rmap_get_first(*rmapp, &iter); sptep;
|
||||
sptep = rmap_get_next(&iter)) {
|
||||
BUG_ON(!(*sptep & PT_PRESENT_MASK));
|
||||
BUG_ON(!is_shadow_present_pte(*sptep));
|
||||
|
||||
if (*sptep & PT_ACCESSED_MASK) {
|
||||
if (*sptep & shadow_accessed_mask) {
|
||||
young = 1;
|
||||
break;
|
||||
}
|
||||
|
@ -1401,12 +1477,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
|
|||
u64 *parent_pte, int direct)
|
||||
{
|
||||
struct kvm_mmu_page *sp;
|
||||
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache,
|
||||
sizeof *sp);
|
||||
sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
|
||||
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
|
||||
sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
|
||||
if (!direct)
|
||||
sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
|
||||
PAGE_SIZE);
|
||||
sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
|
||||
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
|
||||
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
|
||||
bitmap_zero(sp->slot_bitmap, KVM_MEM_SLOTS_NUM);
|
||||
|
@ -1701,7 +1775,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
|
|||
|
||||
kvm_mmu_pages_init(parent, &parents, &pages);
|
||||
while (mmu_unsync_walk(parent, &pages)) {
|
||||
int protected = 0;
|
||||
bool protected = false;
|
||||
|
||||
for_each_sp(pages, sp, parents, i)
|
||||
protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
|
||||
|
@ -1866,15 +1940,6 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
|
|||
mmu_spte_set(sptep, spte);
|
||||
}
|
||||
|
||||
static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
|
||||
{
|
||||
if (is_large_pte(*sptep)) {
|
||||
drop_spte(vcpu->kvm, sptep);
|
||||
--vcpu->kvm->stat.lpages;
|
||||
kvm_flush_remote_tlbs(vcpu->kvm);
|
||||
}
|
||||
}
|
||||
|
||||
static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
||||
unsigned direct_access)
|
||||
{
|
||||
|
@ -2243,7 +2308,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
|||
gfn_t gfn, pfn_t pfn, bool speculative,
|
||||
bool can_unsync, bool host_writable)
|
||||
{
|
||||
u64 spte, entry = *sptep;
|
||||
u64 spte;
|
||||
int ret = 0;
|
||||
|
||||
if (set_mmio_spte(sptep, gfn, pfn, pte_access))
|
||||
|
@ -2257,8 +2322,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
|||
spte |= shadow_x_mask;
|
||||
else
|
||||
spte |= shadow_nx_mask;
|
||||
|
||||
if (pte_access & ACC_USER_MASK)
|
||||
spte |= shadow_user_mask;
|
||||
|
||||
if (level > PT_PAGE_TABLE_LEVEL)
|
||||
spte |= PT_PAGE_SIZE_MASK;
|
||||
if (tdp_enabled)
|
||||
|
@ -2283,7 +2350,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
|||
goto done;
|
||||
}
|
||||
|
||||
spte |= PT_WRITABLE_MASK;
|
||||
spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
|
||||
|
||||
if (!vcpu->arch.mmu.direct_map
|
||||
&& !(pte_access & ACC_WRITE_MASK)) {
|
||||
|
@ -2312,8 +2379,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
|||
__func__, gfn);
|
||||
ret = 1;
|
||||
pte_access &= ~ACC_WRITE_MASK;
|
||||
if (is_writable_pte(spte))
|
||||
spte &= ~PT_WRITABLE_MASK;
|
||||
spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -2321,14 +2387,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
|||
mark_page_dirty(vcpu->kvm, gfn);
|
||||
|
||||
set_pte:
|
||||
mmu_spte_update(sptep, spte);
|
||||
/*
|
||||
* If we overwrite a writable spte with a read-only one we
|
||||
* should flush remote TLBs. Otherwise rmap_write_protect
|
||||
* will find a read-only spte, even though the writable spte
|
||||
* might be cached on a CPU's TLB.
|
||||
*/
|
||||
if (is_writable_pte(entry) && !is_writable_pte(*sptep))
|
||||
if (mmu_spte_update(sptep, spte))
|
||||
kvm_flush_remote_tlbs(vcpu->kvm);
|
||||
done:
|
||||
return ret;
|
||||
|
@ -2403,6 +2462,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
|
|||
|
||||
static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
mmu_free_roots(vcpu);
|
||||
}
|
||||
|
||||
static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
|
||||
|
@ -2625,18 +2685,116 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
|
|||
return ret;
|
||||
}
|
||||
|
||||
static bool page_fault_can_be_fast(struct kvm_vcpu *vcpu, u32 error_code)
|
||||
{
|
||||
/*
|
||||
* #PF can be fast only if the shadow page table is present and it
|
||||
* is caused by write-protect, that means we just need change the
|
||||
* W bit of the spte which can be done out of mmu-lock.
|
||||
*/
|
||||
if (!(error_code & PFERR_PRESENT_MASK) ||
|
||||
!(error_code & PFERR_WRITE_MASK))
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
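A sketch of the fast-path predicate above; the PFERR_* values mirror the x86 page-fault error-code layout (bit 0 present, bit 1 write), and everything else about the vcpu is dropped:

#include <stdbool.h>
#include <stdio.h>

#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)

/* The fast path only handles write faults on already-present pages. */
static bool fault_can_be_fast(unsigned int error_code)
{
    return (error_code & PFERR_PRESENT_MASK) &&
           (error_code & PFERR_WRITE_MASK);
}

int main(void)
{
    printf("%d\n", fault_can_be_fast(PFERR_PRESENT_MASK | PFERR_WRITE_MASK)); /* 1 */
    printf("%d\n", fault_can_be_fast(PFERR_WRITE_MASK));                      /* 0: not present */
    return 0;
}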
|
||||
static bool
|
||||
fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte)
|
||||
{
|
||||
struct kvm_mmu_page *sp = page_header(__pa(sptep));
|
||||
gfn_t gfn;
|
||||
|
||||
WARN_ON(!sp->role.direct);
|
||||
|
||||
/*
|
||||
* The gfn of direct spte is stable since it is calculated
|
||||
* by sp->gfn.
|
||||
*/
|
||||
gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
|
||||
|
||||
if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
|
||||
mark_page_dirty(vcpu->kvm, gfn);
|
||||
|
||||
return true;
|
||||
}
|
||||
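A sketch of the lockless fix-up above: the writable bit is restored only if the spte still holds the value read during the lockless walk. The GCC/Clang __atomic builtin stands in for the kernel's cmpxchg64(), and mark_page_dirty() is omitted:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PT_WRITABLE_MASK (1ULL << 1)

static bool fix_spte_writable(uint64_t *sptep, uint64_t old_spte)
{
    uint64_t expected = old_spte;

    /* Returns false if another CPU changed the spte in the meantime. */
    return __atomic_compare_exchange_n(sptep, &expected,
                                       old_spte | PT_WRITABLE_MASK,
                                       false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

int main(void)
{
    uint64_t spte = 0x1000;     /* some read-only spte value */
    printf("fixed=%d spte=%#llx\n", fix_spte_writable(&spte, 0x1000),
           (unsigned long long)spte);
    return 0;
}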
|
||||
/*
|
||||
* Return value:
|
||||
* - true: let the vcpu access the same address again.
|
||||
* - false: let the real page fault path fix it.
|
||||
*/
|
||||
static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
|
||||
u32 error_code)
|
||||
{
|
||||
struct kvm_shadow_walk_iterator iterator;
|
||||
bool ret = false;
|
||||
u64 spte = 0ull;
|
||||
|
||||
if (!page_fault_can_be_fast(vcpu, error_code))
|
||||
return false;
|
||||
|
||||
walk_shadow_page_lockless_begin(vcpu);
|
||||
for_each_shadow_entry_lockless(vcpu, gva, iterator, spte)
|
||||
if (!is_shadow_present_pte(spte) || iterator.level < level)
|
||||
break;
|
||||
|
||||
/*
|
||||
* If the mapping has been changed, let the vcpu fault on the
|
||||
* same address again.
|
||||
*/
|
||||
if (!is_rmap_spte(spte)) {
|
||||
ret = true;
|
||||
goto exit;
|
||||
}
|
||||
|
||||
if (!is_last_spte(spte, level))
|
||||
goto exit;
|
||||
|
||||
/*
|
||||
* Check if it is a spurious fault caused by TLB lazily flushed.
|
||||
*
|
||||
* Need not check the access of upper level table entries since
|
||||
* they are always ACC_ALL.
|
||||
*/
|
||||
if (is_writable_pte(spte)) {
|
||||
ret = true;
|
||||
goto exit;
|
||||
}
|
||||
|
||||
/*
|
||||
* Currently, to simplify the code, only the spte write-protected
|
||||
* by dirty-log can be fast fixed.
|
||||
*/
|
||||
if (!spte_is_locklessly_modifiable(spte))
|
||||
goto exit;
|
||||
|
||||
/*
|
||||
* Currently, fast page fault only works for direct mapping since
|
||||
* the gfn is not stable for indirect shadow page.
|
||||
* See Documentation/virtual/kvm/locking.txt to get more detail.
|
||||
*/
|
||||
ret = fast_pf_fix_direct_spte(vcpu, iterator.sptep, spte);
|
||||
exit:
|
||||
trace_fast_page_fault(vcpu, gva, error_code, iterator.sptep,
|
||||
spte, ret);
|
||||
walk_shadow_page_lockless_end(vcpu);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
|
||||
gva_t gva, pfn_t *pfn, bool write, bool *writable);
|
||||
|
||||
static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
|
||||
bool prefault)
|
||||
static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
|
||||
gfn_t gfn, bool prefault)
|
||||
{
|
||||
int r;
|
||||
int level;
|
||||
int force_pt_level;
|
||||
pfn_t pfn;
|
||||
unsigned long mmu_seq;
|
||||
bool map_writable;
|
||||
bool map_writable, write = error_code & PFERR_WRITE_MASK;
|
||||
|
||||
force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
|
||||
if (likely(!force_pt_level)) {
|
||||
|
@ -2653,6 +2811,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
|
|||
} else
|
||||
level = PT_PAGE_TABLE_LEVEL;
|
||||
|
||||
if (fast_page_fault(vcpu, v, level, error_code))
|
||||
return 0;
|
||||
|
||||
mmu_seq = vcpu->kvm->mmu_notifier_seq;
|
||||
smp_rmb();
|
||||
|
||||
|
@ -3041,7 +3202,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
|
|||
gfn = gva >> PAGE_SHIFT;
|
||||
|
||||
return nonpaging_map(vcpu, gva & PAGE_MASK,
|
||||
error_code & PFERR_WRITE_MASK, gfn, prefault);
|
||||
error_code, gfn, prefault);
|
||||
}
|
||||
|
||||
static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
|
||||
|
@ -3121,6 +3282,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
|
|||
} else
|
||||
level = PT_PAGE_TABLE_LEVEL;
|
||||
|
||||
if (fast_page_fault(vcpu, gpa, level, error_code))
|
||||
return 0;
|
||||
|
||||
mmu_seq = vcpu->kvm->mmu_notifier_seq;
|
||||
smp_rmb();
|
||||
|
||||
|
@ -3885,6 +4049,7 @@ int kvm_mmu_setup(struct kvm_vcpu *vcpu)
|
|||
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
|
||||
{
|
||||
struct kvm_mmu_page *sp;
|
||||
bool flush = false;
|
||||
|
||||
list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
|
||||
int i;
|
||||
|
@ -3899,16 +4064,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
|
|||
!is_last_spte(pt[i], sp->role.level))
|
||||
continue;
|
||||
|
||||
if (is_large_pte(pt[i])) {
|
||||
drop_spte(kvm, &pt[i]);
|
||||
--kvm->stat.lpages;
|
||||
continue;
|
||||
}
|
||||
|
||||
/* avoid RMW */
|
||||
if (is_writable_pte(pt[i]))
|
||||
mmu_spte_update(&pt[i],
|
||||
pt[i] & ~PT_WRITABLE_MASK);
|
||||
spte_write_protect(kvm, &pt[i], &flush, false);
|
||||
}
|
||||
}
|
||||
kvm_flush_remote_tlbs(kvm);
|
||||
|
@ -3945,7 +4101,6 @@ static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
|
|||
static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
|
||||
{
|
||||
struct kvm *kvm;
|
||||
struct kvm *kvm_freed = NULL;
|
||||
int nr_to_scan = sc->nr_to_scan;
|
||||
|
||||
if (nr_to_scan == 0)
|
||||
|
@ -3957,22 +4112,30 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
|
|||
int idx;
|
||||
LIST_HEAD(invalid_list);
|
||||
|
||||
/*
|
||||
* n_used_mmu_pages is accessed without holding kvm->mmu_lock
|
||||
* here. We may skip a VM instance erroneously, but we do not
|
||||
* want to shrink a VM that only started to populate its MMU
|
||||
* anyway.
|
||||
*/
|
||||
if (kvm->arch.n_used_mmu_pages > 0) {
|
||||
if (!nr_to_scan--)
|
||||
break;
|
||||
continue;
|
||||
}
|
||||
|
||||
idx = srcu_read_lock(&kvm->srcu);
|
||||
spin_lock(&kvm->mmu_lock);
|
||||
if (!kvm_freed && nr_to_scan > 0 &&
|
||||
kvm->arch.n_used_mmu_pages > 0) {
|
||||
kvm_mmu_remove_some_alloc_mmu_pages(kvm,
|
||||
&invalid_list);
|
||||
kvm_freed = kvm;
|
||||
}
|
||||
nr_to_scan--;
|
||||
|
||||
kvm_mmu_remove_some_alloc_mmu_pages(kvm, &invalid_list);
|
||||
kvm_mmu_commit_zap_page(kvm, &invalid_list);
|
||||
|
||||
spin_unlock(&kvm->mmu_lock);
|
||||
srcu_read_unlock(&kvm->srcu, idx);
|
||||
|
||||
list_move_tail(&kvm->vm_list, &vm_list);
|
||||
break;
|
||||
}
|
||||
if (kvm_freed)
|
||||
list_move_tail(&kvm_freed->vm_list, &vm_list);
|
||||
|
||||
raw_spin_unlock(&kvm_lock);
|
||||
|
||||
|
|
|
@ -54,8 +54,8 @@
|
|||
*/
|
||||
TRACE_EVENT(
|
||||
kvm_mmu_pagetable_walk,
|
||||
TP_PROTO(u64 addr, int write_fault, int user_fault, int fetch_fault),
|
||||
TP_ARGS(addr, write_fault, user_fault, fetch_fault),
|
||||
TP_PROTO(u64 addr, u32 pferr),
|
||||
TP_ARGS(addr, pferr),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(__u64, addr)
|
||||
|
@ -64,8 +64,7 @@ TRACE_EVENT(
|
|||
|
||||
TP_fast_assign(
|
||||
__entry->addr = addr;
|
||||
__entry->pferr = (!!write_fault << 1) | (!!user_fault << 2)
|
||||
| (!!fetch_fault << 4);
|
||||
__entry->pferr = pferr;
|
||||
),
|
||||
|
||||
TP_printk("addr %llx pferr %x %s", __entry->addr, __entry->pferr,
|
||||
|
@ -243,6 +242,44 @@ TRACE_EVENT(
|
|||
TP_printk("addr:%llx gfn %llx access %x", __entry->addr, __entry->gfn,
|
||||
__entry->access)
|
||||
);
|
||||
|
||||
#define __spte_satisfied(__spte) \
|
||||
(__entry->retry && is_writable_pte(__entry->__spte))
|
||||
|
||||
TRACE_EVENT(
|
||||
fast_page_fault,
|
||||
TP_PROTO(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
|
||||
u64 *sptep, u64 old_spte, bool retry),
|
||||
TP_ARGS(vcpu, gva, error_code, sptep, old_spte, retry),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(int, vcpu_id)
|
||||
__field(gva_t, gva)
|
||||
__field(u32, error_code)
|
||||
__field(u64 *, sptep)
|
||||
__field(u64, old_spte)
|
||||
__field(u64, new_spte)
|
||||
__field(bool, retry)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->vcpu_id = vcpu->vcpu_id;
|
||||
__entry->gva = gva;
|
||||
__entry->error_code = error_code;
|
||||
__entry->sptep = sptep;
|
||||
__entry->old_spte = old_spte;
|
||||
__entry->new_spte = *sptep;
|
||||
__entry->retry = retry;
|
||||
),
|
||||
|
||||
TP_printk("vcpu %d gva %lx error_code %s sptep %p old %#llx"
|
||||
" new %llx spurious %d fixed %d", __entry->vcpu_id,
|
||||
__entry->gva, __print_flags(__entry->error_code, "|",
|
||||
kvm_mmu_trace_pferr_flags), __entry->sptep,
|
||||
__entry->old_spte, __entry->new_spte,
|
||||
__spte_satisfied(old_spte), __spte_satisfied(new_spte)
|
||||
)
|
||||
);
|
||||
#endif /* _TRACE_KVMMMU_H */
|
||||
|
||||
#undef TRACE_INCLUDE_PATH
|
||||
|
|
|
@ -154,8 +154,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
|
|||
const int fetch_fault = access & PFERR_FETCH_MASK;
|
||||
u16 errcode = 0;
|
||||
|
||||
trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
|
||||
fetch_fault);
|
||||
trace_kvm_mmu_pagetable_walk(addr, access);
|
||||
retry_walk:
|
||||
eperm = false;
|
||||
walker->level = mmu->root_level;
|
||||
|
|
|
@ -3185,8 +3185,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
|
|||
break;
|
||||
case MSR_IA32_DEBUGCTLMSR:
|
||||
if (!boot_cpu_has(X86_FEATURE_LBRV)) {
|
||||
pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTL 0x%llx, nop\n",
|
||||
__func__, data);
|
||||
vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTL 0x%llx, nop\n",
|
||||
__func__, data);
|
||||
break;
|
||||
}
|
||||
if (data & DEBUGCTL_RESERVED_BITS)
|
||||
|
@ -3205,7 +3205,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
|
|||
case MSR_VM_CR:
|
||||
return svm_set_vm_cr(vcpu, data);
|
||||
case MSR_VM_IGNNE:
|
||||
pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
|
||||
vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
|
||||
break;
|
||||
default:
|
||||
return kvm_set_msr_common(vcpu, ecx, data);
|
||||
|
@ -4044,6 +4044,11 @@ static bool svm_rdtscp_supported(void)
|
|||
return false;
|
||||
}
|
||||
|
||||
static bool svm_invpcid_supported(void)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
static bool svm_has_wbinvd_exit(void)
|
||||
{
|
||||
return true;
|
||||
|
@ -4312,6 +4317,7 @@ static struct kvm_x86_ops svm_x86_ops = {
|
|||
.cpuid_update = svm_cpuid_update,
|
||||
|
||||
.rdtscp_supported = svm_rdtscp_supported,
|
||||
.invpcid_supported = svm_invpcid_supported,
|
||||
|
||||
.set_supported_cpuid = svm_set_supported_cpuid,
|
||||
|
||||
|
|
|
@ -517,6 +517,40 @@ TRACE_EVENT(kvm_apic_accept_irq,
|
|||
__entry->coalesced ? " (coalesced)" : "")
|
||||
);
|
||||
|
||||
TRACE_EVENT(kvm_eoi,
|
||||
TP_PROTO(struct kvm_lapic *apic, int vector),
|
||||
TP_ARGS(apic, vector),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( __u32, apicid )
|
||||
__field( int, vector )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->apicid = apic->vcpu->vcpu_id;
|
||||
__entry->vector = vector;
|
||||
),
|
||||
|
||||
TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
|
||||
);
|
||||
|
||||
TRACE_EVENT(kvm_pv_eoi,
|
||||
TP_PROTO(struct kvm_lapic *apic, int vector),
|
||||
TP_ARGS(apic, vector),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field( __u32, apicid )
|
||||
__field( int, vector )
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->apicid = apic->vcpu->vcpu_id;
|
||||
__entry->vector = vector;
|
||||
),
|
||||
|
||||
TP_printk("apicid %x vector %d", __entry->apicid, __entry->vector)
|
||||
);
|
||||
|
||||
/*
|
||||
* Tracepoint for nested VMRUN
|
||||
*/
|
||||
|
|
|
@ -71,7 +71,10 @@ static bool __read_mostly enable_unrestricted_guest = 1;
|
|||
module_param_named(unrestricted_guest,
|
||||
enable_unrestricted_guest, bool, S_IRUGO);
|
||||
|
||||
static bool __read_mostly emulate_invalid_guest_state = 0;
|
||||
static bool __read_mostly enable_ept_ad_bits = 1;
|
||||
module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO);
|
||||
|
||||
static bool __read_mostly emulate_invalid_guest_state = true;
|
||||
module_param(emulate_invalid_guest_state, bool, S_IRUGO);
|
||||
|
||||
static bool __read_mostly vmm_exclusive = 1;
|
||||
|
@ -615,6 +618,10 @@ static void kvm_cpu_vmxon(u64 addr);
|
|||
static void kvm_cpu_vmxoff(void);
|
||||
static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
|
||||
static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
|
||||
static void vmx_set_segment(struct kvm_vcpu *vcpu,
|
||||
struct kvm_segment *var, int seg);
|
||||
static void vmx_get_segment(struct kvm_vcpu *vcpu,
|
||||
struct kvm_segment *var, int seg);
|
||||
|
||||
static DEFINE_PER_CPU(struct vmcs *, vmxarea);
|
||||
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
|
||||
|
@ -789,6 +796,11 @@ static inline bool cpu_has_vmx_ept_4levels(void)
|
|||
return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT;
|
||||
}
|
||||
|
||||
static inline bool cpu_has_vmx_ept_ad_bits(void)
|
||||
{
|
||||
return vmx_capability.ept & VMX_EPT_AD_BIT;
|
||||
}
|
||||
|
||||
static inline bool cpu_has_vmx_invept_individual_addr(void)
|
||||
{
|
||||
return vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT;
|
||||
|
@ -849,6 +861,12 @@ static inline bool cpu_has_vmx_rdtscp(void)
|
|||
SECONDARY_EXEC_RDTSCP;
|
||||
}
|
||||
|
||||
static inline bool cpu_has_vmx_invpcid(void)
|
||||
{
|
||||
return vmcs_config.cpu_based_2nd_exec_ctrl &
|
||||
SECONDARY_EXEC_ENABLE_INVPCID;
|
||||
}
|
||||
|
||||
static inline bool cpu_has_virtual_nmis(void)
|
||||
{
|
||||
return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
|
||||
|
@ -1739,6 +1757,11 @@ static bool vmx_rdtscp_supported(void)
|
|||
return cpu_has_vmx_rdtscp();
|
||||
}
|
||||
|
||||
static bool vmx_invpcid_supported(void)
|
||||
{
|
||||
return cpu_has_vmx_invpcid() && enable_ept;
|
||||
}
|
||||
|
||||
/*
|
||||
* Swap MSR entry in host/guest MSR entry array.
|
||||
*/
|
||||
|
@ -2458,7 +2481,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
|
|||
SECONDARY_EXEC_ENABLE_EPT |
|
||||
SECONDARY_EXEC_UNRESTRICTED_GUEST |
|
||||
SECONDARY_EXEC_PAUSE_LOOP_EXITING |
|
||||
SECONDARY_EXEC_RDTSCP;
|
||||
SECONDARY_EXEC_RDTSCP |
|
||||
SECONDARY_EXEC_ENABLE_INVPCID;
|
||||
if (adjust_vmx_controls(min2, opt2,
|
||||
MSR_IA32_VMX_PROCBASED_CTLS2,
|
||||
&_cpu_based_2nd_exec_control) < 0)
|
||||
|
@ -2645,8 +2669,12 @@ static __init int hardware_setup(void)
|
|||
!cpu_has_vmx_ept_4levels()) {
|
||||
enable_ept = 0;
|
||||
enable_unrestricted_guest = 0;
|
||||
enable_ept_ad_bits = 0;
|
||||
}
|
||||
|
||||
if (!cpu_has_vmx_ept_ad_bits())
|
||||
enable_ept_ad_bits = 0;
|
||||
|
||||
if (!cpu_has_vmx_unrestricted_guest())
|
||||
enable_unrestricted_guest = 0;
|
||||
|
||||
|
@ -2770,6 +2798,7 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
|
|||
{
|
||||
unsigned long flags;
|
||||
struct vcpu_vmx *vmx = to_vmx(vcpu);
|
||||
struct kvm_segment var;
|
||||
|
||||
if (enable_unrestricted_guest)
|
||||
return;
|
||||
|
@ -2813,20 +2842,23 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
|
|||
if (emulate_invalid_guest_state)
|
||||
goto continue_rmode;
|
||||
|
||||
vmcs_write16(GUEST_SS_SELECTOR, vmcs_readl(GUEST_SS_BASE) >> 4);
|
||||
vmcs_write32(GUEST_SS_LIMIT, 0xffff);
|
||||
vmcs_write32(GUEST_SS_AR_BYTES, 0xf3);
|
||||
vmx_get_segment(vcpu, &var, VCPU_SREG_SS);
|
||||
vmx_set_segment(vcpu, &var, VCPU_SREG_SS);
|
||||
|
||||
vmcs_write32(GUEST_CS_AR_BYTES, 0xf3);
|
||||
vmcs_write32(GUEST_CS_LIMIT, 0xffff);
|
||||
if (vmcs_readl(GUEST_CS_BASE) == 0xffff0000)
|
||||
vmcs_writel(GUEST_CS_BASE, 0xf0000);
|
||||
vmcs_write16(GUEST_CS_SELECTOR, vmcs_readl(GUEST_CS_BASE) >> 4);
|
||||
vmx_get_segment(vcpu, &var, VCPU_SREG_CS);
|
||||
vmx_set_segment(vcpu, &var, VCPU_SREG_CS);
|
||||
|
||||
fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.es);
|
||||
fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.ds);
|
||||
fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.gs);
|
||||
fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.fs);
|
||||
vmx_get_segment(vcpu, &var, VCPU_SREG_ES);
|
||||
vmx_set_segment(vcpu, &var, VCPU_SREG_ES);
|
||||
|
||||
vmx_get_segment(vcpu, &var, VCPU_SREG_DS);
|
||||
vmx_set_segment(vcpu, &var, VCPU_SREG_DS);
|
||||
|
||||
vmx_get_segment(vcpu, &var, VCPU_SREG_GS);
|
||||
vmx_set_segment(vcpu, &var, VCPU_SREG_GS);
|
||||
|
||||
vmx_get_segment(vcpu, &var, VCPU_SREG_FS);
|
||||
vmx_set_segment(vcpu, &var, VCPU_SREG_FS);
|
||||
|
||||
continue_rmode:
|
||||
kvm_mmu_reset_context(vcpu);
|
||||
|
@ -3027,6 +3059,8 @@ static u64 construct_eptp(unsigned long root_hpa)
|
|||
/* TODO write the value reading from MSR */
|
||||
eptp = VMX_EPT_DEFAULT_MT |
|
||||
VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
|
||||
if (enable_ept_ad_bits)
|
||||
eptp |= VMX_EPT_AD_ENABLE_BIT;
|
||||
eptp |= (root_hpa & PAGE_MASK);
|
||||
|
||||
return eptp;
|
||||
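A sketch of the EPTP layout built above: memory type in bits 2:0, (page-walk length - 1) in bits 5:3, the new accessed/dirty enable in bit 6, and the page-aligned root HPA above that. The numeric values are assumed to mirror the kernel's VMX_EPT_* constants:

#include <stdint.h>
#include <stdio.h>

#define EPT_DEFAULT_MT      6ULL            /* write-back */
#define EPT_DEFAULT_GAW     3ULL            /* 4-level page walk */
#define EPT_GAW_EPTP_SHIFT  3
#define EPT_AD_ENABLE_BIT   (1ULL << 6)
#define PAGE_MASK_4K        (~0xFFFULL)

static uint64_t construct_eptp(uint64_t root_hpa, int ad_bits)
{
    uint64_t eptp = EPT_DEFAULT_MT | (EPT_DEFAULT_GAW << EPT_GAW_EPTP_SHIFT);

    if (ad_bits)
        eptp |= EPT_AD_ENABLE_BIT;
    eptp |= root_hpa & PAGE_MASK_4K;
    return eptp;
}

int main(void)
{
    printf("%#llx\n", (unsigned long long)construct_eptp(0x12345000, 1));
    return 0;
}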
|
@ -3153,11 +3187,22 @@ static int __vmx_get_cpl(struct kvm_vcpu *vcpu)
|
|||
|
||||
static int vmx_get_cpl(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct vcpu_vmx *vmx = to_vmx(vcpu);
|
||||
|
||||
/*
|
||||
* If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations
|
||||
* fail; use the cache instead.
|
||||
*/
|
||||
if (unlikely(vmx->emulation_required && emulate_invalid_guest_state)) {
|
||||
return vmx->cpl;
|
||||
}
|
||||
|
||||
if (!test_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail)) {
|
||||
__set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
|
||||
to_vmx(vcpu)->cpl = __vmx_get_cpl(vcpu);
|
||||
vmx->cpl = __vmx_get_cpl(vcpu);
|
||||
}
|
||||
return to_vmx(vcpu)->cpl;
|
||||
|
||||
return vmx->cpl;
|
||||
}
|
||||
|
||||
|
||||
|
@ -3165,7 +3210,7 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var)
|
|||
{
|
||||
u32 ar;
|
||||
|
||||
if (var->unusable)
|
||||
if (var->unusable || !var->present)
|
||||
ar = 1 << 16;
|
||||
else {
|
||||
ar = var->type & 15;
|
||||
|
@ -3177,8 +3222,6 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var)
|
|||
ar |= (var->db & 1) << 14;
|
||||
ar |= (var->g & 1) << 15;
|
||||
}
|
||||
if (ar == 0) /* a 0 value means unusable */
|
||||
ar = AR_UNUSABLE_MASK;
|
||||
|
||||
return ar;
|
||||
}
|
||||
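A sketch of the access-rights packing above, reduced to the fields visible in this hunk (type, db, g and the bit-16 unusable flag); the real VMCS field also carries s/dpl/present/avl/l bits that are left out here:

#include <stdbool.h>
#include <stdio.h>

struct seg {
    unsigned type : 4;
    unsigned db   : 1;
    unsigned g    : 1;
    bool unusable;
    bool present;
};

static unsigned int segment_access_rights(const struct seg *var)
{
    unsigned int ar;

    if (var->unusable || !var->present)
        return 1u << 16;        /* mark the segment unusable */

    ar  = var->type & 15;
    ar |= (var->db & 1) << 14;
    ar |= (var->g  & 1) << 15;
    return ar;
}

int main(void)
{
    struct seg cs = { .type = 11, .db = 1, .g = 1, .present = true };
    printf("ar=%#x\n", segment_access_rights(&cs));
    return 0;
}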
|
@ -3229,6 +3272,44 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
|
|||
|
||||
vmcs_write32(sf->ar_bytes, ar);
|
||||
__clear_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
|
||||
|
||||
/*
|
||||
* Fix segments for real mode guest in hosts that don't have
|
||||
* "unrestricted_mode" or it was disabled.
|
||||
* This is done to allow migration of the guests from hosts with
|
||||
* unrestricted guest like Westmere to older host that don't have
|
||||
* unrestricted guest like Nehalem.
|
||||
*/
|
||||
if (!enable_unrestricted_guest && vmx->rmode.vm86_active) {
|
||||
switch (seg) {
|
||||
case VCPU_SREG_CS:
|
||||
vmcs_write32(GUEST_CS_AR_BYTES, 0xf3);
|
||||
vmcs_write32(GUEST_CS_LIMIT, 0xffff);
|
||||
if (vmcs_readl(GUEST_CS_BASE) == 0xffff0000)
|
||||
vmcs_writel(GUEST_CS_BASE, 0xf0000);
|
||||
vmcs_write16(GUEST_CS_SELECTOR,
|
||||
vmcs_readl(GUEST_CS_BASE) >> 4);
|
||||
break;
|
||||
case VCPU_SREG_ES:
|
||||
fix_rmode_seg(VCPU_SREG_ES, &vmx->rmode.es);
|
||||
break;
|
||||
case VCPU_SREG_DS:
|
||||
fix_rmode_seg(VCPU_SREG_DS, &vmx->rmode.ds);
|
||||
break;
|
||||
case VCPU_SREG_GS:
|
||||
fix_rmode_seg(VCPU_SREG_GS, &vmx->rmode.gs);
|
||||
break;
|
||||
case VCPU_SREG_FS:
|
||||
fix_rmode_seg(VCPU_SREG_FS, &vmx->rmode.fs);
|
||||
break;
|
||||
case VCPU_SREG_SS:
|
||||
vmcs_write16(GUEST_SS_SELECTOR,
|
||||
vmcs_readl(GUEST_SS_BASE) >> 4);
|
||||
vmcs_write32(GUEST_SS_LIMIT, 0xffff);
|
||||
vmcs_write32(GUEST_SS_AR_BYTES, 0xf3);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static void vmx_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
|
||||
|
@ -3731,6 +3812,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
|
|||
if (!enable_ept) {
|
||||
exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
|
||||
enable_unrestricted_guest = 0;
|
||||
/* Enabling INVPCID for non-EPT guests may cause a performance regression. */
|
||||
exec_control &= ~SECONDARY_EXEC_ENABLE_INVPCID;
|
||||
}
|
||||
if (!enable_unrestricted_guest)
|
||||
exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
|
||||
|
@ -4489,7 +4572,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
|
|||
break;
|
||||
}
|
||||
vcpu->run->exit_reason = 0;
|
||||
pr_unimpl(vcpu, "unhandled control register: op %d cr %d\n",
|
||||
vcpu_unimpl(vcpu, "unhandled control register: op %d cr %d\n",
|
||||
(int)(exit_qualification >> 4) & 3, cr);
|
||||
return 0;
|
||||
}
|
||||
|
@ -4769,6 +4852,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
|
|||
{
|
||||
unsigned long exit_qualification;
|
||||
gpa_t gpa;
|
||||
u32 error_code;
|
||||
int gla_validity;
|
||||
|
||||
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
|
||||
|
@ -4793,7 +4877,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
|
|||
|
||||
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
|
||||
trace_kvm_page_fault(gpa, exit_qualification);
|
||||
return kvm_mmu_page_fault(vcpu, gpa, exit_qualification & 0x3, NULL, 0);
|
||||
|
||||
/* Is it a write fault? */
|
||||
error_code = exit_qualification & (1U << 1);
|
||||
/* ept page table is present? */
|
||||
error_code |= (exit_qualification >> 3) & 0x1;
|
||||
|
||||
return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
|
||||
}
|
||||
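A sketch of the exit-qualification decode above, following the hunk's comments: bit 1 of the EPT violation qualification means the access was a write, bit 3 that a mapping was present, and the two are packed into the error-code bits the MMU expects (write in bit 1, present in bit 0):

#include <stdio.h>

static unsigned int ept_error_code(unsigned long exit_qualification)
{
    unsigned int error_code;

    error_code  = exit_qualification & (1U << 1);   /* write fault -> bit 1 */
    error_code |= (exit_qualification >> 3) & 0x1;  /* present     -> bit 0 */
    return error_code;
}

int main(void)
{
    printf("%#x\n", ept_error_code(0x2 | 0x8));     /* write to a present mapping -> 0x3 */
    printf("%#x\n", ept_error_code(0x2));           /* write, no mapping          -> 0x2 */
    return 0;
}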
|
||||
static u64 ept_rsvd_mask(u64 spte, int level)
|
||||
|
@ -4908,15 +4998,18 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
|
|||
int ret = 1;
|
||||
u32 cpu_exec_ctrl;
|
||||
bool intr_window_requested;
|
||||
unsigned count = 130;
|
||||
|
||||
cpu_exec_ctrl = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
|
||||
intr_window_requested = cpu_exec_ctrl & CPU_BASED_VIRTUAL_INTR_PENDING;
|
||||
|
||||
while (!guest_state_valid(vcpu)) {
|
||||
if (intr_window_requested
|
||||
&& (kvm_get_rflags(&vmx->vcpu) & X86_EFLAGS_IF))
|
||||
while (!guest_state_valid(vcpu) && count-- != 0) {
|
||||
if (intr_window_requested && vmx_interrupt_allowed(vcpu))
|
||||
return handle_interrupt_window(&vmx->vcpu);
|
||||
|
||||
if (test_bit(KVM_REQ_EVENT, &vcpu->requests))
|
||||
return 1;
|
||||
|
||||
err = emulate_instruction(vcpu, 0);
|
||||
|
||||
if (err == EMULATE_DO_MMIO) {
|
||||
|
@ -4924,8 +5017,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
|
|||
goto out;
|
||||
}
|
||||
|
||||
if (err != EMULATE_DONE)
|
||||
if (err != EMULATE_DONE) {
|
||||
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
|
||||
vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
|
||||
vcpu->run->internal.ndata = 0;
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (signal_pending(current))
|
||||
goto out;
|
||||
|
@ -4933,7 +5030,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
|
|||
schedule();
|
||||
}
|
||||
|
||||
vmx->emulation_required = 0;
|
||||
vmx->emulation_required = !guest_state_valid(vcpu);
|
||||
out:
|
||||
return ret;
|
||||
}
|
||||
|
@ -6467,6 +6564,23 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
|
|||
}
|
||||
}
|
||||
}
|
||||
|
||||
exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
|
||||
/* Exposing INVPCID only when PCID is exposed */
|
||||
best = kvm_find_cpuid_entry(vcpu, 0x7, 0);
|
||||
if (vmx_invpcid_supported() &&
|
||||
best && (best->ecx & bit(X86_FEATURE_INVPCID)) &&
|
||||
guest_cpuid_has_pcid(vcpu)) {
|
||||
exec_control |= SECONDARY_EXEC_ENABLE_INVPCID;
|
||||
vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
|
||||
exec_control);
|
||||
} else {
|
||||
exec_control &= ~SECONDARY_EXEC_ENABLE_INVPCID;
|
||||
vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
|
||||
exec_control);
|
||||
if (best)
|
||||
best->ecx &= ~bit(X86_FEATURE_INVPCID);
|
||||
}
|
||||
}
|
||||
|
||||
static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
|
||||
|
@ -7201,6 +7315,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
|
|||
.cpuid_update = vmx_cpuid_update,
|
||||
|
||||
.rdtscp_supported = vmx_rdtscp_supported,
|
||||
.invpcid_supported = vmx_invpcid_supported,
|
||||
|
||||
.set_supported_cpuid = vmx_set_supported_cpuid,
|
||||
|
||||
|
@ -7230,23 +7345,21 @@ static int __init vmx_init(void)
|
|||
if (!vmx_io_bitmap_a)
|
||||
return -ENOMEM;
|
||||
|
||||
r = -ENOMEM;
|
||||
|
||||
vmx_io_bitmap_b = (unsigned long *)__get_free_page(GFP_KERNEL);
|
||||
if (!vmx_io_bitmap_b) {
|
||||
r = -ENOMEM;
|
||||
if (!vmx_io_bitmap_b)
|
||||
goto out;
|
||||
}
|
||||
|
||||
vmx_msr_bitmap_legacy = (unsigned long *)__get_free_page(GFP_KERNEL);
|
||||
if (!vmx_msr_bitmap_legacy) {
|
||||
r = -ENOMEM;
|
||||
if (!vmx_msr_bitmap_legacy)
|
||||
goto out1;
|
||||
}
|
||||
|
||||
|
||||
vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL);
|
||||
if (!vmx_msr_bitmap_longmode) {
|
||||
r = -ENOMEM;
|
||||
if (!vmx_msr_bitmap_longmode)
|
||||
goto out2;
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* Allow direct access to the PC debug port (it is often used for I/O
|
||||
|
@ -7275,8 +7388,10 @@ static int __init vmx_init(void)
|
|||
vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
|
||||
|
||||
if (enable_ept) {
|
||||
kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
|
||||
VMX_EPT_EXECUTABLE_MASK);
|
||||
kvm_mmu_set_mask_ptes(0ull,
|
||||
(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
|
||||
(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
|
||||
0ull, VMX_EPT_EXECUTABLE_MASK);
|
||||
ept_set_mmio_spte_mask();
|
||||
kvm_enable_tdp();
|
||||
} else
|
||||
|
|
|
@ -528,6 +528,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
|
|||
return 1;
|
||||
}
|
||||
|
||||
if (!(cr0 & X86_CR0_PG) && kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
|
||||
return 1;
|
||||
|
||||
kvm_x86_ops->set_cr0(vcpu, cr0);
|
||||
|
||||
if ((cr0 ^ old_cr0) & X86_CR0_PG) {
|
||||
|
@ -604,10 +607,20 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
|
|||
kvm_read_cr3(vcpu)))
|
||||
return 1;
|
||||
|
||||
if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
|
||||
if (!guest_cpuid_has_pcid(vcpu))
|
||||
return 1;
|
||||
|
||||
/* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
|
||||
if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
|
||||
return 1;
|
||||
}
|
||||
|
||||
if (kvm_x86_ops->set_cr4(vcpu, cr4))
|
||||
return 1;
|
||||
|
||||
if ((cr4 ^ old_cr4) & pdptr_bits)
|
||||
if (((cr4 ^ old_cr4) & pdptr_bits) ||
|
||||
(!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
|
||||
kvm_mmu_reset_context(vcpu);
|
||||
|
||||
if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE)
|
||||
|
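A sketch of the PCIDE-enable rule above: setting CR4.PCIDE is only allowed when the guest's CPUID advertises PCID, CR3[11:0] is zero and the vcpu is in long mode. CPUID and mode state are passed in as plain booleans instead of being read from a vcpu:

#include <stdbool.h>
#include <stdio.h>

#define X86_CR4_PCIDE      (1UL << 17)
#define X86_CR3_PCID_MASK  0xFFFUL

static bool cr4_pcide_change_ok(unsigned long cr4, unsigned long old_cr4,
                                unsigned long cr3, bool has_pcid, bool long_mode)
{
    if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
        if (!has_pcid)
            return false;
        /* PCID cannot be enabled when cr3[11:0] != 0 or EFER.LMA = 0 */
        if ((cr3 & X86_CR3_PCID_MASK) || !long_mode)
            return false;
    }
    return true;
}

int main(void)
{
    printf("%d\n", cr4_pcide_change_ok(X86_CR4_PCIDE, 0, 0x1000, true, true)); /* 1 */
    printf("%d\n", cr4_pcide_change_ok(X86_CR4_PCIDE, 0, 0x1008, true, true)); /* 0 */
    return 0;
}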
@ -626,8 +639,12 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
|
|||
}
|
||||
|
||||
if (is_long_mode(vcpu)) {
|
||||
if (cr3 & CR3_L_MODE_RESERVED_BITS)
|
||||
return 1;
|
||||
if (kvm_read_cr4(vcpu) & X86_CR4_PCIDE) {
|
||||
if (cr3 & CR3_PCID_ENABLED_RESERVED_BITS)
|
||||
return 1;
|
||||
} else
|
||||
if (cr3 & CR3_L_MODE_RESERVED_BITS)
|
||||
return 1;
|
||||
} else {
|
||||
if (is_pae(vcpu)) {
|
||||
if (cr3 & CR3_PAE_RESERVED_BITS)
|
||||
|
@ -795,6 +812,7 @@ static u32 msrs_to_save[] = {
|
|||
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
|
||||
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
|
||||
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
|
||||
MSR_KVM_PV_EOI_EN,
|
||||
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
|
||||
MSR_STAR,
|
||||
#ifdef CONFIG_X86_64
|
||||
|
@ -1437,8 +1455,8 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
break;
|
||||
}
|
||||
default:
|
||||
pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x "
|
||||
"data 0x%llx\n", msr, data);
|
||||
vcpu_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x "
|
||||
"data 0x%llx\n", msr, data);
|
||||
return 1;
|
||||
}
|
||||
return 0;
|
||||
|
@ -1470,8 +1488,8 @@ static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
case HV_X64_MSR_TPR:
|
||||
return kvm_hv_vapic_msr_write(vcpu, APIC_TASKPRI, data);
|
||||
default:
|
||||
pr_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x "
|
||||
"data 0x%llx\n", msr, data);
|
||||
vcpu_unimpl(vcpu, "HYPER-V unimplemented wrmsr: 0x%x "
|
||||
"data 0x%llx\n", msr, data);
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
@ -1551,15 +1569,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
data &= ~(u64)0x100; /* ignore ignne emulation enable */
|
||||
data &= ~(u64)0x8; /* ignore TLB cache disable */
|
||||
if (data != 0) {
|
||||
pr_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n",
|
||||
data);
|
||||
vcpu_unimpl(vcpu, "unimplemented HWCR wrmsr: 0x%llx\n",
|
||||
data);
|
||||
return 1;
|
||||
}
|
||||
break;
|
||||
case MSR_FAM10H_MMIO_CONF_BASE:
|
||||
if (data != 0) {
|
||||
pr_unimpl(vcpu, "unimplemented MMIO_CONF_BASE wrmsr: "
|
||||
"0x%llx\n", data);
|
||||
vcpu_unimpl(vcpu, "unimplemented MMIO_CONF_BASE wrmsr: "
|
||||
"0x%llx\n", data);
|
||||
return 1;
|
||||
}
|
||||
break;
|
||||
|
@ -1574,8 +1592,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
thus reserved and should throw a #GP */
|
||||
return 1;
|
||||
}
|
||||
pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
|
||||
__func__, data);
|
||||
vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
|
||||
__func__, data);
|
||||
break;
|
||||
case MSR_IA32_UCODE_REV:
|
||||
case MSR_IA32_UCODE_WRITE:
|
||||
|
@ -1653,6 +1671,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
|
||||
|
||||
break;
|
||||
case MSR_KVM_PV_EOI_EN:
|
||||
if (kvm_lapic_enable_pv_eoi(vcpu, data))
|
||||
return 1;
|
||||
break;
|
||||
|
||||
case MSR_IA32_MCG_CTL:
|
||||
case MSR_IA32_MCG_STATUS:
|
||||
|
@ -1671,8 +1693,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
case MSR_K7_EVNTSEL2:
|
||||
case MSR_K7_EVNTSEL3:
|
||||
if (data != 0)
|
||||
pr_unimpl(vcpu, "unimplemented perfctr wrmsr: "
|
||||
"0x%x data 0x%llx\n", msr, data);
|
||||
vcpu_unimpl(vcpu, "unimplemented perfctr wrmsr: "
|
||||
"0x%x data 0x%llx\n", msr, data);
|
||||
break;
|
||||
/* at least RHEL 4 unconditionally writes to the perfctr registers,
|
||||
* so we ignore writes to make it happy.
|
||||
|
@ -1681,8 +1703,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
case MSR_K7_PERFCTR1:
|
||||
case MSR_K7_PERFCTR2:
|
||||
case MSR_K7_PERFCTR3:
|
||||
pr_unimpl(vcpu, "unimplemented perfctr wrmsr: "
|
||||
"0x%x data 0x%llx\n", msr, data);
|
||||
vcpu_unimpl(vcpu, "unimplemented perfctr wrmsr: "
|
||||
"0x%x data 0x%llx\n", msr, data);
|
||||
break;
|
||||
case MSR_P6_PERFCTR0:
|
||||
case MSR_P6_PERFCTR1:
|
||||
|
@ -1693,8 +1715,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
return kvm_pmu_set_msr(vcpu, msr, data);
|
||||
|
||||
if (pr || data != 0)
|
||||
pr_unimpl(vcpu, "disabled perfctr wrmsr: "
|
||||
"0x%x data 0x%llx\n", msr, data);
|
||||
vcpu_unimpl(vcpu, "disabled perfctr wrmsr: "
|
||||
"0x%x data 0x%llx\n", msr, data);
|
||||
break;
|
||||
case MSR_K7_CLK_CTL:
|
||||
/*
|
||||
|
@ -1720,7 +1742,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
/* Drop writes to this legacy MSR -- see rdmsr
|
||||
* counterpart for further detail.
|
||||
*/
|
||||
pr_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n", msr, data);
|
||||
vcpu_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n", msr, data);
|
||||
break;
|
||||
case MSR_AMD64_OSVW_ID_LENGTH:
|
||||
if (!guest_cpuid_has_osvw(vcpu))
|
||||
|
@ -1738,12 +1760,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
|
|||
if (kvm_pmu_msr(vcpu, msr))
|
||||
return kvm_pmu_set_msr(vcpu, msr, data);
|
||||
if (!ignore_msrs) {
|
||||
pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
|
||||
msr, data);
|
||||
vcpu_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
|
||||
msr, data);
|
||||
return 1;
|
||||
} else {
|
||||
pr_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n",
|
||||
msr, data);
|
||||
vcpu_unimpl(vcpu, "ignored wrmsr: 0x%x data %llx\n",
|
||||
msr, data);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
@ -1846,7 +1868,7 @@ static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
|
|||
data = kvm->arch.hv_hypercall;
|
||||
break;
|
||||
default:
|
||||
pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
|
||||
vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
@ -1877,7 +1899,7 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
|
|||
data = vcpu->arch.hv_vapic;
|
||||
break;
|
||||
default:
|
||||
pr_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
|
||||
vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
|
||||
return 1;
|
||||
}
|
||||
*pdata = data;
|
||||
|
@ -2030,10 +2052,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
|
|||
if (kvm_pmu_msr(vcpu, msr))
|
||||
return kvm_pmu_get_msr(vcpu, msr, pdata);
|
||||
if (!ignore_msrs) {
|
||||
pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
|
||||
vcpu_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
|
||||
return 1;
|
||||
} else {
|
||||
pr_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr);
|
||||
vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr);
|
||||
data = 0;
|
||||
}
|
||||
break;
|
||||
|
@ -4116,7 +4138,7 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr)
|
|||
value = kvm_get_cr8(vcpu);
|
||||
break;
|
||||
default:
|
||||
vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
|
||||
kvm_err("%s: unexpected cr %u\n", __func__, cr);
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
@ -4145,7 +4167,7 @@ static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val)
|
|||
res = kvm_set_cr8(vcpu, val);
|
||||
break;
|
||||
default:
|
||||
vcpu_printf(vcpu, "%s: unexpected cr %u\n", __func__, cr);
|
||||
kvm_err("%s: unexpected cr %u\n", __func__, cr);
|
||||
res = -1;
|
||||
}
|
||||
|
||||
|
@ -4297,26 +4319,10 @@ static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
|
|||
return kvm_x86_ops->check_intercept(emul_to_vcpu(ctxt), info, stage);
|
||||
}
|
||||
|
||||
static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
|
||||
static void emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
|
||||
u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
|
||||
{
|
||||
struct kvm_cpuid_entry2 *cpuid = NULL;
|
||||
|
||||
if (eax && ecx)
|
||||
cpuid = kvm_find_cpuid_entry(emul_to_vcpu(ctxt),
|
||||
*eax, *ecx);
|
||||
|
||||
if (cpuid) {
|
||||
*eax = cpuid->eax;
|
||||
*ecx = cpuid->ecx;
|
||||
if (ebx)
|
||||
*ebx = cpuid->ebx;
|
||||
if (edx)
|
||||
*edx = cpuid->edx;
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx);
|
||||
}
|
||||
|
||||
static struct x86_emulate_ops emulate_ops = {
|
||||
|
@ -5296,8 +5302,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
|
|||
|
||||
r = kvm_mmu_reload(vcpu);
|
||||
if (unlikely(r)) {
|
||||
kvm_x86_ops->cancel_injection(vcpu);
|
||||
goto out;
|
||||
goto cancel_injection;
|
||||
}
|
||||
|
||||
preempt_disable();
|
||||
|
@ -5322,9 +5327,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
|
|||
smp_wmb();
|
||||
local_irq_enable();
|
||||
preempt_enable();
|
||||
kvm_x86_ops->cancel_injection(vcpu);
|
||||
r = 1;
|
||||
goto out;
|
||||
goto cancel_injection;
|
||||
}
|
||||
|
||||
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
|
||||
|
@ -5388,9 +5392,16 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
|
|||
if (unlikely(vcpu->arch.tsc_always_catchup))
|
||||
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
|
||||
|
||||
kvm_lapic_sync_from_vapic(vcpu);
|
||||
if (vcpu->arch.apic_attention)
|
||||
kvm_lapic_sync_from_vapic(vcpu);
|
||||
|
||||
r = kvm_x86_ops->handle_exit(vcpu);
|
||||
return r;
|
||||
|
||||
cancel_injection:
|
||||
kvm_x86_ops->cancel_injection(vcpu);
|
||||
if (unlikely(vcpu->arch.apic_attention))
|
||||
kvm_lapic_sync_from_vapic(vcpu);
|
||||
out:
|
||||
return r;
|
||||
}
|
||||
|
@ -6304,7 +6315,7 @@ void kvm_arch_free_memslot(struct kvm_memory_slot *free,
|
|||
|
||||
for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
|
||||
if (!dont || free->arch.lpage_info[i] != dont->arch.lpage_info[i]) {
|
||||
vfree(free->arch.lpage_info[i]);
|
||||
kvm_kvfree(free->arch.lpage_info[i]);
|
||||
free->arch.lpage_info[i] = NULL;
|
||||
}
|
||||
}
|
||||
|
@ -6323,7 +6334,7 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
|
|||
slot->base_gfn, level) + 1;
|
||||
|
||||
slot->arch.lpage_info[i] =
|
||||
vzalloc(lpages * sizeof(*slot->arch.lpage_info[i]));
|
||||
kvm_kvzalloc(lpages * sizeof(*slot->arch.lpage_info[i]));
|
||||
if (!slot->arch.lpage_info[i])
|
||||
goto out_free;
|
||||
|
||||
|
@ -6350,7 +6361,7 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
|
|||
|
||||
out_free:
|
||||
for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
|
||||
vfree(slot->arch.lpage_info[i]);
|
||||
kvm_kvfree(slot->arch.lpage_info[i]);
|
||||
slot->arch.lpage_info[i] = NULL;
|
||||
}
|
||||
return -ENOMEM;
|
||||
|
|
|
@ -654,16 +654,6 @@ sclp_remove_processed(struct sccb_header *sccb)
|
|||
|
||||
EXPORT_SYMBOL(sclp_remove_processed);
|
||||
|
||||
struct init_sccb {
|
||||
struct sccb_header header;
|
||||
u16 _reserved;
|
||||
u16 mask_length;
|
||||
sccb_mask_t receive_mask;
|
||||
sccb_mask_t send_mask;
|
||||
sccb_mask_t sclp_receive_mask;
|
||||
sccb_mask_t sclp_send_mask;
|
||||
} __attribute__((packed));
|
||||
|
||||
/* Prepare init mask request. Called while sclp_lock is locked. */
|
||||
static inline void
|
||||
__sclp_make_init_req(u32 receive_mask, u32 send_mask)
|
||||
|
|
|
@ -88,6 +88,16 @@ struct sccb_header {
|
|||
u16 response_code;
|
||||
} __attribute__((packed));
|
||||
|
||||
struct init_sccb {
|
||||
struct sccb_header header;
|
||||
u16 _reserved;
|
||||
u16 mask_length;
|
||||
sccb_mask_t receive_mask;
|
||||
sccb_mask_t send_mask;
|
||||
sccb_mask_t sclp_receive_mask;
|
||||
sccb_mask_t sclp_send_mask;
|
||||
} __attribute__((packed));
|
||||
|
||||
extern u64 sclp_facilities;
|
||||
#define SCLP_HAS_CHP_INFO (sclp_facilities & 0x8000000000000000ULL)
|
||||
#define SCLP_HAS_CHP_RECONFIG (sclp_facilities & 0x2000000000000000ULL)
|
||||
|
|
|
@ -48,6 +48,7 @@ struct read_info_sccb {
|
|||
u8 _reserved5[4096 - 112]; /* 112-4095 */
|
||||
} __attribute__((packed, aligned(PAGE_SIZE)));
|
||||
|
||||
static struct init_sccb __initdata early_event_mask_sccb __aligned(PAGE_SIZE);
|
||||
static struct read_info_sccb __initdata early_read_info_sccb;
|
||||
static int __initdata early_read_info_sccb_valid;
|
||||
|
||||
|
@ -104,6 +105,19 @@ static void __init sclp_read_info_early(void)
|
|||
}
|
||||
}
|
||||
|
||||
static void __init sclp_event_mask_early(void)
|
||||
{
|
||||
struct init_sccb *sccb = &early_event_mask_sccb;
|
||||
int rc;
|
||||
|
||||
do {
|
||||
memset(sccb, 0, sizeof(*sccb));
|
||||
sccb->header.length = sizeof(*sccb);
|
||||
sccb->mask_length = sizeof(sccb_mask_t);
|
||||
rc = sclp_cmd_sync_early(SCLP_CMDW_WRITE_EVENT_MASK, sccb);
|
||||
} while (rc == -EBUSY);
|
||||
}
|
||||
|
||||
void __init sclp_facilities_detect(void)
|
||||
{
|
||||
struct read_info_sccb *sccb;
|
||||
|
@ -119,6 +133,30 @@ void __init sclp_facilities_detect(void)
|
|||
rnmax = sccb->rnmax ? sccb->rnmax : sccb->rnmax2;
|
||||
rzm = sccb->rnsize ? sccb->rnsize : sccb->rnsize2;
|
||||
rzm <<= 20;
|
||||
|
||||
sclp_event_mask_early();
|
||||
}
|
||||
|
||||
bool __init sclp_has_linemode(void)
|
||||
{
|
||||
struct init_sccb *sccb = &early_event_mask_sccb;
|
||||
|
||||
if (sccb->header.response_code != 0x20)
|
||||
return 0;
|
||||
if (sccb->sclp_send_mask & (EVTYP_MSG_MASK | EVTYP_PMSGCMD_MASK))
|
||||
return 1;
|
||||
return 0;
|
||||
}
|
||||
|
||||
bool __init sclp_has_vt220(void)
|
||||
{
|
||||
struct init_sccb *sccb = &early_event_mask_sccb;
|
||||
|
||||
if (sccb->header.response_code != 0x20)
|
||||
return 0;
|
||||
if (sccb->sclp_send_mask & EVTYP_VT220MSG_MASK)
|
||||
return 1;
|
||||
return 0;
|
||||
}
|
||||
|
||||
unsigned long long sclp_get_rnmax(void)
|
||||
|
|
|
@ -25,6 +25,7 @@
|
|||
#include <asm/io.h>
|
||||
#include <asm/kvm_para.h>
|
||||
#include <asm/kvm_virtio.h>
|
||||
#include <asm/sclp.h>
|
||||
#include <asm/setup.h>
|
||||
#include <asm/irq.h>
|
||||
|
||||
|
@ -468,7 +469,7 @@ static __init int early_put_chars(u32 vtermno, const char *buf, int count)
|
|||
|
||||
static int __init s390_virtio_console_init(void)
|
||||
{
|
||||
if (!MACHINE_IS_KVM)
|
||||
if (sclp_has_vt220() || sclp_has_linemode())
|
||||
return -ENODEV;
|
||||
return virtio_cons_early_init(early_put_chars);
|
||||
}
|
||||
|
|
|
@ -617,6 +617,7 @@ struct kvm_ppc_smmu_info {
|
|||
#define KVM_CAP_SIGNAL_MSI 77
|
||||
#define KVM_CAP_PPC_GET_SMMU_INFO 78
|
||||
#define KVM_CAP_S390_COW 79
|
||||
#define KVM_CAP_PPC_ALLOC_HTAB 80
|
||||
|
||||
#ifdef KVM_CAP_IRQ_ROUTING
|
||||
|
||||
|
@ -828,6 +829,8 @@ struct kvm_s390_ucas_mapping {
|
|||
#define KVM_SIGNAL_MSI _IOW(KVMIO, 0xa5, struct kvm_msi)
|
||||
/* Available with KVM_CAP_PPC_GET_SMMU_INFO */
|
||||
#define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info)
|
||||
/* Available with KVM_CAP_PPC_ALLOC_HTAB */
|
||||
#define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32)
|
||||
|
||||
/*
|
||||
* ioctls for vcpu fds
|
||||
|
|
|
@ -306,7 +306,7 @@ struct kvm {
|
|||
struct hlist_head irq_ack_notifier_list;
|
||||
#endif
|
||||
|
||||
#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
|
||||
#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
|
||||
struct mmu_notifier mmu_notifier;
|
||||
unsigned long mmu_notifier_seq;
|
||||
long mmu_notifier_count;
|
||||
|
@ -314,13 +314,19 @@ struct kvm {
|
|||
long tlbs_dirty;
|
||||
};
|
||||
|
||||
/* The guest did something we don't support. */
|
||||
#define pr_unimpl(vcpu, fmt, ...) \
|
||||
pr_err_ratelimited("kvm: %i: cpu%i " fmt, \
|
||||
current->tgid, (vcpu)->vcpu_id , ## __VA_ARGS__)
|
||||
#define kvm_err(fmt, ...) \
|
||||
pr_err("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
|
||||
#define kvm_info(fmt, ...) \
|
||||
pr_info("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
|
||||
#define kvm_debug(fmt, ...) \
|
||||
pr_debug("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
|
||||
#define kvm_pr_unimpl(fmt, ...) \
|
||||
pr_err_ratelimited("kvm [%i]: " fmt, \
|
||||
task_tgid_nr(current), ## __VA_ARGS__)
|
||||
|
||||
#define kvm_printf(kvm, fmt ...) printk(KERN_DEBUG fmt)
|
||||
#define vcpu_printf(vcpu, fmt...) kvm_printf(vcpu->kvm, fmt)
|
||||
/* The guest did something we don't support. */
|
||||
#define vcpu_unimpl(vcpu, fmt, ...) \
|
||||
kvm_pr_unimpl("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
|
||||
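A user-space sketch of the macro layering above: vcpu_unimpl() just prepends the vcpu id and forwards to kvm_pr_unimpl(), which adds the "kvm [pid]:" prefix. fprintf stands in for pr_err_ratelimited(), and a one-field struct stands in for kvm_vcpu:

#include <stdio.h>
#include <unistd.h>

struct vcpu { int vcpu_id; };

#define my_pr_unimpl(fmt, ...) \
    fprintf(stderr, "kvm [%d]: " fmt, (int)getpid(), ##__VA_ARGS__)

#define my_vcpu_unimpl(vcpu, fmt, ...) \
    my_pr_unimpl("vcpu%d " fmt, (vcpu)->vcpu_id, ##__VA_ARGS__)

int main(void)
{
    struct vcpu v = { .vcpu_id = 2 };
    my_vcpu_unimpl(&v, "unhandled wrmsr: 0x%x data %llx\n", 0xc0010117u, 0ULL);
    return 0;
}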
|
||||
static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
|
||||
{
|
||||
|
@ -535,6 +541,9 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
|
|||
|
||||
void kvm_free_physmem(struct kvm *kvm);
|
||||
|
||||
void *kvm_kvzalloc(unsigned long size);
|
||||
void kvm_kvfree(const void *addr);
|
||||
|
||||
#ifndef __KVM_HAVE_ARCH_VM_ALLOC
|
||||
static inline struct kvm *kvm_arch_alloc_vm(void)
|
||||
{
|
||||
|
@ -771,7 +780,7 @@ struct kvm_stats_debugfs_item {
|
|||
extern struct kvm_stats_debugfs_item debugfs_entries[];
|
||||
extern struct dentry *kvm_debugfs_dir;
|
||||
|
||||
#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
|
||||
#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
|
||||
static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
|
||||
{
|
||||
if (unlikely(vcpu->kvm->mmu_notifier_count))
|
||||
|
@ -793,7 +802,7 @@ static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se
|
|||
}
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_HAVE_KVM_IRQCHIP
|
||||
#ifdef KVM_CAP_IRQ_ROUTING
|
||||
|
||||
#define KVM_MAX_IRQ_ROUTES 1024
|
||||
|
||||
|
|
|
@ -13,7 +13,8 @@
|
|||
ERSN(DEBUG), ERSN(HLT), ERSN(MMIO), ERSN(IRQ_WINDOW_OPEN), \
|
||||
ERSN(SHUTDOWN), ERSN(FAIL_ENTRY), ERSN(INTR), ERSN(SET_TPR), \
|
||||
ERSN(TPR_ACCESS), ERSN(S390_SIEIC), ERSN(S390_RESET), ERSN(DCR),\
|
||||
ERSN(NMI), ERSN(INTERNAL_ERROR), ERSN(OSI)
|
||||
ERSN(NMI), ERSN(INTERNAL_ERROR), ERSN(OSI), ERSN(PAPR_HCALL), \
|
||||
ERSN(S390_UCONTROL)
|
||||
|
||||
TRACE_EVENT(kvm_userspace_exit,
|
||||
TP_PROTO(__u32 reason, int errno),
|
||||
|
@ -36,7 +37,7 @@ TRACE_EVENT(kvm_userspace_exit,
|
|||
__entry->errno < 0 ? -__entry->errno : __entry->reason)
|
||||
);
|
||||
|
||||
#if defined(__KVM_HAVE_IOAPIC)
|
||||
#if defined(__KVM_HAVE_IRQ_LINE)
|
||||
TRACE_EVENT(kvm_set_irq,
|
||||
TP_PROTO(unsigned int gsi, int level, int irq_source_id),
|
||||
TP_ARGS(gsi, level, irq_source_id),
|
||||
|
@ -56,7 +57,9 @@ TRACE_EVENT(kvm_set_irq,
|
|||
TP_printk("gsi %u level %d source %d",
|
||||
__entry->gsi, __entry->level, __entry->irq_source_id)
|
||||
);
|
||||
#endif
|
||||
|
||||
#if defined(__KVM_HAVE_IOAPIC)
|
||||
#define kvm_deliver_mode \
|
||||
{0x0, "Fixed"}, \
|
||||
{0x1, "LowPrio"}, \
|
||||
|
|
|
@ -191,7 +191,8 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
|
|||
return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
|
||||
}
|
||||
|
||||
int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
|
||||
int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
|
||||
int level)
|
||||
{
|
||||
u32 old_irr;
|
||||
u32 mask = 1 << irq;
|
||||
|
@ -201,9 +202,11 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
|
|||
spin_lock(&ioapic->lock);
|
||||
old_irr = ioapic->irr;
|
||||
if (irq >= 0 && irq < IOAPIC_NUM_PINS) {
|
||||
int irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
|
||||
irq_source_id, level);
|
||||
entry = ioapic->redirtbl[irq];
|
||||
level ^= entry.fields.polarity;
|
||||
if (!level)
|
||||
irq_level ^= entry.fields.polarity;
|
||||
if (!irq_level)
|
||||
ioapic->irr &= ~mask;
|
||||
else {
|
||||
int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
|
||||
|
@ -221,6 +224,16 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
|
|||
return ret;
|
||||
}
|
||||
|
||||
void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id)
|
||||
{
|
||||
int i;
|
||||
|
||||
spin_lock(&ioapic->lock);
|
||||
for (i = 0; i < KVM_IOAPIC_NUM_PINS; i++)
|
||||
__clear_bit(irq_source_id, &ioapic->irq_states[i]);
|
||||
spin_unlock(&ioapic->lock);
|
||||
}
|
||||
|
||||
static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
|
||||
int trigger_mode)
|
||||
{
|
||||
|
|
|
@ -74,7 +74,9 @@ void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
|
|||
bool kvm_ioapic_handles_vector(struct kvm *kvm, int vector);
|
||||
int kvm_ioapic_init(struct kvm *kvm);
|
||||
void kvm_ioapic_destroy(struct kvm *kvm);
|
||||
int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
|
||||
int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
|
||||
int level);
|
||||
void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id);
|
||||
void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
|
||||
int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
|
||||
struct kvm_lapic_irq *irq);
|
||||
|
|
|
@ -33,26 +33,12 @@
|
|||
|
||||
#include "ioapic.h"
|
||||
|
||||
static inline int kvm_irq_line_state(unsigned long *irq_state,
|
||||
int irq_source_id, int level)
|
||||
{
|
||||
/* Logical OR for level trig interrupt */
|
||||
if (level)
|
||||
set_bit(irq_source_id, irq_state);
|
||||
else
|
||||
clear_bit(irq_source_id, irq_state);
|
||||
|
||||
return !!(*irq_state);
|
||||
}
|
||||
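A sketch of the per-source level aggregation that this series pushes down into the PIC/IOAPIC (see "KVM: fix race with level interrupts" in the shortlog): each interrupt source owns one bit of the line state, and the pin stays asserted while any source's bit is set. The kernel's atomic set_bit()/clear_bit() are replaced by plain bit ops here:

#include <stdio.h>

static int irq_line_state(unsigned long *irq_state, int irq_source_id, int level)
{
    if (level)
        *irq_state |=  (1UL << irq_source_id);
    else
        *irq_state &= ~(1UL << irq_source_id);
    return *irq_state != 0;     /* the pin is high while any source asserts it */
}

int main(void)
{
    unsigned long state = 0;
    printf("%d\n", irq_line_state(&state, 0, 1));   /* source 0 raises -> 1 */
    printf("%d\n", irq_line_state(&state, 1, 1));   /* source 1 raises -> 1 */
    printf("%d\n", irq_line_state(&state, 0, 0));   /* source 0 drops  -> still 1 */
    printf("%d\n", irq_line_state(&state, 1, 0));   /* source 1 drops  -> 0 */
    return 0;
}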
|
||||
static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
|
||||
struct kvm *kvm, int irq_source_id, int level)
|
||||
{
|
||||
#ifdef CONFIG_X86
|
||||
struct kvm_pic *pic = pic_irqchip(kvm);
|
||||
level = kvm_irq_line_state(&pic->irq_states[e->irqchip.pin],
|
||||
irq_source_id, level);
|
||||
return kvm_pic_set_irq(pic, e->irqchip.pin, level);
|
||||
return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
|
||||
#else
|
||||
return -1;
|
||||
#endif
|
||||
|
@ -62,10 +48,7 @@ static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
|
|||
struct kvm *kvm, int irq_source_id, int level)
|
||||
{
|
||||
struct kvm_ioapic *ioapic = kvm->arch.vioapic;
|
||||
level = kvm_irq_line_state(&ioapic->irq_states[e->irqchip.pin],
|
||||
irq_source_id, level);
|
||||
|
||||
return kvm_ioapic_set_irq(ioapic, e->irqchip.pin, level);
|
||||
return kvm_ioapic_set_irq(ioapic, e->irqchip.pin, irq_source_id, level);
|
||||
}
|
||||
|
||||
inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
|
||||
|
@ -249,8 +232,6 @@ int kvm_request_irq_source_id(struct kvm *kvm)
|
|||
|
||||
void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
|
||||
{
|
||||
int i;
|
||||
|
||||
ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
|
||||
|
||||
mutex_lock(&kvm->irq_lock);
|
||||
|
@ -263,14 +244,10 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
|
|||
if (!irqchip_in_kernel(kvm))
|
||||
goto unlock;
|
||||
|
||||
for (i = 0; i < KVM_IOAPIC_NUM_PINS; i++) {
|
||||
clear_bit(irq_source_id, &kvm->arch.vioapic->irq_states[i]);
|
||||
if (i >= 16)
|
||||
continue;
|
||||
kvm_ioapic_clear_all(kvm->arch.vioapic, irq_source_id);
|
||||
#ifdef CONFIG_X86
|
||||
clear_bit(irq_source_id, &pic_irqchip(kvm)->irq_states[i]);
|
||||
kvm_pic_clear_all(pic_irqchip(kvm), irq_source_id);
|
||||
#endif
|
||||
}
|
||||
unlock:
|
||||
mutex_unlock(&kvm->irq_lock);
|
||||
}
|
||||
|
|
|
@ -516,16 +516,32 @@ static struct kvm *kvm_create_vm(unsigned long type)
|
|||
return ERR_PTR(r);
|
||||
}
|
||||
|
||||
/*
|
||||
* Avoid using vmalloc for a small buffer.
|
||||
* Should not be used when the size is statically known.
|
||||
*/
|
||||
void *kvm_kvzalloc(unsigned long size)
|
||||
{
|
||||
if (size > PAGE_SIZE)
|
||||
return vzalloc(size);
|
||||
else
|
||||
return kzalloc(size, GFP_KERNEL);
|
||||
}
|
||||
|
||||
void kvm_kvfree(const void *addr)
|
||||
{
|
||||
if (is_vmalloc_addr(addr))
|
||||
vfree(addr);
|
||||
else
|
||||
kfree(addr);
|
||||
}
|
||||
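A user-space sketch of the size-threshold pattern behind kvm_kvzalloc()/kvm_kvfree(): small buffers come from the ordinary allocator, large ones from page-level allocation. Unlike the kernel helper, the free side here needs the size, since user space has no is_vmalloc_addr() equivalent; calloc and mmap are stand-ins, and error handling is omitted:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void *kvzalloc_like(size_t size)
{
    if (size > (size_t)sysconf(_SC_PAGESIZE))
        return mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return calloc(1, size);
}

static void kvfree_like(void *addr, size_t size)
{
    if (size > (size_t)sysconf(_SC_PAGESIZE))
        munmap(addr, size);
    else
        free(addr);
}

int main(void)
{
    void *small = kvzalloc_like(64);
    void *big = kvzalloc_like(1 << 20);
    printf("small=%p big=%p\n", small, big);
    kvfree_like(small, 64);
    kvfree_like(big, 1 << 20);
    return 0;
}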
|
||||
static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
|
||||
{
|
||||
if (!memslot->dirty_bitmap)
|
||||
return;
|
||||
|
||||
if (2 * kvm_dirty_bitmap_bytes(memslot) > PAGE_SIZE)
|
||||
vfree(memslot->dirty_bitmap);
|
||||
else
|
||||
kfree(memslot->dirty_bitmap);
|
||||
|
||||
kvm_kvfree(memslot->dirty_bitmap);
|
||||
memslot->dirty_bitmap = NULL;
|
||||
}
|
||||
|
||||
|
@ -617,11 +633,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
|
|||
#ifndef CONFIG_S390
|
||||
unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
|
||||
|
||||
if (dirty_bytes > PAGE_SIZE)
|
||||
memslot->dirty_bitmap = vzalloc(dirty_bytes);
|
||||
else
|
||||
memslot->dirty_bitmap = kzalloc(dirty_bytes, GFP_KERNEL);
|
||||
|
||||
memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
|
||||
if (!memslot->dirty_bitmap)
|
||||
return -ENOMEM;
|
||||
|
||||
|
@ -1586,7 +1598,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
|
|||
*/
|
||||
for (pass = 0; pass < 2 && !yielded; pass++) {
|
||||
kvm_for_each_vcpu(i, vcpu, kvm) {
|
||||
if (!pass && i < last_boosted_vcpu) {
|
||||
if (!pass && i <= last_boosted_vcpu) {
|
||||
i = last_boosted_vcpu;
|
||||
continue;
|
||||
} else if (pass && i > last_boosted_vcpu)
|
||||
|
@ -2213,7 +2225,7 @@ static long kvm_dev_ioctl_check_extension_generic(long arg)
|
|||
case KVM_CAP_SIGNAL_MSI:
|
||||
#endif
|
||||
return 1;
|
||||
#ifdef CONFIG_HAVE_KVM_IRQCHIP
|
||||
#ifdef KVM_CAP_IRQ_ROUTING
|
||||
case KVM_CAP_IRQ_ROUTING:
|
||||
return KVM_MAX_IRQ_ROUTES;
|
||||
#endif
|
||||
|
|