We need to reserve a context from KVM to make sure we have our own
segment space. While we did that split for Book3S_64 already, 32 bit
is still outstanding.
So let's split it now.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
This is the code that will later be used instead of book3s_64_slb.S. It
does the last step of guest entry and the first generic steps of guest
exiting, once we have determined the interrupt is a KVM interrupt.
It also reads the last used instruction from the guest virtual address
space if necessary, to speed up that path.
The new thing about this file is that it makes use of generic long load
and store functions and calls a macro to fill in the actual segment
switching code. That still needs to be done differently for book3s_32 and
book3s_64.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Later in this series we will move the current segment switch code to
generic code and make that call hooks for the specific sub-archs (32
vs. 64 bit). This is the hook for 32 bits.
It enabled the entry and exit code to swap segment registers with
values from the shadow cpu structure.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
In order to support 32 bit Book3S, we need to add code to enable our
shadow MMU to actually add shadow PTEs. This is the module enabling
that support.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We have quite some code that can be used by Book3S_32 and Book3S_64 alike,
so let's call it "Book3S" instead of "Book3S_64", so we can later on
use it from the 32 bit port too.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Since gfn is not changed in the for loop, we do not need to call
gfn_to_memslot_unaliased() under the loop, and it is safe to move
it out.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK
bit directly.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Move first pte address calculation out of loop to save some cycles.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When CPU_UP_CANCELED, hardware_enable() has not been called at the CPU
which is going up because raw_notifier_call_chain(CPU_ONLINE)
has not been called for this cpu.
Drop the handling for CPU_UP_CANCELED.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Since commit bf47a760f6, we no longer handle ptes with the global bit
set specially, so there is no reason to distinguish between shadow pages
created with cr4.gpe set and clear.
Such tracking is expensive when the guest toggles cr4.pge, so drop it.
Signed-off-by: Avi Kivity <avi@redhat.com>
Quote from Avi:
|Just change the assignment to a 'goto restart;' please,
|I don't like playing with list_for_each internals.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If kvm_task_switch() fails code exits to userspace without specifying
exit reason, so the previous exit reason is reused by userspace. Fix
this by specifying exit reason correctly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
emulator_task_switch() should return -1 for failure and 0 for success to
the caller, just like x86_emulate_insn() does.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code is treated exactly the same way. Drop
role.glevels and replace is with role.cr4_pae (which is meaningful). This
simplifies the code a bit.
As a side effect, it allows sharing shadow page tables between pae and
longmode guest page tables at the same guest page.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When a fault triggers a task switch, the error code, if existent, has to
be pushed on the new task's stack. Implement the missing bits.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Stop the switch immediately if task_switch_16/32 returned an error. Only
if that step succeeded, the switch should actually take place and update
any register states.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We can call kvm_mmu_pte_write() directly from
emulator_cmpxchg_emulated() instead of passing mmu_only down to
emulator_write_emulated_onepage() and call it there.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch limits the number of pages per memory slot to make
us free from extra care about type issues.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
On SVM we set the instruction length of skipped instructions
to hard-coded, well known values, which could be wrong when (bogus,
but valid) prefixes (REX, segment override) are used.
Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or
AthlonII) have an explicit NEXTRIP field in the VMCB containing the
desired information.
Since it is cheap to do so, we use this field to override the guessed
value on newer processors.
A fix for older CPUs would be rather expensive, as it would require
to fetch and partially decode the instruction. As the problem is not
a security issue and needs special, handcrafted code to trigger
(no compiler will ever generate such code), I omit a fix for older
CPUs.
If someone is interested, I have both a patch for these CPUs as well as
demo code triggering this issue: It segfaults under KVM, but runs
perfectly on native Linux.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
MAXPHYADDR is derived from cpuid 0x80000008, but when that isn't present, we
get some random value.
Fix by checking first that cpuid 0x80000008 is supported.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Log emulated instructions in ftrace, especially if they failed.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently if we an instruction spans a page boundary, when we fetch the
second half we overwrite the first half. This prevents us from tracing
the full instruction opcodes.
Fix by appending the second half to the first.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Commit a0abee86af2d1f048dbe99d2bcc4a2cefe685617 introduced unsetting of the
IRQ line from userspace. This added a new core specific callback that I
apparently forgot to add for BookE.
So let's add the callback for BookE as well, making it build again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
After is_rsvd_bits_set() checks, EFER.NXE must be enabled if NX bit is seted
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
kvm_mmu_page.oos_link is not used, so remove it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch does:
- 'sp' parameter in inspect_spte_fn() is not used, so remove it
- fix 'kvm' and 'slots' is not defined in count_rmaps()
- fix a bug in inspect_spte_has_rmap()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Book3S knows how to convert floats to doubles and vice versa. BookE doesn't.
So let's make sure we don't export them on BookE.
This fixes a link error on BookE with CONFIG_KVM=y.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
BookE KVM doesn't know about QPRs, so let's not try to access then.
This fixes a build error on BookE KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cell can't handle MSR_FE0 and MSR_FE1 too well. It gets dog slow.
So let's just override the guest whenever we see one of the two and mask them
out. See commit ddf5f75a16 for reference.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Bool defaults to at least byte width. We usually only want to waste a single
bit on this. So let's move all the bool values to bitfields, potentially
saving memory.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some constants were bigger than ints. Let's mark them as such so we don't
accidently truncate them.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some HTAB providers (namely the PS3) ignore the SECONDARY flag. They
just put an entry in the htab as secondary when they see fit.
So we need to check the return value of htab_insert to remember the
correct slot id so we can actually invalidate the entry again.
Fixes KVM on the PS3.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mac OS X uses the dcba instruction. According to the specification it doesn't
guarantee any functionality, so let's just emulate it as nop.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On most systems we need to emulate dcbz when running 32 bit guests. So
far we've been rather slack, not giving correct DSISR values to the guest.
This patch makes the emulation more accurate, introducing a difference
between "page not mapped" and "write protection fault". While at it, it
also speeds up dcbz emulation by an order of magnitude by using kmap.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The FPU/Altivec/VSX enablement also brought access to some structure
elements that are only defined when the respective config options
are enabled.
Unfortuately I forgot to check for the config options at some places,
so let's do that now.
Unbreaks the build when CONFIG_VSX is not set.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.
So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.
The only user of it so far is MOL.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some times we don't want all capabilities to be available to all
our vcpus. One example for that is the OSI interface, implemented
in the next patch.
In order to have a generic mechanism in how to enable capabilities
individually, this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.
But the spec for 970 and 750 also looks different. While 750 requires the
DSISR and DAR fields to reflect some instruction bits (DSISR) and the fault
address (DAR), the 970 declares this as an optional feature. So we need
to reconstruct DSISR and DAR manually.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We get MMIOs with the weirdest instructions. But every time we do,
we need to improve our emulator to implement them.
So let's do that - this time it's lbzux and lhax's round.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We have a 32 bit value in the PACA to store XER in. We also do an stw
when storing XER in there. But then we load it with ld, completely
screwing it up on every entry.
Welcome to the Big Endian world.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
BATs can't only be written to, you can also read them out!
So let's implement emulation for reading BAT values again.
While at it, I also made BAT setting flush the segment cache,
so we're absolutely sure there's no MMU state left when writing
BATs.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>