kernel-fxtec-pro1x

Author	SHA1	Message	Date
Xiao Guangrong	fa1de2bfc0	KVM: MMU: add missing reserved bits check in speculative path In the speculative path, we should check guest pte's reserved bits just as the real processor does Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:56 +03:00
Avi Kivity	24157aaf83	KVM: MMU: Eliminate redundant temporaries in FNAME(fetch) 'level' and 'sptep' are aliases for 'interator.level' and 'iterator.sptep', no need for them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:48 +03:00
Avi Kivity	5991b33237	KVM: MMU: Validate all gptes during fetch, not just those used for new pages Currently, when we fetch an spte, we only verify that gptes match those that the walker saw if we build new shadow pages for them. However, this misses the following race: vcpu1 vcpu2 walk change gpte walk instantiate sp fetch existing sp Fix by validating every gpte, regardless of whether it is used for building a new sp or not. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:47 +03:00
Avi Kivity	0b3c933302	KVM: MMU: Simplify spte fetch() function Partition the function into three sections: - fetching indirect shadow pages (host_level > guest_level) - fetching direct shadow pages (page_level < host_level <= guest_level) - the final spte (page_level == host_level) Instead of the current spaghetti. A slight change from the original code is that we call validate_direct_spte() more often: previously we called it only for gw->level, now we also call it for lower levels. The change should have no effect. [xiao: fix regression caused by validate_direct_spte() called too late] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:45 +03:00
Avi Kivity	39c8c672a1	KVM: MMU: Add gpte_valid() helper Move the code to check whether a gpte has changed since we fetched it into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:44 +03:00
Avi Kivity	a357bd229c	KVM: MMU: Add validate_direct_spte() helper Add a helper to verify that a direct shadow page is valid wrt the required access permissions; drop the page if it is not valid. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:43 +03:00
Avi Kivity	a3aa51cfaa	KVM: MMU: Add drop_large_spte() helper To clarify spte fetching code, move large spte handling into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:42 +03:00
Avi Kivity	32ef26a359	KVM: MMU: Add link_shadow_page() helper To simplify the process of fetching an spte, add a helper that links a shadow page to an spte. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:40 +03:00
Avi Kivity	f59c1d2ded	KVM: MMU: Keep going on permission error Real hardware disregards permission errors when computing page fault error code bit 0 (page present). Do the same. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:30 +03:00
Avi Kivity	b0eeec29fe	KVM: MMU: Only indicate a fetch fault in page fault error code if nx is enabled Bit 4 of the page fault error code is set only if EFER.NX is set. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:29 +03:00
Avi Kivity	be38d276b0	KVM: MMU: Introduce drop_spte() When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:17 +03:00
Xiao Guangrong	84754cd8fc	KVM: MMU: cleanup FNAME(fetch)() functions Cleanup this function that we are already get the direct sp's access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:26 +03:00
Xiao Guangrong	9e7b0e7fba	KVM: MMU: fix direct sp's access corrupted If the mapping is writable but the dirty flag is not set, we will find the read-only direct sp and setup the mapping, then if the write #PF occur, we will mark this mapping writable in the read-only direct sp, now, other real read-only mapping will happily write it without #PF. It may hurt guest's COW Fixed by re-install the mapping when write #PF occur. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:25 +03:00
Xiao Guangrong	5fd5387c89	KVM: MMU: fix conflict access permissions in direct sp In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> |---| P[W] | | LPA \ PDE2 -> |---| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:23 +03:00
Avi Kivity	a1f4d39500	KVM: Remove memory alias support As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:00 +03:00
Xiao Guangrong	be71e061d1	KVM: MMU: don't mark pte notrap if it's just sync transient If the sync-sp just sync transient, don't mark its pte notrap Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:42 +03:00
Xiao Guangrong	cb83cad2e7	KVM: MMU: cleanup for dirty page judgment Using wrap function to cleanup page dirty judgment Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:39 +03:00
Xiao Guangrong	ac3cd03cca	KVM: MMU: rename 'page' and 'shadow_page' to 'sp' Rename 'page' and 'shadow_page' to 'sp' to better fit the context Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:38 +03:00
Andi Kleen	a24e809902	KVM: Fix unused but set warnings No real bugs in this one. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:29 +03:00
Lai Jiangshan	3af1817a0d	KVM: MMU: calculate correct gfn for small host pages backing large guest pages In Documentation/kvm/mmu.txt: gfn: Either the guest page table containing the translations shadowed by this page, or the base page frame for linear translations. See role.direct. But in function FNAME(fetch)(), sp->gfn is incorrect when one of following situations occurred: 1) guest is 32bit paging and the guest PDE maps a 4-MByte page (backed by 4k host pages), FNAME(fetch)() miss handling the quadrant. And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);" is incorrect. 2) guest is long mode paging and the guest PDPTE maps a 1-GByte page (backed by 4k or 2M host pages). So we fix it to suit to the document and suit to the code which requires sp->gfn correct when sp->role.direct=1. We use the goal mapping gfn(gw->gfn) to calculate the base page frame for linear translations, it is simple and easy to be understood. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:21 +03:00
Lai Jiangshan	2032a93d66	KVM: MMU: Don't allocate gfns page for direct mmu pages When sp->role.direct is set, sp->gfns does not contain any essential information, leaf sptes reachable from this sp are for a continuous guest physical memory range (a linear range). So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL) Obviously, it is not essential information, we can calculate it when need. It means we don't need sp->gfns when sp->role.direct=1, Thus we can save one page usage for every kvm_mmu_page. Note: Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() or kvm_mmu_page_set_gfn(). It is only exposed in FNAME(sync_page). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:52 +03:00
Avi Kivity	221d059d15	KVM: Update Red Hat copyrights Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:51 +03:00
Xiao Guangrong	f78978aa3a	KVM: MMU: only update unsync page in invlpg path Only unsync pages need updated at invlpg time since other shadow pages are write-protected Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:50 +03:00
Xiao Guangrong	f55c3f419a	KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page sp->gfns[] contain unaliased gfns, but gpte might contain pointer to aliased region. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:46 +03:00
Gui Jianfeng	518c5a05e8	KVM: MMU: Fix debug output error in walk_addr() Fix a debug output error in walk_addr Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:39 +03:00
Gui Jianfeng	f3b8c964a9	KVM: MMU: mark page table dirty when a pte is actually modified Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark page table page as dirty. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:38 +03:00
Huang Ying	bf998156d2	KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages In common cases, guest SRAO MCE will cause corresponding poisoned page be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay the MCE to guest OS. But it is reported that if the poisoned page is accessed in guest after unmapping and before MCE is relayed to guest OS, userspace will be killed. The reason is as follows. Because poisoned page has been un-mapped, guest access will cause guest exit and kvm_mmu_page_fault will be called. kvm_mmu_page_fault can not get the poisoned page for fault address, so kernel and user space MMIO processing is tried in turn. In user MMIO processing, poisoned page is accessed again, then userspace is killed by force_sig_info. To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM and do not try kernel and user space MMIO processing for poisoned page. [xiao: fix warning introduced by avi] Reported-by: Max Asbock <masbock@linux.vnet.ibm.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:26 +03:00
Xiao Guangrong	6aa0b9dec5	KVM: MMU: fix conflict access permissions in direct sp In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> |---| P[W] | | LPA \ PDE2 -> |---| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-07-23 09:07:04 +03:00
Xiao Guangrong	884a0ff0b6	KVM: MMU: cleanup invlpg code Using is_last_spte() to cleanup invlpg code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:28 +03:00
Xiao Guangrong	22c9b2d166	KVM: MMU: fix for calculating gpa in invlpg code If the guest is 32-bit, we should use 'quadrant' to adjust gpa offset Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-19 11:36:25 +03:00
Gui Jianfeng	814a59d207	KVM: MMU: Make use of is_large_pte() in walker Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK bit directly. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:07 +03:00
Gui Jianfeng	51fb60d81b	KVM: MMU: Move sync_page() first pte address calculation out of loop Move first pte address calculation out of loop to save some cycles. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:06 +03:00
Xiao Guangrong	24222c2fec	KVM: MMU: remove unnecessary NX check in walk_addr After is_rsvd_bits_set() checks, EFER.NXE must be enabled if NX bit is seted Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:17:30 +03:00
Avi Kivity	08e850c653	KVM: MMU: Reinstate pte prefetch on invlpg Commit `fb341f57` removed the pte prefetch on guest invlpg, citing guest races. However, the SDM is adamant that prefetch is allowed: "The processor may create entries in paging-structure caches for translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path." And, in fact, there was a race in the prefetch code: we picked up the pte without the mmu lock held, so an older invlpg could install the pte over a newer invlpg. Reinstate the prefetch logic, but this time note whether another invlpg has executed using a counter. If a race occured, do not install the pte. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:43 +03:00
Avi Kivity	fbc5d139bb	KVM: MMU: Do not instantiate nontrapping spte on unsync page The update_pte() path currently uses a nontrapping spte when a nonpresent (or nonaccessed) gpte is written. This is fine since at present it is only used on sync pages. However, on an unsync page this will cause an endless fault loop as the guest is under no obligation to invlpg a gpte that transitions from nonpresent to present. Needed for the next patch which reinstates update_pte() on invlpg. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-05-17 12:15:42 +03:00
Gleb Natapov	1871c6020d	KVM: x86 emulator: fix memory access during x86 emulation Currently when x86 emulator needs to access memory, page walk is done with broadest permission possible, so if emulated instruction was executed by userspace process it can still access kernel memory. Fix that by providing correct memory access to page walker during emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:36:11 -03:00
Takuya Yoshikawa	8dae444529	KVM: rename is_writeble_pte() to is_writable_pte() There are two spellings of "writable" in arch/x86/kvm/mmu.c and paging_tmpl.h . This patch renames is_writeble_pte() to is_writable_pte() and makes grepping easy. New name is consistent with the definition of itself: return pte & PT_WRITABLE_MASK; Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:36:00 -03:00
Marcelo Tosatti	a6085fbaf6	KVM: MMU: bail out pagewalk on kvm_read_guest error Exit the guest pagetable walk loop if reading gpte failed. Otherwise its possible to enter an endless loop processing the previous present pte. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-01-25 12:26:38 -02:00
Marcelo Tosatti	fb341f572d	KVM: MMU: remove prefault from invlpg handler The invlpg prefault optimization breaks Windows 2008 R2 occasionally. The visible effect is that the invlpg handler instantiates a pte which is, microseconds later, written with a different gfn by another vcpu. The OS could have other mechanisms to prevent a present translation from being used, which the hypervisor is unaware of. While the documentation states that the cpu is at liberty to prefetch tlb entries, it looks like this is not heeded, so remove tlb prefetch from invlpg. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-27 13:36:30 -02:00
Marcelo Tosatti	5f5c35aad5	KVM: MMU: update invlpg handler comment Large page translations are always synchronized (either in level 3 or level 2), so its not necessary to properly deal with them in the invlpg handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:23 +02:00
Izik Eidus	1403283acc	KVM: MMU: add SPTE_HOST_WRITEABLE flag to the shadow ptes this flag notify that the host physical page we are pointing to from the spte is write protected, and therefore we cant change its access to be write unless we run get_user_pages(write = 1). (this is needed for change_pte support in kvm) Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-10-04 17:04:50 +02:00
Joerg Roedel	7e4e4056f7	KVM: MMU: shadow support for 1gb pages This patch adds support for shadow paging to the 1gb page table code in KVM. With this code the guest can use 1gb pages even if the host does not support them. [ Marcelo: fix shadow page collision on pmd level if a guest 1gb page is mapped with 4kb ptes on host level ] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:19 +03:00
Joerg Roedel	e04da980c3	KVM: MMU: make page walker aware of mapping levels The page walker may be used with nested paging too when accessing mmio areas. Make it support the additional page-level too. [ Marcelo: fix reserved bit check for 1gb pte ] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:18 +03:00
Joerg Roedel	852e3c19ac	KVM: MMU: make direct mapping paths aware of mapping levels Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:18 +03:00
Joerg Roedel	d25797b24c	KVM: MMU: rename is_largepage_backed to mapping_level With the new name and the corresponding backend changes this function can now support multiple hugepage sizes. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:18 +03:00
Avi Kivity	0742017159	KVM: MMU: Trace guest pagetable walker Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:09 +03:00
Joerg Roedel	ec04b2604c	KVM: Prepare memslot data structures for multiple hugepage sizes [avi: fix build on non-x86] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:02 +03:00
Avi Kivity	d555c333aa	KVM: MMU: s/shadow_pte/spte/ We use shadow_pte and spte inconsistently, switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its lowlevelness. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:51 +03:00
Avi Kivity	43a3795a3a	KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte Since the guest and host ptes can have wildly different format, adjust the pte accessor names to indicate on which type of pte they operate on. No functional changes. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:51 +03:00
Avi Kivity	6de4f3ada4	KVM: Cache pdptrs Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx, guest memory access on svm) extract them on demand. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:46 +03:00


93 commits