Commit graph

11322 commits

Author SHA1 Message Date
Ingo Molnar
12a81c8df1 Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core 2010-08-04 16:25:47 +02:00
Jiri Kosina
d790d4d583 Merge branch 'master' into for-next 2010-08-04 15:14:38 +02:00
Fenghua Yu
0199114c31 x86, hwmon: Package Level Thermal/Power: power limit
Power limit notification feature is published in Intel 64 and IA-32
Architectures SDMV Vol 3A 14.5.6 Power Limit Notification.

It is implemented first on Intel Sandy Bridge platform.

The patch handles notification interrupt. Interrupt handler dumps power limit
information in log_buf, logs the event in mce log, and increases the event
counters (core_power_limit and package_power_limit). Upper level applications
could use the data to detect system health or diagnose functionality/performance
issues.

In the future, the event could be handled in a more fancy way.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-5-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-03 15:58:56 -07:00
Fenghua Yu
55d435a227 x86, hwmon: Package Level Thermal/Power: thermal throttling handler
Add package level thermal throttle interrupt support. The interrupt handler
increases package level thermal throttle count. It also logs the event in MCE
log.

The package level thermal throttle interrupt happens across threads in a
package. Each thread handles the interrupt individually. User level application
is supposed to retrieve correct event count and log based on package/thread
topology. This is the same situation for core level interrupt handler. In the
future, interrupt may be reported only per package or per core.

core_throttle_count and package_throttle_count are used for user interface.
Previously only throttle_count is used for core throttle count. If you think
new core_throttle_count name breaks user interface, I can change this part.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-4-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-03 15:58:56 -07:00
Fenghua Yu
cb84b19474 x86, hwmon: Package Level Thermal/Power: pkgtemp hwmon driver
This patch adds a hwmon driver for package level thermal control. The driver
dumps package level thermal information through sysfs interface so that upper
level application (e.g. lm_sensor) can retrive the information.

Instead of having the package level hwmon code in coretemp, I write a seperate
driver pkgtemp because:

First, package level thermal sensors include not only sensors for each core,
but also sensors for uncore, memory controller or other components in the
package. Logically it will be clear to have a seperate hwmon driver for package
level hwmon to monitor wider range of sensors in a package. Merging package
thermal driver into core thermal driver doesn't make sense and may mislead.

Secondly, merging the two drivers together may cause coding mess. It's easier
to include various package level sensors info if more sensor information is
implemented. Coretemp code needs to consider a lot of legacy machine cases.
Pkgtemp code only considers platform starting from Sandy Bridge.

On a 1Sx4Cx2T Sandy Bridge platform, lm-sensors dumps the pkgtemp and coretemp:

pkgtemp-isa-0000
Adapter: ISA adapter
physical id 0: +33.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

[ hpa: folded v3 patch removing improper global variable "SHOW" ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-3-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-03 15:58:07 -07:00
Dave Jones
9d1f44ee20 [CPUFREQ] Remove pointless printk from p4-clockmod.
The only machines this is triggering on should be supported by
acpi-cpufreq or acpi's internal throttling.

Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:30 -04:00
Holger Freyther
307069cf6c [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c.

This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Holger Freyther
2530573e45 [CPUFREQ] Fix section mismatch for longhaul_cpu_init.
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c. Use the
__cpuinitdata for data used by the routines marked as __cpuinit.

This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Holger Freyther
7e2d811220 [CPUFREQ] Fix section mismatch for longrun_cpu_init.
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c.

This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Borislav Petkov
b30d3304c9 [CPUFREQ] powernow-k8: Fix misleading variable naming
rdmsr() takes the lower 32 bits as a second argument and the high 32 as
a third. Fix the names accordingly since they were swapped.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Peter Huewe
55c789bb2b [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
This patch converts pci_table entries, where .subvendor=PCI_ANY_ID and
.subdevice=PCI_ANY_ID, .class=0 and .class_mask=0, to use the
PCI_VDEVICE macro, and thus improves readability.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Kulikov Vasiliy
ccc5638a20 [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev()
Use for_each_pci_dev() to simplify the code.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Thomas Renninger
6f4f2723d0 [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
   cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
   notifier is triggered.
   This will support power frequency tracing by all cpufreq drivers

trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
   which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Thomas Renninger
6b72e3934b [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: venki@google.com
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Marti Raudsepp
298decfbc4 [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup
On Wed, 2010-01-20 at 16:56 +0100, Thomas Renninger wrote:
> But most often this happens if people upgrade their CPU and do not
> update their BIOS.
> Or the vendor does not recognise the new CPU even if the BIOS got
> updated.

Maybe some of those people just didn't realize it was disabled in BIOS?
If you tell users that it's a firmware bug then they'll probably just
give up.

> The itself message might be an enhancment, IMO it's not worth a patch.

Why do you think so? I spent an hour on hunting down the BIOS upgrade,
only to find that it didn't improve anything. It was a day later that I
realized that it might be a BIOS option; and the option was literally
the _last_ option in the whole BIOS setup. :)

This message would have saved the day.

> But do not revert the FW_BUG part!

Sure, you have a point here.

How about this patch?
2010-08-03 13:47:04 -04:00
Borislav Petkov
c2f4a2c6e0 [CPUFREQ] powernow-k8: Limit Pstate transition latency check
The Pstate transition latency check was added for broken F10h BIOSen
which wrongly contain a value of 0 for transition and bus master
latency. Fam11h and later, however, (will) have similar transition
latency so extend that behavior for them too.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:03 -04:00
Matthew Garrett
6ebdf777ba [CPUFREQ] Fix PCC driver error path
The PCC cpufreq driver unmaps the mailbox address range if any CPUs fail to
initialise, but doesn't do anything to remove the registered CPUs from the
cpufreq core resulting in failures further down the line. We're better off
simply returning a failure - the cpufreq core will unregister us cleanly if
we end up with no successfully registered CPUs. Tidy up the failure path
and also add a sanity check to ensure that the firmware gives us a realistic
frequency - the core deals badly with that being set to 0.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:02 -04:00
Daniel J Blueman
0d9715d64f [CPUFREQ] fix double freeing in error path of pcc-cpufreq
Prevent double freeing on error path.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:02 -04:00
Matthew Garrett
5d77b85458 [CPUFREQ] pcc driver should check for pcch method before calling _OSC
The pcc specification documents an _OSC method that's incompatible with the
one defined as part of the ACPI spec. This shouldn't be a problem as both
are supposed to be guarded with a UUID. Unfortunately approximately nobody
(including HP, who wrote this spec) properly check the UUID on entry to the
_OSC call. Right now this could result in surprising behaviour if the pcc
driver performs an _OSC call on a machine that doesn't implement the pcc
specification. Check whether the PCCH method exists first in order to reduce
this probability.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:02 -04:00
Borislav Petkov
98a5ae2d99 x86, mce: Notify about corrected events too
Notify all parties registered on the mce decoder chain about logged
correctable MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2010-08-03 16:14:02 +02:00
Sreedhara DS
804f8681a9 Remove indirect read write api support.
The firmware of production devices does not support this interface so this
is dead code.

Signed-off-by: Sreedhara DS <sreedhara.ds@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-08-03 09:50:30 -04:00
Feng Tang
35f2915c3b intel_scu_ipc: add definitions for vRTC related command
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-08-03 09:48:47 -04:00
H. Peter Anvin
6238b47b58 x86, setup: move isdigit.h to ctype.h, header files on top.
It is a subset of <ctype.h> functionality, so name it ctype.h.  Also,
reorganize header files so #include statements are clustered near the
top as they should be.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4C5752F2.8030206@kernel.org>
2010-08-02 21:07:20 -07:00
Yinghai Lu
8fee13a48e x86, setup: enable early console output from the decompressor
This enables the decompressor output to be seen on the serial console.
Most of the code is shared with the regular boot code.

We could add printf to the decompressor if needed, but currently there
is no sufficiently compelling user.

-v2: define BOOT_BOOT_H to avoid include boot.h
-v3: early_serial_base need to be static in misc.c ?
-v4: create seperate string.c printf.c cmdline.c early_serial_console.c
     after hpa's patch that allow global variables in compressed/misc stage
-v5: remove printf.c related

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-02 20:32:20 -07:00
Alok Kataria
9f242dc10e x86, vmware: Preset lpj values when on VMware.
When running on VMware's platform, we have seen situations where
the AP's try to calibrate the lpj values and fail to get good calibration
runs becasue of timing issues. As a result delays don't work correctly
on all cpus.

The solutions is to set preset_lpj value based on the current tsc frequency
value. This is similar to what KVM does as well.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1280790637.14933.29.camel@ank32.eng.vmware.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-02 17:16:30 -07:00
Yinghai Lu
f4ed2877b1 x86, setup: reorganize the early console setup
Separate early_serial_console from tty.c

This allows for reuse of
early_serial_console.c/string.c/printf.c/cmdline.c in boot/compressed/.

-v2: according to hpa, don't include string.c etc
-v3: compressed/misc.c must have early_serial_base as static, so move it back to tty.c
     for setup code

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C568D2B.205@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-02 15:51:56 -07:00
H. Peter Anvin
22a57f5896 x86, setup: Allow global variables and functions in the decompressor
In order for global variables and functions to work in the
decompressor, we need to fix up the GOT in assembly code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4C57382E.8050501@zytor.com>
2010-08-02 15:34:44 -07:00
Shaohua Li
be783a4721 x86, vdso: Unmap vdso pages
We mapped vdso pages but never unmapped them and the virtual address
is lost after exiting from the function, so unmap vdso pages here.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100802004934.GA2505@sli10-desk.sh.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-02 15:11:59 -07:00
Konrad Rzeszutek Wilk
fe96eb404e x86: Detect whether we should use Xen SWIOTLB.
It is paramount that we call pci_xen_swiotlb_detect before
pci_swiotlb_detect as both implementations use the 'swiotlb'
and 'swiotlb_force' flags. The pci-xen_swiotlb_detect inhibits
the swiotlb_force and swiotlb flag so that the native SWIOTLB
implementation is not enabled when running under Xen.

[since v1 changed two Cc's to Acked-by]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    [http://lkml.org/lkml/2010/7/27/374]
Cc: Albert Herranz <albert_herranz@yahoo.es>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
    [conditional http://lkml.org/lkml/2010/8/2/324]
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-02 15:18:33 -04:00
Michal Schmidt
e8c534ec06 x86: Fix keeping track of AMD C1E
Accomodate the original C1E-aware idle routine to the different times
during boot when the BIOS enables C1E. While at it, remove the synthetic
CPUID flag in favor of a single global setting which denotes C1E status
on the system.

[ hpa: changed c1e_enabled to be a bool; clarified cpu bit 3:21 comment ]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
LKML-Reference: <20100727165335.GA11630@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2010-08-02 08:45:56 -07:00
Ingo Molnar
3772b73472 Merge commit 'v2.6.35' into perf/core
Conflicts:
	tools/perf/Makefile
	tools/perf/util/hist.c

Merge reason: Resolve the conflicts and update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-02 08:31:54 +02:00
Avi Kivity
3444d7da18 KVM: VMX: Fix host GDT.LIMIT corruption
vmx does not restore GDT.LIMIT to the host value, instead it sets it to 64KB.
This means host userspace can learn a few bits of host memory.

Fix by reloading GDTR when we load other host state.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 08:10:18 +03:00
Xiao Guangrong
9a3aad7057 KVM: MMU: using __xchg_spte more smarter
Sometimes, atomically set spte is not needed, this patch call __xchg_spte()
more smartly

Note: if the old mapping's access bit is already set, we no need atomic operation
since the access bit is not lost

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02 06:41:01 +03:00
Xiao Guangrong
e4b502ead2 KVM: MMU: cleanup spte set and accssed/dirty tracking
Introduce set_spte_track_bits() to cleanup current code

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02 06:41:00 +03:00
Xiao Guangrong
be233d49ea KVM: MMU: don't atomicly set spte if it's not present
If the old mapping is not present, the spte.a is not lost, so no need
atomic operation to set it

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02 06:40:59 +03:00
Xiao Guangrong
9ed5520dd3 KVM: MMU: fix page dirty tracking lost while sync page
In sync-page path, if spte.writable is changed, it will lose page dirty
tracking, for example:

assume spte.writable = 0 in a unsync-page, when it's synced, it map spte
to writable(that is spte.writable = 1), later guest write spte.gfn, it means
spte.gfn is dirty, then guest changed this mapping to read-only, after it's
synced,  spte.writable = 0

So, when host release the spte, it detect spte.writable = 0 and not mark page
dirty

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02 06:40:58 +03:00
Xiao Guangrong
daa3db693c KVM: MMU: fix broken page accessed tracking with ept enabled
In current code, if ept is enabled(shadow_accessed_mask = 0), the page
accessed tracking is lost.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02 06:40:57 +03:00
Xiao Guangrong
fa1de2bfc0 KVM: MMU: add missing reserved bits check in speculative path
In the speculative path, we should check guest pte's reserved bits just as
the real processor does

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02 06:40:56 +03:00
Andrea Arcangeli
6e3e243c3b KVM: MMU: fix mmu notifier invalidate handler for huge spte
The index wasn't calculated correctly (off by one) for huge spte so KVM guest
was unstable with transparent hugepages.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02 06:40:54 +03:00
Wei Yongjun
c19b8bd60e KVM: x86 emulator: fix xchg instruction emulation
If the destination is a memory operand and the memory cannot
map to a valid page, the xchg instruction emulation and locked
instruction will not work on io regions and stuck in endless
loop. We should emulate exchange as write to fix it.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:53 +03:00
Gleb Natapov
9195c4da26 KVM: x86: Call mask notifiers from pic
If pit delivers interrupt while pic is masking it OS will never do EOI
and ack notifier will not be called so when pit will be unmasked no pit
interrupts will be delivered any more. Calling mask notifiers solves this
issue.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:52 +03:00
Gleb Natapov
68be080345 KVM: x86: never re-execute instruction with enabled tdp
With tdp enabled we should get into emulator only when emulating io, so
reexecution will always bring us back into emulator.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:51 +03:00
Gleb Natapov
c0e0608cb9 KVM: x86: emulator: inc/dec can have lock prefix
Mark inc (0xfe/0 0xff/0) and dec (0xfe/1 0xff/1) as lock prefix capable.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:49 +03:00
Avi Kivity
24157aaf83 KVM: MMU: Eliminate redundant temporaries in FNAME(fetch)
'level' and 'sptep' are aliases for 'interator.level' and 'iterator.sptep', no
need for them.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:48 +03:00
Avi Kivity
5991b33237 KVM: MMU: Validate all gptes during fetch, not just those used for new pages
Currently, when we fetch an spte, we only verify that gptes match those that
the walker saw if we build new shadow pages for them.

However, this misses the following race:

  vcpu1            vcpu2

  walk
                  change gpte
                  walk
                  instantiate sp

  fetch existing sp

Fix by validating every gpte, regardless of whether it is used for building
a new sp or not.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:47 +03:00
Avi Kivity
0b3c933302 KVM: MMU: Simplify spte fetch() function
Partition the function into three sections:

- fetching indirect shadow pages (host_level > guest_level)
- fetching direct shadow pages (page_level < host_level <= guest_level)
- the final spte (page_level == host_level)

Instead of the current spaghetti.

A slight change from the original code is that we call validate_direct_spte()
more often: previously we called it only for gw->level, now we also call it for
lower levels.  The change should have no effect.

[xiao: fix regression caused by validate_direct_spte() called too late]

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:45 +03:00
Avi Kivity
39c8c672a1 KVM: MMU: Add gpte_valid() helper
Move the code to check whether a gpte has changed since we fetched it into
a helper.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:44 +03:00
Avi Kivity
a357bd229c KVM: MMU: Add validate_direct_spte() helper
Add a helper to verify that a direct shadow page is valid wrt the required
access permissions; drop the page if it is not valid.

Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:43 +03:00
Avi Kivity
a3aa51cfaa KVM: MMU: Add drop_large_spte() helper
To clarify spte fetching code, move large spte handling into a helper.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:42 +03:00
Avi Kivity
121eee97a7 KVM: MMU: Use __set_spte to link shadow pages
To avoid split accesses to 64 bit sptes on i386, use __set_spte() to link
shadow pages together.

(not technically required since shadow pages are __GFP_KERNEL, so upper 32
bits are always clear)

Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 06:40:41 +03:00