Commit graph

310 commits

Author SHA1 Message Date
Linus Torvalds
d4c6fa73fe Features:
- PV multiconsole support, so that there can be hvc1, hvc2, etc;
  - P-state and C-state power management driver that uploads said
    power management data to the hypervisor. It also inhibits cpufreq
    scaling drivers to load so that only the hypervisor can make power
    management decisions - fixing a weird perf bug.
  - Function Level Reset (FLR) support in the Xen PCI backend.
 Fixes:
  - Kconfig dependencies for Xen PV keyboard and video
  - Compile warnings and constify fixes
  - Change over to use percpu_xxx instead of this_cpu_xxx
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZ0qkAAoJEFjIrFwIi8fJjCgH/jeJ39E8ML8DP9tCS2HQnMqM
 uTEjLcqvoJ7sEhHvtBLPeG2p0jyBvOWjLbSc7P8nESBAMPvSYol8L6WqfWrdSU4r
 lHrma2sg9UYzRog5NyxAgkp7bBsBBFOnhVL3Cxb5Ig78cPWzeSWGpqGZ8M/d51Wf
 1iE0tHuU4DpN+fg1SZqPqEm8ecEJ/eSrVTnyTx/Qo2Ak+Zw98SqzX7SV5lo8mudd
 WFL1F2K9FyTNk79ndGhqFt36x6nEbFgMLbmCDWumLuWN6bMd1Uq0wNkCqW4F1h28
 3yqnY+rfQh4y3eXK1B9nttCUTs+/66U5ZWrT6B1IJumGTAIqcWfgeUX/Vn/HVC4=
 =tfMc
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull xen updates from Konrad Rzeszutek Wilk:
 "which has three neat features:

   - PV multiconsole support, so that there can be hvc1, hvc2, etc; This
     can be used in HVM and in PV mode.

   - P-state and C-state power management driver that uploads said power
     management data to the hypervisor.  It also inhibits cpufreq
     scaling drivers to load so that only the hypervisor can make power
     management decisions - fixing a weird perf bug.

     There is one thing in the Kconfig that you won't like: "default y
     if (X86_ACPI_CPUFREQ = y || X86_POWERNOW_K8 = y)" (note, that it
     all depends on CONFIG_XEN which depends on CONFIG_PARAVIRT which by
     default is off).  I've a fix to convert that boolean expression
     into "default m" which I am going to post after the cpufreq git
     pull - as the two patches to make this work depend on a fix in Dave
     Jones's tree.

   - Function Level Reset (FLR) support in the Xen PCI backend.

  Fixes:

   - Kconfig dependencies for Xen PV keyboard and video
   - Compile warnings and constify fixes
   - Change over to use percpu_xxx instead of this_cpu_xxx"

Fix up trivial conflicts in drivers/tty/hvc/hvc_xen.c due to changes to
a removed commit.

* tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
  xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.
  xen: constify all instances of "struct attribute_group"
  xen/xenbus: ignore console/0
  hvc_xen: introduce HVC_XEN_FRONTEND
  hvc_xen: implement multiconsole support
  hvc_xen: support PV on HVM consoles
  xenbus: don't free other end details too early
  xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
  xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
  xenbus: address compiler warnings
  xen: use this_cpu_xxx replace percpu_xxx funcs
  xen/pciback: Support pci_reset_function, aka FLR or D3 support.
  pci: Introduce __pci_reset_function_locked to be used when holding device_lock.
  xen: Utilize the restore_msi_irqs hook.
2012-03-22 20:16:14 -07:00
Konrad Rzeszutek Wilk
73c154c60b xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
For the hypervisor to take advantage of the MWAIT support it needs
to extract from the ACPI _CST the register address. But the
hypervisor does not have the support to parse DSDT so it relies on
the initial domain (dom0) to parse the ACPI Power Management information
and push it up to the hypervisor. The pushing of the data is done
by the processor_harveset_xen module which parses the information that
the ACPI parser has graciously exposed in 'struct acpi_processor'.

For the ACPI parser to also expose the Cx states for MWAIT, we need
to expose the MWAIT capability (leaf 1). Furthermore we also need to
expose the MWAIT_LEAF capability (leaf 5) for cstate.c to properly
function.

The hypervisor could expose these flags when it traps the XEN_EMULATE_PREFIX
operations, but it can't do it since it needs to be backwards compatible.
Instead we choose to use the native CPUID to figure out if the MWAIT
capability exists and use the XEN_SET_PDC query hypercall to figure out
if the hypervisor wants us to expose the MWAIT_LEAF capability or not.

Note: The XEN_SET_PDC query was implemented in c/s 23783:
"ACPI: add _PDC input override mechanism".

With this in place, instead of
 C3 ACPI IOPORT 415
we get now
 C3:ACPI FFH INTEL MWAIT 0x20

Note: The cpu_idle which would be calling the mwait variants for idling
never gets set b/c we set the default pm_idle to be the hypercall variant.

Acked-by: Jan Beulich <JBeulich@suse.com>
[v2: Fix missing header file include and #ifdef]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-10 12:44:44 -05:00
Konrad Rzeszutek Wilk
8eaffa67b4 xen/pat: Disable PAT support for now.
[Pls also look at https://lkml.org/lkml/2012/2/10/228]

Using of PAT to change pages from WB to WC works quite nicely.
Changing it back to WB - not so much. The crux of the matter is
that the code that does this (__page_change_att_set_clr) has only
limited information so when it tries to the change it gets
the "raw" unfiltered information instead of the properly filtered one -
and the "raw" one tell it that PSE bit is on (while infact it
is not).  As a result when the PTE is set to be WB from WC, we get
tons of:

:WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
:Hardware name: HP xw4400 Workstation
.. snip..
:Pid: 27, comm: kswapd0 Tainted: G        W    3.2.2-1.fc16.x86_64 #1
:Call Trace:
: [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
: [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
: [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
: [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
: [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
: [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
: [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
: [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
: [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
: [<ffffffff8100a9b2>] ? check_events+0x12/0x20
: [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
: [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
: [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
: [<ffffffff8112ba04>] shrink_slab+0x154/0x310
: [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
: [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
: [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
: [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
: [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
: [<ffffffff8108fb6c>] kthread+0x8c/0xa0

for every page. The proper fix for this is has been posted
and is https://lkml.org/lkml/2012/2/10/228
"x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations."
along with a detailed description of the problem and solution.

But since that posting has gone nowhere I am proposing
this band-aid solution so that at least users don't get
the page corruption (the pages that are WC don't get changed to WB
and end up being recycled for filesystem or other things causing
mysterious crashes).

The negative impact of this patch is that users of WC flag
(which are InfiniBand, radeon, nouveau drivers) won't be able
to set that flag - so they are going to see performance degradation.
But stability is more important here.

Fixes RH BZ# 742032, 787403, and 745574
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-02-20 10:41:35 -05:00
Konrad Rzeszutek Wilk
416d721474 xen/setup: Remove redundant filtering of PTE masks.
commit 7347b4082e "xen: Allow
unprivileged Xen domains to create iomap pages" added a redundant
line in the early bootup code to filter out the PTE. That
filtering is already done a bit earlier so this extra processing
is not required.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-02-20 10:40:54 -05:00
Alex Shi
2113f46916 xen: use this_cpu_xxx replace percpu_xxx funcs
percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them
for further code clean up.

I don't know much of xen code. But, since the code is in x86 architecture,
the percpu_xxx is exactly same as this_cpu_xxx serials functions. So, the
change is safe.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-01-24 12:20:24 -05:00
Tejun Heo
fe091c208a memblock: Kill memblock_init()
memblock_init() initializes arrays for regions and memblock itself;
however, all these can be done with struct initializers and
memblock_init() can be removed.  This patch kills memblock_init() and
initializes memblock with struct initializer.

The only difference is that the first dummy entries don't have .nid
set to MAX_NUMNODES initially.  This doesn't cause any behavior
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:07 -08:00
Zhenzhong Duan
90d4f5534d xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.
PVHVM running with more than 32 vcpus and pv_irq/pv_time enabled
need VCPU placement to work, or else it will softlockup.

CC: stable@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-16 12:13:44 -05:00
Konrad Rzeszutek Wilk
5e28783013 xen/enlighten: Fix compile warnings and set cx to known value.
We get:
linux/arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’:
linux/arch/x86/xen/enlighten.c:226: warning: ‘cx’ may be used uninitialized in this function
linux/arch/x86/xen/enlighten.c:240: note: ‘cx’ was declared here

and the cx is really not set but passed in the xen_cpuid instruction
which masks the value with returned masked_ecx from cpuid. This
can potentially lead to invalid data being stored in cx.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-19 17:03:31 -04:00
Linus Torvalds
4762e252f4 Merge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/tracing: Fix tracing config option properly
  xen: Do not enable PV IPIs when vector callback not present
  xen/x86: replace order-based range checking of M2P table by linear one
  xen: xen-selfballoon.c needs more header files
2011-08-22 11:25:44 -07:00
Jan Beulich
ccbcdf7cf1 xen/x86: replace order-based range checking of M2P table by linear one
The order-based approach is not only less efficient (requiring a shift
and a compare, typical generated code looking like this

	mov	eax, [machine_to_phys_order]
	mov	ecx, eax
	shr	ebx, cl
	test	ebx, ebx
	jnz	...

whereas a direct check requires just a compare, like in

	cmp	ebx, [machine_to_phys_nr]
	jae	...

), but also slightly dangerous in the 32-on-64 case - the element
address calculation can wrap if the next power of two boundary is
sufficiently far away from the actual upper limit of the table, and
hence can result in user space addresses being accessed (with it being
unknown what may actually be mapped there).

Additionally, the elimination of the mistaken use of fls() here (should
have been __fls()) fixes a latent issue on x86-64 that would trigger
if the code was run on a system with memory extending beyond the 44-bit
boundary.

CC: stable@kernel.org
Signed-off-by: Jan Beulich <jbeulich@novell.com>
[v1: Based on Jeremy's feedback]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-08-17 10:26:48 -04:00
Linus Torvalds
06e727d2a5 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip:
  x86-64: Rework vsyscall emulation and add vsyscall= parameter
  x86-64: Wire up getcpu syscall
  x86: Remove unnecessary compile flag tweaks for vsyscall code
  x86-64: Add vsyscall:emulate_vsyscall trace event
  x86-64: Add user_64bit_mode paravirt op
  x86-64, xen: Enable the vvar mapping
  x86-64: Work around gold bug 13023
  x86-64: Move the "user" vsyscall segment out of the data segment.
  x86-64: Pad vDSO to a page boundary
2011-08-12 20:46:24 -07:00
Andy Lutomirski
318f5a2a67 x86-64: Add user_64bit_mode paravirt op
Three places in the kernel assume that the only long mode CPL 3
selector is __USER_CS.  This is not true on Xen -- Xen's sysretq
changes cs to the magic value 0xe033.

Two of the places are corner cases, but as of "x86-64: Improve
vsyscall emulation CS and RIP handling"
(c9712944b2), vsyscalls will segfault
if called with Xen's extra CS selector.  This causes a panic when
older init builds die.

It seems impossible to make Xen use __USER_CS reliably without
taking a performance hit on every system call, so this fixes the
tests instead with a new paravirt op.  It's a little ugly because
ptrace.h can't include paravirt.h.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-04 16:13:49 -07:00
Linus Torvalds
c61264f98c Merge branch 'upstream/xen-tracing2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/xen-tracing2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen/trace: use class for multicall trace
  xen/trace: convert mmu events to use DECLARE_EVENT_CLASS()/DEFINE_EVENT()
  xen/multicall: move *idx fields to start of mc_buffer
  xen/multicall: special-case singleton hypercalls
  xen/multicalls: add unlikely around slowpath in __xen_mc_entry()
  xen/multicalls: disable MC_DEBUG
  xen/mmu: tune pgtable alloc/release
  xen/mmu: use extend_args for more mmuext updates
  xen/trace: add tlb flush tracepoints
  xen/trace: add segment desc tracing
  xen/trace: add xen_pgd_(un)pin tracepoints
  xen/trace: add ptpage alloc/release tracepoints
  xen/trace: add mmu tracepoints
  xen/trace: add multicall tracing
  xen/trace: set up tracepoint skeleton
  xen/multicalls: remove debugfs stats
  trace/xen: add skeleton for Xen trace events
2011-07-24 09:06:47 -07:00
Jeremy Fitzhardinge
ab78f7ad2c xen/trace: add segment desc tracing
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2011-07-18 15:43:27 -07:00
Konrad Rzeszutek Wilk
f7fdd84e04 Merge branch 'stable/vga.support' into stable/drivers
* stable/vga.support:
  xen: allow enable use of VGA console on dom0
2011-06-21 09:25:41 -04:00
Tom Goetz
b2abe50688 xen: When calling power_off, don't call the halt function.
.. As it won't actually power off the machine.

Reported-by: Sven Köhler <sven.koehler@gmail.com>
Tested-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Tom Goetz <tom.goetz@virtualcomputer.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-15 16:48:29 -04:00
Jeremy Fitzhardinge
c2419b4a47 xen: allow enable use of VGA console on dom0
Get the information about the VGA console hardware from Xen, and put
it into the form the bootloader normally generates, so that the rest
of the kernel can deal with VGA as usual.

[ Impact: make VGA console work in dom0 ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Rebased on 2.6.39]
[v2: Removed incorrect comments and fixed compile warnings]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-06 11:46:00 -04:00
Daniel Kiper
ad3062a0f4 arch/x86/xen/enlighten: Cleanup code/data sections definitions
Cleanup code/data sections definitions
accordingly to include/linux/init.h.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12 17:19:34 -04:00
Shan Haitao
947ccf9c3c xen: Allow PV-OPS kernel to detect whether XSAVE is supported
Xen fails to mask XSAVE from the cpuid feature, despite not historically
supporting guest use of XSAVE.  However, now that XSAVE support has been
added to Xen, we need to reliably detect its presence.

The most reliable way to do this is to look at the OSXSAVE feature in
cpuid which is set iff the OS (Xen, in this case), has set
CR4.OSXSAVE.

[ Cleaned up conditional a bit. - Jeremy ]

Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-06 08:31:13 -04:00
Jeremy Fitzhardinge
61f4237d5b xen: just completely disable XSAVE
Some (old) versions of Xen just kill the domain if it tries to set any
unknown bits in CR4, so we can't reliably probe for OSXSAVE in
CR4.

Since Xen doesn't support XSAVE for guests at the moment, and no such
support is being worked on, there's no downside in just unconditionally
masking XSAVE support.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-06 08:31:00 -04:00
Linus Torvalds
76ca078328 Merge branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: suspend: remove xen_hvm_suspend
  xen: suspend: pull pre/post suspend hooks out into suspend_info
  xen: suspend: move arch specific pre/post suspend hooks into generic hooks
  xen: suspend: refactor non-arch specific pre/post suspend hooks
  xen: suspend: add "arch" to pre/post suspend hooks
  xen: suspend: pass extra hypercall argument via suspend_info struct
  xen: suspend: refactor cancellation flag into a structure
  xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding
  xen: switch to new schedop hypercall by default.
  xen: use new schedop interface for suspend
  xen: do not respond to unknown xenstore control requests
  xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled
  xen: PV on HVM: support PV spinlocks and IPIs
  xen: make the ballon driver work for hvm domains
  xen-blkfront: handle Xen major numbers other than XENVBD
  xen: do not use xen_info on HVM, set pv_info name to "Xen HVM"
  xen: no need to delay xen_setup_shutdown_event for hvm guests anymore
2011-03-15 10:59:09 -07:00
Stefano Stabellini
99bbb3a84a xen: PV on HVM: support PV spinlocks and IPIs
Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus
(that switch to APIC mode and initialize APIC routing); on secondary
CPUs on CPU_UP_PREPARE.

Enable the usage of event channels to send and receive IPIs when
running as a PV on HVM guest.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2011-02-25 16:43:06 +00:00
Stefano Stabellini
cff520b9c2 xen: do not use xen_info on HVM, set pv_info name to "Xen HVM"
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
2011-02-25 16:43:04 +00:00
Ian Campbell
44b46c3ef8 xen: annotate functions which only call into __init at start of day
Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be
called at resume time as well as at start of day but only reference
__init functions (extend_brk) at start of day. Hence annotate with
__ref.

    WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference
        from the function xen_hvm_init_shared_info() to the function
        .init.text:extend_brk()
    The function xen_hvm_init_shared_info() references
    the function __init extend_brk().
    This is often because xen_hvm_init_shared_info lacks a __init
    annotation or the annotation of extend_brk is wrong.

xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and
initialises shared_info_page with the result. This happens at start of
day only.

    WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference
        from the function xen_build_mfn_list_list() to the function
        .init.text:extend_brk()
    The function xen_build_mfn_list_list() references
    the function __init extend_brk().
    This is often because xen_build_mfn_list_list lacks a __init
    annotation or the annotation of extend_brk is wrong.

(this warning occurs multiple times)

xen_build_mfn_list_list only calls extend_brk() at boot time, while
building the initial mfn list list

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-02-11 14:46:34 -05:00
Tejun Heo
2ce802f62b lockdep: Move early boot local IRQ enable/disable status to init/main.c
During early boot, local IRQ is disabled until IRQ subsystem is
properly initialized.  During this time, no one should enable
local IRQ and some operations which usually are not allowed with
IRQ disabled, e.g. operations which might sleep or require
communications with other processors, are allowed.

lockdep tracked this with early_boot_irqs_off/on() callbacks.
As other subsystems need this information too, move it to
init/main.c and make it generally available.  While at it,
toggle the boolean to early_boot_irqs_disabled instead of
enabled so that it can be initialized with %false and %true
indicates the exceptional condition.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-20 13:32:33 +01:00
Linus Torvalds
9f99a2f0e4 Merge branch 'stable/bug-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/event: validate irq before get evtchn by irq
  xen/fb: fix potential memory leak
  xen/fb: fix xenfb suspend/resume race.
  xen: disable ACPI NUMA for PV guests
  xen/irq: Cleanup the find_unbound_irq
2011-01-10 08:48:46 -08:00
Linus Torvalds
8c8ae4e8cd Merge branch 'stable/generic' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/generic' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: HVM X2APIC support
  apic: Move hypervisor detection of x2apic to hypervisor.h
2011-01-10 08:48:29 -08:00
Ian Campbell
c1f5db1a53 xen: disable ACPI NUMA for PV guests
Xen does not currently expose PV-NUMA information to PV
guests. Therefore disable NUMA for the time being to prevent the
kernel picking up on an host-level NUMA information which it might
come across in the firmware.

[ Added comment - Jeremy ]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-10 10:46:30 -05:00
Sheng Yang
d9b8ca8474 xen: HVM X2APIC support
This patch is similiar to Gleb Natapov's patch for KVM, which enable the
hypervisor to emulate x2apic feature for the guest. By this way, the emulation
of lapic would be simpler with x2apic interface(MSR), and faster.

[v2: Re-organized 'xen_hvm_need_lapic' per Ian Campbell suggestion]

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-01-07 10:03:50 -05:00
Christoph Lameter
780f36d8b3 xen: Use this_cpu_ops
Use this_cpu_ops to reduce code size and simplify things in various places.

V3->V4:
	Move instance of this_cpu_inc_return to a later patchset so that
	this patch can be applied without infrastructure changes.

Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-17 15:07:19 +01:00
Linus Torvalds
8338fded13 Merge branches 'upstream/core' and 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: allocate irq descs on any NUMA node
  xen: prevent crashes with non-HIGHMEM 32-bit kernels with largeish memory
  xen: use default_idle
  xen: clean up "extra" memory handling some more

* 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: x86/32: perform initial startup on initial_page_table
  xen: don't bother to stop other cpus on shutdown/reboot
2010-12-03 10:08:52 -08:00
Ian Campbell
805e3f4950 xen: x86/32: perform initial startup on initial_page_table
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it.  This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.

This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-11-29 17:07:53 -08:00
Jeremy Fitzhardinge
31e323cca9 xen: don't bother to stop other cpus on shutdown/reboot
Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's
no need to do it manually.

In any case it will fail because all the IPI irqs have been pulled
down by this point, so the cross-CPU calls will simply hang forever.

Until change 76fac077db the function calls
were not synchronously waited for, so this wasn't apparent.  However after
that change the calls became synchronous leading to a hang on shutdown
on multi-VCPU guests.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Alok Kataria <akataria@vmware.com>
2010-11-29 14:16:53 -08:00
Linus Torvalds
8a3fbc9fdb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: remove duplicated #include
  xen: x86/32: perform initial startup on initial_page_table
2010-11-25 08:35:53 +09:00
Ian Campbell
5b5c1af104 xen: x86/32: perform initial startup on initial_page_table
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it.  This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.

This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-24 12:07:44 -05:00
Konrad Rzeszutek Wilk
ec35a69c46 xen: set IO permission early (before early_cpu_init())
This patch is based off "xen dom0: Set up basic IO permissions for dom0."
by Juan Quintela <quintela@redhat.com>.

On AMD machines when we boot the kernel as Domain 0 we get this nasty:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1-101116  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff8130271b>]
(XEN) RFLAGS: 0000000000000282   EM: 1   CONTEXT: pv guest
(XEN) rax: 000000008000c068   rbx: ffffffff8186c680   rcx: 0000000000000068
(XEN) rdx: 0000000000000cf8   rsi: 000000000000c000   rdi: 0000000000000000
(XEN) rbp: ffffffff81801e98   rsp: ffffffff81801e50   r8:  ffffffff81801eac
(XEN) r9:  ffffffff81801ea8   r10: ffffffff81801eb4   r11: 00000000ffffffff
(XEN) r12: ffffffff8186c694   r13: ffffffff81801f90   r14: ffffffffffffffff
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 0000000221803000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffffffff81801e50:

RIP points to read_pci_config() function.

The issue is that we don't set IO permissions for the Linux kernel early enough.

The call sequence used to be:

    xen_start_kernel()
	x86_init.oem.arch_setup = xen_setup_arch;
        setup_arch:
           - early_cpu_init
               - early_init_amd
                  - read_pci_config
           - x86_init.oem.arch_setup [ xen_arch_setup ]
               - set IO permissions.

We need to set the IO permissions earlier on, which this patch does.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-11-22 12:10:31 -08:00
Ian Campbell
7e77506a59 xen: implement XENMEM_machphys_mapping
This hypercall allows Xen to specify a non-default location for the
machine to physical mapping. This capability is used when running a 32
bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to
exactly the size required.

[ Impact: add Xen hypercall definitions ]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-11-12 15:00:06 -08:00
Linus Torvalds
18cb657ca1 Merge branch 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm

* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: register xen pci notifier
  xen: initialize cpu masks for pv guests in xen_smp_init
  xen: add a missing #include to arch/x86/pci/xen.c
  xen: mask the MTRR feature from the cpuid
  xen: make hvc_xen console work for dom0.
  xen: add the direct mapping area for ISA bus access
  xen: Initialize xenbus for dom0.
  xen: use vcpu_ops to setup cpu masks
  xen: map a dummy page for local apic and ioapic in xen_set_fixmap
  xen: remap MSIs into pirqs when running as initial domain
  xen: remap GSIs as pirqs when running as initial domain
  xen: introduce XEN_DOM0 as a silent option
  xen: map MSIs into pirqs
  xen: support GSI -> pirq remapping in PV on HVM guests
  xen: add xen hvm acpi_register_gsi variant
  acpi: use indirect call to register gsi in different modes
  xen: implement xen_hvm_register_pirq
  xen: get the maximum number of pirqs from xen
  xen: support pirq != irq

* 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits)
  X86/PCI: Remove the dependency on isapnp_disable.
  xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c
  MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright.
  x86: xen: Sanitse irq handling (part two)
  swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it.
  MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer.
  xen/pci: Request ACS when Xen-SWIOTLB is activated.
  xen-pcifront: Xen PCI frontend driver.
  xenbus: prevent warnings on unhandled enumeration values
  xenbus: Xen paravirtualised PCI hotplug support.
  xen/x86/PCI: Add support for the Xen PCI subsystem
  x86: Introduce x86_msi_ops
  msi: Introduce default_[teardown|setup]_msi_irqs with fallback.
  x86/PCI: Export pci_walk_bus function.
  x86/PCI: make sure _PAGE_IOMAP it set on pci mappings
  x86/PCI: Clean up pci_cache_line_size
  xen: fix shared irq device passthrough
  xen: Provide a variant of xen_poll_irq with timeout.
  xen: Find an unbound irq number in reverse order (high to low).
  xen: statically initialize cpu_evtchn_mask_p
  ...

Fix up trivial conflicts in drivers/pci/Makefile
2010-10-28 17:11:17 -07:00
Linus Torvalds
17bb51d56c Merge branch 'akpm-incoming-2'
* akpm-incoming-2: (139 commits)
  epoll: make epoll_wait() use the hrtimer range feature
  select: rename estimate_accuracy() to select_estimate_accuracy()
  Remove duplicate includes from many files
  ramoops: use the platform data structure instead of module params
  kernel/resource.c: handle reinsertion of an already-inserted resource
  kfifo: fix kfifo_alloc() to return a signed int value
  w1: don't allow arbitrary users to remove w1 devices
  alpha: remove dma64_addr_t usage
  mips: remove dma64_addr_t usage
  sparc: remove dma64_addr_t usage
  fuse: use release_pages()
  taskstats: use real microsecond granularity for CPU times
  taskstats: split fill_pid function
  taskstats: separate taskstats commands
  delayacct: align to 8 byte boundary on 64-bit systems
  delay-accounting: reimplement -c for getdelays.c to report information on a target command
  namespaces Kconfig: move namespace menu location after the cgroup
  namespaces Kconfig: remove the cgroup device whitelist experimental tag
  namespaces Kconfig: remove pointless cgroup dependency
  namespaces Kconfig: make namespace a submenu
  ...
2010-10-27 18:42:52 -07:00
Linus Torvalds
0671b7674f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  percpu: Remove the multi-page alignment facility
  x86-32: Allocate irq stacks seperate from percpu area
  x86-32, mm: Remove duplicated #include
  x86, printk: Get rid of <0> from stack output
  x86, kexec: Make sure to stop all CPUs before exiting the kernel
  x86/vsmp: Eliminate kconfig dependency warning
2010-10-27 18:38:55 -07:00
Zimny Lech
61d8e11e51 Remove duplicate includes from many files
Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:18 -07:00
Linus Torvalds
520045db94 Merge branches 'upstream/xenfs' and 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/xenfs' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen/privcmd: make privcmd visible in domU
  xen/privcmd: move remap_domain_mfn_range() to core xen code and export.
  privcmd: MMAPBATCH: Fix error handling/reporting
  xenbus: export xen_store_interface for xenfs
  xen/privcmd: make sure vma is ours before doing anything to it
  xen/privcmd: print SIGBUS faults
  xen/xenfs: set_page_dirty is supposed to return true if it dirties
  xen/privcmd: create address space to allow writable mmaps
  xen: add privcmd driver
  xen: add variable hypercall caller
  xen: add xen_set_domain_pte()
  xen: add /proc/xen/xsd_{kva,port} to xenfs

* 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (29 commits)
  xen: include xen/xen.h for definition of xen_initial_domain()
  xen: use host E820 map for dom0
  xen: correctly rebuild mfn list list after migration.
  xen: improvements to VIRQ_DEBUG output
  xen: set up IRQ before binding virq to evtchn
  xen: ensure that all event channels start off bound to VCPU 0
  xen/hvc: only notify if we actually sent something
  xen: don't add extra_pages for RAM after mem_end
  xen: add support for PAT
  xen: make sure xen_max_p2m_pfn is up to date
  xen: limit extra memory to a certain ratio of base
  xen: add extra pages for E820 RAM regions, even if beyond mem_end
  xen: make sure xen_extra_mem_start is beyond all non-RAM e820
  xen: implement "extra" memory to reserve space for pages not present at boot
  xen: Use host-provided E820 map
  xen: don't map missing memory
  xen: defer building p2m mfn structures until kernel is mapped
  xen: add return value to set_phys_to_machine()
  xen: convert p2m to a 3 level tree
  xen: make install_p2mtop_page() static
  ...

Fix up trivial conflict in arch/x86/xen/mmu.c, and fix the use of
'reserve_early()' - in the new memblock world order it is now
'memblock_x86_reserve_range()' instead. Pointed out by Jeremy.
2010-10-26 18:20:19 -07:00
Ingo Molnar
7d7a48b760 Merge branch 'linus' into x86/urgent
Merge reason: We want to queue up a dependent fix.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-25 19:38:52 +02:00
Stefano Stabellini
ff12849a7a xen: mask the MTRR feature from the cpuid
We don't want Linux to think that the cpu supports MTRRs when running
under Xen because MTRR operations could only be performed through
hypercalls.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:26:02 +01:00
Juan Quintela
4ec5387cc3 xen: add the direct mapping area for ISA bus access
add the direct mapping area for ISA bus access when running as initial
domain

Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:47 +01:00
Jeremy Fitzhardinge
41f2e4771a xen: add support for PAT
Convert Linux PAT entries into Xen ones when constructing ptes.  Linux
doesn't use _PAGE_PAT for ptes, so the only difference in the first 4
entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default)
use it for WT.

xen_pte_val does the inverse conversion.

We hard-code assumptions about Linux's current PAT layout, but a
warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems.
If necessary we could go to a more general table-based conversion between
Linux and Xen PAT entries.

hugetlbfs poses a problem at the moment, the x86 architecture uses the
same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending
on which pagetable level we're using.  At the moment this should be OK
so long as nobody tries to do a pte_val on a hugetlbfs pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:31 -07:00
Jeremy Fitzhardinge
33a847502b xen: defer building p2m mfn structures until kernel is mapped
When building mfn parts of p2m structure, we rely on being able to
use mfn_to_virt, which in turn requires kernel to be mapped into
the linear area (which is distinct from the kernel image mapping
on 64-bit).  Defer calling xen_build_mfn_list_list() until after
xen_setup_kernel_pagetable();

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge
1e17fc7eff xen: remove noise about registering vcpu info
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:21 -07:00
Alok Kataria
76fac077db x86, kexec: Make sure to stop all CPUs before exiting the kernel
x86 smp_ops now has a new op, stop_other_cpus which takes a parameter
"wait" this allows the caller to specify if it wants to stop until all
the cpus have processed the stop IPI.  This is required specifically
for the kexec case where we should wait for all the cpus to be stopped
before starting the new kernel.  We now wait for the cpus to stop in
all cases except for panic/kdump where we expect things to be broken
and we are doing our best to make things work anyway.

This patch fixes a legitimate regression, which was introduced during
2.6.30, by commit id 4ef702c10b.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: <stable@kernel.org> v2.6.30-36
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-21 13:30:44 -07:00
Alex Nixon
b5401a96b5 xen/x86/PCI: Add support for the Xen PCI subsystem
The frontend stub lives in arch/x86/pci/xen.c, alongside other
sub-arch PCI init code (e.g. olpc.c).

It provides a mechanism for Xen PCI frontend to setup/destroy
legacy interrupts, MSI/MSI-X, and PCI configuration operations.

[ Impact: add core of Xen PCI support ]
[ v2: Removed the IOMMU code and only focusing on PCI.]
[ v3: removed usage of pci_scan_all_fns as that does not exist]
[ v4: introduced pci_xen value to fix compile warnings]
[ v5: squished fixes+features in one patch, changed Reviewed-by to Ccs]
[ v7: added Acked-by]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Qing He <qing.he@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
2010-10-18 10:49:35 -04:00
Jeremy Fitzhardinge
236260b90d memblock: Allow memblock_init to be called early
The Xen setup code needs to call memblock_x86_reserve_range() very early,
so allow it to initialize the memblock subsystem before doing so.  The
second memblock_init() is ignored.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <4CACFDAD.3090900@goop.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 15:59:01 -07:00
Linus Torvalds
26f0cf9181 Merge branch 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  x86: Detect whether we should use Xen SWIOTLB.
  pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
  swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
  xen/mmu: inhibit vmap aliases rather than trying to clear them out
  vmap: add flag to allow lazy unmap to be disabled at runtime
  xen: Add xen_create_contiguous_region
  xen: Rename the balloon lock
  xen: Allow unprivileged Xen domains to create iomap pages
  xen: use _PAGE_IOMAP in ioremap to do machine mappings

Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
include/xen/xen-ops.h
2010-08-12 09:09:41 -07:00
Jeremy Fitzhardinge
ca50a5f390 Merge branch 'upstream/pvhvm' into upstream/xen
* upstream/pvhvm:
  Introduce CONFIG_XEN_PVHVM compile option
  blkfront: do not create a PV cdrom device if xen_hvm_guest
  support multiple .discard.* sections to avoid section type conflicts
  xen/pvhvm: fix build problem when !CONFIG_XEN
  xenfs: enable for HVM domains too
  x86: Call HVMOP_pagetable_dying on exit_mmap.
  x86: Unplug emulated disks and nics.
  x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
  xen: Fix find_unbound_irq in presence of ioapic irqs.
  xen: Add suspend/resume support for PV on HVM guests.
  xen: Xen PCI platform device driver.
  x86/xen: event channels delivery on HVM.
  x86: early PV on HVM features initialization.
  xen: Add support for HVM hypercalls.

Conflicts:
	arch/x86/xen/enlighten.c
	arch/x86/xen/time.c
2010-08-04 14:49:16 -07:00
Ian Campbell
086748e52f xen/panic: use xen_reboot and fix smp_send_stop
Offline vcpu when using stop_self.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:31 -07:00
Donald Dutile
f09f6d194d Xen: register panic notifier to take crashes of xen guests on panic
Register a panic notifier so that when the guest crashes it can shut
down the domain and indicate it was a crash to the host.

Signed-off-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:31 -07:00
Mukesh Rathor
c06ee78d73 xen: support large numbers of CPUs with vcpu info placement
When vcpu info placement is supported, we're not limited to MAX_VIRT_CPUS
vcpus.  However, if it isn't supported, then ignore any excess vcpus.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:30 -07:00
Jeremy Fitzhardinge
8a22b9996b xen: drop xen_sched_clock in favour of using plain wallclock time
xen_sched_clock only counts unstolen time.  In principle this should
be useful to the Linux scheduler so that it knows how much time a process
actually consumed.  But in practice this doesn't work very well as the
scheduler expects the sched_clock time to be synchronized between
cpus.  It also uses sched_clock to measure the time a task spends
sleeping, in which case "unstolen time" isn't meaningful.

So just use plain xen_clocksource_read to return wallclock nanoseconds
for sched_clock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:29 -07:00
Stefano Stabellini
ca65f9fc0c Introduce CONFIG_XEN_PVHVM compile option
This patch introduce a CONFIG_XEN_PVHVM compile time option to
enable/disable Xen PV on HVM support.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-07-29 11:11:33 -07:00
Stefano Stabellini
5915100106 x86: Call HVMOP_pagetable_dying on exit_mmap.
When a pagetable is about to be destroyed, we notify Xen so that the
hypervisor can clear the related shadow pagetable.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-26 23:13:26 -07:00
Stefano Stabellini
c1c5413ad5 x86: Unplug emulated disks and nics.
Add a xen_emul_unplug command line option to the kernel to unplug
xen emulated disks and nics.

Set the default value of xen_emul_unplug depending on whether or
not the Xen PV frontends and the Xen platform PCI driver have
been compiled for this kernel (modules or built-in are both OK).

The user can specify xen_emul_unplug=ignore to enable PV drivers on HVM
even if the host platform doesn't support unplug.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-26 23:13:25 -07:00
Stefano Stabellini
409771d258 x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
Use xen_vcpuop_clockevent instead of hpet and APIC timers as main
clockevent device on all vcpus, use the xen wallclock time as wallclock
instead of rtc and use xen_clocksource as clocksource.
The pv clock algorithm needs to work correctly for the xen_clocksource
and xen wallclock to be usable, only modern Xen versions offer a
reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock).

Using the hpet as clocksource means a VMEXIT every time we read/write to
the hpet mmio addresses, pvclock give us a better rating without
VMEXITs. Same goes for the xen wallclock and xen_vcpuop_clockevent

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-26 23:13:25 -07:00
Stefano Stabellini
016b6f5fe8 xen: Add suspend/resume support for PV on HVM guests.
Suspend/resume requires few different things on HVM: the suspend
hypercall is different; we don't need to save/restore memory related
settings; except the shared info page and the callback mechanism.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:46:21 -07:00
Sheng Yang
38e20b07ef x86/xen: event channels delivery on HVM.
Set the callback to receive evtchns from Xen, using the
callback vector delivery mechanism.

The traditional way for receiving event channel notifications from Xen
is via the interrupts from the platform PCI device.
The callback vector is a newer alternative that allow us to receive
notifications on any vcpu and doesn't need any PCI support: we allocate
a vector exclusively to receive events, in the vector handler we don't
need to interact with the vlapic, therefore we avoid a VMEXIT.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:45:59 -07:00
Sheng Yang
bee6ab53e6 x86: early PV on HVM features initialization.
Initialize basic pv on hvm features adding a new Xen HVM specific
hypervisor_x86 structure.

Don't try to initialize xen-kbdfront and xen-fbfront when running on HVM
because the backends are not available.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-22 16:45:35 -07:00
Miroslav Rezanina
093d7b4639 xen: release unused free memory
Scan an e820 table and release any memory which lies between e820 entries,
as it won't be used and would just be wasted.  At present this is just to
release any memory beyond the end of the e820 map, but it will also deal
with holes being punched in the map.

Derived from patch by Miroslav Rezanina <mrezanin@redhat.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-07-20 12:51:22 -07:00
Alex Nixon
7347b4082e xen: Allow unprivileged Xen domains to create iomap pages
PV DomU domains are allowed to map hardware MFNs for PCI passthrough,
but are not generally allowed to map raw machine pages.  In particular,
various pieces of code try to map DMI and ACPI tables in the ISA ROM
range.  We disallow _PAGE_IOMAP for those mappings, so that they are
redirected to a set of local zeroed pages we reserve for that purpose.

[ Impact: prevent passthrough of ISA space, as we only allow PCI ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-06-07 14:33:13 -04:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Ian Campbell
817a824b75 x86, xen: Disable highmem PTE allocation even when CONFIG_HIGHPTE=y
There's a path in the pagefault code where the kernel deliberately
breaks its own locking rules by kmapping a high pte page without
holding the pagetable lock (in at least page_check_address). This
breaks Xen's ability to track the pinned/unpinned state of the
page. There does not appear to be a viable workaround for this
behaviour so simply disable HIGHPTE for all Xen guests.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-1-git-send-email-ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pasi Kärkkäinen <pasik@iki.fi>
Cc: <stable@kernel.org> # .32.x: 14315592: Allow highmem user page tables to be disabled at boot time
Cc: <stable@kernel.org> # .32.x
Cc: <xen-devel@lists.xensource.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-27 14:41:01 -08:00
Ian Campbell
e68266b700 x86: xen: 64-bit kernel RPL should be 0
Under Xen 64 bit guests actually run their kernel in ring 3,
however the hypervisor takes care of squashing descriptor the
RPLs transparently (in order to allow them to continue to
differentiate between user and kernel space CS using the RPL).
Therefore the Xen paravirt backend should use RPL==0 instead of
1 (or 3). Using RPL==1 causes generic arch code to take
incorrect code paths because it uses "testl $3, <CS>, je foo"
type tests for a userspace CS and this considers 1==userspace.

This issue was previously masked because get_kernel_rpl() was
omitted when setting CS in kernel_thread(). This was fixed when
kernel_thread() was unified with 32 bit in
f443ff4201.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1263377768-19600-2-git-send-email-ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 11:23:54 +01:00
Linus Torvalds
11bd04f6f3 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (109 commits)
  PCI: fix coding style issue in pci_save_state()
  PCI: add pci_request_acs
  PCI: fix BUG_ON triggered by logical PCIe root port removal
  PCI: remove ifdefed pci_cleanup_aer_correct_error_status
  PCI: unconditionally clear AER uncorr status register during cleanup
  x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource
  PCI: portdrv: remove redundant definitions
  PCI: portdrv: remove unnecessary struct pcie_port_data
  PCI: portdrv: minor cleanup for pcie_port_device_register
  PCI: portdrv: add missing irq cleanup
  PCI: portdrv: enable device before irq initialization
  PCI: portdrv: cleanup service irqs initialization
  PCI: portdrv: check capabilities first
  PCI: portdrv: move PME capability check
  PCI: portdrv: remove redundant pcie type calculation
  PCI: portdrv: cleanup pcie_device registration
  PCI: portdrv: remove redundant pcie_port_device_probe
  PCI: Always set prefetchable base/limit upper32 registers
  PCI: read-modify-write the pcie device control register when initiating pcie flr
  PCI: show dma_mask bits in /sys
  ...

Fixed up conflicts in:
	arch/x86/kernel/amd_iommu_init.c
	drivers/pci/dmar.c
	drivers/pci/hotplug/acpiphp_glue.c
2009-12-11 12:18:16 -08:00
Linus Torvalds
ab1831b0b8 Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: try harder to balloon up under memory pressure.
  Xen balloon: fix totalram_pages counting.
  xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
  xen: improve error handling in do_suspend.
  xen: don't leak IRQs over suspend/resume.
  xen: call clock resume notifier on all CPUs
  xen: use iret for return from 64b kernel to 32b usermode
  xen: don't call dpm_resume_noirq() with interrupts disabled.
  xen: register runstate info for boot CPU early
  xen: register runstate on secondary CPUs
  xen: register timer interrupt with IRQF_TIMER
  xen: correctly restore pfn_to_mfn_list_list after resume
  xen: restore runstate_info even if !have_vcpu_info_placement
  xen: re-register runstate area earlier on resume.
  xen: wait up to 5 minutes for device connetion
  xen: improvement to wait_for_devices()
  xen: fix is_disconnected_device/exists_disconnected_device
  xen/xenbus: make DEVICE_ATTR()s static
2009-12-10 09:35:02 -08:00
Linus Torvalds
e33c019722 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
  x86, mm: Correct the implementation of is_untracked_pat_range()
  x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
  x86, mtrr: Fix sorting of mtrr after subtracting
  x86: Move find_smp_config() earlier and avoid bootmem usage
  x86, platform: Change is_untracked_pat_range() to bool; cleanup init
  x86: Change is_ISA_range() into an inline function
  x86, mm: is_untracked_pat_range() takes a normal semiclosed range
  x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
  x86: UV SGI: Don't track GRU space in PAT
  x86: SGI UV: Fix BAU initialization
  x86, numa: Use near(er) online node instead of roundrobin for NUMA
  x86, numa, bootmem: Only free bootmem on NUMA failure path
  x86: Change crash kernel to reserve via reserve_early()
  x86: Eliminate redundant/contradicting cache line size config options
  x86: When cleaning MTRRs, do not fold WP into UC
  x86: remove "extern" from function prototypes in <asm/proto.h>
  x86, mm: Report state of NX protections during boot
  x86, mm: Clean up and simplify NX enablement
  x86, pageattr: Make set_memory_(x|nx) aware of NX support
  x86, sleep: Always save the value of EFER
  ...

Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range)
to 'struct x86_platform_ops') in
	arch/x86/include/asm/x86_init.h
	arch/x86/kernel/x86_init.c
2009-12-08 13:27:33 -08:00
Chris Wright
5d990b6275 PCI: add pci_request_acs
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.

Add a function to pci core to allow an IOMMU to request that ACS
be enabled.  The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order;  iommu has only been detected not initialized.

Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.

Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-12-04 16:19:24 -08:00
Jeremy Fitzhardinge
499d19b82b xen: register runstate info for boot CPU early
printk timestamping uses sched_clock, which in turn relies on runstate
info under Xen.  So make sure we set it up before any printks can
be called.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:53 -08:00
Jeremy Fitzhardinge
3905bb2aa7 xen: restore runstate_info even if !have_vcpu_info_placement
Even if have_vcpu_info_placement is not set, we still need to set up
the runstate area on each resumed vcpu.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:51 -08:00
Ian Campbell
be012920ec xen: re-register runstate area earlier on resume.
This is necessary to ensure the runstate area is available to
xen_sched_clock before any calls to printk which will require it in
order to provide a timestamp.

I chose to pull the xen_setup_runstate_info out of xen_time_init into
the caller in order to maintain parity with calling
xen_setup_runstate_info separately from calling xen_time_resume.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:50 -08:00
H. Peter Anvin
4763ed4d45 x86, mm: Clean up and simplify NX enablement
The 32- and 64-bit code used very different mechanisms for enabling
NX, but even the 32-bit code was enabling NX in head_32.S if it is
available.  Furthermore, we had a bewildering collection of tests for
the available of NX.

This patch:

a) merges the 32-bit set_nx() and the 64-bit check_efer() function
   into a single x86_configure_nx() function.  EFER control is left
   to the head code.

b) eliminates the nx_enabled variable entirely.  Things that need to
   test for NX enablement can verify __supported_pte_mask directly,
   and cpu_has_nx gives the supported status of NX.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:59 -08:00
Jeremy Fitzhardinge
1ccbf5344c xen: move Xen-testing predicates to common header
Move xen_domain and related tests out of asm-x86 to xen/xen.h so they
can be included whenever they are necessary.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:24 -08:00
Jeremy Fitzhardinge
82d6469916 xen: mask extended topology info in cpuid
A Xen guest never needs to know about extended topology, and knowing
would just confuse it.

This patch just zeros ebx in leaf 0xb which indicates no topology info,
preventing a crash under Xen on cpus which support this leaf.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-11-03 11:09:12 -08:00
Jeremy Fitzhardinge
973df35ed9 xen: set up mmu_ops before trying to set any ptes
xen_setup_stackprotector() ends up trying to set page protections,
so we need to have vm_mmu_ops set up before trying to do so.
Failing to do so causes an early boot crash.

[ Impact: Fix early crash under Xen. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-10-27 16:54:19 -07:00
Ingo Molnar
14c93e8eba Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent 2009-09-23 14:35:10 +02:00
Jeremy Fitzhardinge
b75fe4e5b8 xen: check EFER for NX before setting up GDT mapping
x86-64 assumes NX is available by default, so we need to
explicitly check for it before using NX.  Some first-generation
Intel x86-64 processors didn't support NX, and even recent systems
allow it to be disabled in BIOS.

[ Impact: prevent Xen crash on NX-less 64-bit machines ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-09-21 13:49:43 -07:00
Linus Torvalds
78f28b7c55 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits)
  x86: Move get/set_wallclock to x86_platform_ops
  x86: platform: Fix section annotations
  x86: apic namespace cleanup
  x86: Distangle ioapic and i8259
  x86: Add Moorestown early detection
  x86: Add hardware_subarch ID for Moorestown
  x86: Add early platform detection
  x86: Move tsc_init to late_time_init
  x86: Move tsc_calibration to x86_init_ops
  x86: Replace the now identical time_32/64.c by time.c
  x86: time_32/64.c unify profile_pc
  x86: Move calibrate_cpu to tsc.c
  x86: Make timer setup and global variables the same in time_32/64.c
  x86: Remove mca bus ifdef from timer interrupt
  x86: Simplify timer_ack magic in time_32.c
  x86: Prepare unification of time_32/64.c
  x86: Remove do_timer hook
  x86: Add timer_init to x86_init_ops
  x86: Move percpu clockevents setup to x86_init_ops
  x86: Move xen_post_allocator_init into xen_pagetable_setup_done
  ...

Fix up conflicts in arch/x86/include/asm/io_apic.h
2009-09-18 14:05:47 -07:00
Feng Tang
7bd867dfb4 x86: Move get/set_wallclock to x86_platform_ops
get/set_wallclock() have already a set of platform dependent
implementations (default, EFI, paravirt). MRST will add another
variant.

Moving them to platform ops simplifies the existing code and minimizes
the effort to integrate new variants.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-09-16 14:34:50 +02:00
Linus Torvalds
b8cb48aae1 Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: split __phys_addr out into separate file
  xen: use stronger barrier after unlocking lock
  xen: only enable interrupts while actually blocking for spinlock
  xen: make -fstack-protector work under Xen
2009-09-14 10:23:49 -07:00
Linus Torvalds
c7208de304 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
  x86: Fix code patching for paravirt-alternatives on 486
  x86, msr: change msr-reg.o to obj-y, and export its symbols
  x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus
  x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
  x86, mcheck: Use correct cpumask for shared bank4
  x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors
  x86: Fix CPU llc_shared_map information for AMD Magny-Cours
  x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too
  x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h
  x86, msr: fix msr-reg.S compilation with gas 2.16.1
  x86, msr: Export the register-setting MSR functions via /dev/*/msr
  x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs()
  x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
  x86, msr: CFI annotations, cleanups for msr-reg.S
  x86, asm: Make _ASM_EXTABLE() usable from assembly code
  x86, asm: Add 32-bit versions of the combined CFI macros
  x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit
  x86, msr: Rewrite AMD rd/wrmsr variants
  x86, msr: Add rd/wrmsr interfaces with preset registers
  x86: add specific support for Intel Atom architecture
  ...
2009-09-14 07:57:32 -07:00
Jeremy Fitzhardinge
577eebeae3 xen: make -fstack-protector work under Xen
-fstack-protector uses a special per-cpu "stack canary" value.
gcc generates special code in each function to test the canary to make
sure that the function's stack hasn't been overrun.

On x86-64, this is simply an offset of %gs, which is the usual per-cpu
base segment register, so setting it up simply requires loading %gs's
base as normal.

On i386, the stack protector segment is %gs (rather than the usual kernel
percpu %fs segment register).  This requires setting up the full kernel
GDT and then loading %gs accordingly.  We also need to make sure %gs is
initialized when bringing up secondary cpus too.

To keep things consistent, we do the full GDT/segment register setup on
both architectures.

Because we need to avoid -fstack-protected code before setting up the GDT
and because there's no way to disable it on a per-function basis, several
files need to have stack-protector inhibited.

[ Impact: allow Xen booting with stack-protector enabled ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-09-09 16:37:39 -07:00
H. Peter Anvin
0cc0213e73 x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
For some reason, the _safe MSR functions returned -EFAULT, not -EIO.
However, the only user which cares about the return code as anything
other than a boolean is the MSR driver, which wants -EIO.  Change it
to -EIO across the board.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-08-31 15:15:23 -07:00
Thomas Gleixner
2d826404f0 x86: Move tsc_calibration to x86_init_ops
TSC calibration is modified by the vmware hypervisor and paravirt by
separate means. Moorestown wants to add its own calibration routine as
well. So make calibrate_tsc a proper x86_init_ops function and
override it by paravirt or by the early setup of the vmware
hypervisor.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:47 +02:00
Thomas Gleixner
845b3944bb x86: Add timer_init to x86_init_ops
The timer init code is convoluted with several quirks and the paravirt
timer chooser. Figuring out which code path is actually taken is not
for the faint hearted.

Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and
replace the paravirt time chooser and the remaining x86 quirk with a
simple x86_init_ops function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
736decac64 x86: Move percpu clockevents setup to x86_init_ops
paravirt overrides the setup of the default apic timers as per cpu
timers. Moorestown needs to override that as well.

Move it to x86_init_ops setup and create a separate x86_cpuinit struct
which holds the function for the secondary evtl. hotplugabble CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:46 +02:00
Thomas Gleixner
f1d7062a23 x86: Move xen_post_allocator_init into xen_pagetable_setup_done
We really do not need two paravirt/x86_init_ops functions which are
called in two consecutive source lines. Move the only user of
post_allocator_init into the already existing pagetable_setup_done
function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
030cb6c00d x86: Move paravirt pagetable_setup to x86_init_ops
Replace more paravirt hackery by proper x86_init_ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
6f30c1ac3f x86: Move paravirt banner printout to x86_init_ops
Replace another obscure paravirt magic and move it to
x86_init_ops. Such a hook is also useful for embedded and special
hardware.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
42bbdb43b1 x86: Replace ARCH_SETUP by a proper x86_init_ops
ARCH_SETUP is a horrible leftover from the old arch/i386 mach support
code. It still has a lonely user in xen. Move it to x86_init_ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31 09:35:45 +02:00
Thomas Gleixner
6b18ae3e2f x86: Move memory_setup to x86_init_ops
memory_setup is overridden by x86_quirks and by paravirts with weak
functions and quirks. Unify the whole mess and make it an
unconditional x86_init_ops function which defaults to the standard
function and can be overridden by the early platform code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27 17:12:52 +02:00
H. Peter Anvin
7adb4df410 x86, xen: Initialize cx to suppress warning
Initialize cx before calling xen_cpuid(), in order to suppress the
"may be used uninitialized in this function" warning.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
2009-08-25 21:10:32 -07:00
Jeremy Fitzhardinge
d560bc6157 x86, xen: Suppress WP test on Xen
Xen always runs on CPUs which properly support WP enforcement in
privileged mode, so there's no need to test for it.

This also works around a crash reported by Arnd Hannemann, though I
think its just a band-aid for that case.

Reported-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-25 21:10:32 -07:00
Ingo Molnar
cbcb340cb6 Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent 2009-08-20 12:05:24 +02:00
Jeremy Fitzhardinge
ce2eef33d3 xen: rearrange things to fix stackprotector
Make sure the stack-protector segment registers are properly set up
before calling any functions which may have stack-protection compiled
into them.

[ Impact: prevent Xen early-boot crash when stack-protector is enabled ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-08-19 17:09:28 -07:00