2007-07-19 02:49:22 -06:00
#include <linux/linkage.h>
#include <linux/lguest.h>
#include <asm/lguest_hcall.h>
#include <asm/asm-offsets.h>
#include <asm/thread_info.h>
#include <asm/processor-flags.h>
lguest: populate initial_page_table
Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.
In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables. With
the first patch, the direct mapping tables used that memory:
Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000
I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...
2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.
This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir. So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.
For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org
2010-12-16 16:03:15 -07:00
#include <asm/pgtable.h>

/*G:020
 * Our story starts with the kernel booting into startup_32 in
 * arch/x86/kernel/head_32.S. It expects a boot header, which is created by
 * the bootloader (the Launcher in our case).
 *
 * The startup_32 function does very little: it clears the uninitialized global
 * C variables which we expect to be zero (ie. BSS) and then copies the boot
 * header and kernel command line somewhere safe. Finally it checks the
 * 'hardware_subarch' field. This was introduced in 2.6.24 for lguest and Xen:
 * if it's set to '1' (lguest's assigned number), then it calls us here.
 *
 * WARNING: be very careful here! We're running at addresses equal to physical
 * addresses (around 0), not above PAGE_OFFSET as most code expects
 * (eg. 0xC0000000). Jumps are relative, so they're OK, but we can't touch any
 * data without remembering to subtract __PAGE_OFFSET!
 *
 * The .section line puts this code in .init.text so it will be discarded after
 * boot.
 */
.section .init.text, "ax", @progbits
ENTRY(lguest_entry)
	/*
	 * We make the "initialization" hypercall now to tell the Host about
	 * us, and also find out where it put our page tables.
	 */
	movl $LHCALL_LGUEST_INIT, %eax
	movl $lguest_data - __PAGE_OFFSET, %ebx
	int $LGUEST_TRAP_ENTRY

	/* Set up the initial stack so we can run C code. */
	movl $(init_thread_union+THREAD_SIZE),%esp

	call init_pagetables

	/* Jumps are relative: we're running __PAGE_OFFSET too low. */
	jmp lguest_init+__PAGE_OFFSET
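
The address arithmetic behind "subtract __PAGE_OFFSET" and behind the final jump can be modelled in a few lines of C. This is only a sketch: PAGE_OFFSET is assumed to be 0xC0000000 (the comment above gives that value merely as an example), and the helper names are illustrative, not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the source only mentions 0xC0000000 as an example. */
#define PAGE_OFFSET 0xC0000000u

/* Link-time (virtual) address -> the address we are really running at. */
static uint32_t pa(uint32_t vaddr)
{
	assert(vaddr >= PAGE_OFFSET);	/* only kernel addresses make sense here */
	return vaddr - PAGE_OFFSET;
}

/*
 * Where a relative jump lands.  The displacement is fixed at link time
 * (link_target - link_ip) but is added to the IP the CPU really has.
 */
static uint32_t jump_lands_at(uint32_t link_ip, uint32_t run_ip,
			      uint32_t link_target)
{
	uint32_t disp = link_target - link_ip;	/* wraps modulo 2^32 */

	return run_ip + disp;
}
```

With run_ip equal to pa(link_ip), a jump to a plain symbol lands PAGE_OFFSET too low, while a jump to symbol + PAGE_OFFSET lands on the symbol's virtual address, which is the effect the last instruction relies on.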

/*
 * Initialize page tables. This creates a PDE and a set of page
 * tables, which are located immediately beyond __brk_base. The variable
 * _brk_end is set up to point to the first "safe" location.
 * Mappings are created both at virtual address 0 (identity mapping)
 * and PAGE_OFFSET for up to _end.
 *
 * FIXME: This code is taken verbatim from arch/x86/kernel/head_32.S: they
 * don't have a stack at this point, so we can't just use call and ret.
 */
init_pagetables:
#if PTRS_PER_PMD > 1
#define PAGE_TABLE_SIZE(pages) (((pages) / PTRS_PER_PMD) + PTRS_PER_PGD)
#else
#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
#endif
#define pa(X) ((X) - __PAGE_OFFSET)

/* Enough space to fit pagetables for the low memory linear map */
MAPPING_BEYOND_END = \
	PAGE_TABLE_SIZE(((1<<32) - __PAGE_OFFSET) >> PAGE_SHIFT) << PAGE_SHIFT
#ifdef CONFIG_X86_PAE

	/*
	 * In PAE mode initial_page_table is statically defined to contain
	 * enough entries to cover the VMSPLIT option (that is the top 1, 2 or 3
	 * entries). The identity mapping is handled by pointing two PGD entries
	 * to the first kernel PMD.
	 *
	 * Note the upper half of each PMD or PTE is always zero at this stage.
	 */

#define KPMDS (((-__PAGE_OFFSET) >> 30) & 3) /* Number of kernel PMDs */

	xorl %ebx,%ebx	/* %ebx is kept at zero */

	movl $pa(__brk_base), %edi
	movl $pa(initial_pg_pmd), %edx
	movl $PTE_IDENT_ATTR, %eax
10:
	leal PDE_IDENT_ATTR(%edi),%ecx	/* Create PMD entry */
	movl %ecx,(%edx)	/* Store PMD entry */
	/* Upper half already zero */
	addl $8,%edx
	movl $512,%ecx
11:
	stosl
	xchgl %eax,%ebx
	stosl
	xchgl %eax,%ebx
	addl $0x1000,%eax
	loop 11b

	/*
	 * End condition: we must map up to the end + MAPPING_BEYOND_END.
	 */
	movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %ebp
	cmpl %ebp,%eax
	jb 10b
1:
	addl $__PAGE_OFFSET, %edi
	movl %edi, pa(_brk_end)
	shrl $12, %eax
	movl %eax, pa(max_pfn_mapped)

	/* Do early initialization of the fixmap area */
	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
#else	/* Not PAE */

page_pde_offset = (__PAGE_OFFSET >> 20);

	movl $pa(__brk_base), %edi
	movl $pa(initial_page_table), %edx
	movl $PTE_IDENT_ATTR, %eax
10:
	leal PDE_IDENT_ATTR(%edi),%ecx	/* Create PDE entry */
	movl %ecx,(%edx)	/* Store identity PDE entry */
	movl %ecx,page_pde_offset(%edx)	/* Store kernel PDE entry */
	addl $4,%edx
	movl $1024, %ecx
11:
	stosl
	addl $0x1000,%eax
	loop 11b
	/*
	 * End condition: we must map up to the end + MAPPING_BEYOND_END.
	 */
	movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %ebp
	cmpl %ebp,%eax
	jb 10b
	addl $__PAGE_OFFSET, %edi
	movl %edi, pa(_brk_end)
	shrl $12, %eax
	movl %eax, pa(max_pfn_mapped)

	/* Do early initialization of the fixmap area */
	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
	movl %eax,pa(initial_page_table+0xffc)
#endif
	ret
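
To get a feel for the numbers in init_pagetables, the MAPPING_BEYOND_END formula and the non-PAE outer loop can be replayed in C. This is a rough model under stated assumptions, not the kernel's code: it assumes a 3G/1G split (PAGE_OFFSET 0xC0000000), 4 KiB pages, 1024 entries per non-PAE table, 512 PMD entries plus 4 PGD entries for PAE, and a PTE_IDENT_ATTR of 0x003. All function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_OFFSET	0xC0000000u	/* assumed 3G/1G split */
#define PTE_IDENT_ATTR	0x003u		/* assumed attribute bits */

/* Mirrors PAGE_TABLE_SIZE(): number of page-table pages for 'pages' pages. */
static uint32_t page_table_size(uint32_t pages, int pae)
{
	return pae ? pages / 512 + 4 : pages / 1024;
}

/* Mirrors MAPPING_BEYOND_END: bytes of page tables needed to map lowmem. */
static uint32_t mapping_beyond_end(int pae)
{
	/* (1<<32) - PAGE_OFFSET, written so that it fits in 32 bits */
	uint32_t lowmem_pages = (0u - PAGE_OFFSET) >> PAGE_SHIFT;

	return page_table_size(lowmem_pages, pae) << PAGE_SHIFT;
}

/* How many page tables the non-PAE loop writes for a kernel ending at pa_end. */
static unsigned int count_page_tables(uint32_t pa_end)
{
	uint32_t eax = PTE_IDENT_ATTR;	/* next PTE value: frame | attributes */
	uint32_t limit = pa_end + mapping_beyond_end(0) + PTE_IDENT_ATTR;
	unsigned int tables = 0;

	assert(pa_end < 0x30000000u);	/* keep the model well inside lowmem */
	do {
		eax += 1024u << PAGE_SHIFT;	/* inner loop: 1024 PTEs, 4 KiB each */
		tables++;
	} while (eax < limit);		/* "cmpl %ebp,%eax; jb 10b" */

	return tables;
}
```

Under these assumptions the non-PAE reserve is 1 MiB and the PAE reserve is 516 pages. Each pass of the outer loop maps a further 4 MiB, so mapping always ends on a 4 MiB boundary at or beyond the requested limit.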

/*G:055
 * We create a macro which puts the assembler code between lgstart_ and lgend_
 * markers. These templates are put in the .text section: they can't be
 * discarded after boot as we may need to patch modules, too.
 */
.text
#define LGUEST_PATCH(name, insns...)			\
	lgstart_##name:	insns; lgend_##name:;		\
	.globl lgstart_##name; .globl lgend_##name

LGUEST_PATCH(cli, movl $0, lguest_data+LGUEST_DATA_irq_enabled)
LGUEST_PATCH(pushf, movl lguest_data+LGUEST_DATA_irq_enabled, %eax)
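
The code that consumes these templates is not in this file. As a hedged illustration of why each template needs both a start and an end marker, a patcher could copy the bytes between them over a callsite whenever they fit. The struct and function below are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one lgstart_name / lgend_name pair. */
struct insn_template {
	const unsigned char *start;
	const unsigned char *end;
};

/*
 * Copy the template over a callsite if it fits.  Returns the number of
 * bytes written, or 0 if the callsite is too small and must be left alone.
 */
static size_t patch_callsite(unsigned char *site, size_t site_len,
			     const struct insn_template *t)
{
	size_t insn_len;

	assert(t->end >= t->start);
	insn_len = (size_t)(t->end - t->start);
	if (insn_len > site_len)
		return 0;
	memcpy(site, t->start, insn_len);
	return insn_len;
}
```

The length is simply the distance between the two labels, so the macro never has to state the instruction size explicitly.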
lguest: optimize by coding restore_flags and irq_enable in assembler.
The downside of the last patch which made restore_flags and irq_enable
check interrupts is that they are now too big to be patched directly
into the callsites, so the C versions are always used.
But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all
the registers. In fact, we don't need any registers in the fast path,
so we can do better than this if we actually code them in assembler.
The results are in the noise, but since it's about the same amount of
code, it's worth applying.
1GB Guest->Host: input(suppressed),output(suppressed)
Before:
Seconds: 0:16.53
Packets: 377268,753673
Interrupts: 22461,24297
Notifications: 1(5245),21303(732370)
Net IRQs triggered: 377023(245),42578(711095)
After:
Seconds: 0:16.48
Packets: 377289,753673
Interrupts: 22281,24465
Notifications: 1(5245),21296(732377)
Net IRQs triggered: 377060(229),42564(711109)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:27:03 -06:00

/*G:033
 * But using those wrappers is inefficient (we'll see why that doesn't matter
 * for save_fl and irq_disable later). If we write our routines carefully in
 * assembler, we can avoid clobbering any registers and avoid jumping through
 * the wrapper functions.
 *
 * I skipped over our first piece of assembler, but this one is worth studying
 * in a bit more detail so I'll describe it in easy stages. First, the routine
 * to enable interrupts:
 */
ENTRY(lg_irq_enable)
	/*
	 * The reverse of irq_disable, this sets lguest_data.irq_enabled to
	 * X86_EFLAGS_IF (ie. "Interrupts enabled").
	 */
	movl $X86_EFLAGS_IF, lguest_data+LGUEST_DATA_irq_enabled
	/*
	 * But now we need to check if the Host wants to know: there might have
	 * been interrupts waiting to be delivered, in which case it will have
	 * set lguest_data.irq_pending to X86_EFLAGS_IF. If it's not zero, we
	 * jump to send_interrupts, otherwise we're done.
	 */
	cmpl $0, lguest_data+LGUEST_DATA_irq_pending
	jnz send_interrupts
	/*
	 * One cool thing about x86 is that you can do many things without using
	 * a register. In this case, the normal path hasn't needed to save or
	 * restore any registers at all!
	 */
	ret
send_interrupts:
	/*
	 * OK, now we need a register: eax is used for the hypercall number,
	 * which is LHCALL_SEND_INTERRUPTS.
	 *
	 * We used not to bother with this pending detection at all, which was
	 * much simpler. Sooner or later the Host would realize it had to
	 * send us an interrupt. But that turns out to make performance 7
	 * times worse on a simple tcp benchmark. So now we do this the hard
	 * way.
	 */
	pushl %eax
	movl $LHCALL_SEND_INTERRUPTS, %eax
	/*
	 * This is a vmcall instruction (same thing that KVM uses). Older
	 * assembler versions might not know the "vmcall" instruction, so we
	 * create one manually here.
	 */
	.byte 0x0f,0x01,0xc1 /* KVM_HYPERCALL */
	/* Put eax back the way we found it. */
	popl %eax
	ret
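
Read as C, the behaviour the comments describe for lg_irq_enable is a store followed by a "not zero" test. The sketch below models only that intent. The struct is an illustrative stand-in for the two lguest_data fields, and the return value stands for "the send_interrupts hypercall path would be taken".

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x200u	/* the interrupt flag bit (512) */

/* Illustrative stand-in for the two lguest_data fields used here. */
struct irq_state {
	uint32_t irq_enabled;
	uint32_t irq_pending;
};

/*
 * Model of lg_irq_enable.  Returns 1 when the slow path (the
 * send_interrupts hypercall) would be taken, 0 for the plain "ret".
 */
static int model_irq_enable(struct irq_state *s)
{
	assert(s);
	s->irq_enabled = X86_EFLAGS_IF;
	return s->irq_pending != 0;
}
```

The fast path touches only memory, which is why the assembler version needs no scratch register until it has to make the hypercall.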

/*
 * Finally, the "popf" or "restore flags" routine. The %eax register holds the
 * flags (in practice, either X86_EFLAGS_IF or 0): if it's X86_EFLAGS_IF we're
 * enabling interrupts again, if it's 0 we're leaving them off.
 */
ENTRY(lg_restore_fl)
	/* This is just "lguest_data.irq_enabled = flags;" */
	movl %eax, lguest_data+LGUEST_DATA_irq_enabled
	/*
	 * Now, if the %eax value has enabled interrupts and
	 * lguest_data.irq_pending is set, we want to tell the Host so it can
	 * deliver any outstanding interrupts. Fortunately, both values will
	 * be X86_EFLAGS_IF (ie. 512) in that case, and the "testl"
	 * instruction will AND them together for us. If both are set, we
	 * jump to send_interrupts.
	 */
	testl lguest_data+LGUEST_DATA_irq_pending, %eax
	jnz send_interrupts
	/* Again, the normal path has used no extra registers. Clever, huh? */
	ret
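
The AND trick in lg_restore_fl can be checked with a small C model. As before, this is a sketch: the struct is an illustrative stand-in for the lguest_data fields, and the return value stands for "jump to send_interrupts".

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x200u	/* the interrupt flag bit (512) */

/* Illustrative stand-in for the two lguest_data fields used here. */
struct irq_state {
	uint32_t irq_enabled;
	uint32_t irq_pending;
};

/*
 * Model of lg_restore_fl.  "testl mem, %eax" sets the zero flag from
 * (mem & eax), so the jump is taken only when both values have a common bit.
 */
static int model_restore_fl(struct irq_state *s, uint32_t flags)
{
	assert(s);
	s->irq_enabled = flags;
	return (flags & s->irq_pending) != 0;
}
```

Because both values are either 0 or X86_EFLAGS_IF in practice, one AND answers "are interrupts being enabled and is something pending?" without a second comparison.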
/*:*/

/* These demark the EIP range where host should never deliver interrupts. */
.global lguest_noirq_start
.global lguest_noirq_end

/*M:004
 * When the Host reflects a trap or injects an interrupt into the Guest, it
 * sets the eflags interrupt bit on the stack based on lguest_data.irq_enabled,
 * so the Guest iret logic does the right thing when restoring it. However,
 * when the Host sets the Guest up for direct traps, such as system calls, the
 * processor is the one to push eflags onto the stack, and the interrupt bit
 * will be 1 (in reality, interrupts are always enabled in the Guest).
 *
 * This turns out to be harmless: the only trap which should happen under Linux
 * with interrupts disabled is Page Fault (due to our lazy mapping of vmalloc
 * regions), which has to be reflected through the Host anyway. If another
 * trap *does* go off when interrupts are disabled, the Guest will panic, and
 * we'll never get to this iret!
:*/

/*G:045
 * There is one final paravirt_op that the Guest implements, and glancing at it
 * you can see why I left it to last. It's *cool*! It's in *assembler*!
 *
 * The "iret" instruction is used to return from an interrupt or trap. The
 * stack looks like this:
 *   old address
 *   old code segment & privilege level
 *   old processor flags ("eflags")
 *
 * The "iret" instruction pops those values off the stack and restores them all
 * at once. The only problem is that eflags includes the Interrupt Flag which
 * the Guest can't change: the CPU will simply ignore it when we do an "iret".
 * So we have to copy eflags from the stack to lguest_data.irq_enabled before
 * we do the "iret".
 *
 * There are two problems with this: firstly, we need to use a register to do
 * the copy and secondly, the whole thing needs to be atomic. The first
 * problem is easy to solve: push %eax on the stack so we can use it, and then
 * restore it at the end just before the real "iret".
 *
 * The second is harder: copying eflags to lguest_data.irq_enabled will turn
 * interrupts on before we're finished, so we could be interrupted before we
 * return to userspace or wherever. Our solution to this is to surround the
 * code with lguest_noirq_start: and lguest_noirq_end: labels. We tell the
 * Host that it is *never* to interrupt us there, even if interrupts seem to be
 * enabled.
 */
ENTRY(lguest_iret)
	pushl %eax
	movl 12(%esp), %eax
lguest_noirq_start:
	/*
	 * Note the %ss: segment prefix here. Normal data accesses use the
	 * "ds" segment, but that will have already been restored for whatever
	 * we're returning to (such as userspace): we can't trust it. The %ss:
	 * prefix makes sure we use the stack segment, which is still valid.
	 */
	movl %eax,%ss:lguest_data+LGUEST_DATA_irq_enabled
	popl %eax
	iret
lguest_noirq_end:
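
The Host's half of this contract is not shown in this file. A minimal sketch of the check it implies, assuming the Host knows the Guest's current EIP and the two label addresses, might look like the following. The struct, its field names and the function are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x200u	/* the interrupt flag bit (512) */

/* Hypothetical Host-side view of a Guest; field names are illustrative. */
struct guest_view {
	uint32_t eip;		/* where the Guest is executing */
	uint32_t noirq_start;	/* address of lguest_noirq_start */
	uint32_t noirq_end;	/* address of lguest_noirq_end */
	uint32_t irq_enabled;	/* copy of lguest_data.irq_enabled */
};

/* May the Host inject an interrupt right now? */
static int may_deliver_irq(const struct guest_view *g)
{
	assert(g->noirq_start <= g->noirq_end);
	/* Inside the iret window: never, whatever irq_enabled says. */
	if (g->eip >= g->noirq_start && g->eip < g->noirq_end)
		return 0;
	return (g->irq_enabled & X86_EFLAGS_IF) != 0;
}
```

The range is half-open: lguest_noirq_end labels the first address after the "iret", so an EIP equal to it is outside the protected window.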