kernel-fxtec-pro1x/arch/x86/include/asm
Alex Shi e7b52ffd45 x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86 has no instruction-level support for flush_tlb_range. Currently
flush_tlb_range is implemented by flushing the whole TLB. That is not
the best solution for all scenarios. In fact, if we use 'invlpg' to
flush only a few entries from the TLB, later accesses can still hit the
entries that remain, which gives a performance gain.

But the 'invlpg' instruction is expensive. Its execution time is
comparable to a cr3 rewrite, and even a bit higher on SNB CPUs.

So, on a CPU with 512 4KB TLB entries, the balance point is at:
	(512 - X) * 100ns(assumed TLB refill cost) =
		X(TLB flush entries) * 100ns(assumed invlpg cost)

Here, X is 256, that is 1/2 of 512 entries.
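
The balance point above can be checked with a small calculation. The
following is only a sketch of the arithmetic in the equation, not code
from the patch; the function name and parameters are hypothetical.
Solving (total - X) * refill = X * invlpg for X gives
X = total * refill / (refill + invlpg).

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel symbol.
 * Returns the break-even number of flushed entries X that satisfies
 *   (total_entries - X) * refill_ns = X * invlpg_ns
 * using integer division (the result is rounded down).
 */
unsigned long tlb_balance_point(unsigned long total_entries,
                                unsigned long refill_ns,
                                unsigned long invlpg_ns)
{
    /* Both costs being zero would make the equation meaningless. */
    assert(refill_ns + invlpg_ns > 0);
    return total_entries * refill_ns / (refill_ns + invlpg_ns);
}
```

With 512 entries and both costs assumed to be 100ns, the helper returns
256. A lower assumed refill cost moves the balance point down, which is
the direction the rest of the message argues for.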

But with the mysterious CPU prefetcher and page miss handler unit, the
actual TLB refill cost is far lower than 100ns for sequential access. And
2 HT siblings in one core make memory access even faster if they are
accessing the same memory. So, in the patch, I only make the change when
the number of target entries is less than 1/16 of all active TLB entries.
Actually, I have no data to support the fraction '1/16', so any
suggestions are welcome.
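
The 1/16 rule described above can be sketched as a simple predicate. This
is an illustration of the stated heuristic only, under the assumption
that the fraction is expressed as a right shift (a shift of 4 means
1/16); the function and parameter names are hypothetical and the actual
patch may structure the check differently.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel symbol.
 * Returns true when flushing the range page by page (one 'invlpg' per
 * page) is preferred over a full TLB flush, i.e. when the number of
 * pages in the range is less than active_entries / 2^shift.
 */
bool prefer_single_page_flush(unsigned long nr_pages,
                              unsigned long active_entries,
                              unsigned int shift)
{
    /* Guard against an undefined shift wider than the type. */
    assert(shift < sizeof(unsigned long) * 8);
    return nr_pages < (active_entries >> shift);
}
```

For example, with 512 active entries and a shift of 4 the threshold is
32 pages: a 31-page range would be flushed page by page, while a 32-page
range would fall back to a full flush.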

As for hugetlb, I didn't see a benefit in my benchmark, I guess due to
the smaller page table and fewer active TLB entries, so it is not
optimized for now.

My micro-benchmark shows that in ideal scenarios read performance
improves by 70 percent. In the worst scenario, read/write performance is
similar to that of the unpatched 3.4-rc4 kernel.

Here is the read data on my 2P * 4 cores * HT NHM EP machine, with THP
set to 'always':

Multi-thread testing; the '-t' parameter is the thread number:
                        with patch   unpatched 3.4-rc4
./mprotect -t 1           14ns         24ns
./mprotect -t 2           13ns         22ns
./mprotect -t 4           12ns         19ns
./mprotect -t 8           14ns         16ns
./mprotect -t 16          28ns         26ns
./mprotect -t 32          54ns         51ns
./mprotect -t 128         200ns        199ns

Single process with sequential flushing and memory accessing:

                                    with patch   unpatched 3.4-rc4
./mprotect                            7ns          11ns
./mprotect -p 4096 -l 8 -n 10240      21ns         21ns

[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
  has additional performance numbers. ]

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27 19:29:07 -07:00
..
numachip x86: Add NumaChip support 2011-12-05 17:17:24 +01:00
uv x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range 2012-06-27 19:29:07 -07:00
visws
xen Merge branch 'stable/autoballoon.v5.2' into stable/for-linus-3.5 2012-05-07 15:33:27 -04:00
a.out-core.h
a.out.h
acpi.h x86, realmode: Unbreak the ia64 build of drivers/acpi/sleep.c 2012-05-30 10:12:48 -07:00
aes.h
agp.h
alternative-asm.h x86: Fix atomic64_xxx_cx8() functions 2012-01-04 15:01:56 +01:00
alternative.h x86: Adjust asm constraints in atomic64 wrappers 2012-01-20 17:29:31 -08:00
amd_nb.h x86/PCI: amd: factor out MMCONFIG discovery 2012-01-06 12:11:19 -08:00
apb_timer.h Merge branch 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-07-23 10:34:47 -07:00
apic.h x86: Add read_mostly declaration/definition to variables from smp.h 2012-06-14 12:42:11 +02:00
apic_flat_64.h x86: Make flat_init_apic_ldr() available 2011-12-05 17:17:07 +01:00
apicdef.h x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it 2012-05-18 09:46:07 +02:00
apm.h
arch_hweight.h
archrandom.h x86, random: Verify RDRAND functionality and allow it to be disabled 2011-07-31 14:02:19 -07:00
asm-offsets.h
asm.h x86, extable: Switch to relative exception table entries 2012-04-20 17:22:34 -07:00
atomic.h x86: Use xadd helper more widely 2011-08-29 13:44:12 -07:00
atomic64_32.h atomic64_32.h: fix parameter naming mismatch 2012-05-09 11:38:20 +02:00
atomic64_64.h x86: Use xadd helper more widely 2011-08-29 13:44:12 -07:00
auxvec.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
barrier.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
bios_ebda.h
bitops.h x86/bitops: Move BIT_64() for a wider use 2012-05-23 17:16:42 +02:00
bitsperlong.h
boot.h x86: Use common threadinfo allocator 2012-05-08 14:08:44 +02:00
bootparam.h keyboard: Use BIOS Keyboard variable to set Numlock 2012-05-08 14:19:41 -07:00
bug.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
bugs.h
byteorder.h
cache.h
cacheflush.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
calgary.h
calling.h
ce4100.h
checksum.h
checksum_32.h
checksum_64.h
clocksource.h clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option 2011-07-21 13:34:05 -07:00
cmpxchg.h x86: Use correct byte-sized register constraint in __add() 2012-04-06 09:40:07 -07:00
cmpxchg_32.h x86: Fix and improve cmpxchg_double{,_local}() 2012-01-04 15:01:54 +01:00
cmpxchg_64.h x86: Fix and improve cmpxchg_double{,_local}() 2012-01-04 15:01:54 +01:00
compat.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
cpu.h
cpu_device_id.h Add driver auto probing for x86 features v4 2012-01-26 16:44:41 -08:00
cpufeature.h Merge branches 'x86-cpu-for-linus', 'x86-boot-for-linus', 'x86-cpufeature-for-linus', 'x86-process-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-22 09:28:15 -07:00
cpumask.h
cputime.h
current.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
debugreg.h x86: relocate get/set debugreg fcns to include/asm/debugreg. 2012-02-28 17:48:04 -05:00
delay.h asm-generic: move archictures to common delay.h 2011-07-22 18:46:24 +02:00
desc.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
desc_defs.h
device.h x86-32: Introduce CONFIG_X86_DEV_DMA_OPS 2012-04-12 11:09:56 -07:00
div64.h x86/div64: Add a micro-optimization shortcut if base is power of two 2011-12-05 18:16:11 +01:00
dma-contiguous.h X86: integrate CMA with DMA-mapping subsystem 2012-05-21 15:09:38 +02:00
dma-mapping.h Merge branch 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping 2012-05-25 09:18:59 -07:00
dma.h
dmi.h
dwarf2.h x86-64: Fix CFI data for interrupt frames 2011-09-28 19:04:52 +02:00
e820.h Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid" 2011-12-12 18:25:56 +01:00
edac.h
efi.h x86, efi: Allow basic init with mixed 32/64-bit efi/kernel 2012-02-23 18:54:51 -08:00
elf.h Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-29 18:12:23 -07:00
emergency-restart.h
entry_arch.h
errno.h
exec.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
fb.h
fcntl.h
fixmap.h x86/intel config: Revamp configuration to allow for Moorestown and Medfield 2011-12-18 09:17:02 +01:00
floppy.h
fpu-internal.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
frame.h x86: Unify rwlock assembly implementation 2011-07-21 09:03:31 +02:00
ftrace.h ftrace: Synchronize variable setting with breakpoints 2012-05-31 23:12:17 -04:00
futex.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
gart.h
genapic.h
geode.h
gpio.h gpiolib/arches: Centralise bolierplate asm/gpio.h 2012-05-11 18:00:14 -06:00
hardirq.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
highmem.h highmem: kill all __kmap_atomic() 2012-03-20 21:48:30 +08:00
hpet.h
hugetlb.h
hw_breakpoint.h
hw_irq.h iommu: Rename the DMAR and INTR_REMAP config options 2011-09-21 10:22:03 +02:00
hypertransport.h
hyperv.h Staging: hv: vmbus: Retry vmbus_post_msg() before giving up 2011-08-25 15:23:19 -07:00
hypervisor.h
i387.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
i8259.h
ia32.h signal, x86: add SIGSYS info and make it synchronous. 2012-04-14 11:13:21 +10:00
ia32_unistd.h x86: Generate system call tables and unistd_*.h from tables 2011-11-17 13:35:37 -08:00
idle.h x86: Merge the x86_32 and x86_64 cpu_idle() functions 2012-03-26 03:16:07 +02:00
inat.h x86: Fix to decode grouped AVX with VEX pp bits 2012-02-11 15:11:35 +01:00
inat_types.h
init.h x86, mm: Unify zone_sizes_init() 2011-11-11 10:22:55 +01:00
insn.h x86: Fix to decode grouped AVX with VEX pp bits 2012-02-11 15:11:35 +01:00
inst.h
intel_scu_ipc.h x86,mrst: Power control commands update 2011-12-05 12:42:11 +01:00
io.h x86: don't include xen/xen.h in <asm/io.h> unless XEN is enabled 2011-08-03 22:00:38 -10:00
io_apic.h x86/apic: Replace io_apic_ops with x86_io_apic_ops. 2012-05-01 14:50:09 -04:00
ioctl.h
ioctls.h
iomap.h
iommu.h iommu: Add option to group multi-function devices 2011-11-15 12:22:31 +01:00
iommu_table.h
ipcbuf.h
ipi.h
irq.h
irq_regs.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
irq_remapping.h irq_remap: Fix compiler warning with CONFIG_IRQ_REMAP=y 2012-05-08 11:17:29 +02:00
irq_vectors.h x86/irq: Standardize on CONFIG_SPARSE_IRQ=y 2011-10-13 12:12:12 +02:00
irqflags.h tracing, x86/irq: Do not trace arch_local_{*,irq_*}() functions 2011-07-07 19:22:32 +02:00
ist.h
jump_label.h static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]() 2012-02-24 10:05:59 +01:00
kbdleds.h keyboard: Use BIOS Keyboard variable to set Numlock 2012-05-08 14:19:41 -07:00
Kbuild x32: Generate <asm/unistd_x32.h> 2012-02-20 12:51:00 -08:00
kdebug.h x86: Avoid double stack traces with show_regs() 2012-05-09 11:44:42 +02:00
kexec.h
kgdb.h kgdb: x86: Return all segment registers also in 64-bit mode 2012-03-22 15:07:15 -05:00
kmap_types.h
kmemcheck.h
kprobes.h
kvm.h KVM: provide synchronous registers in kvm_run 2012-03-05 14:52:22 +02:00
kvm_emulate.h KVM: x86 emulator: MMX support 2012-04-16 20:36:16 -03:00
kvm_host.h Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2012-05-24 16:17:30 -07:00
kvm_para.h Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2012-05-24 16:17:30 -07:00
ldt.h
lguest.h
lguest_hcall.h lguest: update comments 2011-07-22 14:39:50 +09:30
linkage.h
local.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
local64.h
mach_timer.h time: x86: Remove CLOCK_TICK_RATE from mach_timer.h 2011-11-21 19:00:57 -08:00
mach_traps.h x86/mrst: Avoid reporting wrong nmi status 2011-11-10 16:21:01 +01:00
math_emu.h
mc146818rtc.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
mce.h x86/mce: Convert static array of pointers to per-cpu variables 2012-02-22 12:58:06 -08:00
microcode.h x86, microcode, AMD: Add a vendor-specific exit function 2011-12-14 12:46:47 +01:00
mman.h
mmconfig.h
mmu.h
mmu_context.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
mmx.h
mmzone.h
mmzone_32.h x86: Drop obsolete ARCH_BOOTMEM support 2012-04-14 14:28:58 +02:00
mmzone_64.h Fix node_start/end_pfn() definition for mm/page_cgroup.c 2011-06-27 14:13:09 -07:00
module.h
mpspec.h MCA: delete all remaining traces of microchannel bus support. 2012-05-17 19:06:13 -04:00
mpspec_def.h MCA: delete all remaining traces of microchannel bus support. 2012-05-17 19:06:13 -04:00
mrst-vrtc.h
mrst.h x86/mid: Remove Intel Moorestown 2012-01-26 21:23:53 +01:00
msgbuf.h
mshyperv.h
msidef.h
msr-index.h Merge branch 'perf/x86-ibs' into perf/core 2012-05-09 15:22:23 +02:00
msr.h x86, doc: Revert "x86: Document rdmsr_safe restrictions" 2012-04-19 17:07:34 -07:00
mtrr.h x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls 2012-03-01 12:48:52 -08:00
mutex.h
mutex_32.h
mutex_64.h
mwait.h
nmi.h x86/nmi: Fix section mismatch warnings on 32-bit 2012-06-08 12:19:27 +02:00
nops.h x86, nop: Make the ASM_NOP* macros work from assembly 2012-04-19 15:07:42 -07:00
numa.h
numa_32.h
numa_64.h
numaq.h
olpc.h x86, olpc-xo1-sci: Add GPE handler and ebook switch functionality 2011-07-06 14:44:38 -07:00
olpc_ofw.h
page.h
page_32.h
page_32_types.h x86: Use common threadinfo allocator 2012-05-08 14:08:44 +02:00
page_64.h
page_64_types.h x86: Use common threadinfo allocator 2012-05-08 14:08:44 +02:00
page_types.h Move all declarations of free_initmem() to linux/mm.h 2012-03-28 18:30:03 +01:00
param.h
paravirt.h x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range 2012-06-27 19:29:07 -07:00
paravirt_types.h x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range 2012-06-27 19:29:07 -07:00
parport.h
pat.h
pci-direct.h
pci-functions.h
pci.h x86/PCI: Expand the x86_msi_ops to have a restore MSIs. 2012-01-06 14:02:26 -08:00
pci_64.h
pci_x86.h PCI: Pull PCI 'latency timer' setup up into the core 2012-01-06 12:10:42 -08:00
percpu.h x86: Define early read-mostly per-cpu macros 2012-06-14 12:42:10 +02:00
perf_event.h perf/x86/ibs: Fix undefined reference to `get_ibs_caps' 2012-05-14 14:31:35 +02:00
perf_event_p4.h x86, perf: P4 PMU - Fix typos in comments and style cleanup 2011-07-21 20:41:54 +02:00
pgalloc.h
pgtable-2level.h
pgtable-2level_types.h
pgtable-3level.h mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition 2012-05-29 16:22:24 -07:00
pgtable-3level_types.h
pgtable.h x86: Use "do { } while(0)" for empty flush_tlb_fix_spurious_fault() macro 2011-12-18 09:14:18 +01:00
pgtable_32.h
pgtable_32_types.h
pgtable_64.h
pgtable_64_types.h
pgtable_types.h
poll.h
posix_types.h x32: Check __ILP32__ instead of __LP64__ for x32 2012-04-23 14:51:14 -07:00
posix_types_32.h bury __kernel_nlink_t, make internal nlink_t consistent 2012-05-30 21:04:50 -04:00
posix_types_64.h x86: Use generic posix_types.h 2012-02-14 12:01:30 -08:00
posix_types_x32.h x32: Create posix_types_x32.h 2012-02-20 12:48:47 -08:00
prctl.h
probe_roms.h
processor-cyrix.h
processor-flags.h x86: Fix rflags in FAKE_STACK_FRAME 2011-12-06 10:02:38 +01:00
processor.h x86/tlb_info: get last level TLB entry number of CPU 2012-06-27 19:28:24 -07:00
prom.h irq_domain/x86: Convert x86 (embedded) to use common irq_domain 2012-02-23 14:37:47 -07:00
proto.h
ptrace-abi.h
ptrace.h x86: Move some signal-handling definitions to a common header 2012-02-20 12:52:04 -08:00
pvclock-abi.h x86: pvclock: Add flag to indicate that a vm was stopped by the host 2012-04-08 12:48:57 +03:00
pvclock.h KVM: Fix instruction size issue in pvclock scaling 2011-08-30 14:42:30 +03:00
realmode.h x86, realmode: Change EFER to a single u64 field 2012-05-16 14:02:05 -07:00
reboot.h x86, nmi: Wire up NMI handlers to new routines 2011-10-10 06:56:57 +02:00
reboot_fixups.h
required-features.h
resource.h
resume-trace.h
rio.h
rtc.h
rwlock.h x86: Fix write lock scalability 64-bit issue 2011-07-21 09:03:36 +02:00
rwsem.h x86: Use xadd helper more widely 2011-08-29 13:44:12 -07:00
scatterlist.h
seccomp.h
seccomp_32.h
seccomp_64.h
sections.h
segment.h x86-64: Handle exception table entries during early boot 2012-04-19 15:42:45 -07:00
sembuf.h
serial.h
serpent.h crypto: serpent - add 4-way parallel i586/SSE2 assembler implementation 2011-11-21 16:13:23 +08:00
setup.h x86/intel config: Revamp configuration to allow for Moorestown and Medfield 2011-12-18 09:17:02 +01:00
setup_arch.h
shmbuf.h
shmparam.h
sigcontext.h x32: Check __ILP32__ instead of __LP64__ for x32 2012-04-23 14:51:14 -07:00
sigcontext32.h
sigframe.h x32: Add rt_sigframe_x32 2012-02-20 12:52:05 -08:00
sighandling.h most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set 2012-06-01 12:58:51 -04:00
siginfo.h x32, siginfo: Provide proper overrides for x32 siginfo_t 2012-04-23 18:11:40 -07:00
signal.h
smp.h x86: Add read_mostly declaration/definition to variables from smp.h 2012-06-14 12:42:11 +02:00
smpboot_hooks.h x86: Serialize SMP bootup CMOS accesses on rtc_lock 2011-07-21 09:20:59 +02:00
socket.h
sockios.h
sparsemem.h
special_insns.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
spinlock.h x86: spinlock.h: Remove REG_PTR_MODE 2012-03-30 10:01:59 -07:00
spinlock_types.h x86/spinlocks: Eliminate TICKET_MASK 2012-02-07 10:09:54 +01:00
sta2x11.h mfd: Add driver for STA2X11 MFD block 2012-05-09 15:34:28 +02:00
stackprotector.h x86: replace percpu_xxx funcs with this_cpu_xxx 2012-05-14 14:15:31 -07:00
stacktrace.h
stat.h vfs: don't force a big memset of stat data just to clear padding fields 2012-05-06 18:02:40 -07:00
statfs.h
string.h
string_32.h
string_64.h
suspend.h
suspend_32.h
suspend_64.h
svm.h
swab.h
swiotlb.h
switch_to.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
sync_bitops.h
sys_ia32.h x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> 2012-02-20 12:52:05 -08:00
syscall.h arch/x86: add syscall_get_arch to syscall.h 2012-04-14 11:13:20 +10:00
syscalls.h
tce.h
termbits.h
termios.h
thread_info.h set_restore_sigmask() is never called without SIGPENDING (and never should be) 2012-06-01 12:58:50 -04:00
time.h
timer.h sched/x86: Fix overflow in cyc2ns_offset 2012-03-13 16:27:51 +01:00
timex.h
tlb.h
tlbflush.h x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range 2012-06-27 19:29:07 -07:00
topology.h sched/numa: Rewrite the CONFIG_NUMA sched domain support 2012-05-09 15:00:55 +02:00
traps.h x86: Use enum instead of literals for trap values 2012-03-09 16:47:54 -08:00
tsc.h x86: kvmclock: abstract save/restore sched_clock_state 2012-03-20 12:37:45 +02:00
types.h
uaccess.h perf/x86: Check if user fp is valid 2012-06-06 17:08:01 +02:00
uaccess_32.h x86: use the new generic strnlen_user() function 2012-05-26 11:33:54 -07:00
uaccess_64.h x86: use the new generic strnlen_user() function 2012-05-26 11:33:54 -07:00
ucontext.h
unaligned.h
unistd.h x32: Check __ILP32__ instead of __LP64__ for x32 2012-04-23 14:51:14 -07:00
uprobes.h uprobes/core: Handle breakpoint and singlestep exceptions 2012-03-14 07:41:36 +01:00
user.h
user32.h
user_32.h
user_64.h
vdso.h
vga.h efifb: Implement vga_default_device() (v2) 2012-04-24 09:50:18 +01:00
vgtod.h x86-64: Simplify and optimize vdso clock_gettime monotonic variants 2012-03-23 16:49:33 -07:00
virtext.h Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
vm86.h
vmx.h KVM: APIC: avoid instruction emulation for EOI writes 2011-09-25 19:52:17 +03:00
vsyscall.h x86-64: Rework vsyscall emulation and add vsyscall= parameter 2011-08-10 19:26:46 -05:00
vvar.h
word-at-a-time.h word-at-a-time: make the interfaces truly generic 2012-05-26 11:33:40 -07:00
x2apic.h x86/apic: Add separate apic_id_valid() functions for selected apic drivers 2012-03-23 13:28:43 +01:00
x86_init.h x86/apic: Replace io_apic_ops with x86_io_apic_ops. 2012-05-01 14:50:09 -04:00
xcr.h
xor.h
xor_32.h raid5: add AVX optimized RAID5 checksumming 2012-05-22 13:54:04 +10:00
xor_64.h raid5: add AVX optimized RAID5 checksumming 2012-05-22 13:54:04 +10:00
xor_avx.h raid5: add AVX optimized RAID5 checksumming 2012-05-22 13:54:04 +10:00
xsave.h x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/xsave.h 2012-04-20 13:51:40 -07:00