Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6

* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
  [PATCH] x86-64: Export smp_call_function_single
  [PATCH] i386: Clean up smp_tune_scheduling()
  [PATCH] unwinder: move .eh_frame to RODATA
  [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
  [PATCH] x86-64: don't use set_irq_regs()
  [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
  [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
  [PATCH] i386: replace kmalloc+memset with kzalloc
  [PATCH] x86-64: remove remaining pc98 code
  [PATCH] x86-64: remove unused variable
  [PATCH] x86-64: Fix constraints in atomic_add_return()
  [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
  [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
  [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
  [PATCH] x86-64: Fix numaq build error
  [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
  [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
  [PATCH] x86-64: Clarify error message in GART code
  [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
  [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
  ...

Fixed conflict in include/linux/uaccess.h manually

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds 2006-12-07 08:59:11 -08:00
commit 4522d58275
211 changed files with 6147 additions and 2635 deletions

@ -2,7 +2,7 @@
---------------------------- ----------------------------
H. Peter Anvin <hpa@zytor.com> H. Peter Anvin <hpa@zytor.com>
Last update 2005-09-02 Last update 2006-11-17
On the i386 platform, the Linux kernel uses a rather complicated boot On the i386 platform, the Linux kernel uses a rather complicated boot
convention. This has evolved partially due to historical aspects, as convention. This has evolved partially due to historical aspects, as
@ -35,6 +35,8 @@ Protocol 2.03: (Kernel 2.4.18-pre1) Explicitly makes the highest possible
initrd address available to the bootloader. initrd address available to the bootloader.
Protocol 2.04: (Kernel 2.6.14) Extend the syssize field to four bytes. Protocol 2.04: (Kernel 2.6.14) Extend the syssize field to four bytes.
Protocol 2.05: (Kernel 2.6.20) Make protected mode kernel relocatable.
Introduce relocatable_kernel and kernel_alignment fields.
**** MEMORY LAYOUT **** MEMORY LAYOUT
@ -129,6 +131,8 @@ Offset Proto Name Meaning
0226/2 N/A pad1 Unused 0226/2 N/A pad1 Unused
0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line 0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line
022C/4 2.03+ initrd_addr_max Highest legal initrd address 022C/4 2.03+ initrd_addr_max Highest legal initrd address
0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel
0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not
(1) For backwards compatibility, if the setup_sects field contains 0, the (1) For backwards compatibility, if the setup_sects field contains 0, the
real value is 4. real value is 4.
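
Protocol 2.05 adds kernel_alignment (0230/4) and relocatable_kernel (0234/1) to the header table above. The following is a minimal C sketch of how a boot loader might read them from the start of a bzImage. It assumes the table offsets are byte offsets into the image file, and it relies on two header fields not shown in this excerpt: the "HdrS" signature at 0x202 and the 16-bit protocol version at 0x206. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the header fields used here. */
struct setup_info {
    unsigned setup_sects;      /* 0x1F1; 0 means 4 for backwards compatibility */
    uint16_t version;          /* 0x206; e.g. 0x0205 for protocol 2.05 */
    uint32_t kernel_alignment; /* 0x230; protocol 2.05+ */
    int relocatable_kernel;    /* 0x234; protocol 2.05+ */
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or lacks "HdrS". */
static int read_setup_info(const uint8_t *img, size_t len, struct setup_info *out)
{
    if (len < 0x235)
        return -1;
    if (le32(img + 0x202) != 0x53726448u)   /* "HdrS" read little-endian */
        return -1;

    out->setup_sects = img[0x1F1] ? img[0x1F1] : 4;
    out->version = le16(img + 0x206);
    if (out->version >= 0x0205) {
        out->kernel_alignment = le32(img + 0x230);
        out->relocatable_kernel = img[0x234] != 0;
    } else {
        /* Older protocols lack these fields: treat the kernel as non-relocatable. */
        out->kernel_alignment = 0;
        out->relocatable_kernel = 0;
    }
    return 0;
}
```

Under these assumptions, a loader that sees a nonzero relocatable_kernel could choose any load address that is a multiple of kernel_alignment; otherwise it keeps the traditional fixed load address.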

@ -599,8 +599,6 @@ and is between 256 and 4096 characters. It is defined in the file
hugepages= [HW,IA-32,IA-64] Maximal number of HugeTLB pages. hugepages= [HW,IA-32,IA-64] Maximal number of HugeTLB pages.
noirqbalance [IA-32,SMP,KNL] Disable kernel irq balancing
i8042.direct [HW] Put keyboard port into non-translated mode i8042.direct [HW] Put keyboard port into non-translated mode
i8042.dumbkbd [HW] Pretend that controller can only read data from i8042.dumbkbd [HW] Pretend that controller can only read data from
keyboard and cannot control its state keyboard and cannot control its state
@ -1065,9 +1063,14 @@ and is between 256 and 4096 characters. It is defined in the file
in certain environments such as networked servers or in certain environments such as networked servers or
real-time systems. real-time systems.
noirqbalance [IA-32,SMP,KNL] Disable kernel irq balancing
noirqdebug [IA-32] Disables the code which attempts to detect and noirqdebug [IA-32] Disables the code which attempts to detect and
disable unhandled interrupt sources. disable unhandled interrupt sources.
no_timer_check [IA-32,X86_64,APIC] Disables the code which tests for
broken timer IRQ sources.
noisapnp [ISAPNP] Disables ISA PnP code. noisapnp [ISAPNP] Disables ISA PnP code.
noinitrd [RAM] Tells the kernel not to load any configured noinitrd [RAM] Tells the kernel not to load any configured
@ -1752,6 +1755,9 @@ and is between 256 and 4096 characters. It is defined in the file
norandmaps Don't use address space randomization norandmaps Don't use address space randomization
Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space
unwind_debug=N N > 0 will enable dwarf2 unwinder debugging
This is useful to get more information why
you got a "dwarf2 unwinder stuck"
______________________________________________________________________ ______________________________________________________________________

@ -62,9 +62,6 @@ consider the following facts about the Linux kernel:
- different structures can contain different fields - different structures can contain different fields
- Some functions may not be implemented at all, (i.e. some locks - Some functions may not be implemented at all, (i.e. some locks
compile away to nothing for non-SMP builds.) compile away to nothing for non-SMP builds.)
- Parameter passing of variables from function to function can be
done in different ways (the CONFIG_REGPARM option controls
this.)
- Memory within the kernel can be aligned in different ways, - Memory within the kernel can be aligned in different ways,
depending on the build options. depending on the build options.
- Linux runs on a wide range of different processor architectures. - Linux runs on a wide range of different processor architectures.

@ -27,6 +27,7 @@ show up in /proc/sys/kernel:
- hotplug - hotplug
- java-appletviewer [ binfmt_java, obsolete ] - java-appletviewer [ binfmt_java, obsolete ]
- java-interpreter [ binfmt_java, obsolete ] - java-interpreter [ binfmt_java, obsolete ]
- kstack_depth_to_print [ X86 only ]
- l2cr [ PPC only ] - l2cr [ PPC only ]
- modprobe ==> Documentation/kmod.txt - modprobe ==> Documentation/kmod.txt
- msgmax - msgmax
@ -170,6 +171,13 @@ This flag controls the L2 cache of G3 processor boards. If
============================================================== ==============================================================
kstack_depth_to_print: (X86 only)
Controls the number of words to print when dumping the raw
kernel stack.
==============================================================
osrelease, ostype & version: osrelease, ostype & version:
# cat osrelease # cat osrelease
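
The new kstack_depth_to_print entry is an ordinary integer sysctl under /proc/sys/kernel. Below is a small C sketch for reading and setting such a value programmatically; the helper names are hypothetical, and writing to the real /proc file normally requires root.

```c
#include <stdio.h>

/* Read one integer from a sysctl-style file such as
 * /proc/sys/kernel/kstack_depth_to_print. Returns 0 on success. */
static int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Write one integer followed by a newline, like "echo N > path". */
static int write_sysctl_long(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, write_sysctl_long("/proc/sys/kernel/kstack_depth_to_print", 24) would ask for 24 words of raw stack in subsequent dumps, assuming the file exists on the running X86 kernel.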

@ -52,10 +52,6 @@ APICs
apicmaintimer. Useful when your PIT timer is totally apicmaintimer. Useful when your PIT timer is totally
broken. broken.
disable_8254_timer / enable_8254_timer
Enable interrupt 0 timer routing over the 8254 in addition to over
the IO-APIC. The kernel tries to set a sensible default.
Early Console Early Console
syntax: earlyprintk=vga syntax: earlyprintk=vga
@ -183,7 +179,7 @@ PCI
IOMMU IOMMU
iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge] iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge]
[,forcesac][,fullflush][,nomerge][,noaperture] [,forcesac][,fullflush][,nomerge][,noaperture][,calgary]
size set size of iommu (in bytes) size set size of iommu (in bytes)
noagp don't initialize the AGP driver and use full aperture. noagp don't initialize the AGP driver and use full aperture.
off don't use the IOMMU off don't use the IOMMU
@ -204,6 +200,7 @@ IOMMU
buffering. buffering.
nodac Forbid DMA >4GB nodac Forbid DMA >4GB
panic Always panic when IOMMU overflows panic Always panic when IOMMU overflows
calgary Use the Calgary IOMMU if it is available
swiotlb=pages[,force] swiotlb=pages[,force]

@ -70,6 +70,7 @@ SECTIONS
#endif #endif
.text : .text :
{ {
_text = .;
#if defined(CONFIG_ROMKERNEL) #if defined(CONFIG_ROMKERNEL)
*(.int_redirect) *(.int_redirect)
#endif #endif

@ -182,6 +182,17 @@ config X86_ES7000
endchoice endchoice
config PARAVIRT
bool "Paravirtualization support (EXPERIMENTAL)"
depends on EXPERIMENTAL
help
Paravirtualization is a way of running multiple instances of
Linux on the same machine, under a hypervisor. This option
changes the kernel so it can modify itself when it is run
under a hypervisor, improving performance significantly.
However, when run without a hypervisor the kernel is
theoretically slower. If in doubt, say N.
config ACPI_SRAT config ACPI_SRAT
bool bool
default y default y
@ -443,7 +454,8 @@ source "drivers/firmware/Kconfig"
choice choice
prompt "High Memory Support" prompt "High Memory Support"
default NOHIGHMEM default HIGHMEM4G if !X86_NUMAQ
default HIGHMEM64G if X86_NUMAQ
config NOHIGHMEM config NOHIGHMEM
bool "off" bool "off"
@ -710,20 +722,6 @@ config BOOT_IOREMAP
depends on (((X86_SUMMIT || X86_GENERICARCH) && NUMA) || (X86 && EFI)) depends on (((X86_SUMMIT || X86_GENERICARCH) && NUMA) || (X86 && EFI))
default y default y
config REGPARM
bool "Use register arguments"
default y
help
Compile the kernel with -mregparm=3. This instructs gcc to use
a more efficient function call ABI which passes the first three
arguments of a function call via registers, which results in denser
and faster code.
If this option is disabled, then the default ABI of passing
arguments via the stack is used.
If unsure, say Y.
config SECCOMP config SECCOMP
bool "Enable seccomp to safely compute untrusted bytecode" bool "Enable seccomp to safely compute untrusted bytecode"
depends on PROC_FS depends on PROC_FS
@ -773,23 +771,39 @@ config CRASH_DUMP
PHYSICAL_START. PHYSICAL_START.
For more details see Documentation/kdump/kdump.txt For more details see Documentation/kdump/kdump.txt
config PHYSICAL_START config RELOCATABLE
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP) bool "Build a relocatable kernel(EXPERIMENTAL)"
depends on EXPERIMENTAL
default "0x1000000" if CRASH_DUMP
default "0x100000"
help help
This gives the physical address where the kernel is loaded. Normally This build a kernel image that retains relocation information
for regular kernels this value is 0x100000 (1MB). But in the case so it can be loaded someplace besides the default 1MB.
of kexec on panic the fail safe kernel needs to run at a different The relocations tend to the kernel binary about 10% larger,
address than the panic-ed kernel. This option is used to set the load but are discarded at runtime.
address for kernels used to capture crash dump on being kexec'ed
after panic. The default value for crash dump kernels is One use is for the kexec on panic case where the recovery kernel
0x1000000 (16MB). This can also be set based on the "X" value as must live at a different physical address than the primary
specified in the "crashkernel=YM@XM" command line boot parameter kernel.
passed to the panic-ed kernel. Typically this parameter is set as
crashkernel=64M@16M. Please take a look at config PHYSICAL_ALIGN
Documentation/kdump/kdump.txt for more details about crash dumps. hex "Alignment value to which kernel should be aligned"
default "0x100000"
range 0x2000 0x400000
help
This value puts the alignment restrictions on physical address
where kernel is loaded and run from. Kernel is compiled for an
address which meets above alignment restriction.
If bootloader loads the kernel at a non-aligned address and
CONFIG_RELOCATABLE is set, kernel will move itself to nearest
address aligned to above value and run from there.
If bootloader loads the kernel at a non-aligned address and
CONFIG_RELOCATABLE is not set, kernel will ignore the run time
load address and decompress itself to the address it has been
compiled for and run from there. The address for which kernel is
compiled already meets above alignment restrictions. Hence the
end result is that kernel runs from a physical address meeting
above alignment restrictions.
Don't change this unless you know what you are doing. Don't change this unless you know what you are doing.
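
PHYSICAL_ALIGN is applied in the compressed-kernel entry code of this commit with the pair "addl $(CONFIG_PHYSICAL_ALIGN - 1), reg; andl $(~(CONFIG_PHYSICAL_ALIGN - 1)), reg", which rounds the load address up to the next aligned boundary (or leaves it unchanged if already aligned). A C sketch of the same arithmetic follows. It assumes the alignment is a power of two, which the mask trick requires but the Kconfig range alone does not enforce; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to a multiple of align; align must be a power of two. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Mirrors the Kconfig "range 0x2000 0x400000" for PHYSICAL_ALIGN. */
static int physical_align_in_range(uint32_t align)
{
    return align >= 0x2000 && align <= 0x400000;
}
```

With the default alignment of 0x100000, a kernel loaded at 0x100001 would run from 0x200000, while one loaded exactly at 0x100000 stays where it is.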

@ -103,8 +103,15 @@ config MPENTIUMM
Select this for Intel Pentium M (not Pentium-4 M) Select this for Intel Pentium M (not Pentium-4 M)
notebook chips. notebook chips.
config MCORE2
bool "Core 2/newer Xeon"
help
Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and 53xx)
CPUs. You can distingush newer from older Xeons by the CPU family
in /proc/cpuinfo. Newer ones have 6.
config MPENTIUM4 config MPENTIUM4
bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/Xeon" bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon"
help help
Select this for Intel Pentium 4 chips. This includes the Select this for Intel Pentium 4 chips. This includes the
Pentium 4, P4-based Celeron and Xeon, and Pentium-4 M Pentium 4, P4-based Celeron and Xeon, and Pentium-4 M
@ -229,7 +236,7 @@ config X86_L1_CACHE_SHIFT
default "7" if MPENTIUM4 || X86_GENERIC default "7" if MPENTIUM4 || X86_GENERIC
default "4" if X86_ELAN || M486 || M386 || MGEODEGX1 default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
default "6" if MK7 || MK8 || MPENTIUMM default "6" if MK7 || MK8 || MPENTIUMM || MCORE2
config RWSEM_GENERIC_SPINLOCK config RWSEM_GENERIC_SPINLOCK
bool bool
@ -287,17 +294,17 @@ config X86_ALIGNMENT_16
config X86_GOOD_APIC config X86_GOOD_APIC
bool bool
depends on MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || MK8 || MEFFICEON depends on MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || MK8 || MEFFICEON || MCORE2
default y default y
config X86_INTEL_USERCOPY config X86_INTEL_USERCOPY
bool bool
depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
default y default y
config X86_USE_PPRO_CHECKSUM config X86_USE_PPRO_CHECKSUM
bool bool
depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MEFFICEON || MGEODE_LX || MCORE2
default y default y
config X86_USE_3DNOW config X86_USE_3DNOW
@ -312,5 +319,5 @@ config X86_OOSTORE
config X86_TSC config X86_TSC
bool bool
depends on (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MGEODEGX1 || MGEODE_LX) && !X86_NUMAQ depends on (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MGEODEGX1 || MGEODE_LX || MCORE2) && !X86_NUMAQ
default y default y
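
The MCORE2 help text suggests telling newer Xeons from older ones by the "cpu family" field in /proc/cpuinfo (6 for the newer parts; P4-based Xeons report 15). Here is a minimal C sketch that extracts that field. It assumes the usual "cpu family<TAB>: N" line layout and returns the first match; the function name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the first "cpu family" value in cpuinfo-formatted text, or -1. */
static int cpu_family_from_cpuinfo(FILE *f)
{
    char line[256];

    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "cpu family", 10) == 0) {
            const char *colon = strchr(line, ':');
            if (colon)
                return (int)strtol(colon + 1, NULL, 10); /* skips leading blanks */
        }
    }
    return -1;
}
```

Typical use would be to open /proc/cpuinfo with fopen(..., "r"), pass the stream in, and treat a result of 6 on a Xeon as a candidate for MCORE2.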

@ -85,4 +85,14 @@ config DOUBLEFAULT
option saves about 4k and might cause you much additional grey option saves about 4k and might cause you much additional grey
hair. hair.
config DEBUG_PARAVIRT
bool "Enable some paravirtualization debugging"
default y
depends on PARAVIRT && DEBUG_KERNEL
help
Currently deliberately clobbers regs which are allowed to be
clobbered in inlined paravirt hooks, even in native mode.
If turning this off solves a problem, then DISABLE_INTERRUPTS() or
ENABLE_INTERRUPTS() is lying about what registers can be clobbered.
endmenu endmenu

@ -26,10 +26,12 @@ endif
LDFLAGS := -m elf_i386 LDFLAGS := -m elf_i386
OBJCOPYFLAGS := -O binary -R .note -R .comment -S OBJCOPYFLAGS := -O binary -R .note -R .comment -S
LDFLAGS_vmlinux := ifdef CONFIG_RELOCATABLE
LDFLAGS_vmlinux := --emit-relocs
endif
CHECKFLAGS += -D__i386__ CHECKFLAGS += -D__i386__
CFLAGS += -pipe -msoft-float CFLAGS += -pipe -msoft-float -mregparm=3
# prevent gcc from keeping the stack 16 byte aligned # prevent gcc from keeping the stack 16 byte aligned
CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2) CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2)
@ -37,8 +39,6 @@ CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2)
# CPU-specific tuning. Anything which can be shared with UML should go here. # CPU-specific tuning. Anything which can be shared with UML should go here.
include $(srctree)/arch/i386/Makefile.cpu include $(srctree)/arch/i386/Makefile.cpu
cflags-$(CONFIG_REGPARM) += -mregparm=3
# temporary until string.h is fixed # temporary until string.h is fixed
cflags-y += -ffreestanding cflags-y += -ffreestanding

@ -32,6 +32,7 @@ cflags-$(CONFIG_MWINCHIP2) += $(call cc-option,-march=winchip2,-march=i586)
cflags-$(CONFIG_MWINCHIP3D) += $(call cc-option,-march=winchip2,-march=i586) cflags-$(CONFIG_MWINCHIP3D) += $(call cc-option,-march=winchip2,-march=i586)
cflags-$(CONFIG_MCYRIXIII) += $(call cc-option,-march=c3,-march=i486) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0 cflags-$(CONFIG_MCYRIXIII) += $(call cc-option,-march=c3,-march=i486) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
cflags-$(CONFIG_MVIAC3_2) += $(call cc-option,-march=c3-2,-march=i686) cflags-$(CONFIG_MVIAC3_2) += $(call cc-option,-march=c3-2,-march=i686)
cflags-$(CONFIG_MCORE2) += -march=i686 $(call cc-option,-mtune=core2,$(call cc-option,-mtune=generic,-mtune=i686))
# AMD Elan support # AMD Elan support
cflags-$(CONFIG_X86_ELAN) += -march=i486 cflags-$(CONFIG_X86_ELAN) += -march=i486

@ -4,22 +4,42 @@
# create a compressed vmlinux image from the original vmlinux # create a compressed vmlinux image from the original vmlinux
# #
targets := vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o targets := vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \
vmlinux.bin.all vmlinux.relocs
EXTRA_AFLAGS := -traditional EXTRA_AFLAGS := -traditional
LDFLAGS_vmlinux := -Ttext $(IMAGE_OFFSET) -e startup_32 LDFLAGS_vmlinux := -T
CFLAGS_misc.o += -fPIC
hostprogs-y := relocs
$(obj)/vmlinux: $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE $(obj)/vmlinux: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
$(call if_changed,ld) $(call if_changed,ld)
@: @:
$(obj)/vmlinux.bin: vmlinux FORCE $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy) $(call if_changed,objcopy)
quiet_cmd_relocs = RELOCS $@
cmd_relocs = $(obj)/relocs $< > $@;$(obj)/relocs --abs-relocs $<
$(obj)/vmlinux.relocs: vmlinux $(obj)/relocs FORCE
$(call if_changed,relocs)
vmlinux.bin.all-y := $(obj)/vmlinux.bin
vmlinux.bin.all-$(CONFIG_RELOCATABLE) += $(obj)/vmlinux.relocs
quiet_cmd_relocbin = BUILD $@
cmd_relocbin = cat $(filter-out FORCE,$^) > $@
$(obj)/vmlinux.bin.all: $(vmlinux.bin.all-y) FORCE
$(call if_changed,relocbin)
ifdef CONFIG_RELOCATABLE
$(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin.all FORCE
$(call if_changed,gzip)
else
$(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
$(call if_changed,gzip) $(call if_changed,gzip)
endif
LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T LDFLAGS_piggy.o := -r --format binary --oformat elf32-i386 -T
$(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE $(obj)/piggy.o: $(src)/vmlinux.scr $(obj)/vmlinux.bin.gz FORCE
$(call if_changed,ld) $(call if_changed,ld)
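
With CONFIG_RELOCATABLE, the rules above concatenate vmlinux.relocs after vmlinux.bin before compressing. The new head.S then walks that table backwards from the end of the decompressed output, stopping at a zero word, and adds the load delta to each 32-bit location the table names. The following C sketch models that loop over a byte buffer. It assumes entries are little-endian 32-bit virtual addresses, and it introduces a hypothetical link_virt_base parameter for the virtual address of buf[0] (__PAGE_OFFSET plus LOAD_PHYSICAL_ADDR in head.S terms). The bounds checks are additions that the assembly does not perform.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* buf holds the decompressed image followed by the relocation table.
 * Returns the number of relocations applied, or -1 for a bad entry. */
static long apply_relocs(uint8_t *buf, size_t len, uint32_t link_virt_base,
                         uint32_t delta)
{
    long applied = 0;
    size_t pos = len;

    if (delta == 0)              /* loaded at the compiled address */
        return 0;
    while (pos >= 4) {
        pos -= 4;
        uint32_t r = get_le32(buf + pos);
        if (r == 0)              /* zero word ends the table */
            break;
        uint32_t off = r - link_virt_base;
        if ((size_t)off > pos || pos - off < 4)   /* must lie before the table */
            return -1;
        put_le32(buf + off, get_le32(buf + off) + delta);
        applied++;
    }
    return applied;
}
```

Walking backwards lets the code find the table from the known end of the output (output_len) without storing its start anywhere, which appears to be why the terminator precedes the entries.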

@ -26,9 +26,11 @@
#include <linux/linkage.h> #include <linux/linkage.h>
#include <asm/segment.h> #include <asm/segment.h>
#include <asm/page.h> #include <asm/page.h>
#include <asm/boot.h>
.section ".text.head"
.globl startup_32 .globl startup_32
startup_32: startup_32:
cld cld
cli cli
@ -37,93 +39,142 @@ startup_32:
movl %eax,%es movl %eax,%es
movl %eax,%fs movl %eax,%fs
movl %eax,%gs movl %eax,%gs
movl %eax,%ss
lss stack_start,%esp /* Calculate the delta between where we were compiled to run
xorl %eax,%eax * at and where we were actually loaded at. This can only be done
1: incl %eax # check that A20 really IS enabled * with a short local call on x86. Nothing else will tell us what
movl %eax,0x000000 # loop forever if it isn't * address we are running at. The reserved chunk of the real-mode
cmpl %eax,0x100000 * data at 0x34-0x3f are used as the stack for this calculation.
je 1b * Only 4 bytes are needed.
*/
leal 0x40(%esi), %esp
call 1f
1: popl %ebp
subl $1b, %ebp
/* %ebp contains the address we are loaded at by the boot loader and %ebx
* contains the address where we should move the kernel image temporarily
* for safe in-place decompression.
*/
#ifdef CONFIG_RELOCATABLE
movl %ebp, %ebx
addl $(CONFIG_PHYSICAL_ALIGN - 1), %ebx
andl $(~(CONFIG_PHYSICAL_ALIGN - 1)), %ebx
#else
movl $LOAD_PHYSICAL_ADDR, %ebx
#endif
/* Replace the compressed data size with the uncompressed size */
subl input_len(%ebp), %ebx
movl output_len(%ebp), %eax
addl %eax, %ebx
/* Add 8 bytes for every 32K input block */
shrl $12, %eax
addl %eax, %ebx
/* Add 32K + 18 bytes of extra slack */
addl $(32768 + 18), %ebx
/* Align on a 4K boundary */
addl $4095, %ebx
andl $~4095, %ebx
/* Copy the compressed kernel to the end of our buffer
* where decompression in place becomes safe.
*/
pushl %esi
leal _end(%ebp), %esi
leal _end(%ebx), %edi
movl $(_end - startup_32), %ecx
std
rep
movsb
cld
popl %esi
/* Compute the kernel start address.
*/
#ifdef CONFIG_RELOCATABLE
addl $(CONFIG_PHYSICAL_ALIGN - 1), %ebp
andl $(~(CONFIG_PHYSICAL_ALIGN - 1)), %ebp
#else
movl $LOAD_PHYSICAL_ADDR, %ebp
#endif
/* /*
* Initialize eflags. Some BIOS's leave bits like NT set. This would * Jump to the relocated address.
* confuse the debugger if this code is traced.
* XXX - best to initialize before switching to protected mode.
*/ */
pushl $0 leal relocated(%ebx), %eax
popfl jmp *%eax
.section ".text"
relocated:
/* /*
* Clear BSS * Clear BSS
*/ */
xorl %eax,%eax xorl %eax,%eax
movl $_edata,%edi leal _edata(%ebx),%edi
movl $_end,%ecx leal _end(%ebx), %ecx
subl %edi,%ecx subl %edi,%ecx
cld cld
rep rep
stosb stosb
/*
* Setup the stack for the decompressor
*/
leal stack_end(%ebx), %esp
/* /*
* Do the decompression, and jump to the new kernel.. * Do the decompression, and jump to the new kernel..
*/ */
subl $16,%esp # place for structure on the stack movl output_len(%ebx), %eax
movl %esp,%eax pushl %eax
pushl %ebp # output address
movl input_len(%ebx), %eax
pushl %eax # input_len
leal input_data(%ebx), %eax
pushl %eax # input_data
leal _end(%ebx), %eax
pushl %eax # end of the image as third argument
pushl %esi # real mode pointer as second arg pushl %esi # real mode pointer as second arg
pushl %eax # address of structure as first arg
call decompress_kernel call decompress_kernel
orl %eax,%eax addl $20, %esp
jnz 3f popl %ecx
popl %esi # discard address
popl %esi # real mode pointer #if CONFIG_RELOCATABLE
xorl %ebx,%ebx /* Find the address of the relocations.
ljmp $(__BOOT_CS), $__PHYSICAL_START */
movl %ebp, %edi
addl %ecx, %edi
/* Calculate the delta between where vmlinux was compiled to run
* and where it was actually loaded.
*/
movl %ebp, %ebx
subl $LOAD_PHYSICAL_ADDR, %ebx
jz 2f /* Nothing to be done if loaded at compiled addr. */
/*
* Process relocations.
*/
1: subl $4, %edi
movl 0(%edi), %ecx
testl %ecx, %ecx
jz 2f
addl %ebx, -__PAGE_OFFSET(%ebx, %ecx)
jmp 1b
2:
#endif
/* /*
* We come here, if we were loaded high. * Jump to the decompressed kernel.
* We need to move the move-in-place routine down to 0x1000
* and then start it with the buffer addresses in registers,
* which we got from the stack.
*/ */
3:
movl $move_routine_start,%esi
movl $0x1000,%edi
movl $move_routine_end,%ecx
subl %esi,%ecx
addl $3,%ecx
shrl $2,%ecx
cld
rep
movsl
popl %esi # discard the address
popl %ebx # real mode pointer
popl %esi # low_buffer_start
popl %ecx # lcount
popl %edx # high_buffer_start
popl %eax # hcount
movl $__PHYSICAL_START,%edi
cli # make sure we don't get interrupted
ljmp $(__BOOT_CS), $0x1000 # and jump to the move routine
/*
* Routine (template) for moving the decompressed kernel in place,
* if we were high loaded. This _must_ PIC-code !
*/
move_routine_start:
movl %ecx,%ebp
shrl $2,%ecx
rep
movsl
movl %ebp,%ecx
andl $3,%ecx
rep
movsb
movl %edx,%esi
movl %eax,%ecx # NOTE: rep movsb won't move if %ecx == 0
addl $3,%ecx
shrl $2,%ecx
rep
movsl
movl %ebx,%esi # Restore setup pointer
xorl %ebx,%ebx xorl %ebx,%ebx
ljmp $(__BOOT_CS), $__PHYSICAL_START jmp *%ebp
move_routine_end:
.bss
.balign 4
stack:
.fill 4096, 1, 0
stack_end:
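
The new startup_32 computes where to copy the compressed image so that decompressing in place cannot overrun it. It starts from the load address, subtracts the compressed size, adds the uncompressed size plus 8 bytes per 32K and 32K + 18 bytes of slack, then rounds up to 4K. A C transcription of that arithmetic follows. The function name is hypothetical, and unsigned 32-bit wraparound matches the assembly.

```c
#include <stdint.h>

/* Base address for the temporary copy of the compressed image. */
static uint32_t inplace_copy_base(uint32_t load_addr, uint32_t input_len,
                                  uint32_t output_len)
{
    uint32_t b = load_addr;

    b -= input_len;                     /* subl input_len(%ebp), %ebx */
    b += output_len;                    /* addl %eax, %ebx */
    b += output_len >> 12;              /* 8 bytes for every 32K block */
    b += 32768 + 18;                    /* worst-case block + gzip overhead */
    b = (b + 4095) & ~(uint32_t)4095;   /* align on a 4K boundary */
    return b;
}
```

Because the whole image (decompressor code plus the input_len bytes of compressed data) is copied to this base, its end lands about output_len + slack + decompressor size past the load address. That matches the extra_bytes formula in the misc.c comment.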

@ -9,11 +9,94 @@
* High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996 * High loaded stuff by Hans Lermen & Werner Almesberger, Feb. 1996
*/ */
#undef CONFIG_PARAVIRT
#include <linux/linkage.h> #include <linux/linkage.h>
#include <linux/vmalloc.h> #include <linux/vmalloc.h>
#include <linux/screen_info.h> #include <linux/screen_info.h>
#include <asm/io.h> #include <asm/io.h>
#include <asm/page.h> #include <asm/page.h>
#include <asm/boot.h>
/* WARNING!!
* This code is compiled with -fPIC and it is relocated dynamically
* at run time, but no relocation processing is performed.
* This means that it is not safe to place pointers in static structures.
*/
/*
* Getting to provable safe in place decompression is hard.
* Worst case behaviours need to be analized.
* Background information:
*
* The file layout is:
* magic[2]
* method[1]
* flags[1]
* timestamp[4]
* extraflags[1]
* os[1]
* compressed data blocks[N]
* crc[4] orig_len[4]
*
* resulting in 18 bytes of non compressed data overhead.
*
* Files divided into blocks
* 1 bit (last block flag)
* 2 bits (block type)
*
* 1 block occurs every 32K -1 bytes or when there 50% compression has been achieved.
* The smallest block type encoding is always used.
*
* stored:
* 32 bits length in bytes.
*
* fixed:
* magic fixed tree.
* symbols.
*
* dynamic:
* dynamic tree encoding.
* symbols.
*
*
* The buffer for decompression in place is the length of the
* uncompressed data, plus a small amount extra to keep the algorithm safe.
* The compressed data is placed at the end of the buffer. The output
* pointer is placed at the start of the buffer and the input pointer
* is placed where the compressed data starts. Problems will occur
* when the output pointer overruns the input pointer.
*
* The output pointer can only overrun the input pointer if the input
* pointer is moving faster than the output pointer. A condition only
* triggered by data whose compressed form is larger than the uncompressed
* form.
*
* The worst case at the block level is a growth of the compressed data
* of 5 bytes per 32767 bytes.
*
* The worst case internal to a compressed block is very hard to figure.
* The worst case can at least be boundined by having one bit that represents
* 32764 bytes and then all of the rest of the bytes representing the very
* very last byte.
*
* All of which is enough to compute an amount of extra data that is required
* to be safe. To avoid problems at the block level allocating 5 extra bytes
* per 32767 bytes of data is sufficient. To avoind problems internal to a block
* adding an extra 32767 bytes (the worst case uncompressed block size) is
* sufficient, to ensure that in the worst case the decompressed data for
* block will stop the byte before the compressed data for a block begins.
* To avoid problems with the compressed data's meta information an extra 18
* bytes are needed. Leading to the formula:
*
* extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size.
*
* Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
* Adding 32768 instead of 32767 just makes for round numbers.
* Adding the decompressor_size is necessary as it musht live after all
* of the data as well. Last I measured the decompressor is about 14K.
* 10K of actuall data and 4K of bss.
*
*/
/* /*
* gzip declarations * gzip declarations
@ -30,15 +113,20 @@ typedef unsigned char uch;
typedef unsigned short ush; typedef unsigned short ush;
typedef unsigned long ulg; typedef unsigned long ulg;
#define WSIZE 0x8000 /* Window size must be at least 32k, */ #define WSIZE 0x80000000 /* Window size must be at least 32k,
/* and a power of two */ * and a power of two
* We don't actually have a window just
* a huge output buffer so I report
* a 2G windows size, as that should
* always be larger than our output buffer.
*/
static uch *inbuf; /* input buffer */ static uch *inbuf; /* input buffer */
static uch window[WSIZE]; /* Sliding window buffer */ static uch *window; /* Sliding window buffer, (and final output buffer) */
static unsigned insize = 0; /* valid bytes in inbuf */ static unsigned insize; /* valid bytes in inbuf */
static unsigned inptr = 0; /* index of next byte to be processed in inbuf */ static unsigned inptr; /* index of next byte to be processed in inbuf */
static unsigned outcnt = 0; /* bytes in output buffer */ static unsigned outcnt; /* bytes in output buffer */
/* gzip flag byte */ /* gzip flag byte */
#define ASCII_FLAG 0x01 /* bit 0 set: file probably ASCII text */ #define ASCII_FLAG 0x01 /* bit 0 set: file probably ASCII text */
@ -89,8 +177,6 @@ extern unsigned char input_data[];
extern int input_len; extern int input_len;
static long bytes_out = 0; static long bytes_out = 0;
static uch *output_data;
static unsigned long output_ptr = 0;
static void *malloc(int size); static void *malloc(int size);
static void free(void *where); static void free(void *where);
@ -100,24 +186,17 @@ static void *memcpy(void *dest, const void *src, unsigned n);
static void putstr(const char *); static void putstr(const char *);
extern int end; static unsigned long free_mem_ptr;
static long free_mem_ptr = (long)&end; static unsigned long free_mem_end_ptr;
static long free_mem_end_ptr;
#define INPLACE_MOVE_ROUTINE 0x1000
#define LOW_BUFFER_START 0x2000
#define LOW_BUFFER_MAX 0x90000
#define HEAP_SIZE 0x3000 #define HEAP_SIZE 0x3000
static unsigned int low_buffer_end, low_buffer_size;
static int high_loaded =0;
static uch *high_buffer_start /* = (uch *)(((ulg)&end) + HEAP_SIZE)*/;
static char *vidmem = (char *)0xb8000; static char *vidmem = (char *)0xb8000;
static int vidport; static int vidport;
static int lines, cols; static int lines, cols;
#ifdef CONFIG_X86_NUMAQ #ifdef CONFIG_X86_NUMAQ
static void * xquad_portio = NULL; void *xquad_portio;
#endif #endif
#include "../../../../lib/inflate.c" #include "../../../../lib/inflate.c"
@ -151,7 +230,7 @@ static void gzip_mark(void **ptr)
static void gzip_release(void **ptr) static void gzip_release(void **ptr)
{ {
free_mem_ptr = (long) *ptr; free_mem_ptr = (unsigned long) *ptr;
} }
static void scroll(void) static void scroll(void)
@ -179,7 +258,7 @@ static void putstr(const char *s)
y--; y--;
} }
} else { } else {
vidmem [ ( x + cols * y ) * 2 ] = c; vidmem [ ( x + cols * y ) * 2 ] = c;
if ( ++x >= cols ) { if ( ++x >= cols ) {
x = 0; x = 0;
if ( ++y >= lines ) { if ( ++y >= lines ) {
@ -224,58 +303,31 @@ static void* memcpy(void* dest, const void* src, unsigned n)
*/ */
static int fill_inbuf(void) static int fill_inbuf(void)
{ {
if (insize != 0) { error("ran out of input data");
error("ran out of input data"); return 0;
}
inbuf = input_data;
insize = input_len;
inptr = 1;
return inbuf[0];
} }
/* =========================================================================== /* ===========================================================================
* Write the output window window[0..outcnt-1] and update crc and bytes_out. * Write the output window window[0..outcnt-1] and update crc and bytes_out.
* (Used for the decompressed data only.) * (Used for the decompressed data only.)
*/ */
static void flush_window_low(void)
{
ulg c = crc; /* temporary variable */
unsigned n;
uch *in, *out, ch;
in = window;
out = &output_data[output_ptr];
for (n = 0; n < outcnt; n++) {
ch = *out++ = *in++;
c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
}
crc = c;
bytes_out += (ulg)outcnt;
output_ptr += (ulg)outcnt;
outcnt = 0;
}
static void flush_window_high(void)
{
ulg c = crc; /* temporary variable */
unsigned n;
uch *in, ch;
in = window;
for (n = 0; n < outcnt; n++) {
ch = *output_data++ = *in++;
if ((ulg)output_data == low_buffer_end) output_data=high_buffer_start;
c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
}
crc = c;
bytes_out += (ulg)outcnt;
outcnt = 0;
}
static void flush_window(void) static void flush_window(void)
{ {
if (high_loaded) flush_window_high(); /* With my window equal to my output buffer
else flush_window_low(); * I only need to compute the crc here.
*/
ulg c = crc; /* temporary variable */
unsigned n;
uch *in, ch;
in = window;
for (n = 0; n < outcnt; n++) {
ch = *in++;
c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
}
crc = c;
bytes_out += (ulg)outcnt;
outcnt = 0;
} }
static void error(char *x) static void error(char *x)
@ -287,66 +339,8 @@ static void error(char *x)
while(1); /* Halt */ while(1); /* Halt */
} }
#define STACK_SIZE (4096) asmlinkage void decompress_kernel(void *rmode, unsigned long end,
uch *input_data, unsigned long input_len, uch *output)
long user_stack [STACK_SIZE];
struct {
long * a;
short b;
} stack_start = { & user_stack [STACK_SIZE] , __BOOT_DS };
static void setup_normal_output_buffer(void)
{
#ifdef STANDARD_MEMORY_BIOS_CALL
if (RM_EXT_MEM_K < 1024) error("Less than 2MB of memory");
#else
if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < 1024) error("Less than 2MB of memory");
#endif
output_data = (unsigned char *)__PHYSICAL_START; /* Normally Points to 1M */
free_mem_end_ptr = (long)real_mode;
}
struct moveparams {
uch *low_buffer_start; int lcount;
uch *high_buffer_start; int hcount;
};
static void setup_output_buffer_if_we_run_high(struct moveparams *mv)
{
high_buffer_start = (uch *)(((ulg)&end) + HEAP_SIZE);
#ifdef STANDARD_MEMORY_BIOS_CALL
if (RM_EXT_MEM_K < (3*1024)) error("Less than 4MB of memory");
#else
if ((RM_ALT_MEM_K > RM_EXT_MEM_K ? RM_ALT_MEM_K : RM_EXT_MEM_K) < (3*1024)) error("Less than 4MB of memory");
#endif
mv->low_buffer_start = output_data = (unsigned char *)LOW_BUFFER_START;
low_buffer_end = ((unsigned int)real_mode > LOW_BUFFER_MAX
? LOW_BUFFER_MAX : (unsigned int)real_mode) & ~0xfff;
low_buffer_size = low_buffer_end - LOW_BUFFER_START;
high_loaded = 1;
free_mem_end_ptr = (long)high_buffer_start;
if ( (__PHYSICAL_START + low_buffer_size) > ((ulg)high_buffer_start)) {
high_buffer_start = (uch *)(__PHYSICAL_START + low_buffer_size);
mv->hcount = 0; /* say: we need not to move high_buffer */
}
else mv->hcount = -1;
mv->high_buffer_start = high_buffer_start;
}
static void close_output_buffer_if_we_run_high(struct moveparams *mv)
{
if (bytes_out > low_buffer_size) {
mv->lcount = low_buffer_size;
if (mv->hcount)
mv->hcount = bytes_out - low_buffer_size;
} else {
mv->lcount = bytes_out;
mv->hcount = 0;
}
}
asmlinkage int decompress_kernel(struct moveparams *mv, void *rmode)
{ {
real_mode = rmode; real_mode = rmode;
@ -361,13 +355,25 @@ asmlinkage int decompress_kernel(struct moveparams *mv, void *rmode)
lines = RM_SCREEN_INFO.orig_video_lines; lines = RM_SCREEN_INFO.orig_video_lines;
cols = RM_SCREEN_INFO.orig_video_cols; cols = RM_SCREEN_INFO.orig_video_cols;
if (free_mem_ptr < 0x100000) setup_normal_output_buffer(); window = output; /* Output buffer (Normally at 1M) */
else setup_output_buffer_if_we_run_high(mv); free_mem_ptr = end; /* Heap */
free_mem_end_ptr = end + HEAP_SIZE;
inbuf = input_data; /* Input buffer */
insize = input_len;
inptr = 0;
if ((u32)output & (CONFIG_PHYSICAL_ALIGN -1))
error("Destination address not CONFIG_PHYSICAL_ALIGN aligned");
if (end > ((-__PAGE_OFFSET-(512 <<20)-1) & 0x7fffffff))
error("Destination address too large");
#ifndef CONFIG_RELOCATABLE
if ((u32)output != LOAD_PHYSICAL_ADDR)
error("Wrong destination address");
#endif
makecrc(); makecrc();
putstr("Uncompressing Linux... "); putstr("Uncompressing Linux... ");
gunzip(); gunzip();
putstr("Ok, booting the kernel.\n"); putstr("Ok, booting the kernel.\n");
if (high_loaded) close_output_buffer_if_we_run_high(mv); return;
return high_loaded;
} }
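After this change `flush_window()` no longer copies anything, because the inflate window is the output buffer itself; it only folds each output byte into the running CRC. Below is a minimal standalone sketch of that table-driven update. It assumes the standard reflected CRC-32 polynomial 0xEDB88320 that gzip uses. It also assumes gzip's convention of starting from all ones and inverting at the end, which is not shown in this diff. `make_crc_table` and `crc32_gzip` are hypothetical helpers; `make_crc_table` stands in for `makecrc()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256-entry lookup table, filled once before any update. */
static uint32_t crc_32_tab[256];

/* Hypothetical stand-in for makecrc(): table for the reflected
 * CRC-32 polynomial 0xEDB88320 (the one gzip uses). */
static void make_crc_table(void)
{
	for (uint32_t n = 0; n < 256; n++) {
		uint32_t c = n;
		for (int k = 0; k < 8; k++)
			c = (c & 1) ? 0xedb88320u ^ (c >> 1) : c >> 1;
		crc_32_tab[n] = c;
	}
}

/* The per-byte step flush_window() applies to window[0..outcnt-1]:
 * no bytes are copied, only the running CRC changes. */
static uint32_t crc_update(uint32_t c, const unsigned char *in, size_t n)
{
	assert(in != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		c = crc_32_tab[(c ^ in[i]) & 0xff] ^ (c >> 8);
	return c;
}

/* gzip convention (assumed): start from all ones, invert the result. */
static uint32_t crc32_gzip(const void *buf, size_t n)
{
	return crc_update(0xffffffffu, buf, n) ^ 0xffffffffu;
}
```

The update depends only on the running value. Feeding the output one window at a time, as `flush_window()` does for each `outcnt` chunk, therefore gives the same result as a single pass over the whole image.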

@ -0,0 +1,625 @@
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <elf.h>
#include <byteswap.h>
#define USE_BSD
#include <endian.h>
#define MAX_SHDRS 100
static Elf32_Ehdr ehdr;
static Elf32_Shdr shdr[MAX_SHDRS];
static Elf32_Sym *symtab[MAX_SHDRS];
static Elf32_Rel *reltab[MAX_SHDRS];
static char *strtab[MAX_SHDRS];
static unsigned long reloc_count, reloc_idx;
static unsigned long *relocs;
/*
* The following symbols have been audited. Their values are constant and
* do not change if bzImage is loaded at a different physical address than
* the address for which it has been compiled. Don't warn the user about
* absolute relocations against these symbols.
*/
static const char* safe_abs_relocs[] = {
"__kernel_vsyscall",
"__kernel_rt_sigreturn",
"__kernel_sigreturn",
"SYSENTER_RETURN",
};
static int is_safe_abs_reloc(const char* sym_name)
{
int i, array_size;
array_size = sizeof(safe_abs_relocs)/sizeof(char*);
for(i = 0; i < array_size; i++) {
if (!strcmp(sym_name, safe_abs_relocs[i]))
/* Match found */
return 1;
}
return 0;
}
static void die(char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
vfprintf(stderr, fmt, ap);
va_end(ap);
exit(1);
}
static const char *sym_type(unsigned type)
{
static const char *type_name[] = {
#define SYM_TYPE(X) [X] = #X
SYM_TYPE(STT_NOTYPE),
SYM_TYPE(STT_OBJECT),
SYM_TYPE(STT_FUNC),
SYM_TYPE(STT_SECTION),
SYM_TYPE(STT_FILE),
SYM_TYPE(STT_COMMON),
SYM_TYPE(STT_TLS),
#undef SYM_TYPE
};
const char *name = "unknown sym type name";
if (type < sizeof(type_name)/sizeof(type_name[0])) {
name = type_name[type];
}
return name;
}
static const char *sym_bind(unsigned bind)
{
static const char *bind_name[] = {
#define SYM_BIND(X) [X] = #X
SYM_BIND(STB_LOCAL),
SYM_BIND(STB_GLOBAL),
SYM_BIND(STB_WEAK),
#undef SYM_BIND
};
const char *name = "unknown sym bind name";
if (bind < sizeof(bind_name)/sizeof(bind_name[0])) {
name = bind_name[bind];
}
return name;
}
static const char *sym_visibility(unsigned visibility)
{
static const char *visibility_name[] = {
#define SYM_VISIBILITY(X) [X] = #X
SYM_VISIBILITY(STV_DEFAULT),
SYM_VISIBILITY(STV_INTERNAL),
SYM_VISIBILITY(STV_HIDDEN),
SYM_VISIBILITY(STV_PROTECTED),
#undef SYM_VISIBILITY
};
const char *name = "unknown sym visibility name";
if (visibility < sizeof(visibility_name)/sizeof(visibility_name[0])) {
name = visibility_name[visibility];
}
return name;
}
static const char *rel_type(unsigned type)
{
static const char *type_name[] = {
#define REL_TYPE(X) [X] = #X
REL_TYPE(R_386_NONE),
REL_TYPE(R_386_32),
REL_TYPE(R_386_PC32),
REL_TYPE(R_386_GOT32),
REL_TYPE(R_386_PLT32),
REL_TYPE(R_386_COPY),
REL_TYPE(R_386_GLOB_DAT),
REL_TYPE(R_386_JMP_SLOT),
REL_TYPE(R_386_RELATIVE),
REL_TYPE(R_386_GOTOFF),
REL_TYPE(R_386_GOTPC),
#undef REL_TYPE
};
const char *name = "unknown type rel type name";
if (type < sizeof(type_name)/sizeof(type_name[0])) {
name = type_name[type];
}
return name;
}
static const char *sec_name(unsigned shndx)
{
const char *sec_strtab;
const char *name;
sec_strtab = strtab[ehdr.e_shstrndx];
name = "<noname>";
if (shndx < ehdr.e_shnum) {
name = sec_strtab + shdr[shndx].sh_name;
}
else if (shndx == SHN_ABS) {
name = "ABSOLUTE";
}
else if (shndx == SHN_COMMON) {
name = "COMMON";
}
return name;
}
static const char *sym_name(const char *sym_strtab, Elf32_Sym *sym)
{
const char *name;
name = "<noname>";
if (sym->st_name) {
name = sym_strtab + sym->st_name;
}
else {
name = sec_name(sym->st_shndx);
}
return name;
}
#if BYTE_ORDER == LITTLE_ENDIAN
#define le16_to_cpu(val) (val)
#define le32_to_cpu(val) (val)
#endif
#if BYTE_ORDER == BIG_ENDIAN
#define le16_to_cpu(val) bswap_16(val)
#define le32_to_cpu(val) bswap_32(val)
#endif
static uint16_t elf16_to_cpu(uint16_t val)
{
return le16_to_cpu(val);
}
static uint32_t elf32_to_cpu(uint32_t val)
{
return le32_to_cpu(val);
}
static void read_ehdr(FILE *fp)
{
if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1) {
die("Cannot read ELF header: %s\n",
strerror(errno));
}
if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
die("No ELF magic\n");
}
if (ehdr.e_ident[EI_CLASS] != ELFCLASS32) {
die("Not a 32 bit executable\n");
}
if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
die("Not an LSB ELF executable\n");
}
if (ehdr.e_ident[EI_VERSION] != EV_CURRENT) {
die("Unknown ELF version\n");
}
/* Convert the fields to native endian */
ehdr.e_type = elf16_to_cpu(ehdr.e_type);
ehdr.e_machine = elf16_to_cpu(ehdr.e_machine);
ehdr.e_version = elf32_to_cpu(ehdr.e_version);
ehdr.e_entry = elf32_to_cpu(ehdr.e_entry);
ehdr.e_phoff = elf32_to_cpu(ehdr.e_phoff);
ehdr.e_shoff = elf32_to_cpu(ehdr.e_shoff);
ehdr.e_flags = elf32_to_cpu(ehdr.e_flags);
ehdr.e_ehsize = elf16_to_cpu(ehdr.e_ehsize);
ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
ehdr.e_phnum = elf16_to_cpu(ehdr.e_phnum);
ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
ehdr.e_shnum = elf16_to_cpu(ehdr.e_shnum);
ehdr.e_shstrndx = elf16_to_cpu(ehdr.e_shstrndx);
if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
die("Unsupported ELF header type\n");
}
if (ehdr.e_machine != EM_386) {
die("Not for x86\n");
}
if (ehdr.e_version != EV_CURRENT) {
die("Unknown ELF version\n");
}
if (ehdr.e_ehsize != sizeof(Elf32_Ehdr)) {
die("Bad Elf header size\n");
}
if (ehdr.e_phentsize != sizeof(Elf32_Phdr)) {
die("Bad program header entry\n");
}
if (ehdr.e_shentsize != sizeof(Elf32_Shdr)) {
die("Bad section header entry\n");
}
if (ehdr.e_shstrndx >= ehdr.e_shnum) {
die("String table index out of bounds\n");
}
}
static void read_shdrs(FILE *fp)
{
int i;
if (ehdr.e_shnum > MAX_SHDRS) {
die("%d section headers found, but only %d supported\n",
ehdr.e_shnum, MAX_SHDRS);
}
if (fseek(fp, ehdr.e_shoff, SEEK_SET) < 0) {
die("Seek to %d failed: %s\n",
ehdr.e_shoff, strerror(errno));
}
if (fread(&shdr, sizeof(shdr[0]), ehdr.e_shnum, fp) != ehdr.e_shnum) {
die("Cannot read ELF section headers: %s\n",
strerror(errno));
}
for(i = 0; i < ehdr.e_shnum; i++) {
shdr[i].sh_name = elf32_to_cpu(shdr[i].sh_name);
shdr[i].sh_type = elf32_to_cpu(shdr[i].sh_type);
shdr[i].sh_flags = elf32_to_cpu(shdr[i].sh_flags);
shdr[i].sh_addr = elf32_to_cpu(shdr[i].sh_addr);
shdr[i].sh_offset = elf32_to_cpu(shdr[i].sh_offset);
shdr[i].sh_size = elf32_to_cpu(shdr[i].sh_size);
shdr[i].sh_link = elf32_to_cpu(shdr[i].sh_link);
shdr[i].sh_info = elf32_to_cpu(shdr[i].sh_info);
shdr[i].sh_addralign = elf32_to_cpu(shdr[i].sh_addralign);
shdr[i].sh_entsize = elf32_to_cpu(shdr[i].sh_entsize);
}
}
static void read_strtabs(FILE *fp)
{
int i;
for(i = 0; i < ehdr.e_shnum; i++) {
if (shdr[i].sh_type != SHT_STRTAB) {
continue;
}
strtab[i] = malloc(shdr[i].sh_size);
if (!strtab[i]) {
die("malloc of %d bytes for strtab failed\n",
shdr[i].sh_size);
}
if (fseek(fp, shdr[i].sh_offset, SEEK_SET) < 0) {
die("Seek to %d failed: %s\n",
shdr[i].sh_offset, strerror(errno));
}
if (fread(strtab[i], 1, shdr[i].sh_size, fp) != shdr[i].sh_size) {
die("Cannot read string table: %s\n",
strerror(errno));
}
}
}
static void read_symtabs(FILE *fp)
{
int i,j;
for(i = 0; i < ehdr.e_shnum; i++) {
if (shdr[i].sh_type != SHT_SYMTAB) {
continue;
}
symtab[i] = malloc(shdr[i].sh_size);
if (!symtab[i]) {
die("malloc of %d bytes for symtab failed\n",
shdr[i].sh_size);
}
if (fseek(fp, shdr[i].sh_offset, SEEK_SET) < 0) {
die("Seek to %d failed: %s\n",
shdr[i].sh_offset, strerror(errno));
}
if (fread(symtab[i], 1, shdr[i].sh_size, fp) != shdr[i].sh_size) {
die("Cannot read symbol table: %s\n",
strerror(errno));
}
for(j = 0; j < shdr[i].sh_size/sizeof(symtab[i][0]); j++) {
symtab[i][j].st_name = elf32_to_cpu(symtab[i][j].st_name);
symtab[i][j].st_value = elf32_to_cpu(symtab[i][j].st_value);
symtab[i][j].st_size = elf32_to_cpu(symtab[i][j].st_size);
symtab[i][j].st_shndx = elf16_to_cpu(symtab[i][j].st_shndx);
}
}
}
static void read_relocs(FILE *fp)
{
int i,j;
for(i = 0; i < ehdr.e_shnum; i++) {
if (shdr[i].sh_type != SHT_REL) {
continue;
}
reltab[i] = malloc(shdr[i].sh_size);
if (!reltab[i]) {
die("malloc of %d bytes for relocs failed\n",
shdr[i].sh_size);
}
if (fseek(fp, shdr[i].sh_offset, SEEK_SET) < 0) {
die("Seek to %d failed: %s\n",
shdr[i].sh_offset, strerror(errno));
}
if (fread(reltab[i], 1, shdr[i].sh_size, fp) != shdr[i].sh_size) {
die("Cannot read relocation table: %s\n",
strerror(errno));
}
for(j = 0; j < shdr[i].sh_size/sizeof(reltab[0][0]); j++) {
reltab[i][j].r_offset = elf32_to_cpu(reltab[i][j].r_offset);
reltab[i][j].r_info = elf32_to_cpu(reltab[i][j].r_info);
}
}
}
static void print_absolute_symbols(void)
{
int i;
printf("Absolute symbols\n");
printf(" Num: Value Size Type Bind Visibility Name\n");
for(i = 0; i < ehdr.e_shnum; i++) {
char *sym_strtab;
Elf32_Sym *sh_symtab;
int j;
if (shdr[i].sh_type != SHT_SYMTAB) {
continue;
}
sh_symtab = symtab[i];
sym_strtab = strtab[shdr[i].sh_link];
for(j = 0; j < shdr[i].sh_size/sizeof(symtab[0][0]); j++) {
Elf32_Sym *sym;
const char *name;
sym = &symtab[i][j];
name = sym_name(sym_strtab, sym);
if (sym->st_shndx != SHN_ABS) {
continue;
}
printf("%5d %08x %5d %10s %10s %12s %s\n",
j, sym->st_value, sym->st_size,
sym_type(ELF32_ST_TYPE(sym->st_info)),
sym_bind(ELF32_ST_BIND(sym->st_info)),
sym_visibility(ELF32_ST_VISIBILITY(sym->st_other)),
name);
}
}
printf("\n");
}
static void print_absolute_relocs(void)
{
int i, printed = 0;
for(i = 0; i < ehdr.e_shnum; i++) {
char *sym_strtab;
Elf32_Sym *sh_symtab;
unsigned sec_applies, sec_symtab;
int j;
if (shdr[i].sh_type != SHT_REL) {
continue;
}
sec_symtab = shdr[i].sh_link;
sec_applies = shdr[i].sh_info;
if (!(shdr[sec_applies].sh_flags & SHF_ALLOC)) {
continue;
}
sh_symtab = symtab[sec_symtab];
sym_strtab = strtab[shdr[sec_symtab].sh_link];
for(j = 0; j < shdr[i].sh_size/sizeof(reltab[0][0]); j++) {
Elf32_Rel *rel;
Elf32_Sym *sym;
const char *name;
rel = &reltab[i][j];
sym = &sh_symtab[ELF32_R_SYM(rel->r_info)];
name = sym_name(sym_strtab, sym);
if (sym->st_shndx != SHN_ABS) {
continue;
}
/* Absolute symbols are not relocated if bzImage is
* loaded at an address other than the one it was
* compiled for, so warn the user at build time about
* any absolute relocations present.
*
* Users need to audit the code to make sure that
* symbols which should have been section-relative
* have not become absolute because of a linker
* optimization or incorrect usage.
*
* Before warning, check whether this absolute symbol
* relocation is known to be harmless.
*/
if (is_safe_abs_reloc(name))
continue;
if (!printed) {
printf("WARNING: Absolute relocations"
" present\n");
printf("Offset Info Type Sym.Value "
"Sym.Name\n");
printed = 1;
}
printf("%08x %08x %10s %08x %s\n",
rel->r_offset,
rel->r_info,
rel_type(ELF32_R_TYPE(rel->r_info)),
sym->st_value,
name);
}
}
if (printed)
printf("\n");
}
static void walk_relocs(void (*visit)(Elf32_Rel *rel, Elf32_Sym *sym))
{
int i;
/* Walk through the relocations */
for(i = 0; i < ehdr.e_shnum; i++) {
char *sym_strtab;
Elf32_Sym *sh_symtab;
unsigned sec_applies, sec_symtab;
int j;
if (shdr[i].sh_type != SHT_REL) {
continue;
}
sec_symtab = shdr[i].sh_link;
sec_applies = shdr[i].sh_info;
if (!(shdr[sec_applies].sh_flags & SHF_ALLOC)) {
continue;
}
sh_symtab = symtab[sec_symtab];
sym_strtab = strtab[shdr[sec_symtab].sh_link];
for(j = 0; j < shdr[i].sh_size/sizeof(reltab[0][0]); j++) {
Elf32_Rel *rel;
Elf32_Sym *sym;
unsigned r_type;
rel = &reltab[i][j];
sym = &sh_symtab[ELF32_R_SYM(rel->r_info)];
r_type = ELF32_R_TYPE(rel->r_info);
/* Don't visit relocations to absolute symbols */
if (sym->st_shndx == SHN_ABS) {
continue;
}
if (r_type == R_386_PC32) {
/* PC relative relocations don't need to be adjusted */
}
else if (r_type == R_386_32) {
/* Visit relocations that need to be adjusted */
visit(rel, sym);
}
else {
die("Unsupported relocation type: %d\n", r_type);
}
}
}
}
static void count_reloc(Elf32_Rel *rel, Elf32_Sym *sym)
{
reloc_count += 1;
}
static void collect_reloc(Elf32_Rel *rel, Elf32_Sym *sym)
{
/* Remember the address that needs to be adjusted. */
relocs[reloc_idx++] = rel->r_offset;
}
static int cmp_relocs(const void *va, const void *vb)
{
const unsigned long *a, *b;
a = va; b = vb;
return (*a == *b)? 0 : (*a > *b)? 1 : -1;
}
static void emit_relocs(int as_text)
{
int i;
/* Count how many relocations I have and allocate space for them. */
reloc_count = 0;
walk_relocs(count_reloc);
relocs = malloc(reloc_count * sizeof(relocs[0]));
if (!relocs) {
die("malloc of %lu entries for relocs failed\n",
reloc_count);
}
/* Collect up the relocations */
reloc_idx = 0;
walk_relocs(collect_reloc);
/* Order the relocations for more efficient processing */
qsort(relocs, reloc_count, sizeof(relocs[0]), cmp_relocs);
/* Print the relocations */
if (as_text) {
/* Print the relocations in a form that gas will accept. */
printf(".section \".data.reloc\",\"a\"\n");
printf(".balign 4\n");
for(i = 0; i < reloc_count; i++) {
printf("\t .long 0x%08lx\n", relocs[i]);
}
printf("\n");
}
else {
unsigned char buf[4];
buf[0] = buf[1] = buf[2] = buf[3] = 0;
/* Print a stop */
printf("%c%c%c%c", buf[0], buf[1], buf[2], buf[3]);
/* Now print each relocation */
for(i = 0; i < reloc_count; i++) {
buf[0] = (relocs[i] >> 0) & 0xff;
buf[1] = (relocs[i] >> 8) & 0xff;
buf[2] = (relocs[i] >> 16) & 0xff;
buf[3] = (relocs[i] >> 24) & 0xff;
printf("%c%c%c%c", buf[0], buf[1], buf[2], buf[3]);
}
}
}
static void usage(void)
{
die("relocs [--abs-syms | --abs-relocs | --text] vmlinux\n");
}
int main(int argc, char **argv)
{
int show_absolute_syms, show_absolute_relocs;
int as_text;
const char *fname;
FILE *fp;
int i;
show_absolute_syms = 0;
show_absolute_relocs = 0;
as_text = 0;
fname = NULL;
for(i = 1; i < argc; i++) {
char *arg = argv[i];
if (*arg == '-') {
if (strcmp(arg, "--abs-syms") == 0) {
show_absolute_syms = 1;
continue;
}
if (strcmp(arg, "--abs-relocs") == 0) {
show_absolute_relocs = 1;
continue;
}
if (strcmp(arg, "--text") == 0) {
as_text = 1;
continue;
}
}
else if (!fname) {
fname = arg;
continue;
}
usage();
}
if (!fname) {
usage();
}
fp = fopen(fname, "r");
if (!fp) {
die("Cannot open %s: %s\n",
fname, strerror(errno));
}
read_ehdr(fp);
read_shdrs(fp);
read_strtabs(fp);
read_symtabs(fp);
read_relocs(fp);
if (show_absolute_syms) {
print_absolute_symbols();
return 0;
}
if (show_absolute_relocs) {
print_absolute_relocs();
return 0;
}
emit_relocs(as_text);
return 0;
}
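In binary mode, `emit_relocs()` writes a zero "stop" word followed by the sorted addresses of every `R_386_32` relocation, each as a little-endian 32-bit value. Below is a minimal sketch of how a loader could consume such a table. It assumes the loader walks backward from the end of the table until it reaches the stop word, so a zero emitted first serves as the terminator. It also assumes each entry is a link-time address; subtracting a hypothetical `link_base` gives an offset into the loaded image. `apply_relocs`, `link_base` and `delta` are illustrative names and do not come from the decompressor's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian helpers matching the byte order emit_relocs() writes. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/*
 * Hypothetical consumer: walk the table backward from its end, adding
 * `delta` (load address minus link address) to each 32-bit word an entry
 * points at, until the zero stop word. Returns the number of relocations
 * applied, or -1 for an out-of-range entry or a missing stop word.
 */
static long apply_relocs(unsigned char *image, size_t image_len,
			 const unsigned char *table, size_t table_len,
			 uint32_t link_base, uint32_t delta)
{
	long applied = 0;
	size_t pos = table_len;

	while (pos >= 4) {
		uint32_t where;

		pos -= 4;
		where = get_le32(table + pos);
		if (where == 0)
			return applied;		/* reached the stop word */
		where -= link_base;		/* link address -> buffer offset */
		if (image_len < 4 || where > image_len - 4)
			return -1;		/* entry points outside the image */
		put_le32(image + where, get_le32(image + where) + delta);
		applied++;
	}
	return -1;				/* no stop word found */
}
```

Only `R_386_32` entries need this adjustment, because `walk_relocs()` skips `R_386_PC32`: PC-relative references stay valid when the whole image moves by the same amount.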

@ -0,0 +1,43 @@
OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(startup_32)
SECTIONS
{
/* Be careful: parts of head.S assume startup_32 is at
* address 0.
*/
. = 0 ;
.text.head : {
_head = . ;
*(.text.head)
_ehead = . ;
}
.data.compressed : {
*(.data.compressed)
}
.text : {
_text = .; /* Text */
*(.text)
*(.text.*)
_etext = . ;
}
.rodata : {
_rodata = . ;
*(.rodata) /* read-only data */
*(.rodata.*)
_erodata = . ;
}
.data : {
_data = . ;
*(.data)
*(.data.*)
_edata = . ;
}
.bss : {
_bss = . ;
*(.bss)
*(.bss.*)
*(COMMON)
_end = . ;
}
}

@ -1,9 +1,10 @@
SECTIONS SECTIONS
{ {
.data : { .data.compressed : {
input_len = .; input_len = .;
LONG(input_data_end - input_data) input_data = .; LONG(input_data_end - input_data) input_data = .;
*(.data) *(.data)
output_len = . - 4;
input_data_end = .; input_data_end = .;
} }
} }
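This sketch assumes the `.data` input collected here is the gzip-compressed kernel image. On that assumption, `output_len = . - 4` labels the last four bytes of the gzip stream. RFC 1952 defines those bytes as ISIZE: the uncompressed length modulo 2^32, stored little-endian right after the CRC-32. Below is a minimal sketch of reading that trailer from a buffer. `gzip_isize` is a hypothetical helper, and the 18-byte minimum is the fixed 10-byte header plus the 8-byte trailer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A gzip member ends with CRC32 (4 bytes) then ISIZE (4 bytes), both
 * little-endian. Store ISIZE in *out and return 0, or return -1 if the
 * buffer lacks the gzip magic or is too short for header plus trailer.
 */
static int gzip_isize(const unsigned char *gz, size_t len, uint32_t *out)
{
	const unsigned char *p;

	if (len < 18 || gz[0] != 0x1f || gz[1] != 0x8b)
		return -1;
	p = gz + len - 4;		/* same spot as `output_len = . - 4` */
	*out = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
	return 0;
}
```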

@ -81,7 +81,7 @@ start:
# This is the setup header, and it must start at %cs:2 (old 0x9020:2) # This is the setup header, and it must start at %cs:2 (old 0x9020:2)
.ascii "HdrS" # header signature .ascii "HdrS" # header signature
.word 0x0204 # header version number (>= 0x0105) .word 0x0205 # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail) # or else old loadlin-1.5 will fail)
realmode_swtch: .word 0, 0 # default_switch, SETUPSEG realmode_swtch: .word 0, 0 # default_switch, SETUPSEG
start_sys_seg: .word SYSSEG start_sys_seg: .word SYSSEG
@ -160,6 +160,17 @@ ramdisk_max: .long (-__PAGE_OFFSET-(512 << 20)-1) & 0x7fffffff
# The highest safe address for # The highest safe address for
# the contents of an initrd # the contents of an initrd
kernel_alignment: .long CONFIG_PHYSICAL_ALIGN #physical addr alignment
#required for protected mode
#kernel
#ifdef CONFIG_RELOCATABLE
relocatable_kernel: .byte 1
#else
relocatable_kernel: .byte 0
#endif
pad2: .byte 0
pad3: .word 0
trampoline: call start_of_setup trampoline: call start_of_setup
.align 16 .align 16
# The offset at this point is 0x240 # The offset at this point is 0x240
@ -588,11 +599,6 @@ rmodeswtch_normal:
call default_switch call default_switch
rmodeswtch_end: rmodeswtch_end:
# we get the code32 start address and modify the below 'jmpi'
# (loader may have changed it)
movl %cs:code32_start, %eax
movl %eax, %cs:code32
# Now we move the system to its rightful place ... but we check if we have a # Now we move the system to its rightful place ... but we check if we have a
# big-kernel. In that case we *must* not move it ... # big-kernel. In that case we *must* not move it ...
testb $LOADED_HIGH, %cs:loadflags testb $LOADED_HIGH, %cs:loadflags
@ -788,11 +794,12 @@ a20_err_msg:
a20_done: a20_done:
#endif /* CONFIG_X86_VOYAGER */ #endif /* CONFIG_X86_VOYAGER */
# set up gdt and idt # set up gdt and idt and 32bit start address
lidt idt_48 # load idt with 0,0 lidt idt_48 # load idt with 0,0
xorl %eax, %eax # Compute gdt_base xorl %eax, %eax # Compute gdt_base
movw %ds, %ax # (Convert %ds:gdt to a linear ptr) movw %ds, %ax # (Convert %ds:gdt to a linear ptr)
shll $4, %eax shll $4, %eax
addl %eax, code32
addl $gdt, %eax addl $gdt, %eax
movl %eax, (gdt_48+2) movl %eax, (gdt_48+2)
lgdt gdt_48 # load gdt with whatever is lgdt gdt_48 # load gdt with whatever is
@ -851,9 +858,26 @@ flush_instr:
# Manual, Mixing 16-bit and 32-bit code, page 16-6) # Manual, Mixing 16-bit and 32-bit code, page 16-6)
.byte 0x66, 0xea # prefix + jmpi-opcode .byte 0x66, 0xea # prefix + jmpi-opcode
code32: .long 0x1000 # will be set to 0x100000 code32: .long startup_32 # will be set to %cs+startup_32
# for big kernels
.word __BOOT_CS .word __BOOT_CS
.code32
startup_32:
movl $(__BOOT_DS), %eax
movl %eax, %ds
movl %eax, %es
movl %eax, %fs
movl %eax, %gs
movl %eax, %ss
xorl %eax, %eax
1: incl %eax # check that A20 really IS enabled
movl %eax, 0x00000000 # loop forever if it isn't
cmpl %eax, 0x00100000
je 1b
# Jump to the 32bit entry point
jmpl *(code32_start - start + (DELTA_INITSEG << 4))(%esi)
.code16
# Here's a bunch of information about your current kernel.. # Here's a bunch of information about your current kernel..
kernel_version: .ascii UTS_RELEASE kernel_version: .ascii UTS_RELEASE
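Header version 0x0205 adds the `kernel_alignment` and `relocatable_kernel` fields. Below is a minimal sketch of how a boot loader might use them. It assumes only the semantics described for protocol 2.05: a kernel with `relocatable_kernel` set may be loaded at any address that satisfies `kernel_alignment`, and any other kernel goes at its fixed `code32_start`. `struct setup_hdr_sketch` is a hypothetical subset that does not reproduce the real header layout or offsets. `is_aligned` mirrors the `CONFIG_PHYSICAL_ALIGN` check that `decompress_kernel()` now performs on its output pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the setup header; not the real field layout. */
struct setup_hdr_sketch {
	uint16_t version;		/* e.g. 0x0205 */
	uint32_t code32_start;		/* fixed 32-bit load/entry address */
	uint32_t kernel_alignment;	/* CONFIG_PHYSICAL_ALIGN */
	uint8_t  relocatable_kernel;	/* 1 if CONFIG_RELOCATABLE */
};

/* Round addr up to a power-of-two alignment. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Same test decompress_kernel() applies: low bits must be clear. */
static int is_aligned(uint32_t addr, uint32_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Honour the requested address only for a relocatable kernel that
 * advertises protocol 2.05 or later; otherwise use code32_start. */
static uint32_t choose_load_addr(const struct setup_hdr_sketch *h,
				 uint32_t wanted)
{
	if (h->version >= 0x0205 && h->relocatable_kernel)
		return align_up(wanted, h->kernel_alignment);
	return h->code32_start;
}
```

The version check comes first because a loader cannot trust these fields in an older header, where they do not exist.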

@ -1,7 +1,7 @@
# #
# Automatically generated make config: don't edit # Automatically generated make config: don't edit
# Linux kernel version: 2.6.19-rc2-git4 # Linux kernel version: 2.6.19-git7
# Sat Oct 21 03:38:56 2006 # Wed Dec 6 23:50:49 2006
# #
CONFIG_X86_32=y CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y CONFIG_GENERIC_TIME=y
@ -40,13 +40,14 @@ CONFIG_POSIX_MQUEUE=y
CONFIG_IKCONFIG=y CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set # CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set # CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE="" CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set # CONFIG_EMBEDDED is not set
CONFIG_UID16=y CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set
@ -110,6 +111,7 @@ CONFIG_SMP=y
# CONFIG_X86_VISWS is not set # CONFIG_X86_VISWS is not set
CONFIG_X86_GENERICARCH=y CONFIG_X86_GENERICARCH=y
# CONFIG_X86_ES7000 is not set # CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
CONFIG_X86_CYCLONE_TIMER=y CONFIG_X86_CYCLONE_TIMER=y
# CONFIG_M386 is not set # CONFIG_M386 is not set
# CONFIG_M486 is not set # CONFIG_M486 is not set
@ -120,6 +122,7 @@ CONFIG_X86_CYCLONE_TIMER=y
# CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMII is not set
CONFIG_MPENTIUMIII=y CONFIG_MPENTIUMIII=y
# CONFIG_MPENTIUMM is not set # CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set # CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set # CONFIG_MK6 is not set
# CONFIG_MK7 is not set # CONFIG_MK7 is not set
@ -197,7 +200,6 @@ CONFIG_RESOURCES_64BIT=y
CONFIG_MTRR=y CONFIG_MTRR=y
# CONFIG_EFI is not set # CONFIG_EFI is not set
# CONFIG_IRQBALANCE is not set # CONFIG_IRQBALANCE is not set
CONFIG_REGPARM=y
CONFIG_SECCOMP=y CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set # CONFIG_HZ_100 is not set
CONFIG_HZ_250=y CONFIG_HZ_250=y
@ -205,7 +207,8 @@ CONFIG_HZ_250=y
CONFIG_HZ=250 CONFIG_HZ=250
# CONFIG_KEXEC is not set # CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set # CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x100000 # CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x100000
# CONFIG_HOTPLUG_CPU is not set # CONFIG_HOTPLUG_CPU is not set
CONFIG_COMPAT_VDSO=y CONFIG_COMPAT_VDSO=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
@ -367,6 +370,7 @@ CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set # CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic" CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set # CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set # CONFIG_IPV6_ROUTER_PREF is not set
@ -677,6 +681,7 @@ CONFIG_SATA_INTEL_COMBINED=y
# CONFIG_PATA_IT821X is not set # CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set # CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_TRIFLEX is not set # CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_MPIIX is not set # CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set # CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set # CONFIG_PATA_NETCELL is not set
@ -850,6 +855,7 @@ CONFIG_BNX2=y
# CONFIG_IXGB is not set # CONFIG_IXGB is not set
# CONFIG_S2IO is not set # CONFIG_S2IO is not set
# CONFIG_MYRI10GE is not set # CONFIG_MYRI10GE is not set
# CONFIG_NETXEN_NIC is not set
# #
# Token Ring devices # Token Ring devices
@ -984,10 +990,6 @@ CONFIG_RTC=y
# CONFIG_R3964 is not set # CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set # CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set # CONFIG_SONYPI is not set
#
# Ftape, the floppy tape device driver
#
CONFIG_AGP=y CONFIG_AGP=y
# CONFIG_AGP_ALI is not set # CONFIG_AGP_ALI is not set
# CONFIG_AGP_ATI is not set # CONFIG_AGP_ATI is not set
@ -1108,6 +1110,7 @@ CONFIG_USB_DEVICEFS=y
# CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set # CONFIG_USB_SUSPEND is not set
# CONFIG_USB_MULTITHREAD_PROBE is not set
# CONFIG_USB_OTG is not set # CONFIG_USB_OTG is not set
# #
@ -1185,6 +1188,7 @@ CONFIG_USB_HIDINPUT=y
# CONFIG_USB_KAWETH is not set # CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set # CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set # CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET_MII is not set
# CONFIG_USB_USBNET is not set # CONFIG_USB_USBNET is not set
CONFIG_USB_MON=y CONFIG_USB_MON=y

@ -6,7 +6,7 @@ extra-y := head.o init_task.o vmlinux.lds
obj-y := process.o signal.o entry.o traps.o irq.o \ obj-y := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \ ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o bootflag.o \ pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
quirks.o i8237.o topology.o alternative.o i8253.o tsc.o quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-$(CONFIG_STACKTRACE) += stacktrace.o
@ -40,6 +40,9 @@ obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-$(CONFIG_HPET_TIMER) += hpet.o obj-$(CONFIG_HPET_TIMER) += hpet.o
obj-$(CONFIG_K8_NB) += k8.o obj-$(CONFIG_K8_NB) += k8.o
# Make sure this is linked after any other paravirt_ops structs: see head.S
obj-$(CONFIG_PARAVIRT) += paravirt.o
EXTRA_AFLAGS := -traditional EXTRA_AFLAGS := -traditional
obj-$(CONFIG_SCx200) += scx200.o obj-$(CONFIG_SCx200) += scx200.o

@ -10,6 +10,7 @@
#include <asm/pci-direct.h> #include <asm/pci-direct.h>
#include <asm/acpi.h> #include <asm/acpi.h>
#include <asm/apic.h> #include <asm/apic.h>
#include <asm/irq.h>
#ifdef CONFIG_ACPI #ifdef CONFIG_ACPI
@ -49,6 +50,24 @@ static int __init check_bridge(int vendor, int device)
return 0; return 0;
} }
static void check_intel(void)
{
u16 vendor, device;
vendor = read_pci_config_16(0, 0, 0, PCI_VENDOR_ID);
if (vendor != PCI_VENDOR_ID_INTEL)
return;
device = read_pci_config_16(0, 0, 0, PCI_DEVICE_ID);
#ifdef CONFIG_SMP
if (device == PCI_DEVICE_ID_INTEL_E7320_MCH ||
device == PCI_DEVICE_ID_INTEL_E7520_MCH ||
device == PCI_DEVICE_ID_INTEL_E7525_MCH)
quirk_intel_irqbalance();
#endif
}
void __init check_acpi_pci(void) void __init check_acpi_pci(void)
{ {
int num, slot, func; int num, slot, func;
@ -60,6 +79,8 @@ void __init check_acpi_pci(void)
if (!early_pci_allowed()) if (!early_pci_allowed())
return; return;
check_intel();
/* Poor man's PCI discovery */ /* Poor man's PCI discovery */
for (num = 0; num < 32; num++) { for (num = 0; num < 32; num++) {
for (slot = 0; slot < 32; slot++) { for (slot = 0; slot < 32; slot++) {

@ -124,6 +124,20 @@ static unsigned char** find_nop_table(void)
#endif /* CONFIG_X86_64 */ #endif /* CONFIG_X86_64 */
static void nop_out(void *insns, unsigned int len)
{
unsigned char **noptable = find_nop_table();
while (len > 0) {
unsigned int noplen = len;
if (noplen > ASM_NOP_MAX)
noplen = ASM_NOP_MAX;
memcpy(insns, noptable[noplen], noplen);
insns += noplen;
len -= noplen;
}
}
extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern struct alt_instr __smp_alt_instructions[], __smp_alt_instructions_end[]; extern struct alt_instr __smp_alt_instructions[], __smp_alt_instructions_end[];
extern u8 *__smp_locks[], *__smp_locks_end[]; extern u8 *__smp_locks[], *__smp_locks_end[];
@ -138,10 +152,9 @@ extern u8 __smp_alt_begin[], __smp_alt_end[];
void apply_alternatives(struct alt_instr *start, struct alt_instr *end) void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
{ {
unsigned char **noptable = find_nop_table();
struct alt_instr *a; struct alt_instr *a;
u8 *instr; u8 *instr;
int diff, i, k; int diff;
DPRINTK("%s: alt table %p -> %p\n", __FUNCTION__, start, end); DPRINTK("%s: alt table %p -> %p\n", __FUNCTION__, start, end);
for (a = start; a < end; a++) { for (a = start; a < end; a++) {
@ -159,13 +172,7 @@ void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
#endif #endif
memcpy(instr, a->replacement, a->replacementlen); memcpy(instr, a->replacement, a->replacementlen);
diff = a->instrlen - a->replacementlen; diff = a->instrlen - a->replacementlen;
/* Pad the rest with nops */ nop_out(instr + a->replacementlen, diff);
for (i = a->replacementlen; diff > 0; diff -= k, i += k) {
k = diff;
if (k > ASM_NOP_MAX)
k = ASM_NOP_MAX;
memcpy(a->instr + i, noptable[k], k);
}
} }
} }
@ -209,7 +216,6 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end) static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
{ {
unsigned char **noptable = find_nop_table();
u8 **ptr; u8 **ptr;
for (ptr = start; ptr < end; ptr++) { for (ptr = start; ptr < end; ptr++) {
@ -217,7 +223,7 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
continue; continue;
if (*ptr > text_end) if (*ptr > text_end)
continue; continue;
**ptr = noptable[1][0]; nop_out(*ptr, 1);
}; };
} }
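alternatives_smp_unlock() now calls nop_out(*ptr, 1) to overwrite each recorded lock prefix with a one-byte NOP. It skips addresses that fall outside the kernel text. The lower-bound test sits on a line this hunk does not show, so the sketch below assumes it mirrors the visible `*ptr > text_end` check. 0xF0 is the x86 LOCK prefix and 0x90 the one-byte NOP. The function name and the assert are hypothetical additions for the sketch.

```c
#include <assert.h>

#define SKETCH_LOCK_PREFIX 0xf0 /* x86 LOCK prefix */
#define SKETCH_NOP1        0x90 /* one-byte NOP */

/* Replace each recorded LOCK prefix inside [text, text_end] with a NOP. */
static void sketch_smp_unlock(unsigned char **start, unsigned char **end,
			      unsigned char *text, unsigned char *text_end)
{
	unsigned char **ptr;

	for (ptr = start; ptr < end; ptr++) {
		if (*ptr < text)
			continue;
		if (*ptr > text_end)
			continue;
		assert(**ptr == SKETCH_LOCK_PREFIX); /* sanity check, sketch only */
		**ptr = SKETCH_NOP1;
	}
}
```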
@ -343,6 +349,40 @@ void alternatives_smp_switch(int smp)
#endif #endif
#ifdef CONFIG_PARAVIRT
void apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
{
struct paravirt_patch *p;
for (p = start; p < end; p++) {
unsigned int used;
used = paravirt_ops.patch(p->instrtype, p->clobbers, p->instr,
p->len);
#ifdef CONFIG_DEBUG_PARAVIRT
{
int i;
/* Deliberately clobber regs using "not %reg" to find bugs. */
for (i = 0; i < 3; i++) {
if (p->len - used >= 2 && (p->clobbers & (1 << i))) {
memcpy(p->instr + used, "\xf7\xd0", 2);
p->instr[used+1] |= i;
used += 2;
}
}
}
#endif
/* Pad the rest with nops */
nop_out(p->instr + used, p->len - used);
}
/* Sync to be conservative, in case we patched following instructions */
sync_core();
}
extern struct paravirt_patch __start_parainstructions[],
__stop_parainstructions[];
#endif /* CONFIG_PARAVIRT */
void __init alternative_instructions(void) void __init alternative_instructions(void)
{ {
unsigned long flags; unsigned long flags;
@ -390,5 +430,6 @@ void __init alternative_instructions(void)
alternatives_smp_switch(0); alternatives_smp_switch(0);
} }
#endif #endif
apply_paravirt(__start_parainstructions, __stop_parainstructions);
local_irq_restore(flags); local_irq_restore(flags);
} }
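Under CONFIG_DEBUG_PARAVIRT, apply_paravirt() fills spare patch bytes with `not %reg` for each register the patch site is allowed to clobber. Code that wrongly expects those registers to survive then fails quickly. The encoding is opcode F7 /2 with a register-direct ModRM byte: 0xD0 selects %eax, and OR-ing in i selects %ecx (1) or %edx (2). The sketch below isolates that loop. Its name and signature are hypothetical, and bits 0 to 2 of the clobber mask are assumed to stand for %eax, %ecx and %edx, as the `1 << i` test in the hunk suggests.

```c
#include <assert.h>

/*
 * Append "not %reg" (F7 D0+i) for each clobbered register i in 0..2
 * (eax, ecx, edx) while at least two bytes remain. Assumes used <= len.
 * Returns the new used count; the caller would NOP-pad the remainder.
 */
static unsigned int sketch_clobber_fill(unsigned char *insn, unsigned int used,
					unsigned int len, unsigned int clobbers)
{
	int i;

	for (i = 0; i < 3; i++) {
		if (len - used >= 2 && (clobbers & (1u << i))) {
			insn[used] = 0xf7;
			insn[used + 1] = 0xd0 | i;
			used += 2;
		}
	}
	return used;
}
```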


@ -647,23 +647,30 @@ static struct {
static int lapic_suspend(struct sys_device *dev, pm_message_t state) static int lapic_suspend(struct sys_device *dev, pm_message_t state)
{ {
unsigned long flags; unsigned long flags;
int maxlvt;
if (!apic_pm_state.active) if (!apic_pm_state.active)
return 0; return 0;
maxlvt = get_maxlvt();
apic_pm_state.apic_id = apic_read(APIC_ID); apic_pm_state.apic_id = apic_read(APIC_ID);
apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI); apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI);
apic_pm_state.apic_ldr = apic_read(APIC_LDR); apic_pm_state.apic_ldr = apic_read(APIC_LDR);
apic_pm_state.apic_dfr = apic_read(APIC_DFR); apic_pm_state.apic_dfr = apic_read(APIC_DFR);
apic_pm_state.apic_spiv = apic_read(APIC_SPIV); apic_pm_state.apic_spiv = apic_read(APIC_SPIV);
apic_pm_state.apic_lvtt = apic_read(APIC_LVTT); apic_pm_state.apic_lvtt = apic_read(APIC_LVTT);
apic_pm_state.apic_lvtpc = apic_read(APIC_LVTPC); if (maxlvt >= 4)
apic_pm_state.apic_lvtpc = apic_read(APIC_LVTPC);
apic_pm_state.apic_lvt0 = apic_read(APIC_LVT0); apic_pm_state.apic_lvt0 = apic_read(APIC_LVT0);
apic_pm_state.apic_lvt1 = apic_read(APIC_LVT1); apic_pm_state.apic_lvt1 = apic_read(APIC_LVT1);
apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR); apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR);
apic_pm_state.apic_tmict = apic_read(APIC_TMICT); apic_pm_state.apic_tmict = apic_read(APIC_TMICT);
apic_pm_state.apic_tdcr = apic_read(APIC_TDCR); apic_pm_state.apic_tdcr = apic_read(APIC_TDCR);
apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR); #ifdef CONFIG_X86_MCE_P4THERMAL
if (maxlvt >= 5)
apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
#endif
local_irq_save(flags); local_irq_save(flags);
disable_local_APIC(); disable_local_APIC();
@ -675,10 +682,13 @@ static int lapic_resume(struct sys_device *dev)
{ {
unsigned int l, h; unsigned int l, h;
unsigned long flags; unsigned long flags;
int maxlvt;
if (!apic_pm_state.active) if (!apic_pm_state.active)
return 0; return 0;
maxlvt = get_maxlvt();
local_irq_save(flags); local_irq_save(flags);
/* /*
@ -700,8 +710,12 @@ static int lapic_resume(struct sys_device *dev)
apic_write(APIC_SPIV, apic_pm_state.apic_spiv); apic_write(APIC_SPIV, apic_pm_state.apic_spiv);
apic_write(APIC_LVT0, apic_pm_state.apic_lvt0); apic_write(APIC_LVT0, apic_pm_state.apic_lvt0);
apic_write(APIC_LVT1, apic_pm_state.apic_lvt1); apic_write(APIC_LVT1, apic_pm_state.apic_lvt1);
apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr); #ifdef CONFIG_X86_MCE_P4THERMAL
apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc); if (maxlvt >= 5)
apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
#endif
if (maxlvt >= 4)
apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc);
apic_write(APIC_LVTT, apic_pm_state.apic_lvtt); apic_write(APIC_LVTT, apic_pm_state.apic_lvtt);
apic_write(APIC_TDCR, apic_pm_state.apic_tdcr); apic_write(APIC_TDCR, apic_pm_state.apic_tdcr);
apic_write(APIC_TMICT, apic_pm_state.apic_tmict); apic_write(APIC_TMICT, apic_pm_state.apic_tmict);
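The suspend and resume paths now touch APIC_LVTPC only when maxlvt >= 4 and APIC_LVTTHMR only when maxlvt >= 5, because not every local APIC implements those LVT entries. get_maxlvt() is not shown here. The sketch assumes it returns the "Max LVT Entry" field, bits 23:16 of the local APIC version register, which holds the number of LVT entries minus one. The real helper may special-case older external APICs. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Max LVT Entry field: bits 23:16 of the local APIC version register. */
static unsigned int sketch_maxlvt(uint32_t apic_lvr)
{
	return (apic_lvr >> 16) & 0xff;
}

/* Mirror the maxlvt checks added to lapic_suspend() and lapic_resume(). */
static bool sketch_has_lvtpc(unsigned int maxlvt)   { return maxlvt >= 4; }
static bool sketch_has_lvtthmr(unsigned int maxlvt) { return maxlvt >= 5; }
```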


@ -231,6 +231,7 @@
#include <asm/uaccess.h> #include <asm/uaccess.h>
#include <asm/desc.h> #include <asm/desc.h>
#include <asm/i8253.h> #include <asm/i8253.h>
#include <asm/paravirt.h>
#include "io_ports.h" #include "io_ports.h"
@ -2235,7 +2236,7 @@ static int __init apm_init(void)
dmi_check_system(apm_dmi_table); dmi_check_system(apm_dmi_table);
if (apm_info.bios.version == 0) { if (apm_info.bios.version == 0 || paravirt_enabled()) {
printk(KERN_INFO "apm: BIOS not found.\n"); printk(KERN_INFO "apm: BIOS not found.\n");
return -ENODEV; return -ENODEV;
} }


@ -15,6 +15,7 @@
#include <asm/processor.h> #include <asm/processor.h>
#include <asm/thread_info.h> #include <asm/thread_info.h>
#include <asm/elf.h> #include <asm/elf.h>
#include <asm/pda.h>
#define DEFINE(sym, val) \ #define DEFINE(sym, val) \
asm volatile("\n->" #sym " %0 " #val : : "i" (val)) asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@ -51,13 +52,35 @@ void foo(void)
OFFSET(TI_exec_domain, thread_info, exec_domain); OFFSET(TI_exec_domain, thread_info, exec_domain);
OFFSET(TI_flags, thread_info, flags); OFFSET(TI_flags, thread_info, flags);
OFFSET(TI_status, thread_info, status); OFFSET(TI_status, thread_info, status);
OFFSET(TI_cpu, thread_info, cpu);
OFFSET(TI_preempt_count, thread_info, preempt_count); OFFSET(TI_preempt_count, thread_info, preempt_count);
OFFSET(TI_addr_limit, thread_info, addr_limit); OFFSET(TI_addr_limit, thread_info, addr_limit);
OFFSET(TI_restart_block, thread_info, restart_block); OFFSET(TI_restart_block, thread_info, restart_block);
OFFSET(TI_sysenter_return, thread_info, sysenter_return); OFFSET(TI_sysenter_return, thread_info, sysenter_return);
BLANK(); BLANK();
OFFSET(GDS_size, Xgt_desc_struct, size);
OFFSET(GDS_address, Xgt_desc_struct, address);
OFFSET(GDS_pad, Xgt_desc_struct, pad);
BLANK();
OFFSET(PT_EBX, pt_regs, ebx);
OFFSET(PT_ECX, pt_regs, ecx);
OFFSET(PT_EDX, pt_regs, edx);
OFFSET(PT_ESI, pt_regs, esi);
OFFSET(PT_EDI, pt_regs, edi);
OFFSET(PT_EBP, pt_regs, ebp);
OFFSET(PT_EAX, pt_regs, eax);
OFFSET(PT_DS, pt_regs, xds);
OFFSET(PT_ES, pt_regs, xes);
OFFSET(PT_GS, pt_regs, xgs);
OFFSET(PT_ORIG_EAX, pt_regs, orig_eax);
OFFSET(PT_EIP, pt_regs, eip);
OFFSET(PT_CS, pt_regs, xcs);
OFFSET(PT_EFLAGS, pt_regs, eflags);
OFFSET(PT_OLDESP, pt_regs, esp);
OFFSET(PT_OLDSS, pt_regs, xss);
BLANK();
OFFSET(EXEC_DOMAIN_handler, exec_domain, handler); OFFSET(EXEC_DOMAIN_handler, exec_domain, handler);
OFFSET(RT_SIGFRAME_sigcontext, rt_sigframe, uc.uc_mcontext); OFFSET(RT_SIGFRAME_sigcontext, rt_sigframe, uc.uc_mcontext);
BLANK(); BLANK();
@ -74,4 +97,18 @@ void foo(void)
DEFINE(VDSO_PRELINK, VDSO_PRELINK); DEFINE(VDSO_PRELINK, VDSO_PRELINK);
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx); OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
BLANK();
OFFSET(PDA_cpu, i386_pda, cpu_number);
OFFSET(PDA_pcurrent, i386_pda, pcurrent);
#ifdef CONFIG_PARAVIRT
BLANK();
OFFSET(PARAVIRT_enabled, paravirt_ops, paravirt_enabled);
OFFSET(PARAVIRT_irq_disable, paravirt_ops, irq_disable);
OFFSET(PARAVIRT_irq_enable, paravirt_ops, irq_enable);
OFFSET(PARAVIRT_irq_enable_sysexit, paravirt_ops, irq_enable_sysexit);
OFFSET(PARAVIRT_iret, paravirt_ops, iret);
OFFSET(PARAVIRT_read_cr0, paravirt_ops, read_cr0);
#endif
} }
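asm-offsets.c is compiled but never run. Each OFFSET(sym, str, mem) expands to DEFINE(sym, offsetof(struct str, mem)), whose inline asm plants a `->sym value` marker in the compiler's assembly output. The build then turns those markers into #define lines that assembly sources such as entry.S can include. The sketch below reproduces only the arithmetic and formats the define at run time with snprintf. The struct is a hypothetical stand-in that follows the PT_* order above and makes every field 32 bits wide, so each offset is four times the field's position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout following the PT_* order; all fields are 32-bit. */
struct sketch_pt_regs {
	uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
	uint32_t xds, xes, xgs;
	uint32_t orig_eax, eip, xcs, eflags, esp, xss;
};

/* Format "#define sym <offset>" as the generated header line would read. */
#define SKETCH_OFFSET(buf, len, sym, str, mem) \
	snprintf((buf), (len), "#define " #sym " %zu", offsetof(struct str, mem))
```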


@ -104,10 +104,7 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
f_vide(); f_vide();
rdtscl(d2); rdtscl(d2);
d = d2-d; d = d2-d;
/* Knock these two lines out if it debugs out ok */
printk(KERN_INFO "AMD K6 stepping B detected - ");
/* -- cut here -- */
if (d > 20*K6_BUG_LOOP) if (d > 20*K6_BUG_LOOP)
printk("system stability may be impaired when more than 32 MB are used.\n"); printk("system stability may be impaired when more than 32 MB are used.\n");
else else


@ -18,14 +18,15 @@
#include <asm/apic.h> #include <asm/apic.h>
#include <mach_apic.h> #include <mach_apic.h>
#endif #endif
#include <asm/pda.h>
#include "cpu.h" #include "cpu.h"
DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr); DEFINE_PER_CPU(struct Xgt_desc_struct, cpu_gdt_descr);
EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr); EXPORT_PER_CPU_SYMBOL(cpu_gdt_descr);
DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]); struct i386_pda *_cpu_pda[NR_CPUS] __read_mostly;
EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack); EXPORT_SYMBOL(_cpu_pda);
static int cachesize_override __cpuinitdata = -1; static int cachesize_override __cpuinitdata = -1;
static int disable_x86_fxsr __cpuinitdata; static int disable_x86_fxsr __cpuinitdata;
@ -235,29 +236,14 @@ static int __cpuinit have_cpuid_p(void)
return flag_is_changeable_p(X86_EFLAGS_ID); return flag_is_changeable_p(X86_EFLAGS_ID);
} }
/* Do minimum CPU detection early. void __init cpu_detect(struct cpuinfo_x86 *c)
Fields really needed: vendor, cpuid_level, family, model, mask, cache alignment.
The others are not touched to avoid unwanted side effects.
WARNING: this function is only called on the BP. Don't add code here
that is supposed to run on all CPUs. */
static void __init early_cpu_detect(void)
{ {
struct cpuinfo_x86 *c = &boot_cpu_data;
c->x86_cache_alignment = 32;
if (!have_cpuid_p())
return;
/* Get vendor name */ /* Get vendor name */
cpuid(0x00000000, &c->cpuid_level, cpuid(0x00000000, &c->cpuid_level,
(int *)&c->x86_vendor_id[0], (int *)&c->x86_vendor_id[0],
(int *)&c->x86_vendor_id[8], (int *)&c->x86_vendor_id[8],
(int *)&c->x86_vendor_id[4]); (int *)&c->x86_vendor_id[4]);
get_cpu_vendor(c, 1);
c->x86 = 4; c->x86 = 4;
if (c->cpuid_level >= 0x00000001) { if (c->cpuid_level >= 0x00000001) {
u32 junk, tfms, cap0, misc; u32 junk, tfms, cap0, misc;
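cpu_detect() passes &x86_vendor_id[0], [8] and [4] as the EBX, ECX and EDX outputs of CPUID leaf 0. The 12-byte vendor string is stored across EBX, EDX and ECX in that order. The sketch below rebuilds the string from three register values. It extracts bytes with shifts so that the result does not depend on host endianness. The register values in the usage are the ones an Intel CPU returns for "GenuineIntel". The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store the four bytes of reg, least significant first, as x86 does. */
static void sketch_put_reg(char *dst, uint32_t reg)
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* CPUID leaf 0 vendor string: EBX, then EDX, then ECX. */
static void sketch_vendor_id(char out[13], uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	sketch_put_reg(out + 0, ebx);
	sketch_put_reg(out + 4, edx);
	sketch_put_reg(out + 8, ecx);
	out[12] = '\0';
}
```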
@ -274,6 +260,26 @@ static void __init early_cpu_detect(void)
} }
} }
/* Do minimum CPU detection early.
Fields really needed: vendor, cpuid_level, family, model, mask, cache alignment.
The others are not touched to avoid unwanted side effects.
WARNING: this function is only called on the BP. Don't add code here
that is supposed to run on all CPUs. */
static void __init early_cpu_detect(void)
{
struct cpuinfo_x86 *c = &boot_cpu_data;
c->x86_cache_alignment = 32;
if (!have_cpuid_p())
return;
cpu_detect(c);
get_cpu_vendor(c, 1);
}
static void __cpuinit generic_identify(struct cpuinfo_x86 * c) static void __cpuinit generic_identify(struct cpuinfo_x86 * c)
{ {
u32 tfms, xlvl; u32 tfms, xlvl;
@ -308,6 +314,8 @@ static void __cpuinit generic_identify(struct cpuinfo_x86 * c)
#else #else
c->apicid = (ebx >> 24) & 0xFF; c->apicid = (ebx >> 24) & 0xFF;
#endif #endif
if (c->x86_capability[0] & (1<<19))
c->x86_clflush_size = ((ebx >> 8) & 0xff) * 8;
} else { } else {
/* Have CPUID level 0 only - unheard of */ /* Have CPUID level 0 only - unheard of */
c->x86 = 4; c->x86 = 4;
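generic_identify() now derives x86_clflush_size from CPUID leaf 1. When the CLFSH feature bit is set (EDX bit 19, which is bit 19 of x86_capability[0]), EBX bits 15:8 give the CLFLUSH line size in 8-byte units. Otherwise the value keeps the 32-byte default that identify_cpu() sets. A sketch of that decoding follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CLFLUSH_DEFAULT 32 /* default set in identify_cpu() */

static unsigned int sketch_clflush_size(uint32_t leaf1_edx, uint32_t leaf1_ebx)
{
	if (leaf1_edx & (1u << 19))                   /* CLFSH supported */
		return ((leaf1_ebx >> 8) & 0xff) * 8; /* 8-byte units */
	return SKETCH_CLFLUSH_DEFAULT;
}
```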
@ -372,6 +380,7 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
c->x86_vendor_id[0] = '\0'; /* Unset */ c->x86_vendor_id[0] = '\0'; /* Unset */
c->x86_model_id[0] = '\0'; /* Unset */ c->x86_model_id[0] = '\0'; /* Unset */
c->x86_max_cores = 1; c->x86_max_cores = 1;
c->x86_clflush_size = 32;
memset(&c->x86_capability, 0, sizeof c->x86_capability); memset(&c->x86_capability, 0, sizeof c->x86_capability);
if (!have_cpuid_p()) { if (!have_cpuid_p()) {
@ -591,25 +600,134 @@ void __init early_cpu_init(void)
disable_pse = 1; disable_pse = 1;
#endif #endif
} }
/*
* cpu_init() initializes state that is per-CPU. Some data is already /* Make sure %gs is initialized properly in idle threads */
* initialized (naturally) in the bootstrap process, such as the GDT struct pt_regs * __devinit idle_regs(struct pt_regs *regs)
* and IDT. We reload them nevertheless, this function acts as a {
* 'CPU state barrier', nothing should get across. memset(regs, 0, sizeof(struct pt_regs));
*/ regs->xgs = __KERNEL_PDA;
void __cpuinit cpu_init(void) return regs;
}
static __cpuinit int alloc_gdt(int cpu)
{ {
int cpu = smp_processor_id();
struct tss_struct * t = &per_cpu(init_tss, cpu);
struct thread_struct *thread = &current->thread;
struct desc_struct *gdt;
__u32 stk16_off = (__u32)&per_cpu(cpu_16bit_stack, cpu);
struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu); struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
struct desc_struct *gdt;
struct i386_pda *pda;
gdt = (struct desc_struct *)cpu_gdt_descr->address;
pda = cpu_pda(cpu);
/*
* This is a horrible hack to allocate the GDT. The problem
* is that cpu_init() is called really early for the boot CPU
* (and hence needs bootmem) but much later for the secondary
* CPUs, when bootmem will have gone away
*/
if (NODE_DATA(0)->bdata->node_bootmem_map) {
BUG_ON(gdt != NULL || pda != NULL);
gdt = alloc_bootmem_pages(PAGE_SIZE);
pda = alloc_bootmem(sizeof(*pda));
/* alloc_bootmem(_pages) panics on failure, so no check */
memset(gdt, 0, PAGE_SIZE);
memset(pda, 0, sizeof(*pda));
} else {
/* GDT and PDA might already have been allocated if
this is a CPU hotplug re-insertion. */
if (gdt == NULL)
gdt = (struct desc_struct *)get_zeroed_page(GFP_KERNEL);
if (pda == NULL)
pda = kmalloc_node(sizeof(*pda), GFP_KERNEL, cpu_to_node(cpu));
if (unlikely(!gdt || !pda)) {
free_pages((unsigned long)gdt, 0);
kfree(pda);
return 0;
}
}
cpu_gdt_descr->address = (unsigned long)gdt;
cpu_pda(cpu) = pda;
return 1;
}
/* Initial PDA used by boot CPU */
struct i386_pda boot_pda = {
._pda = &boot_pda,
.cpu_number = 0,
.pcurrent = &init_task,
};
static inline void set_kernel_gs(void)
{
/* Set %gs for this CPU's PDA. Memory clobber is to create a
barrier with respect to any PDA operations, so the compiler
doesn't move any before here. */
asm volatile ("mov %0, %%gs" : : "r" (__KERNEL_PDA) : "memory");
}
/* Initialize the CPU's GDT and PDA. The boot CPU does this for
itself, but secondaries find this done for them. */
__cpuinit int init_gdt(int cpu, struct task_struct *idle)
{
struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
struct desc_struct *gdt;
struct i386_pda *pda;
/* For non-boot CPUs, the GDT and PDA should already have been
allocated. */
if (!alloc_gdt(cpu)) {
printk(KERN_CRIT "CPU%d failed to allocate GDT or PDA\n", cpu);
return 0;
}
gdt = (struct desc_struct *)cpu_gdt_descr->address;
pda = cpu_pda(cpu);
BUG_ON(gdt == NULL || pda == NULL);
/*
* Initialize the per-CPU GDT with the boot GDT,
* and set up the GDT descriptor:
*/
memcpy(gdt, cpu_gdt_table, GDT_SIZE);
cpu_gdt_descr->size = GDT_SIZE - 1;
pack_descriptor((u32 *)&gdt[GDT_ENTRY_PDA].a,
(u32 *)&gdt[GDT_ENTRY_PDA].b,
(unsigned long)pda, sizeof(*pda) - 1,
0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data segment */
memset(pda, 0, sizeof(*pda));
pda->_pda = pda;
pda->cpu_number = cpu;
pda->pcurrent = idle;
return 1;
}
/* Common CPU init for both boot and secondary CPUs */
static void __cpuinit _cpu_init(int cpu, struct task_struct *curr)
{
struct tss_struct * t = &per_cpu(init_tss, cpu);
struct thread_struct *thread = &curr->thread;
struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
/* Reinit these anyway, even if they've already been done (on
the boot CPU, this will transition from the boot gdt+pda to
the real ones). */
load_gdt(cpu_gdt_descr);
set_kernel_gs();
if (cpu_test_and_set(cpu, cpu_initialized)) { if (cpu_test_and_set(cpu, cpu_initialized)) {
printk(KERN_WARNING "CPU#%d already initialized!\n", cpu); printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
for (;;) local_irq_enable(); for (;;) local_irq_enable();
} }
printk(KERN_INFO "Initializing CPU#%d\n", cpu); printk(KERN_INFO "Initializing CPU#%d\n", cpu);
if (cpu_has_vme || cpu_has_tsc || cpu_has_de) if (cpu_has_vme || cpu_has_tsc || cpu_has_de)
@ -621,56 +739,16 @@ void __cpuinit cpu_init(void)
set_in_cr4(X86_CR4_TSD); set_in_cr4(X86_CR4_TSD);
} }
/* The CPU hotplug case */
if (cpu_gdt_descr->address) {
gdt = (struct desc_struct *)cpu_gdt_descr->address;
memset(gdt, 0, PAGE_SIZE);
goto old_gdt;
}
/*
* This is a horrible hack to allocate the GDT. The problem
* is that cpu_init() is called really early for the boot CPU
* (and hence needs bootmem) but much later for the secondary
* CPUs, when bootmem will have gone away
*/
if (NODE_DATA(0)->bdata->node_bootmem_map) {
gdt = (struct desc_struct *)alloc_bootmem_pages(PAGE_SIZE);
/* alloc_bootmem_pages panics on failure, so no check */
memset(gdt, 0, PAGE_SIZE);
} else {
gdt = (struct desc_struct *)get_zeroed_page(GFP_KERNEL);
if (unlikely(!gdt)) {
printk(KERN_CRIT "CPU%d failed to allocate GDT\n", cpu);
for (;;)
local_irq_enable();
}
}
old_gdt:
/*
* Initialize the per-CPU GDT with the boot GDT,
* and set up the GDT descriptor:
*/
memcpy(gdt, cpu_gdt_table, GDT_SIZE);
/* Set up GDT entry for 16bit stack */
*(__u64 *)(&gdt[GDT_ENTRY_ESPFIX_SS]) |=
((((__u64)stk16_off) << 16) & 0x000000ffffff0000ULL) |
((((__u64)stk16_off) << 32) & 0xff00000000000000ULL) |
(CPU_16BIT_STACK_SIZE - 1);
cpu_gdt_descr->size = GDT_SIZE - 1;
cpu_gdt_descr->address = (unsigned long)gdt;
load_gdt(cpu_gdt_descr);
load_idt(&idt_descr); load_idt(&idt_descr);
/* /*
* Set up and load the per-CPU TSS and LDT * Set up and load the per-CPU TSS and LDT
*/ */
atomic_inc(&init_mm.mm_count); atomic_inc(&init_mm.mm_count);
current->active_mm = &init_mm; curr->active_mm = &init_mm;
BUG_ON(current->mm); if (curr->mm)
enter_lazy_tlb(&init_mm, current); BUG();
enter_lazy_tlb(&init_mm, curr);
load_esp0(t, thread); load_esp0(t, thread);
set_tss_desc(cpu,t); set_tss_desc(cpu,t);
@ -682,8 +760,8 @@ void __cpuinit cpu_init(void)
__set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss); __set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss);
#endif #endif
/* Clear %fs and %gs. */ /* Clear %fs. */
asm volatile ("movl %0, %%fs; movl %0, %%gs" : : "r" (0)); asm volatile ("mov %0, %%fs" : : "r" (0));
/* Clear all 6 debug registers: */ /* Clear all 6 debug registers: */
set_debugreg(0, 0); set_debugreg(0, 0);
@ -701,6 +779,37 @@ void __cpuinit cpu_init(void)
mxcsr_feature_mask_init(); mxcsr_feature_mask_init();
} }
/* Entrypoint to initialize secondary CPU */
void __cpuinit secondary_cpu_init(void)
{
int cpu = smp_processor_id();
struct task_struct *curr = current;
_cpu_init(cpu, curr);
}
/*
* cpu_init() initializes state that is per-CPU. Some data is already
* initialized (naturally) in the bootstrap process, such as the GDT
* and IDT. We reload them nevertheless, this function acts as a
* 'CPU state barrier', nothing should get across.
*/
void __cpuinit cpu_init(void)
{
int cpu = smp_processor_id();
struct task_struct *curr = current;
/* Set up the real GDT and PDA, so we can transition from the
boot versions. */
if (!init_gdt(cpu, curr)) {
/* failed to allocate something; not much we can do... */
for (;;)
local_irq_enable();
}
_cpu_init(cpu, curr);
}
#ifdef CONFIG_HOTPLUG_CPU #ifdef CONFIG_HOTPLUG_CPU
void __cpuinit cpu_uninit(void) void __cpuinit cpu_uninit(void)
{ {
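init_gdt() points the GDT_ENTRY_PDA descriptor at the per-CPU PDA with pack_descriptor(). It uses access byte 0x80 | DESCTYPE_S | 0x2, meaning present, code/data segment, read/write data. The sketch below packs a 32-bit segment descriptor according to the layout in the Intel SDM. It is not claimed to be the kernel helper verbatim, and DESCTYPE_S is assumed to be 0x10, the S bit of the access byte. With flags set to 0 the granularity bit is clear, so the limit sizeof(*pda) - 1 counts bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Low word:  base[15:0] in bits 31:16, limit[15:0] in bits 15:0.
 * High word: base[31:24], flags (G, D/B, L, AVL) in bits 23:20,
 *            limit[19:16], access byte in bits 15:8, base[23:16] in bits 7:0.
 */
static void sketch_pack_descriptor(uint32_t *a, uint32_t *b, uint32_t base,
				   uint32_t limit, uint8_t access, uint8_t flags)
{
	*a = ((base & 0xffff) << 16) | (limit & 0xffff);
	*b = (base & 0xff000000) | ((base >> 16) & 0xff) |
	     (limit & 0x000f0000) | ((uint32_t)access << 8) |
	     ((uint32_t)(flags & 0xf) << 20);
}
```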


@ -107,7 +107,7 @@ static void __cpuinit init_intel(struct cpuinfo_x86 *c)
* Note that the workaround only should be initialized once... * Note that the workaround only should be initialized once...
*/ */
c->f00f_bug = 0; c->f00f_bug = 0;
if ( c->x86 == 5 ) { if (!paravirt_enabled() && c->x86 == 5) {
static int f00f_workaround_enabled = 0; static int f00f_workaround_enabled = 0;
c->f00f_bug = 1; c->f00f_bug = 1;
@ -195,8 +195,16 @@ static void __cpuinit init_intel(struct cpuinfo_x86 *c)
if ((c->x86 == 0xf && c->x86_model >= 0x03) || if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
(c->x86 == 0x6 && c->x86_model >= 0x0e)) (c->x86 == 0x6 && c->x86_model >= 0x0e))
set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability); set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
}
if (cpu_has_ds) {
unsigned int l1;
rdmsr(MSR_IA32_MISC_ENABLE, l1, l2);
if (!(l1 & (1<<11)))
set_bit(X86_FEATURE_BTS, c->x86_capability);
if (!(l1 & (1<<12)))
set_bit(X86_FEATURE_PEBS, c->x86_capability);
}
}
static unsigned int __cpuinit intel_size_cache(struct cpuinfo_x86 * c, unsigned int size) static unsigned int __cpuinit intel_size_cache(struct cpuinfo_x86 * c, unsigned int size)
{ {
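On CPUs with the Debug Store feature, init_intel() now sets X86_FEATURE_BTS and X86_FEATURE_PEBS from IA32_MISC_ENABLE. Bits 11 and 12 of that MSR are "unavailable" flags, so a clear bit means the facility is present. A sketch of the decoding follows; the struct and function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_ds_caps {
	bool bts;  /* Branch Trace Store usable */
	bool pebs; /* Precise Event-Based Sampling usable */
};

/* misc_enable_lo: low 32 bits of IA32_MISC_ENABLE. */
static struct sketch_ds_caps sketch_decode_ds(uint32_t misc_enable_lo)
{
	struct sketch_ds_caps caps;

	caps.bts = !(misc_enable_lo & (1u << 11));  /* bit 11: BTS unavailable */
	caps.pebs = !(misc_enable_lo & (1u << 12)); /* bit 12: PEBS unavailable */
	return caps;
}
```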


@ -480,12 +480,10 @@ static int __cpuinit detect_cache_attributes(unsigned int cpu)
if (num_cache_leaves == 0) if (num_cache_leaves == 0)
return -ENOENT; return -ENOENT;
cpuid4_info[cpu] = kmalloc( cpuid4_info[cpu] = kzalloc(
sizeof(struct _cpuid4_info) * num_cache_leaves, GFP_KERNEL); sizeof(struct _cpuid4_info) * num_cache_leaves, GFP_KERNEL);
if (unlikely(cpuid4_info[cpu] == NULL)) if (unlikely(cpuid4_info[cpu] == NULL))
return -ENOMEM; return -ENOMEM;
memset(cpuid4_info[cpu], 0,
sizeof(struct _cpuid4_info) * num_cache_leaves);
oldmask = current->cpus_allowed; oldmask = current->cpus_allowed;
retval = set_cpus_allowed(current, cpumask_of_cpu(cpu)); retval = set_cpus_allowed(current, cpumask_of_cpu(cpu));
@ -658,17 +656,14 @@ static int __cpuinit cpuid4_cache_sysfs_init(unsigned int cpu)
return -ENOENT; return -ENOENT;
/* Allocate all required memory */ /* Allocate all required memory */
cache_kobject[cpu] = kmalloc(sizeof(struct kobject), GFP_KERNEL); cache_kobject[cpu] = kzalloc(sizeof(struct kobject), GFP_KERNEL);
if (unlikely(cache_kobject[cpu] == NULL)) if (unlikely(cache_kobject[cpu] == NULL))
goto err_out; goto err_out;
memset(cache_kobject[cpu], 0, sizeof(struct kobject));
index_kobject[cpu] = kmalloc( index_kobject[cpu] = kzalloc(
sizeof(struct _index_kobject ) * num_cache_leaves, GFP_KERNEL); sizeof(struct _index_kobject ) * num_cache_leaves, GFP_KERNEL);
if (unlikely(index_kobject[cpu] == NULL)) if (unlikely(index_kobject[cpu] == NULL))
goto err_out; goto err_out;
memset(index_kobject[cpu], 0,
sizeof(struct _index_kobject) * num_cache_leaves);
return 0; return 0;
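These hunks replace each kmalloc() + memset() pair with kzalloc(), which returns memory that is already zeroed. Userspace C has the same pattern in malloc() + memset() versus calloc(). The sketch below uses calloc() for the array case because it also takes the element count and size separately, and mainstream C libraries reject count × size products that overflow. The struct is a hypothetical stand-in for struct _cpuid4_info.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct _cpuid4_info. */
struct sketch_cache_info {
	unsigned int level;
	unsigned int size_kb;
};

/* Zeroed array of n entries, or NULL on failure (compare kzalloc). */
static struct sketch_cache_info *sketch_alloc_leaves(size_t n)
{
	return calloc(n, sizeof(struct sketch_cache_info));
}
```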


@ -1,5 +1,3 @@
obj-y := main.o if.o generic.o state.o obj-y := main.o if.o generic.o state.o
obj-y += amd.o obj-$(CONFIG_X86_32) += amd.o cyrix.o centaur.o
obj-y += cyrix.o
obj-y += centaur.o


@ -7,7 +7,7 @@
static void static void
amd_get_mtrr(unsigned int reg, unsigned long *base, amd_get_mtrr(unsigned int reg, unsigned long *base,
unsigned int *size, mtrr_type * type) unsigned long *size, mtrr_type * type)
{ {
unsigned long low, high; unsigned long low, high;


@ -17,7 +17,7 @@ static u8 centaur_mcr_type; /* 0 for winchip, 1 for winchip2 */
*/ */
static int static int
centaur_get_free_region(unsigned long base, unsigned long size) centaur_get_free_region(unsigned long base, unsigned long size, int replace_reg)
/* [SUMMARY] Get a free MTRR. /* [SUMMARY] Get a free MTRR.
<base> The starting (base) address of the region. <base> The starting (base) address of the region.
<size> The size (in bytes) of the region. <size> The size (in bytes) of the region.
@ -26,10 +26,11 @@ centaur_get_free_region(unsigned long base, unsigned long size)
{ {
int i, max; int i, max;
mtrr_type ltype; mtrr_type ltype;
unsigned long lbase; unsigned long lbase, lsize;
unsigned int lsize;
max = num_var_ranges; max = num_var_ranges;
if (replace_reg >= 0 && replace_reg < max)
return replace_reg;
for (i = 0; i < max; ++i) { for (i = 0; i < max; ++i) {
if (centaur_mcr_reserved & (1 << i)) if (centaur_mcr_reserved & (1 << i))
continue; continue;
@ -49,7 +50,7 @@ mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
static void static void
centaur_get_mcr(unsigned int reg, unsigned long *base, centaur_get_mcr(unsigned int reg, unsigned long *base,
unsigned int *size, mtrr_type * type) unsigned long *size, mtrr_type * type)
{ {
*base = centaur_mcr[reg].high >> PAGE_SHIFT; *base = centaur_mcr[reg].high >> PAGE_SHIFT;
*size = -(centaur_mcr[reg].low & 0xfffff000) >> PAGE_SHIFT; *size = -(centaur_mcr[reg].low & 0xfffff000) >> PAGE_SHIFT;


@ -9,7 +9,7 @@ int arr3_protected;
static void static void
cyrix_get_arr(unsigned int reg, unsigned long *base, cyrix_get_arr(unsigned int reg, unsigned long *base,
unsigned int *size, mtrr_type * type) unsigned long *size, mtrr_type * type)
{ {
unsigned long flags; unsigned long flags;
unsigned char arr, ccr3, rcr, shift; unsigned char arr, ccr3, rcr, shift;
@ -77,7 +77,7 @@ cyrix_get_arr(unsigned int reg, unsigned long *base,
} }
static int static int
cyrix_get_free_region(unsigned long base, unsigned long size) cyrix_get_free_region(unsigned long base, unsigned long size, int replace_reg)
/* [SUMMARY] Get a free ARR. /* [SUMMARY] Get a free ARR.
<base> The starting (base) address of the region. <base> The starting (base) address of the region.
<size> The size (in bytes) of the region. <size> The size (in bytes) of the region.
@ -86,9 +86,24 @@ cyrix_get_free_region(unsigned long base, unsigned long size)
{ {
int i; int i;
mtrr_type ltype; mtrr_type ltype;
unsigned long lbase; unsigned long lbase, lsize;
unsigned int lsize;
switch (replace_reg) {
case 7:
if (size < 0x40)
break;
case 6:
case 5:
case 4:
return replace_reg;
case 3:
if (arr3_protected)
break;
case 2:
case 1:
case 0:
return replace_reg;
}
/* If we are to set up a region >32M then look at ARR7 immediately */ /* If we are to set up a region >32M then look at ARR7 immediately */
if (size > 0x2000) { if (size > 0x2000) {
cyrix_get_arr(7, &lbase, &lsize, &ltype); cyrix_get_arr(7, &lbase, &lsize, &ltype);
@ -214,7 +229,7 @@ static void cyrix_set_arr(unsigned int reg, unsigned long base,
typedef struct { typedef struct {
unsigned long base; unsigned long base;
unsigned int size; unsigned long size;
mtrr_type type; mtrr_type type;
} arr_state_t; } arr_state_t;
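cyrix_get_free_region() now honours a caller-supplied replace_reg, and it relies on deliberate case fall-through to do so. ARR7 is reused only for regions of at least 0x40 pages (256 KiB with 4 KiB pages, matching the page units implied by the `size > 0x2000` "32M" check). ARR3 is reused only when it is not protected. ARR0 to ARR2 and ARR4 to ARR6 are reused unconditionally. Any other case drops through to the normal search. The sketch isolates that decision and returns -1 where the original breaks out of the switch. The function name and the explicit fall-through comments are additions.

```c
#include <assert.h>

/* Return replace_reg if it may be reused as-is, or -1 to fall back to a search. */
static int sketch_cyrix_replace(int replace_reg, unsigned long size_pages,
				int arr3_protected)
{
	switch (replace_reg) {
	case 7:
		if (size_pages < 0x40)
			break;
		/* fall through */
	case 6:
	case 5:
	case 4:
		return replace_reg;
	case 3:
		if (arr3_protected)
			break;
		/* fall through */
	case 2:
	case 1:
	case 0:
		return replace_reg;
	}
	return -1;
}
```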


@ -3,6 +3,7 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/module.h>
#include <asm/io.h> #include <asm/io.h>
#include <asm/mtrr.h> #include <asm/mtrr.h>
#include <asm/msr.h> #include <asm/msr.h>
@ -15,12 +16,19 @@ struct mtrr_state {
struct mtrr_var_range *var_ranges; struct mtrr_var_range *var_ranges;
mtrr_type fixed_ranges[NUM_FIXED_RANGES]; mtrr_type fixed_ranges[NUM_FIXED_RANGES];
unsigned char enabled; unsigned char enabled;
unsigned char have_fixed;
mtrr_type def_type; mtrr_type def_type;
}; };
static unsigned long smp_changes_mask; static unsigned long smp_changes_mask;
static struct mtrr_state mtrr_state = {}; static struct mtrr_state mtrr_state = {};
#undef MODULE_PARAM_PREFIX
#define MODULE_PARAM_PREFIX "mtrr."
static __initdata int mtrr_show;
module_param_named(show, mtrr_show, bool, 0);
/* Get the MSR pair relating to a var range */ /* Get the MSR pair relating to a var range */
static void __init static void __init
get_mtrr_var_range(unsigned int index, struct mtrr_var_range *vr) get_mtrr_var_range(unsigned int index, struct mtrr_var_range *vr)
@ -43,6 +51,14 @@ get_fixed_ranges(mtrr_type * frs)
rdmsr(MTRRfix4K_C0000_MSR + i, p[6 + i * 2], p[7 + i * 2]); rdmsr(MTRRfix4K_C0000_MSR + i, p[6 + i * 2], p[7 + i * 2]);
} }
static void __init print_fixed(unsigned base, unsigned step, const mtrr_type*types)
{
unsigned i;
for (i = 0; i < 8; ++i, ++types, base += step)
printk(KERN_INFO "MTRR %05X-%05X %s\n", base, base + step - 1, mtrr_attrib_to_str(*types));
}
/* Grab all of the MTRR state for this CPU into *state */ /* Grab all of the MTRR state for this CPU into *state */
void __init get_mtrr_state(void) void __init get_mtrr_state(void)
{ {
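print_fixed() prints the eight ranges held in one fixed-range MSR, each `step` bytes long. The calls in get_mtrr_state() walk all 88 fixed ranges that cover the first megabyte: eight 64 KiB ranges from 0x00000, sixteen 16 KiB ranges from 0x80000, and sixty-four 4 KiB ranges from 0xC0000. That layout is why the table offsets are `(i + 1) * 8` and `(i + 3) * 8`. The sketch below enumerates the same layout into an array; the names are hypothetical.

```c
#include <assert.h>

struct sketch_range {
	unsigned int start, end; /* inclusive physical byte addresses */
};

/* Append 8 consecutive ranges of `step` bytes starting at `base`. */
static unsigned int sketch_add8(struct sketch_range *out, unsigned int n,
				unsigned int base, unsigned int step)
{
	unsigned int i;

	for (i = 0; i < 8; i++, base += step, n++) {
		out[n].start = base;
		out[n].end = base + step - 1;
	}
	return n;
}

/* Fill out[0..87] in the order get_mtrr_state() prints them. */
static unsigned int sketch_fixed_layout(struct sketch_range out[88])
{
	unsigned int n = 0, i;

	n = sketch_add8(out, n, 0x00000, 0x10000);                  /* 64K MSR */
	for (i = 0; i < 2; i++)
		n = sketch_add8(out, n, 0x80000 + i * 0x20000, 0x04000); /* 16K MSRs */
	for (i = 0; i < 8; i++)
		n = sketch_add8(out, n, 0xC0000 + i * 0x08000, 0x01000); /* 4K MSRs */
	return n;
}
```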
@ -58,13 +74,49 @@ void __init get_mtrr_state(void)
} }
vrs = mtrr_state.var_ranges; vrs = mtrr_state.var_ranges;
rdmsr(MTRRcap_MSR, lo, dummy);
mtrr_state.have_fixed = (lo >> 8) & 1;
for (i = 0; i < num_var_ranges; i++) for (i = 0; i < num_var_ranges; i++)
get_mtrr_var_range(i, &vrs[i]); get_mtrr_var_range(i, &vrs[i]);
get_fixed_ranges(mtrr_state.fixed_ranges); if (mtrr_state.have_fixed)
get_fixed_ranges(mtrr_state.fixed_ranges);
rdmsr(MTRRdefType_MSR, lo, dummy); rdmsr(MTRRdefType_MSR, lo, dummy);
mtrr_state.def_type = (lo & 0xff); mtrr_state.def_type = (lo & 0xff);
mtrr_state.enabled = (lo & 0xc00) >> 10; mtrr_state.enabled = (lo & 0xc00) >> 10;
if (mtrr_show) {
int high_width;
printk(KERN_INFO "MTRR default type: %s\n", mtrr_attrib_to_str(mtrr_state.def_type));
if (mtrr_state.have_fixed) {
printk(KERN_INFO "MTRR fixed ranges %sabled:\n",
mtrr_state.enabled & 1 ? "en" : "dis");
print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
for (i = 0; i < 2; ++i)
print_fixed(0x80000 + i * 0x20000, 0x04000, mtrr_state.fixed_ranges + (i + 1) * 8);
for (i = 0; i < 8; ++i)
print_fixed(0xC0000 + i * 0x08000, 0x01000, mtrr_state.fixed_ranges + (i + 3) * 8);
}
printk(KERN_INFO "MTRR variable ranges %sabled:\n",
mtrr_state.enabled & 2 ? "en" : "dis");
high_width = ((size_or_mask ? ffs(size_or_mask) - 1 : 32) - (32 - PAGE_SHIFT) + 3) / 4;
for (i = 0; i < num_var_ranges; ++i) {
if (mtrr_state.var_ranges[i].mask_lo & (1 << 11))
printk(KERN_INFO "MTRR %u base %0*X%05X000 mask %0*X%05X000 %s\n",
i,
high_width,
mtrr_state.var_ranges[i].base_hi,
mtrr_state.var_ranges[i].base_lo >> 12,
high_width,
mtrr_state.var_ranges[i].mask_hi,
mtrr_state.var_ranges[i].mask_lo >> 12,
mtrr_attrib_to_str(mtrr_state.var_ranges[i].base_lo & 0xff));
else
printk(KERN_INFO "MTRR %u disabled\n", i);
}
}
} }
/* Some BIOS's are fucked and don't set all MTRRs the same! */ /* Some BIOS's are fucked and don't set all MTRRs the same! */
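With mtrr.show set, each enabled variable range is printed with `%0*X%05X000`. The format prints base_hi padded to high_width hex digits, then bits 31:12 of base_lo as five digits, then three literal zeros. high_width is therefore the number of hex digits needed for the physical-address bits above bit 31. The sketch assumes size_or_mask has every bit set from bit (physical address width - PAGE_SHIFT) upwards. Under that assumption, ffs(size_or_mask) - 1 is the address width counted in pages. Subtracting 32 - PAGE_SHIFT leaves the number of bits above 4 GiB, which is then rounded up to whole hex digits. The sketch uses its own ffs() replacement to avoid depending on <strings.h>, and its names are hypothetical.

```c
#include <assert.h>

/* 1-based index of the lowest set bit; 0 if x == 0 (same contract as ffs()). */
static int sketch_ffs(unsigned int x)
{
	int pos = 1;

	if (x == 0)
		return 0;
	while (!(x & 1u)) {
		x >>= 1;
		pos++;
	}
	return pos;
}

/* Hex digits needed for physical-address bits above bit 31. */
static int sketch_high_width(unsigned int size_or_mask, int page_shift)
{
	int page_bits = size_or_mask ? sketch_ffs(size_or_mask) - 1 : 32;

	return (page_bits - (32 - page_shift) + 3) / 4;
}
```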
@ -95,7 +147,7 @@ void mtrr_wrmsr(unsigned msr, unsigned a, unsigned b)
smp_processor_id(), msr, a, b); smp_processor_id(), msr, a, b);
} }
int generic_get_free_region(unsigned long base, unsigned long size) int generic_get_free_region(unsigned long base, unsigned long size, int replace_reg)
/* [SUMMARY] Get a free MTRR. /* [SUMMARY] Get a free MTRR.
<base> The starting (base) address of the region. <base> The starting (base) address of the region.
<size> The size (in bytes) of the region. <size> The size (in bytes) of the region.
@ -104,10 +156,11 @@ int generic_get_free_region(unsigned long base, unsigned long size)
{ {
int i, max; int i, max;
mtrr_type ltype; mtrr_type ltype;
unsigned long lbase; unsigned long lbase, lsize;
unsigned lsize;
max = num_var_ranges; max = num_var_ranges;
if (replace_reg >= 0 && replace_reg < max)
return replace_reg;
for (i = 0; i < max; ++i) { for (i = 0; i < max; ++i) {
mtrr_if->get(i, &lbase, &lsize, &ltype); mtrr_if->get(i, &lbase, &lsize, &ltype);
if (lsize == 0) if (lsize == 0)
@ -117,7 +170,7 @@ int generic_get_free_region(unsigned long base, unsigned long size)
} }
static void generic_get_mtrr(unsigned int reg, unsigned long *base, static void generic_get_mtrr(unsigned int reg, unsigned long *base,
unsigned int *size, mtrr_type * type) unsigned long *size, mtrr_type *type)
{ {
unsigned int mask_lo, mask_hi, base_lo, base_hi; unsigned int mask_lo, mask_hi, base_lo, base_hi;
@ -202,7 +255,9 @@ static int set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
return changed; return changed;
} }
static unsigned long set_mtrr_state(u32 deftype_lo, u32 deftype_hi) static u32 deftype_lo, deftype_hi;
static unsigned long set_mtrr_state(void)
/* [SUMMARY] Set the MTRR state for this CPU. /* [SUMMARY] Set the MTRR state for this CPU.
<state> The MTRR state information to read. <state> The MTRR state information to read.
<ctxt> Some relevant CPU context. <ctxt> Some relevant CPU context.
@ -217,14 +272,14 @@ static unsigned long set_mtrr_state(u32 deftype_lo, u32 deftype_hi)
if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i])) if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i]))
change_mask |= MTRR_CHANGE_MASK_VARIABLE; change_mask |= MTRR_CHANGE_MASK_VARIABLE;
if (set_fixed_ranges(mtrr_state.fixed_ranges)) if (mtrr_state.have_fixed && set_fixed_ranges(mtrr_state.fixed_ranges))
change_mask |= MTRR_CHANGE_MASK_FIXED; change_mask |= MTRR_CHANGE_MASK_FIXED;
/* Set_mtrr_restore restores the old value of MTRRdefType, /* Set_mtrr_restore restores the old value of MTRRdefType,
so to set it we fiddle with the saved value */ so to set it we fiddle with the saved value */
if ((deftype_lo & 0xff) != mtrr_state.def_type if ((deftype_lo & 0xff) != mtrr_state.def_type
|| ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) { || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
deftype_lo |= (mtrr_state.def_type | mtrr_state.enabled << 10); deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type | (mtrr_state.enabled << 10);
change_mask |= MTRR_CHANGE_MASK_DEFTYPE; change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
} }
@ -233,7 +288,6 @@ static unsigned long set_mtrr_state(u32 deftype_lo, u32 deftype_hi)
static unsigned long cr4 = 0; static unsigned long cr4 = 0;
static u32 deftype_lo, deftype_hi;
static DEFINE_SPINLOCK(set_atomicity_lock); static DEFINE_SPINLOCK(set_atomicity_lock);
/* /*
@ -271,7 +325,7 @@ static void prepare_set(void) __acquires(set_atomicity_lock)
rdmsr(MTRRdefType_MSR, deftype_lo, deftype_hi); rdmsr(MTRRdefType_MSR, deftype_lo, deftype_hi);
/* Disable MTRRs, and set the default type to uncached */ /* Disable MTRRs, and set the default type to uncached */
mtrr_wrmsr(MTRRdefType_MSR, deftype_lo & 0xf300UL, deftype_hi); mtrr_wrmsr(MTRRdefType_MSR, deftype_lo & ~0xcff, deftype_hi);
} }
static void post_set(void) __releases(set_atomicity_lock) static void post_set(void) __releases(set_atomicity_lock)
@ -300,7 +354,7 @@ static void generic_set_all(void)
prepare_set(); prepare_set();
/* Actually set the state */ /* Actually set the state */
mask = set_mtrr_state(deftype_lo,deftype_hi); mask = set_mtrr_state();
post_set(); post_set();
local_irq_restore(flags); local_irq_restore(flags);
@ -366,7 +420,7 @@ int generic_validate_add_page(unsigned long base, unsigned long size, unsigned i
printk(KERN_WARNING "mtrr: base(0x%lx000) is not 4 MiB aligned\n", base); printk(KERN_WARNING "mtrr: base(0x%lx000) is not 4 MiB aligned\n", base);
return -EINVAL; return -EINVAL;
} }
if (!(base + size < 0x70000000 || base > 0x7003FFFF) && if (!(base + size < 0x70000 || base > 0x7003F) &&
(type == MTRR_TYPE_WRCOMB (type == MTRR_TYPE_WRCOMB
|| type == MTRR_TYPE_WRBACK)) { || type == MTRR_TYPE_WRBACK)) {
printk(KERN_WARNING "mtrr: writable mtrr between 0x70000000 and 0x7003FFFF may hang the CPU.\n"); printk(KERN_WARNING "mtrr: writable mtrr between 0x70000000 and 0x7003FFFF may hang the CPU.\n");
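The MTRRdefType change above replaces `deftype_lo |= ...` with a clear-then-set update. Below is a minimal user-space sketch of the difference.

- The helper names are hypothetical.
- The field layout is assumed from the `0xff` and `0xc00` masks used in set_mtrr_state(): default type in bits 0-7, fixed-range enable at bit 10, MTRR enable at bit 11.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the low half of MTRRdefType: bits 0-7 default type,
 * bit 10 fixed-range enable, bit 11 MTRR enable. 0xcff covers exactly those. */
#define DEFTYPE_FIELDS 0xcffu

/* Patched form: clear the type and enable bits, then insert the new values. */
static uint32_t deftype_update(uint32_t deftype_lo, uint32_t def_type,
			       uint32_t enabled)
{
	assert(def_type <= 0xff && enabled <= 3);
	return (deftype_lo & ~DEFTYPE_FIELDS) | def_type | (enabled << 10);
}

/* Pre-patch form: OR-ing alone cannot clear bits left over in the saved value. */
static uint32_t deftype_update_or_only(uint32_t deftype_lo, uint32_t def_type,
				       uint32_t enabled)
{
	return deftype_lo | (def_type | enabled << 10);
}
```

Take a saved value of 0xc06 (both enable bits set, default type 6, write-back) and ask for default type 0 with enabled == 3. The masked form gives 0xc00, while the OR-only form returns 0xc06 unchanged.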

@ -17,7 +17,7 @@ extern unsigned int *usage_table;
#define FILE_FCOUNT(f) (((struct seq_file *)((f)->private_data))->private) #define FILE_FCOUNT(f) (((struct seq_file *)((f)->private_data))->private)
static char *mtrr_strings[MTRR_NUM_TYPES] = static const char *const mtrr_strings[MTRR_NUM_TYPES] =
{ {
"uncachable", /* 0 */ "uncachable", /* 0 */
"write-combining", /* 1 */ "write-combining", /* 1 */
@ -28,7 +28,7 @@ static char *mtrr_strings[MTRR_NUM_TYPES] =
"write-back", /* 6 */ "write-back", /* 6 */
}; };
char *mtrr_attrib_to_str(int x) const char *mtrr_attrib_to_str(int x)
{ {
return (x <= 6) ? mtrr_strings[x] : "?"; return (x <= 6) ? mtrr_strings[x] : "?";
} }
@ -44,10 +44,9 @@ mtrr_file_add(unsigned long base, unsigned long size,
max = num_var_ranges; max = num_var_ranges;
if (fcount == NULL) { if (fcount == NULL) {
fcount = kmalloc(max * sizeof *fcount, GFP_KERNEL); fcount = kzalloc(max * sizeof *fcount, GFP_KERNEL);
if (!fcount) if (!fcount)
return -ENOMEM; return -ENOMEM;
memset(fcount, 0, max * sizeof *fcount);
FILE_FCOUNT(file) = fcount; FILE_FCOUNT(file) = fcount;
} }
if (!page) { if (!page) {
@ -155,6 +154,7 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
{ {
int err = 0; int err = 0;
mtrr_type type; mtrr_type type;
unsigned long size;
struct mtrr_sentry sentry; struct mtrr_sentry sentry;
struct mtrr_gentry gentry; struct mtrr_gentry gentry;
void __user *arg = (void __user *) __arg; void __user *arg = (void __user *) __arg;
@ -235,15 +235,15 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
case MTRRIOC_GET_ENTRY: case MTRRIOC_GET_ENTRY:
if (gentry.regnum >= num_var_ranges) if (gentry.regnum >= num_var_ranges)
return -EINVAL; return -EINVAL;
mtrr_if->get(gentry.regnum, &gentry.base, &gentry.size, &type); mtrr_if->get(gentry.regnum, &gentry.base, &size, &type);
/* Hide entries that go above 4GB */ /* Hide entries that go above 4GB */
if (gentry.base + gentry.size > 0x100000 if (gentry.base + size - 1 >= (1UL << (8 * sizeof(gentry.size) - PAGE_SHIFT))
|| gentry.size == 0x100000) || size >= (1UL << (8 * sizeof(gentry.size) - PAGE_SHIFT)))
gentry.base = gentry.size = gentry.type = 0; gentry.base = gentry.size = gentry.type = 0;
else { else {
gentry.base <<= PAGE_SHIFT; gentry.base <<= PAGE_SHIFT;
gentry.size <<= PAGE_SHIFT; gentry.size = size << PAGE_SHIFT;
gentry.type = type; gentry.type = type;
} }
@ -273,8 +273,14 @@ mtrr_ioctl(struct file *file, unsigned int cmd, unsigned long __arg)
case MTRRIOC_GET_PAGE_ENTRY: case MTRRIOC_GET_PAGE_ENTRY:
if (gentry.regnum >= num_var_ranges) if (gentry.regnum >= num_var_ranges)
return -EINVAL; return -EINVAL;
mtrr_if->get(gentry.regnum, &gentry.base, &gentry.size, &type); mtrr_if->get(gentry.regnum, &gentry.base, &size, &type);
gentry.type = type; /* Hide entries that would overflow */
if (size != (__typeof__(gentry.size))size)
gentry.base = gentry.size = gentry.type = 0;
else {
gentry.size = size;
gentry.type = type;
}
break; break;
} }
@ -353,8 +359,7 @@ static int mtrr_seq_show(struct seq_file *seq, void *offset)
char factor; char factor;
int i, max, len; int i, max, len;
mtrr_type type; mtrr_type type;
unsigned long base; unsigned long base, size;
unsigned int size;
len = 0; len = 0;
max = num_var_ranges; max = num_var_ranges;
@ -373,7 +378,7 @@ static int mtrr_seq_show(struct seq_file *seq, void *offset)
} }
/* RED-PEN: base can be > 32bit */ /* RED-PEN: base can be > 32bit */
len += seq_printf(seq, len += seq_printf(seq,
"reg%02i: base=0x%05lx000 (%4liMB), size=%4i%cB: %s, count=%d\n", "reg%02i: base=0x%05lx000 (%4luMB), size=%4lu%cB: %s, count=%d\n",
i, base, base >> (20 - PAGE_SHIFT), size, factor, i, base, base >> (20 - PAGE_SHIFT), size, factor,
mtrr_attrib_to_str(type), usage_table[i]); mtrr_attrib_to_str(type), usage_table[i]);
} }
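The ioctl hunks above no longer copy an `unsigned long` page count straight into the narrower `gentry.size` field. Instead, they hide entries that would not fit. The sketch below restates both checks in user space.

- It assumes a 32-bit size field and 4 kB pages (`PAGE_SHIFT` of 12).
- The names `size_field_t`, `page_size_fits()` and `byte_entry_fits()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 kB pages */

/* Hypothetical stand-in for a 32-bit field such as gentry.size. */
typedef uint32_t size_field_t;

/* MTRRIOC_GET_PAGE_ENTRY style: report the entry only if the page count
 * survives a round trip through the narrower field. */
static int page_size_fits(unsigned long long size_pages)
{
	return size_pages == (size_field_t)size_pages;
}

/* MTRRIOC_GET_ENTRY style: base and size are later shifted into bytes, so
 * the last page and the size must both stay below 2^(32 - PAGE_SHIFT). */
static int byte_entry_fits(unsigned long long base_pages,
			   unsigned long long size_pages)
{
	const unsigned long long limit =
		1ULL << (8 * sizeof(size_field_t) - SKETCH_PAGE_SHIFT);

	return !(base_pages + size_pages - 1 >= limit || size_pages >= limit);
}
```

Under these assumptions the byte-based limit is 2^20 pages, i.e. 4 GB. A register starting at page 0x100000, or one spanning a full 0x100000 pages, is therefore hidden from MTRRIOC_GET_ENTRY.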

@ -59,7 +59,11 @@ struct mtrr_ops * mtrr_if = NULL;
static void set_mtrr(unsigned int reg, unsigned long base, static void set_mtrr(unsigned int reg, unsigned long base,
unsigned long size, mtrr_type type); unsigned long size, mtrr_type type);
#ifndef CONFIG_X86_64
extern int arr3_protected; extern int arr3_protected;
#else
#define arr3_protected 0
#endif
void set_mtrr_ops(struct mtrr_ops * ops) void set_mtrr_ops(struct mtrr_ops * ops)
{ {
@ -168,6 +172,13 @@ static void ipi_handler(void *info)
#endif #endif
static inline int types_compatible(mtrr_type type1, mtrr_type type2) {
return type1 == MTRR_TYPE_UNCACHABLE ||
type2 == MTRR_TYPE_UNCACHABLE ||
(type1 == MTRR_TYPE_WRTHROUGH && type2 == MTRR_TYPE_WRBACK) ||
(type1 == MTRR_TYPE_WRBACK && type2 == MTRR_TYPE_WRTHROUGH);
}
/** /**
* set_mtrr - update mtrrs on all processors * set_mtrr - update mtrrs on all processors
* @reg: mtrr in question * @reg: mtrr in question
@ -263,8 +274,8 @@ static void set_mtrr(unsigned int reg, unsigned long base,
/** /**
* mtrr_add_page - Add a memory type region * mtrr_add_page - Add a memory type region
* @base: Physical base address of region in pages (4 KB) * @base: Physical base address of region in pages (in units of 4 kB!)
* @size: Physical size of region in pages (4 KB) * @size: Physical size of region in pages (4 kB)
* @type: Type of MTRR desired * @type: Type of MTRR desired
* @increment: If this is true do usage counting on the region * @increment: If this is true do usage counting on the region
* *
@ -300,11 +311,9 @@ static void set_mtrr(unsigned int reg, unsigned long base,
int mtrr_add_page(unsigned long base, unsigned long size, int mtrr_add_page(unsigned long base, unsigned long size,
unsigned int type, char increment) unsigned int type, char increment)
{ {
int i; int i, replace, error;
mtrr_type ltype; mtrr_type ltype;
unsigned long lbase; unsigned long lbase, lsize;
unsigned int lsize;
int error;
if (!mtrr_if) if (!mtrr_if)
return -ENXIO; return -ENXIO;
@ -324,12 +333,18 @@ int mtrr_add_page(unsigned long base, unsigned long size,
return -ENOSYS; return -ENOSYS;
} }
if (!size) {
printk(KERN_WARNING "mtrr: zero sized request\n");
return -EINVAL;
}
if (base & size_or_mask || size & size_or_mask) { if (base & size_or_mask || size & size_or_mask) {
printk(KERN_WARNING "mtrr: base or size exceeds the MTRR width\n"); printk(KERN_WARNING "mtrr: base or size exceeds the MTRR width\n");
return -EINVAL; return -EINVAL;
} }
error = -EINVAL; error = -EINVAL;
replace = -1;
/* No CPU hotplug when we change MTRR entries */ /* No CPU hotplug when we change MTRR entries */
lock_cpu_hotplug(); lock_cpu_hotplug();
@ -337,21 +352,28 @@ int mtrr_add_page(unsigned long base, unsigned long size,
mutex_lock(&mtrr_mutex); mutex_lock(&mtrr_mutex);
for (i = 0; i < num_var_ranges; ++i) { for (i = 0; i < num_var_ranges; ++i) {
mtrr_if->get(i, &lbase, &lsize, &ltype); mtrr_if->get(i, &lbase, &lsize, &ltype);
if (base >= lbase + lsize) if (!lsize || base > lbase + lsize - 1 || base + size - 1 < lbase)
continue;
if ((base < lbase) && (base + size <= lbase))
continue; continue;
/* At this point we know there is some kind of overlap/enclosure */ /* At this point we know there is some kind of overlap/enclosure */
if ((base < lbase) || (base + size > lbase + lsize)) { if (base < lbase || base + size - 1 > lbase + lsize - 1) {
if (base <= lbase && base + size - 1 >= lbase + lsize - 1) {
/* New region encloses an existing region */
if (type == ltype) {
replace = replace == -1 ? i : -2;
continue;
}
else if (types_compatible(type, ltype))
continue;
}
printk(KERN_WARNING printk(KERN_WARNING
"mtrr: 0x%lx000,0x%lx000 overlaps existing" "mtrr: 0x%lx000,0x%lx000 overlaps existing"
" 0x%lx000,0x%x000\n", base, size, lbase, " 0x%lx000,0x%lx000\n", base, size, lbase,
lsize); lsize);
goto out; goto out;
} }
/* New region is enclosed by an existing region */ /* New region is enclosed by an existing region */
if (ltype != type) { if (ltype != type) {
if (type == MTRR_TYPE_UNCACHABLE) if (types_compatible(type, ltype))
continue; continue;
printk (KERN_WARNING "mtrr: type mismatch for %lx000,%lx000 old: %s new: %s\n", printk (KERN_WARNING "mtrr: type mismatch for %lx000,%lx000 old: %s new: %s\n",
base, size, mtrr_attrib_to_str(ltype), base, size, mtrr_attrib_to_str(ltype),
@ -364,10 +386,18 @@ int mtrr_add_page(unsigned long base, unsigned long size,
goto out; goto out;
} }
/* Search for an empty MTRR */ /* Search for an empty MTRR */
i = mtrr_if->get_free_region(base, size); i = mtrr_if->get_free_region(base, size, replace);
if (i >= 0) { if (i >= 0) {
set_mtrr(i, base, size, type); set_mtrr(i, base, size, type);
usage_table[i] = 1; if (likely(replace < 0))
usage_table[i] = 1;
else {
usage_table[i] = usage_table[replace] + !!increment;
if (unlikely(replace != i)) {
set_mtrr(replace, 0, 0, 0);
usage_table[replace] = 0;
}
}
} else } else
printk(KERN_INFO "mtrr: no more MTRRs available\n"); printk(KERN_INFO "mtrr: no more MTRRs available\n");
error = i; error = i;
@ -455,8 +485,7 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
{ {
int i, max; int i, max;
mtrr_type ltype; mtrr_type ltype;
unsigned long lbase; unsigned long lbase, lsize;
unsigned int lsize;
int error = -EINVAL; int error = -EINVAL;
if (!mtrr_if) if (!mtrr_if)
@ -544,9 +573,11 @@ extern void centaur_init_mtrr(void);
static void __init init_ifs(void) static void __init init_ifs(void)
{ {
#ifndef CONFIG_X86_64
amd_init_mtrr(); amd_init_mtrr();
cyrix_init_mtrr(); cyrix_init_mtrr();
centaur_init_mtrr(); centaur_init_mtrr();
#endif
} }
/* The suspend/resume methods are only for CPU without MTRR. CPU using generic /* The suspend/resume methods are only for CPU without MTRR. CPU using generic
@ -555,7 +586,7 @@ static void __init init_ifs(void)
struct mtrr_value { struct mtrr_value {
mtrr_type ltype; mtrr_type ltype;
unsigned long lbase; unsigned long lbase;
unsigned int lsize; unsigned long lsize;
}; };
static struct mtrr_value * mtrr_state; static struct mtrr_value * mtrr_state;
@ -565,10 +596,8 @@ static int mtrr_save(struct sys_device * sysdev, pm_message_t state)
int i; int i;
int size = num_var_ranges * sizeof(struct mtrr_value); int size = num_var_ranges * sizeof(struct mtrr_value);
mtrr_state = kmalloc(size,GFP_ATOMIC); mtrr_state = kzalloc(size,GFP_ATOMIC);
if (mtrr_state) if (!mtrr_state)
memset(mtrr_state,0,size);
else
return -ENOMEM; return -ENOMEM;
for (i = 0; i < num_var_ranges; i++) { for (i = 0; i < num_var_ranges; i++) {
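The rewritten loop in mtrr_add_page() compares inclusive last pages. It also tells a new region that encloses an existing one apart from one that only partially overlaps it. Below is a user-space sketch of that classification, together with a copy of types_compatible().

- The enum names are hypothetical.
- The type numbers follow the x86 MTRR encoding: 0 uncachable, 4 write-through, 6 write-back.

```c
#include <assert.h>

/* x86 MTRR type encoding for the types used here. */
enum { T_UC = 0, T_WT = 4, T_WB = 6 };

enum overlap { DISJOINT, NEW_ENCLOSES_OLD, NEW_INSIDE_OLD, PARTIAL };

/* Same comparisons as the patched loop. Working with the last page
 * (base + size - 1) instead of one-past-the-end keeps a region that ends at
 * the top of the address space from wrapping to zero. 'size' must be nonzero. */
static enum overlap classify(unsigned long base, unsigned long size,
			     unsigned long lbase, unsigned long lsize)
{
	unsigned long last = base + size - 1;
	unsigned long llast = lbase + lsize - 1;

	if (!lsize || base > llast || last < lbase)
		return DISJOINT;
	if (base < lbase || last > llast)
		return (base <= lbase && last >= llast) ? NEW_ENCLOSES_OLD
							: PARTIAL;
	return NEW_INSIDE_OLD;
}

/* Copy of the helper added to main.c. */
static int types_compatible(int type1, int type2)
{
	return type1 == T_UC || type2 == T_UC ||
	       (type1 == T_WT && type2 == T_WB) ||
	       (type1 == T_WB && type2 == T_WT);
}
```

The last `classify()` test case shows why the inclusive form matters. If `lbase + lsize` is computed as an exclusive end, a region reaching the top of the address space wraps to zero. The old `base >= lbase + lsize` test would then treat an enclosed request as disjoint.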

@ -43,15 +43,16 @@ struct mtrr_ops {
void (*set_all)(void); void (*set_all)(void);
void (*get)(unsigned int reg, unsigned long *base, void (*get)(unsigned int reg, unsigned long *base,
unsigned int *size, mtrr_type * type); unsigned long *size, mtrr_type * type);
int (*get_free_region) (unsigned long base, unsigned long size); int (*get_free_region)(unsigned long base, unsigned long size,
int replace_reg);
int (*validate_add_page)(unsigned long base, unsigned long size, int (*validate_add_page)(unsigned long base, unsigned long size,
unsigned int type); unsigned int type);
int (*have_wrcomb)(void); int (*have_wrcomb)(void);
}; };
extern int generic_get_free_region(unsigned long base, unsigned long size); extern int generic_get_free_region(unsigned long base, unsigned long size,
int replace_reg);
extern int generic_validate_add_page(unsigned long base, unsigned long size, extern int generic_validate_add_page(unsigned long base, unsigned long size,
unsigned int type); unsigned int type);
@ -62,17 +63,17 @@ extern int positive_have_wrcomb(void);
/* library functions for processor-specific routines */ /* library functions for processor-specific routines */
struct set_mtrr_context { struct set_mtrr_context {
unsigned long flags; unsigned long flags;
unsigned long deftype_lo;
unsigned long deftype_hi;
unsigned long cr4val; unsigned long cr4val;
unsigned long ccr3; u32 deftype_lo;
u32 deftype_hi;
u32 ccr3;
}; };
struct mtrr_var_range { struct mtrr_var_range {
unsigned long base_lo; u32 base_lo;
unsigned long base_hi; u32 base_hi;
unsigned long mask_lo; u32 mask_lo;
unsigned long mask_hi; u32 mask_hi;
}; };
void set_mtrr_done(struct set_mtrr_context *ctxt); void set_mtrr_done(struct set_mtrr_context *ctxt);
@ -92,6 +93,6 @@ extern struct mtrr_ops * mtrr_if;
extern unsigned int num_var_ranges; extern unsigned int num_var_ranges;
void mtrr_state_warn(void); void mtrr_state_warn(void);
char *mtrr_attrib_to_str(int x); const char *mtrr_attrib_to_str(int x);
void mtrr_wrmsr(unsigned, unsigned, unsigned); void mtrr_wrmsr(unsigned, unsigned, unsigned);

@ -152,9 +152,10 @@ static int show_cpuinfo(struct seq_file *m, void *v)
seq_printf(m, " [%d]", i); seq_printf(m, " [%d]", i);
} }
seq_printf(m, "\nbogomips\t: %lu.%02lu\n\n", seq_printf(m, "\nbogomips\t: %lu.%02lu\n",
c->loops_per_jiffy/(500000/HZ), c->loops_per_jiffy/(500000/HZ),
(c->loops_per_jiffy/(5000/HZ)) % 100); (c->loops_per_jiffy/(5000/HZ)) % 100);
seq_printf(m, "clflush size\t: %u\n\n", c->x86_clflush_size);
return 0; return 0;
} }

@ -34,7 +34,6 @@
#include <linux/major.h> #include <linux/major.h>
#include <linux/fs.h> #include <linux/fs.h>
#include <linux/smp_lock.h> #include <linux/smp_lock.h>
#include <linux/fs.h>
#include <linux/device.h> #include <linux/device.h>
#include <linux/cpu.h> #include <linux/cpu.h>
#include <linux/notifier.h> #include <linux/notifier.h>

arch/i386/kernel/e820.c Normal file
@ -0,0 +1,894 @@
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/bootmem.h>
#include <linux/ioport.h>
#include <linux/string.h>
#include <linux/kexec.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/efi.h>
#include <linux/pfn.h>
#include <linux/uaccess.h>
#include <asm/pgtable.h>
#include <asm/page.h>
#include <asm/e820.h>
#ifdef CONFIG_EFI
int efi_enabled = 0;
EXPORT_SYMBOL(efi_enabled);
#endif
struct e820map e820;
struct change_member {
struct e820entry *pbios; /* pointer to original bios entry */
unsigned long long addr; /* address for this change point */
};
static struct change_member change_point_list[2*E820MAX] __initdata;
static struct change_member *change_point[2*E820MAX] __initdata;
static struct e820entry *overlap_list[E820MAX] __initdata;
static struct e820entry new_bios[E820MAX] __initdata;
/* For PCI or other memory-mapped resources */
unsigned long pci_mem_start = 0x10000000;
#ifdef CONFIG_PCI
EXPORT_SYMBOL(pci_mem_start);
#endif
extern int user_defined_memmap;
struct resource data_resource = {
.name = "Kernel data",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
struct resource code_resource = {
.name = "Kernel code",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
static struct resource system_rom_resource = {
.name = "System ROM",
.start = 0xf0000,
.end = 0xfffff,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
};
static struct resource extension_rom_resource = {
.name = "Extension ROM",
.start = 0xe0000,
.end = 0xeffff,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
};
static struct resource adapter_rom_resources[] = { {
.name = "Adapter ROM",
.start = 0xc8000,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
} };
static struct resource video_rom_resource = {
.name = "Video ROM",
.start = 0xc0000,
.end = 0xc7fff,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
};
static struct resource video_ram_resource = {
.name = "Video RAM area",
.start = 0xa0000,
.end = 0xbffff,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
static struct resource standard_io_resources[] = { {
.name = "dma1",
.start = 0x0000,
.end = 0x001f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "pic1",
.start = 0x0020,
.end = 0x0021,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "timer0",
.start = 0x0040,
.end = 0x0043,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "timer1",
.start = 0x0050,
.end = 0x0053,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "keyboard",
.start = 0x0060,
.end = 0x006f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "dma page reg",
.start = 0x0080,
.end = 0x008f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "pic2",
.start = 0x00a0,
.end = 0x00a1,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "dma2",
.start = 0x00c0,
.end = 0x00df,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "fpu",
.start = 0x00f0,
.end = 0x00ff,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
} };
static int romsignature(const unsigned char *x)
{
unsigned short sig;
int ret = 0;
if (probe_kernel_address((const unsigned short *)x, sig) == 0)
ret = (sig == 0xaa55);
return ret;
}
static int __init romchecksum(unsigned char *rom, unsigned long length)
{
unsigned char *p, sum = 0;
for (p = rom; p < rom + length; p++)
sum += *p;
return sum == 0;
}
static void __init probe_roms(void)
{
unsigned long start, length, upper;
unsigned char *rom;
int i;
/* video rom */
upper = adapter_rom_resources[0].start;
for (start = video_rom_resource.start; start < upper; start += 2048) {
rom = isa_bus_to_virt(start);
if (!romsignature(rom))
continue;
video_rom_resource.start = start;
/* 0 < length <= 0x7f * 512, historically */
length = rom[2] * 512;
/* if checksum okay, trust length byte */
if (length && romchecksum(rom, length))
video_rom_resource.end = start + length - 1;
request_resource(&iomem_resource, &video_rom_resource);
break;
}
start = (video_rom_resource.end + 1 + 2047) & ~2047UL;
if (start < upper)
start = upper;
/* system rom */
request_resource(&iomem_resource, &system_rom_resource);
upper = system_rom_resource.start;
/* check for extension rom (ignore length byte!) */
rom = isa_bus_to_virt(extension_rom_resource.start);
if (romsignature(rom)) {
length = extension_rom_resource.end - extension_rom_resource.start + 1;
if (romchecksum(rom, length)) {
request_resource(&iomem_resource, &extension_rom_resource);
upper = extension_rom_resource.start;
}
}
/* check for adapter roms on 2k boundaries */
for (i = 0; i < ARRAY_SIZE(adapter_rom_resources) && start < upper; start += 2048) {
rom = isa_bus_to_virt(start);
if (!romsignature(rom))
continue;
/* 0 < length <= 0x7f * 512, historically */
length = rom[2] * 512;
/* but accept any length that fits if checksum okay */
if (!length || start + length > upper || !romchecksum(rom, length))
continue;
adapter_rom_resources[i].start = start;
adapter_rom_resources[i].end = start + length - 1;
request_resource(&iomem_resource, &adapter_rom_resources[i]);
start = adapter_rom_resources[i++].end & ~2047UL;
}
}
/*
* Request address space for all standard RAM and ROM resources
* and also for regions reported as reserved by the e820.
*/
static void __init
legacy_init_iomem_resources(struct resource *code_resource, struct resource *data_resource)
{
int i;
probe_roms();
for (i = 0; i < e820.nr_map; i++) {
struct resource *res;
#ifndef CONFIG_RESOURCES_64BIT
if (e820.map[i].addr + e820.map[i].size > 0x100000000ULL)
continue;
#endif
res = kzalloc(sizeof(struct resource), GFP_ATOMIC);
switch (e820.map[i].type) {
case E820_RAM: res->name = "System RAM"; break;
case E820_ACPI: res->name = "ACPI Tables"; break;
case E820_NVS: res->name = "ACPI Non-volatile Storage"; break;
default: res->name = "reserved";
}
res->start = e820.map[i].addr;
res->end = res->start + e820.map[i].size - 1;
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
if (request_resource(&iomem_resource, res)) {
kfree(res);
continue;
}
if (e820.map[i].type == E820_RAM) {
/*
* We don't know which RAM region contains kernel data,
* so we try it repeatedly and let the resource manager
* test it.
*/
request_resource(res, code_resource);
request_resource(res, data_resource);
#ifdef CONFIG_KEXEC
request_resource(res, &crashk_res);
#endif
}
}
}
/*
* Request address space for all standard resources
*
* This is called just before pcibios_init(), which is also a
* subsys_initcall, but is linked in later (in arch/i386/pci/common.c).
*/
static int __init request_standard_resources(void)
{
int i;
printk("Setting up standard PCI resources\n");
if (efi_enabled)
efi_initialize_iomem_resources(&code_resource, &data_resource);
else
legacy_init_iomem_resources(&code_resource, &data_resource);
/* EFI systems may still have VGA */
request_resource(&iomem_resource, &video_ram_resource);
/* request I/O space for devices used on all i[345]86 PCs */
for (i = 0; i < ARRAY_SIZE(standard_io_resources); i++)
request_resource(&ioport_resource, &standard_io_resources[i]);
return 0;
}
subsys_initcall(request_standard_resources);
void __init add_memory_region(unsigned long long start,
unsigned long long size, int type)
{
int x;
if (!efi_enabled) {
x = e820.nr_map;
if (x == E820MAX) {
printk(KERN_ERR "Ooops! Too many entries in the memory map!\n");
return;
}
e820.map[x].addr = start;
e820.map[x].size = size;
e820.map[x].type = type;
e820.nr_map++;
}
} /* add_memory_region */
/*
* Sanitize the BIOS e820 map.
*
* Some e820 responses include overlapping entries. The following
* replaces the original e820 map with a new one, removing overlaps.
*
*/
int __init sanitize_e820_map(struct e820entry * biosmap, char * pnr_map)
{
struct change_member *change_tmp;
unsigned long current_type, last_type;
unsigned long long last_addr;
int chgidx, still_changing;
int overlap_entries;
int new_bios_entry;
int old_nr, new_nr, chg_nr;
int i;
/*
Visually we're performing the following (1,2,3,4 = memory types)...
Sample memory map (w/overlaps):
____22__________________
______________________4_
____1111________________
_44_____________________
11111111________________
____________________33__
___________44___________
__________33333_________
______________22________
___________________2222_
_________111111111______
_____________________11_
_________________4______
Sanitized equivalent (no overlap):
1_______________________
_44_____________________
___1____________________
____22__________________
______11________________
_________1______________
__________3_____________
___________44___________
_____________33_________
_______________2________
________________1_______
_________________4______
___________________2____
____________________33__
______________________4_
*/
printk("sanitize start\n");
/* if there's only one memory region, don't bother */
if (*pnr_map < 2) {
printk("sanitize bail 0\n");
return -1;
}
old_nr = *pnr_map;
/* bail out if we find any unreasonable addresses in bios map */
for (i=0; i<old_nr; i++)
if (biosmap[i].addr + biosmap[i].size < biosmap[i].addr) {
printk("sanitize bail 1\n");
return -1;
}
/* create pointers for initial change-point information (for sorting) */
for (i=0; i < 2*old_nr; i++)
change_point[i] = &change_point_list[i];
/* record all known change-points (starting and ending addresses),
omitting those that are for empty memory regions */
chgidx = 0;
for (i=0; i < old_nr; i++) {
if (biosmap[i].size != 0) {
change_point[chgidx]->addr = biosmap[i].addr;
change_point[chgidx++]->pbios = &biosmap[i];
change_point[chgidx]->addr = biosmap[i].addr + biosmap[i].size;
change_point[chgidx++]->pbios = &biosmap[i];
}
}
chg_nr = chgidx; /* true number of change-points */
/* sort change-point list by memory addresses (low -> high) */
still_changing = 1;
while (still_changing) {
still_changing = 0;
for (i=1; i < chg_nr; i++) {
/* if <current_addr> < <last_addr>, swap */
/* or, if current=<start_addr> & last=<end_addr>, swap */
if ((change_point[i]->addr < change_point[i-1]->addr) ||
((change_point[i]->addr == change_point[i-1]->addr) &&
(change_point[i]->addr == change_point[i]->pbios->addr) &&
(change_point[i-1]->addr != change_point[i-1]->pbios->addr))
)
{
change_tmp = change_point[i];
change_point[i] = change_point[i-1];
change_point[i-1] = change_tmp;
still_changing=1;
}
}
}
/* create a new bios memory map, removing overlaps */
overlap_entries=0; /* number of entries in the overlap table */
new_bios_entry=0; /* index for creating new bios map entries */
last_type = 0; /* start with undefined memory type */
last_addr = 0; /* start with 0 as last starting address */
/* loop through change-points, determining effect on the new bios map */
for (chgidx=0; chgidx < chg_nr; chgidx++)
{
/* keep track of all overlapping bios entries */
if (change_point[chgidx]->addr == change_point[chgidx]->pbios->addr)
{
/* add map entry to overlap list (> 1 entry implies an overlap) */
overlap_list[overlap_entries++]=change_point[chgidx]->pbios;
}
else
{
/* remove entry from list (order independent, so swap with last) */
for (i=0; i<overlap_entries; i++)
{
if (overlap_list[i] == change_point[chgidx]->pbios)
overlap_list[i] = overlap_list[overlap_entries-1];
}
overlap_entries--;
}
/* if there are overlapping entries, decide which "type" to use */
/* (larger value takes precedence -- 1=usable, 2,3,4,4+=unusable) */
current_type = 0;
for (i=0; i<overlap_entries; i++)
if (overlap_list[i]->type > current_type)
current_type = overlap_list[i]->type;
/* continue building up new bios map based on this information */
if (current_type != last_type) {
if (last_type != 0) {
new_bios[new_bios_entry].size =
change_point[chgidx]->addr - last_addr;
/* move forward only if the new size was non-zero */
if (new_bios[new_bios_entry].size != 0)
if (++new_bios_entry >= E820MAX)
break; /* no more space left for new bios entries */
}
if (current_type != 0) {
new_bios[new_bios_entry].addr = change_point[chgidx]->addr;
new_bios[new_bios_entry].type = current_type;
last_addr=change_point[chgidx]->addr;
}
last_type = current_type;
}
}
new_nr = new_bios_entry; /* retain count for new bios entries */
/* copy new bios mapping into original location */
memcpy(biosmap, new_bios, new_nr*sizeof(struct e820entry));
*pnr_map = new_nr;
printk("sanitize end\n");
return 0;
}
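The change-point sweep in sanitize_e820_map() can be condensed into a user-space sketch. It keeps the same rules:

- sort start and end points by address, with starts first on ties;
- keep a list of currently open entries;
- let the highest type number win.

It differs from the original in two ways. It uses qsort() instead of the bubble sort, and it drops the early bail-outs for fewer than two entries or for address overflow. The `region`/`point` types, the `sanitize()` name and the 64-entry input cap are assumptions of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct region { unsigned long long addr, size; unsigned type; };
struct point { unsigned long long addr; int is_start; const struct region *r; };

/* Order by address; at equal addresses, start points sort before end points. */
static int cmp_points(const void *a, const void *b)
{
	const struct point *p = a, *q = b;

	if (p->addr != q->addr)
		return p->addr < q->addr ? -1 : 1;
	return q->is_start - p->is_start;
}

/* Writes non-overlapping entries to 'out' and returns how many were produced.
 * Only the first 64 input entries are considered (sketch limit). */
static int sanitize(const struct region *in, int n, struct region *out, int max_out)
{
	struct point pts[128];
	const struct region *active[64];
	int npts = 0, nactive = 0, nout = 0;
	unsigned last_type = 0;
	unsigned long long last_addr = 0;

	for (int i = 0; i < n && i < 64; i++) {
		if (in[i].size == 0)
			continue;	/* empty regions contribute no change-points */
		pts[npts++] = (struct point){ in[i].addr, 1, &in[i] };
		pts[npts++] = (struct point){ in[i].addr + in[i].size, 0, &in[i] };
	}
	qsort(pts, npts, sizeof pts[0], cmp_points);

	for (int c = 0; c < npts; c++) {
		if (pts[c].is_start) {
			active[nactive++] = pts[c].r;
		} else {
			for (int i = 0; i < nactive; i++)
				if (active[i] == pts[c].r) {
					active[i] = active[--nactive];
					break;
				}
		}
		/* The highest type among the open entries wins. */
		unsigned cur = 0;
		for (int i = 0; i < nactive; i++)
			if (active[i]->type > cur)
				cur = active[i]->type;
		if (cur == last_type)
			continue;
		if (last_type != 0) {
			/* Close the current output entry; keep it only if non-empty. */
			out[nout].size = pts[c].addr - last_addr;
			if (out[nout].size != 0 && ++nout >= max_out)
				break;
		}
		if (cur != 0) {
			out[nout].addr = pts[c].addr;
			out[nout].type = cur;
			last_addr = pts[c].addr;
		}
		last_type = cur;
	}
	return nout;
}
```

Consider a map where a 4 kB reserved entry (type 2) overlaps the tail of low RAM: `{0, 0xa0000, 1}`, `{0x9f000, 0x1000, 2}`, `{0x100000, 0xf00000, 1}`. The sweep yields three non-overlapping entries: RAM up to 0x9f000, the reserved page, and the high RAM block.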
/*
* Copy the BIOS e820 map into a safe place.
*
* Sanity-check it while we're at it.
*
* If we're lucky and live on a modern system, the setup code
* will have given us a memory map that we can use to properly
* set up memory. If we aren't, we'll fake a memory map.
*
* We check to see that the memory map contains at least 2 elements
* before we'll use it, because the detection code in setup.S may
not be perfect and almost every PC known to man has two memory
* regions: one from 0 to 640k, and one from 1mb up. (The IBM
* thinkpad 560x, for example, does not cooperate with the memory
* detection code.)
*/
int __init copy_e820_map(struct e820entry * biosmap, int nr_map)
{
/* Only one memory region (or negative)? Ignore it */
if (nr_map < 2)
return -1;
do {
unsigned long long start = biosmap->addr;
unsigned long long size = biosmap->size;
unsigned long long end = start + size;
unsigned long type = biosmap->type;
printk("copy_e820_map() start: %016Lx size: %016Lx end: %016Lx type: %ld\n", start, size, end, type);
/* Overflow in 64 bits? Ignore the memory map. */
if (start > end)
return -1;
/*
* Some BIOSes claim RAM in the 640k - 1M region.
* Not right. Fix it up.
*/
if (type == E820_RAM) {
printk("copy_e820_map() type is E820_RAM\n");
if (start < 0x100000ULL && end > 0xA0000ULL) {
printk("copy_e820_map() lies in range...\n");
if (start < 0xA0000ULL) {
printk("copy_e820_map() start < 0xA0000ULL\n");
add_memory_region(start, 0xA0000ULL-start, type);
}
if (end <= 0x100000ULL) {
printk("copy_e820_map() end <= 0x100000ULL\n");
continue;
}
start = 0x100000ULL;
size = end - start;
}
}
add_memory_region(start, size, type);
} while (biosmap++,--nr_map);
return 0;
}
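copy_e820_map() trims RAM entries that claim part of the 640 kB to 1 MB legacy window. The sketch below isolates that splitting rule as a pure function. It returns half-open ranges instead of calling add_memory_region(). `struct range` and `clip_legacy_hole()` are hypothetical names.

```c
#include <assert.h>

/* Half-open range [start, end). */
struct range { unsigned long long start, end; };

/* Split a RAM entry around the 0xA0000-0x100000 hole the way copy_e820_map()
 * does. Returns the number of pieces written to 'out' (0, 1 or 2). */
static int clip_legacy_hole(unsigned long long start, unsigned long long size,
			    struct range out[2])
{
	unsigned long long end = start + size;
	int n = 0;

	if (start < 0x100000ULL && end > 0xA0000ULL) {
		if (start < 0xA0000ULL)
			out[n++] = (struct range){ start, 0xA0000ULL };
		if (end <= 0x100000ULL)
			return n;	/* nothing left above 1 MB */
		start = 0x100000ULL;
	}
	out[n++] = (struct range){ start, end };
	return n;
}
```

An entry covering 0 to 2 MB becomes two pieces, [0, 0xA0000) and [0x100000, 0x200000). An entry lying entirely inside 0xA0000 to 0x100000 is dropped.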
/*
* Callback for efi_memmap_walk().
*/
static int __init
efi_find_max_pfn(unsigned long start, unsigned long end, void *arg)
{
unsigned long *max_pfn = arg, pfn;
if (start < end) {
pfn = PFN_UP(end -1);
if (pfn > *max_pfn)
*max_pfn = pfn;
}
return 0;
}
static int __init
efi_memory_present_wrapper(unsigned long start, unsigned long end, void *arg)
{
memory_present(0, PFN_UP(start), PFN_DOWN(end));
return 0;
}
/*
* Find the highest page frame number we have available
*/
void __init find_max_pfn(void)
{
int i;
max_pfn = 0;
if (efi_enabled) {
efi_memmap_walk(efi_find_max_pfn, &max_pfn);
efi_memmap_walk(efi_memory_present_wrapper, NULL);
return;
}
for (i = 0; i < e820.nr_map; i++) {
unsigned long start, end;
/* RAM? */
if (e820.map[i].type != E820_RAM)
continue;
start = PFN_UP(e820.map[i].addr);
end = PFN_DOWN(e820.map[i].addr + e820.map[i].size);
if (start >= end)
continue;
if (end > max_pfn)
max_pfn = end;
memory_present(0, start, end);
}
}
/*
* Free all available memory for boot time allocation. Used
* as a callback function by efi_memmap_walk()
*/
static int __init
free_available_memory(unsigned long start, unsigned long end, void *arg)
{
/* check max_low_pfn */
if (start >= (max_low_pfn << PAGE_SHIFT))
return 0;
if (end >= (max_low_pfn << PAGE_SHIFT))
end = max_low_pfn << PAGE_SHIFT;
if (start < end)
free_bootmem(start, end - start);
return 0;
}
/*
* Register fully available low RAM pages with the bootmem allocator.
*/
void __init register_bootmem_low_pages(unsigned long max_low_pfn)
{
int i;
if (efi_enabled) {
efi_memmap_walk(free_available_memory, NULL);
return;
}
for (i = 0; i < e820.nr_map; i++) {
unsigned long curr_pfn, last_pfn, size;
/*
* Reserve usable low memory
*/
if (e820.map[i].type != E820_RAM)
continue;
/*
* We are rounding up the start address of usable memory:
*/
curr_pfn = PFN_UP(e820.map[i].addr);
if (curr_pfn >= max_low_pfn)
continue;
/*
* ... and at the end of the usable range downwards:
*/
last_pfn = PFN_DOWN(e820.map[i].addr + e820.map[i].size);
if (last_pfn > max_low_pfn)
last_pfn = max_low_pfn;
/*
* .. finally, did all the rounding and playing
* around just make the area go away?
*/
if (last_pfn <= curr_pfn)
continue;
size = last_pfn - curr_pfn;
free_bootmem(PFN_PHYS(curr_pfn), PFN_PHYS(size));
}
}
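register_bootmem_low_pages() applies the same inward rounding as find_max_pfn(), then clips each range at max_low_pfn before calling free_bootmem(). The sketch below isolates that range computation, assuming 4 KiB pages. `low_ram_range()` is a hypothetical helper that returns the pfn range rather than freeing it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 4 KiB pages, as on i386. */
#define PAGE_SHIFT  12
#define PAGE_SIZE   (1ULL << PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Compute the [*curr_pfn, *last_pfn) part of one RAM entry that lies
 * below max_low_pfn. Returns 0 if rounding and clipping leave nothing. */
int low_ram_range(uint64_t addr, uint64_t size, unsigned long max_low_pfn,
		  unsigned long *curr_pfn, unsigned long *last_pfn)
{
	unsigned long curr = PFN_UP(addr);	/* round start up */
	unsigned long last;

	if (curr >= max_low_pfn)
		return 0;			/* entirely in high memory */
	last = PFN_DOWN(addr + size);		/* round end down */
	if (last > max_low_pfn)
		last = max_low_pfn;		/* clip to low memory */
	if (last <= curr)
		return 0;			/* rounding ate the range */
	*curr_pfn = curr;
	*last_pfn = last;
	return 1;
}
```

In the kernel loop, the caller then hands PFN_PHYS(curr_pfn) and PFN_PHYS(last_pfn - curr_pfn) to free_bootmem().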
void __init register_memory(void)
{
unsigned long gapstart, gapsize, round;
unsigned long long last;
int i;
/*
* Search for the biggest gap in the low 32 bits of the e820
* memory space.
*/
last = 0x100000000ull;
gapstart = 0x10000000;
gapsize = 0x400000;
i = e820.nr_map;
while (--i >= 0) {
unsigned long long start = e820.map[i].addr;
unsigned long long end = start + e820.map[i].size;
/*
* Since "last" is at most 4GB, we know we'll
* fit in 32 bits if this condition is true
*/
if (last > end) {
unsigned long gap = last - end;
if (gap > gapsize) {
gapsize = gap;
gapstart = end;
}
}
if (start < last)
last = start;
}
/*
* See how much we want to round up: start off with
* rounding to the next 1MB area.
*/
round = 0x100000;
while ((gapsize >> 4) > round)
round += round;
/* Fun with two's complement */
pci_mem_start = (gapstart + round) & -round;
printk("Allocating PCI resources starting at %08lx (gap: %08lx:%08lx)\n",
pci_mem_start, gapstart, gapsize);
}
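The tail of register_memory() chooses a rounding granularity that scales with the gap. It starts at 1 MiB and doubles while it is below one sixteenth of the gap, then applies `& -round` as a mask. That mask works because `round` is always a power of two, so `-round == ~(round - 1)`. Note that a `gapstart` which is already aligned still gets pushed up by one full `round`. The sketch below covers only that arithmetic and assumes the gap search has already produced `gapstart` and `gapsize`. `pci_start_from_gap()` is a hypothetical name.

```c
#include <assert.h>

/* Round the start of the largest gap up to a granularity that scales
 * with the gap size, as in register_memory(). */
unsigned long pci_start_from_gap(unsigned long gapstart, unsigned long gapsize)
{
	unsigned long round = 0x100000;		/* start with 1 MiB */

	while ((gapsize >> 4) > round)
		round += round;			/* stays a power of two */
	/* Two's complement: -round has every bit >= log2(round) set. */
	return (gapstart + round) & -round;
}
```

With the default 4 MiB gap at 0x10000000, `round` stays at 1 MiB and the result is 0x10100000.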
void __init print_memory_map(char *who)
{
int i;
for (i = 0; i < e820.nr_map; i++) {
printk(" %s: %016Lx - %016Lx ", who,
e820.map[i].addr,
e820.map[i].addr + e820.map[i].size);
switch (e820.map[i].type) {
case E820_RAM: printk("(usable)\n");
break;
case E820_RESERVED:
printk("(reserved)\n");
break;
case E820_ACPI:
printk("(ACPI data)\n");
break;
case E820_NVS:
printk("(ACPI NVS)\n");
break;
default: printk("type %lu\n", e820.map[i].type);
break;
}
}
}
static __init __always_inline void efi_limit_regions(unsigned long long size)
{
unsigned long long current_addr = 0;
efi_memory_desc_t *md, *next_md;
void *p, *p1;
int i, j;
j = 0;
p1 = memmap.map;
for (p = p1, i = 0; p < memmap.map_end; p += memmap.desc_size, i++) {
md = p;
next_md = p1;
current_addr = md->phys_addr +
PFN_PHYS(md->num_pages);
if (is_available_memory(md)) {
if (md->phys_addr >= size) continue;
memcpy(next_md, md, memmap.desc_size);
if (current_addr >= size) {
next_md->num_pages -=
PFN_UP(current_addr-size);
}
p1 += memmap.desc_size;
next_md = p1;
j++;
} else if ((md->attribute & EFI_MEMORY_RUNTIME) ==
EFI_MEMORY_RUNTIME) {
/* In order to make runtime services
* available we have to include runtime
* memory regions in memory map */
memcpy(next_md, md, memmap.desc_size);
p1 += memmap.desc_size;
next_md = p1;
j++;
}
}
memmap.nr_map = j;
memmap.map_end = memmap.map +
(memmap.nr_map * memmap.desc_size);
}
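efi_limit_regions() compacts the EFI memory map in place, following three rules:

- Usable regions that start below `size` are kept, and the one straddling `size` is shortened.
- Regions flagged EFI_MEMORY_RUNTIME are kept regardless, so runtime services remain available.
- Everything else is dropped.

The sketch below uses a hypothetical fixed-size descriptor with two boolean flags. These stand in for `efi_memory_desc_t`, is_available_memory(), and the `memmap.desc_size` stride. Pages are assumed to be 4 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PFN_UP(x)  (((x) + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)

/* Hypothetical simplified EFI memory descriptor. */
struct md_sketch {
	uint64_t phys_addr;
	uint64_t num_pages;
	int available;	/* stands in for is_available_memory(md) */
	int runtime;	/* stands in for EFI_MEMORY_RUNTIME being set */
};

/* Compact md[0..n) in place; return the new number of entries. */
size_t sketch_efi_limit(struct md_sketch *md, size_t n, uint64_t size)
{
	size_t j = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = md[i].phys_addr + (md[i].num_pages << PAGE_SHIFT);

		if (md[i].available) {
			if (md[i].phys_addr >= size)
				continue;		/* wholly above the limit */
			md[j] = md[i];
			if (end >= size)		/* straddles the limit */
				md[j].num_pages -= PFN_UP(end - size);
			j++;
		} else if (md[i].runtime) {
			md[j++] = md[i];		/* keep for runtime services */
		}
	}
	return j;
}
```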
void __init limit_regions(unsigned long long size)
{
unsigned long long current_addr;
int i;
print_memory_map("limit_regions start");
if (efi_enabled) {
efi_limit_regions(size);
return;
}
for (i = 0; i < e820.nr_map; i++) {
current_addr = e820.map[i].addr + e820.map[i].size;
if (current_addr < size)
continue;
if (e820.map[i].type != E820_RAM)
continue;
if (e820.map[i].addr >= size) {
/*
* This region starts past the end of the
* requested size, skip it completely.
*/
e820.nr_map = i;
} else {
e820.nr_map = i + 1;
e820.map[i].size -= current_addr - size;
}
print_memory_map("limit_regions endfor");
return;
}
print_memory_map("limit_regions endfunc");
}
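In the e820 case, limit_regions() stops at the first RAM entry that reaches `size`. It drops every entry after that one, including any later reserved or ACPI entries. It then either discards that entry or shortens it so it ends exactly at `size`. Non-RAM entries that extend past `size` before that point are left alone. The sketch below uses a hypothetical fixed-capacity map type and leaves out the print_memory_map() tracing.

```c
#include <assert.h>
#include <stdint.h>

enum { E820_RAM = 1, E820_RESERVED = 2 };

/* Hypothetical simplified e820 table. */
struct e820_entry { uint64_t addr, size; unsigned type; };
struct e820_map { int nr_map; struct e820_entry map[8]; };

void sketch_limit_regions(struct e820_map *e, uint64_t size)
{
	for (int i = 0; i < e->nr_map; i++) {
		uint64_t end = e->map[i].addr + e->map[i].size;

		if (end < size || e->map[i].type != E820_RAM)
			continue;
		if (e->map[i].addr >= size) {
			e->nr_map = i;		/* starts past the limit: drop it */
		} else {
			e->nr_map = i + 1;	/* straddles: trim to end at size */
			e->map[i].size -= end - size;
		}
		return;
	}
}
```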
/*
* This function checks if the entire range <start,end> is mapped with type.
*
* Note: this function only works correctly if the e820 table is sorted and
* non-overlapping, which is the case here.
*/
int __init
e820_all_mapped(unsigned long s, unsigned long e, unsigned type)
{
u64 start = s;
u64 end = e;
int i;
for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i];
if (type && ei->type != type)
continue;
/* does this entry overlap (part of) <start,end>? */
if (ei->addr >= end || ei->addr + ei->size <= start)
continue;
/* if the entry covers the beginning of <start,end>, move
* start to the end of the entry, since the range is covered up to there
*/
if (ei->addr <= start)
start = ei->addr + ei->size;
/* if start is now at or beyond end, we're done, full
* coverage */
if (start >= end)
return 1; /* we're done */
}
return 0;
}
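e820_all_mapped() walks a sorted, non-overlapping table and advances `start` past every matching entry that covers it. If `start` reaches `end`, the whole range is covered. Consider a matching entry that overlaps the range but begins after `start`. It marks a hole, so `start` stays put and the function ends up returning 0. Passing `type == 0` matches entries of any type. The sketch below uses a hypothetical simplified entry type.

```c
#include <assert.h>
#include <stdint.h>

enum { E820_RAM = 1, E820_RESERVED = 2 };

/* Hypothetical simplified e820 entry. */
struct e820_entry { uint64_t addr, size; unsigned type; };

/* Return 1 if [start, end) is fully covered by entries of the given
 * type (any type if type == 0); the table must be sorted. */
int sketch_all_mapped(const struct e820_entry *map, int n,
		      uint64_t start, uint64_t end, unsigned type)
{
	for (int i = 0; i < n; i++) {
		const struct e820_entry *ei = &map[i];

		if (type && ei->type != type)
			continue;
		if (ei->addr >= end || ei->addr + ei->size <= start)
			continue;			/* no overlap */
		if (ei->addr <= start)
			start = ei->addr + ei->size;	/* covered up to here */
		if (start >= end)
			return 1;
	}
	return 0;
}
```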
static int __init parse_memmap(char *arg)
{
if (!arg)
return -EINVAL;
if (strcmp(arg, "exactmap") == 0) {
#ifdef CONFIG_CRASH_DUMP
/* If we are doing a crash dump, we
* still need to know the real mem
* size before original memory map is
* reset.
*/
find_max_pfn();
saved_max_pfn = max_pfn;
#endif
e820.nr_map = 0;
user_defined_memmap = 1;
} else {
/* If the user specifies memory size, we
* limit the BIOS-provided memory map to
* that size. exactmap can be used to specify
* the exact map. mem=number can be used to
* trim the existing memory map.
*/
unsigned long long start_at, mem_size;
mem_size = memparse(arg, &arg);
if (*arg == '@') {
start_at = memparse(arg+1, &arg);
add_memory_region(start_at, mem_size, E820_RAM);
} else if (*arg == '#') {
start_at = memparse(arg+1, &arg);
add_memory_region(start_at, mem_size, E820_ACPI);
} else if (*arg == '$') {
start_at = memparse(arg+1, &arg);
add_memory_region(start_at, mem_size, E820_RESERVED);
} else {
limit_regions(mem_size);
user_defined_memmap = 1;
}
}
return 0;
}
early_param("memmap", parse_memmap);
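parse_memmap() accepts these forms:

- `memmap=exactmap`
- `memmap=SIZE@START` (RAM)
- `SIZE#START` (ACPI data)
- `SIZE$START` (reserved)
- a bare `memmap=SIZE`, which trims the existing map via limit_regions()

The sketch below covers only the classification step. It relies on a hypothetical `sketch_memparse()` that approximates the kernel's memparse() as a strtoull() number with an optional K/M/G suffix. Exactmap handling and the add_memory_region() calls are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Approximation of memparse(): number plus optional K/M/G suffix. */
static unsigned long long sketch_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

enum memmap_kind { MEMMAP_LIMIT, MEMMAP_RAM, MEMMAP_ACPI, MEMMAP_RESERVED };

struct memmap_req {
	enum memmap_kind kind;
	unsigned long long size, start;
};

struct memmap_req sketch_parse_memmap(const char *arg)
{
	struct memmap_req r = { MEMMAP_LIMIT, 0, 0 };
	char *p;

	r.size = sketch_memparse(arg, &p);
	switch (*p) {
	case '@': r.kind = MEMMAP_RAM; break;
	case '#': r.kind = MEMMAP_ACPI; break;
	case '$': r.kind = MEMMAP_RESERVED; break;
	default:  return r;	/* bare size: trim the existing map */
	}
	r.start = sketch_memparse(p + 1, &p);
	return r;
}
```

Because `exactmap` empties the table (`e820.nr_map = 0`), it is meant to be followed by `SIZE@START`-style entries that rebuild the map from scratch.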

@ -194,17 +194,24 @@ inline int efi_set_rtc_mmss(unsigned long nowtime)
return 0; return 0;
} }
/* /*
* This should only be used during kernel init and before runtime * This is used during kernel init before runtime
* services have been remapped, therefore, we'll need to call in physical * services have been remapped and also during suspend, therefore,
* mode. Note, this call isn't used later, so mark it __init. * we'll need to call both in physical and virtual modes.
*/ */
inline unsigned long __init efi_get_time(void) inline unsigned long efi_get_time(void)
{ {
efi_status_t status; efi_status_t status;
efi_time_t eft; efi_time_t eft;
efi_time_cap_t cap; efi_time_cap_t cap;
status = phys_efi_get_time(&eft, &cap); if (efi.get_time) {
/* if we are in virtual mode use remapped function */
status = efi.get_time(&eft, &cap);
} else {
/* we are in physical mode */
status = phys_efi_get_time(&eft, &cap);
}
if (status != EFI_SUCCESS) if (status != EFI_SUCCESS)
printk("Oops: efitime: can't read time status: 0x%lx\n",status); printk("Oops: efitime: can't read time status: 0x%lx\n",status);
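Dropping `__init` lets efi_get_time() run after boot. Once runtime services have been remapped (for example during suspend), it calls through `efi.get_time` in virtual mode; before that, it falls back to phys_efi_get_time(). The sketch below shows that dispatch. The runtime-services table and both call paths are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*get_time_fn)(long *now);

/* Hypothetical stand-in for the remapped EFI runtime-services table. */
struct rt_services {
	get_time_fn get_time;	/* NULL until remapped to virtual mode */
};

/* Hypothetical physical-mode and virtual-mode implementations. */
int phys_get_time(long *now) { *now = 1; return 0; }
int virt_get_time(long *now) { *now = 2; return 0; }

int sketch_get_time(const struct rt_services *rt, long *now)
{
	if (rt->get_time)			/* virtual mode */
		return rt->get_time(now);
	return phys_get_time(now);		/* still in physical mode */
}
```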

@ -30,12 +30,13 @@
* 18(%esp) - %eax * 18(%esp) - %eax
* 1C(%esp) - %ds * 1C(%esp) - %ds
* 20(%esp) - %es * 20(%esp) - %es
* 24(%esp) - orig_eax * 24(%esp) - %gs
* 28(%esp) - %eip * 28(%esp) - orig_eax
* 2C(%esp) - %cs * 2C(%esp) - %eip
* 30(%esp) - %eflags * 30(%esp) - %cs
* 34(%esp) - %oldesp * 34(%esp) - %eflags
* 38(%esp) - %oldss * 38(%esp) - %oldesp
* 3C(%esp) - %oldss
* *
* "current" is in register %ebx during any slow entries. * "current" is in register %ebx during any slow entries.
*/ */
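Now that SAVE_ALL also saves %gs, every slot from orig_eax upward moves up by 4 bytes. That shift is why the old hard-coded EBX...OLDSS offset constants give way to PT_* names. A C mirror of the new layout can check the offsets listed in the comment. `struct frame_sketch` is a hypothetical illustration and not the kernel's struct pt_regs; its fixed-width fields give it the same layout on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register frame as laid out on the stack by SAVE_ALL plus the
 * hardware-pushed words, lowest address first. */
struct frame_sketch {
	uint32_t ebx, ecx, edx, esi, edi, ebp, eax;	/* 0x00 - 0x18 */
	uint32_t ds, es, gs;				/* 0x1C - 0x24 */
	uint32_t orig_eax;				/* 0x28 */
	uint32_t eip, cs, eflags, oldesp, oldss;	/* 0x2C - 0x3C */
};

_Static_assert(offsetof(struct frame_sketch, gs) == 0x24, "gs slot");
_Static_assert(offsetof(struct frame_sketch, oldss) == 0x3C, "oldss slot");
```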
@ -48,26 +49,24 @@
#include <asm/smp.h> #include <asm/smp.h>
#include <asm/page.h> #include <asm/page.h>
#include <asm/desc.h> #include <asm/desc.h>
#include <asm/percpu.h>
#include <asm/dwarf2.h> #include <asm/dwarf2.h>
#include "irq_vectors.h" #include "irq_vectors.h"
#define nr_syscalls ((syscall_table_size)/4) /*
* We use macros for low-level operations which need to be overridden
* for paravirtualization. The following will never clobber any registers:
* INTERRUPT_RETURN (aka. "iret")
* GET_CR0_INTO_EAX (aka. "movl %cr0, %eax")
* ENABLE_INTERRUPTS_SYSEXIT (aka "sti; sysexit").
*
* For DISABLE_INTERRUPTS/ENABLE_INTERRUPTS (aka "cli"/"sti"), you must
* specify what registers can be overwritten (CLBR_NONE, CLBR_EAX/EDX/ECX/ANY).
* Allowing a register to be clobbered can shrink the paravirt replacement
* enough to patch inline, increasing performance.
*/
EBX = 0x00 #define nr_syscalls ((syscall_table_size)/4)
ECX = 0x04
EDX = 0x08
ESI = 0x0C
EDI = 0x10
EBP = 0x14
EAX = 0x18
DS = 0x1C
ES = 0x20
ORIG_EAX = 0x24
EIP = 0x28
CS = 0x2C
EFLAGS = 0x30
OLDESP = 0x34
OLDSS = 0x38
CF_MASK = 0x00000001 CF_MASK = 0x00000001
TF_MASK = 0x00000100 TF_MASK = 0x00000100
@ -76,23 +75,16 @@ DF_MASK = 0x00000400
NT_MASK = 0x00004000 NT_MASK = 0x00004000
VM_MASK = 0x00020000 VM_MASK = 0x00020000
/* These are replacements for paravirtualization */
#define DISABLE_INTERRUPTS cli
#define ENABLE_INTERRUPTS sti
#define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit
#define INTERRUPT_RETURN iret
#define GET_CR0_INTO_EAX movl %cr0, %eax
#ifdef CONFIG_PREEMPT #ifdef CONFIG_PREEMPT
#define preempt_stop DISABLE_INTERRUPTS; TRACE_IRQS_OFF #define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
#else #else
#define preempt_stop #define preempt_stop(clobbers)
#define resume_kernel restore_nocheck #define resume_kernel restore_nocheck
#endif #endif
.macro TRACE_IRQS_IRET .macro TRACE_IRQS_IRET
#ifdef CONFIG_TRACE_IRQFLAGS #ifdef CONFIG_TRACE_IRQFLAGS
testl $IF_MASK,EFLAGS(%esp) # interrupts off? testl $IF_MASK,PT_EFLAGS(%esp) # interrupts off?
jz 1f jz 1f
TRACE_IRQS_ON TRACE_IRQS_ON
1: 1:
@ -107,6 +99,9 @@ VM_MASK = 0x00020000
#define SAVE_ALL \ #define SAVE_ALL \
cld; \ cld; \
pushl %gs; \
CFI_ADJUST_CFA_OFFSET 4;\
/*CFI_REL_OFFSET gs, 0;*/\
pushl %es; \ pushl %es; \
CFI_ADJUST_CFA_OFFSET 4;\ CFI_ADJUST_CFA_OFFSET 4;\
/*CFI_REL_OFFSET es, 0;*/\ /*CFI_REL_OFFSET es, 0;*/\
@ -136,7 +131,9 @@ VM_MASK = 0x00020000
CFI_REL_OFFSET ebx, 0;\ CFI_REL_OFFSET ebx, 0;\
movl $(__USER_DS), %edx; \ movl $(__USER_DS), %edx; \
movl %edx, %ds; \ movl %edx, %ds; \
movl %edx, %es; movl %edx, %es; \
movl $(__KERNEL_PDA), %edx; \
movl %edx, %gs
#define RESTORE_INT_REGS \ #define RESTORE_INT_REGS \
popl %ebx; \ popl %ebx; \
@ -169,17 +166,22 @@ VM_MASK = 0x00020000
2: popl %es; \ 2: popl %es; \
CFI_ADJUST_CFA_OFFSET -4;\ CFI_ADJUST_CFA_OFFSET -4;\
/*CFI_RESTORE es;*/\ /*CFI_RESTORE es;*/\
.section .fixup,"ax"; \ 3: popl %gs; \
3: movl $0,(%esp); \ CFI_ADJUST_CFA_OFFSET -4;\
jmp 1b; \ /*CFI_RESTORE gs;*/\
.pushsection .fixup,"ax"; \
4: movl $0,(%esp); \ 4: movl $0,(%esp); \
jmp 1b; \
5: movl $0,(%esp); \
jmp 2b; \ jmp 2b; \
.previous; \ 6: movl $0,(%esp); \
jmp 3b; \
.section __ex_table,"a";\ .section __ex_table,"a";\
.align 4; \ .align 4; \
.long 1b,3b; \ .long 1b,4b; \
.long 2b,4b; \ .long 2b,5b; \
.previous .long 3b,6b; \
.popsection
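Each `.long Nb,Mb` pair in `__ex_table` maps the address of an instruction that may fault (here a `popl` of a user segment register) to a fixup. The fixup overwrites the bad selector on the stack with 0 and jumps back to retry the pop. When such a fault occurs, the kernel looks up the faulting instruction pointer in this table. The sketch below shows that lookup. It assumes a hypothetical unsorted array searched linearly, whereas the kernel keeps the table sorted and binary-searches it.

```c
#include <assert.h>
#include <stddef.h>

/* One exception-table entry: faulting instruction -> fixup address. */
struct ex_entry {
	unsigned long insn;
	unsigned long fixup;
};

/* Return the fixup for ip, or 0 if ip has no entry (a real fault). */
unsigned long sketch_search_extable(const struct ex_entry *tbl, size_t n,
				    unsigned long ip)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == ip)
			return tbl[i].fixup;
	return 0;
}
```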
#define RING0_INT_FRAME \ #define RING0_INT_FRAME \
CFI_STARTPROC simple;\ CFI_STARTPROC simple;\
@ -198,18 +200,18 @@ VM_MASK = 0x00020000
#define RING0_PTREGS_FRAME \ #define RING0_PTREGS_FRAME \
CFI_STARTPROC simple;\ CFI_STARTPROC simple;\
CFI_SIGNAL_FRAME;\ CFI_SIGNAL_FRAME;\
CFI_DEF_CFA esp, OLDESP-EBX;\ CFI_DEF_CFA esp, PT_OLDESP-PT_EBX;\
/*CFI_OFFSET cs, CS-OLDESP;*/\ /*CFI_OFFSET cs, PT_CS-PT_OLDESP;*/\
CFI_OFFSET eip, EIP-OLDESP;\ CFI_OFFSET eip, PT_EIP-PT_OLDESP;\
/*CFI_OFFSET es, ES-OLDESP;*/\ /*CFI_OFFSET es, PT_ES-PT_OLDESP;*/\
/*CFI_OFFSET ds, DS-OLDESP;*/\ /*CFI_OFFSET ds, PT_DS-PT_OLDESP;*/\
CFI_OFFSET eax, EAX-OLDESP;\ CFI_OFFSET eax, PT_EAX-PT_OLDESP;\
CFI_OFFSET ebp, EBP-OLDESP;\ CFI_OFFSET ebp, PT_EBP-PT_OLDESP;\
CFI_OFFSET edi, EDI-OLDESP;\ CFI_OFFSET edi, PT_EDI-PT_OLDESP;\
CFI_OFFSET esi, ESI-OLDESP;\ CFI_OFFSET esi, PT_ESI-PT_OLDESP;\
CFI_OFFSET edx, EDX-OLDESP;\ CFI_OFFSET edx, PT_EDX-PT_OLDESP;\
CFI_OFFSET ecx, ECX-OLDESP;\ CFI_OFFSET ecx, PT_ECX-PT_OLDESP;\
CFI_OFFSET ebx, EBX-OLDESP CFI_OFFSET ebx, PT_EBX-PT_OLDESP
ENTRY(ret_from_fork) ENTRY(ret_from_fork)
CFI_STARTPROC CFI_STARTPROC
@ -237,17 +239,18 @@ ENTRY(ret_from_fork)
ALIGN ALIGN
RING0_PTREGS_FRAME RING0_PTREGS_FRAME
ret_from_exception: ret_from_exception:
preempt_stop preempt_stop(CLBR_ANY)
ret_from_intr: ret_from_intr:
GET_THREAD_INFO(%ebp) GET_THREAD_INFO(%ebp)
check_userspace: check_userspace:
movl EFLAGS(%esp), %eax # mix EFLAGS and CS movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS
movb CS(%esp), %al movb PT_CS(%esp), %al
andl $(VM_MASK | SEGMENT_RPL_MASK), %eax andl $(VM_MASK | SEGMENT_RPL_MASK), %eax
cmpl $USER_RPL, %eax cmpl $USER_RPL, %eax
jb resume_kernel # not returning to v8086 or userspace jb resume_kernel # not returning to v8086 or userspace
ENTRY(resume_userspace) ENTRY(resume_userspace)
DISABLE_INTERRUPTS # make sure we don't miss an interrupt DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
# setting need_resched or sigpending # setting need_resched or sigpending
# between sampling and the iret # between sampling and the iret
movl TI_flags(%ebp), %ecx movl TI_flags(%ebp), %ecx
@ -258,14 +261,14 @@ ENTRY(resume_userspace)
#ifdef CONFIG_PREEMPT #ifdef CONFIG_PREEMPT
ENTRY(resume_kernel) ENTRY(resume_kernel)
DISABLE_INTERRUPTS DISABLE_INTERRUPTS(CLBR_ANY)
cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ? cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ?
jnz restore_nocheck jnz restore_nocheck
need_resched: need_resched:
movl TI_flags(%ebp), %ecx # need_resched set ? movl TI_flags(%ebp), %ecx # need_resched set ?
testb $_TIF_NEED_RESCHED, %cl testb $_TIF_NEED_RESCHED, %cl
jz restore_all jz restore_all
testl $IF_MASK,EFLAGS(%esp) # interrupts off (exception path) ? testl $IF_MASK,PT_EFLAGS(%esp) # interrupts off (exception path) ?
jz restore_all jz restore_all
call preempt_schedule_irq call preempt_schedule_irq
jmp need_resched jmp need_resched
@ -287,7 +290,7 @@ sysenter_past_esp:
* No need to follow this irqs on/off section: the syscall * No need to follow this irqs on/off section: the syscall
* disabled irqs and here we enable it straight after entry: * disabled irqs and here we enable it straight after entry:
*/ */
ENABLE_INTERRUPTS ENABLE_INTERRUPTS(CLBR_NONE)
pushl $(__USER_DS) pushl $(__USER_DS)
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET ss, 0*/ /*CFI_REL_OFFSET ss, 0*/
@ -331,20 +334,27 @@ sysenter_past_esp:
cmpl $(nr_syscalls), %eax cmpl $(nr_syscalls), %eax
jae syscall_badsys jae syscall_badsys
call *sys_call_table(,%eax,4) call *sys_call_table(,%eax,4)
movl %eax,EAX(%esp) movl %eax,PT_EAX(%esp)
DISABLE_INTERRUPTS DISABLE_INTERRUPTS(CLBR_ECX|CLBR_EDX)
TRACE_IRQS_OFF TRACE_IRQS_OFF
movl TI_flags(%ebp), %ecx movl TI_flags(%ebp), %ecx
testw $_TIF_ALLWORK_MASK, %cx testw $_TIF_ALLWORK_MASK, %cx
jne syscall_exit_work jne syscall_exit_work
/* if something modifies registers it must also disable sysexit */ /* if something modifies registers it must also disable sysexit */
movl EIP(%esp), %edx movl PT_EIP(%esp), %edx
movl OLDESP(%esp), %ecx movl PT_OLDESP(%esp), %ecx
xorl %ebp,%ebp xorl %ebp,%ebp
TRACE_IRQS_ON TRACE_IRQS_ON
1: mov PT_GS(%esp), %gs
ENABLE_INTERRUPTS_SYSEXIT ENABLE_INTERRUPTS_SYSEXIT
CFI_ENDPROC CFI_ENDPROC
.pushsection .fixup,"ax"
2: movl $0,PT_GS(%esp)
jmp 1b
.section __ex_table,"a"
.align 4
.long 1b,2b
.popsection
# system call handler stub # system call handler stub
ENTRY(system_call) ENTRY(system_call)
@ -353,7 +363,7 @@ ENTRY(system_call)
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
SAVE_ALL SAVE_ALL
GET_THREAD_INFO(%ebp) GET_THREAD_INFO(%ebp)
testl $TF_MASK,EFLAGS(%esp) testl $TF_MASK,PT_EFLAGS(%esp)
jz no_singlestep jz no_singlestep
orl $_TIF_SINGLESTEP,TI_flags(%ebp) orl $_TIF_SINGLESTEP,TI_flags(%ebp)
no_singlestep: no_singlestep:
@ -365,9 +375,9 @@ no_singlestep:
jae syscall_badsys jae syscall_badsys
syscall_call: syscall_call:
call *sys_call_table(,%eax,4) call *sys_call_table(,%eax,4)
movl %eax,EAX(%esp) # store the return value movl %eax,PT_EAX(%esp) # store the return value
syscall_exit: syscall_exit:
DISABLE_INTERRUPTS # make sure we don't miss an interrupt DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
# setting need_resched or sigpending # setting need_resched or sigpending
# between sampling and the iret # between sampling and the iret
TRACE_IRQS_OFF TRACE_IRQS_OFF
@ -376,12 +386,12 @@ syscall_exit:
jne syscall_exit_work jne syscall_exit_work
restore_all: restore_all:
movl EFLAGS(%esp), %eax # mix EFLAGS, SS and CS movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS
# Warning: OLDSS(%esp) contains the wrong/random values if we # Warning: PT_OLDSS(%esp) contains the wrong/random values if we
# are returning to the kernel. # are returning to the kernel.
# See comments in process.c:copy_thread() for details. # See comments in process.c:copy_thread() for details.
movb OLDSS(%esp), %ah movb PT_OLDSS(%esp), %ah
movb CS(%esp), %al movb PT_CS(%esp), %al
andl $(VM_MASK | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax andl $(VM_MASK | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax
CFI_REMEMBER_STATE CFI_REMEMBER_STATE
@ -390,13 +400,13 @@ restore_nocheck:
TRACE_IRQS_IRET TRACE_IRQS_IRET
restore_nocheck_notrace: restore_nocheck_notrace:
RESTORE_REGS RESTORE_REGS
addl $4, %esp addl $4, %esp # skip orig_eax/error_code
CFI_ADJUST_CFA_OFFSET -4 CFI_ADJUST_CFA_OFFSET -4
1: INTERRUPT_RETURN 1: INTERRUPT_RETURN
.section .fixup,"ax" .section .fixup,"ax"
iret_exc: iret_exc:
TRACE_IRQS_ON TRACE_IRQS_ON
ENABLE_INTERRUPTS ENABLE_INTERRUPTS(CLBR_NONE)
pushl $0 # no error code pushl $0 # no error code
pushl $do_iret_error pushl $do_iret_error
jmp error_code jmp error_code
@ -408,33 +418,42 @@ iret_exc:
CFI_RESTORE_STATE CFI_RESTORE_STATE
ldt_ss: ldt_ss:
larl OLDSS(%esp), %eax larl PT_OLDSS(%esp), %eax
jnz restore_nocheck jnz restore_nocheck
testl $0x00400000, %eax # returning to 32bit stack? testl $0x00400000, %eax # returning to 32bit stack?
jnz restore_nocheck # all right, normal return jnz restore_nocheck # all right, normal return
#ifdef CONFIG_PARAVIRT
/*
* The kernel can't run on a non-flat stack if paravirt mode
* is active. Rather than try to fixup the high bits of
* ESP, bypass this code entirely. This may break DOSemu
* and/or Wine support in a paravirt VM, although the option
* is still available to implement the setting of the high
* 16-bits in the INTERRUPT_RETURN paravirt-op.
*/
cmpl $0, paravirt_ops+PARAVIRT_enabled
jne restore_nocheck
#endif
/* If returning to userspace with 16bit stack, /* If returning to userspace with 16bit stack,
* try to fix the higher word of ESP, as the CPU * try to fix the higher word of ESP, as the CPU
* won't restore it. * won't restore it.
* This is an "official" bug of all the x86-compatible * This is an "official" bug of all the x86-compatible
* CPUs, which we can try to work around to make * CPUs, which we can try to work around to make
* dosemu and wine happy. */ * dosemu and wine happy. */
subl $8, %esp # reserve space for switch16 pointer movl PT_OLDESP(%esp), %eax
CFI_ADJUST_CFA_OFFSET 8 movl %esp, %edx
DISABLE_INTERRUPTS call patch_espfix_desc
pushl $__ESPFIX_SS
CFI_ADJUST_CFA_OFFSET 4
pushl %eax
CFI_ADJUST_CFA_OFFSET 4
DISABLE_INTERRUPTS(CLBR_EAX)
TRACE_IRQS_OFF TRACE_IRQS_OFF
movl %esp, %eax lss (%esp), %esp
/* Set up the 16bit stack frame with switch32 pointer on top, CFI_ADJUST_CFA_OFFSET -8
* and a switch16 pointer on top of the current frame. */ jmp restore_nocheck
call setup_x86_bogus_stack
CFI_ADJUST_CFA_OFFSET -8 # frame has moved
TRACE_IRQS_IRET
RESTORE_REGS
lss 20+4(%esp), %esp # switch to 16bit stack
1: INTERRUPT_RETURN
.section __ex_table,"a"
.align 4
.long 1b,iret_exc
.previous
CFI_ENDPROC CFI_ENDPROC
# perform work that needs to be done immediately before resumption # perform work that needs to be done immediately before resumption
@ -445,7 +464,7 @@ work_pending:
jz work_notifysig jz work_notifysig
work_resched: work_resched:
call schedule call schedule
DISABLE_INTERRUPTS # make sure we don't miss an interrupt DISABLE_INTERRUPTS(CLBR_ANY) # make sure we don't miss an interrupt
# setting need_resched or sigpending # setting need_resched or sigpending
# between sampling and the iret # between sampling and the iret
TRACE_IRQS_OFF TRACE_IRQS_OFF
@ -458,7 +477,8 @@ work_resched:
work_notifysig: # deal with pending signals and work_notifysig: # deal with pending signals and
# notify-resume requests # notify-resume requests
testl $VM_MASK, EFLAGS(%esp) #ifdef CONFIG_VM86
testl $VM_MASK, PT_EFLAGS(%esp)
movl %esp, %eax movl %esp, %eax
jne work_notifysig_v86 # returning to kernel-space or jne work_notifysig_v86 # returning to kernel-space or
# vm86-space # vm86-space
@ -468,29 +488,30 @@ work_notifysig: # deal with pending signals and
ALIGN ALIGN
work_notifysig_v86: work_notifysig_v86:
#ifdef CONFIG_VM86
pushl %ecx # save ti_flags for do_notify_resume pushl %ecx # save ti_flags for do_notify_resume
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
call save_v86_state # %eax contains pt_regs pointer call save_v86_state # %eax contains pt_regs pointer
popl %ecx popl %ecx
CFI_ADJUST_CFA_OFFSET -4 CFI_ADJUST_CFA_OFFSET -4
movl %eax, %esp movl %eax, %esp
#else
movl %esp, %eax
#endif
xorl %edx, %edx xorl %edx, %edx
call do_notify_resume call do_notify_resume
jmp resume_userspace_sig jmp resume_userspace_sig
#endif
# perform syscall exit tracing # perform syscall exit tracing
ALIGN ALIGN
syscall_trace_entry: syscall_trace_entry:
movl $-ENOSYS,EAX(%esp) movl $-ENOSYS,PT_EAX(%esp)
movl %esp, %eax movl %esp, %eax
xorl %edx,%edx xorl %edx,%edx
call do_syscall_trace call do_syscall_trace
cmpl $0, %eax cmpl $0, %eax
jne resume_userspace # ret != 0 -> running under PTRACE_SYSEMU, jne resume_userspace # ret != 0 -> running under PTRACE_SYSEMU,
# so must skip actual syscall # so must skip actual syscall
movl ORIG_EAX(%esp), %eax movl PT_ORIG_EAX(%esp), %eax
cmpl $(nr_syscalls), %eax cmpl $(nr_syscalls), %eax
jnae syscall_call jnae syscall_call
jmp syscall_exit jmp syscall_exit
@ -501,7 +522,7 @@ syscall_exit_work:
testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
jz work_pending jz work_pending
TRACE_IRQS_ON TRACE_IRQS_ON
ENABLE_INTERRUPTS # could let do_syscall_trace() call ENABLE_INTERRUPTS(CLBR_ANY) # could let do_syscall_trace() call
# schedule() instead # schedule() instead
movl %esp, %eax movl %esp, %eax
movl $1, %edx movl $1, %edx
@ -515,39 +536,38 @@ syscall_fault:
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
SAVE_ALL SAVE_ALL
GET_THREAD_INFO(%ebp) GET_THREAD_INFO(%ebp)
movl $-EFAULT,EAX(%esp) movl $-EFAULT,PT_EAX(%esp)
jmp resume_userspace jmp resume_userspace
syscall_badsys: syscall_badsys:
movl $-ENOSYS,EAX(%esp) movl $-ENOSYS,PT_EAX(%esp)
jmp resume_userspace jmp resume_userspace
CFI_ENDPROC CFI_ENDPROC
#define FIXUP_ESPFIX_STACK \ #define FIXUP_ESPFIX_STACK \
movl %esp, %eax; \ /* since we are on the wrong stack, we can't make this C code :( */ \
/* switch to 32bit stack using the pointer on top of 16bit stack */ \ movl %gs:PDA_cpu, %ebx; \
lss %ss:CPU_16BIT_STACK_SIZE-8, %esp; \ PER_CPU(cpu_gdt_descr, %ebx); \
/* copy data from 16bit stack to 32bit stack */ \ movl GDS_address(%ebx), %ebx; \
call fixup_x86_bogus_stack; \ GET_DESC_BASE(GDT_ENTRY_ESPFIX_SS, %ebx, %eax, %ax, %al, %ah); \
/* put ESP to the proper location */ \ addl %esp, %eax; \
movl %eax, %esp; pushl $__KERNEL_DS; \
#define UNWIND_ESPFIX_STACK \ CFI_ADJUST_CFA_OFFSET 4; \
pushl %eax; \ pushl %eax; \
CFI_ADJUST_CFA_OFFSET 4; \ CFI_ADJUST_CFA_OFFSET 4; \
lss (%esp), %esp; \
CFI_ADJUST_CFA_OFFSET -8;
#define UNWIND_ESPFIX_STACK \
movl %ss, %eax; \ movl %ss, %eax; \
/* see if on 16bit stack */ \ /* see if on espfix stack */ \
cmpw $__ESPFIX_SS, %ax; \ cmpw $__ESPFIX_SS, %ax; \
je 28f; \ jne 27f; \
27: popl %eax; \ movl $__KERNEL_DS, %eax; \
CFI_ADJUST_CFA_OFFSET -4; \
.section .fixup,"ax"; \
28: movl $__KERNEL_DS, %eax; \
movl %eax, %ds; \ movl %eax, %ds; \
movl %eax, %es; \ movl %eax, %es; \
/* switch to 32bit stack */ \ /* switch to normal stack */ \
FIXUP_ESPFIX_STACK; \ FIXUP_ESPFIX_STACK; \
jmp 27b; \ 27:;
.previous
/* /*
* Build the entry stubs and pointer table with * Build the entry stubs and pointer table with
@ -608,13 +628,16 @@ KPROBE_ENTRY(page_fault)
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
ALIGN ALIGN
error_code: error_code:
/* the function address is in %gs's slot on the stack */
pushl %es
CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET es, 0*/
pushl %ds pushl %ds
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET ds, 0*/ /*CFI_REL_OFFSET ds, 0*/
pushl %eax pushl %eax
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET eax, 0 CFI_REL_OFFSET eax, 0
xorl %eax, %eax
pushl %ebp pushl %ebp
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET ebp, 0 CFI_REL_OFFSET ebp, 0
@ -627,7 +650,6 @@ error_code:
pushl %edx pushl %edx
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET edx, 0 CFI_REL_OFFSET edx, 0
decl %eax # eax = -1
pushl %ecx pushl %ecx
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET ecx, 0 CFI_REL_OFFSET ecx, 0
@ -635,18 +657,20 @@ error_code:
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
CFI_REL_OFFSET ebx, 0 CFI_REL_OFFSET ebx, 0
cld cld
pushl %es pushl %gs
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
/*CFI_REL_OFFSET es, 0*/ /*CFI_REL_OFFSET gs, 0*/
movl $(__KERNEL_PDA), %ecx
movl %ecx, %gs
UNWIND_ESPFIX_STACK UNWIND_ESPFIX_STACK
popl %ecx popl %ecx
CFI_ADJUST_CFA_OFFSET -4 CFI_ADJUST_CFA_OFFSET -4
/*CFI_REGISTER es, ecx*/ /*CFI_REGISTER es, ecx*/
movl ES(%esp), %edi # get the function address movl PT_GS(%esp), %edi # get the function address
movl ORIG_EAX(%esp), %edx # get the error code movl PT_ORIG_EAX(%esp), %edx # get the error code
movl %eax, ORIG_EAX(%esp) movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart
movl %ecx, ES(%esp) mov %ecx, PT_GS(%esp)
/*CFI_REL_OFFSET es, ES*/ /*CFI_REL_OFFSET gs, ES*/
movl $(__USER_DS), %ecx movl $(__USER_DS), %ecx
movl %ecx, %ds movl %ecx, %ds
movl %ecx, %es movl %ecx, %es
@ -682,7 +706,7 @@ ENTRY(device_not_available)
GET_CR0_INTO_EAX GET_CR0_INTO_EAX
testl $0x4, %eax # EM (math emulation bit) testl $0x4, %eax # EM (math emulation bit)
jne device_not_available_emulate jne device_not_available_emulate
preempt_stop preempt_stop(CLBR_ANY)
call math_state_restore call math_state_restore
jmp ret_from_exception jmp ret_from_exception
device_not_available_emulate: device_not_available_emulate:
@ -754,7 +778,7 @@ KPROBE_ENTRY(nmi)
cmpw $__ESPFIX_SS, %ax cmpw $__ESPFIX_SS, %ax
popl %eax popl %eax
CFI_ADJUST_CFA_OFFSET -4 CFI_ADJUST_CFA_OFFSET -4
je nmi_16bit_stack je nmi_espfix_stack
cmpl $sysenter_entry,(%esp) cmpl $sysenter_entry,(%esp)
je nmi_stack_fixup je nmi_stack_fixup
pushl %eax pushl %eax
@ -797,7 +821,7 @@ nmi_debug_stack_check:
FIX_STACK(24,nmi_stack_correct, 1) FIX_STACK(24,nmi_stack_correct, 1)
jmp nmi_stack_correct jmp nmi_stack_correct
nmi_16bit_stack: nmi_espfix_stack:
/* We have a RING0_INT_FRAME here. /* We have a RING0_INT_FRAME here.
* *
* create the pointer to lss back * create the pointer to lss back
@ -806,7 +830,6 @@ nmi_16bit_stack:
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
pushl %esp pushl %esp
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
movzwl %sp, %esp
addw $4, (%esp) addw $4, (%esp)
/* copy the iret frame of 12 bytes */ /* copy the iret frame of 12 bytes */
.rept 3 .rept 3
@ -817,11 +840,11 @@ nmi_16bit_stack:
CFI_ADJUST_CFA_OFFSET 4 CFI_ADJUST_CFA_OFFSET 4
SAVE_ALL SAVE_ALL
FIXUP_ESPFIX_STACK # %eax == %esp FIXUP_ESPFIX_STACK # %eax == %esp
CFI_ADJUST_CFA_OFFSET -20 # the frame has now moved
xorl %edx,%edx # zero error code xorl %edx,%edx # zero error code
call do_nmi call do_nmi
RESTORE_REGS RESTORE_REGS
lss 12+4(%esp), %esp # back to 16bit stack lss 12+4(%esp), %esp # back to espfix stack
CFI_ADJUST_CFA_OFFSET -24
1: INTERRUPT_RETURN 1: INTERRUPT_RETURN
CFI_ENDPROC CFI_ENDPROC
.section __ex_table,"a" .section __ex_table,"a"
@ -830,6 +853,19 @@ nmi_16bit_stack:
.previous .previous
KPROBE_END(nmi) KPROBE_END(nmi)
#ifdef CONFIG_PARAVIRT
ENTRY(native_iret)
1: iret
.section __ex_table,"a"
.align 4
.long 1b,iret_exc
.previous
ENTRY(native_irq_enable_sysexit)
sti
sysexit
#endif
KPROBE_ENTRY(int3) KPROBE_ENTRY(int3)
RING0_INT_FRAME RING0_INT_FRAME
pushl $-1 # mark this as an int pushl $-1 # mark this as an int
@ -949,26 +985,27 @@ ENTRY(arch_unwind_init_running)
movl 4(%esp), %edx movl 4(%esp), %edx
movl (%esp), %ecx movl (%esp), %ecx
leal 4(%esp), %eax leal 4(%esp), %eax
movl %ebx, EBX(%edx) movl %ebx, PT_EBX(%edx)
xorl %ebx, %ebx xorl %ebx, %ebx
movl %ebx, ECX(%edx) movl %ebx, PT_ECX(%edx)
movl %ebx, EDX(%edx) movl %ebx, PT_EDX(%edx)
movl %esi, ESI(%edx) movl %esi, PT_ESI(%edx)
movl %edi, EDI(%edx) movl %edi, PT_EDI(%edx)
movl %ebp, EBP(%edx) movl %ebp, PT_EBP(%edx)
movl %ebx, EAX(%edx) movl %ebx, PT_EAX(%edx)
movl $__USER_DS, DS(%edx) movl $__USER_DS, PT_DS(%edx)
movl $__USER_DS, ES(%edx) movl $__USER_DS, PT_ES(%edx)
movl %ebx, ORIG_EAX(%edx) movl $0, PT_GS(%edx)
movl %ecx, EIP(%edx) movl %ebx, PT_ORIG_EAX(%edx)
movl %ecx, PT_EIP(%edx)
movl 12(%esp), %ecx movl 12(%esp), %ecx
movl $__KERNEL_CS, CS(%edx) movl $__KERNEL_CS, PT_CS(%edx)
movl %ebx, EFLAGS(%edx) movl %ebx, PT_EFLAGS(%edx)
movl %eax, OLDESP(%edx) movl %eax, PT_OLDESP(%edx)
movl 8(%esp), %eax movl 8(%esp), %eax
movl %ecx, 8(%esp) movl %ecx, 8(%esp)
movl EBX(%edx), %ebx movl PT_EBX(%edx), %ebx
movl $__KERNEL_DS, OLDSS(%edx) movl $__KERNEL_DS, PT_OLDSS(%edx)
jmpl *%eax jmpl *%eax
CFI_ENDPROC CFI_ENDPROC
ENDPROC(arch_unwind_init_running) ENDPROC(arch_unwind_init_running)

@ -55,6 +55,12 @@
*/ */
ENTRY(startup_32) ENTRY(startup_32)
#ifdef CONFIG_PARAVIRT
movl %cs, %eax
testl $0x3, %eax
jnz startup_paravirt
#endif
/* /*
* Set segments to known values. * Set segments to known values.
*/ */
@ -302,6 +308,7 @@ is386: movl $2,%ecx # set MP
movl %eax,%cr0 movl %eax,%cr0
call check_x87 call check_x87
call setup_pda
lgdt cpu_gdt_descr lgdt cpu_gdt_descr
lidt idt_descr lidt idt_descr
ljmp $(__KERNEL_CS),$1f ljmp $(__KERNEL_CS),$1f
@ -312,10 +319,13 @@ is386: movl $2,%ecx # set MP
movl %eax,%ds movl %eax,%ds
movl %eax,%es movl %eax,%es
xorl %eax,%eax # Clear FS/GS and LDT xorl %eax,%eax # Clear FS and LDT
movl %eax,%fs movl %eax,%fs
movl %eax,%gs
lldt %ax lldt %ax
movl $(__KERNEL_PDA),%eax
mov %eax,%gs
cld # gcc2 wants the direction flag cleared at all times cld # gcc2 wants the direction flag cleared at all times
pushl $0 # fake return address for unwinder pushl $0 # fake return address for unwinder
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
@ -345,6 +355,23 @@ check_x87:
.byte 0xDB,0xE4 /* fsetpm for 287, ignored by 387 */ .byte 0xDB,0xE4 /* fsetpm for 287, ignored by 387 */
ret ret
/*
* Point the GDT at this CPU's PDA. On boot this will be
* cpu_gdt_table and boot_pda; for secondary CPUs, these will be
* that CPU's GDT and PDA.
*/
setup_pda:
/* get the PDA pointer */
movl start_pda, %eax
/* slot the PDA address into the GDT */
mov cpu_gdt_descr+2, %ecx
mov %ax, (__KERNEL_PDA+0+2)(%ecx) /* base & 0x0000ffff */
shr $16, %eax
mov %al, (__KERNEL_PDA+4+0)(%ecx) /* base & 0x00ff0000 */
mov %ah, (__KERNEL_PDA+4+3)(%ecx) /* base & 0xff000000 */
ret
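setup_pda patches the base address of the __KERNEL_PDA descriptor in place. An x86 segment descriptor splits its 32-bit base across three locations:

- bytes 2-3 hold bits 0-15
- byte 4 holds bits 16-23
- byte 7 holds bits 24-31

Those are exactly the locations where the three `mov` instructions store %ax, %al and %ah. The C sketch below performs the same byte placement, plus the inverse, on a hypothetical in-memory copy of one descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit base into descriptor bytes 2, 3, 4 and 7; the limit
 * and access bytes are left untouched. */
void set_desc_base(uint8_t desc[8], uint32_t base)
{
	desc[2] = base & 0xff;		/* base[7:0]   (low byte of %ax) */
	desc[3] = (base >> 8) & 0xff;	/* base[15:8]  (high byte of %ax) */
	desc[4] = (base >> 16) & 0xff;	/* base[23:16] (%al after shr) */
	desc[7] = (base >> 24) & 0xff;	/* base[31:24] (%ah after shr) */
}

uint32_t get_desc_base(const uint8_t desc[8])
{
	return (uint32_t)desc[2] | ((uint32_t)desc[3] << 8) |
	       ((uint32_t)desc[4] << 16) | ((uint32_t)desc[7] << 24);
}
```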
/* /*
* setup_idt * setup_idt
* *
@ -465,6 +492,33 @@ ignore_int:
#endif #endif
iret iret
#ifdef CONFIG_PARAVIRT
startup_paravirt:
cld
movl $(init_thread_union+THREAD_SIZE),%esp
/* We take pains to preserve all the regs. */
pushl %edx
pushl %ecx
pushl %eax
/* paravirt.o is last in link, and that probe fn never returns */
pushl $__start_paravirtprobe
1:
movl 0(%esp), %eax
pushl (%eax)
movl 8(%esp), %eax
call *(%esp)
popl %eax
movl 4(%esp), %eax
movl 8(%esp), %ecx
movl 12(%esp), %edx
addl $4, (%esp)
jmp 1b
#endif
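startup_paravirt walks the table that starts at `__start_paravirtprobe`, calling each entry in turn and preserving %eax, %ecx and %edx across the calls. The comment notes that paravirt.o links last, so its probe (the final entry) never returns. The C sketch below models that control flow with a hypothetical simplification: a probe signals "I took over" by returning nonzero instead of never returning. The register preservation is not modelled.

```c
#include <assert.h>

/* Hypothetical probe signature: nonzero means "this probe took over". */
typedef int (*probe_fn)(void);

int probe_decline(void) { return 0; }
int probe_claim(void)   { return 1; }	/* e.g. a native fallback */

/* Call each probe in [start, end) in order; return the index of the
 * first one that claims, or -1 if none does. */
int sketch_run_probes(const probe_fn *start, const probe_fn *end)
{
	for (const probe_fn *p = start; p < end; p++)
		if ((*p)())
			return (int)(p - start);
	return -1;
}
```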
/* /*
* Real beginning of normal "text" segment * Real beginning of normal "text" segment
*/ */
@ -484,6 +538,8 @@ ENTRY(empty_zero_page)
* This starts the data section. * This starts the data section.
*/ */
.data .data
ENTRY(start_pda)
.long boot_pda
ENTRY(stack_start) ENTRY(stack_start)
.long init_thread_union+THREAD_SIZE .long init_thread_union+THREAD_SIZE
@ -525,7 +581,7 @@ idt_descr:
# boot GDT descriptor (later on used by CPU#0): # boot GDT descriptor (later on used by CPU#0):
.word 0 # 32 bit align gdt_desc.address .word 0 # 32 bit align gdt_desc.address
cpu_gdt_descr: ENTRY(cpu_gdt_descr)
.word GDT_ENTRIES*8-1 .word GDT_ENTRIES*8-1
.long cpu_gdt_table .long cpu_gdt_table
@ -584,8 +640,8 @@ ENTRY(cpu_gdt_table)
.quad 0x00009a000000ffff /* 0xc0 APM CS 16 code (16 bit) */ .quad 0x00009a000000ffff /* 0xc0 APM CS 16 code (16 bit) */
.quad 0x004092000000ffff /* 0xc8 APM DS data */ .quad 0x004092000000ffff /* 0xc8 APM DS data */
.quad 0x0000920000000000 /* 0xd0 - ESPFIX 16-bit SS */ .quad 0x00c0920000000000 /* 0xd0 - ESPFIX SS */
.quad 0x0000000000000000 /* 0xd8 - unused */ .quad 0x00cf92000000ffff /* 0xd8 - PDA */
.quad 0x0000000000000000 /* 0xe0 - unused */ .quad 0x0000000000000000 /* 0xe0 - unused */
.quad 0x0000000000000000 /* 0xe8 - unused */ .quad 0x0000000000000000 /* 0xe8 - unused */
.quad 0x0000000000000000 /* 0xf0 - unused */ .quad 0x0000000000000000 /* 0xf0 - unused */
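
setup_pda stores the 32-bit PDA address into the base fields of the __KERNEL_PDA descriptor using three writes:
- bytes 2-3 receive base[15:0];
- byte 4 receives base[23:16];
- byte 7 receives base[31:24].

The C sketch below performs the same split on a descriptor held as a 64-bit value. desc_set_base and desc_get_base are hypothetical helpers for illustration, not kernel functions. The starting value used in the example is the 0xd8 PDA entry from cpu_gdt_table above.

```c
#include <stdint.h>

/* Hypothetical helper: write a 32-bit segment base into an x86 descriptor
 * stored as a 64-bit value, leaving limit, access byte and flags alone.
 * Layout: [15:0] limit 15:0, [31:16] base 15:0, [39:32] base 23:16,
 * [47:40] access, [55:48] flags + limit 19:16, [63:56] base 31:24. */
static uint64_t desc_set_base(uint64_t desc, uint32_t base)
{
	desc &= ~0xff0000ffffff0000ULL;                 /* clear every base bit */
	desc |= (uint64_t)(base & 0xffffu) << 16;       /* bytes 2-3: base 15:0 */
	desc |= (uint64_t)((base >> 16) & 0xffu) << 32; /* byte 4: base 23:16 */
	desc |= (uint64_t)(base >> 24) << 56;           /* byte 7: base 31:24 */
	return desc;
}

/* Inverse of desc_set_base: reassemble the base from its three pieces. */
static uint32_t desc_get_base(uint64_t desc)
{
	return (uint32_t)((desc >> 16) & 0xffffu) |
	       (uint32_t)((desc >> 32) & 0xffu) << 16 |
	       (uint32_t)((desc >> 56) & 0xffu) << 24;
}
```

For example, desc_set_base(0x00cf92000000ffffULL, 0x12345678) yields 0x12cf92345678ffffULL. The 0x92 access byte and 0xcf flags survive unchanged.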

@@ -34,6 +34,7 @@ static int __init init_hpet_clocksource(void)
unsigned long hpet_period; unsigned long hpet_period;
void __iomem* hpet_base; void __iomem* hpet_base;
u64 tmp; u64 tmp;
int err;
if (!is_hpet_enabled()) if (!is_hpet_enabled())
return -ENODEV; return -ENODEV;
@@ -61,7 +62,11 @@ static int __init init_hpet_clocksource(void)
do_div(tmp, FSEC_PER_NSEC); do_div(tmp, FSEC_PER_NSEC);
clocksource_hpet.mult = (u32)tmp; clocksource_hpet.mult = (u32)tmp;
return clocksource_register(&clocksource_hpet); err = clocksource_register(&clocksource_hpet);
if (err)
iounmap(hpet_base);
return err;
} }
module_init(init_hpet_clocksource); module_init(init_hpet_clocksource);
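
The HPET hunk closes a leak. If clocksource_register() fails, the mapping held in hpet_base is now released with iounmap() before the error is returned.

A minimal userspace sketch of the same acquire, register, unwind-on-failure shape follows. fake_map, fake_unmap and fake_register are hypothetical stand-ins that only count live mappings; they are not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_mappings;   /* number of fake mappings still outstanding */

/* Hypothetical stand-ins for the map, unmap and register steps. */
static void *fake_map(void)
{
	void *p = malloc(16);

	if (p)
		live_mappings++;
	return p;
}

static void fake_unmap(void *p)
{
	live_mappings--;
	free(p);
}

static int fake_register(bool fail)
{
	return fail ? -EBUSY : 0;
}

/* Same shape as the patched init_hpet_clocksource(): keep the mapping on
 * success, release it when registration fails, and return the error. */
static int init_sketch(bool register_fails, void **out)
{
	void *base = fake_map();
	int err;

	if (!base)
		return -ENOMEM;
	err = fake_register(register_fails);
	if (err)
		fake_unmap(base);   /* the step the patch adds */
	else
		*out = base;
	return err;
}
```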

@@ -381,7 +381,10 @@ void __init init_ISA_irqs (void)
} }
} }
void __init init_IRQ(void) /* Overridden in paravirt.c */
void init_IRQ(void) __attribute__((weak, alias("native_init_IRQ")));
void __init native_init_IRQ(void)
{ {
int i; int i;
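
This hunk turns init_IRQ into a weak alias of native_init_IRQ. A build that also links paravirt.c, which defines a strong init_IRQ that calls paravirt_ops.init_IRQ, uses that definition. Other builds fall back to the native routine.

The sketch shows the same GCC/Clang attribute on an ELF target such as Linux. native_platform_init and platform_init are hypothetical names. The override only takes effect when another object file supplies a strong platform_init.

```c
/* Weak aliases require GCC or Clang targeting ELF (for example Linux). */

static int which;   /* records which implementation ran */

/* Default implementation, analogous to native_init_IRQ(). */
void native_platform_init(void)
{
	which = 1;
}

/* Weak alias, analogous to init_IRQ(): resolves to native_platform_init
 * unless some other translation unit defines a strong platform_init(). */
void platform_init(void) __attribute__((weak, alias("native_platform_init")));
```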

@@ -154,14 +154,20 @@ static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
* the interrupt, and we need to make sure the entry is fully populated * the interrupt, and we need to make sure the entry is fully populated
* before that happens. * before that happens.
*/ */
static void
__ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
{
union entry_union eu;
eu.entry = e;
io_apic_write(apic, 0x11 + 2*pin, eu.w2);
io_apic_write(apic, 0x10 + 2*pin, eu.w1);
}
static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e) static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
{ {
unsigned long flags; unsigned long flags;
union entry_union eu;
eu.entry = e;
spin_lock_irqsave(&ioapic_lock, flags); spin_lock_irqsave(&ioapic_lock, flags);
io_apic_write(apic, 0x11 + 2*pin, eu.w2); __ioapic_write_entry(apic, pin, e);
io_apic_write(apic, 0x10 + 2*pin, eu.w1);
spin_unlock_irqrestore(&ioapic_lock, flags); spin_unlock_irqrestore(&ioapic_lock, flags);
} }
@@ -837,8 +843,7 @@ static int __init find_isa_irq_pin(int irq, int type)
if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA || if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA ||
mp_bus_id_to_type[lbus] == MP_BUS_EISA || mp_bus_id_to_type[lbus] == MP_BUS_EISA ||
mp_bus_id_to_type[lbus] == MP_BUS_MCA || mp_bus_id_to_type[lbus] == MP_BUS_MCA
mp_bus_id_to_type[lbus] == MP_BUS_NEC98
) && ) &&
(mp_irqs[i].mpc_irqtype == type) && (mp_irqs[i].mpc_irqtype == type) &&
(mp_irqs[i].mpc_srcbusirq == irq)) (mp_irqs[i].mpc_srcbusirq == irq))
@@ -857,8 +862,7 @@ static int __init find_isa_irq_apic(int irq, int type)
if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA || if ((mp_bus_id_to_type[lbus] == MP_BUS_ISA ||
mp_bus_id_to_type[lbus] == MP_BUS_EISA || mp_bus_id_to_type[lbus] == MP_BUS_EISA ||
mp_bus_id_to_type[lbus] == MP_BUS_MCA || mp_bus_id_to_type[lbus] == MP_BUS_MCA
mp_bus_id_to_type[lbus] == MP_BUS_NEC98
) && ) &&
(mp_irqs[i].mpc_irqtype == type) && (mp_irqs[i].mpc_irqtype == type) &&
(mp_irqs[i].mpc_srcbusirq == irq)) (mp_irqs[i].mpc_srcbusirq == irq))
@@ -988,12 +992,6 @@ static int EISA_ELCR(unsigned int irq)
#define default_MCA_trigger(idx) (1) #define default_MCA_trigger(idx) (1)
#define default_MCA_polarity(idx) (0) #define default_MCA_polarity(idx) (0)
/* NEC98 interrupts are always polarity zero edge triggered,
* when listed as conforming in the MP table. */
#define default_NEC98_trigger(idx) (0)
#define default_NEC98_polarity(idx) (0)
static int __init MPBIOS_polarity(int idx) static int __init MPBIOS_polarity(int idx)
{ {
int bus = mp_irqs[idx].mpc_srcbus; int bus = mp_irqs[idx].mpc_srcbus;
@@ -1028,11 +1026,6 @@ static int __init MPBIOS_polarity(int idx)
polarity = default_MCA_polarity(idx); polarity = default_MCA_polarity(idx);
break; break;
} }
case MP_BUS_NEC98: /* NEC 98 pin */
{
polarity = default_NEC98_polarity(idx);
break;
}
default: default:
{ {
printk(KERN_WARNING "broken BIOS!!\n"); printk(KERN_WARNING "broken BIOS!!\n");
@@ -1102,11 +1095,6 @@ static int MPBIOS_trigger(int idx)
trigger = default_MCA_trigger(idx); trigger = default_MCA_trigger(idx);
break; break;
} }
case MP_BUS_NEC98: /* NEC 98 pin */
{
trigger = default_NEC98_trigger(idx);
break;
}
default: default:
{ {
printk(KERN_WARNING "broken BIOS!!\n"); printk(KERN_WARNING "broken BIOS!!\n");
@@ -1168,7 +1156,6 @@ static int pin_2_irq(int idx, int apic, int pin)
case MP_BUS_ISA: /* ISA pin */ case MP_BUS_ISA: /* ISA pin */
case MP_BUS_EISA: case MP_BUS_EISA:
case MP_BUS_MCA: case MP_BUS_MCA:
case MP_BUS_NEC98:
{ {
irq = mp_irqs[idx].mpc_srcbusirq; irq = mp_irqs[idx].mpc_srcbusirq;
break; break;
@@ -1236,7 +1223,7 @@ static inline int IO_APIC_irq_trigger(int irq)
} }
/* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. */ /* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. */
u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 }; static u8 irq_vector[NR_IRQ_VECTORS] __read_mostly = { FIRST_DEVICE_VECTOR , 0 };
static int __assign_irq_vector(int irq) static int __assign_irq_vector(int irq)
{ {
@@ -1361,8 +1348,8 @@ static void __init setup_IO_APIC_irqs(void)
if (!apic && (irq < 16)) if (!apic && (irq < 16))
disable_8259A_irq(irq); disable_8259A_irq(irq);
} }
ioapic_write_entry(apic, pin, entry);
spin_lock_irqsave(&ioapic_lock, flags); spin_lock_irqsave(&ioapic_lock, flags);
__ioapic_write_entry(apic, pin, entry);
set_native_irq_info(irq, TARGET_CPUS); set_native_irq_info(irq, TARGET_CPUS);
spin_unlock_irqrestore(&ioapic_lock, flags); spin_unlock_irqrestore(&ioapic_lock, flags);
} }
@@ -1927,6 +1914,15 @@ static void __init setup_ioapic_ids_from_mpc(void)
static void __init setup_ioapic_ids_from_mpc(void) { } static void __init setup_ioapic_ids_from_mpc(void) { }
#endif #endif
static int no_timer_check __initdata;
static int __init notimercheck(char *s)
{
no_timer_check = 1;
return 1;
}
__setup("no_timer_check", notimercheck);
/* /*
* There is a nasty bug in some older SMP boards, their mptable lies * There is a nasty bug in some older SMP boards, their mptable lies
* about the timer IRQ. We do the following to work around the situation: * about the timer IRQ. We do the following to work around the situation:
@@ -1935,10 +1931,13 @@ static void __init setup_ioapic_ids_from_mpc(void) { }
* - if this function detects that timer IRQs are defunct, then we fall * - if this function detects that timer IRQs are defunct, then we fall
* back to ISA timer IRQs * back to ISA timer IRQs
*/ */
static int __init timer_irq_works(void) int __init timer_irq_works(void)
{ {
unsigned long t1 = jiffies; unsigned long t1 = jiffies;
if (no_timer_check)
return 1;
local_irq_enable(); local_irq_enable();
/* Let ten ticks pass... */ /* Let ten ticks pass... */
mdelay((10 * 1000) / HZ); mdelay((10 * 1000) / HZ);
@@ -2162,9 +2161,15 @@ static inline void unlock_ExtINT_logic(void)
unsigned char save_control, save_freq_select; unsigned char save_control, save_freq_select;
pin = find_isa_irq_pin(8, mp_INT); pin = find_isa_irq_pin(8, mp_INT);
apic = find_isa_irq_apic(8, mp_INT); if (pin == -1) {
if (pin == -1) WARN_ON_ONCE(1);
return; return;
}
apic = find_isa_irq_apic(8, mp_INT);
if (apic == -1) {
WARN_ON_ONCE(1);
return;
}
entry0 = ioapic_read_entry(apic, pin); entry0 = ioapic_read_entry(apic, pin);
clear_IO_APIC_pin(apic, pin); clear_IO_APIC_pin(apic, pin);
@@ -2209,7 +2214,7 @@ int timer_uses_ioapic_pin_0;
* is so screwy. Thanks to Brian Perkins for testing/hacking this beast * is so screwy. Thanks to Brian Perkins for testing/hacking this beast
* fanatically on his truly buggy board. * fanatically on his truly buggy board.
*/ */
static inline void check_timer(void) static inline void __init check_timer(void)
{ {
int apic1, pin1, apic2, pin2; int apic1, pin1, apic2, pin2;
int vector; int vector;
@@ -2857,8 +2862,8 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
if (!ioapic && (irq < 16)) if (!ioapic && (irq < 16))
disable_8259A_irq(irq); disable_8259A_irq(irq);
ioapic_write_entry(ioapic, pin, entry);
spin_lock_irqsave(&ioapic_lock, flags); spin_lock_irqsave(&ioapic_lock, flags);
__ioapic_write_entry(ioapic, pin, entry);
set_native_irq_info(irq, TARGET_CPUS); set_native_irq_info(irq, TARGET_CPUS);
spin_unlock_irqrestore(&ioapic_lock, flags); spin_unlock_irqrestore(&ioapic_lock, flags);
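
The io_apic.c hunks split ioapic_write_entry into two functions:
- __ioapic_write_entry assumes ioapic_lock is already held;
- a wrapper takes the lock before calling it.

This lets setup_IO_APIC_irqs and io_apic_set_pci_routing write the entry and call set_native_irq_info under a single lock acquisition instead of two.

The sketch reproduces that convention with a pthread mutex. The register array and every name in it are hypothetical. It keeps the kernel's ordering: the high word (0x11 + 2*pin) is written before the low word (0x10 + 2*pin). The low word carries the mask bit, so the entry is complete before it can be unmasked.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for one I/O APIC's indirect register file. */
static uint32_t regs[0x40];
static pthread_mutex_t ioapic_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static int irq_info_updates;   /* stands in for set_native_irq_info() */

/* Mirrors __ioapic_write_entry: caller must hold ioapic_lock_sketch.
 * High word first, low word (with the mask bit) last. */
static void write_entry_nolock(int pin, uint32_t lo, uint32_t hi)
{
	regs[0x11 + 2 * pin] = hi;
	regs[0x10 + 2 * pin] = lo;
}

/* Mirrors ioapic_write_entry: locked wrapper for callers needing only the write. */
static void write_entry(int pin, uint32_t lo, uint32_t hi)
{
	pthread_mutex_lock(&ioapic_lock_sketch);
	write_entry_nolock(pin, lo, hi);
	pthread_mutex_unlock(&ioapic_lock_sketch);
}

/* Mirrors the setup paths: one lock covers the write and the extra update. */
static void setup_pin(int pin, uint32_t lo, uint32_t hi)
{
	pthread_mutex_lock(&ioapic_lock_sketch);
	write_entry_nolock(pin, lo, hi);
	irq_info_updates++;
	pthread_mutex_unlock(&ioapic_lock_sketch);
}
```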

@@ -160,16 +160,14 @@ static int read_default_ldt(void __user * ptr, unsigned long bytecount)
{ {
int err; int err;
unsigned long size; unsigned long size;
void *address;
err = 0; err = 0;
address = &default_ldt[0];
size = 5*sizeof(struct desc_struct); size = 5*sizeof(struct desc_struct);
if (size > bytecount) if (size > bytecount)
size = bytecount; size = bytecount;
err = size; err = size;
if (copy_to_user(ptr, address, size)) if (clear_user(ptr, size))
err = -EFAULT; err = -EFAULT;
return err; return err;
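
read_default_ldt now zero-fills the caller's buffer with clear_user() instead of copying from default_ldt. Presumably the default entries are all zero, so the bytes the caller sees are unchanged.

The sketch models the size clamping in userspace. At most 5 * 8 bytes are returned (five 8-byte i386 descriptors), or fewer if bytecount is smaller. memset stands in for clear_user, and read_default_ldt_sketch is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

#define DEFAULT_LDT_ENTRIES 5
#define DESC_SIZE 8   /* sizeof(struct desc_struct) on i386 */

/* Returns the number of bytes "read": min(5 * 8, bytecount), all zero. */
static long read_default_ldt_sketch(void *buf, unsigned long bytecount)
{
	unsigned long size = DEFAULT_LDT_ENTRIES * DESC_SIZE;

	if (size > bytecount)
		size = bytecount;
	memset(buf, 0, size);   /* kernel: clear_user(ptr, size), -EFAULT on fault */
	return (long)size;
}
```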

@@ -283,10 +283,9 @@ static int __init mca_init(void)
bus->f.mca_transform_memory = mca_dummy_transform_memory; bus->f.mca_transform_memory = mca_dummy_transform_memory;
/* get the motherboard device */ /* get the motherboard device */
mca_dev = kmalloc(sizeof(struct mca_device), GFP_KERNEL); mca_dev = kzalloc(sizeof(struct mca_device), GFP_KERNEL);
if(unlikely(!mca_dev)) if(unlikely(!mca_dev))
goto out_nomem; goto out_nomem;
memset(mca_dev, 0, sizeof(struct mca_device));
/* /*
* We do not expect many MCA interrupts during initialization, * We do not expect many MCA interrupts during initialization,
@@ -310,11 +309,9 @@ static int __init mca_init(void)
mca_dev->slot = MCA_MOTHERBOARD; mca_dev->slot = MCA_MOTHERBOARD;
mca_register_device(MCA_PRIMARY_BUS, mca_dev); mca_register_device(MCA_PRIMARY_BUS, mca_dev);
mca_dev = kmalloc(sizeof(struct mca_device), GFP_ATOMIC); mca_dev = kzalloc(sizeof(struct mca_device), GFP_ATOMIC);
if(unlikely(!mca_dev)) if(unlikely(!mca_dev))
goto out_unlock_nomem; goto out_unlock_nomem;
memset(mca_dev, 0, sizeof(struct mca_device));
/* Put motherboard into video setup mode, read integrated video /* Put motherboard into video setup mode, read integrated video
* POS registers, and turn motherboard setup off. * POS registers, and turn motherboard setup off.
@@ -349,10 +346,9 @@ static int __init mca_init(void)
} }
if(which_scsi) { if(which_scsi) {
/* found a scsi card */ /* found a scsi card */
mca_dev = kmalloc(sizeof(struct mca_device), GFP_ATOMIC); mca_dev = kzalloc(sizeof(struct mca_device), GFP_ATOMIC);
if(unlikely(!mca_dev)) if(unlikely(!mca_dev))
goto out_unlock_nomem; goto out_unlock_nomem;
memset(mca_dev, 0, sizeof(struct mca_device));
for(j = 0; j < 8; j++) for(j = 0; j < 8; j++)
mca_dev->pos[j] = pos[j]; mca_dev->pos[j] = pos[j];
@@ -378,10 +374,9 @@ static int __init mca_init(void)
if(!mca_read_and_store_pos(pos)) if(!mca_read_and_store_pos(pos))
continue; continue;
mca_dev = kmalloc(sizeof(struct mca_device), GFP_ATOMIC); mca_dev = kzalloc(sizeof(struct mca_device), GFP_ATOMIC);
if(unlikely(!mca_dev)) if(unlikely(!mca_dev))
goto out_unlock_nomem; goto out_unlock_nomem;
memset(mca_dev, 0, sizeof(struct mca_device));
for(j=0; j<8; j++) for(j=0; j<8; j++)
mca_dev->pos[j]=pos[j]; mca_dev->pos[j]=pos[j];
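
The mca.c hunks replace each kmalloc() followed by memset(..., 0, ...) with a single kzalloc() call, which returns memory that is already zeroed. calloc() plays the same role in userspace.

In the sketch, struct mca_device_sketch is a hypothetical stand-in for struct mca_device. The point is that one zeroing allocation leaves every field zero, which is exactly what the removed memset provided.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct mca_device. */
struct mca_device_sketch {
	int slot;
	unsigned char pos[8];
	void *driver_data;
};

/* Before: malloc() + memset().  After: one zeroing allocation,
 * which is what kzalloc() provides inside the kernel. */
static struct mca_device_sketch *alloc_dev_sketch(void)
{
	return calloc(1, sizeof(struct mca_device_sketch));
}
```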

@@ -108,7 +108,8 @@ int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs, const Elf_Shdr *sechdrs,
struct module *me) struct module *me)
{ {
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL; const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
*para = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -118,6 +119,8 @@ int module_finalize(const Elf_Ehdr *hdr,
alt = s; alt = s;
if (!strcmp(".smp_locks", secstrings + s->sh_name)) if (!strcmp(".smp_locks", secstrings + s->sh_name))
locks= s; locks= s;
if (!strcmp(".parainstructions", secstrings + s->sh_name))
para = s;
} }
if (alt) { if (alt) {
@@ -132,6 +135,12 @@ int module_finalize(const Elf_Ehdr *hdr,
lseg, lseg + locks->sh_size, lseg, lseg + locks->sh_size,
tseg, tseg + text->sh_size); tseg, tseg + text->sh_size);
} }
if (para) {
void *pseg = (void *)para->sh_addr;
apply_paravirt(pseg, pseg + para->sh_size);
}
return 0; return 0;
} }
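
module_finalize now also looks for a .parainstructions section by name. If it finds one, it passes the section's [start, end) range to apply_paravirt(). The lookup is a linear scan: each section header's name is found by offset into the section-name string table and compared with a fixed string.

The sketch reproduces that scan over a simplified section table. struct shdr_sketch and find_section are hypothetical, not the ELF or kernel types. Note one difference: the kernel loop keeps the last match, while this returns the first; section names are normally unique, so the two agree in practice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down section header: sh_name is an offset into
 * the section-name string table, as in ELF. */
struct shdr_sketch {
	unsigned sh_name;
	unsigned long sh_addr;
	unsigned long sh_size;
};

/* Return the header whose name matches, or NULL if there is none. */
static const struct shdr_sketch *
find_section(const struct shdr_sketch *sechdrs, size_t nsec,
	     const char *secstrings, const char *name)
{
	for (size_t i = 0; i < nsec; i++)
		if (strcmp(secstrings + sechdrs[i].sh_name, name) == 0)
			return &sechdrs[i];
	return NULL;
}
```

With a match in hand, the kernel code computes pseg from sh_addr and hands pseg and pseg + sh_size to apply_paravirt().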

@@ -249,8 +249,6 @@ static void __init MP_bus_info (struct mpc_config_bus *m)
mp_current_pci_id++; mp_current_pci_id++;
} else if (strncmp(str, BUSTYPE_MCA, sizeof(BUSTYPE_MCA)-1) == 0) { } else if (strncmp(str, BUSTYPE_MCA, sizeof(BUSTYPE_MCA)-1) == 0) {
mp_bus_id_to_type[m->mpc_busid] = MP_BUS_MCA; mp_bus_id_to_type[m->mpc_busid] = MP_BUS_MCA;
} else if (strncmp(str, BUSTYPE_NEC98, sizeof(BUSTYPE_NEC98)-1) == 0) {
mp_bus_id_to_type[m->mpc_busid] = MP_BUS_NEC98;
} else { } else {
printk(KERN_WARNING "Unknown bustype %s - ignoring\n", str); printk(KERN_WARNING "Unknown bustype %s - ignoring\n", str);
} }

@@ -195,7 +195,6 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
{ {
const u32 __user *tmp = (const u32 __user *)buf; const u32 __user *tmp = (const u32 __user *)buf;
u32 data[2]; u32 data[2];
size_t rv;
u32 reg = *ppos; u32 reg = *ppos;
int cpu = iminor(file->f_dentry->d_inode); int cpu = iminor(file->f_dentry->d_inode);
int err; int err;
@@ -203,7 +202,7 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
if (count % 8) if (count % 8)
return -EINVAL; /* Invalid chunk size */ return -EINVAL; /* Invalid chunk size */
for (rv = 0; count; count -= 8) { for (; count; count -= 8) {
if (copy_from_user(&data, tmp, 8)) if (copy_from_user(&data, tmp, 8))
return -EFAULT; return -EFAULT;
err = do_wrmsr(cpu, reg, data[0], data[1]); err = do_wrmsr(cpu, reg, data[0], data[1]);
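
After removing the unused rv counter, msr_write keeps the same contract:
- the byte count must be a multiple of 8;
- each 8-byte chunk is split into two 32-bit halves, low then high;
- each pair is written to the same MSR, *ppos.

The sketch mirrors that loop in userspace. fake_wrmsr is a hypothetical stand-in for do_wrmsr. Two details are assumptions because the tail of the kernel function lies outside this hunk: that an error aborts the loop, and that the number of bytes consumed is returned.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t last_lo, last_hi;   /* last value "written" */
static unsigned nwrites;

/* Hypothetical stand-in for do_wrmsr(cpu, reg, lo, hi). */
static int fake_wrmsr(uint32_t reg, uint32_t lo, uint32_t hi)
{
	(void)reg;
	last_lo = lo;
	last_hi = hi;
	nwrites++;
	return 0;
}

static long msr_write_sketch(uint32_t reg, const void *buf, size_t count)
{
	const unsigned char *p = buf;

	if (count % 8)
		return -EINVAL;             /* invalid chunk size */
	for (size_t left = count; left; left -= 8, p += 8) {
		uint32_t data[2];
		int err;

		memcpy(data, p, sizeof data);   /* kernel: copy_from_user(&data, tmp, 8) */
		err = fake_wrmsr(reg, data[0], data[1]);
		if (err)
			return err;
	}
	return (long)count;
}
```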

@@ -22,6 +22,7 @@
#include <linux/percpu.h> #include <linux/percpu.h>
#include <linux/dmi.h> #include <linux/dmi.h>
#include <linux/kprobes.h> #include <linux/kprobes.h>
#include <linux/cpumask.h>
#include <asm/smp.h> #include <asm/smp.h>
#include <asm/nmi.h> #include <asm/nmi.h>
@@ -42,6 +43,8 @@ int nmi_watchdog_enabled;
static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner); static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner);
static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[3]); static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[3]);
static cpumask_t backtrace_mask = CPU_MASK_NONE;
/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's /* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
* offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now) * offset from MSR_P4_BSU_ESCR0. It will be the max for all platforms (for now)
*/ */
@@ -867,14 +870,16 @@ static unsigned int
void touch_nmi_watchdog (void) void touch_nmi_watchdog (void)
{ {
int i; if (nmi_watchdog > 0) {
unsigned cpu;
/* /*
* Just reset the alert counters, (other CPUs might be * Just reset the alert counters, (other CPUs might be
* spinning on locks we hold): * spinning on locks we hold):
*/ */
for_each_possible_cpu(i) for_each_present_cpu (cpu)
alert_counter[i] = 0; alert_counter[cpu] = 0;
}
/* /*
* Tickle the softlockup detector too: * Tickle the softlockup detector too:
@@ -907,6 +912,16 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
touched = 1; touched = 1;
} }
if (cpu_isset(cpu, backtrace_mask)) {
static DEFINE_SPINLOCK(lock); /* Serialise the printks */
spin_lock(&lock);
printk("NMI backtrace for cpu %d\n", cpu);
dump_stack();
spin_unlock(&lock);
cpu_clear(cpu, backtrace_mask);
}
sum = per_cpu(irq_stat, cpu).apic_timer_irqs; sum = per_cpu(irq_stat, cpu).apic_timer_irqs;
/* if the apic timer isn't firing, this cpu isn't doing much */ /* if the apic timer isn't firing, this cpu isn't doing much */
@@ -1033,6 +1048,19 @@ int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
#endif #endif
void __trigger_all_cpu_backtrace(void)
{
int i;
backtrace_mask = cpu_online_map;
/* Wait for up to 10 seconds for all CPUs to do the backtrace */
for (i = 0; i < 10 * 1000; i++) {
if (cpus_empty(backtrace_mask))
break;
mdelay(1);
}
}
EXPORT_SYMBOL(nmi_active); EXPORT_SYMBOL(nmi_active);
EXPORT_SYMBOL(nmi_watchdog); EXPORT_SYMBOL(nmi_watchdog);
EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi); EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
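
__trigger_all_cpu_backtrace and the new check in nmi_watchdog_tick form a simple handshake:
- the requester sets backtrace_mask to cpu_online_map, then polls once per millisecond for up to 10 seconds;
- each CPU's NMI handler prints its backtrace (serialized by a spinlock) and clears its own bit.

The sketch models that handshake in userspace, with C11 atomics and POSIX threads standing in for CPUs. All names are hypothetical and printing is omitted. The simulated CPUs poll, whereas real NMIs arrive asynchronously.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Bit n set: simulated CPU n still owes a backtrace (cf. backtrace_mask). */
static atomic_uint backtrace_mask;

/* Per-CPU work in the NMI path (cf. nmi_watchdog_tick): report, clear own bit. */
static void nmi_handler_sketch(unsigned cpu)
{
	unsigned bit = 1u << cpu;

	if (atomic_load(&backtrace_mask) & bit) {
		/* the kernel prints "NMI backtrace for cpu %d" and dumps the stack here */
		atomic_fetch_and(&backtrace_mask, ~bit);
	}
}

/* Requester (cf. __trigger_all_cpu_backtrace): set the mask, then poll once
 * per millisecond for at most timeout_ms. True if every CPU answered. */
static bool trigger_backtrace_sketch(unsigned online_mask, unsigned timeout_ms)
{
	const struct timespec one_ms = { 0, 1000000L };

	atomic_store(&backtrace_mask, online_mask);
	for (unsigned i = 0; i < timeout_ms; i++) {
		if (atomic_load(&backtrace_mask) == 0)
			return true;
		nanosleep(&one_ms, NULL);
	}
	return atomic_load(&backtrace_mask) == 0;
}

/* A simulated CPU: for up to about 5 s, answer as soon as its bit is set. */
static void *cpu_thread(void *arg)
{
	unsigned cpu = (unsigned)(uintptr_t)arg;
	const struct timespec one_ms = { 0, 1000000L };

	for (int i = 0; i < 5000; i++) {
		if (atomic_load(&backtrace_mask) & (1u << cpu)) {
			nmi_handler_sketch(cpu);
			break;
		}
		nanosleep(&one_ms, NULL);
	}
	return NULL;
}

/* Start ncpus (1..31) simulated CPUs, request backtraces, then join them. */
static bool run_demo(unsigned ncpus)
{
	pthread_t tid[31];
	unsigned started = 0;
	bool ok;

	if (ncpus == 0 || ncpus > 31)
		return false;
	for (; started < ncpus; started++)
		if (pthread_create(&tid[started], NULL, cpu_thread,
				   (void *)(uintptr_t)started) != 0)
			break;
	ok = started == ncpus && trigger_backtrace_sketch((1u << ncpus) - 1, 10000);
	for (unsigned c = 0; c < started; c++)
		pthread_join(tid[c], NULL);
	return ok;
}
```

As in the kernel, the requester gives up after its timeout rather than waiting forever for a CPU that never responds.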

arch/i386/kernel/paravirt.c (new file, +569 lines)

@@ -0,0 +1,569 @@
/* Paravirtualization interfaces
Copyright (C) 2006 Rusty Russell IBM Corporation
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include <linux/errno.h>
#include <linux/module.h>
#include <linux/efi.h>
#include <linux/bcd.h>
#include <linux/start_kernel.h>
#include <asm/bug.h>
#include <asm/paravirt.h>
#include <asm/desc.h>
#include <asm/setup.h>
#include <asm/arch_hooks.h>
#include <asm/time.h>
#include <asm/irq.h>
#include <asm/delay.h>
#include <asm/fixmap.h>
#include <asm/apic.h>
#include <asm/tlbflush.h>
/* nop stub */
static void native_nop(void)
{
}
static void __init default_banner(void)
{
printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
paravirt_ops.name);
}
char *memory_setup(void)
{
return paravirt_ops.memory_setup();
}
/* Simple instruction patching code. */
#define DEF_NATIVE(name, code) \
extern const char start_##name[], end_##name[]; \
asm("start_" #name ": " code "; end_" #name ":")
DEF_NATIVE(cli, "cli");
DEF_NATIVE(sti, "sti");
DEF_NATIVE(popf, "push %eax; popf");
DEF_NATIVE(pushf, "pushf; pop %eax");
DEF_NATIVE(pushf_cli, "pushf; pop %eax; cli");
DEF_NATIVE(iret, "iret");
DEF_NATIVE(sti_sysexit, "sti; sysexit");
static const struct native_insns
{
const char *start, *end;
} native_insns[] = {
[PARAVIRT_IRQ_DISABLE] = { start_cli, end_cli },
[PARAVIRT_IRQ_ENABLE] = { start_sti, end_sti },
[PARAVIRT_RESTORE_FLAGS] = { start_popf, end_popf },
[PARAVIRT_SAVE_FLAGS] = { start_pushf, end_pushf },
[PARAVIRT_SAVE_FLAGS_IRQ_DISABLE] = { start_pushf_cli, end_pushf_cli },
[PARAVIRT_INTERRUPT_RETURN] = { start_iret, end_iret },
[PARAVIRT_STI_SYSEXIT] = { start_sti_sysexit, end_sti_sysexit },
};
static unsigned native_patch(u8 type, u16 clobbers, void *insns, unsigned len)
{
unsigned int insn_len;
/* Don't touch it if we don't have a replacement */
if (type >= ARRAY_SIZE(native_insns) || !native_insns[type].start)
return len;
insn_len = native_insns[type].end - native_insns[type].start;
/* Similarly if we can't fit replacement. */
if (len < insn_len)
return len;
memcpy(insns, native_insns[type].start, insn_len);
return insn_len;
}
static fastcall unsigned long native_get_debugreg(int regno)
{
unsigned long val = 0; /* Damn you, gcc! */
switch (regno) {
case 0:
asm("movl %%db0, %0" :"=r" (val)); break;
case 1:
asm("movl %%db1, %0" :"=r" (val)); break;
case 2:
asm("movl %%db2, %0" :"=r" (val)); break;
case 3:
asm("movl %%db3, %0" :"=r" (val)); break;
case 6:
asm("movl %%db6, %0" :"=r" (val)); break;
case 7:
asm("movl %%db7, %0" :"=r" (val)); break;
default:
BUG();
}
return val;
}
static fastcall void native_set_debugreg(int regno, unsigned long value)
{
switch (regno) {
case 0:
asm("movl %0,%%db0" : /* no output */ :"r" (value));
break;
case 1:
asm("movl %0,%%db1" : /* no output */ :"r" (value));
break;
case 2:
asm("movl %0,%%db2" : /* no output */ :"r" (value));
break;
case 3:
asm("movl %0,%%db3" : /* no output */ :"r" (value));
break;
case 6:
asm("movl %0,%%db6" : /* no output */ :"r" (value));
break;
case 7:
asm("movl %0,%%db7" : /* no output */ :"r" (value));
break;
default:
BUG();
}
}
void init_IRQ(void)
{
paravirt_ops.init_IRQ();
}
static fastcall void native_clts(void)
{
asm volatile ("clts");
}
static fastcall unsigned long native_read_cr0(void)
{
unsigned long val;
asm volatile("movl %%cr0,%0\n\t" :"=r" (val));
return val;
}
static fastcall void native_write_cr0(unsigned long val)
{
asm volatile("movl %0,%%cr0": :"r" (val));
}
static fastcall unsigned long native_read_cr2(void)
{
unsigned long val;
asm volatile("movl %%cr2,%0\n\t" :"=r" (val));
return val;
}
static fastcall void native_write_cr2(unsigned long val)
{
asm volatile("movl %0,%%cr2": :"r" (val));
}
static fastcall unsigned long native_read_cr3(void)
{
unsigned long val;
asm volatile("movl %%cr3,%0\n\t" :"=r" (val));
return val;
}
static fastcall void native_write_cr3(unsigned long val)
{
asm volatile("movl %0,%%cr3": :"r" (val));
}
static fastcall unsigned long native_read_cr4(void)
{
unsigned long val;
asm volatile("movl %%cr4,%0\n\t" :"=r" (val));
return val;
}
static fastcall unsigned long native_read_cr4_safe(void)
{
unsigned long val;
/* This could fault if %cr4 does not exist */
asm("1: movl %%cr4, %0 \n"
"2: \n"
".section __ex_table,\"a\" \n"
".long 1b,2b \n"
".previous \n"
: "=r" (val): "0" (0));
return val;
}
static fastcall void native_write_cr4(unsigned long val)
{
asm volatile("movl %0,%%cr4": :"r" (val));
}
static fastcall unsigned long native_save_fl(void)
{
unsigned long f;
asm volatile("pushfl ; popl %0":"=g" (f): /* no input */);
return f;
}
static fastcall void native_restore_fl(unsigned long f)
{
asm volatile("pushl %0 ; popfl": /* no output */
:"g" (f)
:"memory", "cc");
}
static fastcall void native_irq_disable(void)
{
asm volatile("cli": : :"memory");
}
static fastcall void native_irq_enable(void)
{
asm volatile("sti": : :"memory");
}
static fastcall void native_safe_halt(void)
{
asm volatile("sti; hlt": : :"memory");
}
static fastcall void native_halt(void)
{
asm volatile("hlt": : :"memory");
}
static fastcall void native_wbinvd(void)
{
asm volatile("wbinvd": : :"memory");
}
static fastcall unsigned long long native_read_msr(unsigned int msr, int *err)
{
unsigned long long val;
asm volatile("2: rdmsr ; xorl %0,%0\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
"3: movl %3,%0 ; jmp 1b\n\t"
".previous\n\t"
".section __ex_table,\"a\"\n"
" .align 4\n\t"
" .long 2b,3b\n\t"
".previous"
: "=r" (*err), "=A" (val)
: "c" (msr), "i" (-EFAULT));
return val;
}
static fastcall int native_write_msr(unsigned int msr, unsigned long long val)
{
int err;
asm volatile("2: wrmsr ; xorl %0,%0\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
"3: movl %4,%0 ; jmp 1b\n\t"
".previous\n\t"
".section __ex_table,\"a\"\n"
" .align 4\n\t"
" .long 2b,3b\n\t"
".previous"
: "=a" (err)
: "c" (msr), "0" ((u32)val), "d" ((u32)(val>>32)),
"i" (-EFAULT));
return err;
}
static fastcall unsigned long long native_read_tsc(void)
{
unsigned long long val;
asm volatile("rdtsc" : "=A" (val));
return val;
}
static fastcall unsigned long long native_read_pmc(void)
{
unsigned long long val;
asm volatile("rdpmc" : "=A" (val));
return val;
}
static fastcall void native_load_tr_desc(void)
{
asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));
}
static fastcall void native_load_gdt(const struct Xgt_desc_struct *dtr)
{
asm volatile("lgdt %0"::"m" (*dtr));
}
static fastcall void native_load_idt(const struct Xgt_desc_struct *dtr)
{
asm volatile("lidt %0"::"m" (*dtr));
}
static fastcall void native_store_gdt(struct Xgt_desc_struct *dtr)
{
asm ("sgdt %0":"=m" (*dtr));
}
static fastcall void native_store_idt(struct Xgt_desc_struct *dtr)
{
asm ("sidt %0":"=m" (*dtr));
}
static fastcall unsigned long native_store_tr(void)
{
unsigned long tr;
asm ("str %0":"=r" (tr));
return tr;
}
static fastcall void native_load_tls(struct thread_struct *t, unsigned int cpu)
{
#define C(i) get_cpu_gdt_table(cpu)[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i]
C(0); C(1); C(2);
#undef C
}
static inline void native_write_dt_entry(void *dt, int entry, u32 entry_low, u32 entry_high)
{
u32 *lp = (u32 *)((char *)dt + entry*8);
lp[0] = entry_low;
lp[1] = entry_high;
}
static fastcall void native_write_ldt_entry(void *dt, int entrynum, u32 low, u32 high)
{
native_write_dt_entry(dt, entrynum, low, high);
}
static fastcall void native_write_gdt_entry(void *dt, int entrynum, u32 low, u32 high)
{
native_write_dt_entry(dt, entrynum, low, high);
}
static fastcall void native_write_idt_entry(void *dt, int entrynum, u32 low, u32 high)
{
native_write_dt_entry(dt, entrynum, low, high);
}
static fastcall void native_load_esp0(struct tss_struct *tss,
struct thread_struct *thread)
{
tss->esp0 = thread->esp0;
/* This can only happen when SEP is enabled, no need to test "SEP"arately */
if (unlikely(tss->ss1 != thread->sysenter_cs)) {
tss->ss1 = thread->sysenter_cs;
wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
}
}
static fastcall void native_io_delay(void)
{
asm volatile("outb %al,$0x80");
}
static fastcall void native_flush_tlb(void)
{
__native_flush_tlb();
}
/*
* Global pages have to be flushed a bit differently. Not a real
* performance problem because this does not happen often.
*/
static fastcall void native_flush_tlb_global(void)
{
__native_flush_tlb_global();
}
static fastcall void native_flush_tlb_single(u32 addr)
{
__native_flush_tlb_single(addr);
}
#ifndef CONFIG_X86_PAE
static fastcall void native_set_pte(pte_t *ptep, pte_t pteval)
{
*ptep = pteval;
}
static fastcall void native_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pteval)
{
*ptep = pteval;
}
static fastcall void native_set_pmd(pmd_t *pmdp, pmd_t pmdval)
{
*pmdp = pmdval;
}
#else /* CONFIG_X86_PAE */
static fastcall void native_set_pte(pte_t *ptep, pte_t pte)
{
ptep->pte_high = pte.pte_high;
smp_wmb();
ptep->pte_low = pte.pte_low;
}
static fastcall void native_set_pte_at(struct mm_struct *mm, u32 addr, pte_t *ptep, pte_t pte)
{
ptep->pte_high = pte.pte_high;
smp_wmb();
ptep->pte_low = pte.pte_low;
}
static fastcall void native_set_pte_present(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
{
ptep->pte_low = 0;
smp_wmb();
ptep->pte_high = pte.pte_high;
smp_wmb();
ptep->pte_low = pte.pte_low;
}
static fastcall void native_set_pte_atomic(pte_t *ptep, pte_t pteval)
{
set_64bit((unsigned long long *)ptep,pte_val(pteval));
}
static fastcall void native_set_pmd(pmd_t *pmdp, pmd_t pmdval)
{
set_64bit((unsigned long long *)pmdp,pmd_val(pmdval));
}
static fastcall void native_set_pud(pud_t *pudp, pud_t pudval)
{
*pudp = pudval;
}
static fastcall void native_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
ptep->pte_low = 0;
smp_wmb();
ptep->pte_high = 0;
}
static fastcall void native_pmd_clear(pmd_t *pmd)
{
u32 *tmp = (u32 *)pmd;
*tmp = 0;
smp_wmb();
*(tmp + 1) = 0;
}
#endif /* CONFIG_X86_PAE */
/* These are in entry.S */
extern fastcall void native_iret(void);
extern fastcall void native_irq_enable_sysexit(void);
static int __init print_banner(void)
{
paravirt_ops.banner();
return 0;
}
core_initcall(print_banner);
/* We simply declare start_kernel to be the paravirt probe of last resort. */
paravirt_probe(start_kernel);
struct paravirt_ops paravirt_ops = {
.name = "bare hardware",
.paravirt_enabled = 0,
.kernel_rpl = 0,
.patch = native_patch,
.banner = default_banner,
.arch_setup = native_nop,
.memory_setup = machine_specific_memory_setup,
.get_wallclock = native_get_wallclock,
.set_wallclock = native_set_wallclock,
.time_init = time_init_hook,
.init_IRQ = native_init_IRQ,
.cpuid = native_cpuid,
.get_debugreg = native_get_debugreg,
.set_debugreg = native_set_debugreg,
.clts = native_clts,
.read_cr0 = native_read_cr0,
.write_cr0 = native_write_cr0,
.read_cr2 = native_read_cr2,
.write_cr2 = native_write_cr2,
.read_cr3 = native_read_cr3,
.write_cr3 = native_write_cr3,
.read_cr4 = native_read_cr4,
.read_cr4_safe = native_read_cr4_safe,
.write_cr4 = native_write_cr4,
.save_fl = native_save_fl,
.restore_fl = native_restore_fl,
.irq_disable = native_irq_disable,
.irq_enable = native_irq_enable,
.safe_halt = native_safe_halt,
.halt = native_halt,
.wbinvd = native_wbinvd,
.read_msr = native_read_msr,
.write_msr = native_write_msr,
.read_tsc = native_read_tsc,
.read_pmc = native_read_pmc,
.load_tr_desc = native_load_tr_desc,
.set_ldt = native_set_ldt,
.load_gdt = native_load_gdt,
.load_idt = native_load_idt,
.store_gdt = native_store_gdt,
.store_idt = native_store_idt,
.store_tr = native_store_tr,
.load_tls = native_load_tls,
.write_ldt_entry = native_write_ldt_entry,
.write_gdt_entry = native_write_gdt_entry,
.write_idt_entry = native_write_idt_entry,
.load_esp0 = native_load_esp0,
.set_iopl_mask = native_set_iopl_mask,
.io_delay = native_io_delay,
.const_udelay = __const_udelay,
#ifdef CONFIG_X86_LOCAL_APIC
.apic_write = native_apic_write,
.apic_write_atomic = native_apic_write_atomic,
.apic_read = native_apic_read,
#endif
.flush_tlb_user = native_flush_tlb,
.flush_tlb_kernel = native_flush_tlb_global,
.flush_tlb_single = native_flush_tlb_single,
.set_pte = native_set_pte,
.set_pte_at = native_set_pte_at,
.set_pmd = native_set_pmd,
.pte_update = (void *)native_nop,
.pte_update_defer = (void *)native_nop,
#ifdef CONFIG_X86_PAE
.set_pte_atomic = native_set_pte_atomic,
.set_pte_present = native_set_pte_present,
.set_pud = native_set_pud,
.pte_clear = native_pte_clear,
.pmd_clear = native_pmd_clear,
#endif
.irq_enable_sysexit = native_irq_enable_sysexit,
.iret = native_iret,
};
EXPORT_SYMBOL(paravirt_ops);
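
native_patch copies the bare-metal instruction bytes for a patch site over the site's original code, under two conditions:
- the site type has a native replacement in native_insns[];
- the site is long enough to hold it.

Otherwise it returns the original length and leaves the site untouched.

The sketch reproduces that rule over a byte table. The enum values are hypothetical stand-ins for the PARAVIRT_* site types. The opcode bytes are the standard i386 encodings of the DEF_NATIVE instructions: cli 0xfa, sti 0xfb, pushf 0x9c, popf 0x9d, push %eax 0x50, pop %eax 0x58, iret 0xcf, sysexit 0x0f 0x35.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical site types mirroring the PARAVIRT_* indices in native_insns[]. */
enum {
	SITE_IRQ_DISABLE, SITE_IRQ_ENABLE, SITE_RESTORE_FLAGS, SITE_SAVE_FLAGS,
	SITE_SAVE_FLAGS_IRQ_DISABLE, SITE_INTERRUPT_RETURN, SITE_STI_SYSEXIT,
	SITE_NR
};

static const unsigned char cli_b[] = { 0xfa };                     /* cli */
static const unsigned char sti_b[] = { 0xfb };                     /* sti */
static const unsigned char popf_b[] = { 0x50, 0x9d };              /* push %eax; popf */
static const unsigned char pushf_b[] = { 0x9c, 0x58 };             /* pushf; pop %eax */
static const unsigned char pushf_cli_b[] = { 0x9c, 0x58, 0xfa };   /* pushf; pop %eax; cli */
static const unsigned char iret_b[] = { 0xcf };                    /* iret */
static const unsigned char sti_sysexit_b[] = { 0xfb, 0x0f, 0x35 }; /* sti; sysexit */

static const struct { const unsigned char *start; size_t len; } insns[SITE_NR] = {
	[SITE_IRQ_DISABLE]            = { cli_b, sizeof cli_b },
	[SITE_IRQ_ENABLE]             = { sti_b, sizeof sti_b },
	[SITE_RESTORE_FLAGS]          = { popf_b, sizeof popf_b },
	[SITE_SAVE_FLAGS]             = { pushf_b, sizeof pushf_b },
	[SITE_SAVE_FLAGS_IRQ_DISABLE] = { pushf_cli_b, sizeof pushf_cli_b },
	[SITE_INTERRUPT_RETURN]       = { iret_b, sizeof iret_b },
	[SITE_STI_SYSEXIT]            = { sti_sysexit_b, sizeof sti_sysexit_b },
};

/* Returns the length of the copied replacement, or len if the site is unchanged. */
static size_t patch_site(unsigned type, unsigned char *site, size_t len)
{
	if (type >= SITE_NR || insns[type].start == NULL)
		return len;             /* no native replacement */
	if (len < insns[type].len)
		return len;             /* replacement does not fit */
	memcpy(site, insns[type].start, insns[type].len);
	return insns[type].len;
}
```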

@@ -92,14 +92,12 @@ int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
if (!mem_base) if (!mem_base)
goto out; goto out;
dev->dma_mem = kmalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL); dev->dma_mem = kzalloc(sizeof(struct dma_coherent_mem), GFP_KERNEL);
if (!dev->dma_mem) if (!dev->dma_mem)
goto out; goto out;
memset(dev->dma_mem, 0, sizeof(struct dma_coherent_mem)); dev->dma_mem->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
dev->dma_mem->bitmap = kmalloc(bitmap_size, GFP_KERNEL);
if (!dev->dma_mem->bitmap) if (!dev->dma_mem->bitmap)
goto free1_out; goto free1_out;
memset(dev->dma_mem->bitmap, 0, bitmap_size);
dev->dma_mem->virt_base = mem_base; dev->dma_mem->virt_base = mem_base;
dev->dma_mem->device_base = device_addr; dev->dma_mem->device_base = device_addr;

@@ -56,6 +56,7 @@
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
#include <asm/cpu.h> #include <asm/cpu.h>
#include <asm/pda.h>
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork"); asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
@@ -99,22 +100,18 @@ EXPORT_SYMBOL(enable_hlt);
*/ */
void default_idle(void) void default_idle(void)
{ {
local_irq_enable();
if (!hlt_counter && boot_cpu_data.hlt_works_ok) { if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
current_thread_info()->status &= ~TS_POLLING; current_thread_info()->status &= ~TS_POLLING;
smp_mb__after_clear_bit(); smp_mb__after_clear_bit();
while (!need_resched()) { local_irq_disable();
local_irq_disable(); if (!need_resched())
if (!need_resched()) safe_halt(); /* enables interrupts racelessly */
safe_halt(); else
else local_irq_enable();
local_irq_enable();
}
current_thread_info()->status |= TS_POLLING; current_thread_info()->status |= TS_POLLING;
} else { } else {
while (!need_resched()) /* loop is done by the caller */
cpu_relax(); cpu_relax();
} }
} }
#ifdef CONFIG_APM_MODULE #ifdef CONFIG_APM_MODULE
@@ -128,14 +125,7 @@ EXPORT_SYMBOL(default_idle);
*/ */
static void poll_idle (void) static void poll_idle (void)
{ {
local_irq_enable(); cpu_relax();
asm volatile(
"2:"
"testl %0, %1;"
"rep; nop;"
"je 2b;"
: : "i"(_TIF_NEED_RESCHED), "m" (current_thread_info()->flags));
} }
#ifdef CONFIG_HOTPLUG_CPU #ifdef CONFIG_HOTPLUG_CPU
@ -256,8 +246,7 @@ void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
static void mwait_idle(void) static void mwait_idle(void)
{ {
local_irq_enable(); local_irq_enable();
while (!need_resched()) mwait_idle_with_hints(0, 0);
mwait_idle_with_hints(0, 0);
} }
void __devinit select_idle_routine(const struct cpuinfo_x86 *c) void __devinit select_idle_routine(const struct cpuinfo_x86 *c)
@ -314,8 +303,8 @@ void show_regs(struct pt_regs * regs)
regs->eax,regs->ebx,regs->ecx,regs->edx); regs->eax,regs->ebx,regs->ecx,regs->edx);
printk("ESI: %08lx EDI: %08lx EBP: %08lx", printk("ESI: %08lx EDI: %08lx EBP: %08lx",
regs->esi, regs->edi, regs->ebp); regs->esi, regs->edi, regs->ebp);
printk(" DS: %04x ES: %04x\n", printk(" DS: %04x ES: %04x GS: %04x\n",
0xffff & regs->xds,0xffff & regs->xes); 0xffff & regs->xds,0xffff & regs->xes, 0xffff & regs->xgs);
cr0 = read_cr0(); cr0 = read_cr0();
cr2 = read_cr2(); cr2 = read_cr2();
@ -346,6 +335,7 @@ int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
regs.xds = __USER_DS; regs.xds = __USER_DS;
regs.xes = __USER_DS; regs.xes = __USER_DS;
regs.xgs = __KERNEL_PDA;
regs.orig_eax = -1; regs.orig_eax = -1;
regs.eip = (unsigned long) kernel_thread_helper; regs.eip = (unsigned long) kernel_thread_helper;
regs.xcs = __KERNEL_CS | get_kernel_rpl(); regs.xcs = __KERNEL_CS | get_kernel_rpl();
@ -431,7 +421,6 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
p->thread.eip = (unsigned long) ret_from_fork; p->thread.eip = (unsigned long) ret_from_fork;
savesegment(fs,p->thread.fs); savesegment(fs,p->thread.fs);
savesegment(gs,p->thread.gs);
tsk = current; tsk = current;
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) { if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
@ -508,7 +497,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
dump->regs.ds = regs->xds; dump->regs.ds = regs->xds;
dump->regs.es = regs->xes; dump->regs.es = regs->xes;
savesegment(fs,dump->regs.fs); savesegment(fs,dump->regs.fs);
savesegment(gs,dump->regs.gs); dump->regs.gs = regs->xgs;
dump->regs.orig_eax = regs->orig_eax; dump->regs.orig_eax = regs->orig_eax;
dump->regs.eip = regs->eip; dump->regs.eip = regs->eip;
dump->regs.cs = regs->xcs; dump->regs.cs = regs->xcs;
@ -648,22 +637,27 @@ struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct tas
__unlazy_fpu(prev_p); __unlazy_fpu(prev_p);
/* we're going to use this soon, after a few expensive things */
if (next_p->fpu_counter > 5)
prefetch(&next->i387.fxsave);
/* /*
* Reload esp0. * Reload esp0.
*/ */
load_esp0(tss, next); load_esp0(tss, next);
/* /*
* Save away %fs and %gs. No need to save %es and %ds, as * Save away %fs. No need to save %gs, as it was saved on the
* those are always kernel segments while inside the kernel. * stack on entry. No need to save %es and %ds, as those are
* Doing this before setting the new TLS descriptors avoids * always kernel segments while inside the kernel. Doing this
* the situation where we temporarily have non-reloadable * before setting the new TLS descriptors avoids the situation
* segments in %fs and %gs. This could be an issue if the * where we temporarily have non-reloadable segments in %fs
* NMI handler ever used %fs or %gs (it does not today), or * and %gs. This could be an issue if the NMI handler ever
* if the kernel is running inside of a hypervisor layer. * used %fs or %gs (it does not today), or if the kernel is
* running inside of a hypervisor layer.
*/ */
savesegment(fs, prev->fs); savesegment(fs, prev->fs);
savesegment(gs, prev->gs);
/* /*
* Load the per-thread Thread-Local Storage descriptor. * Load the per-thread Thread-Local Storage descriptor.
@ -671,22 +665,14 @@ struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct tas
load_TLS(next, cpu); load_TLS(next, cpu);
/* /*
* Restore %fs and %gs if needed. * Restore %fs if needed.
* *
* Glibc normally makes %fs be zero, and %gs is one of * Glibc normally makes %fs be zero.
* the TLS segments.
*/ */
if (unlikely(prev->fs | next->fs)) if (unlikely(prev->fs | next->fs))
loadsegment(fs, next->fs); loadsegment(fs, next->fs);
if (prev->gs | next->gs) write_pda(pcurrent, next_p);
loadsegment(gs, next->gs);
/*
* Restore IOPL if needed.
*/
if (unlikely(prev->iopl != next->iopl))
set_iopl_mask(next->iopl);
/* /*
* Now maybe handle debug registers and/or IO bitmaps * Now maybe handle debug registers and/or IO bitmaps
@ -697,6 +683,13 @@ struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct tas
disable_tsc(prev_p, next_p); disable_tsc(prev_p, next_p);
/* If the task has used fpu the last 5 timeslices, just do a full
* restore of the math state immediately to avoid the trap; the
* chances of needing FPU soon are obviously high now
*/
if (next_p->fpu_counter > 5)
math_state_restore();
return prev_p; return prev_p;
} }

@ -94,13 +94,9 @@ static int putreg(struct task_struct *child,
return -EIO; return -EIO;
child->thread.fs = value; child->thread.fs = value;
return 0; return 0;
case GS:
if (value && (value & 3) != 3)
return -EIO;
child->thread.gs = value;
return 0;
case DS: case DS:
case ES: case ES:
case GS:
if (value && (value & 3) != 3) if (value && (value & 3) != 3)
return -EIO; return -EIO;
value &= 0xffff; value &= 0xffff;
@ -116,8 +112,8 @@ static int putreg(struct task_struct *child,
value |= get_stack_long(child, EFL_OFFSET) & ~FLAG_MASK; value |= get_stack_long(child, EFL_OFFSET) & ~FLAG_MASK;
break; break;
} }
if (regno > GS*4) if (regno > ES*4)
regno -= 2*4; regno -= 1*4;
put_stack_long(child, regno - sizeof(struct pt_regs), value); put_stack_long(child, regno - sizeof(struct pt_regs), value);
return 0; return 0;
} }
@ -131,18 +127,16 @@ static unsigned long getreg(struct task_struct *child,
case FS: case FS:
retval = child->thread.fs; retval = child->thread.fs;
break; break;
case GS:
retval = child->thread.gs;
break;
case DS: case DS:
case ES: case ES:
case GS:
case SS: case SS:
case CS: case CS:
retval = 0xffff; retval = 0xffff;
/* fall through */ /* fall through */
default: default:
if (regno > GS*4) if (regno > ES*4)
regno -= 2*4; regno -= 1*4;
regno = regno - sizeof(struct pt_regs); regno = regno - sizeof(struct pt_regs);
retval &= get_stack_long(child, regno); retval &= get_stack_long(child, regno);
} }
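`putreg()` and `getreg()` translate a byte offset in the user-visible register layout into an offset in the saved `struct pt_regs` frame. Before this change, neither FS nor GS had a slot in that frame, so every register after GS was shifted down by two slots. GS is now saved on kernel entry (`regs->xgs`), so only FS is missing, and the shift becomes one slot for everything after ES.

The sketch below assumes the i386 `user_regs_struct` order and the `pt_regs` layouts of this era (old: `... xds, xes, orig_eax, ...`; new: `... xds, xes, xgs, orig_eax, ...`). The function names are hypothetical. The kernel code then applies a further `- sizeof(struct pt_regs)` adjustment before calling `get_stack_long()`/`put_stack_long()`; that step is omitted here.

```c
/* User-visible register indices, in i386 struct user_regs_struct order. */
enum { EBX, ECX, EDX, ESI, EDI, EBP, EAX, DS, ES, FS, GS,
       ORIG_EAX, EIP, CS, EFL, UESP, SS };

/* Old mapping: neither FS nor GS has a slot in pt_regs. */
static long frame_offset_old(long regno)
{
    if (regno > GS * 4)
        regno -= 2 * 4;
    return regno;
}

/* New mapping: GS is saved as xgs right after xes, so only FS is missing.
 * FS itself never reaches this path; callers handle it separately. */
static long frame_offset_new(long regno)
{
    if (regno > ES * 4)
        regno -= 1 * 4;
    return regno;
}
```

For example, EIP (user index 12) moves from frame slot 10 to frame slot 11, and GS (user index 10) now maps to slot 9, the new `xgs` field.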

@ -3,10 +3,23 @@
*/ */
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/irq.h> #include <linux/irq.h>
#include <asm/pci-direct.h>
#include <asm/genapic.h>
#include <asm/cpu.h>
#if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI) #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)
static void __devinit verify_quirk_intel_irqbalance(struct pci_dev *dev)
{
#ifdef CONFIG_X86_64
if (genapic != &apic_flat)
panic("APIC mode must be flat on this system\n");
#elif defined(CONFIG_X86_GENERICARCH)
if (genapic != &apic_default)
panic("APIC mode must be default(flat) on this system. Use apic=default\n");
#endif
}
static void __devinit quirk_intel_irqbalance(struct pci_dev *dev) void __init quirk_intel_irqbalance(void)
{ {
u8 config, rev; u8 config, rev;
u32 word; u32 word;
@ -16,18 +29,18 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
* based platforms. * based platforms.
* Disable SW irqbalance/affinity on those platforms. * Disable SW irqbalance/affinity on those platforms.
*/ */
pci_read_config_byte(dev, PCI_CLASS_REVISION, &rev); rev = read_pci_config_byte(0, 0, 0, PCI_CLASS_REVISION);
if (rev > 0x9) if (rev > 0x9)
return; return;
printk(KERN_INFO "Intel E7520/7320/7525 detected."); printk(KERN_INFO "Intel E7520/7320/7525 detected.");
/* enable access to config space*/ /* enable access to config space */
pci_read_config_byte(dev, 0xf4, &config); config = read_pci_config_byte(0, 0, 0, 0xf4);
pci_write_config_byte(dev, 0xf4, config|0x2); write_pci_config_byte(0, 0, 0, 0xf4, config|0x2);
/* read xTPR register */ /* read xTPR register */
raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word); word = read_pci_config_16(0, 0, 0x40, 0x4c);
if (!(word & (1 << 13))) { if (!(word & (1 << 13))) {
printk(KERN_INFO "Disabling irq balancing and affinity\n"); printk(KERN_INFO "Disabling irq balancing and affinity\n");
@ -37,14 +50,25 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
noirqdebug_setup(""); noirqdebug_setup("");
#ifdef CONFIG_PROC_FS #ifdef CONFIG_PROC_FS
no_irq_affinity = 1; no_irq_affinity = 1;
#endif
#ifdef CONFIG_HOTPLUG_CPU
printk(KERN_INFO "Disabling cpu hotplug control\n");
enable_cpu_hotplug = 0;
#endif
#ifdef CONFIG_X86_64
/* force the genapic selection to flat mode so that
* interrupts can be redirected to more than one CPU.
*/
genapic_force = &apic_flat;
#endif #endif
} }
/* put back the original value for config space*/ /* put back the original value for config space */
if (!(config & 0x2)) if (!(config & 0x2))
pci_write_config_byte(dev, 0xf4, config); write_pci_config_byte(0, 0, 0, 0xf4, config);
} }
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7320_MCH, quirk_intel_irqbalance); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7320_MCH, verify_quirk_intel_irqbalance);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7525_MCH, quirk_intel_irqbalance); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7525_MCH, verify_quirk_intel_irqbalance);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7520_MCH, quirk_intel_irqbalance); DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7520_MCH, verify_quirk_intel_irqbalance);
#endif #endif

@ -63,9 +63,6 @@
#include <setup_arch.h> #include <setup_arch.h>
#include <bios_ebda.h> #include <bios_ebda.h>
/* Forward Declaration. */
void __init find_max_pfn(void);
/* This value is set up by the early boot code to point to the value /* This value is set up by the early boot code to point to the value
immediately after the boot time page tables. It contains a *physical* immediately after the boot time page tables. It contains a *physical*
address, and must not be in the .bss segment! */ address, and must not be in the .bss segment! */
@ -76,11 +73,8 @@ int disable_pse __devinitdata = 0;
/* /*
* Machine setup.. * Machine setup..
*/ */
extern struct resource code_resource;
#ifdef CONFIG_EFI extern struct resource data_resource;
int efi_enabled = 0;
EXPORT_SYMBOL(efi_enabled);
#endif
/* cpu data as detected by the assembly code in head.S */ /* cpu data as detected by the assembly code in head.S */
struct cpuinfo_x86 new_cpu_data __initdata = { 0, 0, 0, 0, -1, 1, 0, 0, -1 }; struct cpuinfo_x86 new_cpu_data __initdata = { 0, 0, 0, 0, -1, 1, 0, 0, -1 };
@ -99,12 +93,6 @@ unsigned int machine_submodel_id;
unsigned int BIOS_revision; unsigned int BIOS_revision;
unsigned int mca_pentium_flag; unsigned int mca_pentium_flag;
/* For PCI or other memory-mapped resources */
unsigned long pci_mem_start = 0x10000000;
#ifdef CONFIG_PCI
EXPORT_SYMBOL(pci_mem_start);
#endif
/* Boot loader ID as an integer, for the benefit of proc_dointvec */ /* Boot loader ID as an integer, for the benefit of proc_dointvec */
int bootloader_type; int bootloader_type;
@ -134,7 +122,6 @@ struct ist_info ist_info;
defined(CONFIG_X86_SPEEDSTEP_SMI_MODULE) defined(CONFIG_X86_SPEEDSTEP_SMI_MODULE)
EXPORT_SYMBOL(ist_info); EXPORT_SYMBOL(ist_info);
#endif #endif
struct e820map e820;
extern void early_cpu_init(void); extern void early_cpu_init(void);
extern int root_mountflags; extern int root_mountflags;
@ -149,516 +136,6 @@ static char command_line[COMMAND_LINE_SIZE];
unsigned char __initdata boot_params[PARAM_SIZE]; unsigned char __initdata boot_params[PARAM_SIZE];
static struct resource data_resource = {
.name = "Kernel data",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
static struct resource code_resource = {
.name = "Kernel code",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
static struct resource system_rom_resource = {
.name = "System ROM",
.start = 0xf0000,
.end = 0xfffff,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
};
static struct resource extension_rom_resource = {
.name = "Extension ROM",
.start = 0xe0000,
.end = 0xeffff,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
};
static struct resource adapter_rom_resources[] = { {
.name = "Adapter ROM",
.start = 0xc8000,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
}, {
.name = "Adapter ROM",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
} };
static struct resource video_rom_resource = {
.name = "Video ROM",
.start = 0xc0000,
.end = 0xc7fff,
.flags = IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
};
static struct resource video_ram_resource = {
.name = "Video RAM area",
.start = 0xa0000,
.end = 0xbffff,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
static struct resource standard_io_resources[] = { {
.name = "dma1",
.start = 0x0000,
.end = 0x001f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "pic1",
.start = 0x0020,
.end = 0x0021,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "timer0",
.start = 0x0040,
.end = 0x0043,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "timer1",
.start = 0x0050,
.end = 0x0053,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "keyboard",
.start = 0x0060,
.end = 0x006f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "dma page reg",
.start = 0x0080,
.end = 0x008f,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "pic2",
.start = 0x00a0,
.end = 0x00a1,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "dma2",
.start = 0x00c0,
.end = 0x00df,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
}, {
.name = "fpu",
.start = 0x00f0,
.end = 0x00ff,
.flags = IORESOURCE_BUSY | IORESOURCE_IO
} };
#define romsignature(x) (*(unsigned short *)(x) == 0xaa55)
static int __init romchecksum(unsigned char *rom, unsigned long length)
{
unsigned char *p, sum = 0;
for (p = rom; p < rom + length; p++)
sum += *p;
return sum == 0;
}
static void __init probe_roms(void)
{
unsigned long start, length, upper;
unsigned char *rom;
int i;
/* video rom */
upper = adapter_rom_resources[0].start;
for (start = video_rom_resource.start; start < upper; start += 2048) {
rom = isa_bus_to_virt(start);
if (!romsignature(rom))
continue;
video_rom_resource.start = start;
/* 0 < length <= 0x7f * 512, historically */
length = rom[2] * 512;
/* if checksum okay, trust length byte */
if (length && romchecksum(rom, length))
video_rom_resource.end = start + length - 1;
request_resource(&iomem_resource, &video_rom_resource);
break;
}
start = (video_rom_resource.end + 1 + 2047) & ~2047UL;
if (start < upper)
start = upper;
/* system rom */
request_resource(&iomem_resource, &system_rom_resource);
upper = system_rom_resource.start;
/* check for extension rom (ignore length byte!) */
rom = isa_bus_to_virt(extension_rom_resource.start);
if (romsignature(rom)) {
length = extension_rom_resource.end - extension_rom_resource.start + 1;
if (romchecksum(rom, length)) {
request_resource(&iomem_resource, &extension_rom_resource);
upper = extension_rom_resource.start;
}
}
/* check for adapter roms on 2k boundaries */
for (i = 0; i < ARRAY_SIZE(adapter_rom_resources) && start < upper; start += 2048) {
rom = isa_bus_to_virt(start);
if (!romsignature(rom))
continue;
/* 0 < length <= 0x7f * 512, historically */
length = rom[2] * 512;
/* but accept any length that fits if checksum okay */
if (!length || start + length > upper || !romchecksum(rom, length))
continue;
adapter_rom_resources[i].start = start;
adapter_rom_resources[i].end = start + length - 1;
request_resource(&iomem_resource, &adapter_rom_resources[i]);
start = adapter_rom_resources[i++].end & ~2047UL;
}
}
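`probe_roms()` recognises an option ROM by the bytes 0x55 0xAA at its start (`romsignature()` reads them as the little-endian 16-bit value 0xaa55). It takes the image size from byte 2, in 512-byte units, and trusts that size only if all bytes of the image sum to zero modulo 256. The sketch below applies the same checks to an in-memory buffer instead of `isa_bus_to_virt()`. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same test as romsignature(), but reading the two bytes individually
 * avoids depending on host endianness or alignment. */
static int rom_has_signature(const uint8_t *rom)
{
    return rom[0] == 0x55 && rom[1] == 0xAA;
}

/* Same test as romchecksum(): all bytes must sum to 0 modulo 256. */
static int rom_checksum_ok(const uint8_t *rom, size_t length)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < length; i++)
        sum += rom[i];
    return sum == 0;
}

/* Length field as probe_roms() reads it: byte 2, in 512-byte blocks. */
static size_t rom_length(const uint8_t *rom)
{
    return (size_t)rom[2] * 512;
}
```

A ROM vendor makes the checksum come out to zero by choosing one filler byte as the negation (mod 256) of the sum of all the other bytes.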
static void __init limit_regions(unsigned long long size)
{
unsigned long long current_addr = 0;
int i;
if (efi_enabled) {
efi_memory_desc_t *md;
void *p;
for (p = memmap.map, i = 0; p < memmap.map_end;
p += memmap.desc_size, i++) {
md = p;
current_addr = md->phys_addr + (md->num_pages << 12);
if (md->type == EFI_CONVENTIONAL_MEMORY) {
if (current_addr >= size) {
md->num_pages -=
(((current_addr-size) + PAGE_SIZE-1) >> PAGE_SHIFT);
memmap.nr_map = i + 1;
return;
}
}
}
}
for (i = 0; i < e820.nr_map; i++) {
current_addr = e820.map[i].addr + e820.map[i].size;
if (current_addr < size)
continue;
if (e820.map[i].type != E820_RAM)
continue;
if (e820.map[i].addr >= size) {
/*
* This region starts past the end of the
* requested size, skip it completely.
*/
e820.nr_map = i;
} else {
e820.nr_map = i + 1;
e820.map[i].size -= current_addr - size;
}
return;
}
}
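On the non-EFI path, `limit_regions()` walks the e820 map until it finds the first RAM entry that reaches the requested size. If that entry starts at or beyond the limit, it is dropped; otherwise it is shortened to end exactly at the limit. Every later entry is discarded. Below is a simplified userspace sketch of that branch. The fixed map capacity of 32 and the function name are assumptions.

```c
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2

struct e820entry { uint64_t addr, size; uint32_t type; };
struct e820map   { int nr_map; struct e820entry map[32]; };

/* Truncate the map so RAM ends at `size` (non-EFI branch of limit_regions()). */
static void limit_regions_sketch(struct e820map *e, uint64_t size)
{
    for (int i = 0; i < e->nr_map; i++) {
        uint64_t end = e->map[i].addr + e->map[i].size;
        if (end < size || e->map[i].type != E820_RAM)
            continue;
        if (e->map[i].addr >= size) {
            e->nr_map = i;                 /* starts past the limit: drop it */
        } else {
            e->nr_map = i + 1;             /* keep it, cut back to the limit */
            e->map[i].size -= end - size;
        }
        return;
    }
}
```

Note that the cut also discards any reserved entries above the limit, as the kernel version does.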
void __init add_memory_region(unsigned long long start,
unsigned long long size, int type)
{
int x;
if (!efi_enabled) {
x = e820.nr_map;
if (x == E820MAX) {
printk(KERN_ERR "Ooops! Too many entries in the memory map!\n");
return;
}
e820.map[x].addr = start;
e820.map[x].size = size;
e820.map[x].type = type;
e820.nr_map++;
}
} /* add_memory_region */
#define E820_DEBUG 1
static void __init print_memory_map(char *who)
{
int i;
for (i = 0; i < e820.nr_map; i++) {
printk(" %s: %016Lx - %016Lx ", who,
e820.map[i].addr,
e820.map[i].addr + e820.map[i].size);
switch (e820.map[i].type) {
case E820_RAM: printk("(usable)\n");
break;
case E820_RESERVED:
printk("(reserved)\n");
break;
case E820_ACPI:
printk("(ACPI data)\n");
break;
case E820_NVS:
printk("(ACPI NVS)\n");
break;
default: printk("type %lu\n", e820.map[i].type);
break;
}
}
}
/*
* Sanitize the BIOS e820 map.
*
* Some e820 responses include overlapping entries. The following
* replaces the original e820 map with a new one, removing overlaps.
*
*/
struct change_member {
struct e820entry *pbios; /* pointer to original bios entry */
unsigned long long addr; /* address for this change point */
};
static struct change_member change_point_list[2*E820MAX] __initdata;
static struct change_member *change_point[2*E820MAX] __initdata;
static struct e820entry *overlap_list[E820MAX] __initdata;
static struct e820entry new_bios[E820MAX] __initdata;
int __init sanitize_e820_map(struct e820entry * biosmap, char * pnr_map)
{
struct change_member *change_tmp;
unsigned long current_type, last_type;
unsigned long long last_addr;
int chgidx, still_changing;
int overlap_entries;
int new_bios_entry;
int old_nr, new_nr, chg_nr;
int i;
/*
Visually we're performing the following (1,2,3,4 = memory types)...
Sample memory map (w/overlaps):
____22__________________
______________________4_
____1111________________
_44_____________________
11111111________________
____________________33__
___________44___________
__________33333_________
______________22________
___________________2222_
_________111111111______
_____________________11_
_________________4______
Sanitized equivalent (no overlap):
1_______________________
_44_____________________
___1____________________
____22__________________
______11________________
_________1______________
__________3_____________
___________44___________
_____________33_________
_______________2________
________________1_______
_________________4______
___________________2____
____________________33__
______________________4_
*/
/* if there's only one memory region, don't bother */
if (*pnr_map < 2)
return -1;
old_nr = *pnr_map;
/* bail out if we find any unreasonable addresses in bios map */
for (i=0; i<old_nr; i++)
if (biosmap[i].addr + biosmap[i].size < biosmap[i].addr)
return -1;
/* create pointers for initial change-point information (for sorting) */
for (i=0; i < 2*old_nr; i++)
change_point[i] = &change_point_list[i];
/* record all known change-points (starting and ending addresses),
omitting those that are for empty memory regions */
chgidx = 0;
for (i=0; i < old_nr; i++) {
if (biosmap[i].size != 0) {
change_point[chgidx]->addr = biosmap[i].addr;
change_point[chgidx++]->pbios = &biosmap[i];
change_point[chgidx]->addr = biosmap[i].addr + biosmap[i].size;
change_point[chgidx++]->pbios = &biosmap[i];
}
}
chg_nr = chgidx; /* true number of change-points */
/* sort change-point list by memory addresses (low -> high) */
still_changing = 1;
while (still_changing) {
still_changing = 0;
for (i=1; i < chg_nr; i++) {
/* if <current_addr> > <last_addr>, swap */
/* or, if current=<start_addr> & last=<end_addr>, swap */
if ((change_point[i]->addr < change_point[i-1]->addr) ||
((change_point[i]->addr == change_point[i-1]->addr) &&
(change_point[i]->addr == change_point[i]->pbios->addr) &&
(change_point[i-1]->addr != change_point[i-1]->pbios->addr))
)
{
change_tmp = change_point[i];
change_point[i] = change_point[i-1];
change_point[i-1] = change_tmp;
still_changing=1;
}
}
}
/* create a new bios memory map, removing overlaps */
overlap_entries=0; /* number of entries in the overlap table */
new_bios_entry=0; /* index for creating new bios map entries */
last_type = 0; /* start with undefined memory type */
last_addr = 0; /* start with 0 as last starting address */
/* loop through change-points, determining affect on the new bios map */
for (chgidx=0; chgidx < chg_nr; chgidx++)
{
/* keep track of all overlapping bios entries */
if (change_point[chgidx]->addr == change_point[chgidx]->pbios->addr)
{
/* add map entry to overlap list (> 1 entry implies an overlap) */
overlap_list[overlap_entries++]=change_point[chgidx]->pbios;
}
else
{
/* remove entry from list (order independent, so swap with last) */
for (i=0; i<overlap_entries; i++)
{
if (overlap_list[i] == change_point[chgidx]->pbios)
overlap_list[i] = overlap_list[overlap_entries-1];
}
overlap_entries--;
}
/* if there are overlapping entries, decide which "type" to use */
/* (larger value takes precedence -- 1=usable, 2,3,4,4+=unusable) */
current_type = 0;
for (i=0; i<overlap_entries; i++)
if (overlap_list[i]->type > current_type)
current_type = overlap_list[i]->type;
/* continue building up new bios map based on this information */
if (current_type != last_type) {
if (last_type != 0) {
new_bios[new_bios_entry].size =
change_point[chgidx]->addr - last_addr;
/* move forward only if the new size was non-zero */
if (new_bios[new_bios_entry].size != 0)
if (++new_bios_entry >= E820MAX)
break; /* no more space left for new bios entries */
}
if (current_type != 0) {
new_bios[new_bios_entry].addr = change_point[chgidx]->addr;
new_bios[new_bios_entry].type = current_type;
last_addr=change_point[chgidx]->addr;
}
last_type = current_type;
}
}
new_nr = new_bios_entry; /* retain count for new bios entries */
/* copy new bios mapping into original location */
memcpy(biosmap, new_bios, new_nr*sizeof(struct e820entry));
*pnr_map = new_nr;
return 0;
}
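`sanitize_e820_map()` first converts every non-empty entry into two change points (start and end). It sorts the points by address, with starts before ends at equal addresses. It then sweeps them, keeping a list of currently active entries and emitting a new entry whenever the effective type changes. Where entries overlap, the numerically larger type wins (1 = usable RAM, anything larger = unusable). The sketch below follows the same sweep. It differs in three ways: it uses `qsort()` instead of the bubble sort, it uses a fixed capacity `MAXE` of 32 instead of `E820MAX`, and its names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAXE 32
struct e820entry { uint64_t addr, size; uint32_t type; };
struct change_point { uint64_t addr; const struct e820entry *entry; int is_start; };

/* Sort by address; at equal addresses, start points come first. */
static int cmp_change(const void *a, const void *b)
{
    const struct change_point *x = a, *y = b;
    if (x->addr != y->addr)
        return x->addr < y->addr ? -1 : 1;
    return y->is_start - x->is_start;
}

/* Rewrite map[0..*nr) without overlaps. Returns -1 (map untouched) if there
 * is at most one entry, too many entries, or an address overflow. */
static int sanitize_map(struct e820entry *map, int *nr)
{
    struct change_point cp[2 * MAXE];
    const struct e820entry *active[MAXE];
    struct e820entry out[MAXE];
    int ncp = 0, nactive = 0, nout = 0;
    uint32_t last_type = 0;
    uint64_t last_addr = 0;

    if (*nr < 2 || *nr > MAXE)
        return -1;
    for (int i = 0; i < *nr; i++) {
        if (map[i].addr + map[i].size < map[i].addr)
            return -1;                        /* wraps around: reject map */
        if (map[i].size == 0)
            continue;                         /* skip empty regions */
        cp[ncp++] = (struct change_point){ map[i].addr, &map[i], 1 };
        cp[ncp++] = (struct change_point){ map[i].addr + map[i].size, &map[i], 0 };
    }
    qsort(cp, ncp, sizeof(cp[0]), cmp_change);

    for (int c = 0; c < ncp; c++) {
        if (cp[c].is_start) {
            active[nactive++] = cp[c].entry;
        } else {                              /* remove: swap with last */
            for (int i = 0; i < nactive; i++)
                if (active[i] == cp[c].entry) {
                    active[i] = active[--nactive];
                    break;
                }
        }
        uint32_t type = 0;                    /* larger type wins */
        for (int i = 0; i < nactive; i++)
            if (active[i]->type > type)
                type = active[i]->type;
        if (type == last_type)
            continue;
        if (last_type != 0) {                 /* close the open entry */
            out[nout].size = cp[c].addr - last_addr;
            if (out[nout].size != 0 && ++nout >= MAXE)
                break;                        /* no room for more entries */
        }
        if (type != 0) {                      /* open a new entry */
            out[nout].addr = cp[c].addr;
            out[nout].type = type;
            last_addr = cp[c].addr;
        }
        last_type = type;
    }
    for (int i = 0; i < nout; i++)
        map[i] = out[i];
    *nr = nout;
    return 0;
}
```

In the usage below, a reserved page inside low RAM splits the RAM entry, and a reserved range overlapping the top of the high RAM entry takes precedence over it.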
/*
* Copy the BIOS e820 map into a safe place.
*
* Sanity-check it while we're at it..
*
* If we're lucky and live on a modern system, the setup code
* will have given us a memory map that we can use to properly
* set up memory. If we aren't, we'll fake a memory map.
*
* We check to see that the memory map contains at least 2 elements
* before we'll use it, because the detection code in setup.S may
* not be perfect and most every PC known to man has two memory
* regions: one from 0 to 640k, and one from 1mb up. (The IBM
* thinkpad 560x, for example, does not cooperate with the memory
* detection code.)
*/
int __init copy_e820_map(struct e820entry * biosmap, int nr_map)
{
/* Only one memory region (or negative)? Ignore it */
if (nr_map < 2)
return -1;
do {
unsigned long long start = biosmap->addr;
unsigned long long size = biosmap->size;
unsigned long long end = start + size;
unsigned long type = biosmap->type;
/* Overflow in 64 bits? Ignore the memory map. */
if (start > end)
return -1;
/*
* Some BIOSes claim RAM in the 640k - 1M region.
* Not right. Fix it up.
*/
if (type == E820_RAM) {
if (start < 0x100000ULL && end > 0xA0000ULL) {
if (start < 0xA0000ULL)
add_memory_region(start, 0xA0000ULL-start, type);
if (end <= 0x100000ULL)
continue;
start = 0x100000ULL;
size = end - start;
}
}
add_memory_region(start, size, type);
} while (biosmap++,--nr_map);
return 0;
}
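`copy_e820_map()` refuses to believe BIOS claims of usable RAM between 640 KB (0xA0000) and 1 MB (0x100000). A RAM entry that crosses that window is split: the part below 640 KB is kept, and the rest restarts at 1 MB (or vanishes if it ends below 1 MB). The sketch below applies that rule to one entry. `add_region()` is a hypothetical stand-in for `add_memory_region()`, and the caller must supply room for two output entries.

```c
#include <stdint.h>

#define E820_RAM 1
struct e820entry { uint64_t addr, size; uint32_t type; };

/* Hypothetical stand-in for add_memory_region(). */
static void add_region(struct e820entry *out, int *n,
                       uint64_t start, uint64_t size, uint32_t type)
{
    out[*n] = (struct e820entry){ start, size, type };
    (*n)++;
}

/* Apply copy_e820_map()'s 640K-1M fixup to one entry; returns entries written. */
static int clip_legacy_hole(const struct e820entry *in, struct e820entry *out)
{
    int n = 0;
    uint64_t start = in->addr, size = in->size, end = start + size;

    if (in->type == E820_RAM && start < 0x100000ULL && end > 0xA0000ULL) {
        if (start < 0xA0000ULL)
            add_region(out, &n, start, 0xA0000ULL - start, in->type);
        if (end <= 0x100000ULL)
            return n;                   /* nothing left above 1 MB */
        start = 0x100000ULL;
        size = end - start;
    }
    add_region(out, &n, start, size, in->type);
    return n;
}
```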
#if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE) #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
struct edd edd; struct edd edd;
#ifdef CONFIG_EDD_MODULE #ifdef CONFIG_EDD_MODULE
@ -682,7 +159,7 @@ static inline void copy_edd(void)
} }
#endif #endif
static int __initdata user_defined_memmap = 0; int __initdata user_defined_memmap = 0;
/* /*
* "mem=nopentium" disables the 4MB page tables. * "mem=nopentium" disables the 4MB page tables.
@ -719,51 +196,6 @@ static int __init parse_mem(char *arg)
} }
early_param("mem", parse_mem); early_param("mem", parse_mem);
static int __init parse_memmap(char *arg)
{
if (!arg)
return -EINVAL;
if (strcmp(arg, "exactmap") == 0) {
#ifdef CONFIG_CRASH_DUMP
/* If we are doing a crash dump, we
* still need to know the real mem
* size before original memory map is
* reset.
*/
find_max_pfn();
saved_max_pfn = max_pfn;
#endif
e820.nr_map = 0;
user_defined_memmap = 1;
} else {
/* If the user specifies memory size, we
* limit the BIOS-provided memory map to
* that size. exactmap can be used to specify
* the exact map. mem=number can be used to
* trim the existing memory map.
*/
unsigned long long start_at, mem_size;
mem_size = memparse(arg, &arg);
if (*arg == '@') {
start_at = memparse(arg+1, &arg);
add_memory_region(start_at, mem_size, E820_RAM);
} else if (*arg == '#') {
start_at = memparse(arg+1, &arg);
add_memory_region(start_at, mem_size, E820_ACPI);
} else if (*arg == '$') {
start_at = memparse(arg+1, &arg);
add_memory_region(start_at, mem_size, E820_RESERVED);
} else {
limit_regions(mem_size);
user_defined_memmap = 1;
}
}
return 0;
}
early_param("memmap", parse_memmap);
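`parse_memmap()` accepts `exactmap`, or a size followed by `@start` (usable RAM), `#start` (ACPI data), `$start` (reserved), or nothing (limit the existing map to that size). Sizes and addresses go through `memparse()`, which reads a number with an optional K, M or G suffix. The sketch below is a userspace approximation. It assumes `strtoull()` with base 0 behaves like the kernel's `simple_strtoull()` for these inputs. Both function names and the enum are hypothetical, and `exactmap` is not handled.

```c
#include <stdlib.h>

/* Approximation of memparse(): number (any base strtoull accepts with
 * base 0) optionally followed by K, M or G. */
static unsigned long long memparse_sketch(const char *s, char **retptr)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 0);

    switch (*end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        end++;
        break;
    default:
        break;
    }
    if (retptr)
        *retptr = end;
    return v;
}

enum memmap_kind { MEMMAP_RAM, MEMMAP_ACPI, MEMMAP_RESERVED, MEMMAP_LIMIT };

/* Classify one memmap= argument the way parse_memmap() does. */
static enum memmap_kind parse_memmap_sketch(const char *arg,
                                            unsigned long long *size,
                                            unsigned long long *start)
{
    char *p;
    *size = memparse_sketch(arg, &p);
    *start = 0;
    switch (*p) {
    case '@': *start = memparse_sketch(p + 1, &p); return MEMMAP_RAM;
    case '#': *start = memparse_sketch(p + 1, &p); return MEMMAP_ACPI;
    case '$': *start = memparse_sketch(p + 1, &p); return MEMMAP_RESERVED;
    default:  return MEMMAP_LIMIT;
    }
}
```

For example, `memmap=64M@16M` adds 64 MB of usable RAM starting at 16 MB, while a bare `memmap=512M` trims the BIOS map to 512 MB.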
#ifdef CONFIG_PROC_VMCORE #ifdef CONFIG_PROC_VMCORE
/* elfcorehdr= specifies the location of elf core header /* elfcorehdr= specifies the location of elf core header
* stored by the crashed kernel. * stored by the crashed kernel.
@ -827,90 +259,6 @@ static int __init parse_reservetop(char *arg)
} }
early_param("reservetop", parse_reservetop); early_param("reservetop", parse_reservetop);
/*
* Callback for efi_memory_walk.
*/
static int __init
efi_find_max_pfn(unsigned long start, unsigned long end, void *arg)
{
unsigned long *max_pfn = arg, pfn;
if (start < end) {
pfn = PFN_UP(end -1);
if (pfn > *max_pfn)
*max_pfn = pfn;
}
return 0;
}
static int __init
efi_memory_present_wrapper(unsigned long start, unsigned long end, void *arg)
{
memory_present(0, PFN_UP(start), PFN_DOWN(end));
return 0;
}
/*
* This function checks if the entire range <start,end> is mapped with type.
*
* Note: this function only works correct if the e820 table is sorted and
* not-overlapping, which is the case
*/
int __init
e820_all_mapped(unsigned long s, unsigned long e, unsigned type)
{
u64 start = s;
u64 end = e;
int i;
for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i];
if (type && ei->type != type)
continue;
/* is the region (part) in overlap with the current region ?*/
if (ei->addr >= end || ei->addr + ei->size <= start)
continue;
/* if the region is at the beginning of <start,end> we move
* start to the end of the region since it's ok until there
*/
if (ei->addr <= start)
start = ei->addr + ei->size;
/* if start is now at or beyond end, we're done, full
* coverage */
if (start >= end)
return 1; /* we're done */
}
return 0;
}
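`e820_all_mapped()` checks coverage by moving `start` forward past every matching entry that covers it. The range counts as fully mapped once `start` reaches `end`. A type of 0 matches any entry. As the comment notes, this only works if the map is sorted and free of overlaps, because an entry that begins after `start` can never close an earlier gap. Below is a userspace sketch with a hypothetical name and a plain array instead of the global `e820`.

```c
#include <stdint.h>

struct e820entry { uint64_t addr, size; uint32_t type; };

/* 1 if [s, e) is entirely covered by entries of `type` (0 = any type). */
static int all_mapped(const struct e820entry *map, int nr,
                      uint64_t s, uint64_t e, uint32_t type)
{
    uint64_t start = s, end = e;

    for (int i = 0; i < nr; i++) {
        const struct e820entry *ei = &map[i];
        if (type && ei->type != type)
            continue;
        if (ei->addr >= end || ei->addr + ei->size <= start)
            continue;                         /* no overlap with the rest */
        if (ei->addr <= start)
            start = ei->addr + ei->size;      /* covered up to here */
        if (start >= end)
            return 1;
    }
    return 0;
}
```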
/*
* Find the highest page frame number we have available
*/
void __init find_max_pfn(void)
{
int i;
max_pfn = 0;
if (efi_enabled) {
efi_memmap_walk(efi_find_max_pfn, &max_pfn);
efi_memmap_walk(efi_memory_present_wrapper, NULL);
return;
}
for (i = 0; i < e820.nr_map; i++) {
unsigned long start, end;
/* RAM? */
if (e820.map[i].type != E820_RAM)
continue;
start = PFN_UP(e820.map[i].addr);
end = PFN_DOWN(e820.map[i].addr + e820.map[i].size);
if (start >= end)
continue;
if (end > max_pfn)
max_pfn = end;
memory_present(0, start, end);
}
}
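`find_max_pfn()` rounds each RAM entry inward to whole pages. `PFN_UP()` rounds the start address up, and `PFN_DOWN()` rounds the end address down. Entries that contain no complete page are skipped, and the largest end frame number becomes `max_pfn`, which is the first page frame *not* backed by RAM. The sketch below keeps only the e820 branch. It assumes 4 KB pages, redefines the two macros locally, and omits `memory_present()`. The function name is hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1ULL << PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define E820_RAM    1

struct e820entry { uint64_t addr, size; uint32_t type; };

static unsigned long find_max_pfn_sketch(const struct e820entry *map, int nr)
{
    unsigned long max_pfn = 0;

    for (int i = 0; i < nr; i++) {
        if (map[i].type != E820_RAM)
            continue;
        unsigned long start = PFN_UP(map[i].addr);
        unsigned long end = PFN_DOWN(map[i].addr + map[i].size);
        if (start >= end)
            continue;                 /* no complete page in this entry */
        if (end > max_pfn)
            max_pfn = end;
    }
    return max_pfn;
}
```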
/* /*
* Determine low and high memory ranges: * Determine low and high memory ranges:
*/ */
@ -970,68 +318,6 @@ unsigned long __init find_max_low_pfn(void)
return max_low_pfn; return max_low_pfn;
} }
/*
* Free all available memory for boot time allocation. Used
* as a callback function by efi_memory_walk()
*/
static int __init
free_available_memory(unsigned long start, unsigned long end, void *arg)
{
/* check max_low_pfn */
if (start >= (max_low_pfn << PAGE_SHIFT))
return 0;
if (end >= (max_low_pfn << PAGE_SHIFT))
end = max_low_pfn << PAGE_SHIFT;
if (start < end)
free_bootmem(start, end - start);
return 0;
}
/*
* Register fully available low RAM pages with the bootmem allocator.
*/
static void __init register_bootmem_low_pages(unsigned long max_low_pfn)
{
int i;
if (efi_enabled) {
efi_memmap_walk(free_available_memory, NULL);
return;
}
for (i = 0; i < e820.nr_map; i++) {
unsigned long curr_pfn, last_pfn, size;
/*
* Reserve usable low memory
*/
if (e820.map[i].type != E820_RAM)
continue;
/*
* We are rounding up the start address of usable memory:
*/
curr_pfn = PFN_UP(e820.map[i].addr);
if (curr_pfn >= max_low_pfn)
continue;
/*
* ... and at the end of the usable range downwards:
*/
last_pfn = PFN_DOWN(e820.map[i].addr + e820.map[i].size);
if (last_pfn > max_low_pfn)
last_pfn = max_low_pfn;
/*
* .. finally, did all the rounding and playing
* around just make the area go away?
*/
if (last_pfn <= curr_pfn)
continue;
size = last_pfn - curr_pfn;
free_bootmem(PFN_PHYS(curr_pfn), PFN_PHYS(size));
}
}
/* /*
* workaround for Dell systems that neglect to reserve EBDA * workaround for Dell systems that neglect to reserve EBDA
*/ */
@ -1118,8 +404,8 @@ void __init setup_bootmem_allocator(void)
* the (very unlikely) case of us accidentally initializing the * the (very unlikely) case of us accidentally initializing the
* bootmem allocator with an invalid RAM area. * bootmem allocator with an invalid RAM area.
*/ */
reserve_bootmem(__PHYSICAL_START, (PFN_PHYS(min_low_pfn) + reserve_bootmem(__pa_symbol(_text), (PFN_PHYS(min_low_pfn) +
bootmap_size + PAGE_SIZE-1) - (__PHYSICAL_START)); bootmap_size + PAGE_SIZE-1) - __pa_symbol(_text));
/* /*
* reserve physical page 0 - it's a special BIOS page on many boxes, * reserve physical page 0 - it's a special BIOS page on many boxes,
@ -1199,126 +485,6 @@ void __init remapped_pgdat_init(void)
} }
} }
/*
* Request address space for all standard RAM and ROM resources
* and also for regions reported as reserved by the e820.
*/
static void __init
legacy_init_iomem_resources(struct resource *code_resource, struct resource *data_resource)
{
int i;
probe_roms();
for (i = 0; i < e820.nr_map; i++) {
struct resource *res;
#ifndef CONFIG_RESOURCES_64BIT
if (e820.map[i].addr + e820.map[i].size > 0x100000000ULL)
continue;
#endif
res = kzalloc(sizeof(struct resource), GFP_ATOMIC);
switch (e820.map[i].type) {
case E820_RAM: res->name = "System RAM"; break;
case E820_ACPI: res->name = "ACPI Tables"; break;
case E820_NVS: res->name = "ACPI Non-volatile Storage"; break;
default: res->name = "reserved";
}
res->start = e820.map[i].addr;
res->end = res->start + e820.map[i].size - 1;
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
if (request_resource(&iomem_resource, res)) {
kfree(res);
continue;
}
if (e820.map[i].type == E820_RAM) {
/*
* We don't know which RAM region contains kernel data,
* so we try it repeatedly and let the resource manager
* test it.
*/
request_resource(res, code_resource);
request_resource(res, data_resource);
#ifdef CONFIG_KEXEC
request_resource(res, &crashk_res);
#endif
}
}
}
/*
* Request address space for all standard resources
*
* This is called just before pcibios_init(), which is also a
* subsys_initcall, but is linked in later (in arch/i386/pci/common.c).
*/
static int __init request_standard_resources(void)
{
int i;
printk("Setting up standard PCI resources\n");
if (efi_enabled)
efi_initialize_iomem_resources(&code_resource, &data_resource);
else
legacy_init_iomem_resources(&code_resource, &data_resource);
/* EFI systems may still have VGA */
request_resource(&iomem_resource, &video_ram_resource);
/* request I/O space for devices used on all i[345]86 PCs */
for (i = 0; i < ARRAY_SIZE(standard_io_resources); i++)
request_resource(&ioport_resource, &standard_io_resources[i]);
return 0;
}
subsys_initcall(request_standard_resources);
static void __init register_memory(void)
{
unsigned long gapstart, gapsize, round;
unsigned long long last;
int i;
/*
* Search for the bigest gap in the low 32 bits of the e820
* memory space.
*/
last = 0x100000000ull;
gapstart = 0x10000000;
gapsize = 0x400000;
i = e820.nr_map;
while (--i >= 0) {
unsigned long long start = e820.map[i].addr;
unsigned long long end = start + e820.map[i].size;
/*
* Since "last" is at most 4GB, we know we'll
* fit in 32 bits if this condition is true
*/
if (last > end) {
unsigned long gap = last - end;
if (gap > gapsize) {
gapsize = gap;
gapstart = end;
}
}
if (start < last)
last = start;
}
/*
* See how much we want to round up: start off with
* rounding to the next 1MB area.
*/
round = 0x100000;
while ((gapsize >> 4) > round)
round += round;
/* Fun with two's complement */
pci_mem_start = (gapstart + round) & -round;
printk("Allocating PCI resources starting at %08lx (gap: %08lx:%08lx)\n",
pci_mem_start, gapstart, gapsize);
}
#ifdef CONFIG_MCA #ifdef CONFIG_MCA
static void set_mca_bus(int x) static void set_mca_bus(int x)
{ {
@ -1328,6 +494,12 @@ static void set_mca_bus(int x)
static void set_mca_bus(int x) { } static void set_mca_bus(int x) { }
#endif #endif
/* Overridden in paravirt.c if CONFIG_PARAVIRT */
char * __attribute__((weak)) memory_setup(void)
{
return machine_specific_memory_setup();
}
/* /*
* Determine if we were loaded by an EFI loader. If so, then we have also been * Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use these data structures * passed the efi memmap, systab, etc., so we should use these data structures
@ -1380,7 +552,7 @@ void __init setup_arch(char **cmdline_p)
efi_init(); efi_init();
else { else {
printk(KERN_INFO "BIOS-provided physical RAM map:\n"); printk(KERN_INFO "BIOS-provided physical RAM map:\n");
print_memory_map(machine_specific_memory_setup()); print_memory_map(memory_setup());
} }
copy_edd(); copy_edd();
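The removed register_memory() chooses where PCI memory resources start. It walks the e820 map from the top down, records the largest hole below 4 GB, and then rounds the start of that hole up. Below is a minimal user-space sketch of the same arithmetic. The `struct range` type and `pci_start_from_gap()` name are hypothetical. The sketch assumes the map is sorted by address, as the sanitized e820 map is, and it uses 64-bit arithmetic where the i386 code uses a 32-bit `unsigned long`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical e820-like entry: [addr, addr + size) */
struct range {
    uint64_t addr;
    uint64_t size;
};

/*
 * Mirror of the removed register_memory() arithmetic, assuming `map` is
 * sorted by address. Returns the proposed start of the PCI MMIO window.
 */
static uint64_t pci_start_from_gap(const struct range *map, int nr)
{
    uint64_t last = 0x100000000ULL;     /* only consider the low 4 GB */
    uint64_t gapstart = 0x10000000;     /* defaults if no larger hole exists */
    uint64_t gapsize = 0x400000;
    uint64_t round = 0x100000;          /* start by rounding to 1 MB */
    int i = nr;

    while (--i >= 0) {
        uint64_t start = map[i].addr;
        uint64_t end = start + map[i].size;

        /* Hole between this entry's end and the lowest start seen above it */
        if (last > end && last - end > gapsize) {
            gapsize = last - end;
            gapstart = end;
        }
        if (start < last)
            last = start;
    }

    /* Grow the rounding unit until it is at least 1/16 of the gap */
    while ((gapsize >> 4) > round)
        round += round;

    /* Two's complement: for a power of two, -round masks all bits >= round */
    return (gapstart + round) & -round;
}
```

Consider a map where RAM ends at 1 GB and the only other entry is an IOAPIC page near 4 GB. The largest gap then starts at 0x40000000, the rounding unit grows to 256 MB, and the sketch returns 0x50000000. An already-aligned `gapstart` still moves up by one full `round`, because the expression rounds `gapstart + round` down rather than rounding `gapstart` up.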

@ -128,7 +128,7 @@ restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc, int *peax
X86_EFLAGS_TF | X86_EFLAGS_SF | X86_EFLAGS_ZF | \ X86_EFLAGS_TF | X86_EFLAGS_SF | X86_EFLAGS_ZF | \
X86_EFLAGS_AF | X86_EFLAGS_PF | X86_EFLAGS_CF) X86_EFLAGS_AF | X86_EFLAGS_PF | X86_EFLAGS_CF)
GET_SEG(gs); COPY_SEG(gs);
GET_SEG(fs); GET_SEG(fs);
COPY_SEG(es); COPY_SEG(es);
COPY_SEG(ds); COPY_SEG(ds);
@ -244,9 +244,7 @@ setup_sigcontext(struct sigcontext __user *sc, struct _fpstate __user *fpstate,
{ {
int tmp, err = 0; int tmp, err = 0;
tmp = 0; err |= __put_user(regs->xgs, (unsigned int __user *)&sc->gs);
savesegment(gs, tmp);
err |= __put_user(tmp, (unsigned int __user *)&sc->gs);
savesegment(fs, tmp); savesegment(fs, tmp);
err |= __put_user(tmp, (unsigned int __user *)&sc->fs); err |= __put_user(tmp, (unsigned int __user *)&sc->fs);

@ -321,7 +321,6 @@ static inline void leave_mm (unsigned long cpu)
fastcall void smp_invalidate_interrupt(struct pt_regs *regs) fastcall void smp_invalidate_interrupt(struct pt_regs *regs)
{ {
struct pt_regs *old_regs = set_irq_regs(regs);
unsigned long cpu; unsigned long cpu;
cpu = get_cpu(); cpu = get_cpu();
@ -352,7 +351,6 @@ fastcall void smp_invalidate_interrupt(struct pt_regs *regs)
smp_mb__after_clear_bit(); smp_mb__after_clear_bit();
out: out:
put_cpu_no_resched(); put_cpu_no_resched();
set_irq_regs(old_regs);
} }
static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm, static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
@ -607,14 +605,11 @@ void smp_send_stop(void)
*/ */
fastcall void smp_reschedule_interrupt(struct pt_regs *regs) fastcall void smp_reschedule_interrupt(struct pt_regs *regs)
{ {
struct pt_regs *old_regs = set_irq_regs(regs);
ack_APIC_irq(); ack_APIC_irq();
set_irq_regs(old_regs);
} }
fastcall void smp_call_function_interrupt(struct pt_regs *regs) fastcall void smp_call_function_interrupt(struct pt_regs *regs)
{ {
struct pt_regs *old_regs = set_irq_regs(regs);
void (*func) (void *info) = call_data->func; void (*func) (void *info) = call_data->func;
void *info = call_data->info; void *info = call_data->info;
int wait = call_data->wait; int wait = call_data->wait;
@ -637,7 +632,6 @@ fastcall void smp_call_function_interrupt(struct pt_regs *regs)
mb(); mb();
atomic_inc(&call_data->finished); atomic_inc(&call_data->finished);
} }
set_irq_regs(old_regs);
} }
/* /*

@ -33,6 +33,11 @@
* Dave Jones : Report invalid combinations of Athlon CPUs. * Dave Jones : Report invalid combinations of Athlon CPUs.
* Rusty Russell : Hacked into shape for new "hotplug" boot process. */ * Rusty Russell : Hacked into shape for new "hotplug" boot process. */
/* SMP boot always wants to use real time delay to allow sufficient time for
* the APs to come online */
#define USE_REAL_TIME_DELAY
#include <linux/module.h> #include <linux/module.h>
#include <linux/init.h> #include <linux/init.h>
#include <linux/kernel.h> #include <linux/kernel.h>
@ -52,6 +57,8 @@
#include <asm/desc.h> #include <asm/desc.h>
#include <asm/arch_hooks.h> #include <asm/arch_hooks.h>
#include <asm/nmi.h> #include <asm/nmi.h>
#include <asm/pda.h>
#include <asm/genapic.h>
#include <mach_apic.h> #include <mach_apic.h>
#include <mach_wakecpu.h> #include <mach_wakecpu.h>
@ -536,11 +543,11 @@ set_cpu_sibling_map(int cpu)
static void __devinit start_secondary(void *unused) static void __devinit start_secondary(void *unused)
{ {
/* /*
* Dont put anything before smp_callin(), SMP * Don't put *anything* before secondary_cpu_init(), SMP
* booting is too fragile that we want to limit the * booting is too fragile that we want to limit the
* things done here to the most necessary things. * things done here to the most necessary things.
*/ */
cpu_init(); secondary_cpu_init();
preempt_disable(); preempt_disable();
smp_callin(); smp_callin();
while (!cpu_isset(smp_processor_id(), smp_commenced_mask)) while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
@ -599,13 +606,16 @@ void __devinit initialize_secondary(void)
"movl %0,%%esp\n\t" "movl %0,%%esp\n\t"
"jmp *%1" "jmp *%1"
: :
:"r" (current->thread.esp),"r" (current->thread.eip)); :"m" (current->thread.esp),"m" (current->thread.eip));
} }
/* Static state in head.S used to set up a CPU */
extern struct { extern struct {
void * esp; void * esp;
unsigned short ss; unsigned short ss;
} stack_start; } stack_start;
extern struct i386_pda *start_pda;
extern struct Xgt_desc_struct cpu_gdt_descr;
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
@ -936,9 +946,6 @@ static int __devinit do_boot_cpu(int apicid, int cpu)
unsigned long start_eip; unsigned long start_eip;
unsigned short nmi_high = 0, nmi_low = 0; unsigned short nmi_high = 0, nmi_low = 0;
++cpucount;
alternatives_smp_switch(1);
/* /*
* We can't use kernel_thread since we must avoid to * We can't use kernel_thread since we must avoid to
* reschedule the child. * reschedule the child.
@ -946,15 +953,30 @@ static int __devinit do_boot_cpu(int apicid, int cpu)
idle = alloc_idle_task(cpu); idle = alloc_idle_task(cpu);
if (IS_ERR(idle)) if (IS_ERR(idle))
panic("failed fork for CPU %d", cpu); panic("failed fork for CPU %d", cpu);
/* Pre-allocate and initialize the CPU's GDT and PDA so it
doesn't have to do any memory allocation during the
delicate CPU-bringup phase. */
if (!init_gdt(cpu, idle)) {
printk(KERN_INFO "Couldn't allocate GDT/PDA for CPU %d\n", cpu);
return -1; /* ? */
}
idle->thread.eip = (unsigned long) start_secondary; idle->thread.eip = (unsigned long) start_secondary;
/* start_eip had better be page-aligned! */ /* start_eip had better be page-aligned! */
start_eip = setup_trampoline(); start_eip = setup_trampoline();
++cpucount;
alternatives_smp_switch(1);
/* So we see what's up */ /* So we see what's up */
printk("Booting processor %d/%d eip %lx\n", cpu, apicid, start_eip); printk("Booting processor %d/%d eip %lx\n", cpu, apicid, start_eip);
/* Stack for startup_32 can be just as for start_secondary onwards */ /* Stack for startup_32 can be just as for start_secondary onwards */
stack_start.esp = (void *) idle->thread.esp; stack_start.esp = (void *) idle->thread.esp;
start_pda = cpu_pda(cpu);
cpu_gdt_descr = per_cpu(cpu_gdt_descr, cpu);
irq_ctx_init(cpu); irq_ctx_init(cpu);
x86_cpu_to_apicid[cpu] = apicid; x86_cpu_to_apicid[cpu] = apicid;
@ -1109,34 +1131,15 @@ static int __cpuinit __smp_prepare_cpu(int cpu)
} }
#endif #endif
static void smp_tune_scheduling (void) static void smp_tune_scheduling(void)
{ {
unsigned long cachesize; /* kB */ unsigned long cachesize; /* kB */
unsigned long bandwidth = 350; /* MB/s */
/*
* Rough estimation for SMP scheduling, this is the number of
* cycles it takes for a fully memory-limited process to flush
* the SMP-local cache.
*
* (For a P5 this pretty much means we will choose another idle
* CPU almost always at wakeup time (this is due to the small
* L1 cache), on PIIs it's around 50-100 usecs, depending on
* the cache size)
*/
if (!cpu_khz) { if (cpu_khz) {
/*
* this basically disables processor-affinity
* scheduling on SMP without a TSC.
*/
return;
} else {
cachesize = boot_cpu_data.x86_cache_size; cachesize = boot_cpu_data.x86_cache_size;
if (cachesize == -1) {
cachesize = 16; /* Pentiums, 2x8kB cache */ if (cachesize > 0)
bandwidth = 100; max_cache_size = cachesize * 1024;
}
max_cache_size = cachesize * 1024;
} }
} }
@ -1462,6 +1465,12 @@ int __devinit __cpu_up(unsigned int cpu)
cpu_set(cpu, smp_commenced_mask); cpu_set(cpu, smp_commenced_mask);
while (!cpu_isset(cpu, cpu_online_map)) while (!cpu_isset(cpu, cpu_online_map))
cpu_relax(); cpu_relax();
#ifdef CONFIG_X86_GENERICARCH
if (num_online_cpus() > 8 && genapic == &apic_default)
panic("Default flat APIC routing can't be used with > 8 cpus\n");
#endif
return 0; return 0;
} }
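The new check in __cpu_up() reflects a hardware limit. In xAPIC flat logical destination mode, each CPU is identified by one bit of an 8-bit logical APIC ID, so at most eight CPUs can be addressed. The sketch below shows that one-bit-per-CPU assignment. `flat_logical_id()` and `flat_all_cpus_mask()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define FLAT_MAX_CPUS 8  /* one bit per CPU in an 8-bit logical APIC ID */

/* Stores the one-hot logical ID for `cpu`, or returns -1 if it cannot be represented. */
static int flat_logical_id(unsigned int cpu, uint8_t *id)
{
    if (cpu >= FLAT_MAX_CPUS)
        return -1;
    *id = (uint8_t)(1u << cpu);
    return 0;
}

/* Destination mask covering CPUs 0..ncpus-1, or -1 if flat mode cannot express it. */
static int flat_all_cpus_mask(unsigned int ncpus, uint8_t *mask)
{
    uint8_t m = 0, id;

    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        if (flat_logical_id(cpu, &id))
            return -1;
        m |= id;
    }
    *mask = m;
    return 0;
}
```

A ninth CPU has no bit left. This is why, under CONFIG_X86_GENERICARCH, the diff treats bringing more than eight CPUs online while `genapic` is still `&apic_default` as fatal.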

@ -27,7 +27,11 @@
* Should the kernel map a VDSO page into processes and pass its * Should the kernel map a VDSO page into processes and pass its
* address down to glibc upon exec()? * address down to glibc upon exec()?
*/ */
#ifdef CONFIG_PARAVIRT
unsigned int __read_mostly vdso_enabled = 0;
#else
unsigned int __read_mostly vdso_enabled = 1; unsigned int __read_mostly vdso_enabled = 1;
#endif
EXPORT_SYMBOL_GPL(vdso_enabled); EXPORT_SYMBOL_GPL(vdso_enabled);

@ -56,6 +56,7 @@
#include <asm/uaccess.h> #include <asm/uaccess.h>
#include <asm/processor.h> #include <asm/processor.h>
#include <asm/timer.h> #include <asm/timer.h>
#include <asm/time.h>
#include "mach_time.h" #include "mach_time.h"
@ -116,10 +117,7 @@ static int set_rtc_mmss(unsigned long nowtime)
/* gets recalled with irq locally disabled */ /* gets recalled with irq locally disabled */
/* XXX - does irqsave resolve this? -johnstul */ /* XXX - does irqsave resolve this? -johnstul */
spin_lock_irqsave(&rtc_lock, flags); spin_lock_irqsave(&rtc_lock, flags);
if (efi_enabled) retval = set_wallclock(nowtime);
retval = efi_set_rtc_mmss(nowtime);
else
retval = mach_set_rtc_mmss(nowtime);
spin_unlock_irqrestore(&rtc_lock, flags); spin_unlock_irqrestore(&rtc_lock, flags);
return retval; return retval;
@ -223,10 +221,7 @@ unsigned long get_cmos_time(void)
spin_lock_irqsave(&rtc_lock, flags); spin_lock_irqsave(&rtc_lock, flags);
if (efi_enabled) retval = get_wallclock();
retval = efi_get_time();
else
retval = mach_get_cmos_time();
spin_unlock_irqrestore(&rtc_lock, flags); spin_unlock_irqrestore(&rtc_lock, flags);
@ -370,7 +365,7 @@ static void __init hpet_time_init(void)
printk("Using HPET for base-timer\n"); printk("Using HPET for base-timer\n");
} }
time_init_hook(); do_time_init();
} }
#endif #endif
@ -392,5 +387,5 @@ void __init time_init(void)
do_settimeofday(&ts); do_settimeofday(&ts);
time_init_hook(); do_time_init();
} }
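In the time.c hunks, both RTC paths now call set_wallclock() and get_wallclock(), pulled in through the new <asm/time.h> include. Previously each call site branched on efi_enabled itself. The definitions of these helpers are not part of this excerpt. The sketch below only illustrates the general idea of routing the EFI/CMOS choice through a single helper, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical backends standing in for the EFI and CMOS RTC writers;
 * they return distinct values so a caller can tell which one ran. */
static int efi_rtc_set(unsigned long nowtime)  { (void)nowtime; return 1; }
static int cmos_rtc_set(unsigned long nowtime) { (void)nowtime; return 2; }

static bool efi_booted;   /* hypothetical stand-in for efi_enabled */

/* The single place that decides which backend to use */
static int wallclock_set(unsigned long nowtime)
{
    return efi_booted ? efi_rtc_set(nowtime) : cmos_rtc_set(nowtime);
}
```

With this shape, callers such as set_rtc_mmss() and get_cmos_time() only take the lock and call the helper. An alternative backend can then be selected in one place instead of at every call site.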

@ -132,14 +132,20 @@ int __init hpet_enable(void)
* the single HPET timer for system time. * the single HPET timer for system time.
*/ */
#ifdef CONFIG_HPET_EMULATE_RTC #ifdef CONFIG_HPET_EMULATE_RTC
if (!(id & HPET_ID_NUMBER)) if (!(id & HPET_ID_NUMBER)) {
iounmap(hpet_virt_address);
hpet_virt_address = NULL;
return -1; return -1;
}
#endif #endif
hpet_period = hpet_readl(HPET_PERIOD); hpet_period = hpet_readl(HPET_PERIOD);
if ((hpet_period < HPET_MIN_PERIOD) || (hpet_period > HPET_MAX_PERIOD)) if ((hpet_period < HPET_MIN_PERIOD) || (hpet_period > HPET_MAX_PERIOD)) {
iounmap(hpet_virt_address);
hpet_virt_address = NULL;
return -1; return -1;
}
/* /*
* 64 bit math * 64 bit math
@ -156,8 +162,11 @@ int __init hpet_enable(void)
hpet_use_timer = id & HPET_ID_LEGSUP; hpet_use_timer = id & HPET_ID_LEGSUP;
if (hpet_timer_stop_set_go(hpet_tick)) if (hpet_timer_stop_set_go(hpet_tick)) {
iounmap(hpet_virt_address);
hpet_virt_address = NULL;
return -1; return -1;
}
use_hpet = 1; use_hpet = 1;
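The hpet_enable() changes make every failure path after the register mapping undo that mapping and clear hpet_virt_address, so later code cannot use a stale pointer. The diff repeats the cleanup before each `return -1`. A common alternative in kernel C is a single goto label. The sketch below shows that alternative. It uses malloc()/free() as stand-ins for ioremap()/iounmap(), and all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for hpet_virt_address */
static void *timer_regs;

/*
 * Hypothetical enable routine: the three flags stand in for the ID,
 * period and timer-start checks made after mapping the registers.
 */
static int timer_enable_sketch(int id_ok, int period_ok, int start_ok)
{
    timer_regs = malloc(1024);          /* stands in for ioremap() */
    if (!timer_regs)
        return -1;

    if (!id_ok)
        goto out_unmap;
    if (!period_ok)
        goto out_unmap;
    if (!start_ok)
        goto out_unmap;
    return 0;

out_unmap:
    free(timer_regs);                   /* stands in for iounmap() */
    timer_regs = NULL;                  /* no stale pointer left behind */
    return -1;
}
```

Both forms behave the same. The goto form keeps the unmap-and-clear sequence in one place, so a new failure check cannot forget half of it.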

@ -40,14 +40,18 @@ int arch_register_cpu(int num)
* restrictions and assumptions in kernel. This basically * restrictions and assumptions in kernel. This basically
* doesnt add a control file, one cannot attempt to offline * doesnt add a control file, one cannot attempt to offline
* BSP. * BSP.
*
* Also certain PCI quirks require not to enable hotplug control
* for all CPU's.
*/ */
if (!num) if (num && enable_cpu_hotplug)
cpu_devices[num].cpu.no_control = 1; cpu_devices[num].cpu.hotpluggable = 1;
return register_cpu(&cpu_devices[num].cpu, num); return register_cpu(&cpu_devices[num].cpu, num);
} }
#ifdef CONFIG_HOTPLUG_CPU #ifdef CONFIG_HOTPLUG_CPU
int enable_cpu_hotplug = 1;
void arch_unregister_cpu(int num) { void arch_unregister_cpu(int num) {
return unregister_cpu(&cpu_devices[num].cpu); return unregister_cpu(&cpu_devices[num].cpu);

@ -29,6 +29,7 @@
#include <linux/kexec.h> #include <linux/kexec.h>
#include <linux/unwind.h> #include <linux/unwind.h>
#include <linux/uaccess.h> #include <linux/uaccess.h>
#include <linux/nmi.h>
#ifdef CONFIG_EISA #ifdef CONFIG_EISA
#include <linux/ioport.h> #include <linux/ioport.h>
@ -61,9 +62,6 @@ int panic_on_unrecovered_nmi;
asmlinkage int system_call(void); asmlinkage int system_call(void);
struct desc_struct default_ldt[] = { { 0, 0 }, { 0, 0 }, { 0, 0 },
{ 0, 0 }, { 0, 0 } };
/* Do we ignore FPU interrupts ? */ /* Do we ignore FPU interrupts ? */
char ignore_fpu_irq = 0; char ignore_fpu_irq = 0;
@ -94,7 +92,7 @@ asmlinkage void alignment_check(void);
asmlinkage void spurious_interrupt_bug(void); asmlinkage void spurious_interrupt_bug(void);
asmlinkage void machine_check(void); asmlinkage void machine_check(void);
static int kstack_depth_to_print = 24; int kstack_depth_to_print = 24;
#ifdef CONFIG_STACK_UNWIND #ifdef CONFIG_STACK_UNWIND
static int call_trace = 1; static int call_trace = 1;
#else #else
@ -163,16 +161,25 @@ dump_trace_unwind(struct unwind_frame_info *info, void *data)
{ {
struct ops_and_data *oad = (struct ops_and_data *)data; struct ops_and_data *oad = (struct ops_and_data *)data;
int n = 0; int n = 0;
unsigned long sp = UNW_SP(info);
if (arch_unw_user_mode(info))
return -1;
while (unwind(info) == 0 && UNW_PC(info)) { while (unwind(info) == 0 && UNW_PC(info)) {
n++; n++;
oad->ops->address(oad->data, UNW_PC(info)); oad->ops->address(oad->data, UNW_PC(info));
if (arch_unw_user_mode(info)) if (arch_unw_user_mode(info))
break; break;
if ((sp & ~(PAGE_SIZE - 1)) == (UNW_SP(info) & ~(PAGE_SIZE - 1))
&& sp > UNW_SP(info))
break;
sp = UNW_SP(info);
} }
return n; return n;
} }
#define MSG(msg) ops->warning(data, msg)
void dump_trace(struct task_struct *task, struct pt_regs *regs, void dump_trace(struct task_struct *task, struct pt_regs *regs,
unsigned long *stack, unsigned long *stack,
struct stacktrace_ops *ops, void *data) struct stacktrace_ops *ops, void *data)
@ -191,29 +198,31 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
if (unwind_init_frame_info(&info, task, regs) == 0) if (unwind_init_frame_info(&info, task, regs) == 0)
unw_ret = dump_trace_unwind(&info, &oad); unw_ret = dump_trace_unwind(&info, &oad);
} else if (task == current) } else if (task == current)
unw_ret = unwind_init_running(&info, dump_trace_unwind, &oad); unw_ret = unwind_init_running(&info, dump_trace_unwind,
&oad);
else { else {
if (unwind_init_blocked(&info, task) == 0) if (unwind_init_blocked(&info, task) == 0)
unw_ret = dump_trace_unwind(&info, &oad); unw_ret = dump_trace_unwind(&info, &oad);
} }
if (unw_ret > 0) { if (unw_ret > 0) {
if (call_trace == 1 && !arch_unw_user_mode(&info)) { if (call_trace == 1 && !arch_unw_user_mode(&info)) {
ops->warning_symbol(data, "DWARF2 unwinder stuck at %s\n", ops->warning_symbol(data,
"DWARF2 unwinder stuck at %s",
UNW_PC(&info)); UNW_PC(&info));
if (UNW_SP(&info) >= PAGE_OFFSET) { if (UNW_SP(&info) >= PAGE_OFFSET) {
ops->warning(data, "Leftover inexact backtrace:\n"); MSG("Leftover inexact backtrace:");
stack = (void *)UNW_SP(&info); stack = (void *)UNW_SP(&info);
if (!stack) if (!stack)
return; return;
ebp = UNW_FP(&info); ebp = UNW_FP(&info);
} else } else
ops->warning(data, "Full inexact backtrace again:\n"); MSG("Full inexact backtrace again:");
} else if (call_trace >= 1) } else if (call_trace >= 1)
return; return;
else else
ops->warning(data, "Full inexact backtrace again:\n"); MSG("Full inexact backtrace again:");
} else } else
ops->warning(data, "Inexact backtrace:\n"); MSG("Inexact backtrace:");
} }
if (!stack) { if (!stack) {
unsigned long dummy; unsigned long dummy;
@ -247,6 +256,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
stack = (unsigned long*)context->previous_esp; stack = (unsigned long*)context->previous_esp;
if (!stack) if (!stack)
break; break;
touch_nmi_watchdog();
} }
} }
EXPORT_SYMBOL(dump_trace); EXPORT_SYMBOL(dump_trace);
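dump_trace_unwind() now also stops when the unwound stack pointer moves backwards within the same page. On x86, caller frames sit at higher addresses, so a lower SP in the same page suggests the unwinder is looping or reading garbage. A jump to a different page, by contrast, may be a legitimate switch to another stack. The sketch below applies that check to a precomputed list of SP values. The array input, the 4 KiB page size and the function names are assumptions for illustration, and the user-mode stop condition is not modelled.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages, as on i386 */

static int same_page(unsigned long a, unsigned long b)
{
    return (a & ~(SKETCH_PAGE_SIZE - 1)) == (b & ~(SKETCH_PAGE_SIZE - 1));
}

/*
 * sps[0] is the SP before unwinding; sps[1..nsteps] is the SP after each
 * successful unwind step. Returns how many frames would be reported.
 */
static int count_frames(const unsigned long *sps, int nsteps)
{
    unsigned long sp = sps[0];
    int n = 0;

    for (int i = 1; i <= nsteps; i++) {
        n++;                            /* frame is reported before the check */
        if (same_page(sp, sps[i]) && sp > sps[i])
            break;                      /* SP went backwards within a page */
        sp = sps[i];
    }
    return n;
}
```

As in the kernel loop, the frame that triggers the stop has already been reported, so the count includes it.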
@ -379,7 +389,7 @@ void show_registers(struct pt_regs *regs)
* time of the fault.. * time of the fault..
*/ */
if (in_kernel) { if (in_kernel) {
u8 __user *eip; u8 *eip;
int code_bytes = 64; int code_bytes = 64;
unsigned char c; unsigned char c;
@ -388,18 +398,20 @@ void show_registers(struct pt_regs *regs)
printk(KERN_EMERG "Code: "); printk(KERN_EMERG "Code: ");
eip = (u8 __user *)regs->eip - 43; eip = (u8 *)regs->eip - 43;
if (eip < (u8 __user *)PAGE_OFFSET || __get_user(c, eip)) { if (eip < (u8 *)PAGE_OFFSET ||
probe_kernel_address(eip, c)) {
/* try starting at EIP */ /* try starting at EIP */
eip = (u8 __user *)regs->eip; eip = (u8 *)regs->eip;
code_bytes = 32; code_bytes = 32;
} }
for (i = 0; i < code_bytes; i++, eip++) { for (i = 0; i < code_bytes; i++, eip++) {
if (eip < (u8 __user *)PAGE_OFFSET || __get_user(c, eip)) { if (eip < (u8 *)PAGE_OFFSET ||
probe_kernel_address(eip, c)) {
printk(" Bad EIP value."); printk(" Bad EIP value.");
break; break;
} }
if (eip == (u8 __user *)regs->eip) if (eip == (u8 *)regs->eip)
printk("<%02x> ", c); printk("<%02x> ", c);
else else
printk("%02x ", c); printk("%02x ", c);
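show_registers() now reads the faulting code with probe_kernel_address() instead of __get_user(), but the dump logic is unchanged:

- start 43 bytes before EIP and print 64 bytes;
- if that start is unreadable, start at EIP itself and print 32 bytes;
- mark the byte at EIP with angle brackets;
- stop at the first unreadable byte.

The sketch below models that logic in user space over a readable window [lo, hi). `struct mem_window`, `read_byte()` and `format_code()` are hypothetical, and the PAGE_OFFSET check is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical memory model: bytes[0] lives at address lo, readable up to hi (exclusive). */
struct mem_window {
    const unsigned char *bytes;
    size_t lo, hi;
};

/* Stand-in for probe_kernel_address(): 0 on success, nonzero if unreadable. */
static int read_byte(const struct mem_window *m, size_t addr, unsigned char *c)
{
    if (addr < m->lo || addr >= m->hi)
        return -1;
    *c = m->bytes[addr - m->lo];
    return 0;
}

static void format_code(const struct mem_window *m, size_t eip,
                        char *out, size_t outsz)
{
    size_t addr = eip >= 43 ? eip - 43 : 0, len = 0;
    int code_bytes = 64;
    unsigned char c;

    out[0] = '\0';
    if (eip < 43 || read_byte(m, addr, &c)) {
        addr = eip;                     /* try starting at EIP */
        code_bytes = 32;
    }
    for (int i = 0; i < code_bytes && len < outsz; i++, addr++) {
        int n;

        if (read_byte(m, addr, &c)) {
            snprintf(out + len, outsz - len, " Bad EIP value.");
            break;
        }
        n = snprintf(out + len, outsz - len,
                     addr == eip ? "<%02x> " : "%02x ", c);
        if (n < 0)
            break;
        len += (size_t)n;
    }
}
```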
@ -415,7 +427,7 @@ static void handle_BUG(struct pt_regs *regs)
if (eip < PAGE_OFFSET) if (eip < PAGE_OFFSET)
return; return;
if (probe_kernel_address((unsigned short __user *)eip, ud2)) if (probe_kernel_address((unsigned short *)eip, ud2))
return; return;
if (ud2 != 0x0b0f) if (ud2 != 0x0b0f)
return; return;
@ -428,11 +440,11 @@ static void handle_BUG(struct pt_regs *regs)
char *file; char *file;
char c; char c;
if (probe_kernel_address((unsigned short __user *)(eip + 2), if (probe_kernel_address((unsigned short *)(eip + 2), line))
line))
break; break;
if (__get_user(file, (char * __user *)(eip + 4)) || if (probe_kernel_address((char **)(eip + 4), file) ||
(unsigned long)file < PAGE_OFFSET || __get_user(c, file)) (unsigned long)file < PAGE_OFFSET ||
probe_kernel_address(file, c))
file = "<bad filename>"; file = "<bad filename>";
printk(KERN_EMERG "kernel BUG at %s:%d!\n", file, line); printk(KERN_EMERG "kernel BUG at %s:%d!\n", file, line);
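handle_BUG() recognises a BUG() site by the ud2 opcode. Its bytes are 0f 0b, which a little-endian 16-bit load reads as 0x0b0f. handle_BUG() then reads a 16-bit line number at EIP+2 and a pointer to the file name at EIP+4. The sketch below decodes that layout from a raw byte buffer. It assumes 32-bit pointers as on i386 and decodes little-endian explicitly, so it behaves the same on any host. `struct bug_site` and `decode_bug_site()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bug_site {
    uint16_t line;
    uint32_t file_ptr;   /* 32-bit pointer value, as on i386 */
};

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 if p starts with ud2 followed by line and file fields, else -1. */
static int decode_bug_site(const unsigned char *p, size_t len, struct bug_site *out)
{
    if (len < 8 || le16(p) != 0x0b0f)   /* ud2 = 0f 0b */
        return -1;
    out->line = le16(p + 2);
    out->file_ptr = le32(p + 4);
    return 0;
}
```

In the kernel, the decoded file pointer is itself probed before use, and "<bad filename>" is substituted if it is below PAGE_OFFSET or unreadable.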
@ -707,8 +719,7 @@ mem_parity_error(unsigned char reason, struct pt_regs * regs)
{ {
printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x on " printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x on "
"CPU %d.\n", reason, smp_processor_id()); "CPU %d.\n", reason, smp_processor_id());
printk(KERN_EMERG "You probably have a hardware problem with your RAM " printk(KERN_EMERG "You have some hardware problem, likely on the PCI bus.\n");
"chips\n");
if (panic_on_unrecovered_nmi) if (panic_on_unrecovered_nmi)
panic("NMI: Not continuing"); panic("NMI: Not continuing");
@ -773,7 +784,6 @@ void __kprobes die_nmi(struct pt_regs *regs, const char *msg)
printk(" on CPU%d, eip %08lx, registers:\n", printk(" on CPU%d, eip %08lx, registers:\n",
smp_processor_id(), regs->eip); smp_processor_id(), regs->eip);
show_registers(regs); show_registers(regs);
printk(KERN_EMERG "console shuts up ...\n");
console_silent(); console_silent();
spin_unlock(&nmi_print_lock); spin_unlock(&nmi_print_lock);
bust_spinlocks(0); bust_spinlocks(0);
@ -1088,49 +1098,24 @@ fastcall void do_spurious_interrupt_bug(struct pt_regs * regs,
#endif #endif
} }
fastcall void setup_x86_bogus_stack(unsigned char * stk) fastcall unsigned long patch_espfix_desc(unsigned long uesp,
unsigned long kesp)
{ {
unsigned long *switch16_ptr, *switch32_ptr;
struct pt_regs *regs;
unsigned long stack_top, stack_bot;
unsigned short iret_frame16_off;
int cpu = smp_processor_id(); int cpu = smp_processor_id();
/* reserve the space on 32bit stack for the magic switch16 pointer */ struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
memmove(stk, stk + 8, sizeof(struct pt_regs)); struct desc_struct *gdt = (struct desc_struct *)cpu_gdt_descr->address;
switch16_ptr = (unsigned long *)(stk + sizeof(struct pt_regs)); unsigned long base = (kesp - uesp) & -THREAD_SIZE;
regs = (struct pt_regs *)stk; unsigned long new_kesp = kesp - base;
/* now the switch32 on 16bit stack */ unsigned long lim_pages = (new_kesp | (THREAD_SIZE - 1)) >> PAGE_SHIFT;
stack_bot = (unsigned long)&per_cpu(cpu_16bit_stack, cpu); __u64 desc = *(__u64 *)&gdt[GDT_ENTRY_ESPFIX_SS];
stack_top = stack_bot + CPU_16BIT_STACK_SIZE; /* Set up base for espfix segment */
switch32_ptr = (unsigned long *)(stack_top - 8); desc &= 0x00f0ff0000000000ULL;
iret_frame16_off = CPU_16BIT_STACK_SIZE - 8 - 20; desc |= ((((__u64)base) << 16) & 0x000000ffffff0000ULL) |
/* copy iret frame on 16bit stack */ ((((__u64)base) << 32) & 0xff00000000000000ULL) |
memcpy((void *)(stack_bot + iret_frame16_off), &regs->eip, 20); ((((__u64)lim_pages) << 32) & 0x000f000000000000ULL) |
/* fill in the switch pointers */ (lim_pages & 0xffff);
switch16_ptr[0] = (regs->esp & 0xffff0000) | iret_frame16_off; *(__u64 *)&gdt[GDT_ENTRY_ESPFIX_SS] = desc;
switch16_ptr[1] = __ESPFIX_SS; return new_kesp;
switch32_ptr[0] = (unsigned long)stk + sizeof(struct pt_regs) +
8 - CPU_16BIT_STACK_SIZE;
switch32_ptr[1] = __KERNEL_DS;
}
fastcall unsigned char * fixup_x86_bogus_stack(unsigned short sp)
{
unsigned long *switch32_ptr;
unsigned char *stack16, *stack32;
unsigned long stack_top, stack_bot;
int len;
int cpu = smp_processor_id();
stack_bot = (unsigned long)&per_cpu(cpu_16bit_stack, cpu);
stack_top = stack_bot + CPU_16BIT_STACK_SIZE;
switch32_ptr = (unsigned long *)(stack_top - 8);
/* copy the data from 16bit stack to 32bit stack */
len = CPU_16BIT_STACK_SIZE - 8 - sp;
stack16 = (unsigned char *)(stack_bot + sp);
stack32 = (unsigned char *)
(switch32_ptr[0] + CPU_16BIT_STACK_SIZE - 8 - len);
memcpy(stack32, stack16, len);
return stack32;
} }
/* /*
@ -1143,7 +1128,7 @@ fastcall unsigned char * fixup_x86_bogus_stack(unsigned short sp)
* Must be called with kernel preemption disabled (in this case, * Must be called with kernel preemption disabled (in this case,
* local interrupts are disabled at the call-site in entry.S). * local interrupts are disabled at the call-site in entry.S).
*/ */
asmlinkage void math_state_restore(struct pt_regs regs) asmlinkage void math_state_restore(void)
{ {
struct thread_info *thread = current_thread_info(); struct thread_info *thread = current_thread_info();
struct task_struct *tsk = thread->task; struct task_struct *tsk = thread->task;
@ -1153,6 +1138,7 @@ asmlinkage void math_state_restore(struct pt_regs regs)
init_fpu(tsk); init_fpu(tsk);
restore_fpu(tsk); restore_fpu(tsk);
thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */ thread->status |= TS_USEDFPU; /* So we fnsave on switch_to() */
tsk->fpu_counter++;
} }
#ifndef CONFIG_MATH_EMULATION #ifndef CONFIG_MATH_EMULATION
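patch_espfix_desc() rewrites the base and limit of the per-CPU ESPFIX_SS descriptor in the GDT while keeping its access byte and flags. In the diff, `base` is `(kesp - uesp) & -THREAD_SIZE`, and the function returns `kesp - base`. The mask 0x00f0ff0000000000 preserves bits 40-47 (the access byte) and bits 52-55 (the flags). The new fields are placed as follows:

- base bits 0-23 go to descriptor bits 16-39;
- base bits 24-31 go to descriptor bits 56-63;
- the 20-bit page-granular limit is split into bits 0-15 and 48-51.

The sketch below performs the same packing as a standalone function; its name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return `old` with its base and limit replaced, keeping the access byte
 * (bits 40-47) and the flags nibble (bits 52-55).
 */
static uint64_t set_desc_base_limit(uint64_t old, uint32_t base, uint32_t lim_pages)
{
    uint64_t desc = old & 0x00f0ff0000000000ULL;

    desc |= (((uint64_t)base << 16) & 0x000000ffffff0000ULL) |      /* base 0-23   */
            (((uint64_t)base << 32) & 0xff00000000000000ULL) |      /* base 24-31  */
            (((uint64_t)lim_pages << 32) & 0x000f000000000000ULL) | /* limit 16-19 */
            (lim_pages & 0xffff);                                   /* limit 0-15  */
    return desc;
}
```

Starting from a flat data descriptor (0x00cf92000000ffff), base 0x12345678 and limit 0xabcde give 0x12ca92345678bcde. The 0x92 access byte and the 0xc flags nibble survive unchanged.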

@ -13,7 +13,6 @@
#include <asm/delay.h> #include <asm/delay.h>
#include <asm/tsc.h> #include <asm/tsc.h>
#include <asm/delay.h>
#include <asm/io.h> #include <asm/io.h>
#include "mach_timer.h" #include "mach_timer.h"

@ -43,6 +43,7 @@
#include <linux/highmem.h> #include <linux/highmem.h>
#include <linux/ptrace.h> #include <linux/ptrace.h>
#include <linux/audit.h> #include <linux/audit.h>
#include <linux/stddef.h>
#include <asm/uaccess.h> #include <asm/uaccess.h>
#include <asm/io.h> #include <asm/io.h>
@ -72,10 +73,10 @@
/* /*
* 8- and 16-bit register defines.. * 8- and 16-bit register defines..
*/ */
#define AL(regs) (((unsigned char *)&((regs)->eax))[0]) #define AL(regs) (((unsigned char *)&((regs)->pt.eax))[0])
#define AH(regs) (((unsigned char *)&((regs)->eax))[1]) #define AH(regs) (((unsigned char *)&((regs)->pt.eax))[1])
#define IP(regs) (*(unsigned short *)&((regs)->eip)) #define IP(regs) (*(unsigned short *)&((regs)->pt.eip))
#define SP(regs) (*(unsigned short *)&((regs)->esp)) #define SP(regs) (*(unsigned short *)&((regs)->pt.esp))
/* /*
* virtual flags (16 and 32-bit versions) * virtual flags (16 and 32-bit versions)
@ -89,10 +90,37 @@
#define SAFE_MASK (0xDD5) #define SAFE_MASK (0xDD5)
#define RETURN_MASK (0xDFF) #define RETURN_MASK (0xDFF)
#define VM86_REGS_PART2 orig_eax /* convert kernel_vm86_regs to vm86_regs */
#define VM86_REGS_SIZE1 \ static int copy_vm86_regs_to_user(struct vm86_regs __user *user,
( (unsigned)( & (((struct kernel_vm86_regs *)0)->VM86_REGS_PART2) ) ) const struct kernel_vm86_regs *regs)
#define VM86_REGS_SIZE2 (sizeof(struct kernel_vm86_regs) - VM86_REGS_SIZE1) {
int ret = 0;
/* kernel_vm86_regs is missing xfs, so copy everything up to
(but not including) xgs, and then rest after xgs. */
ret += copy_to_user(user, regs, offsetof(struct kernel_vm86_regs, pt.xgs));
ret += copy_to_user(&user->__null_gs, &regs->pt.xgs,
sizeof(struct kernel_vm86_regs) -
offsetof(struct kernel_vm86_regs, pt.xgs));
return ret;
}
/* convert vm86_regs to kernel_vm86_regs */
static int copy_vm86_regs_from_user(struct kernel_vm86_regs *regs,
const struct vm86_regs __user *user,
unsigned extra)
{
int ret = 0;
ret += copy_from_user(regs, user, offsetof(struct kernel_vm86_regs, pt.xgs));
ret += copy_from_user(&regs->pt.xgs, &user->__null_gs,
sizeof(struct kernel_vm86_regs) -
offsetof(struct kernel_vm86_regs, pt.xgs) +
extra);
return ret;
}
struct pt_regs * FASTCALL(save_v86_state(struct kernel_vm86_regs * regs)); struct pt_regs * FASTCALL(save_v86_state(struct kernel_vm86_regs * regs));
struct pt_regs * fastcall save_v86_state(struct kernel_vm86_regs * regs) struct pt_regs * fastcall save_v86_state(struct kernel_vm86_regs * regs)
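copy_vm86_regs_to_user() and copy_vm86_regs_from_user() replace the VM86_REGS_SIZE1/SIZE2 macros with two copies split at offsetof(struct kernel_vm86_regs, pt.xgs). Everything before xgs is copied as-is. The remainder, starting at xgs, maps onto the user structure starting at __null_gs, which skips the user-side slot that the kernel structure lacks. The sketch below performs the same split copy with memcpy() on two hypothetical, simplified layouts. It computes byte offsets from the start of each struct rather than writing through a pointer to a single member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel-side layout: no fs slot */
struct k_regs {
    long eax;
    long ds;
    long gs;
    long eip;
    long esp;
};

/* Hypothetical user-visible layout: extra fs slot before gs */
struct u_regs {
    long eax;
    long null_ds;
    long null_fs;
    long null_gs;
    long eip;
    long esp;
};

#define K_SPLIT offsetof(struct k_regs, gs)
#define K_TAIL  (sizeof(struct k_regs) - K_SPLIT)

static void k_to_u(struct u_regs *u, const struct k_regs *k)
{
    /* Everything up to (not including) gs has the same layout */
    memcpy(u, k, K_SPLIT);
    /* gs and everything after it land at null_gs; null_fs is left untouched */
    memcpy((char *)u + offsetof(struct u_regs, null_gs),
           (const char *)k + K_SPLIT, K_TAIL);
}

static void u_to_k(struct k_regs *k, const struct u_regs *u)
{
    memcpy(k, u, K_SPLIT);
    memcpy((char *)k + K_SPLIT,
           (const char *)u + offsetof(struct u_regs, null_gs), K_TAIL);
}
```

The kernel's copy_vm86_regs_from_user() also takes an `extra` byte count, added to the second copy, so callers can pull in the fields that follow the register block in the same call. The sketch leaves that out.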
@ -112,10 +140,8 @@ struct pt_regs * fastcall save_v86_state(struct kernel_vm86_regs * regs)
printk("no vm86_info: BAD\n"); printk("no vm86_info: BAD\n");
do_exit(SIGSEGV); do_exit(SIGSEGV);
} }
set_flags(regs->eflags, VEFLAGS, VIF_MASK | current->thread.v86mask); set_flags(regs->pt.eflags, VEFLAGS, VIF_MASK | current->thread.v86mask);
tmp = copy_to_user(&current->thread.vm86_info->regs,regs, VM86_REGS_SIZE1); tmp = copy_vm86_regs_to_user(&current->thread.vm86_info->regs,regs);
tmp += copy_to_user(&current->thread.vm86_info->regs.VM86_REGS_PART2,
&regs->VM86_REGS_PART2, VM86_REGS_SIZE2);
tmp += put_user(current->thread.screen_bitmap,&current->thread.vm86_info->screen_bitmap); tmp += put_user(current->thread.screen_bitmap,&current->thread.vm86_info->screen_bitmap);
if (tmp) { if (tmp) {
printk("vm86: could not access userspace vm86_info\n"); printk("vm86: could not access userspace vm86_info\n");
@ -129,9 +155,11 @@ struct pt_regs * fastcall save_v86_state(struct kernel_vm86_regs * regs)
current->thread.saved_esp0 = 0; current->thread.saved_esp0 = 0;
put_cpu(); put_cpu();
loadsegment(fs, current->thread.saved_fs);
loadsegment(gs, current->thread.saved_gs);
ret = KVM86->regs32; ret = KVM86->regs32;
loadsegment(fs, current->thread.saved_fs);
ret->xgs = current->thread.saved_gs;
return ret; return ret;
} }
@ -183,9 +211,9 @@ asmlinkage int sys_vm86old(struct pt_regs regs)
tsk = current; tsk = current;
if (tsk->thread.saved_esp0) if (tsk->thread.saved_esp0)
goto out; goto out;
tmp = copy_from_user(&info, v86, VM86_REGS_SIZE1); tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
tmp += copy_from_user(&info.regs.VM86_REGS_PART2, &v86->regs.VM86_REGS_PART2, offsetof(struct kernel_vm86_struct, vm86plus) -
(long)&info.vm86plus - (long)&info.regs.VM86_REGS_PART2); sizeof(info.regs));
ret = -EFAULT; ret = -EFAULT;
if (tmp) if (tmp)
goto out; goto out;
@ -233,9 +261,9 @@ asmlinkage int sys_vm86(struct pt_regs regs)
if (tsk->thread.saved_esp0) if (tsk->thread.saved_esp0)
goto out; goto out;
v86 = (struct vm86plus_struct __user *)regs.ecx; v86 = (struct vm86plus_struct __user *)regs.ecx;
tmp = copy_from_user(&info, v86, VM86_REGS_SIZE1); tmp = copy_vm86_regs_from_user(&info.regs, &v86->regs,
tmp += copy_from_user(&info.regs.VM86_REGS_PART2, &v86->regs.VM86_REGS_PART2, offsetof(struct kernel_vm86_struct, regs32) -
(long)&info.regs32 - (long)&info.regs.VM86_REGS_PART2); sizeof(info.regs));
ret = -EFAULT; ret = -EFAULT;
if (tmp) if (tmp)
goto out; goto out;
@ -252,15 +280,15 @@ asmlinkage int sys_vm86(struct pt_regs regs)
static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk) static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk)
{ {
struct tss_struct *tss; struct tss_struct *tss;
long eax;
/* /*
* make sure the vm86() system call doesn't try to do anything silly * make sure the vm86() system call doesn't try to do anything silly
*/ */
info->regs.__null_ds = 0; info->regs.pt.xds = 0;
info->regs.__null_es = 0; info->regs.pt.xes = 0;
info->regs.pt.xgs = 0;
/* we are clearing fs,gs later just before "jmp resume_userspace", /* we are clearing fs later just before "jmp resume_userspace",
* because starting with Linux 2.1.x they aren't no longer saved/restored * because it is not saved/restored.
*/ */
/* /*
@ -268,10 +296,10 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
* has set it up safely, so this makes sure interrupt etc flags are * has set it up safely, so this makes sure interrupt etc flags are
* inherited from protected mode. * inherited from protected mode.
*/ */
VEFLAGS = info->regs.eflags; VEFLAGS = info->regs.pt.eflags;
info->regs.eflags &= SAFE_MASK; info->regs.pt.eflags &= SAFE_MASK;
info->regs.eflags |= info->regs32->eflags & ~SAFE_MASK; info->regs.pt.eflags |= info->regs32->eflags & ~SAFE_MASK;
info->regs.eflags |= VM_MASK; info->regs.pt.eflags |= VM_MASK;
switch (info->cpu_type) { switch (info->cpu_type) {
case CPU_286: case CPU_286:
@ -294,7 +322,7 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
info->regs32->eax = 0; info->regs32->eax = 0;
tsk->thread.saved_esp0 = tsk->thread.esp0; tsk->thread.saved_esp0 = tsk->thread.esp0;
savesegment(fs, tsk->thread.saved_fs); savesegment(fs, tsk->thread.saved_fs);
savesegment(gs, tsk->thread.saved_gs); tsk->thread.saved_gs = info->regs32->xgs;
tss = &per_cpu(init_tss, get_cpu()); tss = &per_cpu(init_tss, get_cpu());
tsk->thread.esp0 = (unsigned long) &info->VM86_TSS_ESP0; tsk->thread.esp0 = (unsigned long) &info->VM86_TSS_ESP0;
@ -306,19 +334,18 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
tsk->thread.screen_bitmap = info->screen_bitmap; tsk->thread.screen_bitmap = info->screen_bitmap;
if (info->flags & VM86_SCREEN_BITMAP) if (info->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm); mark_screen_rdonly(tsk->mm);
__asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs\n\t");
__asm__ __volatile__("movl %%eax, %0\n" :"=r"(eax));
/*call audit_syscall_exit since we do not exit via the normal paths */ /*call audit_syscall_exit since we do not exit via the normal paths */
if (unlikely(current->audit_context)) if (unlikely(current->audit_context))
audit_syscall_exit(AUDITSC_RESULT(eax), eax); audit_syscall_exit(AUDITSC_RESULT(0), 0);
__asm__ __volatile__( __asm__ __volatile__(
"movl %0,%%esp\n\t" "movl %0,%%esp\n\t"
"movl %1,%%ebp\n\t" "movl %1,%%ebp\n\t"
"mov %2, %%fs\n\t"
"jmp resume_userspace" "jmp resume_userspace"
: /* no outputs */ : /* no outputs */
:"r" (&info->regs), "r" (task_thread_info(tsk))); :"r" (&info->regs), "r" (task_thread_info(tsk)), "r" (0));
/* we never return here */ /* we never return here */
} }
@ -348,12 +375,12 @@ static inline void clear_IF(struct kernel_vm86_regs * regs)
static inline void clear_TF(struct kernel_vm86_regs * regs) static inline void clear_TF(struct kernel_vm86_regs * regs)
{ {
regs->eflags &= ~TF_MASK; regs->pt.eflags &= ~TF_MASK;
} }
static inline void clear_AC(struct kernel_vm86_regs * regs) static inline void clear_AC(struct kernel_vm86_regs * regs)
{ {
regs->eflags &= ~AC_MASK; regs->pt.eflags &= ~AC_MASK;
} }
/* It is correct to call set_IF(regs) from the set_vflags_* /* It is correct to call set_IF(regs) from the set_vflags_*
@ -370,7 +397,7 @@ static inline void clear_AC(struct kernel_vm86_regs * regs)
static inline void set_vflags_long(unsigned long eflags, struct kernel_vm86_regs * regs) static inline void set_vflags_long(unsigned long eflags, struct kernel_vm86_regs * regs)
{ {
set_flags(VEFLAGS, eflags, current->thread.v86mask); set_flags(VEFLAGS, eflags, current->thread.v86mask);
set_flags(regs->eflags, eflags, SAFE_MASK); set_flags(regs->pt.eflags, eflags, SAFE_MASK);
if (eflags & IF_MASK) if (eflags & IF_MASK)
set_IF(regs); set_IF(regs);
else else
@ -380,7 +407,7 @@ static inline void set_vflags_long(unsigned long eflags, struct kernel_vm86_regs
static inline void set_vflags_short(unsigned short flags, struct kernel_vm86_regs * regs) static inline void set_vflags_short(unsigned short flags, struct kernel_vm86_regs * regs)
{ {
set_flags(VFLAGS, flags, current->thread.v86mask); set_flags(VFLAGS, flags, current->thread.v86mask);
set_flags(regs->eflags, flags, SAFE_MASK); set_flags(regs->pt.eflags, flags, SAFE_MASK);
if (flags & IF_MASK) if (flags & IF_MASK)
set_IF(regs); set_IF(regs);
else else
@ -389,7 +416,7 @@ static inline void set_vflags_short(unsigned short flags, struct kernel_vm86_reg
static inline unsigned long get_vflags(struct kernel_vm86_regs * regs) static inline unsigned long get_vflags(struct kernel_vm86_regs * regs)
{ {
unsigned long flags = regs->eflags & RETURN_MASK; unsigned long flags = regs->pt.eflags & RETURN_MASK;
if (VEFLAGS & VIF_MASK) if (VEFLAGS & VIF_MASK)
flags |= IF_MASK; flags |= IF_MASK;
@ -493,7 +520,7 @@ static void do_int(struct kernel_vm86_regs *regs, int i,
unsigned long __user *intr_ptr; unsigned long __user *intr_ptr;
unsigned long segoffs; unsigned long segoffs;
if (regs->cs == BIOSSEG) if (regs->pt.xcs == BIOSSEG)
goto cannot_handle; goto cannot_handle;
if (is_revectored(i, &KVM86->int_revectored)) if (is_revectored(i, &KVM86->int_revectored))
goto cannot_handle; goto cannot_handle;
@ -505,9 +532,9 @@ static void do_int(struct kernel_vm86_regs *regs, int i,
if ((segoffs >> 16) == BIOSSEG) if ((segoffs >> 16) == BIOSSEG)
goto cannot_handle; goto cannot_handle;
pushw(ssp, sp, get_vflags(regs), cannot_handle); pushw(ssp, sp, get_vflags(regs), cannot_handle);
pushw(ssp, sp, regs->cs, cannot_handle); pushw(ssp, sp, regs->pt.xcs, cannot_handle);
pushw(ssp, sp, IP(regs), cannot_handle); pushw(ssp, sp, IP(regs), cannot_handle);
regs->cs = segoffs >> 16; regs->pt.xcs = segoffs >> 16;
SP(regs) -= 6; SP(regs) -= 6;
IP(regs) = segoffs & 0xffff; IP(regs) = segoffs & 0xffff;
clear_TF(regs); clear_TF(regs);
@ -524,7 +551,7 @@ int handle_vm86_trap(struct kernel_vm86_regs * regs, long error_code, int trapno
if (VMPI.is_vm86pus) { if (VMPI.is_vm86pus) {
if ( (trapno==3) || (trapno==1) ) if ( (trapno==3) || (trapno==1) )
return_to_32bit(regs, VM86_TRAP + (trapno << 8)); return_to_32bit(regs, VM86_TRAP + (trapno << 8));
do_int(regs, trapno, (unsigned char __user *) (regs->ss << 4), SP(regs)); do_int(regs, trapno, (unsigned char __user *) (regs->pt.xss << 4), SP(regs));
return 0; return 0;
} }
if (trapno !=1) if (trapno !=1)
@ -560,10 +587,10 @@ void handle_vm86_fault(struct kernel_vm86_regs * regs, long error_code)
handle_vm86_trap(regs, 0, 1); \ handle_vm86_trap(regs, 0, 1); \
return; } while (0) return; } while (0)
orig_flags = *(unsigned short *)&regs->eflags; orig_flags = *(unsigned short *)&regs->pt.eflags;
csp = (unsigned char __user *) (regs->cs << 4); csp = (unsigned char __user *) (regs->pt.xcs << 4);
ssp = (unsigned char __user *) (regs->ss << 4); ssp = (unsigned char __user *) (regs->pt.xss << 4);
sp = SP(regs); sp = SP(regs);
ip = IP(regs); ip = IP(regs);
@ -650,7 +677,7 @@ void handle_vm86_fault(struct kernel_vm86_regs * regs, long error_code)
SP(regs) += 6; SP(regs) += 6;
} }
IP(regs) = newip; IP(regs) = newip;
regs->cs = newcs; regs->pt.xcs = newcs;
CHECK_IF_IN_TRAP; CHECK_IF_IN_TRAP;
if (data32) { if (data32) {
set_vflags_long(newflags, regs); set_vflags_long(newflags, regs);

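The vm86 hunks only rename fields (`regs->cs` becomes `regs->pt.xcs`, `regs->ss` becomes `regs->pt.xss`). The real-mode arithmetic in `do_int()` is unchanged. The sketch below is a rough userspace model of that arithmetic:

- A linear address is `segment << 4` plus the offset.
- An interrupt vector table entry holds the offset in its low word and the segment in its high word, matching how `segoffs` is split above.
- FLAGS, CS and IP are pushed as three 16-bit words, which is why the code does `SP(regs) -= 6`.

The names `rm_regs`, `linear`, `push16` and `soft_int` are hypothetical. The revectoring, BIOSSEG and fault checks of the real function are left out.

```c
#include <stdint.h>

/* Hypothetical snapshot of the 16-bit registers do_int() touches. */
struct rm_regs {
    uint16_t cs, ip, ss, sp, flags;
};

/* Real-mode linear address: segment * 16 + offset (A20 wrap not modelled). */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Push one 16-bit word onto ss:sp, low byte first; sp wraps within 64 KiB.
 * mem must cover every linear address that is touched. */
static void push16(uint8_t *mem, struct rm_regs *r, uint16_t val)
{
    r->sp = (uint16_t)(r->sp - 2);
    uint32_t a = linear(r->ss, r->sp);
    mem[a] = (uint8_t)(val & 0xff);
    mem[a + 1] = (uint8_t)(val >> 8);
}

/* Emulate "INT n": read vector n from the IVT at linear 4*n, save FLAGS,
 * CS and IP (6 bytes in total), then continue at the handler's seg:off. */
static void soft_int(uint8_t *mem, struct rm_regs *r, unsigned n)
{
    const uint8_t *v = mem + 4 * n;
    uint32_t segoffs = (uint32_t)v[0] | (uint32_t)v[1] << 8 |
                       (uint32_t)v[2] << 16 | (uint32_t)v[3] << 24;

    push16(mem, r, r->flags);
    push16(mem, r, r->cs);
    push16(mem, r, r->ip);
    r->cs = (uint16_t)(segoffs >> 16);
    r->ip = (uint16_t)(segoffs & 0xffff);
}
```

After the call, the saved IP sits at the new top of stack, followed by CS and FLAGS. `IRET` later pops them in that order.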
@ -1,13 +1,26 @@
/* ld script to make i386 Linux kernel /* ld script to make i386 Linux kernel
* Written by Martin Mares <mj@atrey.karlin.mff.cuni.cz>; * Written by Martin Mares <mj@atrey.karlin.mff.cuni.cz>;
*
* Don't define absolute symbols until and unless you know that symbol
* value is should remain constant even if kernel image is relocated
* at run time. Absolute symbols are not relocated. If symbol value should
* change if kernel is relocated, make the symbol section relative and
* put it inside the section definition.
*/ */
/* Don't define absolute symbols until and unless you know that symbol
* value is should remain constant even if kernel image is relocated
* at run time. Absolute symbols are not relocated. If symbol value should
* change if kernel is relocated, make the symbol section relative and
* put it inside the section definition.
*/
#define LOAD_OFFSET __PAGE_OFFSET #define LOAD_OFFSET __PAGE_OFFSET
#include <asm-generic/vmlinux.lds.h> #include <asm-generic/vmlinux.lds.h>
#include <asm/thread_info.h> #include <asm/thread_info.h>
#include <asm/page.h> #include <asm/page.h>
#include <asm/cache.h> #include <asm/cache.h>
#include <asm/boot.h>
OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386") OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386) OUTPUT_ARCH(i386)
@ -21,34 +34,35 @@ PHDRS {
} }
SECTIONS SECTIONS
{ {
. = __KERNEL_START; . = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
phys_startup_32 = startup_32 - LOAD_OFFSET; phys_startup_32 = startup_32 - LOAD_OFFSET;
/* read-only */ /* read-only */
_text = .; /* Text and read-only data */
.text : AT(ADDR(.text) - LOAD_OFFSET) { .text : AT(ADDR(.text) - LOAD_OFFSET) {
_text = .; /* Text and read-only data */
*(.text) *(.text)
SCHED_TEXT SCHED_TEXT
LOCK_TEXT LOCK_TEXT
KPROBES_TEXT KPROBES_TEXT
*(.fixup) *(.fixup)
*(.gnu.warning) *(.gnu.warning)
} :text = 0x9090 _etext = .; /* End of text section */
} :text = 0x9090
_etext = .; /* End of text section */
. = ALIGN(16); /* Exception table */ . = ALIGN(16); /* Exception table */
__start___ex_table = .; __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { *(__ex_table) } __start___ex_table = .;
__stop___ex_table = .; *(__ex_table)
__stop___ex_table = .;
}
RODATA RODATA
. = ALIGN(4); . = ALIGN(4);
__tracedata_start = .;
.tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) { .tracedata : AT(ADDR(.tracedata) - LOAD_OFFSET) {
__tracedata_start = .;
*(.tracedata) *(.tracedata)
__tracedata_end = .;
} }
__tracedata_end = .;
/* writeable */ /* writeable */
. = ALIGN(4096); . = ALIGN(4096);
@ -57,11 +71,19 @@ SECTIONS
CONSTRUCTORS CONSTRUCTORS
} :data } :data
.paravirtprobe : AT(ADDR(.paravirtprobe) - LOAD_OFFSET) {
__start_paravirtprobe = .;
*(.paravirtprobe)
__stop_paravirtprobe = .;
}
. = ALIGN(4096); . = ALIGN(4096);
__nosave_begin = .; .data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) {
.data_nosave : AT(ADDR(.data_nosave) - LOAD_OFFSET) { *(.data.nosave) } __nosave_begin = .;
. = ALIGN(4096); *(.data.nosave)
__nosave_end = .; . = ALIGN(4096);
__nosave_end = .;
}
. = ALIGN(4096); . = ALIGN(4096);
.data.page_aligned : AT(ADDR(.data.page_aligned) - LOAD_OFFSET) { .data.page_aligned : AT(ADDR(.data.page_aligned) - LOAD_OFFSET) {
@ -75,17 +97,10 @@ SECTIONS
/* rarely changed data like cpu maps */ /* rarely changed data like cpu maps */
. = ALIGN(32); . = ALIGN(32);
.data.read_mostly : AT(ADDR(.data.read_mostly) - LOAD_OFFSET) { *(.data.read_mostly) } .data.read_mostly : AT(ADDR(.data.read_mostly) - LOAD_OFFSET) {
_edata = .; /* End of data section */ *(.data.read_mostly)
_edata = .; /* End of data section */
#ifdef CONFIG_STACK_UNWIND
. = ALIGN(4);
.eh_frame : AT(ADDR(.eh_frame) - LOAD_OFFSET) {
__start_unwind = .;
*(.eh_frame)
__end_unwind = .;
} }
#endif
. = ALIGN(THREAD_SIZE); /* init_task */ . = ALIGN(THREAD_SIZE); /* init_task */
.data.init_task : AT(ADDR(.data.init_task) - LOAD_OFFSET) { .data.init_task : AT(ADDR(.data.init_task) - LOAD_OFFSET) {
@ -94,88 +109,102 @@ SECTIONS
/* might get freed after init */ /* might get freed after init */
. = ALIGN(4096); . = ALIGN(4096);
__smp_alt_begin = .;
__smp_alt_instructions = .;
.smp_altinstructions : AT(ADDR(.smp_altinstructions) - LOAD_OFFSET) { .smp_altinstructions : AT(ADDR(.smp_altinstructions) - LOAD_OFFSET) {
__smp_alt_begin = .;
__smp_alt_instructions = .;
*(.smp_altinstructions) *(.smp_altinstructions)
__smp_alt_instructions_end = .;
} }
__smp_alt_instructions_end = .;
. = ALIGN(4); . = ALIGN(4);
__smp_locks = .;
.smp_locks : AT(ADDR(.smp_locks) - LOAD_OFFSET) { .smp_locks : AT(ADDR(.smp_locks) - LOAD_OFFSET) {
__smp_locks = .;
*(.smp_locks) *(.smp_locks)
__smp_locks_end = .;
} }
__smp_locks_end = .;
.smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) { .smp_altinstr_replacement : AT(ADDR(.smp_altinstr_replacement) - LOAD_OFFSET) {
*(.smp_altinstr_replacement) *(.smp_altinstr_replacement)
__smp_alt_end = .;
} }
/* will be freed after init
* Following ALIGN() is required to make sure no other data falls on the
* same page where __smp_alt_end is pointing as that page might be freed
* after boot. Always make sure that ALIGN() directive is present after
* the section which contains __smp_alt_end.
*/
. = ALIGN(4096); . = ALIGN(4096);
__smp_alt_end = .;
/* will be freed after init */ /* will be freed after init */
. = ALIGN(4096); /* Init code and data */ . = ALIGN(4096); /* Init code and data */
__init_begin = .;
.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) { .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
__init_begin = .;
_sinittext = .; _sinittext = .;
*(.init.text) *(.init.text)
_einittext = .; _einittext = .;
} }
.init.data : AT(ADDR(.init.data) - LOAD_OFFSET) { *(.init.data) } .init.data : AT(ADDR(.init.data) - LOAD_OFFSET) { *(.init.data) }
. = ALIGN(16); . = ALIGN(16);
__setup_start = .; .init.setup : AT(ADDR(.init.setup) - LOAD_OFFSET) {
.init.setup : AT(ADDR(.init.setup) - LOAD_OFFSET) { *(.init.setup) } __setup_start = .;
__setup_end = .; *(.init.setup)
__initcall_start = .; __setup_end = .;
}
.initcall.init : AT(ADDR(.initcall.init) - LOAD_OFFSET) { .initcall.init : AT(ADDR(.initcall.init) - LOAD_OFFSET) {
__initcall_start = .;
INITCALLS INITCALLS
__initcall_end = .;
} }
__initcall_end = .;
__con_initcall_start = .;
.con_initcall.init : AT(ADDR(.con_initcall.init) - LOAD_OFFSET) { .con_initcall.init : AT(ADDR(.con_initcall.init) - LOAD_OFFSET) {
__con_initcall_start = .;
*(.con_initcall.init) *(.con_initcall.init)
__con_initcall_end = .;
} }
__con_initcall_end = .;
SECURITY_INIT SECURITY_INIT
. = ALIGN(4); . = ALIGN(4);
__alt_instructions = .;
.altinstructions : AT(ADDR(.altinstructions) - LOAD_OFFSET) { .altinstructions : AT(ADDR(.altinstructions) - LOAD_OFFSET) {
__alt_instructions = .;
*(.altinstructions) *(.altinstructions)
__alt_instructions_end = .;
} }
__alt_instructions_end = .;
.altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) { .altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
*(.altinstr_replacement) *(.altinstr_replacement)
} }
. = ALIGN(4);
.parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
__start_parainstructions = .;
*(.parainstructions)
__stop_parainstructions = .;
}
/* .exit.text is discard at runtime, not link time, to deal with references /* .exit.text is discard at runtime, not link time, to deal with references
from .altinstructions and .eh_frame */ from .altinstructions and .eh_frame */
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) } .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
.exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) } .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
. = ALIGN(4096); . = ALIGN(4096);
__initramfs_start = .; .init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET) {
.init.ramfs : AT(ADDR(.init.ramfs) - LOAD_OFFSET) { *(.init.ramfs) } __initramfs_start = .;
__initramfs_end = .; *(.init.ramfs)
__initramfs_end = .;
}
. = ALIGN(L1_CACHE_BYTES); . = ALIGN(L1_CACHE_BYTES);
__per_cpu_start = .; .data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) } __per_cpu_start = .;
__per_cpu_end = .; *(.data.percpu)
__per_cpu_end = .;
}
. = ALIGN(4096); . = ALIGN(4096);
__init_end = .;
/* freed after init ends here */ /* freed after init ends here */
__bss_start = .; /* BSS */
.bss.page_aligned : AT(ADDR(.bss.page_aligned) - LOAD_OFFSET) {
*(.bss.page_aligned)
}
.bss : AT(ADDR(.bss) - LOAD_OFFSET) { .bss : AT(ADDR(.bss) - LOAD_OFFSET) {
__init_end = .;
__bss_start = .; /* BSS */
*(.bss.page_aligned)
*(.bss) *(.bss)
. = ALIGN(4);
__bss_stop = .;
_end = . ;
/* This is where the kernel creates the early boot page tables */
. = ALIGN(4096);
pg0 = . ;
} }
. = ALIGN(4);
__bss_stop = .;
_end = . ;
/* This is where the kernel creates the early boot page tables */
. = ALIGN(4096);
pg0 = .;
/* Sections to be discarded */ /* Sections to be discarded */
/DISCARD/ : { /DISCARD/ : {

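Most of the linker-script changes move bracketing symbols inside the output section they delimit. Examples include `__start___ex_table`/`__stop___ex_table`, `__setup_start`/`__setup_end` and `__initcall_start`/`__initcall_end`. The new `.paravirtprobe` and `.parainstructions` sections follow the same pattern. C code then treats everything between the two symbols as an array.

The sketch below shows that consumer side in userspace. It does not use a custom script. Instead, it assumes GNU ld or lld on an ELF target, which automatically define `__start_SECNAME` and `__stop_SECNAME` for an orphan section whose name is a valid C identifier. The section name `probe_table`, the struct and the helpers are hypothetical.

```c
#include <stddef.h>

/* Hypothetical table entry; all entries share one type, so they pack as an array. */
struct probe_entry {
    const char *name;
    int prio;
};

/* Emit one entry into the "probe_table" section; "used" keeps it from being dropped. */
#define PROBE_ENTRY(n, p)                                   \
    static const struct probe_entry probe_##n               \
        __attribute__((used, section("probe_table"))) = { #n, p }

PROBE_ENTRY(alpha, 1);
PROBE_ENTRY(beta, 2);
PROBE_ENTRY(gamma, 3);

/* Defined by the linker, bracketing the section the way __start___ex_table does. */
extern const struct probe_entry __start_probe_table[];
extern const struct probe_entry __stop_probe_table[];

static size_t probe_count(void)
{
    return (size_t)(__stop_probe_table - __start_probe_table);
}

/* Walk every entry between the two markers; link order is not relied upon. */
static int probe_prio_sum(void)
{
    int sum = 0;
    for (const struct probe_entry *e = __start_probe_table; e < __stop_probe_table; e++)
        sum += e->prio;
    return sum;
}
```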
@ -45,7 +45,9 @@ static int __init parse_apic(char *arg)
return 0; return 0;
} }
} }
return -ENOENT;
/* Parsed again by __setup for debug/verbose */
return 0;
} }
early_param("apic", parse_apic); early_param("apic", parse_apic);

@ -776,7 +776,7 @@ voyager_cat_init(void)
for(asic=0; asic < (*modpp)->num_asics; asic++) { for(asic=0; asic < (*modpp)->num_asics; asic++) {
int j; int j;
voyager_asic_t *asicp = *asicpp voyager_asic_t *asicp = *asicpp
= kmalloc(sizeof(voyager_asic_t), GFP_KERNEL); /*&voyager_asic_storage[asic_count++];*/ = kzalloc(sizeof(voyager_asic_t), GFP_KERNEL); /*&voyager_asic_storage[asic_count++];*/
voyager_sp_table_t *sp_table; voyager_sp_table_t *sp_table;
voyager_at_t *asic_table; voyager_at_t *asic_table;
voyager_jtt_t *jtag_table; voyager_jtt_t *jtag_table;
@ -785,7 +785,6 @@ voyager_cat_init(void)
printk("**WARNING** kmalloc failure in cat_init\n"); printk("**WARNING** kmalloc failure in cat_init\n");
continue; continue;
} }
memset(asicp, 0, sizeof(voyager_asic_t));
asicpp = &(asicp->next); asicpp = &(asicp->next);
asicp->asic_location = asic; asicp->asic_location = asic;
sp_table = (voyager_sp_table_t *)(eprom_buf + sp_offset); sp_table = (voyager_sp_table_t *)(eprom_buf + sp_offset);
@ -851,8 +850,7 @@ voyager_cat_init(void)
#endif #endif
{ {
struct resource *res = kmalloc(sizeof(struct resource),GFP_KERNEL); struct resource *res = kzalloc(sizeof(struct resource),GFP_KERNEL);
memset(res, 0, sizeof(struct resource));
res->name = kmalloc(128, GFP_KERNEL); res->name = kmalloc(128, GFP_KERNEL);
sprintf((char *)res->name, "Voyager %s Quad CPI", cat_module_name(i)); sprintf((char *)res->name, "Voyager %s Quad CPI", cat_module_name(i));
res->start = qic_addr; res->start = qic_addr;

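The voyager_cat hunks replace `kmalloc()` followed by `memset(..., 0, ...)` with a single `kzalloc()`. That call returns zeroed memory and, like `kmalloc()`, returns NULL on failure, so the existing NULL check still applies. Below is a userspace analogue, with `calloc()` standing in for `kzalloc()` and a hypothetical struct in place of `voyager_asic_t`:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for voyager_asic_t. */
struct asic_info {
    int location;
    int asic_id;
    struct asic_info *next;
};

/* Userspace analogue of kzalloc(): allocate and zero in one call, NULL on failure. */
static void *zalloc(size_t size)
{
    return calloc(1, size);
}

/* The two-step pattern the patch removes: allocate, then clear. */
static void *alloc_then_clear(size_t size)
{
    void *p = malloc(size);
    if (p)
        memset(p, 0, size);
    return p;
}
```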
@ -28,6 +28,7 @@
#include <asm/pgalloc.h> #include <asm/pgalloc.h>
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
#include <asm/arch_hooks.h> #include <asm/arch_hooks.h>
#include <asm/pda.h>
/* TLB state -- visible externally, indexed physically */ /* TLB state -- visible externally, indexed physically */
DEFINE_PER_CPU(struct tlb_state, cpu_tlbstate) ____cacheline_aligned = { &init_mm, 0 }; DEFINE_PER_CPU(struct tlb_state, cpu_tlbstate) ____cacheline_aligned = { &init_mm, 0 };
@ -422,6 +423,7 @@ find_smp_config(void)
VOYAGER_SUS_IN_CONTROL_PORT); VOYAGER_SUS_IN_CONTROL_PORT);
current_thread_info()->cpu = boot_cpu_id; current_thread_info()->cpu = boot_cpu_id;
write_pda(cpu_number, boot_cpu_id);
} }
/* /*
@ -458,7 +460,7 @@ start_secondary(void *unused)
/* external functions not defined in the headers */ /* external functions not defined in the headers */
extern void calibrate_delay(void); extern void calibrate_delay(void);
cpu_init(); secondary_cpu_init();
/* OK, we're in the routine */ /* OK, we're in the routine */
ack_CPI(VIC_CPU_BOOT_CPI); ack_CPI(VIC_CPU_BOOT_CPI);
@ -578,6 +580,15 @@ do_boot_cpu(__u8 cpu)
/* init_tasks (in sched.c) is indexed logically */ /* init_tasks (in sched.c) is indexed logically */
stack_start.esp = (void *) idle->thread.esp; stack_start.esp = (void *) idle->thread.esp;
/* Pre-allocate and initialize the CPU's GDT and PDA so it
doesn't have to do any memory allocation during the
delicate CPU-bringup phase. */
if (!init_gdt(cpu, idle)) {
printk(KERN_INFO "Couldn't allocate GDT/PDA for CPU %d\n", cpu);
cpucount--;
return;
}
irq_ctx_init(cpu); irq_ctx_init(cpu);
/* Note: Don't modify initial ss override */ /* Note: Don't modify initial ss override */
@ -1963,4 +1974,5 @@ void __init
smp_setup_processor_id(void) smp_setup_processor_id(void)
{ {
current_thread_info()->cpu = hard_smp_processor_id(); current_thread_info()->cpu = hard_smp_processor_id();
write_pda(cpu_number, hard_smp_processor_id());
} }

@ -57,6 +57,7 @@
#define TAG_Special Const(2) /* De-normal, + or - infinity, #define TAG_Special Const(2) /* De-normal, + or - infinity,
or Not a Number */ or Not a Number */
#define TAG_Empty Const(3) /* empty */ #define TAG_Empty Const(3) /* empty */
#define TAG_Error Const(0x80) /* probably need to abort */
#define LOADED_DATA Const(10101) /* Special st() number to identify #define LOADED_DATA Const(10101) /* Special st() number to identify
loaded data (not on stack). */ loaded data (not on stack). */

@ -742,7 +742,8 @@ int save_i387_soft(void *s387, struct _fpstate __user * buf)
S387->fcs &= ~0xf8000000; S387->fcs &= ~0xf8000000;
S387->fos |= 0xffff0000; S387->fos |= 0xffff0000;
#endif /* PECULIAR_486 */ #endif /* PECULIAR_486 */
__copy_to_user(d, &S387->cwd, 7*4); if (__copy_to_user(d, &S387->cwd, 7*4))
return -1;
RE_ENTRANT_CHECK_ON; RE_ENTRANT_CHECK_ON;
d += 7*4; d += 7*4;

@ -68,6 +68,7 @@
#define FPU_access_ok(x,y,z) if ( !access_ok(x,y,z) ) \ #define FPU_access_ok(x,y,z) if ( !access_ok(x,y,z) ) \
math_abort(FPU_info,SIGSEGV) math_abort(FPU_info,SIGSEGV)
#define FPU_abort math_abort(FPU_info, SIGSEGV)
#undef FPU_IGNORE_CODE_SEGV #undef FPU_IGNORE_CODE_SEGV
#ifdef FPU_IGNORE_CODE_SEGV #ifdef FPU_IGNORE_CODE_SEGV

@ -227,6 +227,8 @@ int FPU_load_store(u_char type, fpu_addr_modes addr_modes,
case 027: /* fild m64int */ case 027: /* fild m64int */
clear_C1(); clear_C1();
loaded_tag = FPU_load_int64((long long __user *)data_address); loaded_tag = FPU_load_int64((long long __user *)data_address);
if (loaded_tag == TAG_Error)
return 0;
FPU_settag0(loaded_tag); FPU_settag0(loaded_tag);
break; break;
case 030: /* fstenv m14/28byte */ case 030: /* fstenv m14/28byte */

@ -244,7 +244,8 @@ int FPU_load_int64(long long __user *_s)
RE_ENTRANT_CHECK_OFF; RE_ENTRANT_CHECK_OFF;
FPU_access_ok(VERIFY_READ, _s, 8); FPU_access_ok(VERIFY_READ, _s, 8);
copy_from_user(&s,_s,8); if (copy_from_user(&s,_s,8))
FPU_abort;
RE_ENTRANT_CHECK_ON; RE_ENTRANT_CHECK_ON;
if (s == 0) if (s == 0)
@ -907,7 +908,8 @@ int FPU_store_int64(FPU_REG *st0_ptr, u_char st0_tag, long long __user *d)
RE_ENTRANT_CHECK_OFF; RE_ENTRANT_CHECK_OFF;
FPU_access_ok(VERIFY_WRITE,d,8); FPU_access_ok(VERIFY_WRITE,d,8);
copy_to_user(d, &tll, 8); if (copy_to_user(d, &tll, 8))
FPU_abort;
RE_ENTRANT_CHECK_ON; RE_ENTRANT_CHECK_ON;
return 1; return 1;
@ -1336,7 +1338,8 @@ u_char __user *fstenv(fpu_addr_modes addr_modes, u_char __user *d)
I387.soft.fcs &= ~0xf8000000; I387.soft.fcs &= ~0xf8000000;
I387.soft.fos |= 0xffff0000; I387.soft.fos |= 0xffff0000;
#endif /* PECULIAR_486 */ #endif /* PECULIAR_486 */
__copy_to_user(d, &control_word, 7*4); if (__copy_to_user(d, &control_word, 7*4))
FPU_abort;
RE_ENTRANT_CHECK_ON; RE_ENTRANT_CHECK_ON;
d += 0x1c; d += 0x1c;
} }
@ -1359,9 +1362,11 @@ void fsave(fpu_addr_modes addr_modes, u_char __user *data_address)
FPU_access_ok(VERIFY_WRITE,d,80); FPU_access_ok(VERIFY_WRITE,d,80);
/* Copy all registers in stack order. */ /* Copy all registers in stack order. */
__copy_to_user(d, register_base+offset, other); if (__copy_to_user(d, register_base+offset, other))
FPU_abort;
if ( offset ) if ( offset )
__copy_to_user(d+other, register_base, offset); if (__copy_to_user(d+other, register_base, offset))
FPU_abort;
RE_ENTRANT_CHECK_ON; RE_ENTRANT_CHECK_ON;
finit(); finit();

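The math-emulator hunks stop ignoring the return value of `copy_from_user()`, `copy_to_user()` and `__copy_to_user()`. These helpers report the number of bytes that could not be copied, so any nonzero result means the user buffer faulted. In the kernel, the failure path raises SIGSEGV through the new `FPU_abort` macro. For `fild m64int`, `FPU_load_store()` also bails out when it sees `TAG_Error`.

The sketch below models only the check-and-propagate shape in userspace. `sim_copy_from_user()` is a hypothetical stand-in that treats just the first `readable` bytes of the source as accessible.

```c
#include <stddef.h>
#include <string.h>

#define TAG_Error 0x80   /* same value the patch adds to fpu_emu.h */

/* Hypothetical copy_from_user() stand-in: only `readable` bytes of src are
 * accessible. Like the kernel helper, returns the number of bytes NOT copied. */
static size_t sim_copy_from_user(void *dst, const void *src, size_t n, size_t readable)
{
    size_t ok = n < readable ? n : readable;
    memcpy(dst, src, ok);
    return n - ok;
}

/* Shaped like the patched FPU_load_int64() path: a short copy is an error,
 * and *out is left untouched instead of receiving a half-filled value.
 * Returns 0 on success, TAG_Error on fault. */
static int load_int64(long long *out, const void *src, size_t readable)
{
    long long s;
    if (sim_copy_from_user(&s, src, sizeof s, readable))
        return TAG_Error;
    *out = s;
    return 0;
}
```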
@ -16,6 +16,7 @@
*/ */
#undef CONFIG_X86_PAE #undef CONFIG_X86_PAE
#undef CONFIG_PARAVIRT
#include <asm/page.h> #include <asm/page.h>
#include <asm/pgtable.h> #include <asm/pgtable.h>
#include <asm/tlbflush.h> #include <asm/tlbflush.h>

@ -168,7 +168,7 @@ static void __init allocate_pgdat(int nid)
if (nid && node_has_online_mem(nid)) if (nid && node_has_online_mem(nid))
NODE_DATA(nid) = (pg_data_t *)node_remap_start_vaddr[nid]; NODE_DATA(nid) = (pg_data_t *)node_remap_start_vaddr[nid];
else { else {
NODE_DATA(nid) = (pg_data_t *)(__va(min_low_pfn << PAGE_SHIFT)); NODE_DATA(nid) = (pg_data_t *)(pfn_to_kaddr(min_low_pfn));
min_low_pfn += PFN_UP(sizeof(pg_data_t)); min_low_pfn += PFN_UP(sizeof(pg_data_t));
} }
} }

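In `allocate_pgdat()` the patch swaps `__va(min_low_pfn << PAGE_SHIFT)` for `pfn_to_kaddr(min_low_pfn)`. Both name the lowmem virtual address of the page frame, so the change is cosmetic.

The arithmetic is sketched below under two assumed i386 defaults: `PAGE_SHIFT = 12` and `PAGE_OFFSET = 0xC0000000` (the 3G/1G split). `PFN_UP()` rounds a byte count up to whole pages, as used when `min_low_pfn` is advanced past the `pg_data_t`. The `_sketch` function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define PAGE_OFFSET 0xC0000000UL          /* assumption: default i386 3G/1G split */

/* Round a byte count up to a whole number of pages. */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* __va() for lowmem: physical address plus the direct-map offset. */
static uint32_t va_of(uint32_t phys)
{
    return phys + (uint32_t)PAGE_OFFSET;
}

/* pfn_to_kaddr(): page frame number to kernel virtual address. */
static uint32_t pfn_to_kaddr_sketch(uint32_t pfn)
{
    return va_of(pfn << PAGE_SHIFT);
}
```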
@ -22,9 +22,9 @@
#include <linux/highmem.h> #include <linux/highmem.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/kprobes.h> #include <linux/kprobes.h>
#include <linux/uaccess.h>
#include <asm/system.h> #include <asm/system.h>
#include <asm/uaccess.h>
#include <asm/desc.h> #include <asm/desc.h>
#include <asm/kdebug.h> #include <asm/kdebug.h>
#include <asm/segment.h> #include <asm/segment.h>
@ -167,7 +167,7 @@ static inline unsigned long get_segment_eip(struct pt_regs *regs,
static int __is_prefetch(struct pt_regs *regs, unsigned long addr) static int __is_prefetch(struct pt_regs *regs, unsigned long addr)
{ {
unsigned long limit; unsigned long limit;
unsigned long instr = get_segment_eip (regs, &limit); unsigned char *instr = (unsigned char *)get_segment_eip (regs, &limit);
int scan_more = 1; int scan_more = 1;
int prefetch = 0; int prefetch = 0;
int i; int i;
@ -177,9 +177,9 @@ static int __is_prefetch(struct pt_regs *regs, unsigned long addr)
unsigned char instr_hi; unsigned char instr_hi;
unsigned char instr_lo; unsigned char instr_lo;
if (instr > limit) if (instr > (unsigned char *)limit)
break; break;
if (__get_user(opcode, (unsigned char __user *) instr)) if (probe_kernel_address(instr, opcode))
break; break;
instr_hi = opcode & 0xf0; instr_hi = opcode & 0xf0;
@ -204,9 +204,9 @@ static int __is_prefetch(struct pt_regs *regs, unsigned long addr)
case 0x00: case 0x00:
/* Prefetch instruction is 0x0F0D or 0x0F18 */ /* Prefetch instruction is 0x0F0D or 0x0F18 */
scan_more = 0; scan_more = 0;
if (instr > limit) if (instr > (unsigned char *)limit)
break; break;
if (__get_user(opcode, (unsigned char __user *) instr)) if (probe_kernel_address(instr, opcode))
break; break;
prefetch = (instr_lo == 0xF) && prefetch = (instr_lo == 0xF) &&
(opcode == 0x0D || opcode == 0x18); (opcode == 0x0D || opcode == 0x18);

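`__is_prefetch()` now reads instruction bytes with `probe_kernel_address()` instead of `__get_user()`. Its decoding is otherwise unchanged: skip prefix bytes, then report a prefetch if the opcode is `0x0F 0x0D` or `0x0F 0x18`.

The sketch below performs a userspace scan of an in-memory buffer. The bounds check against `len` and the 15-byte cap stand in for the segment limit and fault handling of the real function. The function name is hypothetical. The prefix classes it accepts (segment overrides, 0x64-0x67, LOCK/REP) are an assumption based on the x86 encoding rather than something shown in the hunk.

```c
#include <stddef.h>

/* Return 1 if buf starts with optional legacy prefixes followed by
 * 0x0F 0x0D or 0x0F 0x18, else 0. */
static int is_prefetch_bytes(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    size_t max = len < 15 ? len : 15;   /* x86 instructions are at most 15 bytes */
    int scan_more = 1, prefetch = 0;

    while (scan_more && i < max) {
        unsigned char opcode = buf[i++];
        unsigned char hi = opcode & 0xf0, lo = opcode & 0x0f;

        switch (hi) {
        case 0x20:
        case 0x30:
            /* 0x26, 0x2E, 0x36, 0x3E: segment-override prefixes */
            scan_more = ((lo & 7) == 0x6);
            break;
        case 0x60:
            /* 0x64-0x67: FS/GS override, operand/address size */
            scan_more = ((lo & 0xC) == 0x4);
            break;
        case 0xF0:
            /* 0xF0 LOCK, 0xF2/0xF3 REP */
            scan_more = (lo == 0 || (lo >> 1) == 1);
            break;
        case 0x00:
            /* Possible two-byte escape 0x0F: inspect the next byte, then stop. */
            scan_more = 0;
            if (i >= len)
                break;
            prefetch = (lo == 0xF) && (buf[i] == 0x0D || buf[i] == 0x18);
            break;
        default:
            scan_more = 0;
            break;
        }
    }
    return prefetch;
}
```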
@ -192,8 +192,6 @@ static inline int page_kills_ppro(unsigned long pagenr)
return 0; return 0;
} }
extern int is_available_memory(efi_memory_desc_t *);
int page_is_ram(unsigned long pagenr) int page_is_ram(unsigned long pagenr)
{ {
int i; int i;

@ -67,11 +67,17 @@ static struct page *split_large_page(unsigned long address, pgprot_t prot,
return base; return base;
} }
static void flush_kernel_map(void *dummy) static void flush_kernel_map(void *arg)
{ {
/* Could use CLFLUSH here if the CPU supports it (Hammer,P4) */ unsigned long adr = (unsigned long)arg;
if (boot_cpu_data.x86_model >= 4)
if (adr && cpu_has_clflush) {
int i;
for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size)
asm volatile("clflush (%0)" :: "r" (adr + i));
} else if (boot_cpu_data.x86_model >= 4)
wbinvd(); wbinvd();
/* Flush all to work around Errata in early athlons regarding /* Flush all to work around Errata in early athlons regarding
* large page flushing. * large page flushing.
*/ */
@ -173,9 +179,9 @@ __change_page_attr(struct page *page, pgprot_t prot)
return 0; return 0;
} }
static inline void flush_map(void) static inline void flush_map(void *adr)
{ {
on_each_cpu(flush_kernel_map, NULL, 1, 1); on_each_cpu(flush_kernel_map, adr, 1, 1);
} }
/* /*
@ -217,9 +223,13 @@ void global_flush_tlb(void)
spin_lock_irq(&cpa_lock); spin_lock_irq(&cpa_lock);
list_replace_init(&df_list, &l); list_replace_init(&df_list, &l);
spin_unlock_irq(&cpa_lock); spin_unlock_irq(&cpa_lock);
flush_map(); if (!cpu_has_clflush)
list_for_each_entry_safe(pg, next, &l, lru) flush_map(0);
list_for_each_entry_safe(pg, next, &l, lru) {
if (cpu_has_clflush)
flush_map(page_address(pg));
__free_page(pg); __free_page(pg);
}
} }
#ifdef CONFIG_DEBUG_PAGEALLOC #ifdef CONFIG_DEBUG_PAGEALLOC

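With this patch, `flush_kernel_map()` can flush a single page with CLFLUSH, one cache line at a time, instead of falling back to `wbinvd()`. `global_flush_tlb()` behaves in one of two ways:

- If the CPU lacks CLFLUSH, it does one full flush.
- Otherwise, it passes each freed page's address to the flush.

Executing CLFLUSH itself is x86-specific, so the sketch below only computes the per-line addresses such a loop would touch. `clflush_targets()` is hypothetical. Its `line_size` parameter plays the role of `boot_cpu_data.x86_clflush_size`.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill out[] with the address of every cache line in one page, stepping by
 * line_size as the CLFLUSH loop does. Returns the number of addresses written. */
static size_t clflush_targets(uintptr_t page, size_t page_size, size_t line_size,
                              uintptr_t *out, size_t max_out)
{
    size_t n = 0;
    if (line_size == 0)
        return 0;
    for (size_t i = 0; i < page_size && n < max_out; i += line_size)
        out[n++] = page + i;
    return n;
}
```

With a 4096-byte page and a 64-byte line, that is 64 CLFLUSH instructions per page.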
@ -95,8 +95,11 @@ static void set_pte_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)
return; return;
} }
pte = pte_offset_kernel(pmd, vaddr); pte = pte_offset_kernel(pmd, vaddr);
/* <pfn,flags> stored as-is, to permit clearing entries */ if (pgprot_val(flags))
set_pte(pte, pfn_pte(pfn, flags)); /* <pfn,flags> stored as-is, to permit clearing entries */
set_pte(pte, pfn_pte(pfn, flags));
else
pte_clear(&init_mm, vaddr, pte);
/* /*
* It's enough to flush this one mapping. * It's enough to flush this one mapping.

@ -45,6 +45,13 @@ void write_pci_config(u8 bus, u8 slot, u8 func, u8 offset,
outl(val, 0xcfc); outl(val, 0xcfc);
} }
void write_pci_config_byte(u8 bus, u8 slot, u8 func, u8 offset, u8 val)
{
PDprintk("%x writing to %x: %x\n", slot, offset, val);
outl(0x80000000 | (bus<<16) | (slot<<11) | (func<<8) | offset, 0xcf8);
outb(val, 0xcfc);
}
int early_pci_allowed(void) int early_pci_allowed(void)
{ {
return (pci_probe & (PCI_PROBE_CONF1|PCI_PROBE_NOEARLY)) == return (pci_probe & (PCI_PROBE_CONF1|PCI_PROBE_NOEARLY)) ==

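The new `write_pci_config_byte()` uses PCI configuration mechanism #1. It writes an address to port 0xCF8 (bit 31 enable, then bus, device, function and register offset) and writes the data to port 0xCFC.

Port I/O needs privileges, so the sketch below only builds the 32-bit CONFIG_ADDRESS value, using the same expression. The function name is hypothetical. As in the kernel code, `slot` must be below 32 and `func` below 8, or the fields overlap.

```c
#include <stdint.h>

/* CONFIG_ADDRESS for port 0xCF8: bit 31 enable, bits 23:16 bus,
 * bits 15:11 device (slot), bits 10:8 function, bits 7:0 register offset. */
static uint32_t pci_conf1_address(uint8_t bus, uint8_t slot, uint8_t func, uint8_t offset)
{
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)slot << 11) |
           ((uint32_t)func << 8) | offset;
}
```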
@ -764,7 +764,7 @@ static void __init pirq_find_router(struct irq_router *r)
DBG(KERN_DEBUG "PCI: Attempting to find IRQ router for %04x:%04x\n", DBG(KERN_DEBUG "PCI: Attempting to find IRQ router for %04x:%04x\n",
rt->rtr_vendor, rt->rtr_device); rt->rtr_vendor, rt->rtr_device);
pirq_router_dev = pci_find_slot(rt->rtr_bus, rt->rtr_devfn); pirq_router_dev = pci_get_bus_and_slot(rt->rtr_bus, rt->rtr_devfn);
if (!pirq_router_dev) { if (!pirq_router_dev) {
DBG(KERN_DEBUG "PCI: Interrupt router not found at " DBG(KERN_DEBUG "PCI: Interrupt router not found at "
"%02x:%02x\n", rt->rtr_bus, rt->rtr_devfn); "%02x:%02x\n", rt->rtr_bus, rt->rtr_devfn);
@ -784,6 +784,8 @@ static void __init pirq_find_router(struct irq_router *r)
pirq_router_dev->vendor, pirq_router_dev->vendor,
pirq_router_dev->device, pirq_router_dev->device,
pci_name(pirq_router_dev)); pci_name(pirq_router_dev));
/* The device remains referenced for the kernel lifetime */
} }
static struct irq_info *pirq_get_info(struct pci_dev *dev) static struct irq_info *pirq_get_info(struct pci_dev *dev)

@ -5,6 +5,7 @@
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/init.h> #include <linux/init.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/uaccess.h>
#include "pci.h" #include "pci.h"
#include "pci-functions.h" #include "pci-functions.h"
@ -314,6 +315,10 @@ static struct pci_raw_ops * __devinit pci_find_bios(void)
for (check = (union bios32 *) __va(0xe0000); for (check = (union bios32 *) __va(0xe0000);
check <= (union bios32 *) __va(0xffff0); check <= (union bios32 *) __va(0xffff0);
++check) { ++check) {
long sig;
if (probe_kernel_address(&check->fields.signature, sig))
continue;
if (check->fields.signature != BIOS32_SIGNATURE) if (check->fields.signature != BIOS32_SIGNATURE)
continue; continue;
length = check->fields.length * 16; length = check->fields.length * 16;
@ -331,11 +336,13 @@ static struct pci_raw_ops * __devinit pci_find_bios(void)
} }
DBG("PCI: BIOS32 Service Directory structure at 0x%p\n", check); DBG("PCI: BIOS32 Service Directory structure at 0x%p\n", check);
if (check->fields.entry >= 0x100000) { if (check->fields.entry >= 0x100000) {
printk("PCI: BIOS32 entry (0x%p) in high memory, cannot use.\n", check); printk("PCI: BIOS32 entry (0x%p) in high memory, "
"cannot use.\n", check);
return NULL; return NULL;
} else { } else {
unsigned long bios32_entry = check->fields.entry; unsigned long bios32_entry = check->fields.entry;
DBG("PCI: BIOS32 Service Directory entry at 0x%lx\n", bios32_entry); DBG("PCI: BIOS32 Service Directory entry at 0x%lx\n",
bios32_entry);
bios32_indirect.address = bios32_entry + PAGE_OFFSET; bios32_indirect.address = bios32_entry + PAGE_OFFSET;
if (check_pcibios()) if (check_pcibios())
return &pci_bios_access; return &pci_bios_access;

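`pci_find_bios()` walks 0xE0000-0xFFFF0 in 16-byte steps, looking for the BIOS32 Service Directory. The patch first probes the signature with `probe_kernel_address()`, so an unreadable address is skipped instead of faulting.

The sketch below applies the same signature and checksum test to an in-memory copy of that region. It rests on these assumptions:

- The header uses the usual 16-byte layout: signature `_32_`, 32-bit entry point, revision byte, length in 16-byte units, checksum byte and 5 reserved bytes.
- All `length * 16` bytes must sum to zero modulo 256.

The sketch omits any further checks the kernel performs, and `find_bios32()` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first valid BIOS32 directory header in area, or -1.
 * Headers are 16-byte aligned; byte 9 holds the length in 16-byte units. */
static long find_bios32(const unsigned char *area, size_t size)
{
    for (size_t off = 0; off + 16 <= size; off += 16) {
        const unsigned char *p = area + off;
        if (memcmp(p, "_32_", 4) != 0)
            continue;
        size_t length = (size_t)p[9] * 16;
        if (length == 0 || off + length > size)
            continue;
        unsigned char sum = 0;            /* checksum: all bytes add to 0 mod 256 */
        for (size_t i = 0; i < length; i++)
            sum = (unsigned char)(sum + p[i]);
        if (sum != 0)
            continue;
        return (long)off;
    }
    return -1;
}
```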
@ -26,8 +26,8 @@ void __save_processor_state(struct saved_context *ctxt)
/* /*
* descriptor tables * descriptor tables
*/ */
store_gdt(&ctxt->gdt_limit); store_gdt(&ctxt->gdt);
store_idt(&ctxt->idt_limit); store_idt(&ctxt->idt);
store_tr(ctxt->tr); store_tr(ctxt->tr);
/* /*
@ -99,8 +99,8 @@ void __restore_processor_state(struct saved_context *ctxt)
* now restore the descriptor tables to their proper values * now restore the descriptor tables to their proper values
* ltr is done i fix_processor_context(). * ltr is done i fix_processor_context().
*/ */
load_gdt(&ctxt->gdt_limit); load_gdt(&ctxt->gdt);
load_idt(&ctxt->idt_limit); load_idt(&ctxt->idt);
/* /*
* segment registers * segment registers

@ -31,11 +31,11 @@ int arch_register_cpu(int num)
{ {
#if defined (CONFIG_ACPI) && defined (CONFIG_HOTPLUG_CPU) #if defined (CONFIG_ACPI) && defined (CONFIG_HOTPLUG_CPU)
/* /*
* If CPEI cannot be re-targetted, and this is * If CPEI can be re-targetted or if this is not
* CPEI target, then dont create the control file * CPEI target, then it is hotpluggable
*/ */
if (!can_cpei_retarget() && is_cpu_cpei_target(num)) if (can_cpei_retarget() || !is_cpu_cpei_target(num))
sysfs_cpus[num].cpu.no_control = 1; sysfs_cpus[num].cpu.hotpluggable = 1;
map_cpu_to_node(num, node_cpuid[num].nid); map_cpu_to_node(num, node_cpuid[num].nid);
#endif #endif

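The ia64 `arch_register_cpu()` hunk replaces the negative flag `no_control` with a positive `hotpluggable` flag and inverts the condition by De Morgan's law. The powerpc `topology_init()` hunk does the same with `ppc_md.cpu_die`.

Below is an exhaustive check of the ia64 conditions, with plain booleans standing in for `can_cpei_retarget()` and `is_cpu_cpei_target()`. It only covers the code inside the `#if defined (CONFIG_ACPI) && defined (CONFIG_HOTPLUG_CPU)` block. Outside that block neither flag is set, and a zero `no_control` and a zero `hotpluggable` mean opposite things.

```c
#include <stdbool.h>

/* Old rule: suppress the control file when CPEI cannot be retargeted
 * and this CPU is the CPEI target. */
static bool old_no_control(bool can_retarget, bool is_target)
{
    return !can_retarget && is_target;
}

/* New rule: mark the CPU hotpluggable when CPEI can be retargeted
 * or this CPU is not the CPEI target. */
static bool new_hotpluggable(bool can_retarget, bool is_target)
{
    return can_retarget || !is_target;
}

/* True if new_hotpluggable() == !old_no_control() for every input pair. */
static bool rules_agree(void)
{
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            if (new_hotpluggable(a, b) == old_no_control(a, b))
                return false;
    return true;
}
```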
@ -60,6 +60,7 @@ SECTIONS {
#endif #endif
.text : { .text : {
_text = .;
_stext = . ; _stext = . ;
*(.text) *(.text)
SCHED_TEXT SCHED_TEXT

@ -239,7 +239,7 @@ static void unregister_cpu_online(unsigned int cpu)
struct cpu *c = &per_cpu(cpu_devices, cpu); struct cpu *c = &per_cpu(cpu_devices, cpu);
struct sys_device *s = &c->sysdev; struct sys_device *s = &c->sysdev;
BUG_ON(c->no_control); BUG_ON(!c->hotpluggable);
if (!firmware_has_feature(FW_FEATURE_ISERIES) && if (!firmware_has_feature(FW_FEATURE_ISERIES) &&
cpu_has_feature(CPU_FTR_SMT)) cpu_has_feature(CPU_FTR_SMT))
@ -424,10 +424,10 @@ static int __init topology_init(void)
* CPU. For instance, the boot cpu might never be valid * CPU. For instance, the boot cpu might never be valid
* for hotplugging. * for hotplugging.
*/ */
if (!ppc_md.cpu_die) if (ppc_md.cpu_die)
c->no_control = 1; c->hotpluggable = 1;
if (cpu_online(cpu) || (c->no_control == 0)) { if (cpu_online(cpu) || c->hotpluggable) {
register_cpu(c, cpu); register_cpu(c, cpu);
sysdev_create_file(&c->sysdev, &attr_physical_id); sysdev_create_file(&c->sysdev, &attr_physical_id);
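The powerpc hunk makes the same inversion. Assuming both flags start out zero (as they would in zero-initialised per-CPU data), the CPU is registered in exactly the same circumstances before and after the change. A hypothetical model of the two decisions:

```c
#include <stdbool.h>

/* Old decision: no_control is set when the platform has no cpu_die
 * hook; register if online or if no_control is still clear. */
bool old_registers(bool online, bool has_cpu_die)
{
	int no_control = 0;

	if (!has_cpu_die)
		no_control = 1;
	return online || no_control == 0;
}

/* New decision: hotpluggable is set when cpu_die exists. */
bool new_registers(bool online, bool has_cpu_die)
{
	bool hotpluggable = false;

	if (has_cpu_die)
		hotpluggable = true;
	return online || hotpluggable;
}
```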

@ -33,6 +33,7 @@ SECTIONS
/* Text and gots */ /* Text and gots */
.text : { .text : {
_text = .;
*(.text .text.*) *(.text .text.*)
SCHED_TEXT SCHED_TEXT
LOCK_TEXT LOCK_TEXT

@ -31,6 +31,7 @@ SECTIONS
.plt : { *(.plt) } .plt : { *(.plt) }
.text : .text :
{ {
_text = .;
*(.text) *(.text)
SCHED_TEXT SCHED_TEXT
LOCK_TEXT LOCK_TEXT

@ -11,6 +11,7 @@ SECTIONS
. = 0x10000 + SIZEOF_HEADERS; . = 0x10000 + SIZEOF_HEADERS;
.text 0xf0004000 : .text 0xf0004000 :
{ {
_text = .;
*(.text) *(.text)
SCHED_TEXT SCHED_TEXT
LOCK_TEXT LOCK_TEXT

@ -13,6 +13,7 @@ SECTIONS
. = 0x4000; . = 0x4000;
.text 0x0000000000404000 : .text 0x0000000000404000 :
{ {
_text = .;
*(.text) *(.text)
SCHED_TEXT SCHED_TEXT
LOCK_TEXT LOCK_TEXT

@ -90,6 +90,7 @@
/* Kernel text segment, and some constant data areas. */ /* Kernel text segment, and some constant data areas. */
#define TEXT_CONTENTS \ #define TEXT_CONTENTS \
_text = .; \
__stext = . ; \ __stext = . ; \
*(.text) \ *(.text) \
SCHED_TEXT \ SCHED_TEXT \
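Each `_text = .;` line defines a symbol at the start of the text output section without allocating any storage for it, so only its address is meaningful. C code conventionally declares such symbols as incomplete `char` arrays. As a user-space analogue, assuming a GNU/Linux toolchain whose default linker script provides `etext` (documented in `end(3)`), the sketch below compares a function's address with the end of the text segment; `text_probe` is a hypothetical name:

```c
#include <stdint.h>

/*
 * Linker-defined symbol: no storage, only an address. Declaring it as
 * an incomplete array keeps code from accidentally reading through it.
 * Assumption: the default GNU/Linux linker script provides etext, the
 * first address past the text segment.
 */
extern char etext[];

/* An ordinary function; its code is placed in .text. */
int text_probe(void)
{
	return 1;
}

/* True if text_probe's code lies below the end of the text segment. */
int probe_is_below_etext(void)
{
	return (uintptr_t)&text_probe < (uintptr_t)etext;
}
```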

@ -122,7 +122,7 @@ endchoice
choice choice
prompt "Processor family" prompt "Processor family"
default MK8 default GENERIC_CPU
config MK8 config MK8
bool "AMD-Opteron/Athlon64" bool "AMD-Opteron/Athlon64"
@ -130,16 +130,31 @@ config MK8
Optimize for AMD Opteron/Athlon64/Hammer/K8 CPUs. Optimize for AMD Opteron/Athlon64/Hammer/K8 CPUs.
config MPSC config MPSC
bool "Intel EM64T" bool "Intel P4 / older Netburst based Xeon"
help help
Optimize for Intel Pentium 4 and Xeon CPUs with Intel Optimize for Intel Pentium 4 and older Nocona/Dempsey Xeon CPUs
Extended Memory 64 Technology(EM64T). For details see with Intel Extended Memory 64 Technology(EM64T). For details see
<http://www.intel.com/technology/64bitextensions/>. <http://www.intel.com/technology/64bitextensions/>.
Note that the latest Xeons (Xeon 51xx and 53xx) are not based on the
Netburst core and shouldn't use this option. You can distinguish them
using the cpu family field
in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one
(this rule only applies to systems that support EM64T)
config MCORE2
bool "Intel Core2 / newer Xeon"
help
Optimize for Intel Core2 and newer Xeons (51xx).
You can distinguish the newer Xeons from the older ones using
the cpu family field in /proc/cpuinfo. 15 is an older Xeon
(use CONFIG_MPSC then), 6 is a newer one. This rule only
applies to CPUs that support EM64T.
config GENERIC_CPU config GENERIC_CPU
bool "Generic-x86-64" bool "Generic-x86-64"
help help
Generic x86-64 CPU. Generic x86-64 CPU.
Run equally well on all x86-64 CPUs.
endchoice endchoice
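The help text for MPSC and MCORE2 reduces to a small rule on the `cpu family` field of `/proc/cpuinfo`: 15 suggests MPSC, 6 suggests MCORE2. A hypothetical helper applying that rule to cpuinfo-formatted text. It does not check `vendor_id`, so, as the help text says, it is only meaningful for Intel CPUs with EM64T (AMD K8 parts also report family 15):

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the first "cpu family" line in
 * /proc/cpuinfo-style text and map it to the suggested option.
 * Returns NULL if no family line is found or the family is not 6 or 15.
 */
const char *suggest_intel_cpu_option(const char *cpuinfo)
{
	const char *p = strstr(cpuinfo, "cpu family");
	int family;

	if (!p)
		return NULL;
	p = strchr(p, ':');
	if (!p || sscanf(p + 1, "%d", &family) != 1)
		return NULL;
	if (family == 15)
		return "MPSC";    /* P4 / older Netburst-based Xeon */
	if (family == 6)
		return "MCORE2";  /* Core2 / newer Xeon (51xx) */
	return NULL;
}
```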
@ -149,12 +164,12 @@ endchoice
config X86_L1_CACHE_BYTES config X86_L1_CACHE_BYTES
int int
default "128" if GENERIC_CPU || MPSC default "128" if GENERIC_CPU || MPSC
default "64" if MK8 default "64" if MK8 || MCORE2
config X86_L1_CACHE_SHIFT config X86_L1_CACHE_SHIFT
int int
default "7" if GENERIC_CPU || MPSC default "7" if GENERIC_CPU || MPSC
default "6" if MK8 default "6" if MK8 || MCORE2
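With MCORE2 added, the L1 cache line defaults become 64 bytes (shift 6) for MK8 and MCORE2 and 128 bytes (shift 7) for GENERIC_CPU and MPSC, with the byte count always equal to `1 << shift`. A sketch using a hypothetical enum for the choice, including the round-up-to-a-cache-line computation such a constant is typically used for:

```c
/* Hypothetical enum mirroring the "Processor family" choice. */
enum cpu_choice { CPU_MK8, CPU_MPSC, CPU_MCORE2, CPU_GENERIC };

/* Defaults from the X86_L1_CACHE_SHIFT entries. */
int l1_cache_shift(enum cpu_choice c)
{
	switch (c) {
	case CPU_GENERIC:
	case CPU_MPSC:
		return 7;
	case CPU_MK8:
	case CPU_MCORE2:
		return 6;
	}
	return 7; /* not reached for valid input */
}

/* X86_L1_CACHE_BYTES is expected to equal 1 << X86_L1_CACHE_SHIFT. */
int l1_cache_bytes(enum cpu_choice c)
{
	return 1 << l1_cache_shift(c);
}

/* Round x up to the next multiple of the L1 line size (a power of two). */
unsigned long l1_align(unsigned long x, enum cpu_choice c)
{
	unsigned long b = (unsigned long)l1_cache_bytes(c);

	return (x + b - 1) & ~(b - 1);
}
```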
config X86_INTERNODE_CACHE_BYTES config X86_INTERNODE_CACHE_BYTES
int int
@ -344,11 +359,6 @@ config ARCH_DISCONTIGMEM_ENABLE
depends on NUMA depends on NUMA
default y default y
config ARCH_DISCONTIGMEM_ENABLE
def_bool y
depends on NUMA
config ARCH_DISCONTIGMEM_DEFAULT config ARCH_DISCONTIGMEM_DEFAULT
def_bool y def_bool y
depends on NUMA depends on NUMA
@ -455,6 +465,17 @@ config CALGARY_IOMMU
Normally the kernel will make the right choice by itself. Normally the kernel will make the right choice by itself.
If unsure, say Y. If unsure, say Y.
config CALGARY_IOMMU_ENABLED_BY_DEFAULT
bool "Should Calgary be enabled by default?"
default y
depends on CALGARY_IOMMU
help
Should Calgary be enabled by default? If you choose 'y', Calgary
will be used (if it exists). If you choose 'n', Calgary will not be
used even if it exists. If you choose 'n' and would like to use
Calgary anyway, pass 'iommu=calgary' on the kernel command line.
If unsure, say Y.
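The new option only sets the default: with `y`, Calgary is used whenever it is present; with `n`, it is used only if `iommu=calgary` is passed. A hypothetical model of that decision. Real command-line parsing is more involved, and the plain `strstr` match here is a simplification:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical model of the decision described in the help text.
 * enabled_by_default: value of CALGARY_IOMMU_ENABLED_BY_DEFAULT
 * present:            whether Calgary hardware was detected
 * cmdline:            kernel command line
 */
bool calgary_will_be_used(bool enabled_by_default, bool present,
			  const char *cmdline)
{
	if (!present)
		return false;            /* nothing to use */
	if (enabled_by_default)
		return true;             /* 'y': use it if it exists */
	/* 'n': only an explicit request turns it on */
	return strstr(cmdline, "iommu=calgary") != NULL;
}
```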
# need this always selected by IOMMU for the VIA workaround # need this always selected by IOMMU for the VIA workaround
config SWIOTLB config SWIOTLB
bool bool

@ -30,6 +30,10 @@ cflags-y :=
cflags-kernel-y := cflags-kernel-y :=
cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8) cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona) cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
# gcc doesn't support -march=core2 yet as of gcc 4.3, but I hope it
# will eventually. Use -mtune=generic as fallback
cflags-$(CONFIG_MCORE2) += \
$(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic) cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
cflags-y += -m64 cflags-y += -m64
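`$(call cc-option,FLAG,FALLBACK)` yields FLAG if the compiler accepts it and FALLBACK otherwise, so the MCORE2 line degrades from `-march=core2` to `-mtune=generic` to nothing. The sketch models that nesting in C, with a hypothetical probe callback standing in for the trial compile and three hypothetical compilers of differing capability:

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for "does the compiler accept this flag?". */
typedef bool (*flag_probe_fn)(const char *flag);

/* Model of $(call cc-option,FLAG,FALLBACK). */
const char *cc_option(flag_probe_fn ok, const char *flag, const char *fallback)
{
	return ok(flag) ? flag : fallback;
}

/* The MCORE2 line: -march=core2, else cc-option,-mtune=generic. */
const char *mcore2_cflags(flag_probe_fn ok)
{
	return cc_option(ok, "-march=core2",
			 cc_option(ok, "-mtune=generic", ""));
}

/* Hypothetical probes: knows core2; knows only -mtune=generic; knows neither. */
bool probe_core2_gcc(const char *f) { (void)f; return true; }
bool probe_generic_only_gcc(const char *f) { return strcmp(f, "-mtune=generic") == 0; }
bool probe_old_gcc(const char *f) { (void)f; return false; }
```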

@ -1,7 +1,7 @@
# #
# Automatically generated make config: don't edit # Automatically generated make config: don't edit
# Linux kernel version: 2.6.19-rc2-git4 # Linux kernel version: 2.6.19-git7
# Sat Oct 21 03:38:52 2006 # Wed Dec 6 23:50:47 2006
# #
CONFIG_X86_64=y CONFIG_X86_64=y
CONFIG_64BIT=y CONFIG_64BIT=y
@ -47,13 +47,14 @@ CONFIG_POSIX_MQUEUE=y
CONFIG_IKCONFIG=y CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set # CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set # CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE="" CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set # CONFIG_EMBEDDED is not set
CONFIG_UID16=y CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set # CONFIG_KALLSYMS_EXTRA_PASS is not set
@ -87,9 +88,7 @@ CONFIG_STOP_MACHINE=y
# Block layer # Block layer
# #
CONFIG_BLOCK=y CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# #
# IO Schedulers # IO Schedulers
@ -111,10 +110,11 @@ CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set # CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set # CONFIG_MK8 is not set
# CONFIG_MPSC is not set # CONFIG_MPSC is not set
CONFIG_GENERIC_CPU=y CONFIG_MCORE2=y
CONFIG_X86_L1_CACHE_BYTES=128 # CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_SHIFT=7 CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=128 CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set # CONFIG_MICROCODE is not set
@ -322,6 +322,7 @@ CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set # CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic" CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set # CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set # CONFIG_IPV6_ROUTER_PREF is not set
@ -624,6 +625,7 @@ CONFIG_SATA_INTEL_COMBINED=y
# CONFIG_PATA_IT821X is not set # CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set # CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_TRIFLEX is not set # CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_MPIIX is not set # CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set # CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set # CONFIG_PATA_NETCELL is not set
@ -795,6 +797,7 @@ CONFIG_BNX2=y
CONFIG_S2IO=m CONFIG_S2IO=m
# CONFIG_S2IO_NAPI is not set # CONFIG_S2IO_NAPI is not set
# CONFIG_MYRI10GE is not set # CONFIG_MYRI10GE is not set
# CONFIG_NETXEN_NIC is not set
# #
# Token Ring devices # Token Ring devices
@ -927,10 +930,6 @@ CONFIG_RTC=y
# CONFIG_DTLK is not set # CONFIG_DTLK is not set
# CONFIG_R3964 is not set # CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set # CONFIG_APPLICOM is not set
#
# Ftape, the floppy tape device driver
#
CONFIG_AGP=y CONFIG_AGP=y
CONFIG_AGP_AMD64=y CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y CONFIG_AGP_INTEL=y
@ -1135,6 +1134,7 @@ CONFIG_USB_DEVICEFS=y
# CONFIG_USB_BANDWIDTH is not set # CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set # CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set # CONFIG_USB_SUSPEND is not set
# CONFIG_USB_MULTITHREAD_PROBE is not set
# CONFIG_USB_OTG is not set # CONFIG_USB_OTG is not set
# #
@ -1212,6 +1212,7 @@ CONFIG_USB_HIDINPUT=y
# CONFIG_USB_KAWETH is not set # CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set # CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set # CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET_MII is not set
# CONFIG_USB_USBNET is not set # CONFIG_USB_USBNET is not set
CONFIG_USB_MON=y CONFIG_USB_MON=y

@ -584,6 +584,11 @@ int ia32_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
regs->rdx = (unsigned long) &frame->info; regs->rdx = (unsigned long) &frame->info;
regs->rcx = (unsigned long) &frame->uc; regs->rcx = (unsigned long) &frame->uc;
/* Make -mregparm=3 work */
regs->rax = sig;
regs->rdx = (unsigned long) &frame->info;
regs->rcx = (unsigned long) &frame->uc;
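With `-mregparm=3`, an i386 function receives its first three integer arguments in EAX, EDX and ECX rather than on the stack, so the frame setup places `sig`, `&frame->info` and `&frame->uc` in those registers for handlers built that way. A hypothetical model of the register assignment (`fake_regs` and the helper names are illustrative only):

```c
#include <stdint.h>

/* Hypothetical subset of saved user registers, using the 64-bit names
 * from the diff. */
struct fake_regs {
	uint64_t rax, rdx, rcx;
};

/* Mirrors the three assignments: sig, &info, &uc into rax, rdx, rcx. */
void setup_regparm3_args(struct fake_regs *regs, int sig,
			 const void *info, const void *uc)
{
	regs->rax = (uint64_t)sig;
	regs->rdx = (uint64_t)(uintptr_t)info;
	regs->rcx = (uint64_t)(uintptr_t)uc;
}

/* A 32-bit regparm(3) handler reads its first argument from EAX,
 * the low 32 bits of rax. */
int handler_first_arg(const struct fake_regs *regs)
{
	return (int)(uint32_t)regs->rax;
}
```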
asm volatile("movl %0,%%ds" :: "r" (__USER32_DS)); asm volatile("movl %0,%%ds" :: "r" (__USER32_DS));
asm volatile("movl %0,%%es" :: "r" (__USER32_DS)); asm volatile("movl %0,%%es" :: "r" (__USER32_DS));

@ -25,6 +25,7 @@
#include <linux/kernel_stat.h> #include <linux/kernel_stat.h>
#include <linux/sysdev.h> #include <linux/sysdev.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/ioport.h>
#include <asm/atomic.h> #include <asm/atomic.h>
#include <asm/smp.h> #include <asm/smp.h>
@ -45,6 +46,12 @@ int apic_calibrate_pmtmr __initdata;
int disable_apic_timer __initdata; int disable_apic_timer __initdata;
static struct resource *ioapic_resources;
static struct resource lapic_resource = {
.name = "Local APIC",
.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
};
/* /*
* cpu_mask that denotes the CPUs that needs timer interrupt coming in as * cpu_mask that denotes the CPUs that needs timer interrupt coming in as
* IPIs in place of local APIC timers * IPIs in place of local APIC timers
@ -133,7 +140,6 @@ void clear_local_APIC(void)
apic_write(APIC_LVTERR, APIC_LVT_MASKED); apic_write(APIC_LVTERR, APIC_LVT_MASKED);
if (maxlvt >= 4) if (maxlvt >= 4)
apic_write(APIC_LVTPC, APIC_LVT_MASKED); apic_write(APIC_LVTPC, APIC_LVT_MASKED);
v = GET_APIC_VERSION(apic_read(APIC_LVR));
apic_write(APIC_ESR, 0); apic_write(APIC_ESR, 0);
apic_read(APIC_ESR); apic_read(APIC_ESR);
} }
@ -452,23 +458,30 @@ static struct {
static int lapic_suspend(struct sys_device *dev, pm_message_t state) static int lapic_suspend(struct sys_device *dev, pm_message_t state)
{ {
unsigned long flags; unsigned long flags;
int maxlvt;
if (!apic_pm_state.active) if (!apic_pm_state.active)
return 0; return 0;
maxlvt = get_maxlvt();
apic_pm_state.apic_id = apic_read(APIC_ID); apic_pm_state.apic_id = apic_read(APIC_ID);
apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI); apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI);
apic_pm_state.apic_ldr = apic_read(APIC_LDR); apic_pm_state.apic_ldr = apic_read(APIC_LDR);
apic_pm_state.apic_dfr = apic_read(APIC_DFR); apic_pm_state.apic_dfr = apic_read(APIC_DFR);
apic_pm_state.apic_spiv = apic_read(APIC_SPIV); apic_pm_state.apic_spiv = apic_read(APIC_SPIV);
apic_pm_state.apic_lvtt = apic_read(APIC_LVTT); apic_pm_state.apic_lvtt = apic_read(APIC_LVTT);
apic_pm_state.apic_lvtpc = apic_read(APIC_LVTPC); if (maxlvt >= 4)
apic_pm_state.apic_lvtpc = apic_read(APIC_LVTPC);
apic_pm_state.apic_lvt0 = apic_read(APIC_LVT0); apic_pm_state.apic_lvt0 = apic_read(APIC_LVT0);
apic_pm_state.apic_lvt1 = apic_read(APIC_LVT1); apic_pm_state.apic_lvt1 = apic_read(APIC_LVT1);
apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR); apic_pm_state.apic_lvterr = apic_read(APIC_LVTERR);
apic_pm_state.apic_tmict = apic_read(APIC_TMICT); apic_pm_state.apic_tmict = apic_read(APIC_TMICT);
apic_pm_state.apic_tdcr = apic_read(APIC_TDCR); apic_pm_state.apic_tdcr = apic_read(APIC_TDCR);
apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR); #ifdef CONFIG_X86_MCE_INTEL
if (maxlvt >= 5)
apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
#endif
local_irq_save(flags); local_irq_save(flags);
disable_local_APIC(); disable_local_APIC();
local_irq_restore(flags); local_irq_restore(flags);
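The suspend path now reads the performance-counter LVT only when `maxlvt >= 4`, and the thermal LVT only when `maxlvt >= 5` (and, for thermal, only with CONFIG_X86_MCE_INTEL), so an APIC that lacks those entries is never read for them. A sketch of the guarded save with hypothetical index names and a fake register reader; the config check is omitted:

```c
#include <stdint.h>

/* Hypothetical LVT indices; the kernel uses APIC register offsets. */
enum lvt_idx { LVT_TIMER, LVT_LINT0, LVT_LINT1, LVT_ERROR,
	       LVT_PERFCTR, LVT_THERMAL, LVT_COUNT };

struct lvt_state {
	uint32_t v[LVT_COUNT];
};

typedef uint32_t (*apic_read_fn)(enum lvt_idx reg);

/* Always-present entries are read unconditionally; the optional
 * ones only when maxlvt says they exist. */
void save_lvts(struct lvt_state *s, int maxlvt, apic_read_fn rd)
{
	s->v[LVT_TIMER] = rd(LVT_TIMER);
	s->v[LVT_LINT0] = rd(LVT_LINT0);
	s->v[LVT_LINT1] = rd(LVT_LINT1);
	s->v[LVT_ERROR] = rd(LVT_ERROR);
	if (maxlvt >= 4)
		s->v[LVT_PERFCTR] = rd(LVT_PERFCTR);
	if (maxlvt >= 5)
		s->v[LVT_THERMAL] = rd(LVT_THERMAL);
}

/* Hypothetical register file: each entry reads back as 0x100 + index. */
uint32_t fake_apic_read(enum lvt_idx reg)
{
	return 0x100u + (uint32_t)reg;
}
```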
@ -479,10 +492,13 @@ static int lapic_resume(struct sys_device *dev)
{ {
unsigned int l, h; unsigned int l, h;
unsigned long flags; unsigned long flags;
int maxlvt;
if (!apic_pm_state.active) if (!apic_pm_state.active)
return 0; return 0;
maxlvt = get_maxlvt();
local_irq_save(flags); local_irq_save(flags);
rdmsr(MSR_IA32_APICBASE, l, h); rdmsr(MSR_IA32_APICBASE, l, h);
l &= ~MSR_IA32_APICBASE_BASE; l &= ~MSR_IA32_APICBASE_BASE;
@ -496,8 +512,12 @@ static int lapic_resume(struct sys_device *dev)
apic_write(APIC_SPIV, apic_pm_state.apic_spiv); apic_write(APIC_SPIV, apic_pm_state.apic_spiv);
apic_write(APIC_LVT0, apic_pm_state.apic_lvt0); apic_write(APIC_LVT0, apic_pm_state.apic_lvt0);
apic_write(APIC_LVT1, apic_pm_state.apic_lvt1); apic_write(APIC_LVT1, apic_pm_state.apic_lvt1);
apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr); #ifdef CONFIG_X86_MCE_INTEL
apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc); if (maxlvt >= 5)
apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
#endif
if (maxlvt >= 4)
apic_write(APIC_LVTPC, apic_pm_state.apic_lvtpc);
apic_write(APIC_LVTT, apic_pm_state.apic_lvtt); apic_write(APIC_LVTT, apic_pm_state.apic_lvtt);
apic_write(APIC_TDCR, apic_pm_state.apic_tdcr); apic_write(APIC_TDCR, apic_pm_state.apic_tdcr);
apic_write(APIC_TMICT, apic_pm_state.apic_tmict); apic_write(APIC_TMICT, apic_pm_state.apic_tmict);
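Resume applies the same guards. This assumes `get_maxlvt()` decodes the max-LVT field of the local APIC version register, as the kernel's `GET_APIC_MAXLVT()` does. That register holds the APIC version in bits 0-7 and the highest LVT index (entry count minus one) in bits 16-23. Four always-present entries give a value of 3, the performance-counter entry raises it to 4 and the thermal entry to 5, which is consistent with the two thresholds. A sketch of the decoding:

```c
#include <stdint.h>

/* Bits 0-7 of the local APIC version register: APIC version. */
unsigned int lvr_version(uint32_t lvr)
{
	return lvr & 0xFFu;
}

/* Bits 16-23: index of the highest LVT entry (entry count minus one). */
unsigned int lvr_maxlvt(uint32_t lvr)
{
	return (lvr >> 16) & 0xFFu;
}
```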
@ -585,6 +605,64 @@ static int __init detect_init_APIC (void)
return 0; return 0;
} }
#ifdef CONFIG_X86_IO_APIC
static struct resource * __init ioapic_setup_resources(void)
{
#define IOAPIC_RESOURCE_NAME_SIZE 11
unsigned long n;
struct resource *res;
char *mem;
int i;
if (nr_ioapics <= 0)
return NULL;
n = IOAPIC_RESOURCE_NAME_SIZE + sizeof(struct resource);
n *= nr_ioapics;
mem = alloc_bootmem(n);
res = (void *)mem;
if (mem != NULL) {
memset(mem, 0, n);
mem += sizeof(struct resource) * nr_ioapics;
for (i = 0; i < nr_ioapics; i++) {
res[i].name = mem;
res[i].flags = IORESOURCE_MEM | IORESOURCE_BUSY;
sprintf(mem, "IOAPIC %u", i);
mem += IOAPIC_RESOURCE_NAME_SIZE;
}
}
ioapic_resources = res;
return res;
}
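`ioapic_setup_resources()` makes one boot-time allocation that holds `nr_ioapics` resource structures followed by an 11-byte name buffer per IO-APIC; 11 bytes fits `"IOAPIC "` plus up to three digits and the terminating NUL. A user-space sketch of the same layout, assuming `calloc` in place of `alloc_bootmem()` plus `memset()`, with a hypothetical minimal resource type. It uses `snprintf`, so an out-of-range index truncates instead of overflowing:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal stand-in for struct resource. */
struct res_sketch {
	const char *name;
	unsigned long start, end;   /* end is inclusive */
};

#define NAME_SIZE 11 /* "IOAPIC " (7) + up to 3 digits + NUL */

/*
 * One zeroed allocation: n structs followed by n fixed-size name
 * buffers. The caller releases everything with a single free().
 */
struct res_sketch *setup_named_resources(int n)
{
	struct res_sketch *res;
	char *mem;

	if (n <= 0)
		return NULL;
	mem = calloc((size_t)n, sizeof(*res) + NAME_SIZE);
	if (!mem)
		return NULL;
	res = (struct res_sketch *)mem;
	mem += sizeof(*res) * (size_t)n;   /* names start after the structs */
	for (int i = 0; i < n; i++) {
		res[i].name = mem;
		snprintf(mem, NAME_SIZE, "IOAPIC %u", (unsigned)i);
		mem += NAME_SIZE;
	}
	return res;
}
```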
static int __init ioapic_insert_resources(void)
{
int i;
struct resource *r = ioapic_resources;
if (!r) {
printk("IO APIC resources could be not be allocated.\n");
return -1;
}
for (i = 0; i < nr_ioapics; i++) {
insert_resource(&iomem_resource, r);
r++;
}
return 0;
}
/* Insert the IO APIC resources after PCI initialization has occurred to handle
* IO APICs that are mapped in on a BAR in PCI space. */
late_initcall(ioapic_insert_resources);
#endif
void __init init_apic_mappings(void) void __init init_apic_mappings(void)
{ {
unsigned long apic_phys; unsigned long apic_phys;
@ -604,6 +682,11 @@ void __init init_apic_mappings(void)
apic_mapped = 1; apic_mapped = 1;
apic_printk(APIC_VERBOSE,"mapped APIC to %16lx (%16lx)\n", APIC_BASE, apic_phys); apic_printk(APIC_VERBOSE,"mapped APIC to %16lx (%16lx)\n", APIC_BASE, apic_phys);
/* Put local APIC into the resource map. */
lapic_resource.start = apic_phys;
lapic_resource.end = lapic_resource.start + PAGE_SIZE - 1;
insert_resource(&iomem_resource, &lapic_resource);
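Resource ranges use an inclusive `end`, which is why the LAPIC entry ends at `start + PAGE_SIZE - 1` and each IO-APIC entry at `ioapic_phys + (4 * 1024) - 1`. A sketch assuming 4 KiB pages, checked against the usual default LAPIC and IO-APIC base addresses (0xFEE00000 and 0xFEC00000); the names are hypothetical:

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Hypothetical range type; end is inclusive, as in struct resource. */
struct range {
	unsigned long start, end;
};

/* One page starting at phys: the last byte is start + size - 1. */
struct range page_range(unsigned long phys)
{
	struct range r = { phys, phys + SKETCH_PAGE_SIZE - 1 };
	return r;
}

/* Inclusive ranges overlap iff each starts no later than the other ends. */
bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```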
/* /*
* Fetch the APIC ID of the BSP in case we have a * Fetch the APIC ID of the BSP in case we have a
* default configuration (or the MP table is broken). * default configuration (or the MP table is broken).
@ -613,7 +696,9 @@ void __init init_apic_mappings(void)
{ {
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0; unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
int i; int i;
struct resource *ioapic_res;
ioapic_res = ioapic_setup_resources();
for (i = 0; i < nr_ioapics; i++) { for (i = 0; i < nr_ioapics; i++) {
if (smp_found_config) { if (smp_found_config) {
ioapic_phys = mp_ioapics[i].mpc_apicaddr; ioapic_phys = mp_ioapics[i].mpc_apicaddr;
@ -625,6 +710,12 @@ void __init init_apic_mappings(void)
apic_printk(APIC_VERBOSE,"mapped IOAPIC to %016lx (%016lx)\n", apic_printk(APIC_VERBOSE,"mapped IOAPIC to %016lx (%016lx)\n",
__fix_to_virt(idx), ioapic_phys); __fix_to_virt(idx), ioapic_phys);
idx++; idx++;
if (ioapic_res != NULL) {
ioapic_res->start = ioapic_phys;
ioapic_res->end = ioapic_phys + (4 * 1024) - 1;
ioapic_res++;
}
} }
} }
} }
@ -644,10 +735,9 @@ void __init init_apic_mappings(void)
static void __setup_APIC_LVTT(unsigned int clocks) static void __setup_APIC_LVTT(unsigned int clocks)
{ {
unsigned int lvtt_value, tmp_value, ver; unsigned int lvtt_value, tmp_value;
int cpu = smp_processor_id(); int cpu = smp_processor_id();
ver = GET_APIC_VERSION(apic_read(APIC_LVR));
lvtt_value = APIC_LVT_TIMER_PERIODIC | LOCAL_TIMER_VECTOR; lvtt_value = APIC_LVT_TIMER_PERIODIC | LOCAL_TIMER_VECTOR;
if (cpu_isset(cpu, timer_interrupt_broadcast_ipi_mask)) if (cpu_isset(cpu, timer_interrupt_broadcast_ipi_mask))

Some files were not shown because too many files have changed in this diff.