Commit graph

372 commits

Author SHA1 Message Date
Rafael J. Wysocki
8a235efad5 Hibernation: Handle DEBUG_PAGEALLOC on x86
Make hibernation work with CONFIG_DEBUG_PAGEALLOC set on x86, by
checking if the pages to be copied are marked as present in the
kernel mapping and temporarily marking them as present if that's not
the case.  No functional modifications are introduced if
CONFIG_DEBUG_PAGEALLOC is unset.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-21 02:15:28 -05:00
Linus Torvalds
5d9c4a7de6 Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6
* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
  agp: fix missing casts that produced a warning.
  agp: add support for 662/671 to agp driver
  fix historic ioremap() abuse in AGP
  agp/sis: Suspend support for SiS AGP
  agp/sis: Clear bit 2 from aperture size byte as well
2008-02-19 18:29:57 -08:00
Arjan van de Ven
156fbc3fbe x86: fix page_is_ram() thinko
page_is_ram() has a special case for the 640k-1M bios area, however
due to a thinko the special case checks the e820 table entry and
not the memory the user has asked for. This patch fixes the bug.

[ mingo@elte.hu: this too is better solved in the e820 space, but those
  fixes are too intrusive for v2.6.25. ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:34 +01:00
Arjan van de Ven
d8a9e6a51e x86: fix WARN_ON() message: teach page_is_ram() about the special 4Kb bios data page
This patch teaches page_is_ram() about the fact that the first
4Kb of memory are special on x86, even though the E820 table
normally doesn't exclude it.

This fixes the WARN_ON() reported by Laurent Riffard who was also
very helpful in diagnosing the issue.

[ mingo@elte.hu: we are working on doing this properly in the e820
  space, but for 2.6.25 this is the better fix. ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:34 +01:00
Sam Ravnborg
d01b9ad56e x86: fix section mismatch in srat_64.c:reserve_hotadd
reserve_hotadd() are only used by __init acpi_numa_memory_affinity_init().
Annotate reserve_hotadd() with __init is the trivial fix.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:31 +01:00
Andi Kleen
8e31c2ac11 x86: CPA: remove BUG_ON for LRU/Compound pages
New implementation does not use lru for anything so there is no need
to reject pages that are in the LRU. Similar for compound pages (which
were checked because they also use page->lru)

[ tglx@linutronix.de: removed unused variable ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-19 16:18:29 +01:00
Arjan van dev Ven
fcea424d31 fix historic ioremap() abuse in AGP
Several AGP drivers right now use ioremap_nocache() on kernel ram in order
to turn a page of regular memory uncached.

There are two problems with this:

    1) This is a total nightmare for the ioremap() implementation to keep
       various mappings of the same page coherent.

    2) It's a total nightmare for the AGP code since it adds a ton of
       complexity in terms of keeping track of 2 different pointers to
       the same thing, in terms of error handling etc etc.

This patch fixes this by making the AGP drivers use the new
set_memory_XX APIs instead.

Note: amd-k7-agp.c is built on Alpha too, and generic.c is built
on ia64 as well, which do not yet have the set_memory_*() APIs,
so for them some we have a few ugly #ifdefs - hopefully they'll
be fixed soon.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2008-02-19 14:46:39 +10:00
Yinghai Lu
b7ad149d62 x86: reenable support for system without on node0
One system doesn't have RAM for node0 installed.

SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 1 PXM 1 0-a0000
SRAT: Node 1 PXM 1 0-dd000000
SRAT: Node 1 PXM 1 0-123000000
ACPI: SLIT: nodes = 2
 10 13
 13 10
mapped APIC to ffffffffff5fb000 (        fee00000)
Bootmem setup node 1 0000000000000000-0000000123000000
  NODE_DATA [000000000000e000 - 0000000000014fff]
  bootmap [0000000000015000 -  00000000000395ff] pages 25
Could not find start_pfn for node 0
Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #14

Call Trace:
 [<ffffffff80bab498>] free_area_init_node+0x22/0x381
 [<ffffffff8045ffc5>] generic_swap+0x0/0x17
 [<ffffffff80bab0cc>] find_zone_movable_pfns_for_nodes+0x54/0x271
 [<ffffffff80baba5f>] free_area_init_nodes+0x239/0x287
 [<ffffffff80ba6311>] paging_init+0x46/0x4c
 [<ffffffff80b9dda5>] setup_arch+0x3c3/0x44e
 [<ffffffff80b978be>] start_kernel+0x6f/0x2c7
 [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3

This happens because node 0 is not online, but the node state in
mm/page_alloc.c has node 0 set.

        nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
                [N_POSSIBLE] = NODE_MASK_ALL,
                [N_ONLINE] = { { [0] = 1UL } },

So we need to clear node_online_map before initializing the memory.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
f34b439f34 x86: CPA: avoid double checking of alias ranges
When the CPA code is called with an virtual address in the range of
the direct mapping or the high alias then we do not need to run
through the alias check for this range.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
af96e4438a x86: CPA no alias checking for _NX
NX settings are not required to be consistent across alias mappings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
31eedd823c x86: zap invalid and unused pmds in early boot
The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
from __START_KERNEL_map. The kernel itself only needs _text to _end
mapped in the high alias. On relocatible kernels the ASM setup code
adjusts the compile time created high mappings to the relocation. This
creates invalid pmd entries for negative offsets:

0xffffffff80000000 -> pmd entry: ffffffffff2001e3
It points outside of the physical address space and is marked present.

This starts at the virtual address __START_KERNEL_map and goes up to
the point where the first valid physical address (0x0) is mapped.

Zap the mappings before _text and after _end right away in early
boot. This removes also the invalid entries.

Furthermore it simplifies the range check for high aliases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Thomas Gleixner
c31c7d4844 x86: CPA, fix alias checks
c_p_a() did not discover all aliases correctly. (such as when called
on vmalloc()-ed areas or ioremap()-ed areas)

Push the alias checks to the lower, physical level and consistently
discover all aliases that might exist: the low direct mappings and
the high linear kernel-text mappings (on 64-bit).

Thanks to Andi Kleen for pointing out that this was buggy.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-18 20:54:14 +01:00
Ingo Molnar
f8d8406bcb x86: cpa, fix out of date comment
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:21 +01:00
Thomas Gleixner
69b1415e93 x86: cpa: ensure page alignment
the cpa API is page aligned - warn about any weird alignments.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:20 +01:00
Harvey Harrison
7bfeab9af9 x86: include proper prototypes for rodata_test
extern should not appear in C files.  Also, the definitions
do not match the prototype currently, not sure what way you
want to go with this, I've switched the prototype to return
int, but I can see going to the void return as well.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:20 +01:00
Adrian Bunk
cae30f8270 x86: make dump_pagetable() static
dump_pagetable() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-14 23:30:19 +01:00
Andi Kleen
5d3c8b21e2 x86: CPA: fix gbpages support in try_preserve_large_page
[ mingo@elte.hu: while gbpages cannot be enabled on mainline currently,
  keep the code uptodate and this fix is easy enough. ]

Use correct page sizes and masks for GB pages in try_preserve_large_page()

This prevents a boot hang on a GB capable system with CONFIG_DIRECT_GBPAGES
enabled.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-13 16:20:35 +01:00
Jeremy Fitzhardinge
37cc8d7f96 x86/early_ioremap: don't assume we're using swapper_pg_dir
At the early stages of boot, before the kernel pagetable has been
fully initialized, a Xen kernel will still be running off the
Xen-provided pagetables rather than swapper_pg_dir[].  Therefore,
readback cr3 to determine the base of the pagetable rather than
assuming swapper_pg_dir[].

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Tested-by: Jody Belka <knew-linux@pimb.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-13 16:20:35 +01:00
Thomas Gleixner
81772fea41 x86: remove over noisy debug printk
pageattr-test.c contains a noisy debug printk that people reported.
The condition under which it prints (randomly tapping into a mem_map[]
hole and not being able to c_p_a() there) is valid behavior and not
interesting to report.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-11 11:24:24 -08:00
Thomas Gleixner
fac8493960 x86: cpa, strict range check in try_preserve_large_page()
Right now, we check only the first 4k page for static required protections.
This does not take overlapping regions into account. So we might end up
setting the wrong permissions/protections for other parts of this large page.

This can be optimized further, but correctness is the important part.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Thomas Gleixner
eb5b5f024c x86: cpa, use page pool
Switch the split page code to use the page pool. We do this
unconditionally to avoid different behaviour with and without
DEBUG_PAGEALLOC enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Thomas Gleixner
76ebd0548d x86: introduce page pool in cpa
DEBUG_PAGEALLOC was not possible on 64-bit due to its early-bootup
hardcoded reliance on PSE pages, and the unrobustness of the runtime
splitup of large pages. The splitup ended in recursive calls to
alloc_pages() when a page for a pte split was requested.

Avoid the recursion with a preallocated page pool, which is used to
split up large mappings and gets refilled in the return path of
kernel_map_pages after the split has been done. The size of the page
pool is adjusted to the available memory.

This part just implements the page pool and the initialization w/o
using it yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Ian Campbell
b6fbb669c8 x86: fix early_ioremap pagetable ops
Some important parts of f6df72e71e got
dropped along the way, reintroduce them.

Only affects paravirt guests.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:09 +01:00
Ian Campbell
551889a6e2 x86: construct 32-bit boot time page tables in native format.
Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
kernel are in PAE format.

early_ioremap is updated to use the standard page table accessors.

Clear any mappings beyond max_low_pfn from the boot page tables in
native_pagetable_setup_start because the initial mappings can extend
beyond the range of physical memory and into the vmalloc area.

Derived from patches by Eric Biederman and H. Peter Anvin.

[ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:09 +01:00
Thomas Gleixner
bfc734b246 x86: avoid unused variable warning in mm/init_64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09 23:24:09 +01:00
Harvey Harrison
da7bfc50f5 x86: sparse warnings in pageattr.c
Adjust the definition of lookup_address to take an unsigned long
level argument.  Adjust callers in xen/mmu.c that pass in a
dummy variable.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-09 23:24:08 +01:00
Martin Schwidefsky
2f569afd9c CONFIG_HIGHPTE vs. sub-page page tables.
Background: I've implemented 1K/2K page tables for s390.  These sub-page
page tables are required to properly support the s390 virtualization
instruction with KVM.  The SIE instruction requires that the page tables
have 256 page table entries (pte) followed by 256 page status table entries
(pgste).  The pgstes are only required if the process is using the SIE
instruction.  The pgstes are updated by the hardware and by the hypervisor
for a number of reasons, one of them is dirty and reference bit tracking.
To avoid wasting memory the standard pte table allocation should return
1K/2K (31/64 bit) and 2K/4K if the process is using SIE.

Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
the s390 version for pte_alloc_one cannot return a pointer to a struct
page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
cannot return a pointer to a pte either, since that would require more than
32 bit for the return value of pte_alloc_one (and the pte * would not be
accessible since its not kmapped).

Solution: The only solution I found to this dilemma is a new typedef: a
pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
later patch.  For everybody else it will be a (struct page *).  The
additional problem with the initialization of the ptl lock and the
NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
a destructor pgtable_page_dtor.  The page table allocation and free
functions need to call these two whenever a page table page is allocated or
freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
 To get the pgtable_t back from a pmd entry that has been installed with
pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
call in free_pte_range and apply_to_pte_range.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:42 -08:00
Bernhard Walle
72a7fe3967 Introduce flags for reserve_bootmem()
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.

This patch:

Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past.  This is to avoid conflicts.

Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:25 -08:00
Ingo Molnar
58d5d0d8dd x86: fix deadlock, make pgd_lock irq-safe
lockdep just caught this one:

=================================
[ INFO: inconsistent lock state ]
2.6.24 #38
---------------------------------
inconsistent {in-softirq-W} -> {softirq-on-W} usage.
swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (pgd_lock){-+..}, at: [<ffffffff8022a9ea>] mm_init+0x1da/0x250
{in-softirq-W} state was registered at:
  [<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 394559
hardirqs last  enabled at (394559): [<ffffffff80267f0a>] get_page_from_freelist+0x30a/0x4c0
hardirqs last disabled at (394558): [<ffffffff80267d25>] get_page_from_freelist+0x125/0x4c0
softirqs last  enabled at (393952): [<ffffffff80232f8e>] __do_softirq+0xce/0xe0
softirqs last disabled at (393945): [<ffffffff8020c57c>] call_softirq+0x1c/0x30

other info that might help us debug this:
no locks held by swapper/1.

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24 #38

Call Trace:
 [<ffffffff8024e1fb>] print_usage_bug+0x18b/0x190
 [<ffffffff8024f55d>] mark_lock+0x53d/0x560
 [<ffffffff8024fffa>] __lock_acquire+0x3ca/0xed0
 [<ffffffff80250ba8>] lock_acquire+0xa8/0xe0
 [<ffffffff8022a9ea>] ? mm_init+0x1da/0x250
 [<ffffffff809bcd10>] _spin_lock+0x30/0x70
 [<ffffffff8022a9ea>] mm_init+0x1da/0x250
 [<ffffffff8022aa99>] mm_alloc+0x39/0x50
 [<ffffffff8028b95a>] bprm_mm_init+0x2a/0x1a0
 [<ffffffff8028d12b>] do_execve+0x7b/0x220
 [<ffffffff80209776>] sys_execve+0x46/0x70
 [<ffffffff8020c214>] kernel_execve+0x64/0xd0
 [<ffffffff8020901e>] ? _stext+0x1e/0x20
 [<ffffffff802090ba>] init_post+0x9a/0xf0
 [<ffffffff809bc5f6>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff8024f75a>] ? trace_hardirqs_on+0xba/0xd0
 [<ffffffff8020c1a8>] ? child_rip+0xa/0x12
 [<ffffffff8020bcbc>] ? restore_args+0x0/0x44
 [<ffffffff8020c19e>] ? child_rip+0x0/0x12

turns out that pgd_lock has been used on 64-bit x86 in an irq-unsafe
way for almost two years, since commit 8c914cb704.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-06 22:39:45 +01:00
Ingo Molnar
971a52d66a x86: delay CPA self-test and repeat it
delay the CPA self-test so that any impact (corruption) of
user-space pagetables can be triggered. Repeat the test
every 30 seconds.

this would have prevented the bug fixed by 8cb2a7c1e9,
at its source.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:45 +01:00
Arjan van de Ven
cc842b82cc x86: remove suprious ifdefs from pageattr.c
The .rodata section really should just be read only; the config option
is there to make breaking up the 2Mb page an option (so people whos machines
give more performance for the 2Mb case can opt to do so).
But when the page gets split anyway, this is no longer an issue, so
clean up the code and remove the ifdefs

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:45 +01:00
Arjan van de Ven
984bb80d94 x86: mark the .rodata section also NX
The .rodata section shouldn't just be read-only,
but also non-executable. This is free since we've broken
up the 2MB page already anyway.

also update test_nx to check for this.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:45 +01:00
Ingo Molnar
2d684cd6d9 x86: remove X2 workaround
With the spurious handler fix, the X2 does not lock up anymore.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:44 +01:00
Thomas Gleixner
d8b57bb700 x86: make spurious fault handler aware of large mappings
In very rare cases, on certain CPUs, we could end up in the spurious
fault handler and ignore a large pud/pmd mapping. The resulting pte
pointer points into the mapped physical space and dereferencing it
will fault recursively.

Make the code aware of large mappings and do the permission check
on the pmd/pud entry, when a large pud/pmd mapping is detected.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-06 22:39:43 +01:00
Hugh Dickins
8cb2a7c1e9 stop c_p_a corrupting the pds
When change_page_attr splits a large page on x86_32 (without PAE), it is
currently corrupting every process's page directory: fix that by removing
the thinko which passes down a physical instead of a virtual address.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 14:37:14 -08:00
Benjamin Herrenschmidt
5e5419734c add mm argument to pte/pmd/pud/pgd_free
(with Martin Schwidefsky <schwidefsky@de.ibm.com>)

The pgd/pud/pmd/pte page table allocation functions get a mm_struct pointer as
first argument.  The free functions do not get the mm_struct argument.  This
is 1) asymmetrical and 2) to do mm related page table allocations the mm
argument is needed on the free function as well.

[kamalesh@linux.vnet.ibm.com: i386 fix]
[akpm@linux-foundation.org: coding-syle fixes]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:18 -08:00
Thomas Gleixner
7b610eec7a x86: cpa, micro-optimization
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:10 +01:00
Ingo Molnar
87f7f8fe32 x86: cpa, clean up code flow
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:10 +01:00
Ingo Molnar
beaff6333b x86: cpa, eliminate CPA_ enum
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:09 +01:00
Ingo Molnar
9df84993cb x86: cpa, cleanups
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:09 +01:00
Andi Kleen
f07333fd14 x86: implement gbpages support in change_page_attr()
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:09 +01:00
Andi Kleen
b536022227 x86: support gbpages in pagetable dump
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:09 +01:00
Andi Kleen
c2f71ee214 x86: add gbpages support to lookup_address
[ tglx@linutronix.de: fix bootup crash on sparse mappings. ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:09 +01:00
Andi Kleen
d4f71f7969 x86: switch direct mapping setup over to set_pte
Use set_pte() for setting up the 2MB pages in the direct mapping.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:09 +01:00
Thomas Gleixner
7bfb72e847 x86: fix page-present check in cpa_flush_range
pte_present() might return true for PROT_NONE mappings.
Explicitely check the present bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:08 +01:00
Ingo Molnar
6ce9fc17d9 x86: remove cpa warning
this race is legit and can happen on SMP systems.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:08 +01:00
Andi Kleen
bde1965ce8 x86: remove now unused clear_kernel_mapping
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:08 +01:00
Thomas Gleixner
64f351d197 x86: cpa selftest, skip non present entries
pud and pmd entries in the RAM area might be marked as non present.
Do not try to modify them in the selftest.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:08 +01:00
Thomas Gleixner
07cf89c05f x86: CPA fix pagetable split
Move the readout of the large entry into the spinlock section to
prevent an unlikely but possible race.

Mark the pmd/pud entry present after the split. We preserved the
non present bit in the new split mapping.

Remove the stale gfp_flags double initialization.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:08 +01:00
Andi Kleen
31422c51e0 x86: rename LARGE_PAGE_SIZE to PMD_PAGE_SIZE
Fix up all users.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:08 +01:00
Thomas Gleixner
9a14aefc1d x86: cpa, fix lookup_address
lookup_address() returns a wrong level and a wrong pointer to a non
existing pte, when pmd or pud entries are marked !present. This
happens for example due to boot time mapping of GART into the low
memory space.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:07 +01:00
Ingo Molnar
34508f66b6 x86: AMD Athlon X2 hard hang fix
An Athlon 64 X2 test system showed hard hangs shortly after marking
the kernel text read-only, if we tried to preserve largepages and
changed the PSE entry from RW to RO. The pagetable code itself is
correct, it's the CPU that locked up hard (and not even the NMI
watchdog could punch through that hard hang).

So be conservative and always do splitups - like we did in the past.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:07 +01:00
Thomas Gleixner
65e074dffa x86: cpa, preserve large pages if possible
When CPA is called on a range which fits into a large page mapping,
avoid to split the page when:

1) There is no change of attributes
2) The range to change is a complete large mapping

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:07 +01:00
Thomas Gleixner
f4ae5da0e8 x86: cpa, check if we changed anything and tlb flushing is necessary
Flush tlbs only when there was a real change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:07 +01:00
Thomas Gleixner
72e458dfa6 x86: introduce struct cpa_data
The number of arguments which need to be transported is increasing
and we want to add flush optimizations and large page preserving.

Create struct cpa data and pass a pointer instead of increasing the
number of arguments further.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:07 +01:00
Andi Kleen
6bb8383beb x86: cpa, only flush the cache if the caching attributes have changed
We only need to flush the caches in cpa() if the the caching attributes
have changed. Otherwise only flush the TLBs.

This checks the PAT bits too although they are currently not used by
the kernel.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:06 +01:00
Thomas Gleixner
331e406588 x86: CPA return early when requested feature is not available
Mask out the not supported bits (e.g. NX). If the clr/set masks
are empty after the mask return without changing anything.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:06 +01:00
Thomas Gleixner
f56d005d30 x86: no CPA on iounmap
When an ioremap is unmapped, do not change the page attributes. There might
be another mapping of the same physical address. PAT might detect a conflicting
mapping attribute for no good reason. The mapping is removed anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:05 +01:00
Thomas Gleixner
75ab43bfce x86: ioremap remove the range check of cpa
Now that cpa works on non-direct mappings as well, we can safely
remove the range check in ioremap_change_attr().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:05 +01:00
Thomas Gleixner
e66aadbe6c x86: simplify __ioremap
Remove tons of castings which make the code hard to read.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:05 +01:00
Thomas Gleixner
63c1dcf4bc x86: CPA use the existing pfn in split as well
When splitting large pages, we ge the pfn from the existing entry
instead of calculating it ourself.

This removes the last remaining range restriction of the cpa code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:48:05 +01:00
Arjan van de Ven
626c2c9d06 x86: use the pfn from the page when change its attributes
When changing the attributes of a pte, we should use the PFN from the
existing PTE rather than going through hoops calculating what we think
it might have been; this is both fragile and totally unneeded. It also
makes it more hairy to call any of these functions on non-direct maps
for no good reason whatsover.

With this change, __change_page_attr() no longer takes a pfn as argument,
which simplifies all the callers.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@tglx.de>
2008-02-04 16:48:05 +01:00
Arjan van de Ven
cc0f21bbc1 x86: teach the static_protection function about high mappings
Right now, enforcing that the high mapping of the kernel text doesn't
get the NX bit is done deep in the guts of CPA, rather than in the
static_protection() function that enforces all other per-arch sanity
checks.

This patch moves this sanity check into the central static_protection()
function instead, and makes it apply ONLY to the kernel text, not to all
other areas in the high mapping.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:05 +01:00
Jeremy Fitzhardinge
a67ad9c9f8 x86: revert "defer cr3 reload when doing pud_clear()"
Revert "defer cr3 reload when doing pud_clear()" since I'm going to
replace it.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:02 +01:00
Jeremy Fitzhardinge
e618c9579c x86: unify PAE/non-PAE pgd_ctor
The constructors for PAE and non-PAE pgd_ctors are more or less
identical, and can be made into the same function.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:48:02 +01:00
H. Peter Anvin
f832ff18e8 x86: use _ASM_EXTABLE macro in arch/x86/mm/init_32.c
Use the _ASM_EXTABLE macro from <asm/asm.h>, instead of open-coding
__ex_table entires in arch/x86/mm/init_32.c.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:47:58 +01:00
Harvey Harrison
cf89ec924d x86: reduce ifdef sections in fault.c
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:47:56 +01:00
Yinghai Lu
6118f76fb7 x86: print out node_data addr and bootmap_start addr
print out node_data addr and bootmap_start addr.

helpful for debugging early crashes on high-end NUMA systems.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:47:56 +01:00
Thomas Gleixner
b50516fc20 x86: CPA remove bogus NX clear
In split_large_page we clear the NX bit for the new split ptes, but we
need to preserve the original setting of it for the split ptes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-04 16:47:55 +01:00
Ingo Molnar
38cb47ba01 x86: relax RAM check in ioremap()
Kevin Winchester reported the loss of direct rendering, due to:

[    0.588184] agpgart: Detected AGP bridge 0
[    0.588184] agpgart: unable to get memory for graphics translation table.
[    0.588184] agpgart: agp_backend_initialize() failed.
[    0.588207] agpgart-amd64: probe of 0000:00:00.0 failed with error -12

and bisected it down to:

  commit 266b9f8727
  Author: Thomas Gleixner <tglx@linutronix.de>
  Date:   Wed Jan 30 13:34:06 2008 +0100

      x86: fix ioremap RAM check

this check was too strict and caused an ioremap() failure.

the problem is due to the somewhat unclean way of how the GART code
reserves a memory range for its aperture, and how it utilizes it
later on.

Allow RAM pages to be ioremap()-ed too, as long as they are reserved.

Bisected-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-04 16:47:54 +01:00
Rafael J. Wysocki
a6eb84bc1e suspend: cleanup reference to swsusp_pg_dir[]
swsusp_pg_dir[] is used for suspend, but not for hibernation.
clean-up the ifdefs which worked by accident, while implying the opposite.
Delete the __nosavedata, which also implied the opposite.

Some day we may optimize CONFIG_ACPI_SLEEP to build minimal kernels
for just hibernate or just suspend but not both,
but today isn't that day.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-02-01 18:30:59 -05:00
Harvey Harrison
93809be8b1 x86: fixes for lookup_address args
Signedness mismatches in level argument.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:49:43 +01:00
Yinghai Lu
9347e0b0ce x86: remove unneeded round_up
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:49:42 +01:00
Yinghai Lu
24a5da73f4 x86_64: make bootmap_start page align v6
boot oopses when a system has 64 or 128 GB of RAM installed:

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12

Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out some variables near end of bss are corrupted already.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

setup_node_bootmem() makes NODE_DATA cacheline aligned,
and bootmap is page-aligned.

the patch updates find_e820_area() to make sure we can meet
the alignment constraints.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:49:41 +01:00
Yinghai Lu
25eff8d4cd x86_64: add debug name for early_res
helps debugging problems in this rather murky area of code.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-01 17:49:41 +01:00
Ingo Molnar
5aa0508508 x86: uninline __pte_free_tlb() and __pmd_free_tlb()
this also removes an include file dependency.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31 22:05:48 +01:00
Huang, Ying
1fd6a53ddc x86: early_ioremap_reset fix 2
This patch fixes a bug of early_ioremap_reset(), which had been fixed
before by "convert the boot time page table to the kernels native
format" patch. But that patch has been reverted now.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31 22:05:45 +01:00
Huang, Ying
5827040df0 x86: change_page_attr_clear fix
This patch replaces __change_page_attr_set_clr() with
change_page_attr_set_clr() in change_page_attr_clear() to flush the
TLB/cache properly.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-31 22:05:43 +01:00
Yinghai Lu
afadcd788f x86: fix nodemap_size according to nodeid bits
memnode.map is s16 array because of nodeid is 16 bit now.

so need to increase the nodemap_size according to that bits.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:12 +01:00
Yinghai Lu
9198715763 x86: fix overlap between pagetable with bss section
one early crash on one 8 node 256g machine:

Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
 BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000dffe0000 (usable)
 BIOS-e820: 00000000dffe0000 - 00000000dffee000 (ACPI data)
 BIOS-e820: 00000000dffee000 - 00000000dffff050 (ACPI NVS)
 BIOS-e820: 00000000dffff050 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff700000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000004020000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 67239936
Kernel panic - not syncing: Duplicated early reservation d40000-e42000

Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3

Call Trace:
 [<ffffffff80221545>] lapic_get_maxlvt+0x0/0x10
 [<ffffffff80221657>] clear_local_APIC+0x5/0xcf
 [<ffffffff80221726>] disable_local_APIC+0x5/0x17
 [<ffffffff8021fe16>] smp_send_stop+0x46/0x4c
 [<ffffffff80235293>] panic+0x94/0x13e
 [<ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34
 [<ffffffff80b9f1c5>] reserve_early+0x30/0x6c
 [<ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc
 [<ffffffff80b9dc01>] setup_arch+0x21f/0x44e
 [<ffffffff80b978be>] start_kernel+0x6f/0x2c7
 [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3

it turns out there is overlap between pgtable and bss...

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

need to round up table_start to PAGE_SIZE.

also make the panic more informative.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:12 +01:00
Joachim Deguara
bb4a1d644a x86: add PCI IDs to k8topology_64.c
This just adds the PCI IDs of AMD's family 10h and 11h CPU's northbridges to
k8topology discovery.

Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:12 +01:00
Jeremy Fitzhardinge
f6df72e71e x86: fix early_ioremap pagetable ops
Put appropriate pagetable update hooks in so that paravirt knows
what's going on in there.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Jeremy Fitzhardinge
e3ed910db2 x86: use the same pgd_list for PAE and 64-bit
Use a standard list threaded through page->lru for maintaining the pgd
list on PAE.  This is the same as 64-bit, and seems saner than using a
non-standard list via page->index.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Jeremy Fitzhardinge
6194ba6ff6 x86: don't special-case pmd allocations as much
In x86 PAE mode, stop treating pmds as a special case.  Previously
they were always allocated and freed with the pgd.  The modifies the
code to be the same as 64-bit mode, where they are allocated on
demand.

This is a step on the way to unifying 32/64-bit pagetable allocation
as much as possible.

There is a complicating wart, however.  When you install a new
reference to a pmd in the pgd, the processor isn't guaranteed to see
it unless you reload cr3.  Since reloading cr3 also has the
side-effect of flushing the tlb, this is an expense that we want to
avoid whereever possible.

This patch simply avoids reloading cr3 unless the update is to the
current pagetable.  Later patches will optimise this further.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Harvey Harrison
fd40d6e318 x86: shrink some ifdefs in fault.c
The change from current to tsk in do_page_fault is safe as
this is set at the very beginning of the function.

Removes a likely() annotation from the 64-bit version, this
could have instead been added to 32-bit.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Jeremy Fitzhardinge
5b727a3b01 x86: ignore spurious faults
When changing a kernel page from RO->RW, it's OK to leave stale TLB
entries around, since doing a global flush is expensive and they pose
no security problem.  They can, however, generate a spurious fault,
which we should catch and simply return from (which will have the
side-effect of reloading the TLB to the current PTE).

This can occur when running under Xen, because it frequently changes
kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids
doing a global TLB flush after changing page permissions.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Harvey Harrison
b406ac61e9 x86: remove nx_enabled from fault.c
On !PAE 32-bit, _PAGE_NX will be 0, making is_prefetch always
return early.  The test is sufficient on PAE as __supported_pte_mask
is updated in the same places as nx_enabled in init_32.c which also
takes disable_nx into account.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Harvey Harrison
c61e211d99 x86: unify fault_32|64.c
Unify includes in moved fault.c.

Modify Makefiles to pick up unified file.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:11 +01:00
Harvey Harrison
f8c2ee224d x86: unify fault_32|64.c with ifdefs
Elimination of these ifdefs can be done in a unified file.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Harvey Harrison
1156e098c5 x86: unify fault_32|64.c by ifdef'd function bodies
It's about time to get on with unifying these files, elimination
of the ugly ifdefs can occur in the unified file.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Ingo Molnar
d7d119d777 x86: arch/x86/mm/init_32.c printk fixes
printk fixes. NOP in terms of functionality, but strings got
a bit larger due to the KERN_ markers that were added.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Ingo Molnar
8550eb9982 x86: arch/x86/mm/init_32.c cleanup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Ingo Molnar
10f22dde55 x86: arch/x86/mm/init_64.c printk fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Thomas Gleixner
14a62c34b1 x86: unify ioremap
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Harvey Harrison
19f0dda91e x86: unify page fault oops printing
This changes the oops dumping format for page faults to
be similar between X86_32 and 64.

This is the first user of printk_address on X86_32.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Harvey Harrison
b3279c7fd7 x86: introduce show_fault_oops helper to fault_32|64.c
This will help when unifying the oops dumping code on 32/64
bit.  No functional changes.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:10 +01:00
Harvey Harrison
35f3266ffb x86: add is_errata100 helper to fault_32|64.c
Further towards unifying these files, add another helper
in same spirit as is_errata93.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:09 +01:00
Harvey Harrison
29caf2f98c x86: add is_f00f_bug helper to fault_32|64.c
Further towards unifying these files, add another helper
in same spirit as is_errata93.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:09 +01:00
Thomas Gleixner
0879750f5d x86: cpa cleanup the 64-bit alias math
Cleanup the address calculations, which are necessary to identify the
high/low alias mappings of the kernel on 64 bit machines. Instead of
calling __pa/__va back and forth, calculate the physical address once
and base the other calculations on it. Add understandable constants so
we can use the already available within() helper. Also add comments,
which help mere mortals to understand what this code does.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 13:34:09 +01:00
Ingo Molnar
86f03989d9 x86: cpa: fix the self-test
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:34:09 +01:00