Commit graph

581 commits

Author SHA1 Message Date
H Hartley Sweeten
72c3368856 nodemask.h: remove macro any_online_node
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'.  Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:31 -08:00
Linus Torvalds
ac0f6f927d Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (100 commits)
  ARM: Eliminate decompressor -Dstatic= PIC hack
  ARM: 5958/1: ARM: U300: fix inverted clk round rate
  ARM: 5956/1: misplaced parentheses
  ARM: 5955/1: ep93xx: move timer defines into core.c and document
  ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c
  ARM: 5953/1: ep93xx: fix broken build of clock.c
  ARM: 5952/1: ARM: MM: Add ARM_L1_CACHE_SHIFT_6 for handle inside each ARCH Kconfig
  ARM: 5949/1: NUC900 add gpio virtual memory map
  ARM: 5948/1: Enable timer0 to time4 clock support for nuc910
  ARM: 5940/2: ARM: MMCI: remove custom DBG macro and printk
  ARM: make_coherent(): fix problems with highpte, part 2
  MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
  ARM: 5945/1: ep93xx: include correct irq.h in core.c
  ARM: 5933/1: amba-pl011: support hardware flow control
  ARM: 5930/1: Add PKMAP area description to memory.txt.
  ARM: 5929/1: Add checks to detect overlap of memory regions.
  ARM: 5928/1: Change type of VMALLOC_END to unsigned long.
  ARM: 5927/1: Make delimiters of DMA area globally visibly.
  ARM: 5926/1: Add "Virtual kernel memory..." printout.
  ARM: 5920/1: OMAP4: Enable L2 Cache
  ...

Fix up trivial conflict in arch/arm/mach-mx25/clock.c
2010-03-01 09:15:15 -08:00
Russell King
4b3073e1c5 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies.  We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

  On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
  to construct a pointer to the pte again.  Passing a pte_t * is much
  more elegant.  Maybe we might even replace the pte argument with the
  pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

  Passing the ptep in there is exactly what I want.  I want that
  -instead- of the PTE value, because I have issue on some ppc cases,
  for I$/D$ coherency, where set_pte_at() may decide to mask out the
  _PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

  sparc: fix fallout from update_mmu_cache API change

  Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-02-20 16:41:46 +00:00
Thomas Gleixner
3eb93c558a powerpc: Convert tlbivax_lock to raw_spinlock
tlbivax_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-19 14:52:33 +11:00
Thomas Gleixner
6b9c9b8a66 powerpc: Convert native_tlbie_lock to raw_spinlock
native_tlbie_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-19 14:52:30 +11:00
Thomas Gleixner
be833f3371 powerpc: Convert context_lock to raw_spinlock
context_lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-19 14:52:30 +11:00
Benjamin Herrenschmidt
efd0f0f385 Merge commit 'jwb/next' into next 2010-02-18 09:34:38 +11:00
Anton Blanchard
66d99b8834 powerpc: Convert open coded native hashtable bit lock
Now we have real bit locks use them instead of open coding it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-17 14:03:15 +11:00
Benjamin Herrenschmidt
ec144a81ad Merge commit 'origin/master' into next 2010-02-17 10:00:42 +11:00
Stefan Roese
c7b6669812 powerpc/40x: Add support for PPC40x boards with > 512MB SDRAM
This patch adds support for boards with more that 512MByte RAM. Currently
only 512MB of memory are enabled in the DCCR/ICCR real-mode cache
control registers. This patch now enables caching in real-mode for
2GByte.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2010-02-12 07:54:45 -05:00
David Gibson
77058e1adc powerpc: Fix address masking bug in hpte_need_flush()
Commit f71dc176aa 'Make
hpte_need_flush() correctly mask for multiple page sizes' introduced
bug, which is triggered when a kernel with a 64k base page size is run
on a system whose hardware does not 64k hash PTEs.  In this case, we
emulate 64k pages with multiple 4k hash PTEs, however in
hpte_need_flush() we incorrectly only mask the hardware page size from
the address, instead of the logical page size.  This causes things to
go wrong when we later attempt to iterate through the hardware
subpages of the logical page.

This patch corrects the error.  It has been tested on pSeries bare
metal by Michael Neuling.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-10 13:58:06 +11:00
Anton Blanchard
7317ac8711 powerpc: Convert mmu context allocator from idr to ida
We can use the much more lightweight ida allocator since we don't
need the pointer storage idr provides.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-09 13:56:07 +11:00
Thadeu Lima de Souza Cascardo
6c504d4231 powerpc: Fix typo s/leve/level/ in TLB code
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-03 17:39:50 +11:00
Jiri Slaby
4bf936b9e4 powerpc: Use helpers for rlimits
Make sure compiler won't do weird things with limits. E.g. fetching
them twice may return 2 different values after writable limits are
implemented.

I.e. either use rlimit helpers added in
3e10e716ab
or ACCESS_ONCE if not applicable.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-01-15 13:20:08 +11:00
David Gibson
a1128f8f0f powerpc/mm: Fix stupid bug in subpge protection handling
Commit d28513bc7f ("Fix bug in pagetable
cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for
breakage caused by an earlier clean up patch of mine, contains a
stupid bug.  I changed the parameters of the subpage_protection()
function, but failed to update one of the callers.

This patch fixes it, and replaces a void * with a typed pointer so
that the compiler will warn on such an error in future.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:55:44 +11:00
Yang Li
f04b10cddb powerpc/mm: Fix typo of cpumask_clear_cpu()
The function name of cpumask_clear_cpu was not correct. Fortunately
nobody uses that code with hotplug yet :-)

Reported-by: Jin Qing <b24347@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:54:27 +11:00
Sachin P. Sant
5c33991987 powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled.
This time without the funny characters.

Fix following build errors generated with DEBUG=1

cc1: warnings being treated as errors
arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes':
arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize':
arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
... SNIP ...

Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:54:27 +11:00
Benjamin Herrenschmidt
50891457f1 powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM
Set need to call __set_pte_at() and not set_pte_at() from __change_page_attr()
since the later will perform checks with CONFIG_DEBUG_VM that aren't suitable
to the way we override an existing PTE. (More specifically, it doesn't let
you write over a present PTE).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18 14:54:26 +11:00
Linus Torvalds
a73611b6aa Merge branch 'next' of git://git.secretlab.ca/git/linux-2.6
* 'next' of git://git.secretlab.ca/git/linux-2.6: (23 commits)
  powerpc: fix up for mmu_mapin_ram api change
  powerpc: wii: allow ioremap within the memory hole
  powerpc: allow ioremap within reserved memory regions
  wii: use both mem1 and mem2 as ram
  wii: bootwrapper: add fixup to calc useable mem2
  powerpc: gamecube/wii: early debugging using usbgecko
  powerpc: reserve fixmap entries for early debug
  powerpc: wii: default config
  powerpc: wii: platform support
  powerpc: wii: hollywood interrupt controller support
  powerpc: broadway processor support
  powerpc: wii: bootwrapper bits
  powerpc: wii: device tree
  powerpc: gamecube: default config
  powerpc: gamecube: platform support
  powerpc: gamecube/wii: flipper interrupt controller support
  powerpc: gamecube/wii: udbg support for usbgecko
  powerpc: gamecube/wii: do not include PCI support
  powerpc: gamecube/wii: declare as non-coherent platforms
  powerpc: gamecube/wii: introduce GAMECUBE_COMMON
  ...

Fix up conflicts in arch/powerpc/mm/fsl_booke_mmu.c.

Hopefully even close to correctly.
2009-12-16 13:26:53 -08:00
Stephen Rothwell
ae4cec4736 powerpc: fix up for mmu_mapin_ram api change
Today's linux-next build (powerpc ppc44x_defconfig) failed like this:

arch/powerpc/mm/pgtable_32.c: In function 'mapin_ram':
arch/powerpc/mm/pgtable_32.c:318: error: too many arguments to function 'mmu_mapin_ram'

Casued by commit de32400dd2 ("wii: use both
mem1 and mem2 as ram").

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-12-14 09:04:24 -07:00
Albert Herranz
c5df7f7751 powerpc: allow ioremap within reserved memory regions
Add a flag to let a platform ioremap memory regions marked as reserved.

This flag will be used later by the Nintendo Wii support code to allow
ioremapping the I/O region sitting between MEM1 and MEM2 and marked
as reserved RAM in the patch "wii: use both mem1 and mem2 as ram".

This will no longer be needed when proper discontig memory support
for 32-bit PowerPC is added to the kernel.

Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-12-12 22:24:32 -07:00
Albert Herranz
de32400dd2 wii: use both mem1 and mem2 as ram
The Nintendo Wii video game console has two discontiguous RAM regions:
- MEM1: 24MB @ 0x00000000
- MEM2: 64MB @ 0x10000000

Unfortunately, the kernel currently does not support discontiguous RAM
memory regions on 32-bit PowerPC platforms.

This patch adds a series of workarounds to allow the use of the second
memory region (MEM2) as RAM by the kernel.
Basically, a single range of memory from the beginning of MEM1 to the
end of MEM2 is reported to the kernel, and a memory reservation is
created for the hole between MEM1 and MEM2.

With this patch the system is able to use all the available RAM and not
just ~27% of it.

This will no longer be needed when proper discontig memory support
for 32-bit PowerPC is added to the kernel.

Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-12-12 22:24:31 -07:00
Joakim Tjernlund
5efab4a02c powerpc/8xx: Invalidate non present TLBs
8xx sometimes need to load a invalid/non-present TLBs in
it DTLB asm handler.

These must be invalidated separaly as linux mm don't.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-09 17:10:35 +11:00
David Gibson
d28513bc7f powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.

Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-08 15:59:33 +11:00
Benjamin Herrenschmidt
5a7b4193e5 Revert "powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"
This reverts commit c045256d14.

It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will
commit a fixed version separately

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-02 09:28:35 +11:00
David Gibson
39adfa540f powerpc/mm: Fix bug in gup_hugepd()
Commit a4fe3ce769 introduced a new
get_user_pages() path for hugepages on powerpc.  Unfortunately, there
is a bug in it's loop logic, which can cause it to overrun the end of
the intended region.  This came about by copying the logic from the
normal page path, which assumes the address and end parameters have
been pagesize aligned at the top-level.  Since they're not *hugepage*
size aligned, the simplistic logic could step over the end of the gup
region without triggering the loop end condition.

This patch fixes the bug by using the technique that the normal page
path uses in levels above the lowest to truncate the ending address to
something we know we'll match with.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-27 14:24:30 +11:00
David Gibson
c045256d14 powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
Commit a0668cdc15 cleans up the handling
of kmem_caches for allocating various levels of pagetables.
Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to
the latter's cleverly hidden technique of adding some extra allocation
space to the top level page directory to store the extra information
it needs.

Since that extra allocation really doesn't fit into the cleaned up
page directory allocating scheme, this patch alters
CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct
subpage_prot_table as part of the mm_context_t.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-27 14:24:29 +11:00
Kumar Gala
8b27f0b61d powerpc/fsl-booke: Rework TLB CAM code
Re-write the code so its more standalone and fixed some issues:
* Bump'd # of CAM entries to 64 to support e500mc
* Make the code handle MAS7 properly
* Use pr_cont instead of creating a string as we go

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-11-20 16:45:33 -06:00
Benjamin Herrenschmidt
0526484aa3 Merge commit 'origin/master' into next 2009-11-12 10:59:04 +11:00
Alexander Graf
e85a47106a Split init_new_context and destroy_context
For KVM we need to allocate a new context id, but don't really care about
all the mm context around it.

So let's split the alloc and destroy functions for the context id, so we can
grab one without allocating an mm context.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:25 +11:00
Alexander Graf
4ab79aa801 Export symbols for KVM module
We want to be able to build KVM as a module. To enable us doing so, we
need some more exports from core Linux parts.

This patch exports all functions and variables that are required for KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:24 +11:00
Benjamin Herrenschmidt
f1167fb318 powerpc/mm: Remove debug context clamping from nohash code
I inadvertently left that debug code enabled, causing the number of
contexts to be clamped to 31 which is going to slow things down on
4xx and just plain breaks 8xx

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:41:59 +11:00
David Gibson
0895ecda79 powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal accessors
The hugepage arch code provides a number of hook functions/macros
which mirror the functionality of various normal page pte access
functions.  Various changes in the normal page accessors (in
particular BenH's recent changes to the handling of lazy icache
flushing and PAGE_EXEC) have caused the hugepage versions to get out
of sync with the originals.  In some cases, this is a bug, at least on
some MMU types.

One of the reasons that some hooks were not identical to the normal
page versions, is that the fact we're dealing with a hugepage needed
to be passed down do use the correct dcache-icache flush function.
This patch makes the main flush_dcache_icache_page() function hugepage
aware (by checking for the PageCompound flag).  That in turn means we
can make set_huge_pte_at() just a call to set_pte_at() bringing it
back into sync.  As a bonus, this lets us remove the
hash_huge_page_do_lazy_icache() function, replacing it with a call to
the hash_page_do_lazy_icache() function it was based on.

Some other hugepage pte access hooks - huge_ptep_get_and_clear() and
huge_ptep_clear_flush() - are not so easily unified, but this patch at
least brings them back into sync with the current versions of the
corresponding normal page functions.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:21:23 +11:00
David Gibson
883a3e5236 powerpc/mm: Split hash MMU specific hugepage code into a new file
This patch separates the parts of hugetlbpage.c which are inherently
specific to the hash MMU into a new hugelbpage-hash64.c file.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:59 +11:00
David Gibson
d1837cba5d powerpc/mm: Cleanup initialization of hugepages on powerpc
This patch simplifies the logic used to initialize hugepages on
powerpc.  The somewhat oddly named set_huge_psize() is renamed to
add_huge_page_size() and now does all necessary verification of
whether it's given a valid hugepage sizes (instead of just some) and
instantiates the generic hstate structure (but no more).

hugetlbpage_init() now steps through the available pagesizes, checks
if they're valid for hugepages by calling add_huge_page_size() and
initializes the kmem_caches for the hugepage pagetables.  This means
we can now eliminate the mmu_huge_psizes array, since we no longer
need to pass the sizing information for the pagetable caches from
set_huge_psize() into hugetlbpage_init()

Determination of the default huge page size is also moved from the
hash code into the general hugepage code.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:58 +11:00
David Gibson
a4fe3ce769 powerpc/mm: Allow more flexible layouts for hugepage pagetables
Currently each available hugepage size uses a slightly different
pagetable layout: that is, the bottem level table of pointers to
hugepages is a different size, and may branch off from the normal page
tables at a different level.  Every hugepage aware path that needs to
walk the pagetables must therefore look up the hugepage size from the
slice info first, and work out the correct way to walk the pagetables
accordingly.  Future hardware is likely to add more possible hugepage
sizes, more layout options and more mess.

This patch, therefore reworks the handling of hugepage pagetables to
reduce this complexity.  In the new scheme, instead of having to
consult the slice mask, pagetable walking code can check a flag in the
PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
and the entry also contains the information (eseentially hugepage
shift) necessary to then interpret that table without recourse to the
slice mask.  This scheme can be extended neatly to handle multiple
levels of self-describing "special" hugepage pagetables, although for
now we assume only one level exists.

This approach means that only the pagetable allocation path needs to
know how the pagetables should be set out.  All other (hugepage)
pagetable walking paths can just interpret the structure as they go.

There already was a flag bit in PGD/PUD/PMD entries for hugepage
directory pointers, but it was only used for debug.  We alter that
flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
pointer (normally it would be 1 since the pointer lies in the linear
mapping).  This means that asm pagetable walking can test for (and
punt on) hugepage pointers with the same test that checks for
unpopulated page directory entries (beq becomes bge), since hugepage
pointers will always be positive, and normal pointers always negative.

While we're at it, we get rid of the confusing (and grep defeating)
#defining of hugepte_shift to be the same thing as mmu_huge_psizes.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:58 +11:00
David Gibson
a0668cdc15 powerpc/mm: Cleanup management of kmem_caches for pagetables
Currently we have a fair bit of rather fiddly code to manage the
various kmem_caches used to store page tables of various levels.  We
generally have two caches holding some combination of PGD, PUD and PMD
tables, plus several more for the special hugepage pagetables.

This patch cleans this all up by taking a different approach.  Rather
than the caches being designated as for PUDs or for hugeptes for 16M
pages, the caches are simply allocated to be a specific size.  Thus
sharing of caches between different types/levels of pagetables happens
naturally.  The pagetable size, where needed, is passed around encoded
in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
pagetable contains 2^n pointers.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:57 +11:00
David Gibson
f71dc176aa powerpc/mm: Make hpte_need_flush() correctly mask for multiple page sizes
Currently, hpte_need_flush() only correctly flushes the given address
for normal pages.  Callers for hugepages are required to mask the
address themselves.

But hpte_need_flush() already looks up the page sizes for its own
reasons, so this is a rather silly imposition on the callers.  This
patch alters it to mask based on the pagesize it has looked up itself,
and removes the awkward masking code in the hugepage caller.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:57 +11:00
Benjamin Herrenschmidt
8d8997f34e powerpc/mm: Fix hang accessing top of vmalloc space
On pSeries, we always force the IO space to be mapped using 4K
pages even with a 64K base page size to cope with some limitations
in the HV interface to some devices.

However, the SLB miss handler code to discriminate between vmalloc
and ioremap space uses a CPU feature section such that the code
is nop'ed out when the processor support large pages non-cachable
mappings.

Thus, we end up always using the ioremap page size for vmalloc
segments on such processors, causing a discrepency between the
segment and the hash table, and thus a hang continously hashing
the page.

It works for the first segment of the vmalloc space since that
segment is "bolted" in by C code correctly, and thankfully we
almost never use the vmalloc space beyond the first segment,
but the new percpu code made the bug happen.

This fixes it by removing the feature section from the assembly,
we now always do the comparison between vmalloc and ioremap.

Signed-off-by; Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-14 16:58:36 +11:00
Rex Feany
e0908085fc powerpc/8xx: Fix regression introduced by cache coherency rewrite
After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:56:30 +10:00
Huang Weiyi
b9eceb2307 powerpc/mm: Remove duplicated #include
Remove duplicated #include('s) in
  arch/powerpc/mm/tlb_low_64e.S

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:31:42 +10:00
KAMEZAWA Hiroyuki
3089aa1b0c kcore: use registerd physmem information
For /proc/kcore, each arch registers its memory range by kclist_add().
In usual,

	- range of physical memory
	- range of vmalloc area
	- text, etc...

are registered but "range of physical memory" has some troubles.  It
doesn't updated at memory hotplug and it tend to include unnecessary
memory holes.  Now, /proc/iomem (kernel/resource.c) includes required
physical memory range information and it's properly updated at memory
hotplug.  Then, it's good to avoid using its own code(duplicating
information) and to rebuild kclist for physical memory based on
/proc/iomem.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki
908eedc616 walk system ram range
Originally, walk_memory_resource() was introduced to traverse all memory
of "System RAM" for detecting memory hotplug/unplug range.  For doing so,
flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
memory hotplug.

But for using other purpose, /proc/kcore, this may includes some firmware
area marked as IORESOURCE_BUSY | IORESOUCE_MEM.  This patch makes the
check strict to find out busy "System RAM".

Note: PPC64 keeps their own walk_memory_resouce(), which walk through
ppc64's lmb informaton.  Because old kclist_add() is called per lmb, this
patch makes no difference in behavior, finally.

And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
Because pfn_valid() just show "there is memmap or not* and cannot be used
for "there is physical memory or not", this function is useful in generic
to scan physical memory range.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki
a0614da88b kcore: register vmalloc area in generic way
For /proc/kcore, vmalloc areas are registered per arch.  But, all of them
registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies
them.  By this.  archs which have no kclist_add() hooks can see vmalloc
area correctly.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki
c30bb2a25f kcore: add kclist types
Presently, kclist_add() only eats start address and size as its arguments.
Considering to make kclist dynamically reconfigulable, it's necessary to
know which kclists are for System RAM and which are not.

This patch add kclist types as
  KCORE_RAM
  KCORE_VMALLOC
  KCORE_TEXT
  KCORE_OTHER

This "type" is used in a patch following this for detecting KCORE_RAM.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
Geert Uytterhoeven
cc013a8890 arches: drop superfluous casts in nr_free_pages() callers
Commit 9617729941 ("Drop free_pages()")
modified nr_free_pages() to return 'unsigned long' instead of 'unsigned
int'.  This made the casts to 'unsigned long' in most callers superfluous,
so remove them.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <zankel@tensilica.com>
Cc: Michal Simek <monstr@monstr.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:34 -07:00
Ingo Molnar
cdd6c482c9 perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:28:04 +02:00
Linus Torvalds
723e9db7a4 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
  powerpc/nvram: Enable use Generic NVRAM driver for different size chips
  powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
  powerpc/ps3: Workaround for flash memory I/O error
  powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
  powerpc/perf_counters: Reduce stack usage of power_check_constraints
  powerpc: Fix bug where perf_counters breaks oprofile
  powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
  powerpc/irq: Improve nanodoc
  powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
  powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
  powerpc/book3e: Add missing page sizes
  powerpc/pseries: Fix to handle slb resize across migration
  powerpc/powermac: Thermal control turns system off too eagerly
  powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
  powerpc/405ex: support cuImage via included dtb
  powerpc/405ex: provide necessary fixup function to support cuImage
  powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
  powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
  powerpc/44x: Update Arches defconfig
  powerpc/44x: Update Arches dts
  ...

Fix up conflicts in drivers/char/agp/uninorth-agp.c
2009-09-15 09:51:09 -07:00
Linus Torvalds
ada3fa1505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
  powerpc64: convert to dynamic percpu allocator
  sparc64: use embedding percpu first chunk allocator
  percpu: kill lpage first chunk allocator
  x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
  percpu: update embedding first chunk allocator to handle sparse units
  percpu: use group information to allocate vmap areas sparsely
  vmalloc: implement pcpu_get_vm_areas()
  vmalloc: separate out insert_vmalloc_vm()
  percpu: add chunk->base_addr
  percpu: add pcpu_unit_offsets[]
  percpu: introduce pcpu_alloc_info and pcpu_group_info
  percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
  percpu: add @align to pcpu_fc_alloc_fn_t
  percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
  percpu: drop @static_size from first chunk allocators
  percpu: generalize first chunk allocator selection
  percpu: build first chunk allocators selectively
  percpu: rename 4k first chunk allocator to page
  percpu: improve boot messages
  percpu: fix pcpu_reclaim() locking
  ...

Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-15 09:39:44 -07:00
Brian King
46db2f86a3 powerpc/pseries: Fix to handle slb resize across migration
The SLB can change sizes across a live migration, which was not
being handled, resulting in possible machine crashes during
migration if migrating to a machine which has a smaller max SLB
size than the source machine. Fix this by first reducing the
SLB size to the minimum possible value, which is 32, prior to
migration. Then during the device tree update which occurs after
migration, we make the call to ensure the SLB gets updated. Also
add the slb_size to the lparcfg output so that the migration
tools can check to make sure the kernel has this capability
before allowing migration in scenarios where the SLB size will change.

BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
      breaking ppc32 build

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-02 16:19:01 +10:00