Commit graph

65276 commits

Author SHA1 Message Date
Michael Ellerman
d385366a9b [POWERPC] Simplify rtas_change_msi() error semantics
Currently rtas_change_msi() returns either the error code from RTAS, or if
the RTAS call succeeded the number of irqs that were configured by RTAS.
This makes checking the return value more complicated than it needs to be.

Instead, have rtas_change_msi() check that the number of irqs configured by
RTAS is equal to what we requested - and return an error otherwise. This makes
the return semantics match the usual 0 for success, something else for error.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-03 09:11:39 +10:00
Michael Ellerman
fcbe8090a0 [POWERPC] Simplify error logic in rtas_setup_msi_irqs()
rtas_setup_msi_irqs() doesn't need to call teardown() itself, the
generic code will do this for us as long as we return a non-zero
value.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-03 09:11:35 +10:00
Michael Ellerman
d9303d662f [POWERPC] Simplify error logic in u3msi_setup_msi_irqs()
u3msi_setup_msi_irqs() doesn't need to call teardown() itself,
the generic code will do this for us as long as we return a non
zero value.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-03 09:11:32 +10:00
Michael Ellerman
db220b234d [POWERPC] Make sure to of_node_get() the result of pci_device_to_OF_node()
pci_device_to_OF_node() returns the device node attached to a PCI device,
but doesn't actually grab a reference - we need to do it ourselves.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-03 09:11:29 +10:00
Arnd Bergmann
a35e370cfd [POWERPC] Move embedded6xx into multiplatform
The various embedded 6xx systems can easily coexist in one kernel
together with the other 6xx based systems, so there is no strict
reason to keep them separate.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-03 09:11:25 +10:00
Linas Vepstas
a7fb7ea76e [POWERPC] pseries: device node status can be "ok" or "okay"
It seems that some versions of firmware will report a device
node status as the string "okay". As we are not expecting this
string, the device node will be ignored by the EEH subsystem.
Which means EEH will not be enabled.

When EEH is not enabled, PCI errors will be converted into
Machine Check exceptions, and we'll have a very unhappy system.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-02 22:09:56 +10:00
Emil Medve
576e393e74 [POWERPC] Fix build errors when BLOCK=n
These are the symptom error messages:

  CC      arch/powerpc/kernel/setup_32.o
In file included from include/linux/blkdev.h:17,
                 from include/linux/ide.h:13,
                 from arch/powerpc/kernel/setup_32.c:13:
include/linux/bsg.h:67: warning: 'struct request_queue' declared inside parameter list
include/linux/bsg.h:67: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/bsg.h:71: warning: 'struct request_queue' declared inside parameter list
In file included from arch/powerpc/kernel/setup_32.c:13:
include/linux/ide.h:857: error: field 'wrq' has incomplete type

  CC      arch/powerpc/kernel/ppc_ksyms.o
In file included from include/linux/blkdev.h:17,
                 from include/linux/ide.h:13,
                 from arch/powerpc/kernel/ppc_ksyms.c:15:
include/linux/bsg.h:67: warning: 'struct request_queue' declared inside parameter list
include/linux/bsg.h:67: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/bsg.h:71: warning: 'struct request_queue' declared inside parameter list
In file included from arch/powerpc/kernel/ppc_ksyms.c:15:
include/linux/ide.h:857: error: field 'wrq' has incomplete type

The fix tries to use the smallest scope CONFIG_* symbols that will fix
the build problem.  In this case <linux/ide.h> needs to be included
only if IDE=y or IDE=m were selected.  Also, ppc_ide_md is needed only
if BLK_DEV_IDE=y or BLK_DEV_IDE=m

Moved the EXPORT_SYMBOL(ppc_ide_md) from ppc_ksysms.c next to its
declaration in setup_32.c which made <linux/ide.h> not needed. With
<linux/ide.h> gone from ppc_ksyms.c, <asm/cacheflush.h> is needed to
address the following warnings and errors:

  CC      arch/powerpc/kernel/ppc_ksyms.o
arch/powerpc/kernel/ppc_ksyms.c:122: error: '__flush_icache_range' undeclared here (not in a function)
arch/powerpc/kernel/ppc_ksyms.c:122: warning: type defaults to 'int' in declaration of '__flush_icache_range'
arch/powerpc/kernel/ppc_ksyms.c:123: error: 'flush_dcache_range' undeclared here (not in a function)
arch/powerpc/kernel/ppc_ksyms.c:123: warning: type defaults to 'int' in declaration of 'flush_dcache_range'

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Benjamin Herrenschmidt
4c2a54b09b [POWERPC] Fix platinumfb framebuffer
Current kernels have a non-working platinumfb due to some resource
management issues.  This fixes it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Satyam Sharma
8fd7675c09 [POWERPC] Avoid pointless WARN_ON(irqs_disabled()) from panic codepath
> ------------[ cut here ]------------
> Badness at arch/powerpc/kernel/smp.c:202

comes when smp_call_function_map() has been called with irqs disabled,
which is illegal. However, there is a special case, the panic() codepath,
when we do not want to warn about this -- warning at that time is pointless
anyway, and only serves to scroll away the *real* cause of the panic and
distracts from the real bug.

* So let's extract the WARN_ON() from smp_call_function_map() into all its
  callers -- smp_call_function() and smp_call_function_single()

* Also, introduce another caller of smp_call_function_map(), namely
  __smp_call_function() (and make smp_call_function() a wrapper over this)
  which does *not* warn about disabled irqs

* Use this __smp_call_function() from the panic codepath's smp_send_stop()

We also end having to move code of smp_send_stop() below the definition
of __smp_call_function().

Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Olof Johansson
17b5ee04c0 [POWERPC] Support setting affinity for U3/U4 MSI sources
Hook up affinity-setting for U3/U4 MSI interrupt sources.

Tested on Quad G5 with myri10ge.

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Arnd Bergmann
3164cccdc0 [POWERPC] add Kconfig option for optimizing for cell
Since the PPE on cell is an in-order core, it suffers significantly
from wrong instruction scheduling.  This adds a Kconfig option that
enables passing -mtune=cell to gcc in order to generate object
code that runs well on cell.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Jeremy Kerr
fb8299ed31 [POWERPC] cell: Don't cast the result of of_get_property()
The cast to u32 * isn't required, of_get_property returns a void *.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Tony Breeds
408e83a682 [POWERPC] Convert define_machine(mpc885_ads) to C99 initializer syntax
Make the define_machine() block for mpc885_ads more greppable and
consistent with other examples in tree.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Grant Likely
85498ae87c [POWERPC] mpc5200: Add cuimage support for mpc5200 boards
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Grant Likely
ad25a4cca7 [POWERPC] mpc8349: Add linux,network-index to ethernet nodes in device tree
cuImage needs to know the logical index of the ethernet devices in order
to assign mac addresses.  This adds the needed properties.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
CC: Scott Wood <scottwood@freescale.com>
CC: Kumar Gala <galak@kernel.crashing.org>
CC: Timur Tabi <timur@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:22 +10:00
Meelis Roos
3e61576b2a [POWERPC] Fix ppc kernels after build-id addition
This patch fixes arch/ppc kernels, at least for prep subarch, after
build-id addition.  Without this, kernels were 3 times the size and
bootloader refused to load them.  Now they are back to normal again.

Tested only with Roland McGrath's "Use LDFLAGS_MODULE only for .ko
links" patch applied - boots and works fine.

Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:21 +10:00
Robert P. J. Day
20b31b53ea [POWERPC] Prevent direct inclusion of <asm/rwsem.h>.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:21 +10:00
Aristeu Rozanski
555ddbb4e2 [POWERPC] adbhid: Enable KEY_FN key reporting
When a Fn key is used in combination with another key in ADB keyboards
it will generate a Fn event and then a second event that can be a
different key than pressed (Fn + F1 for instance can generate Fn +
brightness down if it's configured like that).  This enables the
reporting of the Fn key to the input system.

As Fn is a dead key for most purposes, it's useful to report it so
applications can make use of it.  One example is apple_mouse
(https://jake.ruivo.org/uinputd/trunk/apple_mouse/) that emulates the
second and third keys using a combination of keyboard keys and the mouse
button.  Other applications may use the KEY_FN as a modifier as well.
I've been updating and using this patch for months without problems.

Signed-off-by: Aristeu Rozanski <aris@ruivo.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:21 +10:00
Mike Frysinger
fc624eae32 [POWERPC] Use __attribute__ in asm-powerpc
Pretty much everyone uses "__attribute__" or "attribute", no one uses
"__attribute".  This tweaks the three places in asm-powerpc where this
comes up.  While only asm-powerpc/types.h is interesting (for
userspace), I did asm-powerpc/processor.h as well for consistency.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:21 +10:00
Dale Farnsworth
9b41fcb0eb [POWERPC] Add Marvell mv64x60 udbg putc/getc functions
Commit 69331af, "Fixes and cleanups for earlyprintk aka boot console",
resulted in printk output prior to the initialization of the mpsc
console driver not being printed.  That commit causes the mpsc's
CON_PRINTBUFFER flag to be cleared since udbg should have printed
the previous output.

I guess we can no longer ignore udbg. :)

This patch provides udbg_putc() and udbg_getc() functions for the
Marvell mv64x60 chips. These functions are enabled if an mv64x60
port is to be used as the console as determined from the device tree.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Acked-by: Mark A. Greer <mgreer@mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:21 +10:00
David Woodhouse
e4533b243e [POWERPC] Optionally use new device number for pmac_zilog
This adds the option for the pmac_zilog driver to use the major/minor
numbers recently allocated specifically for it (/dev/ttyPZn) instead
of the /dev/ttySn numbers.  The advantage of doing this is that it
allows the pmac_zilog and 8250 drivers to coexist.  The disadvantage
of doing this is that it is a user-visible ABI change and it will
break existing working setups on powermacs, and could be confusing to
users.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-22 14:49:21 +10:00
David Gibson
c4d5e37547 [POWERPC] Cleanups for physmap_of.c (v2)
This patch includes a whole batch of smallish cleanups for
drivers/mtd/physmap_of.c.

	- A bunch of uneeded #includes are removed
	- We switch to the modern linux/of.h etc. in place of
asm/prom.h
	- Use some helper macros to avoid some ugly inline #ifdefs
	- A few lines of unreachable code are removed
	- A number of indentation / line-wrapping fixes
	- More consistent use of kernel idioms such as if (!p) instead
of if (p == NULL)
	- Clarify some printk()s and other informative strings.
	- parse_obsolete_partitions() now returns 0 if no partition
information is found, instead of returning -ENOENT which the caller
had to handle specially.
	- (the big one) Despite the name, this driver really has
nothing to do with drivers/mtd/physmap.c.  The fact that the flash
chips must be physically direct mapped is a constrant, but doesn't
really say anything about the actual purpose of this driver, which is
to instantiate MTD devices based on information from the device tree.
Therefore the physmap name is replaced everywhere within the file with
"of_flash".  The file itself and the Kconfig option is not renamed for
now (so that the diff is actually a diff).  That can come later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-09-20 07:37:16 -05:00
Valentine Barshak
bd0076cc33 [POWERPC] 4xx: Fix Sequoia MAL0 and EMAC dts entries.
According to PowerPC 440EPx documentation,
MAL0 is comprised of four channels (two transmit and two receive).
Each channel is dedicated to one of two EMAC cores.
This patch fixes Sequoia DTS MAL0 entry and EMAC entries,
assigning correct channel numbers to EMACs.

Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-09-20 07:37:14 -05:00
Valentine Barshak
e52f5677bf [POWERPC] 4xx: Fix Bamboo MAL0 dts entry.
According to PowerPC 440EP documentation,
MAL0 consists of 6 channels (4 transmit channels and 2 receive channels)
This patch fixes Bamboo DTS MAL0 "num-rx-chans" entry.

Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-09-20 07:37:03 -05:00
Valentine Barshak
472b5b43be [POWERPC] Add 64-bit resources support to pci_iomap
The patch adds support for the 64-bit resources to the PCI
iomap code.

Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-09-20 07:36:52 -05:00
Josh Boyer
9a474fff4f [POWERPC] Update PowerPC 4xx entry in MAINTAINERS
Add myself as PowerPC 4xx maintainer and list the git tree

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm..com>
2007-09-19 21:19:07 -05:00
Hollis Blanchard
70dea47da1 [POWERPC] 4xx: Implement udbg_getc() for 440
Implement udbg_getc() for 440, which fixes xmon input.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2007-09-19 21:13:17 -05:00
Josh Boyer
504ca43e5e [POWERPC] 4xx: Convert Seqouia flash mappings to new binding
A new binding for flash devices was recently introduced.  This updates the
Sequoia DTS to use the new binding and enabled MTD in the defconfig.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Stefan Roese <sr@denx.de>
2007-09-19 21:13:16 -05:00
Josh Boyer
bf07f32d43 [POWERPC] 4xx: Convert Walnut flash mappings to new binding
A new binding for flash devices was recently introduced.  This updates the
Walnut DTS to use the new binding.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
2007-09-19 21:13:16 -05:00
Josh Boyer
8d9ae994d8 [POWERPC] Make partitions optional in physmap_of
The latest physmap_of driver has a small error where it will fail the probe
with:

physmap-flash: probe of fff00000.small-flas failed with error -2

if there are no partition subnodes in the device tree and the old style binding
is not used.  Since partition definitions are optional, the probe should still
succeed.

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
2007-09-19 21:13:16 -05:00
Josh Boyer
658e817019 [POWERPC] cuimage for Bamboo board
Add a cuboot wrapper for the Bamboo board.  Additionally, we enable MAC
address fixups for both cuboot and treeboot.

This also removes some obsoleted linker declarations that have been
moved into ops.h

Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
2007-09-19 21:13:16 -05:00
Paul Mackerras
0ce49a3945 Merge branch 'linux-2.6' 2007-09-20 10:09:27 +10:00
Linus Torvalds
a88a8eff1e Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] cpu-bugs64.c: GCC 3.3 constraint workaround
  [MIPS] DEC: Initialise ioasic_ssr_lock
2007-09-19 11:45:32 -07:00
Linus Torvalds
c39c06b961 Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
  V4L/DVB (6173a): Documentation: Remove reference to dead "cpia_pp=" boot-time option
  Revert "V4L/DVB (6173a): Documentation: Remove reference to dead "cpia_pp=" boot-time option"
2007-09-19 11:41:15 -07:00
Linus Torvalds
a78feb7c8a Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
  [XFS] Avoid replaying inode buffer initialisation log items if on-disk version is newer.
  [XFS] Ensure file size updates have been completed before writing inode to disk.
  [XFS] On-demand reaping of the MRU cache
2007-09-19 11:40:13 -07:00
Linus Torvalds
91fe7d7cdd Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SUNSAB]: Fix several bugs.
2007-09-19 11:39:39 -07:00
Linus Torvalds
d56c5c414c Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6:
  ide: remove unused variables from drivers/ide/ppc/pmac.c
  ide: ST320413A has the same problem as ST340823A
2007-09-19 11:39:10 -07:00
Linus Torvalds
f15f41383d Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Fix timekeeping on PowerPC 601
  [POWERPC] Don't expose clock vDSO functions when CPU has no timebase
  [POWERPC] spusched: Fix null pointer dereference in find_victim
2007-09-19 11:38:25 -07:00
Linus Torvalds
dbe3ed1c07 x86-64: page faults from user mode are always user faults
Randy Dunlap noticed an interesting "crashme" behaviour on his dual
Prescott Xeon setup, where he gets page faults with the error code
having a zero "user" bit, but the register state points back to user
mode.

This may be a CPU microcode buglet triggered by some strange instruction
pattern that crashme generates, and loading a microcode update seems to
possibly have fixed it.

Regardless, we really should trust the register state more than the
error code, since it's really the register state that determines whether
we can actually send a signal, or whether we're in kernel mode and need
to oops/kill the process in the case of a page fault.

Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:37:14 -07:00
Maciej W. Rozycki
09abbcffb3 [MIPS] cpu-bugs64.c: GCC 3.3 constraint workaround
Add a workaround to address warnings generated on the "n" constraint by
GCC 3.3 and below.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-09-19 19:33:14 +01:00
Maciej W. Rozycki
6883599943 [MIPS] DEC: Initialise ioasic_ssr_lock
Fix the definition of the ioasic_ssr_lock spinlock to include a proper 
initialisation.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-09-19 19:33:14 +01:00
Dmitry Torokhov
4f01a757e7 Driver core: fix deprectated sysfs structure for nested class devices
Nested class devices used to have 'device' symlink point to a real
(physical) device instead of a parent class device.  When converting
subsystems to struct device we need to keep doing what class devices did if
CONFIG_SYSFS_DEPRECATED is Y, otherwise parts of udev break.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Tested-by: Anssi Hannula <anssi.hannula@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Jeff Dike
508a92741a uml: fix irqstack crash
This patch fixes a crash caused by an interrupt coming in when an IRQ stack
is being torn down.  When this happens, handle_signal will loop, setting up
the IRQ stack again because the tearing down had finished, and handling
whatever signals had come in.

However, to_irq_stack returns a mask of pending signals to be handled, plus
bit zero is set if the IRQ stack was already active, and thus shouldn't be
torn down.  This causes a problem because when handle_signal goes around
the loop, sig will be zero, and to_irq_stack will duly set bit zero in the
returned mask, faking handle_signal into believing that it shouldn't tear
down the IRQ stack and return thread_info pointers back to their original
values.

This will eventually cause a crash, as the IRQ stack thread_info will
continue pointing to the original task_struct and an interrupt will look
into it after it has been freed.

The fix is to stop passing a signal number into to_irq_stack.  Rather, the
pending signals mask is initialized beforehand with the bit for sig already
set.  References to sig in to_irq_stack can be replaced with references to
the mask.

[akpm@linux-foundation.org: use UL]
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Lee Schermerhorn
480eccf9ae Fix NUMA Memory Policy Reference Counting
This patch proposes fixes to the reference counting of memory policy in the
page allocation paths and in show_numa_map().  Extracted from my "Memory
Policy Cleanups and Enhancements" series as stand-alone.

Shared policy lookup [shmem] has always added a reference to the policy,
but this was never unrefed after page allocation or after formatting the
numa map data.

Default system policy should not require additional ref counting, nor
should the current task's task policy.  However, show_numa_map() calls
get_vma_policy() to examine what may be [likely is] another task's policy.
The latter case needs protection against freeing of the policy.

This patch adds a reference count to a mempolicy returned by
get_vma_policy() when the policy is a vma policy or another task's
mempolicy.  Again, shared policy is already reference counted on lookup.  A
matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
shared and vma policies, and in show_numa_map() for shared and another
task's mempolicy.  We can call __mpol_free() directly, saving an admittedly
inexpensive inline NULL test, because we know we have a non-NULL policy.

Handling policy ref counts for hugepages is a bit trickier.
huge_zonelist() returns a zone list that might come from a shared or vma
'BIND policy.  In this case, we should hold the reference until after the
huge page allocation in dequeue_hugepage().  The patch modifies
huge_zonelist() to return a pointer to the mempolicy if it needs to be
unref'd after allocation.

Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:

		w/o patch	w/ refcount patch
	    Avg	  Std Devn	   Avg	  Std Devn
Real:	 100.59	    0.38	 100.63	    0.43
User:	1209.60	    0.37	1209.91	    0.31
System:   81.52	    0.42	  81.64	    0.34

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Pavel Emelyanov
28f300d236 Fix user namespace exiting OOPs
It turned out, that the user namespace is released during the do_exit() in
exit_task_namespaces(), but the struct user_struct is released only during the
put_task_struct(), i.e.  MUCH later.

On debug kernels with poisoned slabs this will cause the oops in
uid_hash_remove() because the head of the chain, which resides inside the
struct user_namespace, will be already freed and poisoned.

Since the uid hash itself is required only when someone can search it, i.e.
when the namespace is alive, we can safely unhash all the user_struct-s from
it during the namespace exiting.  The subsequent free_uid() will complete the
user_struct destruction.

For example simple program

   #include <sched.h>

   char stack[2 * 1024 * 1024];

   int f(void *foo)
   {
   	return 0;
   }

   int main(void)
   {
   	clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
   	return 0;
   }

run on kernel with CONFIG_USER_NS turned on will oops the
kernel immediately.

This was spotted during OpenVZ kernel testing.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Pavel Emelyanov
735de2230f Convert uid hash to hlist
Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses
list_heads, thus occupying twice as much place as it could.  Convert it to
hlist_heads.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Matthias Kaehlcke
d8a4821dca kernel/user.c: Use list_for_each_entry instead of list_for_each
kernel/user.c: Convert list_for_each to list_for_each_entry in
uid_hash_find()

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Eric Sandeen
ef2b02d3e6 ext34: ensure do_split leaves enough free space in both blocks
The do_split() function for htree dir blocks is intended to split a leaf
block to make room for a new entry.  It sorts the entries in the original
block by hash value, then moves the last half of the entries to the new
block - without accounting for how much space this actually moves.  (IOW,
it moves half of the entry *count* not half of the entry *space*).  If by
chance we have both large & small entries, and we move only the smallest
entries, and we have a large new entry to insert, we may not have created
enough space for it.

The patch below stores each record size when calculating the dx_map, and
then walks the hash-sorted dx_map, calculating how many entries must be
moved to more evenly split the existing entries between the old block and
the new block, guaranteeing enough space for the new entry.

The dx_map "offs" member is reduced to u16 so that the overall map size
does not change - it is temporarily stored at the end of the new block, and
if it grows too large it may be overwritten.  By making offs and size both
u16, we won't grow the map size.

Also add a few comments to the functions involved.

This fixes the testcase reported by hooanon05@yahoo.co.jp on the
linux-ext4 list, "ext3 dir_index causes an error"

Thanks to Andreas Dilger for discussing the problem & solution with me.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Tested-by: Junjiro Okajima <hooanon05@yahoo.co.jp>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Andrew Morton
e42601973b disable sys_timerfd() for 2.6.23
There is still some confusion and disagreement over what this interface should
actually do.  So it is best that we disable it in 2.6.23 until we get that
fully sorted out.

(sys_timerfd() was present in 2.6.22 but it was apparently broken, so here we
assume that nobody is using it yet).

Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Davide Libenzi <davidel@xmailserver.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Alexey Dobriyan
49af7ee181 nfs: fix oops re sysctls and V4 support
NFS unregisters sysctls only if V4 support is compiled in.  However, sysctl
table is not V4 specific, so unregister it always.

Steps to reproduce:

	[build nfs.ko with CONFIG_NFS_V4=n]
	modrobe nfs
	rmmod nfs
	ls /proc/sys

Unable to handle kernel paging request at ffffffff880661c0 RIP:
 [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
PGD 203067 PUD 207063 PMD 7e216067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: lockd nfs_acl sunrpc
Pid: 3335, comm: ls Not tainted 2.6.23-rc3-bloat #2
RIP: 0010:[<ffffffff802af8e3>]  [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
RSP: 0018:ffff81007fd93e78  EFLAGS: 00010286
RAX: ffffffff880661c0 RBX: ffffffff80466370 RCX: ffffffff880661c0
RDX: 00000000000014c0 RSI: ffff81007f3ad020 RDI: ffff81007efd8b40
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff802a8570 R12: ffffffff880661c0
R13: ffff81007e219640 R14: ffff81007efd8b40 R15: ffff81007ded7280
FS:  00002ba25ef03060(0000) GS:ffff81007ff81258(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff880661c0 CR3: 000000007dfaf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 3335, threadinfo ffff81007fd92000, task ffff81007d8a0000)
Stack:  ffff81007f3ad150 ffffffff80283f30 ffff81007fd93f48 ffff81007efd8b40
 ffff81007ee00440 0000000422222222 0000000200035593 ffffffff88037e9a
 2222222222222222 ffffffff80466500 ffff81007e416400 ffff81007e219640
Call Trace:
 [<ffffffff80283f30>] filldir+0x0/0xf0
 [<ffffffff80283f30>] filldir+0x0/0xf0
 [<ffffffff802840c7>] vfs_readdir+0xa7/0xc0
 [<ffffffff80284376>] sys_getdents+0x96/0xe0
 [<ffffffff8020bb3e>] system_call+0x7e/0x83

Code: 41 8b 14 24 85 d2 74 dc 49 8b 44 24 08 48 85 c0 74 e7 49 3b
RIP  [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
 RSP <ffff81007fd93e78>
CR2: ffffffff880661c0
Kernel panic - not syncing: Fatal exception

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00