Commit graph

180400 commits

Author SHA1 Message Date
Andy Adamson
e95e60daee nfs41: remove uneeded checks in callback processing
All callback operations have arguments to decode and require processing.
The preprocess_nfs4X_op functions catch unsupported or illegal ops so
decode_args and process_op pointers are always non NULL.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:57 -05:00
Andy Adamson
b92b301900 nfs41: directly encode back channel error
Skip all other processing when error is encountered.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:56 -05:00
Andy Adamson
31d2b4356b nfs41: fix wrong error on callback header xdr overflow
Set NFS4ERR_RESOURCE as CB_COMPOUND status and do not return an op on
decode_op_hdr or encode_op_hdr buffer overflow.

NFS4ERR_RESOURCE is correct for v4.0. Will fix the return for v4.1 along with
all the other NFS4ERR_RESOURCE errors in a later patch.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:56 -05:00
Mike Sager
72ce2b3c06 nfs41: Process callback's referring call list
If a CB_SEQUENCE referring call triple matches a slot table entry, the
client is still waiting for a response to the original request.  In this
case, return NFS4ERR_DELAY as the response to the callback.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:55 -05:00
Mike Sager
a7989c3e47 nfs41: Check slot table for referring calls
Traverse a list of referring calls and look for a session/slot/seq number
match.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:55 -05:00
Mike Sager
8e0d46e138 nfs41: Adjust max cache response size value
For the CREATE_SESSION attribute ca_maxresponsesize_cached, calculate
the value based on the rpc reply header size plus the maximum nfs compound
reply size.

Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:54 -05:00
H Hartley Sweeten
5a51f13adf xprtsock.c: make bc_{malloc/free} static
xprtsock.c: make bc_{malloc/free} static

The server backchannel buf_alloc and buf_free methods should
be static since they are not used outside this file.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:53 -05:00
Chuck Lever
7a88efe976 SUNRPC: Don't display zero scope IDs
A zero scope ID means that it wasn't set, so we don't need to append
it to presentation format addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:53 -05:00
Chuck Lever
f1a89a1182 SUNRPC: Deprecate support for site-local addresses
RFC 3879 "formally deprecates" site-local IPv6 addresses.  We
interpret that to mean that the scope ID is ignored for all but
link-local addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:52 -05:00
Jeff Layton
97cefcc6d0 nfs: handle NFSv2 -EKEYEXPIRED returns from RPC layer appropriately
Add a wrapper around rpc_call_sync that handles -EKEYEXPIRED errors from
the RPC layer as it would an -EJUKEBOX error if NFSv2 had such a thing.
Also, add a handler for that error for async calls that makes it
resubmit the RPC on -EKEYEXPIRED.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:52 -05:00
Jeff Layton
b68d69b8c6 nfs: handle NFSv3 -EKEYEXPIRED errors as we would -EJUKEBOX
We're using -EKEYEXPIRED to indicate that a krb5 credcache contains an
expired ticket and that we should have the NFS layer retry the RPC call
instead of returning an error back to the caller. Handle this as we
would an -EJUKEBOX error return.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:51 -05:00
Jeff Layton
2c6434888c nfs4: handle -EKEYEXPIRED errors from RPC layer
If a KRB5 TGT ticket expires, we don't want to return an error
immediatel. If someone has a long running job and just forgets to run
"kinit" in time then this will make it fail.

Instead, we want to treat this situation as we would NFS4ERR_DELAY and
retry the upcall after delaying a bit with an exponential backoff.

This patch just makes any place that would handle NFS4ERR_DELAY also
handle -EKEYEXPIRED the same way. In the future, we may want to be more
sophisticated however and handle hard vs. soft mounts differently, or
specify some upper limit on how long we'll wait for a new TGT to be
acquired.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:50 -05:00
Jeff Layton
dc5ddce956 sunrpc: parse and return errors reported by gssd
The kernel currently ignores any error code sent by gssd and always
considers it to be -EACCES. In order to better handle the situation of
an expired KRB5 TGT, the kernel needs to be able to parse and deal with
the errors that gssd sends. Aside from -EACCES the only error we care
about is -EKEYEXPIRED, which we're using to indicate that the upper
layers should retry the call a little later.

To maintain backward compatibility with older gssd's, any error other
than -EKEYEXPIRED is interpreted as -EACCES.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:50 -05:00
Linus Torvalds
ac73fddfc5 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: fix some lockdep issues between md and sysfs.
  md: fix 'degraded' calculation when starting a reshape.
2010-02-09 17:01:26 -08:00
NeilBrown
ef286f6fa6 md: fix some lockdep issues between md and sysfs.
======
This fix is related to
    http://bugzilla.kernel.org/show_bug.cgi?id=15142
but does not address that exact issue.
======

sysfs does like attributes being removed while they are being accessed
(i.e. read or written) and waits for the access to complete.

As accessing some md attributes takes the same lock that is held while
removing those attributes a deadlock can occur.

This patch addresses 3 issues in md that could lead to this deadlock.

Two relate to calling flush_scheduled_work while the lock is held.
This is probably a bad idea in general and as we use schedule_work to
delete various sysfs objects it is particularly bad.

In one case flush_scheduled_work is called from md_alloc (called by
md_probe) called from do_md_run which holds the lock.  This call is
only present to ensure that ->gendisk is set.  However we can be sure
that gendisk is always set (though possibly we couldn't when that code
was originally written.  This is because do_md_run is called in three
different contexts:
  1/ from md_ioctl.  This requires that md_open has succeeded, and it
     fails if ->gendisk is not set.
  2/ from writing a sysfs attribute.  This can only happen if the
     mddev has been registered in sysfs which happens in md_alloc
     after ->gendisk has been set.
  3/ from autorun_array which is only called by autorun_devices, which
     checks for ->gendisk to be set before calling autorun_array.
So the call to md_probe in do_md_run can be removed, and the check on
->gendisk can also go.


In the other case flush_scheduled_work is being called in do_md_stop,
purportedly to wait for all md_delayed_delete calls (which delete the
component rdevs) to complete.  However there really isn't any need to
wait for them - they have already been disconnected in all important
ways.

The third issue is that raid5->stop() removes some attribute names
while the lock is held.  There is already some infrastructure in place
to delay attribute removal until after the lock is released (using
schedule_work).  So extend that infrastructure to remove the
raid5_attrs_group.

This does not address all lockdep issues related to the sysfs
"s_active" lock.  The rest can be address by splitting that lockdep
context between symlinks and non-symlinks which hopefully will happen.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-10 11:26:09 +11:00
Linus Torvalds
3af9cf11b6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix p9_client_destroy unconditional calling v9fs_put_trans
  9p: fix memory leak in v9fs_parse_options()
  9p: Fix the kernel crash on a failed mount
  9p: fix option parsing
  9p: Include fsync support for 9p client
  net/9p: fix statsize inside twstat
  net/9p: fail when user specifies a transport which we can't find
  net/9p: fix virtio transport to correctly update status on connect
2010-02-09 11:19:06 -08:00
NeilBrown
9eb07c2592 md: fix 'degraded' calculation when starting a reshape.
This code was written long ago when it was not possible to
reshape a degraded array.  Now it is so the current level of
degraded-ness needs to be taken in to account.  Also newly addded
devices should only reduce degradedness if they are deemed to be
in-sync.

In particular, if you convert a RAID5 to a RAID6, and increase the
number of devices at the same time, then the 5->6 conversion will
make the array degraded so the current code will produce a wrong
value for 'degraded' - "-1" to be precise.

If the reshape runs to completion end_reshape will calculate a correct
new value for 'degraded', but if a device fails during the reshape an
incorrect decision might be made based on the incorrect value of
"degraded".

This patch is suitable for 2.6.32-stable and if they are still open,
2.6.31-stable and 2.6.30-stable as well.

Cc: stable@kernel.org
Reported-by: Michael Evans <mjevans1983@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-09 16:34:29 +11:00
Linus Torvalds
deb0c98c7f Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
  Revert "nfsd4: fix error return when pseudoroot missing"
2010-02-08 17:08:01 -08:00
Eric Van Hensbergen
8781ff9495 9p: fix p9_client_destroy unconditional calling v9fs_put_trans
restructure client create code to handle error cases better and
only cleanup initialized portions of the stack.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 18:18:34 -06:00
Linus Torvalds
a5f28ae4df Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/cluster: Make o2net connect messages KERN_NOTICE
  ocfs2/dlm: Fix printing of lockname
  ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
  ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
  ocfs2: Plugs race between the dc thread and an unlock ast message
  ocfs2: Remove overzealous BUG_ON during blocked lock processing
  ocfs2: Do not downconvert if the lock level is already compatible
  ocfs2: Prevent a livelock in dlmglue
  ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
  ocfs2: Use compat_ptr in reflink_arguments.
  ocfs2/dlm: Handle EAGAIN for compatibility - v2
  ocfs2: Add parenthesis to wrap the check for O_DIRECT.
  ocfs2: Only bug out when page size is larger than cluster size.
  ocfs2: Fix memory overflow in cow_by_page.
  ocfs2/dlm: Print more messages during lock migration
  ocfs2/dlm: Ignore LVBs of locks in the Blocked list
  ocfs2/trivial: Remove trailing whitespaces
  ocfs2: fix a misleading variable name
  ocfs2: Sync max_inline_data_with_xattr from tools.
  ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
2010-02-08 16:05:50 -08:00
Eric Van Hensbergen
bf2d29c64d 9p: fix memory leak in v9fs_parse_options()
If match_strdup() fail this function exits without freeing the options string.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Sigend-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 17:59:34 -06:00
Aneesh Kumar K.V
fb786100f7 9p: Fix the kernel crash on a failed mount
The patch fix the crash repoted below

[   15.149907] BUG: unable to handle kernel NULL pointer dereference at 00000001
[   15.150806] IP: [<c140b886>] p9_virtio_close+0x18/0x24
.....
....
[   15.150806] Call Trace:
[   15.150806]  [<c1408e78>] ? p9_client_destroy+0x3f/0x163
[   15.150806]  [<c1409342>] ? p9_client_create+0x25f/0x270
[   15.150806]  [<c1063b72>] ? trace_hardirqs_on+0xb/0xd
[   15.150806]  [<c11ed4e8>] ? match_token+0x64/0x164
[   15.150806]  [<c1175e8d>] ? v9fs_session_init+0x2f1/0x3c8
[   15.150806]  [<c109cfc9>] ? kmem_cache_alloc+0x98/0xb8
[   15.150806]  [<c1063b72>] ? trace_hardirqs_on+0xb/0xd
[   15.150806]  [<c1173dd1>] ? v9fs_get_sb+0x47/0x1e8
[   15.150806]  [<c1173dea>] ? v9fs_get_sb+0x60/0x1e8
[   15.150806]  [<c10a2e77>] ? vfs_kern_mount+0x81/0x11a
[   15.150806]  [<c10a2f55>] ? do_kern_mount+0x33/0xbe
[   15.150806]  [<c10b40b9>] ? do_mount+0x654/0x6b3
[   15.150806]  [<c1038949>] ? do_page_fault+0x0/0x284
[   15.150806]  [<c10b28ec>] ? copy_mount_options+0x73/0xd2
[   15.150806]  [<c10b4179>] ? sys_mount+0x61/0x94
[   15.150806]  [<c14284e9>] ? syscall_call+0x7/0xb
....
[   15.203562] ---[ end trace 1dd159357709eb4b ]---
[

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 17:25:33 -06:00
Eric Van Hensbergen
d8c8a9e365 9p: fix option parsing
Options pointer is being moved before calling kfree() which seems
to cause problems.  This uses a separate pointer to track and free
original allocation.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
2010-02-08 16:23:23 -06:00
M. Mohan Kumar
7a4439c406 9p: Include fsync support for 9p client
Implement the fsync in the client side by marking stat field values to 'don't touch' so that server may 
interpret it as a request to guarantee that the contents of the associated file are committed to stable 
storage before the Rwstat message is returned.

Without this patch, calling fsync on a 9p file results in "Invalid argument" error. Please check the attached 
C program.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> 
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> 
Acked-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 15:36:48 -06:00
Linus Torvalds
8defcaa6ba Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix ondemand to not request targets outside policy limits
  [CPUFREQ] Fix use after free of struct powernow_k8_data
  [CPUFREQ] fix default value for ondemand governor
2010-02-08 13:33:31 -08:00
Sunil Mushran
6efd806634 ocfs2/cluster: Make o2net connect messages KERN_NOTICE
Connect and disconnect messages are more than informational as they are required
during root cause analysis for failures. This patch changes them from KERN_INFO
to KERN_NOTICE.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Faseh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08 13:02:28 -08:00
Sunil Mushran
86a06abab0 ocfs2/dlm: Fix printing of lockname
The debug call printing the name of the lock resource was chopping
off the last character. This patch fixes the problem.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08 13:01:31 -08:00
J. Bruce Fields
260c64d235 Revert "nfsd4: fix error return when pseudoroot missing"
Commit f39bde24b2 fixed the error return from PUTROOTFH in the
case where there is no pseudofilesystem.

This is really a case we shouldn't hit on a correctly configured server:
in the absence of a root filehandle, there's no point accepting version
4 NFS rpc calls at all.

But the shared responsibility between kernel and userspace here means
the kernel on its own can't eliminate the possiblity of this happening.
And we have indeed gotten this wrong in distro's, so new client-side
mount code that attempts to negotiate v4 by default first has to work
around this case.

Therefore when commit f39bde24b2 arrived at roughly the same
time as the new v4-default mount code, which explicitly checked only for
the previous error, the result was previously fine mounts suddenly
failing.

We'll fix both sides for now: revert the error change, and make the
client-side mount workaround more robust.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-08 15:25:23 -05:00
Eric Van Hensbergen
9d6939dac7 net/9p: fix statsize inside twstat
stat structures contain a size prefix.  In our twstat messages
we were including the size of the size prefix in the prefix, which is not
what the protocol wants, and Inferno servers would complain.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 14:13:30 -06:00
Eric Van Hensbergen
349d3bb878 net/9p: fail when user specifies a transport which we can't find
If the user specifies a transport and we can't find it, we failed back
to the default trainsport silently.  This patch will make the code
complain more loudly and return an error code.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 14:13:30 -06:00
Eric Van Hensbergen
562ada6120 net/9p: fix virtio transport to correctly update status on connect
The 9p virtio transport was not updating its connection status correctly
preventing it from being able to mount the server.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 14:13:30 -06:00
Linus Torvalds
08c4f1b096 Merge branch 'v4l_for_linus' of git://linuxtv.org/fixes
* 'v4l_for_linus' of git://linuxtv.org/fixes:
  V4L/DVB: dvb-core: fix initialization of feeds list in demux filter
  V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet
  V4L/DVB: Fix the risk of an oops at dvb_dmx_release
2010-02-08 11:07:10 -08:00
Linus Torvalds
2b1f5c3ac3 Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Invalidate dcache before enabling it
2010-02-08 10:10:36 -08:00
Linus Torvalds
9d2bc1a4cc Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pseries: Fix kexec regression caused by CPPR tracking
2010-02-08 10:10:18 -08:00
Linus Torvalds
8bd73803e1 Merge branch 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Remove superfluous setup_frame_reg call
  sh: Don't continue unwinding across interrupts
  sh: Setup frame pointer in handle_exception path
  sh: Correct the offset of the return address in ret_from_exception
  usb: r8a66597-hcd: Fix up spinlock recursion in root hub polling.
  usb: r8a66597-hcd: Flush the D-cache for the pipe-in transfer buffers.
2010-02-08 10:09:55 -08:00
Francesco Lavra
691c9ae099 V4L/DVB: dvb-core: fix initialization of feeds list in demux filter
A DVB demultiplexer device can be used to set up either a PES filter or
a section filter. In the former case, the ts field of the feed union of
struct dmxdev_filter is used, in the latter case the sec field of the
same union is used.
The ts field is a struct list_head, and is currently initialized in the
open() method of the demux device. When for a given demuxer a section
filter is set up, the sec field is played with, thus if a PES filter
needs to be set up after that the ts field will be corrupted, causing a
kernel oops.
This fix moves the list head initialization to
dvb_dmxdev_pes_filter_set(), so that the ts field is properly
initialized every time a PES filter is set up.

Signed-off-by: Francesco Lavra <francescolavra@interfree.it>
Cc: stable <stable@kernel.org>
Reviewed-by: Andy Walls <awalls@radix.net>
Tested-by: hermann pitton <hermann-pitton@arcor.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-02-08 10:47:17 -02:00
Mauro Carvalho Chehab
bc081cc869 V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet
As dvb_dmx_swfilter_packet() is protected by a spinlock, it shouldn't sleep.
However, vmalloc() may call sleep. So, move the initialization of
dvb_demux::cnt_storage field to a better place.

Reviewed-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-02-08 10:46:58 -02:00
Mauro Carvalho Chehab
adefdceef4 V4L/DVB: Fix the risk of an oops at dvb_dmx_release
dvb_dmx_init tries to allocate virtual memory for 2 pointers: filter and feed.

If the second vmalloc fails, filter is freed, but the pointer keeps pointing
to the old place. Later, when dvb_dmx_release() is called, it will try to
free an already freed memory, causing an OOPS.

Reviewed-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-02-08 10:45:24 -02:00
Michal Simek
a601341111 microblaze: Invalidate dcache before enabling it
We found that on write-trough kernel is necessary to do that invalidation.
One WB is possible to use invalidation too.

Signed-off-by: Michal Simek <monstr@monstr.eu>
2010-02-08 11:39:18 +01:00
Mark Nelson
36350e0069 powerpc/pseries: Fix kexec regression caused by CPPR tracking
The code to track the CPPR values added by commit
49bd364713 ("powerpc/pseries: Track previous
CPPR values to correctly EOI interrupts") broke kexec on pseries because
the kexec code in xics.c calls xics_set_cpu_priority() before the IPI has
been EOI'ed. This wasn't a problem previously but it now triggers a BUG_ON
in xics_set_cpu_priority() because os_cppr->index isn't 0.

Fix this problem by setting the index on the CPPR stack to 0 before calling
xics_set_cpu_priority() in xics_teardown_cpu().

Also make it clear that we only want to set the priority when there's just
one CPPR value in the stack, and enforce it by updating the value of
os_cppr->stack[0] rather than os_cppr->stack[os_cppr->index].

While we're at it change the BUG_ON to a WARN_ON.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-08 15:29:19 +11:00
Matt Fleming
1af0b2fc67 sh: Remove superfluous setup_frame_reg call
There's no need to setup the frame pointer again in
call_handle_tlbmiss. The frame pointer will already have been setup in
handle_interrupt.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-08 10:47:11 +09:00
Matt Fleming
944a343861 sh: Don't continue unwinding across interrupts
Unfortunately, due to poor DWARF info in current toolchains, unwinding
through interrutps cannot be done reliably. The problem is that the
DWARF info for function epilogues is wrong.

Take this standard epilogue sequence,

80003cc4:       e3 6f           mov     r14,r15
80003cc6:       26 4f           lds.l   @r15+,pr
80003cc8:       f6 6e           mov.l   @r15+,r14
						<---- interrupt here
80003cca:       f6 6b           mov.l   @r15+,r11
80003ccc:       f6 6a           mov.l   @r15+,r10
80003cce:       f6 69           mov.l   @r15+,r9
80003cd0:       0b 00           rts

If we take an interrupt at the highlighted point, the DWARF info will
bogusly claim that the return address can be found at some offset from
the frame pointer, even though the frame pointer was just restored. The
worst part is if the unwinder finds a text address at the bogus stack
address - unwinding will continue, for a bit, until it finally comes
across an unexpected address on the stack and blows up.

The only solution is to stop unwinding once we've calculated the
function that was executing when the interrupt occurred. This PC can be
easily calculated from pt_regs->pc.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-08 10:47:04 +09:00
Matt Fleming
1dca56f138 sh: Setup frame pointer in handle_exception path
In order to allow the DWARF unwinder to unwind through exceptions we
need to setup the frame pointer register (r14).

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-08 10:46:53 +09:00
Matt Fleming
142698282c sh: Correct the offset of the return address in ret_from_exception
The address that ret_from_exception and ret_from_irq will return to is
found in the stack slot for SPC, not PR. This error was causing the
DWARF unwinder to pick up the wrong return address on the stack and then
unwind using the unwind tables for the wrong function.

While I'm here I might as well add CFI annotations for the other
registers since they could be useful when unwinding.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-02-08 10:46:46 +09:00
Linus Torvalds
6339204ecc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Take ima_file_free() to proper place.
  ima: rename PATH_CHECK to FILE_CHECK
  ima: rename ima_path_check to ima_file_check
  ima: initialize ima before inodes can be allocated
  fix ima breakage
  Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
  freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
  befs: fix leak
2010-02-07 11:18:28 -08:00
Linus Torvalds
80e1e82398 Fix race in tty_fasync() properly
This reverts commit 7036251180 ("tty: fix race in tty_fasync") and
commit b04da8bfdf ("fnctl: f_modown should call write_lock_irqsave/
restore") that tried to fix up some of the fallout but was incomplete.

It turns out that we really cannot hold 'tty->ctrl_lock' over calling
__f_setown, because not only did that cause problems with interrupt
disables (which the second commit fixed), it also causes a potential
ABBA deadlock due to lock ordering.

Thanks to Tetsuo Handa for following up on the issue, and running
lockdep to show the problem.  It goes roughly like this:

 - f_getown gets filp->f_owner.lock for reading without interrupts
   disabled, so an interrupt that happens while that lock is held can
   cause a lockdep chain from f_owner.lock -> sighand->siglock.

 - at the same time, the tty->ctrl_lock -> f_owner.lock chain that
   commit 7036251180 introduced, together with the pre-existing
   sighand->siglock -> tty->ctrl_lock chain means that we have a lock
   dependency the other way too.

So instead of extending tty->ctrl_lock over the whole __f_setown() call,
we now just take a reference to the 'pid' structure while holding the
lock, and then release it after having done the __f_setown.  That still
guarantees that 'struct pid' won't go away from under us, which is all
we really ever needed.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-07 10:26:01 -08:00
Al Viro
89068c576b Take ima_file_free() to proper place.
Hooks: Just Say No.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:07:29 -05:00
Mimi Zohar
1e93d0052d ima: rename PATH_CHECK to FILE_CHECK
With the movement of the ima hooks functions were renamed from *path* to
*file* since they always deal with struct file.  This patch renames some of
the ima internal flags to make them consistent with the rest of the code.

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:23 -05:00
Mimi Zohar
9bbb6cad01 ima: rename ima_path_check to ima_file_check
ima_path_check actually deals with files!  call it ima_file_check instead.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00
Eric Paris
54bb6552bd ima: initialize ima before inodes can be allocated
ima wants to create an inode information struct (iint) when inodes are
allocated.  This means that at least the part of ima which does this
allocation (the allocation is filled with information later) should
before any inodes are created.  To accomplish this we split the ima
initialization routine placing the kmem cache allocator inside a
security_initcall() function.  Since this makes use of radix trees we also
need to make sure that is initialized before security_initcall().

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-02-07 03:06:22 -05:00