Commit graph

27109 commits

Author SHA1 Message Date
Chuck Lever
c3607282b4 NFS: Don't swap bytes in nfs4_construct_boot_verifier()
The SETCLIENTID boot verifier is opaque to NFSv4 servers, thus there
is no requirement for byte swapping before the client puts the
verifier on the wire.

This treatment is similar to other timestamp-based verifiers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:42 -04:00
Bryan Schumaker
497826af60 NFS: Fix compiler warnings
The "struct inode *inode" was only used in a dprintk, so compiling with
CONFIG_SUNRPC_DEBUG off triggers a warning.  To get around this, I
remove the "struct inode *inode" variable and instead change the
dprintk()s to use hdr->inode instead.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:43:04 -04:00
Andy Adamson
bd4aeffb5b NFSv4.1 skip rpc_call_done only on disconnected DS slot_table_waitq tasks
We reset all I/O on a disconnected data server through the pgio layer indicated
by the NFS_IOHDR_REDO flag.

Differentiate between on-the-wire tasks returning with an error which must
call rpc_call_done and tasks woken from the data server slot_table_waitq
waiting for a session slot with a status of zero which call rpc_exit in
rpc_prepare and need to skip rpc_call_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:42 -04:00
Andy Adamson
996074cb8c NFSv4.1 Just use nfs_put_client in filelayout release
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:37 -04:00
Andy Adamson
d42e78737c NFSv4.1 fix null state reference in filelayout_async_handle_error
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:28 -04:00
Trond Myklebust
53b8ee3464 NFSv4.1: Fix a bad reference count issue in the pNFS commit code
filelayout_scan_commit_lists needs to bump the reference count on
the struct nfs_page just like nfs_scan_commit_list().

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:36:27 -04:00
Rafael J. Wysocki
8714c8d74d Merge branch 'pm-sleep'
* pm-sleep:
  epoll: Fix user space breakage related to EPOLLWAKEUP
2012-05-22 20:57:19 +02:00
Rafael J. Wysocki
a8159414d7 epoll: Fix user space breakage related to EPOLLWAKEUP
Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
suspend while epoll events are ready) caused some applications to
malfunction, because they set the bit corresponding to the new
EPOLLWAKEUP flag in their eventpoll flags and they don't have the
new CAP_EPOLLWAKEUP capability.

To prevent that from happening, change epoll_ctl() to clear
EPOLLWAKEUP in epds.events if the caller doesn't have the
CAP_EPOLLWAKEUP capability instead of failing and returning an
error code, which allows the affected applications to function
normally.

Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-05-22 20:57:06 +02:00
Linus Torvalds
cb60e3e65c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "New notable features:
   - The seccomp work from Will Drewry
   - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
   - Longer security labels for Smack from Casey Schaufler
   - Additional ptrace restriction modes for Yama by Kees Cook"

Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
  apparmor: fix long path failure due to disconnected path
  apparmor: fix profile lookup for unconfined
  ima: fix filename hint to reflect script interpreter name
  KEYS: Don't check for NULL key pointer in key_validate()
  Smack: allow for significantly longer Smack labels v4
  gfp flags for security_inode_alloc()?
  Smack: recursive tramsmute
  Yama: replace capable() with ns_capable()
  TOMOYO: Accept manager programs which do not start with / .
  KEYS: Add invalidation support
  KEYS: Do LRU discard in full keyrings
  KEYS: Permit in-place link replacement in keyring list
  KEYS: Perform RCU synchronisation on keys prior to key destruction
  KEYS: Announce key type (un)registration
  KEYS: Reorganise keys Makefile
  KEYS: Move the key config into security/keys/Kconfig
  KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
  Yama: remove an unused variable
  samples/seccomp: fix dependencies on arch macros
  Yama: add additional ptrace scopes
  ...
2012-05-21 20:27:36 -07:00
Linus Torvalds
62c8d92278 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 changes from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
  GFS2: Fix quota adjustment return code
  GFS2: Add rgrp information to block_alloc trace point
  GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
  GFS2: Update glock doc to add new stats info
  GFS2: Update main gfs2 doc
  GFS2: Remove redundant metadata block type check
  GFS2: Fix sgid propagation when using ACLs
  GFS2: eliminate log elements and simplify
  GFS2: Eliminate vestigial sd_log_le_rg
  GFS2: Eliminate needless parameter from function gfs2_setbit
  GFS2: Log code fixes
  GFS2: Remove unused argument from gfs2_internal_read
  GFS2: Remove bd_list_tr
  GFS2: Remove duplicate log code
  GFS2: Clean up log write code path
  GFS2: Use variable rather than qa to determine if unstuff necessary
  GFS2: Change variable blk to biblk
  GFS2: Fix function parameter comments in rgrp.c
  GFS2: Eliminate offset parameter to gfs2_setbit
  GFS2: Use slab for block reservation memory
  ...
2012-05-21 19:21:20 -07:00
Linus Torvalds
2e321806b6 Revert "vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu"
This reverts commit 8c01a529b8.

It turns out the d_unhashed() check isn't unnecessary after all: while
it's true that unhashing will increment the sequence numbers, that does
not necessarily invalidate the RCU lookup, because it might have seen
the dentry pointer (before it got unhashed), but by the time it loaded
the sequence number, it could have seen the *new* sequence number (after
it got unhashed).

End result: we might look up an unhashed dentry that is about to be
freed, with the sequence number never indicating anything bad about it.
So checking that the dentry is still hashed (*after* reading the sequence
number) is indeed the proper fix, and was never unnecessary.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21 18:48:10 -07:00
James Morris
ff2bb047c4 Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next
Per pull request, for 3.5.
2012-05-22 11:21:06 +10:00
Linus Torvalds
6326c71fd2 vfs: be even more careful about dentry RCU name lookups
Miklos Szeredi points out that we need to also worry about memory
odering when doing the dentry name comparison asynchronously with RCU.

In particular, doing a rename can do a memcpy() of one dentry name over
another, and we want to make sure that any unlocked reader will always
see the proper terminating NUL character, so that it won't ever run off
the allocation.

Rather than having to be extra careful with the name copy or at lookup
time for each character, this resolves the issue by making sure that all
names that are inlined in the dentry always have a NUL character at the
end of the name allocation.  If we do that at dentry allocation time, we
know that no future name copy will ever change that final NUL to
anything else, so there are no memory ordering issues.

So even if a concurrent rename ends up overwriting the NUL character
that terminates the original name, we always know that there is one
final NUL at the end, and there is no worry about the lockless RCU
lookup traversing the name too far.

The out-of-line allocations are never copied over, so we can just make
sure that we write the name (with terminating NULL) and do a write
barrier before we expose the name to anything else by setting it in the
dentry.

Reported-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21 16:14:04 -07:00
Linus Torvalds
a70b52ec1a vfs: make AIO use the proper rw_verify_area() area helpers
We had for some reason overlooked the AIO interface, and it didn't use
the proper rw_verify_area() helper function that checks (for example)
mandatory locking on the file, and that the size of the access doesn't
cause us to overflow the provided offset limits etc.

Instead, AIO did just the security_file_permission() thing (that
rw_verify_area() also does) directly.

This fixes it to do all the proper helper functions, which not only
means that now mandatory file locking works with AIO too, we can
actually remove lines of code.

Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-21 16:06:20 -07:00
Linus Torvalds
cd975ae0ce Clean up some c6x Kconfig items and add support for Elf FDPIC loader.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPunCjAAoJEOiN4VijXeFPR3cP+gLDhWJGWC/xnI+vr1WJJxZU
 xJQbaopEyqJa7GBT+3BasCCLUCU7MVIrYaqubzEI4dzVCMDxwuJYPT7m/N3Avssg
 bm1ZaLUnanc8ZLfLHlnsG02xtBtEYI4mUM7ggCfcDJJC0b7tDGcz0kiM2WDGK0fq
 LRbZmlC57qK0UTx2IEvl0DoMuH7pq7N1sSXFr6keBm/lqN7Qj3VtAf/ASVQdIgN8
 TfQiWbu40v8EEcKAZSkVKvhS3kSiPI9Dr1DGeR1JIhjFqFYUaw52aOzNTJyC9k1g
 4oPIoDh0xnSzYrYpetP1gN5bT/FMRcVXLvKM7cIbeUzoAlgnQsBDAGA7x205X8Bf
 ChrQEpm/yKZnwEBtp9uIEQY3rRT7iecqVgeVO5LXskg+/OX0+gO5CtCDPopWzqJN
 wSOCsP9va0G9W0a8G5hKqEGYhbCXDkzU6KzmcgxbxIeF+CvJ+72mWgqxJkF0GtuU
 KCoLQT7Necq+4p/SaiXL7KogS9m2rClbszManceyAYSGwqAPIfU3RzUw+Fz6NcIh
 DrYXNTZelrVN2wrVrlhn3B6GdgfyVabhQ1e3CftmAtbY/i+i/ArHCT9JxjyAHTuV
 ekTnpVVDBZUCUluJXFJKzrvRuSUt9X18AxK0NfJ+xglOPQnPh026K/sCuvmn6mAq
 U1f1XmCqG2l2TDtkZ6Wu
 =q4Z6
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming

Pull c6x updates from Mark Salter:
 "Clean up some c6x Kconfig items and add support for Elf FDPIC loader."

* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
  C6X: remove unused config items
  C6X: add support to build with BINFMT_ELF_FDPIC
  C6X: change main arch kbuild symbol
2012-05-21 12:46:48 -07:00
Linus Torvalds
cb62ab71fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

 1) Get rid of the error prone NLA_PUT*() macros that used an embedded
    goto.

 2) Kill off the token-ring and MCA networking drivers, from Paul
    Gortmaker.

 3) Reduce high-order allocations made by datagram AF_UNIX sockets, from
    Eric Dumazet.

 4) Add PTP hardware clock support to IGB and IXGBE, from Richard
    Cochran and Jacob Keller.

 5) Allow users to query timestamping capabilities of a card via
    ethtool, from Richard Cochran.

 6) Add loadbalance mode to the teaming driver, from Jiri Pirko.  Part
    of this is that we can now have BPF filters not attached to sockets,
    and the loadbalancing function is calculated using one.

 7) Francois Romieu went through the network drivers removing gratuitous
    uses of netdev->base_addr, perhaps some day we can remove it
    completely but it's used for ISA probing still.

 8) Add a BPF JIT for sparc.  I know, who cares, right? :-)

 9) Move networking sysctl registry away from using the compatability
    mode interfaces in the sysctl code.  From Eric W Biederman.

10) Pavel Emelyanov added a way to save and restore TCP socket state via
    TCP_REPAIR, TCP_REPAIR_QUEUE, and TCP_QUEUE_SEQ socket options as
    well as a way to forcefully bind a socket to a port via the
    sk->sk_reuse value SK_FORCE_REUSE.  There is also a
    TCP_REPAIR_OPTIONS which allows to reinstante the TCP options
    enabled on the connection.

11) Several enhancements from Eric Dumazet that, in particular, can
    enhance splice performance on TCP sockets significantly.

     a) Reset the offset of the per-socket sendmsg page when we know
        we're the only use of the page in linear_to_page().

     b) Add facilities such that skb->data can be backed a page rather
        than SLAB kmalloc'd memory.  In particular devices which were
        receiving into linear RX buffers can now end up providing paged
        data.

    The big result is that code like splice and GRO do not have to copy
    any more.

12) Allow a pure sender to more gracefully handle ACK backlogs in TCP.
    What can happen at high rates is that the sender hasn't grown his
    receive buffer limits at all (he's not receiving data so really
    doesn't need to), but the non-data ACKs consume receive buffer
    space.

    sk_add_backlog() is too aggressive in dropping frames in this case,
    so relax it's requirements by using the receive buffer plus the send
    buffer limit as the backlog limit instead of just the former.

    Also from Eric Dumazet.

13) Add ipv6 support to L2TP, from Benjamin LaHaise, James Chapman, and
    Chris Elston.

14) Implement TCP early retransmit (RFC 5827), from Yuchung Cheng.
    Basically, we can start fast retransmit before hiting the dupack
    threshold under certain conditions.

15) New CODEL active queue management packet scheduler, from Eric
    Dumazet based upon initial work by Dave Taht.

    Basically, the big feature is that packets are dropped (or ECN bits
    are set) based upon how long packets live in the queue, rather than
    the queue length (which is what RED uses).

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1341 commits)
  drivers/net/stmmac: seq_file fix memory leak
  ipv6/exthdrs: strict Pad1 and PadN check
  USB: qmi_wwan: Add ZTE (Vodafone) K3520-Z
  USB: qmi_wwan: Add ZTE (Vodafone) K3765-Z
  USB: qmi_wwan: Make forced int 4 whitelist generic
  net/ipv4: replace simple_strtoul with kstrtoul
  net/ipv4/ipconfig: neaten __setup placement
  net: qmi_wwan: Add Vodafone/Huawei K5005 support
  net: cdc_ether: Add ZTE WWAN matches before generic Ethernet
  ipv6: use skb coalescing in reassembly
  ipv4: use skb coalescing in defragmentation
  net: introduce skb_try_coalesce()
  net:ipv6:fixed space issues relating to operators.
  net:ipv6:fixed a trailing white space issue.
  ipv6: disable GSO on sockets hitting dst_allfrag
  tg3: use netdev_alloc_frag() API
  net: napi_frags_skb() is static
  ppp: avoid false drop_monitor false positives
  ipv6: bool/const conversions phase2
  ipx: Remove spurious NULL checking in ipx_ioctl().
  ...
2012-05-21 10:03:46 -07:00
Linus Torvalds
31ed8e6f93 Merge branch 'dentry-cleanups' (dcache access cleanups and optimizations)
This branch simplifies and clarifies the dcache lookup, and allows us to
do certain nice optimizations when comparing dentries.  It also cleans
up the interface to __d_lookup_rcu(), especially around passing the
inode information around.

* dentry-cleanups:
  vfs: make it possible to access the dentry hash/len as one 64-bit entry
  vfs: move dentry name length comparison from dentry_cmp() into callers
  vfs: do the careful dentry name access for all dentry_cmp cases
  vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu
  vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
2012-05-21 08:50:57 -07:00
Linus Torvalds
7e5cb5e151 Merge branch 'vfs-cleanups' (random vfs cleanups)
This teaches vfs_fstat() to use the appropriate f[get|put]_light
functions, allowing it to avoid some unnecessary locking for the common
case.

More noticeably, it also cleans up and simplifies the "getname_flags()"
function, which now relies on the architecture strncpy_from_user() doing
all the user access checks properly, instead of hacking around the fact
that on x86 it didn't use to do it right (see commit 92ae03f2ef: "x86:
merge 32/64-bit versions of 'strncpy_from_user()' and speed it up").

* vfs-cleanups:
  VFS: make vfs_fstat() use f[get|put]_light()
  VFS: clean up and simplify getname_flags()
  x86: make word-at-a-time strncpy_from_user clear bytes at the end
2012-05-21 08:46:08 -07:00
Dave Chinner
14c26c6a05 xfs: add trace points for log forces
To enable easy tracing of the location of log forces and the
frequency of them via perf, add a pair of trace points to the log
force functions.  This will help debug where excessive log forces
are being issued from by simple perf commands like:

# ~/perf/perf top -e xfs:xfs_log_force -G -U

Which gives this sort of output:

Events: 141  xfs:xfs_log_force
-  100.00%  [kernel]  [k] xfs_log_force
   - xfs_log_force
        87.04% xfsaild
           kthread
           kernel_thread_helper
      - 12.87% xfs_buf_lock
           _xfs_buf_find
           xfs_buf_get
           xfs_trans_get_buf
           xfs_da_do_buf
           xfs_da_get_buf
           xfs_dir2_data_init
           xfs_dir2_leaf_addname
           xfs_dir_createname
           xfs_create
           xfs_vn_mknod
           xfs_vn_create
           vfs_create
           do_last.isra.41
           path_openat
           do_filp_open
           do_sys_open
           sys_open
           system_call_fastpath

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sig.com>
2012-05-21 10:45:44 -05:00
Peter Watkins
3ba3160374 xfs: fix memory reclaim deadlock on agi buffer
Note xfs_iget can be called while holding a locked agi buffer. If
it goes into memory reclaim then inode teardown may try to lock the
same buffer. Prevent the deadlock by calling radix_tree_preload
with GFP_NOFS.

Signed-off-by: Peter Watkins <treestem@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-21 10:45:44 -05:00
Dave Chinner
ea562ed6e7 xfs: fix delalloc quota accounting on failure
xfstest 270 was causing quota reservations way beyond what was sane
(ten to hundreds of TB) for a 4GB filesystem. There's a sign problem
in the error handling path of xfs_bmapi_reserve_delalloc() because
xfs_trans_unreserve_quota_nblks() simple negates the value passed -
which doesn't work for an unsigned variable. This causes
reservations of close to 2^32 block instead of removing a
reservation of a handful of blocks.

Fix the same problem in the other xfs_trans_unreserve_quota_nblks()
callers where unsigned integer variables are used, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-21 10:45:43 -05:00
Linus Torvalds
8c12fec90c Merge branch 'stat-cleanups' (clean up copying of stat info to user space)
This makes cp_new_stat() a bit more readable, and avoids having to
memset() the whole structure just to fill in a couple of padding fields.

This is another result of me looking at code generation of functions
that show up high on certain kernel profiles, and just going "Oh, let's
just clean that up".

Architectures that don't supply the #define to fill just the padding
fields will still fall back to memset().

* stat-cleanups:
  vfs: don't force a big memset of stat data just to clear padding fields
  vfs: de-crapify "cp_new_stat()" function
2012-05-21 08:41:38 -07:00
Trond Myklebust
b3f87b98aa Merge branch 'bugfixes' into nfs-for-next 2012-05-21 10:12:39 -04:00
Sachin Bhamare
8b56a30caa exofs: Add SYSFS info for autologin/pNFS export
Introduce sysfs infrastructure for exofs cluster filesystem.

Each OSD target shows up as below in the sysfs hierarchy:
	/sys/fs/exofs/<osdname>_<partition_id>/devX

Where <osdname>_<partition_id> is the unique identification
of a Superblock.

Where devX: 0 <= X < device_table_size. They are ordered
in device-table order as specified to the mkfs.exofs command

Each OSD device  devX has following attributes :
	osdname - ReadOnly
	systemid - ReadOnly
	uri - Read/Write

It is up to user-mode to update devX/uri for support of
autologin.

These sysfs information are used both for autologin as well
as support for exporting exofs via a pNFSD server in user-mode.
(.eg NFS-Ganesha)

Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-05-21 12:24:01 +03:00
David S. Miller
17eea0df5f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-20 21:53:04 -04:00
Artem Bityutskiy
4415626732 UBI: amend commentaries WRT dtype
Richard removed the "dtype" hint, but few commentaries were left and this patch
removes them. I've also added a better description about the "dtype" field in
the ubi-user.h for people who may ever wonder what was that dtype thing about.

This patch also adds an important note that it is better to use value "3" for
the "dtype" field.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-20 20:25:59 +03:00
Richard Weinberger
b36a261e8c UBI: Kill data type hint
We do not need this feature and to our shame it even was not working
and there was a bug found very recently.
	-- Artem Bityutskiy

Without the data type hint UBI2 (fastmap) will be easier to implement.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-20 20:25:59 +03:00
Sidney Amani
56b04e3e8b UBIFS: fix memory leak on error path
UBIFS leaks memory on error path in 'mount_ubifs()'. In case of failure in
'ubifs_fixup_free_space()', it does not call 'ubifs_lpt_free()' whereas LPT
data structures can potentially be allocated. The amount of memory leaked can
be quite high -- see 'ubifs_lpt_init()'.

The bug was introduced when moving the LPT initialisation earlier in the
mount process (commit '781c5717a95a74b294beb38b8276943b0f8b5bb4').

Signed-off-by: Sidney Amani <seed95@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-20 20:19:08 +03:00
Artem Bityutskiy
4994297606 UBIFS: make ubifs_lpt_init clean-up in case of failure
Most functions in UBIFS follow the following designn pattern: if the function
allocates multiple resources, and failss at some point, it frees what it has
allocated and returns an error. So the caller can rely on the fact that the
callee has cleaned up everything after own failure.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Sidney Amani <seed95@gmail.com>
2012-05-20 20:19:01 +03:00
Boaz Harrosh
6abe4a87f7 exofs: Fix CRASH on very early IO errors.
If at exofs_fill_super() we had an early termination
do to any error, like an IO error while reading the
super-block. We would crash inside exofs_free_sbi().

This is because sbi->oc.numdevs was set to 1, before
we actually have a device table at all.

Fix it by moving the sbi->oc.numdevs = 1 to after the
allocation of the device table.

Reported-by: Johannes Schild <JSchild@gmx.de>

Stable: This is a bug since v3.2.0
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-05-20 19:42:41 +03:00
Andy Adamson
041245c88a NFSv4.1 resend LAYOUTGET on data server invalid layout errors
The "invalid layout" class of errors is handled by destroying the layout and
getting a new layout from the server.  Currently, the layout must be
destroyed before a new layout can be obtained.

This means that all references (e.g.lsegs) to the "to be destroyed" layout
header must be dropped before it can be destroyed. This in turn means waiting
for all in flight RPC's using the old layout as well as draining the data
server session slot table wait queue.

Set the NFS_LAYOUT_INVALID flag to redirect I/O to the MDS while waiting for
the old layout to be destroyed.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:33 -04:00
Andy Adamson
b4a2967e52 NFSv4.1 dereference a disconnected data server client record
When the last DS io is processed, the data server client record will be
freed.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:32 -04:00
Andy Adamson
3a7936c3fc NFSv4.1 ref count nfs_client across filelayout data server io
Prepare to put a dis-connected DS client record.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:32 -04:00
Andy Adamson
0a57cdac3f NFSv4.1 send layoutreturn to fence disconnected data server
Let the MDS know that you are redirecting I/O from pNFS to MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:31 -04:00
Andy Adamson
671fb89695 NFSv4.1 wake up all tasks on un-connected DS slot table waitq
The DS has a connection error (invalid deviceid). Drain the fore channel
slot table waitq.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:31 -04:00
Andy Adamson
0ad2f378e1 NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
Tasks sleeping on the slot table waitq wake to the rpc_prepare_task state.
Reset the task for io through the MDS if the deviceid is invalid.

The reset functions put the io pages through the pageio layer which has the
advantage of re-coalescing which allows for the MDS and DS having different
r/wsizes. Exit the awakened task without executing the rpc_call_done routine.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:31 -04:00
Andy Adamson
a033a09189 NFSv4.1 remove nfs4_reset_write and nfs4_reset_read
Replaced by filelayout_reset_write and filelayout_reset_read

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:59 -04:00
Andy Adamson
e7dd79af01 NFSv4.1: mark deviceid invalid on filelayout DS connection errors
This prevents the use of any layout for i/o that references the deviceid.
I/O is redirected through the MDS.

Redirect the unhandled failed I/O to the MDS without marking either the
layout or the deviceid invalid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:20 -04:00
Andy Adamson
98fc685ae2 NFSv4.1 data server timeo and retrans module parameters
Set the recovery parameters for data servers.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:20 -04:00
Andy Adamson
9f0ec176b3 NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
RPC_TASK_SOFTCONN returns connection errors to the caller which allows the pNFS
file layout to quickly try the MDS or perhaps another DS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:19 -04:00
Andy Adamson
90fecfcb34 NFSv4.1 cleanup filelayout invalid layout handling
The invalid layout bits are should only be used to block LAYOUTGETs.

Do not invalidate a layout on deviceid invalidation.
Do not invalidate a layout on un-handled READ, WRITE, COMMIT errors.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:19 -04:00
Andy Adamson
554d458d79 NFSv4.1: cleanup filelayout invalid deviceid handling
Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the
filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY
is no longer needed.
Remove redundant printk's - filelayout_mark_devid_invalid prints a KERN_WARNING.

An invalid device prevents pNFS io.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:18 -04:00
Matthew Treinish
e73e6c9e85 Fixed goto readability in nfs_update_inode.
Simplified error gotos to make it slightly easier to read,
it doesn't affect the functionality of the routine.

Signed-off-by: Matthew Treinish <treinish@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:10:10 -04:00
Linus Torvalds
14e931a264 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A few small, but important fixes.  Most of them are marked for stable
  as well

   - Fix failure to release a semaphore on error path in mtip32xx.
   - Fix crashable condition in bio_get_nr_vecs().
   - Don't mark end-of-disk buffers as mapped, limit it to i_size.
   - Fix for build problem with CONFIG_BLOCK=n on arm at least.
   - Fix for a buffer overlow on UUID partition printing.
   - Trivial removal of unused variables in dac960."

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix buffer overflow when printing partition UUIDs
  Fix blkdev.h build errors when BLOCK=n
  bio allocation failure due to bio_get_nr_vecs()
  block: don't mark buffers beyond end of disk as mapped
  mtip32xx: release the semaphore on an error path
  dac960: Remove unused variables from DAC960_CreateProcEntries()
2012-05-19 10:12:17 -07:00
Linus Torvalds
73f1f5dd3e Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc fixes from Andrew Morton.

* emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches)
  frv: delete incorrect task prototypes causing compile fail
  slub: missing test for partial pages flush work in flush_all()
  fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
  drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
2012-05-18 15:56:25 -07:00
Linus Torvalds
30a08bf2d3 proc: move fd symlink i_mode calculations into tid_fd_revalidate()
Instead of doing the i_mode calculations at proc_fd_instantiate() time,
move them into tid_fd_revalidate(), which is where the other inode state
(notably uid/gid information) is updated too.

Otherwise we'll end up with stale i_mode information if an fd is re-used
while the dentry still hangs around.  Not that anything really *cares*
(symlink permissions don't really matter), but Tetsuo Handa noticed that
the owner read/write bits don't always match the state of the
readability of the file descriptor, and we _used_ to get this right a
long time ago in a galaxy far, far away.

Besides, aside from fixing an ugly detail (that has apparently been this
way since commit 61a2878402: "proc: Remove the hard coded inode
numbers" in 2006), this removes more lines of code than it adds.  And it
just makes sense to update i_mode in the same place we update i_uid/gid.

Al Viro correctly points out that we could just do the inode fill in the
inode iops ->getattr() function instead.  However, that does require
somewhat slightly more invasive changes, and adds yet *another* lookup
of the file descriptor.  We need to do the revalidate() for other
reasons anyway, and have the file descriptor handy, so we might as well
fill in the information at this point.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-18 14:06:17 -07:00
Wang Sheng-Hui
0324876628 ext2: trivial fix to comment for ext2_free_blocks
The function is ext2_free_blocks(), not ext2_free_blocks_sb().

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-18 15:23:08 +02:00
Cyrill Gorcunov
eb94cd96e0 fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
map_files/ entries are never supposed to be executed, still curious
minds might try to run them, which leads to the following deadlock

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.4.0-rc4-24406-g841e6a6 #121 Not tainted
  -------------------------------------------------------
  bash/1556 is trying to acquire lock:
   (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1

  but task is already holding lock:
   (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (&sig->cred_guard_mutex){+.+.+.}:
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_killable_nested+0x40/0x45
         lock_trace+0x24/0x59
         proc_map_files_lookup+0x5a/0x165
         __lookup_hash+0x52/0x73
         do_lookup+0x276/0x2b1
         walk_component+0x3d/0x114
         do_last+0xfc/0x540
         path_openat+0xd3/0x306
         do_filp_open+0x3d/0x89
         do_sys_open+0x74/0x106
         sys_open+0x21/0x23
         tracesys+0xdd/0xe2

  -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
         check_prev_add+0x6a/0x1ef
         validate_chain+0x444/0x4f4
         __lock_acquire+0x387/0x3f8
         lock_acquire+0x12b/0x158
         __mutex_lock_common+0x56/0x3a9
         mutex_lock_nested+0x40/0x45
         do_lookup+0x267/0x2b1
         walk_component+0x3d/0x114
         link_path_walk+0x1f9/0x48f
         path_openat+0xb6/0x306
         do_filp_open+0x3d/0x89
         open_exec+0x25/0xa0
         do_execve_common+0xea/0x2f9
         do_execve+0x43/0x45
         sys_execve+0x43/0x5a
         stub_execve+0x6c/0xc0

This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
and when do_lookup happens we try to grab task->signal->cred_guard_mutex
again in lock_trace.

Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
and in proc_map_files_readdir() instead of lock_trace(), the caller must
be CAP_SYS_ADMIN granted anyway.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-17 18:00:51 -07:00
Anton Vorontsov
39eb7e9791 pstore/ram: Add ECC support
This is now straightforward: just introduce a module parameter and pass
the needed value to persistent_ram_new().

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Marco Stornelli <marco.stornelli@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-17 08:51:59 -07:00
Anton Vorontsov
896fc1f0c4 pstore/ram: Switch to persistent_ram routines
The patch switches pstore RAM backend to use persistent_ram routines,
one step closer to the ECC support.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-17 08:51:41 -07:00
Anton Vorontsov
cddb8751c8 staging: android: persistent_ram: Move to fs/pstore/ram_core.c
This is a first step for adding ECC support for pstore RAM backend: we
will use the persistent_ram routines, kindly provided by Google.

Basically, persistent_ram is a set of helper routines to deal with the
[optionally] ECC-protected persistent ram regions.

A bit of Makefile, Kconfig and header files adjustments were needed
because of the move.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-17 08:50:00 -07:00
Shirish Pargaonkar
2608bee744 cifs: Include backup intent search flags during searches {try #2)
As observed and suggested by Tushar Gosavi...

---------
readdir calls these function to send TRANS2_FIND_FIRST and
TRANS2_FIND_NEXT command to the server. The current cifs module is
not specifying CIFS_SEARCH_BACKUP_SEARCH flag while sending these
command when backupuid/backupgid is specified. This can be resolved
by specifying CIFS_SEARCH_BACKUP_SEARCH flag.
---------

Cc: <stable@kernel.org>
Reported-and-Tested-by: Tushar Gosavi <tugosavi@in.ibm.com>
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-17 13:07:49 +04:00
Pavel Shilovsky
7f92447aa7 CIFS: Separate protocol specific part from setlk
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-05-17 13:07:48 +04:00
Pavel Shilovsky
55157dfbb5 CIFS: Separate protocol specific part from getlk
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-05-17 13:07:41 +04:00
David S. Miller
028940342a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-16 22:17:37 -04:00
Pavel Shilovsky
106dc538ab CIFS: Separate protocol specific lock type handling
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-05-16 20:13:36 -05:00
Pavel Shilovsky
04a6aa8acf CIFS: Convert lock type to 32 bit variable
to handle SMB2 lock type field further.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-05-16 20:13:35 -05:00
Pavel Shilovsky
fbd35acadd CIFS: Move locks to cifsFileInfo structure
CIFS brlock cache can be used by several file handles if we have a
write-caching lease on the file that is supported by SMB2 protocol.
Prepate the code to handle this situation correctly by sorting brlocks
by a fid to easily push them in portions when lease break comes.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-05-16 20:13:35 -05:00
Jeff Layton
121b046af5 cifs: convert send_nt_cancel into a version specific op
For SMB2, this should be a no-op. Obviously if we wanted to do something
for the SMB2 case, we could also define an operation here for it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-05-16 20:13:34 -05:00
Jeff Layton
23db65f511 cifs: add a smb_version_operations/values structures and a smb_version enum
We need a way to dispatch different operations for different versions.
Behold the smb_version_operations/values structures. For now, those
structures just hold the version enum value and nothing uses them.
Eventually, we'll expand them to cover other operations/values as we
change the callers to dispatch from here.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
2012-05-16 20:13:34 -05:00
Jeff Layton
5249af32da cifs: remove the vers= and version= synonyms for ver=
We want these to mean something different entirely, and the mount.cifs
helper only ever passed in ver= automatically. Also, don't allow
ver=cifs anymore since that was never passed in by the mount helper.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:34 -05:00
Jeff Layton
296838b182 cifs: add warning about change in default cache semantics in 3.7
Add a warning that will be displayed when there is no cache= option
specified. We want to ensure that users are aware of the change in
defaults coming in 3.7.

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:33 -05:00
Jeff Layton
d06b5056ae cifs: display cache= option in /proc/mounts
...and deprecate the display of strictcache, forcedirectio, and fsc
as separate options.

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:33 -05:00
Jeff Layton
09983b2fab cifs: add deprecation warnings to strictcache and forcedirectio
Leave them in for 2 releases and remove for 3.7.

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:32 -05:00
Jeff Layton
15b6a47322 cifs: add a cache= option to better describe the different cache flavors
Currently, we have several mount options that control cifs' cache
behavior, but those options aren't considered to be mutually exclusive.
The result is poorly-defined when someone specifies more than one of
these options at mount time.

Fix this by adding a new cache= mount option that will supercede
"strictcache", and "forcedirectio". That will help make it clear that
these options are mutually exclusive. Also, change the legacy options to
be mutually exclusive too, to ensure that users don't get surprises.

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:32 -05:00
Jeff Layton
4d61cd6ec7 cifs: add a deprecation warning to CIFS_IOC_CHECKUMOUNT ioctl
This was used by an ancient version of umount.cifs and in nowhere else
that I'm aware of. Let's add a warning now and dump it for 3.7.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:32 -05:00
Jeff Layton
5e500ed125 cifs: remove legacy MultiuserMount option
We've now warned about this for two releases. Remove it for 3.5.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:31 -05:00
Jeff Layton
1c89254926 cifs: convert cifs_iovec_read to use async reads
Convert cifs_iovec_read to use async I/O. This also raises the limit on
the rsize for uncached reads. We first allocate a set of pages to hold
the replies, then issue the reads in parallel and then collect the
replies and copy the results into the iovec.

A possible future optimization would be to kmap and inline the iovec
buffers and read the data directly from the socket into that. That would
require some rather complex conversion of the iovec into a kvec however.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:31 -05:00
Jeff Layton
2a1bb13853 cifs: add wrapper for cifs_async_readv to retry opening file
We'll need this same bit of code for the uncached case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:30 -05:00
Jeff Layton
6993f74a5b cifs: add refcounting to cifs_readdata structures
This isn't strictly necessary for the async readpages code, but the
uncached version will need to be able to collect the replies after
issuing the calls. Add a kref to cifs_readdata and use change the
code to take and put references appropriately.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:30 -05:00
Jeff Layton
8d5ce4d23c cifs: abstract out function to marshal the iovec for readv receives
Cached and uncached reads will need to do different things here to
handle the difference when the pages are in pagecache and not. Abstract
out the function that marshals the page list into a kvec array.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:29 -05:00
Jeff Layton
0471ca3fe4 cifs: make cifs_readdata_alloc take a work_func_t arg
We'll need different completion routines for an uncached read. Allow
the caller to set the one he needs at allocation time. Also, move
most of these functions to file.c so we can make more of them static.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2012-05-16 20:13:29 -05:00
Suresh Siddha
11aeca0b3a coredump: ensure the fpu state is flushed for proper multi-threaded core dump
Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in
arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump
of a multi-threaded application.

A look at the exit seqeuence shows that other threads can still be on the
runqueue potentially at the below shown exit_mm() code snippet:

		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);

===> other threads can still be active here, but we notify the thread
===> dumping core to wakeup from the coredump_wait() after the last thread
===> joins this point. Core dumping thread will continue dumping
===> all the threads state to the core file.

		for (;;) {
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!self.task) /* see coredump_finish() */
				break;
			schedule();
		}

As some of those threads are on the runqueue and didn't call schedule() yet,
their fpu state is still active in the live registers and the thread
proceeding with the coredump will hit the above mentioned BUG_ON while
trying to dump other threads fpustate to the coredump file.

BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is
in the code paths for processors supporting xsaveopt. With or without
xsaveopt, multi-threaded coredump is broken and maynot contain
the correct fpustate at the time of exit.

In coredump_wait(), wait for all the threads to be come inactive, so
that we are sure all the extended register state is flushed to
the memory, so that it can be reliably copied to the core file.

Reported-by: Suresh Nalluru <suresh@aristanetworks.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-2-git-send-email-suresh.b.siddha@intel.com
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-05-16 15:16:48 -07:00
Linus Torvalds
dfae359f08 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fix from Jeff Layton

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix misspelling of "forcedirectio"
2012-05-16 14:22:38 -07:00
Benny Halevy
5f23eff381 NFS: fix unsigned comparison in nfs4_create_sec_client
fs/nfs/nfs4namespace.c: In function ‘nfs4_create_sec_client’:
fs/nfs/nfs4namespace.c:171:2: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]

Introduced by commit 72de53ec4b
"NFS: Do secinfo as part of lookup"

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-16 10:36:50 -07:00
Trond Myklebust
39ffb9218e NFS: Fix a compile issue when CONFIG_NFS_FSCACHE was undefined
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-16 10:24:20 -07:00
Artem Bityutskiy
a6aae4dd0f UBIFS: get rid of dbg_err
This patch removes the 'dbg_err()' macro and we now use 'ubifs_err()' instead.
The idea of 'dbg_err()' was to compile out some error message to make the
binary a bit smaller - but I think it was a bad idea.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-16 20:11:23 +03:00
Artem Bityutskiy
f70b7e52aa UBIFS: remove Kconfig debugging option
Have the debugging stuff always compiled-in instead. It simplifies maintanance
a lot.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-16 19:53:46 +03:00
Artem Bityutskiy
1baafd28dc UBIFS: remove a couple of unused macros
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-16 19:36:04 +03:00
Jeff Layton
531c8ff0d4 cifs: fix misspelling of "forcedirectio"
...and add a "directio" synonym since that's what the manpage has
always advertised.

Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-05-16 11:26:25 -05:00
Artem Bityutskiy
edf6be245f UBIFS: rename dumping functions
This commit re-names all functions which dump something from "dbg_dump_*()" to
"ubifs_dump_*()". This is done for consistency with UBI and because this way it
will be more logical once we remove the debugging sompilation option.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-16 19:15:56 +03:00
Artem Bityutskiy
7c46d0ae29 UBIFS: get rid of dbg_dump_stack
In case of errors we almost always need the stack dump - it makes no sense
to compile it out. Remove the 'dbg_dump_stack()' function completely.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-05-16 19:04:54 +03:00
Anton Vorontsov
1894a253db ramoops: Move to fs/pstore/ram.c
Since ramoops was converted to pstore, it has nothing to do with character
devices nowadays. Instead, today it is just a RAM backend for pstore.

The patch just moves things around. There are a few changes were needed
because of the move:

1. Kconfig and Makefiles fixups, of course.

2. In pstore/ram.c we have to play a bit with MODULE_PARAM_PREFIX, this
   is needed to keep user experience the same as with ramoops driver
   (i.e. so that ramoops.foo kernel command line arguments would still
   work).

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Marco Stornelli <marco.stornelli@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-16 08:06:37 -07:00
Bob Peterson
500242ac61 GFS2: Fix quota adjustment return code
This patch changes function gfs2_adjust_quota so that it properly
returns a good (zero) return code on the normal path through the code.
Without this, mounting GFS2 with -o quota=account periodically gave
this error message: GFS2: fsid=cluster:fs: gfs2_quotad: sync error -5

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-16 12:22:38 +01:00
Eric W. Biederman
ab27b91b9f userns: Convert sysfs to use kgid/kuid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:29 -07:00
Eric W. Biederman
091bd3ea4e userns: Convert sysctl permission checks to use kuid and kgids.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:28 -07:00
Eric W. Biederman
dcb0f22282 userns: Convert proc to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:28 -07:00
Eric W. Biederman
08cefc7ab8 userns: Convert ext4 to user kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:27 -07:00
Eric W. Biederman
1523299d58 userns: Convert ext3 to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:27 -07:00
Eric W. Biederman
b8a9f9e183 userns: Convert ext2 to use kuid/kgid where appropriate.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:26 -07:00
Eric W. Biederman
f04c6ce2cf userns: Convert devpts to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:26 -07:00
Eric W. Biederman
ebc887b278 userns: Convert binary formats to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:25 -07:00
Eric W. Biederman
9e4a36ece6 userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:23 -07:00
Wang Sheng-Hui
aa9e939d52 ext2: remove the redundant comment for ext2_export_ops
The comment is outdated and isn't particularly informative anyway - NULL
meaning the default behavior is very common in kernel. And we really set about
half of entries. So remove the whole comment for ext2_export_ops.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:39 +02:00
Eric Sandeen
d7dab39b6e ext3: return 32/64-bit dir name hash according to usage type
This is based on commit d1f5273e9a
ext4: return 32/64-bit dir name hash according to usage type
by Fan Yong <yong.fan@whamcloud.com>

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir().  However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext3 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.

This patch does implement a new ext3_dir_llseek op, because with 64-bit
hashes, nfs will attempt to seek to a hash "offset" which is much
larger than ext3's s_maxbytes.  So for dx dirs, we call
generic_file_llseek_size() with the appropriate max hash value as the
maximum seekable size.  Otherwise we just pass through to
generic_file_llseek().

Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Patch-updated-by: Eric Sandeen <sandeen@redhat.com>
(blame us if something is not correct)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:39 +02:00
Jan Kara
a80b12c3d0 quota: Get rid of nested I_MUTEX_QUOTA locking subclass
So far i_mutex was ranking above dqonoff_mutex and i_mutex on quota files
was special and ranking below dqonoff_mutex (and several other locks).
However there's no real need for i_mutex on quota files to be special.
IO on quota files is serialized by dqio_mutex anyway so we don't need to
take i_mutex when writing to quota files. Other places where we take i_mutex
on quota file can accomodate standard i_mutex lock ranking, we only need
to change the lock ranking to be dqonoff_mutex > i_mutex which is a matter
of changing documentation because there's no place which would enforce
ordering in the other direction.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:39 +02:00
Jan Kara
f9ef178412 quota: Use precomputed value of sb_dqopt in dquot_quota_sync
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:39 +02:00
Jan Kara
e2a3fde750 ext2: Remove i_mutex use from ext2_quota_write()
We don't need i_mutex in ext2_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:38 +02:00
Jan Kara
67f1648d21 reiserfs: Remove i_mutex use from reiserfs_quota_write()
We don't need i_mutex in reiserfs_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:38 +02:00
Jan Kara
0b7f7cefae ext4: Remove i_mutex use from ext4_quota_write()
We don't need i_mutex in ext4_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:38 +02:00
Jan Kara
905c393762 ext3: Remove i_mutex use from ext3_quota_write()
We don't need i_mutex in ext3_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:37 +02:00
Jan Kara
d7e9711760 quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
When CONFIG_QUOTA_DEBUG is enabled we call inode_get_rsv_space() from
add_dquot_ref() while holding i_lock. But inode_get_rsv_space() is trying
to get i_lock as well resulting in double lock.

Fix the problem by moving inode_get_rsv_space() call out of i_lock.

Reported-and-analyzed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:37 +02:00
Jan Kara
fd2cbd4dfa jbd: Write journal superblock with WRITE_FUA after checkpointing
If journal superblock is written only in disk's caches and other transaction
starts reusing space of the transaction cleaned from the log, it can happen
blocks of a new transaction reach the disk before journal superblock. When
power failure happens in such case, subsequent journal replay would still try
to replay the old transaction but some of it's blocks may be already
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating log tail and we must first write new log tail to disk and update
in-memory information only after that.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:37 +02:00
Jan Kara
1ce8486dcc jbd: protect all log tail updates with j_checkpoint_mutex
There are some log tail updates that are not protected by j_checkpoint_mutex.
Some of these are harmless because they happen during startup or shutdown but
updates in journal_commit_transaction() and journal_flush() can really race
with other log tail updates (e.g. someone doing journal_flush() with someone
running cleanup_journal_tail()). So protect all log tail updates with
j_checkpoint_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:36 +02:00
Jan Kara
9754e39c7b jbd: Split updating of journal superblock and marking journal empty
There are three case of updating journal superblock. In the first case, we want
to mark journal as empty (setting s_sequence to 0), in the second case we want
to update log tail, in the third case we want to update s_errno. Split these
cases into separate functions. It makes the code slightly more straightforward
and later patches will make the distinction even more important.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:36 +02:00
Eric W. Biederman
a7c1938e22 userns: Convert stat to return values mapped from kuids and kgids
- Store uids and gids with kuid_t and kgid_t in struct kstat
- Convert uid and gids to userspace usable values with
  from_kuid and from_kgid

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:08:35 -07:00
Ben Myers
1307bbd2af xfs: protect xfs_sync_worker with s_umount semaphore
xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
work during mount and unmount.  This flag can be cleared by unmount
after the xfs_sync_worker checks it but before the work is completed.
The has caused crashes in the completion handler for the dummy
transaction commited by xfs_sync_worker:

PID: 27544  TASK: ffff88013544e040  CPU: 3   COMMAND: "kworker/3:0"
 #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
 #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
 #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
 #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
 #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
 #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
 #7 [ffff88016fdffc60] page_fault at ffffffff813ac635
    [exception RIP: xlog_get_lowest_lsn+0x30]
    RIP: ffffffffa04a9910  RSP: ffff88016fdffd10  RFLAGS: 00010246
    RAX: ffffc90014e48000  RBX: ffff88014d879980  RCX: ffff88014d879980
    RDX: ffff8802214ee4c0  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88016fdffd10   R8: ffff88014d879a80   R9: 0000000000000000
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff8802214ee400
    R13: ffff88014d879980  R14: 0000000000000000  R15: ffff88022fd96605
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
 #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]

Protect xfs_sync_worker by using the s_umount semaphore at the read
level to provide exclusion with unmount while work is progressing.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-15 14:35:43 -05:00
Dan Carpenter
75af271ed5 dlm: NULL dereference on failure in kmem_cache_create()
We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if
both allocations fail, it leads to a NULL dereference.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-15 10:39:28 -05:00
Mark Salter
fce2447627 C6X: add support to build with BINFMT_ELF_FDPIC
C6x userspace supports a shared library mechanism called DSBT for systems with
no MMU. DSBT is similar to FDPIC in allowing shared text segments and private
copies of data segments without an MMU. Both methods access data using a base
register and offset. With FDPIC, the caller of an external function sets up the
base register for the callee. With DSBT, the called function sets up its own
base register. Other details differ but both userspaces need the same thing
from the kernel loader: a map of where each ELF segment was loaded. The FDPIC
loader already provides this, so DSBT just uses it.

This patch enables BINFMT_ELF_FDPIC by default for C6X and provides the
necessary architecture hooks for the generic loader.

Signed-off-by: Mark Salter <msalter@redhat.com>
2012-05-15 09:17:34 -04:00
Dan Carpenter
5abc03cd91 NFS: kmalloc() doesn't return an ERR_PTR()
Obviously we should check for NULL here instead of IS_ERR().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org [3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:44:01 -07:00
Bryan Schumaker
981f9face8 NFS: Turn v3 on by default
Most users will use NFS v3 or possibly v4 so this makes it easier for
them.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:42:22 -07:00
Bryan Schumaker
2ba68002a7 NFS: Make v2 configurable
With this patch NFS v2 can be disabled during Kconfig.  I default the
option to "y" to match the current behavior.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:42:22 -07:00
Bryan Schumaker
5e7e5a0da2 NFS: Create an NFS v3 stat_to_errno()
In theory, NFS v3 can have different error versions than NFS v2. v4 is
already using its own nfs4_stat_to_errno() to map error codes, so
rather than create something in the generic client for v2 and v3 to
share I instead give v3 its own function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:42:21 -07:00
Bryan Schumaker
87c7083dc3 NFS: Pass mntfh as part of the nfs_mount_info structure
This allows me to use the filehandle allocated in nfs_fs_mount() for nfs
v4 mounts instead of allocating a new one.  Rather than change
nfs4_mount() to look almost exactly like nfs_fs_mount(), I instead
remove the function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:31 -07:00
Bryan Schumaker
46058d46d3 NFS: Allocate parsed mount data directly to the nfs_mount_info structure
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:31 -07:00
Bryan Schumaker
d72c727cd9 NFS: Create a single nfs_validate_mount_data() function
This new function chooses between the v2/3 parser and the v4 parser by
filesystem type.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:30 -07:00
Bryan Schumaker
b72e4f42a3 NFS: Create a single function for text mount data
The v2/3 and v4 cases were very similar, with just a few parameters
changed.  This makes it easy to share code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:30 -07:00
Bryan Schumaker
486aa699ff NFS: Create a new nfs_try_mount()
This function returns the same same return type as nfs4_try_mount() so
they two can be more easily substituted.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:29 -07:00
Bryan Schumaker
db83335191 NFS: Let mount data parsing set the NFS version
This field is unconditionally set while parsing mount data, so there is
no need to fill it in here.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:29 -07:00
Bryan Schumaker
21e4b82e13 NFS: Use nfs_fs_mount_common() for remote referral mounts
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:28 -07:00
Bryan Schumaker
3d176e3fe4 NFS: Use nfs_fs_mount_common() for xdev mounts
At this point, there are only a few small differences between these two
functions.  I can set a few function pointers in the nfs_mount_info
struct to get around these differences.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:28 -07:00
Bryan Schumaker
8c958e0c4c NFS: Create a common xdev_mount() function
The only difference between nfs_xdev_mount() and nfs4_xdev_mount() is the
clone_super() function called to clone the super block.  I can combine
these two functions by using the fill_super field in the mount_info
structure.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:28 -07:00
Bryan Schumaker
c40f8d1d35 NFS: Create a common fs_mount() function
The nfs4_remote_mount() function was only slightly different from the
nfs_fs_mount() function used by the generic client.  I created a new
nfs_mount_info structure to set different parameters to help combine
these functions.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:27 -07:00
Bryan Schumaker
586f95cd4f NFS: Remove NFS4_MOUNT_UNSHARED
This flag is numerically equivalent to NFS_MOUNT_UNSHARED, so I can
remove it to make collapsing functions more straightforward.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:27 -07:00
Bryan Schumaker
2311b9439c NFS: Don't pass mount data to nfs_fscache_get_super_cookie()
I intend on creating a single nfs_fs_mount() function used by all our
mount paths.  To avoid checking between new mounts and clone mounts, I
instead pass both structures to a new function in super.c that finds the
cache key and then looks up the super cookie.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:26 -07:00
Bryan Schumaker
bae36241be NFS: Create a single nfs_get_root()
This patch splits out the NFS v4 specific functionality of
nfs4_get_root() into its own rpc_op called by the generic client, and
leaves nfs4_proc_get_rootfh() as its own stand alone function.  This
also allows me to change nfs4_remote_mount(), nfs4_xdev_mount() and
nfs4_remote_referral_mount() to use the generic client's nfs_get_root()
function.  Later patches in this series will collapse these functions
into one common function, so using the same get_root() function
everywhere simplifies future changes.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:26 -07:00
Bryan Schumaker
3028eb2b32 NFS: Rename nfs4_proc_get_root()
This function is really getting the root filehandle and not the root
dentry of the filesystem.  I also removed the rpc_ops lookup from
nfs4_get_rootfh() under the assumption that if we reach this function
then we already know we are using NFS v4.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:25 -07:00
Jeff Liu
3fe3e6b182 xfs: introduce SEEK_DATA/SEEK_HOLE support
This patch adds lseek(2) SEEK_DATA/SEEK_HOLE functionality to xfs.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:21:05 -05:00
Ben Myers
e700a06c71 xfs: make xfs_extent_busy_trim not static
Commit e459df5, 'xfs: move busy extent handling to it's own file'
moved some code from xfs_alloc.c into xfs_extent_busy.c for
convenience in userspace code merges.  One of the functions moved is
xfs_extent_busy_trim (formerly xfs_alloc_busy_trim) which is defined
STATIC.  Unfortunately this function is still used in xfs_alloc.c, and
this results in an undefined symbol in xfs.ko.

Make xfs_extent_busy_trim not static and add its prototype to
xfs_extent_busy.h.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
2012-05-14 16:21:04 -05:00
Dave Chinner
611c99468c xfs: make XBF_MAPPED the default behaviour
Rather than specifying XBF_MAPPED for almost all buffers, introduce
XBF_UNMAPPED for the couple of users that use unmapped buffers.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:21:03 -05:00
Dave Chinner
d4f3512b08 xfs: flush outstanding buffers on log mount failure
When we fail to mount the log in xfs_mountfs(), we tear down all the
infrastructure we have already allocated. However, the process of
mounting the log may have progressed to the point of reading,
caching and modifying buffers in memory. Hence before we can free
all the infrastructure, we have to flush and remove all the buffers
from memory.

Problem first reported by Eric Sandeen, later a different incarnation
was reported by Ben Myers.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:21:02 -05:00
Dave Chinner
12bcb3f7d4 xfs: Properly exclude IO type flags from buffer flags
Recent event tracing during a debugging session showed that flags
that define the IO type for a buffer are leaking into the flags on
the buffer incorrectly. Fix the flag exclusion mask in
xfs_buf_alloc() to avoid problems that may be caused by such
leakage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:21:01 -05:00
Dave Chinner
ad1e95c54e xfs: clean up xfs_bit.h includes
With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.

Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:21:00 -05:00
Dave Chinner
2af51f3a4e xfs: move xfs_do_force_shutdown() and kill xfs_rw.c
xfs_do_force_shutdown now is the only thing in xfs_rw.c. There is no
need to keep it in it's own file anymore, so move it to xfs_fsops.c
next to xfs_fs_goingdown() and kill xfs_rw.c.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:59 -05:00
Dave Chinner
2a0ec1d9ed xfs: move xfs_get_extsz_hint() and kill xfs_rw.h
The only thing left in xfs_rw.h is a function prototype for an inode
function.  Move that to xfs_inode.h, and kill xfs_rw.h.

Also move the function implementing the prototype from xfs_rw.c to
xfs_inode.c so we only have one function left in xfs_rw.c

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:58 -05:00
Dave Chinner
fd50092c08 xfs: move xfs_fsb_to_db to xfs_bmap.h
This is the only remaining useful function in xfs_rw.h, so move it
to a header file responsible for block mapping functions that the
callers already include. Soon we can get rid of xfs_rw.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:57 -05:00
Dave Chinner
4ecbfe637c xfs: clean up busy extent naming
Now that the busy extent tracking has been moved out of the
allocation files, clean up the namespace it uses to
"xfs_extent_busy" rather than a mix of "xfs_busy" and
"xfs_alloc_busy".

Signed-off-by: Dave Chinner<dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:56 -05:00
Dave Chinner
efc27b5259 xfs: move busy extent handling to it's own file
To make it easier to handle userspace code merges, move all the busy
extent handling out of the allocation code and into it's own file.
The userspace code does not need the busy extent code, so this
simplifies the merging of the kernel code into the userspace
xfsprogs library.

Because the busy extent code has been almost completely rewritten
over the past couple of years, also update the copyright on this new
file to include the authors that made all those changes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:55 -05:00
Dave Chinner
60a34607b2 xfs: move xfsagino_t to xfs_types.h
Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:54 -05:00
Dave Chinner
bc4010ecb8 xfs: use iolock on XFS_IOC_ALLOCSP calls
fsstress has a particular effective way of stopping debug XFS
kernels. We keep seeing assert failures due finding delayed
allocation extents where there should be none. This shows up when
extracting extent maps and we are holding all the locks we should be
to prevent races, so this really makes no sense to see these errors.

After checking that fsstress does not use mmap, it occurred to me
that fsstress uses something that no sane application uses - the
XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
do allocation of blocks beyond EOF without using preallocation, and
then call setattr to extend and zero the allocated blocks.

THe problem here is this is a buffered write, and hence the
allocation is a delayed allocation. Unlike the buffered IO path, the
allocation and zeroing are not serialised using the IOLOCK. Hence
the ALLOCSP operation can race with operations holding the iolock to
prevent buffered IO operations from occurring.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:53 -05:00
Dave Chinner
aa5c158ec9 xfs: kill XBF_DONTBLOCK
Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
avoid recursion through memory reclaim back into the filesystem.

All the blocking get calls in growfs occur inside a transaction, even though
they are no part of the transaction, so all allocation will be GFP_NOFS due to
the task flag PF_TRANS being set. The blocking read calls occur during log
recovery, so they will probably be unaffected by converting to GFP_NOFS
allocations.

Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:52 -05:00
Dave Chinner
7ca790a507 xfs: kill xfs_read_buf()
xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called
outside a transaction context. The error handling is slightly different in that
xfs_read_buf stales the errored buffer it gets back, but there is probably good
reason for xfs_trans_read_buf() for doing this.

Hence update xfs_trans_read_buf() to the same error handling as xfs_read_buf(),
and convert all the callers of xfs_read_buf() to use the former function. We can
then remove xfs_read_buf().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:51 -05:00
Dave Chinner
a8acad7073 xfs: kill XBF_LOCK
Buffers are always returned locked from the lookup routines. Hence
we don't need to tell the lookup routines to return locked buffers,
on to try and lock them. Remove XBF_LOCK from all the callers and
from internal buffer cache usage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:50 -05:00
Dave Chinner
795cac72e9 xfs: kill xfs_buf_btoc
xfs_buf_btoc and friends are simple macros that do basic block
to page index conversion and vice versa. These aren't widely used,
and we use open coded masking and shifting everywhere else. Hence
remove the macros and open code the work they do.

Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now
incorrect - we are using pages directly and not the page cache, so
use PAGE_{SIZE|MASK|SHIFT} instead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:49 -05:00
Dave Chinner
aa0e8833b0 xfs: use blocks for storing the desired IO size
Now that we pass block counts everywhere, and index buffers by block
number and length in units of blocks, convert the desired IO size
into block counts rather than bytes. Convert the code to use block
counts, and those that need byte counts get converted at the time of
use.

Rename the b_desired_count variable to something closer to it's
purpose - b_io_length - as it is only used to specify the length of
an IO for a subset of the buffer.  The only time this is used is for
log IO - both writing iclogs and during log recovery. In all other
cases, the b_io_length matches b_length, and hence a lot of code
confuses the two. e.g. the buf item code uses the io count
exclusively when it should be using the buffer length. Fix these
apprpriately as they are found.

Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers
around the desired IO length. They only serve to make the code
shouty loud, don't actually add any real value, and are often used
incorrectly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:48 -05:00
Dave Chinner
4e94b71b70 xfs: use blocks for counting length of buffers
Now that we pass block counts everywhere, and index buffers by block
number, track the length of the buffer in units of blocks rather
than bytes. Convert the code to use block counts, and those that
need byte counts get converted at the time of use.

Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers
around the buffer length. They only serve to make the code shouty
loud and don't actually add any real value.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:47 -05:00
Dave Chinner
de1cbee462 xfs: kill b_file_offset
Seeing as we pass block numbers around everywhere in the buffer
cache now, it makes no sense to index everything by byte offset.
Replace all the byte offset indexing with block number based
indexing, and replace all uses of the byte offset with direct
conversion from the block index.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:46 -05:00
Dave Chinner
e70b73f84f xfs: clean up buffer get/read call API
The xfs_buf_get/read API is not consistent in the units it uses, and
does not use appropriate or consistent units/types for the
variables.

Convert the API to use disk addresses and block counts for all
buffer get and read calls. Use consistent naming for all the
functions and their declarations, and convert the internal functions
to use disk addresses and block counts to avoid need to convert them
from one type to another and back again.

Fix all the callers to use disk addresses and block counts. In many
cases, this removes an additional conversion from the function call
as the callers already have a block count.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:45 -05:00
Dave Chinner
bf813cdddf xfs: use kmem_zone_zalloc for buffers
To replace the alloc/memset pair.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:44 -05:00
Dave Chinner
ead360c50d xfs: fix incorrect b_offset initialisation
Because we no longer use the page cache for buffering, there is no
direct block number to page offset relationship anymore.
xfs_buf_get_pages is still setting up b_offset as if there was some
relationship, and that is leading to incorrectly setting up
*uncached* buffers that don't overwrite b_offset once they've had
pages allocated.

For cached buffers, the first block of the buffer is always at offset
zero into the allocated memory. This is true for sub-page sized
buffers, as well as for multiple-page buffers.

For uncached buffers, b_offset is only non-zero when we are
associating specific memory to the buffers, and that is set
correctly by the code setting up the buffer.

Hence remove the setting of b_offset in xfs_buf_get_pages, because
it is now always the wrong thing to do.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-05-14 16:20:43 -05:00