Commit graph

975 commits

Author SHA1 Message Date
Linus Torvalds
d3dc366bba Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block
Pull core block layer changes from Jens Axboe:
 "This is the core block IO pull request for 3.18.  Apart from the new
  and improved flush machinery for blk-mq, this is all mostly bug fixes
  and cleanups.

   - blk-mq timeout updates and fixes from Christoph.

   - Removal of REQ_END, also from Christoph.  We pass it through the
     ->queue_rq() hook for blk-mq instead, freeing up one of the request
     bits.  The space was overly tight on 32-bit, so Martin also killed
     REQ_KERNEL since it's no longer used.

   - blk integrity updates and fixes from Martin and Gu Zheng.

   - Update to the flush machinery for blk-mq from Ming Lei.  Now we
     have a per hardware context flush request, which both cleans up the
     code should scale better for flush intensive workloads on blk-mq.

   - Improve the error printing, from Rob Elliott.

   - Backing device improvements and cleanups from Tejun.

   - Fixup of a misplaced rq_complete() tracepoint from Hannes.

   - Make blk_get_request() return error pointers, fixing up issues
     where we NULL deref when a device goes bad or missing.  From Joe
     Lawrence.

   - Prep work for drastically reducing the memory consumption of dm
     devices from Junichi Nomura.  This allows creating clone bio sets
     without preallocating a lot of memory.

   - Fix a blk-mq hang on certain combinations of queue depths and
     hardware queues from me.

   - Limit memory consumption for blk-mq devices for crash dump
     scenarios and drivers that use crazy high depths (certain SCSI
     shared tag setups).  We now just use a single queue and limited
     depth for that"

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
  block: Remove REQ_KERNEL
  blk-mq: allocate cpumask on the home node
  bio-integrity: remove the needless fail handle of bip_slab creating
  block: include func name in __get_request prints
  block: make blk_update_request print prefix match ratelimited prefix
  blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
  block: fix alignment_offset math that assumes io_min is a power-of-2
  blk-mq: Make bt_clear_tag() easier to read
  blk-mq: fix potential hang if rolling wakeup depth is too high
  block: add bioset_create_nobvec()
  block: use bio_clone_fast() in blk_rq_prep_clone()
  block: misplaced rq_complete tracepoint
  sd: Honor block layer integrity handling flags
  block: Replace strnicmp with strncasecmp
  block: Add T10 Protection Information functions
  block: Don't merge requests if integrity flags differ
  block: Integrity checksum flag
  block: Relocate bio integrity flags
  block: Add a disk flag to block integrity profile
  block: Add prefix to block integrity profile flags
  ...
2014-10-18 11:53:51 -07:00
Martin K. Petersen
e19a8a0ad2 block: Remove REQ_KERNEL
REQ_KERNEL is no longer used. Remove it and drop the redundant uio
argument to nfs_file_direct_{read,write}.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-10-14 09:00:44 -06:00
Linus Torvalds
77c688ac87 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The big thing in this pile is Eric's unmount-on-rmdir series; we
  finally have everything we need for that.  The final piece of prereqs
  is delayed mntput() - now filesystem shutdown always happens on
  shallow stack.

  Other than that, we have several new primitives for iov_iter (Matt
  Wilcox, culled from his XIP-related series) pushing the conversion to
  ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
  cleanups and fixes (including the external name refcounting, which
  gives consistent behaviour of d_move() wrt procfs symlinks for long
  and short names alike) and assorted cleanups and fixes all over the
  place.

  This is just the first pile; there's a lot of stuff from various
  people that ought to go in this window.  Starting with
  unionmount/overlayfs mess...  ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
  fs/file_table.c: Update alloc_file() comment
  vfs: Deduplicate code shared by xattr system calls operating on paths
  reiserfs: remove pointless forward declaration of struct nameidata
  don't need that forward declaration of struct nameidata in dcache.h anymore
  take dname_external() into fs/dcache.c
  let path_init() failures treated the same way as subsequent link_path_walk()
  fix misuses of f_count() in ppp and netlink
  ncpfs: use list_for_each_entry() for d_subdirs walk
  vfs: move getname() from callers to do_mount()
  gfs2_atomic_open(): skip lookups on hashed dentry
  [infiniband] remove pointless assignments
  gadgetfs: saner API for gadgetfs_create_file()
  f_fs: saner API for ffs_sb_create_file()
  jfs: don't hash direct inode
  [s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
  ecryptfs: ->f_op is never NULL
  android: ->f_op is never NULL
  nouveau: __iomem misannotations
  missing annotation in fs/file.c
  fs: namespace: suppress 'may be used uninitialized' warnings
  ...
2014-10-13 11:28:42 +02:00
Seunghun Lee
5e6123f347 vfs: move getname() from callers to do_mount()
It would make more sense to pass char __user * instead of
char * in callers of do_mount() and do getname() inside do_mount().

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Seunghun Lee <waydi1@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:16 -04:00
Al Viro
1fa97e8b1f constify file_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:00 -04:00
Jeff Layton
1b2b32dcdb locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
Currently they both just return 0. Fix them to return more appropriate
values instead.

For better or worse, most places in the kernel return -EINVAL when
leases aren't available. -ENOLCK would probably have been better, but
let's follow suit here in the case of F_SETLEASE.

In the F_GETLEASE case, just return F_UNLCK since we know that no
lease will have been set.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:13 -04:00
Jeff Layton
7ca76311fe locks: set fl_owner for leases to filp instead of current->files
Like flock locks, leases are owned by the file description. Now that the
i_have_this_lease check in __break_lease is gone, we don't actually use
the fl_owner for leases for anything. So, it's now safe to set this more
appropriately to the same value as the fl_file.

While we're at it, fix up the comments over the fl_owner_t definition
since they're rather out of date.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-10-07 14:06:13 -04:00
Jeff Layton
4d01b7f5e7 locks: give lm_break a return value
Christoph suggests:

   "Add a return value to lm_break so that the lock manager can tell the
    core code "you can delete this lease right now".  That gets rid of
    the games with the timeout which require all kinds of race avoidance
    code in the users."

Do that here and have the nfsd lease break routine use it when it detects
that there was a race between setting up the lease and it being broken.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:13 -04:00
Jeff Layton
c45198eda2 locks: move freeing of leases outside of i_lock
There was only one place where we still could free a file_lock while
holding the i_lock -- lease_modify. Add a new list_head argument to the
lm_change operation, pass in a private list when calling it, and fix
those callers to dispose of the list once the lock has been dropped.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:13 -04:00
Jeff Layton
1c7dd2ff43 locks: define a lm_setup handler for leases
...and move the fasync setup into it for fcntl lease calls. At the same
time, change the semantics of how the file_lock double-pointer is
handled. Up until now, on a successful lease return you got a pointer to
the lock on the list. This is bad, since that pointer can no longer be
relied on as valid once the inode->i_lock has been released.

Change the code to instead just zero out the pointer if the lease we
passed in ended up being used. Then the callers can just check to see
if it's NULL after the call and free it if it isn't.

The priv argument has the same semantics. The lm_setup function can
zero the pointer out to signal to the caller that it should not be
freed after the function returns.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:12 -04:00
Jeff Layton
e6f5c78930 locks: plumb a "priv" pointer into the setlease routines
In later patches, we're going to add a new lock_manager_operation to
finish setting up the lease while still holding the i_lock.  To do
this, we'll need to pass a little bit of info in the fcntl setlease
case (primarily an fasync structure). Plumb the extra pointer into
there in advance of that.

We declare this pointer as a void ** to make it clear that this is
private info, and that the caller isn't required to set this unless
the lm_setup specifically requires it.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07 14:06:12 -04:00
Jeff Layton
e0b93eddfe security: make security_file_set_fowner, f_setown and __f_setown void return
security_file_set_fowner always returns 0, so make it f_setown and
__f_setown void return functions and fix up the error handling in the
callers.

Cc: linux-security-module@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-09-09 16:01:36 -04:00
Jeff Layton
1c994a0909 locks: consolidate "nolease" routines
GFS2 and NFS have setlease routines that always just return -EINVAL.
Turn that into a generic routine that can live in fs/libfs.c.

Cc: <linux-nfs@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: <cluster-devel@redhat.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-09-09 16:01:36 -04:00
Jeff Layton
699688a416 locks: remove lock_may_read and lock_may_write
There are no callers of these functions.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-09 16:01:09 -04:00
Kinglong Mee
5c97d7b147 locks: New ops in lock_manager_operations for get/put owner
NFSD or other lockmanager may increase the owner's reference,
so adds two new options for copying and releasing owner.

v5: change order from 2/6 to 3/6
v4: rename lm_copy_owner/lm_release_owner to lm_get_owner/lm_put_owner

Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-09 16:01:09 -04:00
Kinglong Mee
3fe0fff18f locks: Rename __locks_copy_lock() to locks_copy_conflock()
Jeff advice, " Right now __locks_copy_lock is only used to copy
conflocks. It would be good to rename that to something more
distinct (i.e.locks_copy_conflock), to make it clear that we're
generating a conflock there."

v5: change order from 3/6 to 2/6
v4: new patch only renaming function name

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-09 16:01:09 -04:00
Joe Perches
d0449b90f8 locks: Remove unused conf argument from lm_grant
This argument is always NULL so don't pass it around.

[jlayton: remove dependencies on previous patches in series]

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-09-09 16:01:06 -04:00
Linus Torvalds
f6f993328b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Stuff in here:

   - acct.c fixes and general rework of mnt_pin mechanism.  That allows
     to go for delayed-mntput stuff, which will permit mntput() on deep
     stack without worrying about stack overflows - fs shutdown will
     happen on shallow stack.  IOW, we can do Eric's umount-on-rmdir
     series without introducing tons of stack overflows on new mntput()
     call chains it introduces.
   - Bruce's d_splice_alias() patches
   - more Miklos' rename() stuff.
   - a couple of regression fixes (stable fodder, in the end of branch)
     and a fix for API idiocy in iov_iter.c.

  There definitely will be another pile, maybe even two.  I'd like to
  get Eric's series in this time, but even if we miss it, it'll go right
  in the beginning of for-next in the next cycle - the tricky part of
  prereqs is in this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
  fix copy_tree() regression
  __generic_file_write_iter(): fix handling of sync error after DIO
  switch iov_iter_get_pages() to passing maximal number of pages
  fs: mark __d_obtain_alias static
  dcache: d_splice_alias should detect loops
  exportfs: update Exporting documentation
  dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
  dcache: remove unused d_find_alias parameter
  dcache: d_obtain_alias callers don't all want DISCONNECTED
  dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
  dcache: d_splice_alias mustn't create directory aliases
  dcache: close d_move race in d_splice_alias
  dcache: move d_splice_alias
  namei: trivial fix to vfs_rename_dir comment
  VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
  cifs: support RENAME_NOREPLACE
  hostfs: support rename flags
  shmem: support RENAME_EXCHANGE
  shmem: support RENAME_NOREPLACE
  btrfs: add RENAME_NOREPLACE
  ...
2014-08-11 11:44:11 -07:00
David Herrmann
4bb5f5d939 mm: allow drivers to prevent new writable mappings
This patch (of 6):

The i_mmap_writable field counts existing writable mappings of an
address_space.  To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.

This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set.  In case there
exists a writable mapping, this operation will fail with EBUSY.

Note that we rely on the fact that iff you already own a writable mapping,
you can increase the counter without using the helpers.  This is the same
that we do for i_writecount.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:31 -07:00
Al Viro
215752fce3 acct: get rid of acct_list
Put these suckers on per-vfsmount and per-superblock lists instead.
Note: right now it's still acct_lock for everything, but that's
going to change.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:08 -04:00
Al Viro
ed44724b79 acct: switch to __kernel_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:07 -04:00
Joe Perches
68be302963 fs.h, drivers/hwmon/asus_atk0110.c: fix DEFINE_SIMPLE_ATTRIBUTE semicolon definition and use
The DEFINE_SIMPLE_ATTRIBUTE macro should not end in a ; Fix the one use
in the kernel tree that did not have a semicolon.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:23 -07:00
Christoph Hellwig
17fa388ddc locks: typedef fl_owner_t to void *
fl_owner_t is a cookie that can store all kinds of different pointers,
so don't pretends it points to a file structure.

For now just change the typedef, but as a follow on this will allow
to get rids of lots of casts and eventually the typedef itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-07-13 21:38:49 -04:00
Linus Torvalds
2dfded8210 File locking related bugfixes for v3.16 (pile #2)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTpiwZAAoJEAAOaEEZVoIVplsP/383a9q3eXonbsi+Ea8CGbRl
 tdjjVhM1OY4NZYFAoulILDt3HqPTC6MBnqKlHz+BuMziwd/1+3w8S4E7IEwm/KtM
 ghNYX8ct5Bf1nc5QEdDmwf4PX48QbRwTuT1uIcXaJ+KtxTzI9qN7mnjRN91TUtq4
 WRGvOl0AsGCVq8YqxjztgD3TYbu7AG/72Em+DE9f81PTArAPTo2ySc3gxPuJJAsg
 G1x46Gx46sfqFX2FY4SPsXen+J/67Og67y6eBawxnT2Bp6ZGDuW+jyPRmkhf0yth
 pWAtkUi3XmEe6kk6GHiICsS0Yn0RG4jbz39+Ja+X7jibQVJ8Iz6b+Optw9RNQwYt
 jDWHKFS2AaL/CDejHYOQ1shHcozpRojtIbDLIZ9vTNTQ2r5cdaBvkXMmQzdoktmN
 wQtQ9AzBl8fHOFOQeCAwd/ZfCLIotvLoLds3K/CSqmpsyK2+9IyriQLKKZ2xm6Iu
 8+UUspGQcNVwcMP6YWtI6G+u58/mVanmK6dtpiyXrncZLAfU4H7ETL2IPu8jJTbv
 kTFCOJtXQzNZa5Xqur1hIewOG9/RlvAZAnii0Ghc3nXTWCCeNeI8re3jw4g9KdRv
 33t4sYfJld8LQ1NSMqIDyAs+fvytmGurYt+uhVpb58G/4CqBLNpmmIGIQ6LFLMfs
 75FQnbAezrD0H/JyAHUk
 =AJD3
 -----END PGP SIGNATURE-----

Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux

Pull file locking fixes from Jeff Layton:
 "File locking related bugfixes

  Nothing too earth-shattering here.  A fix for a potential regression
  due to a patch in pile #1, and the addition of a memory barrier to
  prevent a race condition between break_deleg and generic_add_lease"

* tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux:
  locks: set fl_owner for leases back to current->files
  locks: add missing memory barrier in break_deleg
2014-06-21 16:40:30 -10:00
Linus Torvalds
16b9057804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "This the bunch that sat in -next + lock_parent() fix.  This is the
  minimal set; there's more pending stuff.

  In particular, I really hope to get acct.c fixes merged this cycle -
  we need that to deal sanely with delayed-mntput stuff.  In the next
  pile, hopefully - that series is fairly short and localized
  (kernel/acct.c, fs/super.c and fs/namespace.c).  In this pile: more
  iov_iter work.  Most of prereqs for ->splice_write with sane locking
  order are there and Kent's dio rewrite would also fit nicely on top of
  this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
  lock_parent: don't step on stale ->d_parent of all-but-freed one
  kill generic_file_splice_write()
  ceph: switch to iter_file_splice_write()
  shmem: switch to iter_file_splice_write()
  nfs: switch to iter_splice_write_file()
  fs/splice.c: remove unneeded exports
  ocfs2: switch to iter_file_splice_write()
  ->splice_write() via ->write_iter()
  bio_vec-backed iov_iter
  optimize copy_page_{to,from}_iter()
  bury generic_file_aio_{read,write}
  lustre: get rid of messing with iovecs
  ceph: switch to ->write_iter()
  ceph_sync_direct_write: stop poking into iov_iter guts
  ceph_sync_read: stop poking into iov_iter guts
  new helper: copy_page_from_iter()
  fuse: switch to ->write_iter()
  btrfs: switch to ->write_iter()
  ocfs2: switch to ->write_iter()
  xfs: switch to ->write_iter()
  ...
2014-06-12 10:30:18 -07:00
Al Viro
5f07385060 kill generic_file_splice_write()
no callers left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12 00:21:13 -04:00
Al Viro
8d0207652c ->splice_write() via ->write_iter()
iter_file_splice_write() - a ->splice_write() instance that gathers the
pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
it to ->write_iter().  A bunch of simple cases coverted to that...

[AV: fixed the braino spotted by Cyrill]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12 00:18:51 -04:00
Jeff Layton
962bd40bc3 locks: add missing memory barrier in break_deleg
break_deleg is subject to the same potential race as break_lease. Add
a memory barrier to prevent it.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-06-10 12:24:40 -04:00
Fabian Frederick
ac13a829f6 fs/libfs.c: add generic data flush to fsync
Description by Jan Kara:
 "A lot of older filesystems don't properly flush volatile disk caches
  on fsync(2) which can lead to loss of fsynced data after power failure.

This patch makes generic_file_fsync() issue proper cache flush to fix the
problem.  Sysadmin can use /sys/devices/.../cache_type to tell the system
it should not send the cache flush."

[akpm@linux-foundation.org: nuke ifdef]
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:55 -07:00
Al Viro
6abd232274 bury generic_file_aio_{read,write}
no callers left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:39:44 -04:00
Al Viro
a8f3550cd2 bury __generic_file_aio_write()
all users converted to __generic_file_write_iter() now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:39:37 -04:00
Al Viro
1456c0a87c blkdev_aio_write() - turn into blkdev_write_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:38:01 -04:00
Al Viro
8174202b34 write_iter variants of {__,}generic_file_aio_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:38:00 -04:00
Al Viro
293bc9822f new methods: ->read_iter() and ->write_iter()
Beginning to introduce those.  Just the callers for now, and it's
clumsier than it'll eventually become; once we finish converting
aio_read and aio_write instances, the things will get nicer.

For now, these guys are in parallel to ->aio_read() and ->aio_write();
they take iocb and iov_iter, with everything in iov_iter already
validated.  File offset is passed in iocb->ki_pos, iov/nr_segs -
in iov_iter.

Main concerns in that series are stack footprint and ability to
split the damn thing cleanly.

[fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:36:00 -04:00
Al Viro
7f7f25e82d replace checking for ->read/->aio_read presence with check in ->f_mode
Since we are about to introduce new methods (read_iter/write_iter), the
tests in a bunch of places would have to grow inconveniently.  Check
once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
and FMODE_CAN_WRITE resp.  It might end up being a temporary measure -
once everything switches from ->aio_{read,write} to ->{read,write}_iter
it might make sense to return to open-coded checks.  We'll see...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:55 -04:00
Al Viro
0c949334a9 iov_iter_truncate()
Now It Can Be Done(tm) - we don't need to do iov_shorten() in
generic_file_direct_write() anymore, now that all ->direct_IO()
instances are converted to proper iov_iter methods and honour
iter->count and iter->iov_offset properly.

Get rid of count/ocount arguments of generic_file_direct_write(),
while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:54 -04:00
Al Viro
ed978a811e new helper: generic_file_read_iter()
iov_iter-using variant of generic_file_aio_read().  Some callers
converted.  Note that it's still not quite there for use as ->read_iter() -
we depend on having zero iter->iov_offset in O_DIRECT case.  Fortunately,
that's true for all converted callers (and for generic_file_aio_read() itself).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:49 -04:00
Al Viro
31b140398c switch {__,}blockdev_direct_IO() to iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:46 -04:00
Al Viro
d8d3d94b80 pass iov_iter to ->direct_IO()
unmodified, for now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:44 -04:00
Al Viro
cb66a7a1f1 kill generic_segment_checks()
all callers of ->aio_read() and ->aio_write() have iov/nr_segs already
checked - generic_segment_checks() done after that is just an odd way
to spell iov_length().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:43 -04:00
Al Viro
f8579f8673 generic_file_direct_write(): switch to iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:42 -04:00
Jeff Layton
cff2fce58b locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead
File-private locks have been re-christened as "open file description"
locks.  Finish the symbol name cleanup in the internal implementation.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-23 16:17:03 -04:00
Linus Torvalds
5166701b36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencie between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe->buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
2014-04-12 14:49:50 -07:00
Linus Torvalds
d15e03104e xfs: update for 3.15-rc1
The main changes in the XFS tree for 3.15-rc1 are:
 
         - O_TMPFILE support
         - allowing AIO+DIO writes beyond EOF
         - FALLOC_FL_COLLAPSE_RANGE support for fallocate syscall and XFS
           implementation
         - FALLOC_FL_ZERO_RANGE support for fallocate syscall and XFS
           implementation
         - IO verifier cleanup and rework
         - stack usage reduction changes
         - vm_map_ram NOIO context fixes to remove lockdep warings
         - various bug fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJTPykMAAoJEK3oKUf0dfod/KoP/jKQwzQPdtT8EtAu5vENh9AO
 55zwCDXXFjCNIGIFPkrUBQbbARVAqhLZn3vuLUUhqtRRELdgJy/yFKZ37MPd8bhU
 dKetivEB192Jcd6Sn74vsOsNLm1u9mJqbQ1aothz0TiOrkkWFZlz4Otu36MZRHN3
 9WgZXWSxr6I/hYHGyCorJWZ5ISs0XD3vR5dYXYeZChbTpTxlCT4X/YgUtW4WH/Tq
 y4gG0fKfwr9KK07/LXuQgUuZGU8vwVuNNsXPhqh+FZ39SLD2Ea83h46Hzf/+vVNI
 kCIyYN1y40uBWczmwAptVEnUwhpGK8PzNrhKwTZICDtuct9sikf7c+o0aEE9lcqo
 8IBt0Dy4l7BQVFSZOjYo5Jw5a8jAbkh47zru31HxogEVqafdz80iWB12JagOOnXM
 v/McvDvZMyfgGckih32FM4G7ElvTYgGai5/3dLhfMuhc4/DdwcBOF1yHmFmnjhWO
 QRsQxLdefUtP3MfMYKaJHM6v2wE1S2l0owgp+HdPluNiOUmH/fqFq1WpHxqqeRPz
 nuHF8oYlxaZP5WAarz6Yf1/twIeZJ1rTD8np8uocvMqQJzMYJgrQyH+xJqjJaITR
 iveQcEoRB8D7/fXMGDdcjZYE2fG4l4JE2kuh97k5NZw76e3v2YXSGh0kd9WqR1uN
 t07joLRQKR2pJuSmuD5E
 =uSkJ
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Dave Chinner:
 "There are a couple of new fallocate features in this request - it was
  decided that it was easiest to push them through the XFS tree using
  topic branches and have the ext4 support be based on those branches.
  Hence you may see some overlap with the ext4 tree merge depending on
  how they including those topic branches into their tree.  Other than
  that, there is O_TMPFILE support, some cleanups and bug fixes.

  The main changes in the XFS tree for 3.15-rc1 are:

   - O_TMPFILE support
   - allowing AIO+DIO writes beyond EOF
   - FALLOC_FL_COLLAPSE_RANGE support for fallocate syscall and XFS
     implementation
   - FALLOC_FL_ZERO_RANGE support for fallocate syscall and XFS
     implementation
   - IO verifier cleanup and rework
   - stack usage reduction changes
   - vm_map_ram NOIO context fixes to remove lockdep warings
   - various bug fixes and cleanups"

* tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfs: (34 commits)
  xfs: fix directory hash ordering bug
  xfs: extra semi-colon breaks a condition
  xfs: Add support for FALLOC_FL_ZERO_RANGE
  fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
  xfs: inode log reservations are still too small
  xfs: xfs_check_page_type buffer checks need help
  xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation
  xfs: use NOIO contexts for vm_map_ram
  xfs: don't leak EFSBADCRC to userspace
  xfs: fix directory inode iolock lockdep false positive
  xfs: allocate xfs_da_args to reduce stack footprint
  xfs: always do log forces via the workqueue
  xfs: modify verifiers to differentiate CRC from other errors
  xfs: print useful caller information in xfs_error_report
  xfs: add xfs_verifier_error()
  xfs: add helper for updating checksums on xfs_bufs
  xfs: add helper for verifying checksums on xfs_bufs
  xfs: Use defines for CRC offsets in all cases
  xfs: skip pointless CRC updates after verifier failures
  xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
  ...
2014-04-04 15:50:08 -07:00
Linus Torvalds
24e7ea3bea Major changes for 3.14 include support for the newly added ZERO_RANGE
and COLLAPSE_RANGE fallocate operations, and scalability improvements
 in the jbd2 layer and in xattr handling when the extended attributes
 spill over into an external block.
 
 Other than that, the usual clean ups and minor bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y
 Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A
 ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI
 kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd
 bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15
 a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs
 rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa
 ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/
 +165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA
 2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr
 700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU
 DrPKlXwIgva7zq5/S0Vr
 =4s1Z
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Major changes for 3.14 include support for the newly added ZERO_RANGE
  and COLLAPSE_RANGE fallocate operations, and scalability improvements
  in the jbd2 layer and in xattr handling when the extended attributes
  spill over into an external block.

  Other than that, the usual clean ups and minor bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
  ext4: fix premature freeing of partial clusters split across leaf blocks
  ext4: remove unneeded test of ret variable
  ext4: fix comment typo
  ext4: make ext4_block_zero_page_range static
  ext4: atomically set inode->i_flags in ext4_set_inode_flags()
  ext4: optimize Hurd tests when reading/writing inodes
  ext4: kill i_version support for Hurd-castrated file systems
  ext4: each filesystem creates and uses its own mb_cache
  fs/mbcache.c: doucple the locking of local from global data
  fs/mbcache.c: change block and index hash chain to hlist_bl_node
  ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
  ext4: refactor ext4_fallocate code
  ext4: Update inode i_size after the preallocation
  ext4: fix partial cluster handling for bigalloc file systems
  ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
  ext4: only call sync_filesystm() when remounting read-only
  fs: push sync_filesystem() down to the file system's remount_fs()
  jbd2: improve error messages for inconsistent journal heads
  jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
  jbd2: minimize region locked by j_list_lock in journal_get_create_access()
  ...
2014-04-04 15:39:39 -07:00
Linus Torvalds
f7789dc0d4 Merge branch 'locks-3.15' of git://git.samba.org/jlayton/linux
Pull file locking updates from Jeff Layton:
 "Highlights:

   - maintainership change for fs/locks.c.  Willy's not interested in
     maintaining it these days, and is OK with Bruce and I taking it.
   - fix for open vs setlease race that Al ID'ed
   - cleanup and consolidation of file locking code
   - eliminate unneeded BUG() call
   - merge of file-private lock implementation"

* 'locks-3.15' of git://git.samba.org/jlayton/linux:
  locks: make locks_mandatory_area check for file-private locks
  locks: fix locks_mandatory_locked to respect file-private locks
  locks: require that flock->l_pid be set to 0 for file-private locks
  locks: add new fcntl cmd values for handling file private locks
  locks: skip deadlock detection on FL_FILE_PVT locks
  locks: pass the cmd value to fcntl_getlk/getlk64
  locks: report l_pid as -1 for FL_FILE_PVT locks
  locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT"
  locks: rename locks_remove_flock to locks_remove_file
  locks: consolidate checks for compatible filp->f_mode values in setlk handlers
  locks: fix posix lock range overflow handling
  locks: eliminate BUG() call when there's an unexpected lock on file close
  locks: add __acquires and __releases annotations to locks_start and locks_stop
  locks: remove "inline" qualifier from fl_link manipulation functions
  locks: clean up comment typo
  locks: close potential race between setlease and open
  MAINTAINERS: update entry for fs/locks.c
2014-04-04 14:21:20 -07:00
Linus Torvalds
7df934526c Merge branch 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull renameat2 system call from Miklos Szeredi:
 "This adds a new syscall, renameat2(), which is the same as renameat()
  but with a flags argument.

  The purpose of extending rename is to add cross-rename, a symmetric
  variant of rename, which exchanges the two files.  This allows
  interesting things, which were not possible before, for example
  atomically replacing a directory tree with a symlink, etc...  This
  also allows overlayfs and friends to operate on whiteouts atomically.

  Andy Lutomirski also suggested a "noreplace" flag, which disables the
  overwriting behavior of rename.

  These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
  implemented for ext4 as an example and for testing"

* 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ext4: add cross rename support
  ext4: rename: split out helper functions
  ext4: rename: move EMLINK check up
  ext4: rename: create ext4_renament structure for local vars
  vfs: add cross-rename
  vfs: lock_two_nondirectories: allow directory args
  security: add flags to rename hooks
  vfs: add RENAME_NOREPLACE flag
  vfs: add renameat2 syscall
  vfs: rename: use common code for dir and non-dir
  vfs: rename: move d_move() up
  vfs: add d_is_dir()
2014-04-04 14:03:05 -07:00
Johannes Weiner
91b0abe36a mm + fs: store shadow entries in page cache
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Linus Torvalds
bea803183e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Apart from reordering the SELinux mmap code to ensure DAC is called
  before MAC, these are minor maintenance updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  selinux: correctly label /proc inodes in use before the policy is loaded
  selinux: put the mmap() DAC controls before the MAC controls
  selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
  evm: enable key retention service automatically
  ima: skip memory allocation for empty files
  evm: EVM does not use MD5
  ima: return d_name.name if d_path fails
  integrity: fix checkpatch errors
  ima: fix erroneous removal of security.ima xattr
  security: integrity: Use a more current logging style
  MAINTAINERS: email updates and other misc. changes
  ima: reduce memory usage when a template containing the n field is used
  ima: restore the original behavior for sending data with ima template
  Integrity: Pass commname via get_task_comm()
  fs: move i_readcount
  ima: use static const char array definitions
  security: have cap_dentry_init_security return error
  ima: new helper: file_inode(file)
  kernel: Mark function as static in kernel/seccomp.c
  capability: Use current logging styles
  ...
2014-04-03 09:26:18 -07:00
Al Viro
ccad236566 kill generic_file_buffered_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:38 -04:00