Commit graph

25720 commits

Author SHA1 Message Date
Jan Kara
697e6fed9f writeback: Remove outdated comment
The comment is hopelessly outdated and misplaced. We no longer have 'bdi'
part of writeback work, the comment about blockdev super is outdated,
comment about throttling as well. Information about list handling is in
more detail at queue_io(). So just move the bit about older_than_this to
close to move_expired_inodes() and remove the rest.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-03-21 15:27:08 +08:00
Jan Kara
f469ec9c5b fs: Remove bogus wait in write_inode_now()
inode_sync_wait() in write_inode_now() is just bogus. That function waits for
I_SYNC bit to be cleared but writeback_single_inode() clears the bit on return
so the wait is effectivelly a nop unless someone else submits the inode for
writeback again. All the waiting write_inode_now() needs is achieved by using
WB_SYNC_ALL writeback mode.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-03-21 15:26:47 +08:00
Linus Torvalds
71fece9511 Merge git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix dentry refcount leak when opening a FIFO on lookup
  CIFS: Fix mkdir/rmdir bug for the non-POSIX case
2012-03-06 16:55:50 -08:00
Linus Torvalds
3e85fb9cd4 Merge branch 'akpm' (Andrew's patch bomb)
Merge the emailed seties of 19 patches from Andrew Morton

* akpm:
  rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
  memcg: fix mapcount check in move charge code for anonymous page
  mm: thp: fix BUG on mm->nr_ptes
  alpha: fix 32/64-bit bug in futex support
  memcg: fix GPF when cgroup removal races with last exit
  debugobjects: Fix selftest for static warnings
  floppy/scsi: fix setting of BIO flags
  memcg: fix deadlock by inverting lrucare nesting
  drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
  c2port: class_create() returns an ERR_PTR
  pps: class_create() returns an ERR_PTR, not NULL
  hung_task: fix the broken rcu_lock_break() logic
  vfork: kill PF_STARTING
  coredump_wait: don't call complete_vfork_done()
  vfork: make it killable
  vfork: introduce complete_vfork_done()
  aio: wake up waiters when freeing unused kiocbs
  kprobes: return proper error code from register_kprobe()
  kmsg_dump: don't run on non-error paths by default
2012-03-05 15:50:25 -08:00
Oleg Nesterov
57b59c4a14 coredump_wait: don't call complete_vfork_done()
Now that CLONE_VFORK is killable, coredump_wait() no longer needs
complete_vfork_done().  zap_threads() should find and kill all tasks with
the same ->mm, this includes our parent if ->vfork_done is set.

mm_release() becomes the only caller, unexport complete_vfork_done().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Oleg Nesterov
c415c3b47e vfork: introduce complete_vfork_done()
No functional changes.

Move the clear-and-complete-vfork_done code into the new trivial helper,
complete_vfork_done().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Jeff Moyer
880641bb9d aio: wake up waiters when freeing unused kiocbs
Bart Van Assche reported a hung fio process when either hot-removing
storage or when interrupting the fio process itself.  The (pruned) call
trace for the latter looks like so:

  fio             D 0000000000000001     0  6849   6848 0x00000004
   ffff880092541b88 0000000000000046 ffff880000000000 ffff88012fa11dc0
   ffff88012404be70 ffff880092541fd8 ffff880092541fd8 ffff880092541fd8
   ffff880128b894d0 ffff88012404be70 ffff880092541b88 000000018106f24d
  Call Trace:
    schedule+0x3f/0x60
    io_schedule+0x8f/0xd0
    wait_for_all_aios+0xc0/0x100
    exit_aio+0x55/0xc0
    mmput+0x2d/0x110
    exit_mm+0x10d/0x130
    do_exit+0x671/0x860
    do_group_exit+0x44/0xb0
    get_signal_to_deliver+0x218/0x5a0
    do_signal+0x65/0x700
    do_notify_resume+0x65/0x80
    int_signal+0x12/0x17

The problem lies with the allocation batching code.  It will
opportunistically allocate kiocbs, and then trim back the list of iocbs
when there is not enough room in the completion ring to hold all of the
events.

In the case above, what happens is that the pruning back of events ends
up freeing up the last active request and the context is marked as dead,
so it is thus responsible for waking up waiters.  Unfortunately, the
code does not check for this condition, so we end up with a hung task.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>		[3.2.x only]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:42 -08:00
Al Viro
6414fa6a15 aout: move setup_arg_pages() prior to reading/mapping the binary
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 13:51:32 -08:00
Linus Torvalds
5483f18e98 vfs: move dentry_cmp from <linux/dcache.h> to fs/dcache.c
It's only used inside fs/dcache.c, and we're going to play games with it
for the word-at-a-time patches.  This time we really don't even want to
export it, because it really is an internal function to fs/dcache.c, and
has been since it was introduced.

Having it in that extremely hot header file (it's included in pretty
much everything, thanks to <linux/fs.h>) is a disaster for testing
different versions, and is utterly pointless.

We really should have some kind of header file diet thing, where we
figure out which parts of header files are really better off private and
only result in more expensive compiles.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-04 15:51:42 -08:00
Linus Torvalds
ae942ae719 vfs: export full_name_hash() function to modules
Commit 5707c87f "vfs: uninline full_name_hash()" broke the modular
build, because it needs exporting now that it isn't inlined any more.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 19:40:57 -08:00
Linus Torvalds
200e9ef7ab vfs: split up name hashing in link_path_walk() into helper function
The code in link_path_walk() that finds out the length and the hash of
the next path component is some of the hottest code in the kernel.  And
I have a version of it that does things at the full width of the CPU
wordsize at a time, but that means that we *really* want to split it up
into a separate helper function.

So this re-organizes the code a bit and splits the hashing part into a
helper function called "hash_name()".  It returns the length of the
pathname component, while at the same time computing and writing the
hash to the appropriate location.

The code generation is slightly changed by this patch, but generally for
the better - and the added abstraction actually makes the code easier to
read too.  And the new interface is well suited for replacing just the
"hash_name()" function with alternative implementations.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 14:49:24 -08:00
Linus Torvalds
0145acc202 vfs: uninline full_name_hash()
.. and also use it in lookup_one_len() rather than open-coding it.

There aren't any performance-critical users, so inlining it is silly.
But it wouldn't matter if it wasn't for the fact that the word-at-a-time
dentry name patches want to conditionally replace the function, and
uninlining it sets the stage for that.

So again, this is a preparatory patch that doesn't change any semantics,
and only prepares for a much cleaner and testable word-at-a-time dentry
name accessor patch.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 14:32:59 -08:00
Linus Torvalds
8966be9030 vfs: trivial __d_lookup_rcu() cleanups
These don't change any semantics, but they clean up the code a bit and
mark some arguments appropriately 'const'.

They came up as I was doing the word-at-a-time dcache name accessor
code, and cleaning this up now allows me to send out a smaller relevant
interesting patch for the experimental stuff.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 14:23:30 -08:00
H. Peter Anvin
c8e252586f regset: Prevent null pointer reference on readonly regsets
The regset common infrastructure assumed that regsets would always
have .get and .set methods, but not necessarily .active methods.
Unfortunately people have since written regsets without .set methods.

Rather than putting in stub functions everywhere, handle regsets with
null .get or .set methods explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-02 11:38:15 -08:00
Randy Dunlap
164974a8f2 ecryptfs: fix printk format warning for size_t
Fix printk format warning (from Linus's suggestion):

on i386:
  fs/ecryptfs/miscdev.c:433:38: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'unsigned int'

and on x86_64:
  fs/ecryptfs/miscdev.c:433:38: warning: format '%u' expects type 'unsigned int', but argument 4 has type 'long unsigned int'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc:	Geert Uytterhoeven <geert@linux-m68k.org>
Cc:	Tyler Hicks <tyhicks@canonical.com>
Cc:	Dustin Kirkland <dustin.kirkland@gazzang.com>
Cc:	ecryptfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-28 16:55:30 -08:00
Steven Whitehouse
a365fbf354 GFS2: Read resource groups on mount
This makes mount take slightly longer, but at the same time, the first
write to the filesystem will be faster too. It also means that if there
is a problem in the resource index, then we can refuse to mount rather
than having to try and report that when the first write occurs.

In addition, to avoid recursive locking, we hvae to take account of
instances when the rindex glock may already be held when we are
trying to update the rbtree of resource groups.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:52:39 +00:00
Bob Peterson
9e73f571ea GFS2: Ensure rindex is uptodate for fallocate
This patch fixes a problem whereby gfs2_grow was failing and causing GFS2
to assert. The problem was that when GFS2's fallocate operation tried to
acquire an "allocation" it made sure the rindex was up to date, and if not,
it called gfs2_rindex_update. However, if the file being fallocated was
the rindex itself, it was already locked at that point. By calling
gfs2_rindex_update at an earlier point in time, we bring rindex up to date
and thereby avoid trying to lock it when the "allocation" is acquired.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:48:30 +00:00
Bob Peterson
718b97bd6b GFS2: Read in rindex if necessary during unlink
This patch fixes a problem whereby you were unable to delete
files until other file system operations were done (such as
statfs, touch, writes, etc.) that caused the rindex to be
read in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:48:02 +00:00
Steven Whitehouse
4043b886b0 GFS2: Fix race between lru_list and glock ref count
This patch fixes a narrow race window between the glock ref count
hitting zero and glocks being removed from the lru_list.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28 09:43:07 +00:00
Anton Altaparmakov
f621c53343 Merge branch 'master' of /Volumes/CaseSensitiveDisk/linux 2012-02-27 09:01:22 +00:00
Jeff Layton
5bccda0ebc cifs: fix dentry refcount leak when opening a FIFO on lookup
The cifs code will attempt to open files on lookup under certain
circumstances. What happens though if we find that the file we opened
was actually a FIFO or other special file?

Currently, the open filehandle just ends up being leaked leading to
a dentry refcount mismatch and oops on umount. Fix this by having the
code close the filehandle on the server if it turns out not to be a
regular file. While we're at it, change this spaghetti if statement
into a switch too.

Cc: stable@vger.kernel.org
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-02-26 23:16:26 -06:00
Pavel Shilovsky
6de2ce4231 CIFS: Fix mkdir/rmdir bug for the non-POSIX case
Currently we do inc/drop_nlink for a parent directory for every
mkdir/rmdir calls. That's wrong when Unix extensions are disabled
because in this case a server doesn't follow the same semantic and
returns the old value on the next QueryInfo request. As the result,
we update our value with the server one and then decrement it on
every rmdir call - go to negative nlink values.

Fix this by removing inc/drop_nlink for the parent directory from
mkdir/rmdir, setting it for a revalidation and ignoring NumberOfLinks
for directories when Unix extensions are disabled.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-02-26 22:59:43 -06:00
Ian Kent
a32744d4ab autofs: work around unhappy compat problem on x86-64
When the autofs protocol version 5 packet type was added in commit
5c0a32fc2c ("autofs4: add new packet type for v5 communications"), it
obvously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.

However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.

And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64.  And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out a 4-byte aligned.

End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.

As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.

Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode.  But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.

So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task.  That way we
will write the properly sized packet that user mode expects.

Not pretty.  Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.

Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-25 12:10:27 -08:00
Oleg Nesterov
971316f050 epoll: ep_unregister_pollwait() can use the freed pwq->whead
signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.

Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.

This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-24 11:42:50 -08:00
Oleg Nesterov
d80e731eca epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-24 11:42:50 -08:00
Linus Torvalds
855a85f704 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Quoth Chris:
 "This is later than I wanted because I got backed up running through
  btrfs bugs from the Oracle QA teams.  But they are all bug fixes that
  we've queued and tested since rc1.

  Nothing in particular stands out, this just reflects bug fixing and QA
  done in parallel by all the btrfs developers.  The most user visible
  of these is:

    Btrfs: clear the extent uptodate bits during parent transid failures

  Because that helps deal with out of date drives (say an iscsi disk
  that has gone away and come back).  The old code wasn't always
  properly retrying the other mirror for this type of failure."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
  Btrfs: fix compiler warnings on 32 bit systems
  Btrfs: increase the global block reserve estimates
  Btrfs: clear the extent uptodate bits during parent transid failures
  Btrfs: add extra sanity checks on the path names in btrfs_mksubvol
  Btrfs: make sure we update latest_bdev
  Btrfs: improve error handling for btrfs_insert_dir_item callers
  Btrfs: be less strict on finding next node in clear_extent_bit
  Btrfs: fix a bug on overcommit stuff
  Btrfs: kick out redundant stuff in convert_extent_bit
  Btrfs: skip states when they does not contain bits to clear
  Btrfs: check return value of lookup_extent_mapping() correctly
  Btrfs: fix deadlock on page lock when doing auto-defragment
  Btrfs: fix return value check of extent_io_ops
  btrfs: honor umask when creating subvol root
  btrfs: silence warning in raid array setup
  btrfs: fix structs where bitfields and spinlock/atomic share 8B word
  btrfs: delalloc for page dirtied out-of-band in fixup worker
  Btrfs: fix memory leak in load_free_space_cache()
  btrfs: don't check DUP chunks twice
  Btrfs: fix trim 0 bytes after a device delete
  ...
2012-02-24 09:02:53 -08:00
Chris Mason
e77266e4c4 Btrfs: fix compiler warnings on 32 bit systems
The enospc tracing code added some interesting uses of
u64 pointer casts.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-02-24 10:39:05 -05:00
Anton Altaparmakov
0afa1b62e3 Merge branch 'master' of /Volumes/CaseSensitiveDisk/linux 2012-02-24 09:39:16 +00:00
Anton Altaparmakov
9b556248ec NTFS: Correct two spelling errors "dealocate" to "deallocate" in mft.c.
From: Masanari Iida <standby24x7@gmail.com>

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
2012-02-24 09:17:09 +00:00
Anton Altaparmakov
37fbf4bfb8 Restore direct_io / truncate locking API
With kernel 3.1, Christoph removed i_alloc_sem and replaced it with
calls (namely inode_dio_wait() and inode_dio_done()) which are
EXPORT_SYMBOL_GPL() thus they cannot be used by non-GPL file systems and
further inode_dio_wait() was pushed from notify_change() into the file
system ->setattr() method but no non-GPL file system can make this call.

That means non-GPL file systems cannot exist any more unless they do not
use any VFS functionality related to reading/writing as far as I can
tell or at least as long as they want to implement direct i/o.

Both Linus and Al (and others) have said on LKML that this breakage of
the VFS API should not have happened and that the change was simply
missed as it was not documented in the change logs of the patches that
did those changes.

This patch changes the two function exports in question to be
EXPORT_SYMBOL() thus restoring the VFS API as it used to be - accessible
for all modules.

Christoph, who introduced the two functions and exported them GPL-only
is CC-ed on this patch to give him the opportunity to object to the
symbols being changed in this manner if he did indeed intend them to be
GPL-only and does not want them to become available to all modules.

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-23 15:56:21 -08:00
Linus Torvalds
bb4c7e9a99 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
A fix from Jesper Juhl removes an assignment in an ASSERT when a compare
is intended.  Two fixes from Mitsuo Hayasaka address off-by-ones in XFS
quota enforcement.

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: make inode quota check more general
  xfs: change available ranges of softlimit and hardlimit in quota check
  XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended
2012-02-23 15:38:57 -08:00
Liu Bo
5500cdbe14 Btrfs: increase the global block reserve estimates
When doing IO with large amounts of data fragmentation, the global block
reserve calulations are too low.  This increases them to avoid
ENOSPC crashes.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-02-23 10:49:04 -05:00
Chris Mason
5065319052 Btrfs: clear the extent uptodate bits during parent transid failures
If btrfs reads a block and finds a parent transid mismatch, it clears
the uptodate flags on the extent buffer, and the pages inside it.  But
we only clear the uptodate bits in the state tree if the block straddles
more than one page.

This is from an old optimization from to reduce contention on the extent
state tree.  But it is buggy because the code that retries a read from
a different copy of the block is going to find the uptodate state bits
set and skip the IO.

The end result of the bug is that we'll never actually read the good
copy (if there is one).

The fix here is to always clear the uptodate state bits, which is safe
because this code is only called when the parent transid fails.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-02-23 10:43:45 -05:00
Chris Mason
16780cabb8 Btrfs: add extra sanity checks on the path names in btrfs_mksubvol
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-02-23 10:43:45 -05:00
Chris Mason
a6b0d5c8db Btrfs: make sure we update latest_bdev
When we are setting up the mount, we close all the
devices that were not actually part of the metadata we found.

But, we don't make sure that one of those devices wasn't
fs_devices->latest_bdev, which means we can do a use after free
on the one we closed.

This updates latest_bdev as it goes.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-02-23 10:43:45 -05:00
Chris Mason
fe66a05a06 Btrfs: improve error handling for btrfs_insert_dir_item callers
This allows us to gracefully continue if we aren't able to insert
directory items, both for normal files/dirs and snapshots.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-02-23 10:43:45 -05:00
Linus Torvalds
437cf4c7b7 Bugfixes for the NFS client.
Fix a nasty Oops in the NFSv4 getacl code, another source of infinite loops
 in the NFSv4 state recovery code, and a regression in NFSv4.1 session
 initialisation.
 Also deal with an NFSv4.1 memory leak.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPRLALAAoJEGcL54qWCgDy5yoP/0eKjYERpqf00ETRnmQw6ngt
 PCaC33mUwfsJlLdSfW6hhG+2IhKkiWR4mraCo1Es9CYS7eSsPU9/djK62neHZG3s
 lJF2xcPlfvbiYtbyG2MsmtRffUl/XRbLLMfZMADgG4iQy1y4uTDxsaxcKrBIu5ig
 mQWlEpsYeE9TLea3Qvw6oHiXMKkrwdeR9Et5aGo3Y5hxbpDhD86yR1cyw2ds2aUP
 FmXk/ERqJDJpEt+uEQKFsMDzjcZ27r/nca/AhwE+cVQku6Fi9QdnFcdNtQmwFYq6
 iwYS/grUoW0tLV7Fv/iYNvZmVtXtS2Ng1VTR77gcLRXps0naEciuM5PO7MmuUe/j
 0aS/fV1s4/ZloRCu6l/gj2JmOAkh0joy6msQTcdEt4LQcvQSxtUmLQtWyLoGm6aa
 5LwTOFg25qRH6c7mJglSsiPdjgAcOIdwzH1gikSzSJeV2NjroZZ4nPW/noz0Idfq
 l36B9T3iHc0jZa6wr/Q/S1qF6RdfLYfuXcrDKcxJFDpuYNRNDN1jjOq+OKs5yGw9
 Fgui6r9/g0uDsEKWtqR30NzUnUx4PxY0hxduT04T2sxMMapU3oXcXWW4vRgly9be
 ASx/hAEnBovNwzHeKo9wVt3WdHqP2M4wMx8dD8hFLL1qPx9uj90IeclOVneNEHKa
 UMtC3BH1DBF44FYhZgT4
 =yq1J
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.3-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Bugfixes for the NFS client.

Fix a nasty Oops in the NFSv4 getacl code, another source of infinite
loops in the NFSv4 state recovery code, and a regression in NFSv4.1
session initialisation.

Also deal with an NFSv4.1 memory leak.

* tag 'nfs-for-3.3-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: fix server_scope memory leak
  NFSv4.1: Fix a NFSv4.1 session initialisation regression
  NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEID
  NFSv4: Fix an Oops in the NFSv4 getacl code
2012-02-22 08:43:35 -08:00
Anton Altaparmakov
45d95bcd7a NTFS: Do not dereference pointer before checking for NULL.
Found by Coverity software (http://scan.coverity.com).

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
2012-02-22 11:15:43 +00:00
Anton Altaparmakov
0e37579506 NTFS: Remove unused variable.
Found by Coverity software (http://scan.coverity.com).

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
2012-02-22 10:49:58 +00:00
Linus Torvalds
faf309009e sys_poll: fix incorrect type for 'timeout' parameter
The 'poll()' system call timeout parameter is supposed to be 'int', not
'long'.

Now, the reason this matters is that right now 32-bit compat mode is
broken on at least x86-64, because the 32-bit code just calls
'sys_poll()' directly on x86-64, and the 32-bit argument will have been
zero-extended, turning a signed 'int' into a large unsigned 'long'
value.

We could just introduce a 'compat_sys_poll()' function for this, and
that may eventually be what we have to do, but since the actual standard
poll() semantics is *supposed* to be 'int', and since at least on x86-64
glibc sign-extends the argument before invocing the system call (so
nobody can actually use a 64-bit timeout value in user space _anyway_,
even in 64-bit binaries), the simpler solution would seem to be to just
fix the definition of the system call to match what it should have been
from the very start.

If it turns out that somebody somehow circumvents the user-level libc
64-bit sign extension and actually uses a large unsigned 64-bit timeout
despite that not being how poll() is supposed to work, we will need to
do the compat_sys_poll() approach.

Reported-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-21 17:24:20 -08:00
Mitsuo Hayasaka
c922bbc819 xfs: make inode quota check more general
The xfs checks quota when reserving disk blocks and inodes. In the block
reservation, it checks if the total number of blocks including current
usage and new reservation exceed quota. In the inode reservation,
it checks using the total number of inodes including only current usage
without new reservation. However, this inode quota check works well
since the caller of xfs_trans_dquot() always sets the argument of the
number of new inode reservation to 1 or 0 and inode is reserved one by
one in current xfs.

To make it more general, this patch changes it to the same way as the
block quota check.

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-21 10:12:43 -06:00
Mitsuo Hayasaka
20f12d8ac0 xfs: change available ranges of softlimit and hardlimit in quota check
In general, quota allows us to use disk blocks and inodes up to each
limit, that is, they are available if they don't exceed their limitations.
Current xfs sets their available ranges to lower than them except disk
inode quota check. So, this patch changes the ranges to not beyond them.

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-21 10:12:43 -06:00
Liu Bo
692e5759a4 Btrfs: be less strict on finding next node in clear_extent_bit
In clear_extent_bit, it is enough that next node is adjacent in tree level.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-02-21 16:02:10 +01:00
Linus Torvalds
8ebbfb4957 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Assorted fixes, sat in -next for a week or so...

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
  vfs: fix compat_sys_stat() handling of overflows in st_nlink
  quota: Fix deadlock with suspend and quotas
  vfs: Provide function to get superblock and wait for it to thaw
  vfs: fix panic in __d_lookup() with high dentry hashtable counts
  autofs4 - fix lockdep splat in autofs
  vfs: fix d_inode_lookup() dentry ref leak
2012-02-20 16:13:58 -08:00
Weston Andros Adamson
abe9a6d57b NFSv4: fix server_scope memory leak
server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
non-zero

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 17:34:03 -05:00
Trond Myklebust
f86f36a6ae NFSv4.1: Fix a NFSv4.1 session initialisation regression
Commit aacd553 (NFSv4.1: cleanup init and reset of session slot tables)
introduces a regression in the session initialisation code. New tables
now find their sequence ids initialised to 0, rather than the mandated
value of 1 (see RFC5661).

Fix the problem by merging nfs4_reset_slot_table() and nfs4_init_slot_table().
Since the tbl->max_slots is initialised to 0, the test in
nfs4_reset_slot_table for max_reqs != tbl->max_slots will automatically
pass for an empty table.

Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 17:33:39 -05:00
Cong Wang
465c9343c5 ecryptfs: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-02-16 16:06:27 -06:00
Tyler Hicks
545d680938 eCryptfs: Copy up lower inode attrs after setting lower xattr
After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.

One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.

https://launchpad.net/bugs/926292

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sebastien Bacher <seb128@ubuntu.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: <stable@vger.kernel.org>
2012-02-16 16:06:27 -06:00
Tyler Hicks
4a26620df4 eCryptfs: Improve statfs reporting
statfs() calls on eCryptfs files returned the wrong filesystem type and,
when using filename encryption, the wrong maximum filename length.

If mount-wide filename encryption is enabled, the cipher block size and
the lower filesystem's max filename length will determine the max
eCryptfs filename length. Pre-tested, known good lengths are used when
the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte
block sizes is used. In other, less common cases, we fall back to a safe
rounded-down estimate when determining the eCryptfs namelen.

https://launchpad.net/bugs/885744

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
2012-02-16 16:06:21 -06:00
Liu Bo
d9b0218f6c Btrfs: fix a bug on overcommit stuff
When overcommitting, we should check the sum of pinned space and
bytes for delayed item.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-02-16 17:23:18 +01:00