kernel-fxtec-pro1x

Author	SHA1	Message	Date
Herb Shiu	637ae8d547	ceph: pass lock information by struct file_lock instead of as individual params. Signed-off-by: Herb Shiu <herb_shiu@tcloudcomputing.com> Acked-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 14:22:34 -08:00
Herb Shiu	25933abdd8	ceph: Handle file locks in replies from the MDS. Previously the kernel client incorrectly assumed everything was a directory. Signed-off-by: Herb Shiu <herb_shiu@tcloudcomputing.com> Acked-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 14:22:27 -08:00
Sage Weil	884ea89276	ceph: avoid possible null deref in readdir after dir llseek last may be NULL, but we dereference it in the else branch without checking. Normally it doesn't trigger because last == NULL when fpos == 2, but it could happen on a newly opened dir if the user seeks forward. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 14:15:31 -08:00
Dave Chinner	c76febef57	xfs: only run xfs_error_test if error injection is active Recent tests writing lots of small files showed the flusher thread being CPU bound and taking a long time to do allocations on a debug kernel. perf showed this as the prime reason: samples pcnt function DSO _______ _____ ___________________________ _________________ 224648.00 36.8% xfs_error_test [kernel.kallsyms] 86045.00 14.1% xfs_btree_check_sblock [kernel.kallsyms] 39778.00 6.5% prandom32 [kernel.kallsyms] 37436.00 6.1% xfs_btree_increment [kernel.kallsyms] 29278.00 4.8% xfs_btree_get_rec [kernel.kallsyms] 27717.00 4.5% random32 [kernel.kallsyms] Walking btree blocks during allocation checking them requires each block (a cache hit, so no I/O) call xfs_error_test(), which then does a random32() call as the first operation. IOWs, ~50% of the CPU is being consumed just testing whether we need to inject an error, even though error injection is not active. Kill this overhead when error injection is not active by adding a global counter of active error traps and only calling into xfs_error_test when fault injection is active. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	de25c1818c	xfs: avoid moving stale inodes in the AIL When an inode has been marked stale because the cluster is being freed, we don't want to (re-)insert this inode into the AIL. There is a race condition where the cluster buffer may be unpinned before the inode is inserted into the AIL during transaction committed processing. If the buffer is unpinned before the inode item has been committed and inserted, then it is possible for the buffer to be released and hence processthe stale inode callbacks before the inode is inserted into the AIL. In this case, we then insert a clean, stale inode into the AIL which will never get removed by an IO completion. It will, however, get reclaimed and that triggers an assert in xfs_inode_free() complaining about freeing an inode still in the AIL. This race can be avoided by not moving stale inodes forward in the AIL during transaction commit completion processing. This closes the race condition by ensuring we never insert clean stale inodes into the AIL. It is safe to do this because a dirty stale inode, by definition, must already be in the AIL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	309c848002	xfs: delayed alloc blocks beyond EOF are valid after writeback There is an assumption in the parts of XFS that flushing a dirty file will make all the delayed allocation blocks disappear from an inode. That is, that after calling xfs_flush_pages() then ip->i_delayed_blks will be zero. This is an invalid assumption as we may have specualtive preallocation beyond EOF and they are recorded in ip->i_delayed_blks. A flush of the dirty pages of an inode will not change the state of these blocks beyond EOF, so a non-zero deeelalloc block count after a flush is valid. The bmap code has an invalid ASSERT() that needs to be removed, and the swapext code has a bug in that while it swaps the data forks around, it fails to swap the i_delayed_blks counter associated with the fork and hence can get the block accounting wrong. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	90810b9e82	xfs: push stale, pinned buffers on trylock failures As reported by Nick Piggin, XFS is suffering from long pauses under highly concurrent workloads when hosted on ramdisks. The problem is that an inode buffer is stuck in the pinned state in memory and as a result either the inode buffer or one of the inodes within the buffer is stopping the tail of the log from being moved forward. The system remains in this state until a periodic log force issued by xfssyncd causes the buffer to be unpinned. The main problem is that these are stale buffers, and are hence held locked until the transaction/checkpoint that marked them state has been committed to disk. When the filesystem gets into this state, only the xfssyncd can cause the async transactions to be committed to disk and hence unpin the inode buffer. This problem was encountered when scaling the busy extent list, but only the blocking lock interface was fixed to solve the problem. Extend the same fix to the buffer trylock operations - if we fail to lock a pinned, stale buffer, then force the log immediately so that when the next attempt to lock it comes around, it will have been unpinned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	c726de4409	xfs: fix failed write truncation handling. Since the move to the new truncate sequence we call xfs_setattr to truncate down excessively instanciated blocks. As shown by the testcase in kernel.org BZ #22452 that doesn't work too well. Due to the confusion of the internal inode size, and the VFS inode i_size it zeroes data that it shouldn't. But full blown truncate seems like overkill here. We only instanciate delayed allocations in the write path, and given that we never released the iolock we can't have converted them to real allocations yet either. The only nasty case is pre-existing preallocation which we need to skip. We already do this for page discard during writeback, so make the delayed allocation block punching a generic function and call it from the failed write path as well as xfs_aops_discard_page. The callers are responsible for ensuring that partial blocks are not truncated away, and that they hold the ilock. Based on a fix originally from Christoph Hellwig. This version used filesystem blocks as the range unit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:19 -06:00
Trond Myklebust	0aded708d1	NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler We need to use the cookie from the previous array entry, not the actual cookie that we are searching for (except for the case of uncached_readdir). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-01 08:16:16 -05:00
Oleg Nesterov	114279be21	exec: copy-and-paste the fixes into compat_do_execve() paths Note: this patch targets 2.6.37 and tries to be as simple as possible. That is why it adds more copy-and-paste horror into fs/compat.c and uglifies fs/exec.c, this will be cleanuped later. compat_copy_strings() plays with bprm->vma/mm directly and thus has two problems: it lacks the RLIMIT_STACK check and argv/envp memory is not visible to oom killer. Export acct_arg_size() and get_arg_page(), change compat_copy_strings() to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0) as do_execve() does. Add the fatal_signal_pending/cond_resched checks into compat_count() and compat_copy_strings(), this matches the code in fs/exec.c and certainly makes sense. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-30 17:56:38 -08:00
Oleg Nesterov	3c77f84572	exec: make argv/envp memory visible to oom-killer Brad Spengler published a local memory-allocation DoS that evades the OOM-killer (though not the virtual memory RLIMIT): http://www.grsecurity.net/~spender/64bit_dos.c execve()->copy_strings() can allocate a lot of memory, but this is not visible to oom-killer, nobody can see the nascent bprm->mm and take it into account. With this patch get_arg_page() increments current's MM_ANONPAGES counter every time we allocate the new page for argv/envp. When do_execve() succeds or fails, we change this counter back. Technically this is not 100% correct, we can't know if the new page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but I don't think this really matters and everything becomes correct once exec changes ->mm or fails. Reported-by: Brad Spengler <spender@grsecurity.net> Reviewed-and-discussed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-30 17:56:37 -08:00
Jeff Layton	ba03864872	cifs: fix parsing of hostname in dfs referrals The DFS referral parsing code does a memchr() call to find the '\\' delimiter that separates the hostname in the referral UNC from the sharename. It then uses that value to set the length of the hostname via pointer subtraction. Instead of subtracting the start of the hostname however, it subtracts the start of the UNC, which causes the code to pass in a hostname length that is 2 bytes too long. Regression introduced in commit `1a4240f4`. Reported-and-Tested-by: Robbert Kouprie <robbert@exx.nl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Wang Lei <wang840925@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: stable@kernel.org Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-30 20:44:05 +00:00
Trond Myklebust	37a09f0745	NFS: Fix a readdirplus bug When comparing filehandles in the helper nfs_same_file(), we should not be using 'strncmp()': filehandles are not null terminated strings. Instead, we should just use the existing helper nfs_compare_fh(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-30 10:18:49 -08:00
Miklos Szeredi	7572777eef	fuse: verify ioctl retries Verify that the total length of the iovec returned in FUSE_IOCTL_RETRY doesn't overflow iov_length(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Tejun Heo <tj@kernel.org> CC: <stable@kernel.org> [2.6.31+]	2010-11-30 16:39:27 +01:00
Miklos Szeredi	d9d318d39d	fuse: fix ioctl when server is 32bit If a 32bit CUSE server is run on 64bit this results in EIO being returned to the caller. The reason is that FUSE_IOCTL_RETRY reply was defined to use 'struct iovec', which is different on 32bit and 64bit archs. Work around this by looking at the size of the reply to determine which struct was used. This is only needed if CONFIG_COMPAT is defined. A more permanent fix for the interface will be to use the same struct on both 32bit and 64bit. Reported-by: "ccmail111" <ccmail111@yahoo.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Tejun Heo <tj@kernel.org> CC: <stable@kernel.org> [2.6.31+]	2010-11-30 16:39:27 +01:00
Suresh Jayaraman	476428f8c3	cifs: display fsc in /proc/mounts Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-30 05:51:49 +00:00
Suresh Jayaraman	b81209de24	cifs: enable fscache iff fsc mount option is used explicitly Currently, if CONFIG_CIFS_FSCACHE is set, fscache is enabled on files opened as read-only irrespective of the 'fsc' mount option. Fix this by enabling fscache only if 'fsc' mount option is specified explicitly. Remove an extraneous cFYI debug message and fix a typo while at it. Reported-by: Jeff Layton <jlayton@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-30 05:49:32 +00:00
Suresh Jayaraman	607a569da4	cifs: allow fsc mount option only if CONFIG_CIFS_FSCACHE is set Currently, it is possible to specify 'fsc' mount option even if CONFIG_CIFS_FSCACHE has not been set. The option is being ignored silently while the user fscache functionality to work. Fix this by raising error when the CONFIG option is not set. Reported-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-30 05:49:28 +00:00
Shirish Pargaonkar	fbeba8bb16	cifs: Handle extended attribute name cifs_acl to generate cifs acl blob (try #4 ) Add extended attribute name system.cifs_acl Get/generate cifs/ntfs acl blob and hand over to the invoker however it wants to parse/process it under experimental configurable option CIFS_ACL. Do not get CIFS/NTFS ACL for xattr for attribute system.posix_acl_access Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-30 05:49:24 +00:00
Shirish Pargaonkar	78415d2d30	cifs: Misc. cleanup in cifsacl handling [try #4 ] Change the name of function mode_to_acl to mode_to_cifs_acl. Handle return code in functions mode_to_cifs_acl and cifs_acl_to_fattr. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-30 05:49:17 +00:00
Linus Torvalds	aa3fc52546	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits) Btrfs: don't use migrate page without CONFIG_MIGRATION Btrfs: deal with DIO bios that span more than one ordered extent Btrfs: setup blank root and fs_info for mount time Btrfs: fix fiemap Btrfs - fix race between btrfs_get_sb() and umount Btrfs: update inode ctime when using links Btrfs: make sure new inode size is ok in fallocate Btrfs: fix typo in fallocate to make it honor actual size Btrfs: avoid NULL pointer deref in try_release_extent_buffer Btrfs: make btrfs_add_nondir take parent inode as an argument Btrfs: hold i_mutex when calling btrfs_log_dentry_safe Btrfs: use dget_parent where we can UPDATED Btrfs: fix more ESTALE problems with NFS Btrfs: handle NFS lookups properly btrfs: make 1-bit signed fileds unsigned btrfs: Show device attr correctly for symlinks btrfs: Set file size correctly in file clone btrfs: Check if dest_offset is block-size aligned before cloning file Btrfs: handle the space_cache option properly btrfs: Fix early enospc because 'unused' calculated with wrong sign. ...	2010-11-29 14:11:08 -08:00
Linus Torvalds	1bfe4eefe5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Userland expects quota limit/warn/usage in 512b blocks	2010-11-29 14:10:22 -08:00
Suresh Jayaraman	523fb8c867	cifs: trivial comment fix for cifs_invalidate_mapping Only the callers check whether the invalid_mapping flag is set and not cifs_invalidate_mapping(). Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-29 17:48:16 +00:00
Chris Mason	5a92bc88ce	Btrfs: don't use migrate page without CONFIG_MIGRATION Fixes compile error Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-29 09:49:11 -05:00
Chris Mason	163cf09c2a	Btrfs: deal with DIO bios that span more than one ordered extent The new DIO bio splitting code has problems when the bio spans more than one ordered extent. This will happen as the generic DIO code merges our get_blocks calls together into a bigger single bio. This fixes things by walking forward in the ordered extent code finding all the overlapping ordered extents and completing them all at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-28 19:56:33 -05:00
Linus Torvalds	7208364652	Un-inline get_pipe_info() helper function This avoids some include-file hell, and the function isn't really important enough to be inlined anyway. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-28 16:27:19 -08:00
Linus Torvalds	c66fb34794	Export 'get_pipe_info()' to other users And in particular, use it in 'pipe_fcntl()'. The other pipe functions do not need to use the 'careful' version, since they are only ever called for things that are already known to be pipes. The normal read/write/ioctl functions are called through the file operations structures, so if a file isn't a pipe, they'd never get called. But pipe_fcntl() is special, and called directly from the generic fcntl code, and needs to use the same careful function that the splice code is using. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-28 14:09:57 -08:00
Linus Torvalds	71993e62a4	Rename 'pipe_info()' to 'get_pipe_info()' .. and change it to take the 'file' pointer instead of an inode, since that's what all users want anyway. The renaming is preparatory to exporting it to other users. The old 'pipe_info()' name was too generic and is already used elsewhere, so before making the function public we need to use a more specific name. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-28 13:56:09 -08:00
Josef Bacik	450ba0ea06	Btrfs: setup blank root and fs_info for mount time There is a problem with how we use sget, it searches through the list of supers attached to the fs_type looking for a super with the same fs_devices as what we're trying to mount. This depends on sb->s_fs_info being filled, but we don't fill that in until we get to btrfs_fill_super, so we could hit supers on the fs_type super list that have a null s_fs_info. In order to fix that we need to go ahead and setup a blank root with a blank fs_info to hold fs_devices, that way our test will work out right and then we can set s_fs_info in btrfs_set_super, and then open_ctree will simply use our pre-allocated root and fs_info when setting everything up. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 13:37:51 -05:00
Josef Bacik	975f84fee2	Btrfs: fix fiemap There are two big problems currently with FIEMAP 1) We return extents for holes. This isn't supposed to happen, we just don't return extents for holes and then userspace interprets the lack of an extent as a hole. 2) We sometimes don't set FIEMAP_EXTENT_LAST properly. This is because we wait to see a EXTENT_FLAG_VACANCY flag on the em, but this won't happen if say we ask fiemap to map up to the last extent in a file, and there is nothing but holes up to the i_size. To fix this we need to lookup the last extent in this file and save the logical offset, so if we happen to try and map that extent we can be sure to set FIEMAP_EXTENT_LAST. With this patch we now pass xfstest 225, which we never have before. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 13:37:50 -05:00
Ian Kent	619c8c7639	Btrfs - fix race between btrfs_get_sb() and umount When mounting a btrfs file system btrfs_test_super() may attempt to use sb->s_fs_info, the btrfs root, of a super block that is going away and that has had the btrfs root set to NULL in its ->put_super(). But if the super block is going away it cannot be an existing super block so we can return false in this case. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 13:37:44 -05:00
Josef Bacik	bc1cbf1f86	Btrfs: update inode ctime when using links Currently we fail xfstest 236 because we're not updating the inode ctime on link. This is a simple fix, and makes it so we pass 236 now. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 13:00:07 -05:00
Josef Bacik	0ed42a63f3	Btrfs: make sure new inode size is ok in fallocate We have been failing xfstest 228 forever, because we don't check to make sure the new inode size is acceptable as far as RLIMIT is concerned. Just check to make sure it's ok to create a inode with this new size and error out if not. With this patch we now pass 228. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 13:00:07 -05:00
Josef Bacik	55a61d1d06	Btrfs: fix typo in fallocate to make it honor actual size There is a typo in __btrfs_prealloc_file_range() where we set the i_size to actual_len/cur_offset, and then just set it to cur_offset again, and do the same with btrfs_ordered_update_i_size(). This fixes it back to keeping i_size in a local variable and then updating i_size properly. Tested this with xfs_io -F -f -c "falloc 0 1" -c "pwrite 0 1" foo stat'ing foo gives us a size of 1 instead of 4096 like it was. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-27 12:59:16 -05:00
Linus Torvalds	19650e8580	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Ensure we return the dirent->d_type when it is known NFS: Correct the array bound calculation in nfs_readdir_add_to_array NFS: Don't ignore errors from nfs_do_filldir() NFS: Fix the error handling in "uncached_readdir()" NFS: Fix a page leak in uncached_readdir() NFS: Fix a page leak in nfs_do_filldir() NFS: Assume eof if the server returns no readdir records NFS: Buffer overflow in ->decode_dirent() should not be fatal Pure nfs client performance using odirect. SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult	2010-11-27 07:30:30 +09:00
Linus Torvalds	78daa87b1d	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix build for PROC_FS disabled block: fix amiga and atari floppy driver compile warning blk-throttle: Fix calculation of max number of WRITES to be dispatched ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER xen/blkfront: change blk_shadow.request to proper pointer xen/blkfront: map REQ_FLUSH into a full barrier	2010-11-27 07:17:50 +09:00
Linus Torvalds	0a66a59649	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix typo in comment of nilfs_dat_move function nilfs2: nilfs_iget_for_gc() returns ERR_PTR	2010-11-27 07:14:00 +09:00
Frederic Weisbecker	da905873ef	reiserfs: fix inode mutex - reiserfs lock misordering reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe() to protect against reiserfs lock dependency. However this protection requires to have the reiserfs lock to be locked. This is the case if reiserfs_unpack() is called by reiserfs_ioctl but not from reiserfs_quota_on() when it tries to unpack tails of quota files. Fix the ordering of the two locks in reiserfs_unpack() to fix this issue. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: Markus Gapp <markus.gapp@gmx.net> Reported-by: Jan Kara <jack@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.36.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-25 06:50:48 +09:00
Naoya Horiguchi	ea251c1d5c	pagemap: set pagemap walk limit to PMD boundary Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512 pages.) But there is a corner case where walk_pmd_range() accidentally runs over a VMA associated with a hugetlbfs file. For example, when a process has mappings to VMAs as shown below: # cat /proc/<pid>/maps ... 3a58f6d000-3a58f72000 rw-p 00000000 00:00 0 7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0 7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0 7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614 /hugepages/test then pagemap_read() goes into walk_pmd_range() path and walks in the range 0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by walk_hugetlb_range(). Otherwise PMD for the hugepage is considered bad and cleared, which causes undesirable results. This patch fixes it by separating pagemap walk range into one PMD. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-25 06:50:46 +09:00
Ken Sumrall	a0822c5577	fuse: fix attributes after open(O_TRUNC) The attribute cache for a file was not being cleared when a file is opened with O_TRUNC. If the filesystem's open operation truncates the file ("atomic_o_trunc" feature flag is set) then the kernel should invalidate the cached st_mtime and st_ctime attributes. Also i_size should be explicitly be set to zero as it is used sometimes without refreshing the cache. Signed-off-by: Ken Sumrall <ksumrall@android.com> Cc: Anfei <anfei.zhou@gmail.com> Cc: "Anand V. Avati" <avati@gluster.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-25 06:50:41 +09:00
Ryusuke Konishi	f6c26ec508	nilfs2: fix typo in comment of nilfs_dat_move function Fixes a typo: "uncommited" -> "uncommitted". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-11-24 12:51:48 +09:00
Dan Carpenter	103cfcf522	nilfs2: nilfs_iget_for_gc() returns ERR_PTR nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return NULL. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>	2010-11-23 16:32:19 +09:00
Trond Myklebust	0b26a0bf6f	NFS: Ensure we return the dirent->d_type when it is known Store the dirent->d_type in the struct nfs_cache_array_entry so that we can use it in getdents() calls. This fixes a regression with the new readdir code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:48 -05:00
Trond Myklebust	3020093f57	NFS: Correct the array bound calculation in nfs_readdir_add_to_array It looks as if the array size calculation in MAX_READDIR_ARRAY does not take the alignment of struct nfs_cache_array_entry into account. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:47 -05:00
Trond Myklebust	ece0b4233b	NFS: Don't ignore errors from nfs_do_filldir() We should ignore the errors from the filldir callback, and just interpret them as meaning we should exit, however we should definitely pass back ENOMEM errors. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:47 -05:00
Trond Myklebust	85f8607e16	NFS: Fix the error handling in "uncached_readdir()" Currently, uncached_readdir() is broken because if fails to handle the results from nfs_readdir_xdr_to_array() correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:46 -05:00
Trond Myklebust	7a8e1dc34f	NFS: Fix a page leak in uncached_readdir() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:45 -05:00
Trond Myklebust	e7c58e974a	NFS: Fix a page leak in nfs_do_filldir() nfs_do_filldir() must always free desc->page when it is done, otherwise we end up leaking the page. Also remove unused variable 'dentry'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:44 -05:00
Trond Myklebust	5c346854d8	NFS: Assume eof if the server returns no readdir records Some servers are known to be buggy w.r.t. this. Deal with them... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:44 -05:00
Trond Myklebust	463a376eae	NFS: Buffer overflow in ->decode_dirent() should not be fatal Overflowing the buffer in the readdir ->decode_dirent() should not lead to a fatal error, but rather to an attempt to reread the record in question. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:43 -05:00
Arun Bharadwaj	b47d19de2c	Pure nfs client performance using odirect. When an application opens a file with O_DIRECT flag, if the size of the data that is written is equal to wsize, the client sends a WRITE RPC with stable flag set to UNSTABLE followed by a single COMMIT RPC rather than sending a single WRITE RPC with the stable flag set to FILE_SYNC. This a bug. Patch to fix this. Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:24:42 -05:00
Chris Mason	45f49bce99	Btrfs: avoid NULL pointer deref in try_release_extent_buffer If we fail to find a pointer in the radix tree, don't try to deref the NULL one we do have. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:27:44 -05:00
Josef Bacik	a1b075d28d	Btrfs: make btrfs_add_nondir take parent inode as an argument Everybody who calls btrfs_add_nondir just passes in the dentry of the new file and then dereference dentry->d_parent->d_inode, but everybody who calls btrfs_add_nondir() are already passed the parent's inode. So instead of dereferencing dentry->d_parent, just make btrfs_add_nondir take the dir inode as an argument and pass that along so we don't have to worry about d_parent. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:10 -05:00
Josef Bacik	495e86779f	Btrfs: hold i_mutex when calling btrfs_log_dentry_safe Since we walk up the path logging all of the parts of the inode's path, we need to hold i_mutex to make sure that the inode is not renamed while we're logging everything. btrfs_log_dentry_safe does dget_parent and all of that jazz, but we may get unexpected results if the rename changes the inode's location while we're higher up the path logging those dentries, so do this for safety reasons. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:09 -05:00
Josef Bacik	6a91221304	Btrfs: use dget_parent where we can UPDATED There are lots of places where we do dentry->d_parent->d_inode without holding the dentry->d_lock. This could cause problems with rename. So instead we need to use dget_parent() and hold the reference to the parent as long as we are going to use it's inode and then dput it at the end. Signed-off-by: Josef Bacik <josef@redhat.com> Cc: raven@themaw.net Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:09 -05:00
Josef Bacik	7619585390	Btrfs: fix more ESTALE problems with NFS When creating new inodes we don't setup inode->i_generation. So if we generate an fh with a newly created inode we save the generation of 0, but if we flush the inode to disk and have to read it back when getting the inode on the server we'll have the right i_generation, so gens wont match and we get ESTALE. This patch properly sets inode->i_generation when we create the new inode and now I'm no longer getting ESTALE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:08 -05:00
Josef Bacik	2ede0daf01	Btrfs: handle NFS lookups properly People kept reporting NFS issues, specifically getting ESTALE alot. I figured out how to reproduce the problem SERVER mkfs.btrfs /dev/sda1 mount /dev/sda1 /mnt/btrfs-test <add /mnt/btrfs-test to /etc/exports> btrfs subvol create /mnt/btrfs-test/foo service nfs start CLIENT mount server:/mnt/btrfs /mnt/test cd /mnt/test/foo ls SERVER echo 3 > /proc/sys/vm/drop_caches CLIENT ls <-- get an ESTALE here This is because the standard way to lookup a name in nfsd is to use readdir, and what it does is do a readdir on the parent directory looking for the inode of the child. So in this case the parent being / and the child being foo. Well subvols all have the same inode number, so doing a readdir of / looking for inode 256 will return '.', which obviously doesn't match foo. So instead we need to have our own .get_name so that we can find the right name. Our .get_name will either lookup the inode backref or the root backref, whichever we're looking for, and return the name we find. Running the above reproducer with this patch results in everything acting the way its supposed to. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:08 -05:00
Mariusz Kozlowski	0410c94aff	btrfs: make 1-bit signed fileds unsigned Fixes these sparse warnings: fs/btrfs/ctree.h:811:17: error: dubious one-bit signed bitfield fs/btrfs/ctree.h:812:20: error: dubious one-bit signed bitfield fs/btrfs/ctree.h:813:19: error: dubious one-bit signed bitfield Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:07 -05:00
Li Zefan	f209561ad8	btrfs: Show device attr correctly for symlinks Symlinks and files of other types show different device numbers, though they are on the same partition: $ touch tmp; ln -s tmp tmp2; stat tmp tmp2 File: `tmp' Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 15h/21d Inode: 984027 Links: 1 --- snip --- File: `tmp2' -> `tmp' Size: 3 Blocks: 0 IO Block: 4096 symbolic link Device: 13h/19d Inode: 984028 Links: 1 Reported-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:07 -05:00
Li Zefan	5f3888ff6f	btrfs: Set file size correctly in file clone Set src_offset = 0, src_length = 20K, dest_offset = 20K. And the original filesize of the dest file 'file2' is 30K: # ls -l /mnt/file2 -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2 Now clone file1 to file2, the dest file should be 40K, but it still shows 30K: # ls -l /mnt/file2 -rw-r--r-- 1 root root 30720 Nov 18 16:42 /mnt/file2 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:06 -05:00
Li Zefan	2a6b8daeda	btrfs: Check if dest_offset is block-size aligned before cloning file We've done the check for src_offset and src_length, and We should also check dest_offset, otherwise we'll corrupt the destination file: (After cloning file1 to file2 with unaligned dest_offset) # cat /mnt/file2 cat: /mnt/file2: Input/output error Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:05 -05:00
Josef Bacik	0de90876c6	Btrfs: handle the space_cache option properly When I added the clear_cache option I screwed up and took the break out of the space_cache case statement, so whenever you mount with space_cache you also get clear_cache, which does you no good if you say set space_cache in fstab so it always gets set. This patch adds the break back in properly. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:05 -05:00
Arne Jansen	6f33434850	btrfs: Fix early enospc because 'unused' calculated with wrong sign. 'unused' calculated with wrong sign in reserve_metadata_bytes(). This might have lead to unwanted over-reservations. Signed-off-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:04 -05:00
Miao Xie	e65e153554	btrfs: fix panic caused by direct IO btrfs paniced when we write >64KB data by direct IO at one time. Reproduce steps: # mkfs.btrfs /dev/sda5 /dev/sda6 # mount /dev/sda5 /mnt # dd if=/dev/zero of=/mnt/tmpfile bs=100K count=1 oflag=direct Then btrfs paniced: mapping failed logical 1103155200 bio len 69632 len 12288 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:3010! [SNIP] Pid: 1992, comm: btrfs-worker-0 Not tainted 2.6.37-rc1 #1 D2399/PRIMERGY RIP: 0010:[<ffffffffa03d1462>] [<ffffffffa03d1462>] btrfs_map_bio+0x202/0x210 [btrfs] [SNIP] Call Trace: [<ffffffffa03ab3eb>] __btrfs_submit_bio_done+0x1b/0x20 [btrfs] [<ffffffffa03a35ff>] run_one_async_done+0x9f/0xb0 [btrfs] [<ffffffffa03d3d20>] run_ordered_completions+0x80/0xc0 [btrfs] [<ffffffffa03d45a4>] worker_loop+0x154/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs] [<ffffffff81083216>] kthread+0x96/0xa0 [<ffffffff8100cec4>] kernel_thread_helper+0x4/0x10 [<ffffffff81083180>] ? kthread+0x0/0xa0 [<ffffffff8100cec0>] ? kernel_thread_helper+0x0/0x10 We fix this problem by splitting bios when we submit bios. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:04 -05:00
Miao Xie	88f794ede7	btrfs: cleanup duplicate bio allocating functions extent_bio_alloc() and compressed_bio_alloc() are similar, cleanup similar source code. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:03 -05:00
Miao Xie	0c56fa9662	btrfs: fix free dip and dip->csums twice bio_endio() will free dip and dip->csums, so dip and dip->csums twice will be freed twice. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:02 -05:00
Chris Mason	784b4e29a2	Btrfs: add migrate page for metadata inode Migrate page will directly call the btrfs btree writepage function, which isn't actually allowed. Our writepage assumes that you have locked the extent_buffer and flagged the block as written. Without doing these steps, we can corrupt metadata blocks. A later commit will remove the btree writepage function since it is really only safely used internally by btrfs. We use writepages for everything else. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-11-21 22:26:02 -05:00
Linus Torvalds	b86db47442	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard fs: Do not dispatch FITRIM through separate super_operation ext4: ext4_fill_super shouldn't return 0 on corruption jbd2: fix /proc/fs/jbd2/<dev> when using an external journal ext4: missing unlock in ext4_clear_request_list() ext4: fix setting random pages PageUptodate	2010-11-19 19:46:45 -08:00
Lukas Czerner	e681c047e4	ext4: Add EXT4_IOC_TRIM ioctl to handle batched discard Filesystem independent ioctl was rejected as not common enough to be in core vfs ioctl. Since we still need to access to this functionality this commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch ext4_trim_fs(). It takes fstrim_range structure as an argument. fstrim_range is definec in the include/linux/fs.h and its definition is as follows. struct fstrim_range { __u64 start; __u64 len; __u64 minlen; } start - first Byte to trim len - number of Bytes to trim from start minlen - minimum extent length to trim, free extents shorter than this number of Bytes will be ignored. This will be rounded up to fs block size. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-19 21:47:07 -05:00
Lukas Czerner	93bb41f4f8	fs: Do not dispatch FITRIM through separate super_operation There was concern that FITRIM ioctl is not common enough to be included in core vfs ioctl, as Christoph Hellwig pointed out there's no real point in dispatching this out to a separate vector instead of just through ->ioctl. So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs from super_operation structure. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-19 21:18:35 -05:00
Linus Torvalds	76db8ac45f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix readdir EOVERFLOW on 32-bit archs ceph: fix frag offset for non-leftmost frags ceph: fix dangling pointer ceph: explicitly specify page alignment in network messages ceph: make page alignment explicit in osd interface ceph: fix comment, remove extraneous args ceph: fix update of ctime from MDS ceph: fix version check on racing inode updates ceph: fix uid/gid on resent mds requests ceph: fix rdcache_gen usage and invalidate ceph: re-request max_size if cap auth changes ceph: only let auth caps update max_size ceph: fix open for write on clustered mds ceph: fix bad pointer dereference in ceph_fill_trace ceph: fix small seq message skipping Revert "ceph: update issue_seq on cap grant"	2010-11-19 15:32:22 -08:00
Darrick J. Wong	5a9ae68a34	ext4: ext4_fill_super shouldn't return 0 on corruption At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path out of that function returns ret. However, the generic_check_addressable clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g. a group checksum error) returns 0 even though the mount should fail. This causes vfs_kern_mount in turn to think that the mount succeeded, leading to an oops. A simple fix is to avoid using ret for the generic_check_addressable check, which was last changed in commit `30ca22c70e`. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-19 09:56:44 -05:00
Abhijith Das	14870b4575	GFS2: Userland expects quota limit/warn/usage in 512b blocks Userland programs using the quotactl() syscall assume limit/warn/usage block counts in 512b basic blocks which were instead being read/written in fs blocksize in gfs2. With this patch, gfs2 correctly interacts with the syscall using 512b blocks. Signed-off-by: Abhi Das <adas@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-19 11:20:29 +00:00
dann frazier	226291aa46	ocfs2_connection_find() returns pointer to bad structure If ocfs2_live_connection_list is empty, ocfs2_connection_find() will return a pointer to the LIST_HEAD, cast as a ocfs2_live_connection. This can cause an oops when ocfs2_control_send_down() dereferences c->oc_conn: Call Trace: [<ffffffffa00c2a3c>] ocfs2_control_message+0x28c/0x2b0 [ocfs2_stack_user] [<ffffffffa00c2a95>] ocfs2_control_write+0x35/0xb0 [ocfs2_stack_user] [<ffffffff81143a88>] vfs_write+0xb8/0x1a0 [<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0 [<ffffffff811442f1>] sys_write+0x51/0x80 [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b Fix by explicitly returning NULL if no match is found. Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-11-18 15:41:41 -08:00
Milton Miller	a2a2f55291	ocfs2: char is not always signed Commit `1c66b360fe` (Change some lock status member in ocfs2_lock_res to char.) states that these fields need to be signed due to comparision to -1, but only changed the type from unsigned char to char. However, it is a compiler option if char is a signed or unsigned type. Change these fields to signed char so the code will work with all compilers. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-11-18 14:10:56 -08:00
Tristan Ye	1989a80a60	Ocfs2: Stop tracking a negative dentry after dentry_iput(). I suddenly hit the problem during 2.6.37-rc1 regression test, which was introduced by commit '5e98d492406818e6a94c0ba54c61f59d40cefa4a'(Track negative entries v3), following scenario reproduces the issue easily: Node A Node B ================ ============ $touch testfile $ls testfile $rm -rf testfile $touch testfile $ls testfile ls: cannot access testfile: No such file or directory This patch stops tracking the dentry which was negativated by a inode deletion, so as to force the revaliation in next lookup, in case we'll touch the inode again in the same node. It didn't hurt the performance of multiple lookup for none-existed files anyway, while regresses a bit in the first try after a file deletion. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-11-18 14:10:56 -08:00
Jiri Slaby	1cf257f511	ocfs2: fix memory leak Stanse found that o2hb_heartbeat_group_make_item leaks some memory on fail paths. Fix the paths by adding a new label and jump there. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: ocfs2-devel@oss.oracle.com Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-11-18 14:10:56 -08:00
David Sterba	a48a982a6b	fs/ocfs2/dlm: Use GFP_ATOMIC under spin_lock coccinelle check scripts/coccinelle/locks/call_kern.cocci found that in fs/ocfs2/dlm/dlmdomain.c an allocation with GFP_KERNEL is done with locks held: dlm_query_region_handler spin_lock(dlm_domain_lock) dlm_match_regions kmalloc(GFP_KERNEL) Change it to GFP_ATOMIC. Signed-off-by: David Sterba <dsterba@suse.cz> CC: Joel Becker <joel.becker@oracle.com> CC: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com -- Exists in v2.6.37-rc1 and current linux-next. Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-11-18 14:10:55 -08:00
Sage Weil	3105c19c45	ceph: fix readdir EOVERFLOW on 32-bit archs One of the readdir filldir_t callers was passing the raw ceph 64-bit ino instead of the hashed 32-bit one, producing an EOVERFLOW in the filler callback. Fix this by calling the ceph_vino_to_ino() helper to do the conversion. Reported-by: Jan Smets <jan.smets@alcatel-lucent.com> Tested-by: Jan Smets <jan.smets@alcatel-lucent.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-18 09:15:07 -08:00
yangsheng	0587aa3d11	jbd2: fix /proc/fs/jbd2/<dev> when using an external journal In jbd2_journal_init_dev(), we need make sure the journal structure is fully initialzied before calling jbd2_stats_proc_init(). Reviewed-by: Andreas Dilger <andreas.dilger@oracle.com> Signed-off-by: yangsheng <sheng.yang@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-17 21:46:26 -05:00
Dan Carpenter	f4c8cc652d	ext4: missing unlock in ext4_clear_request_list() If the the li_request_list was empty then it returned with the lock held. Instead of adding a "goto unlock" I just removed that special case and let it go past the empty list_for_each_safe(). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-17 21:46:25 -05:00
Markus Trippelsdorf	08da1193d2	ext4: fix setting random pages PageUptodate ext4_end_bio calls put_page and kmem_cache_free before calling SetPageUpdate(). This can result in setting the PageUptodate bit on random pages and causes the following BUG: BUG: Bad page state in process rm pfn:52e54 page:ffffea0001222260 count:0 mapcount:0 mapping: (null) index:0x0 arch kernel: page flags: 0x4000000000000008(uptodate) Fix the problem by moving put_io_page() after the SetPageUpdate() call. Thanks to Hugh Dickins for analyzing this problem. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-17 21:46:06 -05:00
Arnd Bergmann	460781b542	BKL: remove references to lock_kernel from comments Lock_kernel is gone from the code, so the comments should be updated, too. nfsd now uses lock_flocks instead of lock_kernel to protect against posix file locks. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Catalin Marinas	04e4bd1c67	nfs: Ignore kmemleak false positive in nfs_readdir_make_qstr Strings allocated via kmemdup() in nfs_readdir_make_qstr() are referenced from the nfs_cache_array which is stored in a page cache page. Kmemleak does not scan such pages and it reports several false positives. This patch annotates the string->name pointer so that kmemleak does not consider it a real leak. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-16 12:03:14 -05:00
Trond Myklebust	ac39612824	NFS: readdir shouldn't read beyond the reply returned by the server Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:29 -05:00
Trond Myklebust	8cd51a0ccd	NFS: Fix a couple of regressions in readdir. Fix up the issue that array->eof_index needs to be able to be set even if array->size == 0. Ensure that we catch all important memory allocation error conditions and/or kmap() failures. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:28 -05:00
Trond Myklebust	23ebbd9acf	Revert "NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR" This reverts commit `80e60639f1`. This change requires further fixes to ensure that the open doesn't succeed if the lookup later results in a regular file being created. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:27 -05:00
Paulius Zaleckas	1e657bd51f	Regression: fix mounting NFS when NFSv3 support is not compiled Trying to mount NFS (root partition in my case) fails if CONFIG_NFS_V3 is not selected. nfs_validate_mount_data() returns EPROTONOSUPPORT, because of this check: #ifndef CONFIG_NFS_V3 if (args->version == 3) goto out_v3_not_compiled; #endif /* !CONFIG_NFS_V3 */ and args->version was always initialized to 3. It was working in 2.6.36 Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:27 -05:00
Trond Myklebust	8e35f8e7c6	NLM: Fix a regression in lockd Nick Bowler reports: There are no unusual messages on the client... but I just logged into the server and I see lots of messages of the following form: nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! Bisected to commit `9247685088` (SUNRPC: Properly initialize sock_xprt.srcaddr in all cases) Apparently, removing the 'transport->srcaddr.ss_family = family' from xs_create_sock() triggers this due to nlmclnt_lookup_host() incorrectly initialising the srcaddr family to AF_UNSPEC. Reported-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-15 20:44:26 -05:00
Linus Torvalds	620751a259	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Fix inode deallocation race	2010-11-15 08:43:29 -08:00
Steven Whitehouse	044b9414c7	GFS2: Fix inode deallocation race This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-11-15 12:44:42 +00:00
Greg Thelen	d69b78ba1d	ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() Using: - CONFIG_LOCKUP_DETECTOR=y - CONFIG_PREEMPT=y - CONFIG_LOCKDEP=y - CONFIG_PROVE_LOCKING=y - CONFIG_PROVE_RCU=y found a missing rcu lock during boot on a 512 MiB x86_64 ubuntu vm: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- kernel/pid.c:419 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by ureadahead/1355: #0: (tasklist_lock){.+.+..}, at: [<ffffffff8115bc09>] sys_ioprio_set+0x7f/0x29e stack backtrace: Pid: 1355, comm: ureadahead Not tainted 2.6.37-dbg-DEV #1 Call Trace: [<ffffffff8109c10c>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffff81088cbf>] find_task_by_pid_ns+0x44/0x5d [<ffffffff81088cfa>] find_task_by_vpid+0x22/0x24 [<ffffffff8115bc3e>] sys_ioprio_set+0xb4/0x29e [<ffffffff8147cf21>] ? trace_hardirqs_off_thunk+0x3a/0x3c [<ffffffff8105c409>] sysenter_dispatch+0x7/0x2c [<ffffffff8147cee2>] ? trace_hardirqs_on_thunk+0x3a/0x3f The fix is to: a) grab rcu lock in sys_ioprio_{set,get}() and b) avoid grabbing tasklist_lock. Discussion in: http://marc.info/?l=linux-kernel&m=128951324702889 Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Modified by Jens to remove the now redundant inner rcu lock and unlock since they are now protected by the outer lock. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-15 10:23:31 +01:00
Linus Torvalds	1ca7318cac	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Change some lock status member in ocfs2_lock_res to char.	2010-11-14 13:04:53 -08:00
Steve French	362d31297f	[CIFS] fs/cifs/Kconfig: CIFS depends on CRYPTO_HMAC linux-2.6.37-rc1: I compiled a kernel with CIFS which subsequently failed with an error indicating it couldn't initialize crypto module "hmacmd5". CONFIG_CRYPTO_HMAC=y fixed the problem. This patch makes CIFS depend on CRYPTO_HMAC in kconfig. Signed-off-by: Jody Bruchon<jody@nctritech.com> CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-14 03:34:30 +00:00
Tao Ma	1c66b360fe	ocfs2: Change some lock status member in ocfs2_lock_res to char. Commit `83fd9c7` changes l_level, l_requested and l_blocking of ocfs2_lock_res from int to unsigned char. But actually it is initially as -1(ocfs2_lock_res_init_common) which correspoding to 255 for unsigned char. So the whole dlm lock mechanism doesn't work now which means a disaster to ocfs2. Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-11-13 03:15:08 -08:00
Jeff Layton	59c55ba1fb	cifs: don't take extra tlink reference in initiate_cifs_search It's possible for initiate_cifs_search to be called on a filp that already has private_data attached. If this happens, we'll end up calling cifs_sb_tlink, taking an extra reference to the tlink and attaching that to the cifsFileInfo. This leads to refcount leaks that manifest as a "stuck" cifsd at umount time. Fix this by only looking up the tlink for the cifsFile on the filp's first pass through this function. When called on a filp that already has cifsFileInfo associated with it, just use the tlink reference that it already owns. This patch fixes samba.org bug 7792: https://bugzilla.samba.org/show_bug.cgi?id=7792 Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-11-13 03:26:17 +00:00
Linus Torvalds	8a9f772c14	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits) block: remove unused copy_io_context() Documentation: remove anticipatory scheduler info block: remove REQ_HARDBARRIER ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2) ioprio: fix RCU locking around task dereference block: ioctl: fix information leak to userland block: read i_size with i_size_read() cciss: fix proc warning on attempt to remove non-existant directory bio: take care not overflow page count when mapping/copying user data block: limit vec count in bio_kmalloc() and bio_alloc_map_data() block: take care not to overflow when calculating total iov length block: check for proper length of iov entries in blk_rq_map_user_iov() cciss: remove controllers supported by hpsa cciss: use usleep_range not msleep for small sleeps cciss: limit commands allocated on reset_devices cciss: Use kernel provided PCI state save and restore functions cciss: fix board status waiting code drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code ...	2010-11-12 08:52:47 -08:00
Linus Torvalds	fb1cb7b27b	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: remove incorrect assert in xfs_vm_writepage xfs: use hlist_add_fake xfs: fix a few compiler warnings with CONFIG_XFS_QUOTA=n xfs: tell lockdep about parent iolock usage in filestreams xfs: move delayed write buffer trace xfs: fix per-ag reference counting in inode reclaim tree walking xfs: xfs_ioctl: fix information leak to userland xfs: remove experimental tag from the delaylog option	2010-11-12 08:11:03 -08:00
Linus Torvalds	0f90933c47	Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: locks: remove dead lease error-handling code locks: fix leak on merging leases nfsd4: fix 4.1 connection registration race	2010-11-12 07:59:41 -08:00

1 2 3 4 5 ...

20490 commits