kernel-fxtec-pro1x

Author	SHA1	Message	Date
Joe Perches	5a4a798937	ext4: Remove unnecessary semicolons in mballoc.c Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 22:33:08 -04:00
Curt Wohlgemuth	6487a9d3b5	ext4: More buffer head reference leaks After the patch I posted last week regarding buffer head ref leaks in no-journal mode, I looked at all the code that uses buffer heads and searched for more potential leaks. The patch below fixes the issues I found; these can occur even when a journal is present. The change to inode.c fixes a double release if ext4_journal_get_create_access() fails. The changes to namei.c are more complicated. add_dirent_to_buf() will release the input buffer head EXCEPT when it returns -ENOSPC. There are some callers of this routine that don't always do the brelse() in the event that -ENOSPC is returned. Unfortunately, to put this fix into ext4_add_entry() required capturing the return value of make_indexed_dir() and add_dirent_to_buf(). Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-17 10:54:08 -04:00
Theodore Ts'o	78f1ddbb49	ext4: Avoid null pointer dereference when decoding EROFS w/o a journal We need to check to make sure a journal is present before checking the journal flags in ext4_decode_error(). Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 23:09:47 -04:00
Manish Katiyar	43b3852029	ext4: Fix typo in ext4/Kconfig Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 21:38:17 -04:00
Aneesh Kumar K.V	024eab4d5b	ext4: Fix memory leak fix when mounting an ext4 filesystem The allocation of the ext4_group_info array was moved to a new function ext4_mb_add_group_info() in commit `5f21b0e6` so that online resize would use a common (and correct) codepath. Unfortunately, the call to the new ext4_mb_add_group_info() function was added without removing the code which originally allocated the array. This caused a memory leak each time an ext4 filesystem was mounted. The fix is simple; remove the code that did the original allocation, since it is no longer needed. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-17 09:01:04 -04:00
Jan Kara	0d34ec62e1	ext4: Remove syncing logic from ext4_file_write The syncing is now properly handled by generic_file_aio_write() so no special ext4 code is needed. CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Signed-off-by: Jan Kara <jack@suse.cz>	2009-09-14 17:08:16 +02:00
Linus Torvalds	1d5ccd1c42	ext[234]: move over to 'check_acl' permission model Don't implement per-filesystem 'extX_permission()' functions that have to be called for every path component operation, and instead just expose the actual ACL checking so that the VFS layer can now do it for us. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-08 11:09:04 -07:00
Linus Torvalds	1cf29683f4	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix race between write_metadata_buffer and get_write_access ext4: Fix ext4_mb_initialize_context() to initialize all fields ext4: fix null handler of ioctls in no journal mode ext4: Fix buffer head reference leak in no-journal mode ext4: Move __ext4_journalled_writepage() to avoid forward declaration ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation ext4: Don't look at buffer_heads outside i_size. ext4: Fix goal inum check in the inode allocator ext4: fix no journal corruption with locale-gen ext4: Calculate required journal credits for inserting an extent properly ext4: Fix truncation of symlinks after failed write jbd2: Fix a race between checkpointing code and journal_get_write_access() ext4: Use rcu_barrier() on module unload. ext4: naturally align struct ext4_allocation_request ext4: mark several more functions in mballoc.c as noinline ext4: Fix potential reclaim deadlock when truncating partial block jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical region ext4: Fix type warning on 64-bit platforms in tracing events header	2009-07-13 16:39:25 -07:00
Theodore Ts'o	833576b362	ext4: Fix ext4_mb_initialize_context() to initialize all fields Pavel Roskin pointed out that kmemcheck indicated that ext4_mb_store_history() was accessing uninitialized values of ac->ac_tail and ac->ac_buddy leading to garbage in the mballoc history. Fix this by initializing the entire structure to all zeros first. Also, two fields were getting doubly initialized by the caller of ext4_mb_initialize_context, so remove them for efficiency's sake. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:45:52 -04:00
Peng Tao	ac046f1d61	ext4: fix null handler of ioctls in no journal mode The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not flush the journal in no_journal mode. Otherwise, running resize2fs on a mounted no_journal partition triggers the following error messages: BUG: unable to handle kernel NULL pointer dereference at 00000014 IP: [<c039d282>] _spin_lock+0x8/0x19 *pde = 00000000 Oops: 0002 [#1] SMP Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:30:17 -04:00
Curt Wohlgemuth	e6b5d30104	ext4: Fix buffer head reference leak in no-journal mode We found a problem with buffer head reference leaks when using an ext4 partition without a journal. In particular, calls to ext4_forget() would not to a brelse() on the input buffer head, which will cause pages they belong to to not be reclaimable. Further investigation showed that all places where ext4_journal_forget() and ext4_journal_revoke() are called are subject to the same problem. The patch below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit release of the buffer head when the journal handle isn't valid. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 09:07:20 -04:00
Alexey Dobriyan	405f55712d	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-12 12:22:34 -07:00
Al Viro	073aaa1b14	helpers for acl caching + switch to those helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl), forget_cached_acl(inode, type). ubifs/xattr.c needed includes reordered, the rest is a plain switchover. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:07 -04:00
Al Viro	d4bfe2f76d	switch ext4 to inode->i_acl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:04 -04:00
Linus Torvalds	31583d6acf	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Fix kernel-doc parameter name typo in blk-settings.c: block: rename CONFIG_LBD to CONFIG_LBDAF block: Fix bounce_pfn setting hd: stop defining MAJOR_NR	2009-06-19 17:43:04 -07:00
Bartlomiej Zolnierkiewicz	90c699a9ee	block: rename CONFIG_LBD to CONFIG_LBDAF Follow-up to "block: enable by default support for large devices and files on 32-bit archs". Rename CONFIG_LBD to CONFIG_LBDAF to: - allow update of existing [def]configs for "default y" change - reflect that it is used also for large files support nowadays Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-06-19 08:08:50 +02:00
Theodore Ts'o	210ad6aedb	ext4: avoid unnecessary spinlock in critical POSIX ACL path If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem to do POSIX ACL checks on any files not owned by the caller, and it does this for every single pathname component that it looks up. That obviously can be pretty expensive if the filesystem isn't careful about it, especially with locking. That's doubly sad, since the common case tends to be that there are no ACL's associated with the files in question. ext4 already caches the ACL data so that it doesn't have to look it up over and over again, but it does so by taking the inode->i_lock spinlock on every lookup. Which is a noticeable overhead even if it's a private lock, especially on CPU's where the serialization is expensive (eg Intel Netburst aka 'P4'). For the special case of not actually having any ACL's, all that locking is unnecessary. Even if somebody else were to be changing the ACL's on another CPU, we simply don't care - if we've seen a NULL ACL, we might as well use it. So just load the ACL speculatively without any locking, and if it was NULL, just use it. If it's non-NULL (either because we had a cached entry, or because the cache hasn't been filled in at all), it means that we'll need to get the lock and re-load it properly. (This commit was ported from a patch originally authored by Linus for ext3.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-17 00:36:35 -04:00
Theodore Ts'o	4159175058	ext4: Don't update ctime for non-extent-mapped inodes The VFS handles updating ctime, so we don't need to update the inode's ctime in ext4_splace_branch() to update the direct or indirect blocks. This was harmless when we did this in ext3, but in ext4, thanks to delayed allocation, updating the ctime in ext4_splice_branch() can cause the ctime to mysteriously jump when the blocks are finally allocated. Thanks to Björn Steinbrink for pointing out this problem on the git mailing list. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-15 03:41:23 -04:00
Aneesh Kumar K.V	62e086be5d	ext4: Move __ext4_journalled_writepage() to avoid forward declaration In addition, fix two unused variable warnings. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:59:34 -04:00
Aneesh Kumar K.V	43ce1d23b4	ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc This patch fixes the mmap/truncate race that was fixed for delayed allocation by merging ext4_{journalled,normal,da}_writepage() into ext4_writepage(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:58:45 -04:00
Aneesh Kumar K.V	c364b22c95	ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation It is possible to see buffer_heads which are not mapped in the writepage callback in the following scneario (where the fs blocksize is 1k and the page size is 4k): 1) truncate(f, 1024) 2) mmap(f, 0, 4096) 3) a[0] = 'a' 4) truncate(f, 4096) 5) writepage(...) Now if we get a writepage callback immediately after (4) and before an attempt to write at any other offset via mmap address (which implies we are yet to get a pagefault and do a get_block) what we would have is the page which is dirty have first block allocated and the other three buffer_heads unmapped. In the above case the writepage should go ahead and try to write the first blocks and clear the page_dirty flag. Further attempts to write to the page will again create a fault and result in allocating blocks and marking page dirty. If we don't write any other offset via mmap address we would still have written the first block to the disk and rest of the space will be considered as a hole. So to address this, we change all of the places where we look for delayed, unmapped, or unwritten buffer heads, and only check for delayed or unwritten buffer heads instead. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:57:10 -04:00
Theodore Ts'o	de9a55b841	ext4: Fix up whitespace issues in fs/ext4/inode.c This is a pure cleanup patch. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-14 17:45:34 -04:00
Theodore Ts'o	0610b6e999	ext4: Fix 64-bit block type problem on 32-bit platforms The function ext4_mb_free_blocks() was using an "unsigned long" to pass a block number; this will cause 64-bit block numbers to get truncated on x86 and other 32-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>	2009-06-15 03:45:05 -04:00
Andreas Dilger	11013911da	ext4: teach the inode allocator to use a goal inode number Enhance the inode allocator to take a goal inode number as a paremeter; if it is specified, it takes precedence over Orlov or parent directory inode allocation algorithms. The extents migration function uses the goal inode number so that the extent trees allocated the migration function use the correct flex_bg. In the future, the goal inode functionality will also be used to allocate an adjacent inode for the extended attributes. Also, for testing purposes the goal inode number can be specified via /sys/fs/{dev}/inode_goal. This can be useful for testing inode allocation beyond 2^32 blocks on very large filesystems. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:45:35 -04:00
Theodore Ts'o	f157a4aa98	ext4: Use a hash of the topdir directory name for the Orlov parent group Instead of using a random number to determine the goal parent grop for the Orlov top directories, use a hash of the directory name. This allows for repeatable results when trying to benchmark filesystem layout algorithms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:09:42 -04:00
Theodore Ts'o	4ab2f15b7f	ext4: move the abort flag from s_mount_opts to s_mount_flags We're running out of space in the mount options word, and EXT4_MOUNT_ABORT isn't really a mount option, but a run-time flag. So move it to become EXT4_MF_FS_ABORTED in s_mount_flags. Also remove bogus ext2_fs.h / ext4.h simultaneous #include protection, which can never happen. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:36 -04:00
Theodore Ts'o	bc0b0d6d69	ext4: update the s_last_mounted field in the superblock This field can be very helpful when a system administrator is trying to sort through large numbers of block devices or filesystem images. What is stored in this field can be ambiguous if multiple filesystem namespaces are in play; what we store in practice is the mountpoint interpreted by the process's namespace which first opens a file in the filesystem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:48 -04:00
Theodore Ts'o	7f4520cc62	ext4: change s_mount_opt to be an unsigned int We can only fit 32 options in s_mount_opt because an unsigned long is 32-bits on a x86 machine. So use an unsigned int to save space on 64-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 10:09:41 -04:00
Akira Fujita	748de6736c	ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl The EXT4_IOC_MOVE_EXT exchanges the blocks between orig_fd and donor_fd, and then write the file data of orig_fd to donor_fd. ext4_mext_move_extent() is the main fucntion of ext4 online defrag, and this patch includes all functions related to ext4 online defrag. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 19:24:03 -04:00
Alessio Igor Bogani	337eb00a2c	Push BKL down into ->remount_fs() [xfs, btrfs, capifs, shmem don't need BKL, exempt] Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:11 -04:00
Christoph Hellwig	ebc1ac1645	->write_super lock_super pushdown Push down lock_super into ->write_super instances and remove it from the caller. Following filesystem don't need ->s_lock in ->write_super and are skipped: * bfs, nilfs2 - no other uses of s_lock and have internal locks in ->write_super * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock * reiserfs - no other uses of s_lock as has reiserfs_write_lock (BKL) in ->write_super * xfs - no other uses of s_lock and uses internal lock (buffer lock on superblock buffer) to serialize ->write_super. Also xfs_fs_write_super is superflous and will go away in the next merge window Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:09 -04:00
Al Viro	bbd6851a32	Push lock_super() into the ->remount_fs() of filesystems that care about it Note that since we can't run into contention between remount_fs and write_super (due to exclusion on s_umount), we have to care only about filesystems that touch lock_super() on their own. Out of those ext3, ext4, hpfs, sysv and ufs do need it; fat doesn't since its ->remount_fs() only accesses assign-once data (basically, it's "we have no atime on directories and only have atime on files for vfat; force nodiratime and possibly noatime into *flags"). [folded a build fix from hch] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:08 -04:00
Christoph Hellwig	6cfd014842	push BKL down into ->put_super Move BKL into ->put_super from the only caller. A couple of filesystems had trivial enough ->put_super (only kfree and NULLing of s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs, hugetlbfs, omfs, qnx4, shmem, all others got the full treatment. Most of them probably don't need it, but I'd rather sort that out individually. Preferably after all the other BKL pushdowns in that area. [AV: original used to move lock_super() down as well; these changes are removed since we don't do lock_super() at all in generic_shutdown_super() now] [AV: fuse, btrfs and xfs are known to need no damn BKL, exempt] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Al Viro	a9e220f832	No need to do lock_super() for exclusion in generic_shutdown_super() We can't run into contention on it. All other callers of lock_super() either hold s_umount (and we have it exclusive) or hold an active reference to superblock in question, which prevents the call of generic_shutdown_super() while the reference is held. So we can replace lock_super(s) with get_fs_excl() in generic_shutdown_super() (and corresponding change for unlock_super(), of course). Since ext4 expects s_lock held for its put_super, take lock_super() into it. The rest of filesystems do not care at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:07 -04:00
Christoph Hellwig	8c85e12512	remove ->write_super call in generic_shutdown_super We just did a full fs writeout using sync_filesystem before, and if that's not enough for the filesystem it can perform it's own writeout in ->put_super, which many filesystems already do. Move a call to foofs_write_super into every foofs_put_super for now to guarantee identical behaviour until it's cleaned up by the individual filesystem maintainers. Exceptions: - affs already has identical copy & pasted code at the beginning of affs_put_super so no need to do it twice. - xfs does the right thing without it and I have changes pending for the xfs tree touching this are so I don't really need conflicts here.. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:06 -04:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
Aneesh Kumar K.V	a41f207169	ext4: Avoid corrupting the uninitialized bit in the extent during truncate The unitialized bit was not properly getting preserved in in an extent which is partially truncated because the it was geting set to the value of the first extent to be removed or truncated as part of the truncate operation, and if there are multiple extents are getting removed or modified as part of the truncate operation, it is only the last extent which will might be partially truncated, and its uninitalized bit is not necessarily the same as the first extent to be truncated. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-10 14:22:55 -04:00
Theodore Ts'o	0eab928221	ext4: Don't treat a truncation of a zero-length file as replace-via-truncate If a non-existent file is opened via O_WRONLY\|O_CREAT\|O_TRUNC, there's no need to treat this as a true file truncation, so we shouldn't activate the replace-via-truncate hueristic. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-09 09:54:40 -04:00
Toshiyuki Okajima	9aee228607	ext4: fix dx_map_entry to support 256k directory blocks The dx_map_entry structure doesn't support over 64KB block size by current usage of its member("offs"). Because "offs" treats an offset of copies of the ext4_dir_entry_2 structure as is. This member size is 16 bits. But real offset for over 64KB(256KB) block size needs 18 bits. However, real offset keeps 4 byte boundary, so lower 2 bits is not used. Therefore, we do the following to fix this limitation: For "store": we divide the real offset by 4 and then store this result to "offs" member. For "use": we multiply "offs" member by 4 and then use this result as real offset. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-08 12:41:35 -04:00
Aneesh Kumar K.V	f8514083cd	ext4: truncate the file properly if we fail to copy data from userspace In generic_perform_write if we fail to copy the user data we don't update the inode->i_size. We should truncate the file in the above case so that we don't have blocks allocated outside inode->i_size. Add the inode to orphan list in the same transaction as block allocation This ensures that if we crash in between the recovery would do the truncate. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> CC: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-05 00:56:49 -04:00
Aneesh Kumar K.V	1938a150c2	ext4: Avoid leaking blocks after a block allocation failure We should add inode to the orphan list in the same transaction as block allocation. This ensures that if we crash after a failed block allocation and before we do a vmtruncate we don't leak block (ie block marked as used in bitmap but not claimed by the inode). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> CC: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-05 01:00:26 -04:00
Eric Sandeen	b31e15527a	ext4: Change all super.c messages to print the device This patch changes ext4 super.c to include the device name with all warning/error messages, by using a new utility function ext4_msg. It's a rather large patch, but very mechanic. I left debug printks alone. This is a straightforward port of a patch which Andi Kleen did for ext3. Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-04 17:36:36 -04:00
Jan Kara	03f5d8bcf0	ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle() Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sense. Currently it was set only by ext4_getblk(). Since the parameter has some effect only if create == 1, it is easy to check by grepping through the sources that the three callers which end up calling ext4_getblk() with create == 1 (ext4_append, ext4_quota_write, ext4_mkdir) do the right thing and set disksize themselves. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-09 00:17:05 -04:00
Aneesh Kumar K.V	b767e78a17	ext4: Don't look at buffer_heads outside i_size. Buffer heads outside i_size will be unmapped. So when we are doing "walk_page_buffers" limit ourself to i_size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Josef Bacik <jbacik@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> ----	2009-06-04 08:06:06 -04:00
Johann Lombardi	e6462869e4	ext4: Fix goal inum check in the inode allocator The goal inode is specificed by inode number which belongs to [1; s_inodes_count]. Signed-off-by: Johann Lombardi <johann@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 23:45:11 -04:00
Theodore Ts'o	5adfee9c17	ext4: fix no journal corruption with locale-gen If there is no journal, ext4_should_writeback_data() should return TRUE. This will fix ext4_set_aops() to set ext4_da_ops in the case of delayed allocation; otherwise ext4_journaled_aops gets used by default, which doesn't handle delayed allocation properly. The advantage of using ext4_should_writeback_data() approach is that it should handle nobh better as well. Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh Kumar for suggesting this approach. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-08 17:11:24 -04:00
Aneesh Kumar K.V	5887e98b60	ext4: Calculate required journal credits for inserting an extent properly When we have space in the extent tree leaf node we should be able to insert the extent with much less journal credits. The code was doing proper calculation but missed a return statement. Reported-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 23:12:04 -04:00
Jan Kara	ffacfa7a79	ext4: Fix truncation of symlinks after failed write Contents of long symlinks is written via standard write methods. So when the write fails, we add inode to orphan list. But symlinks don't have .truncate method defined so nobody properly removes them from the on disk orphan list. Fix this by calling ext4_truncate() directly instead of calling vmtruncate() (which is saner anyway since we don't need anything vmtruncate() does except from calling .truncate in these paths). We also add inode to orphan list only if ext4_can_truncate() is true (currently, it can be false for symlinks when there are no blocks allocated) - otherwise orphan list processing will complain and ext4_truncate() will not remove inode from on-disk orphan list. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 16:22:22 -04:00
Jesper Dangaard Brouer	3e03f9ca6a	ext4: Use rcu_barrier() on module unload. The ext4 module uses rcu_call() thus it should use rcu_barrier()on module unload. The kmem cache ext4_pspace_cachep is sometimes free'ed using call_rcu() callbacks. Thus, we must wait for completion of call_rcu() before doing kmem_cache_destroy(). Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 22:29:27 -04:00
Eric Sandeen	726447d803	ext4: naturally align struct ext4_allocation_request As Ted noted, the ext4_allocation_request isn't well aligned. Looking at it with pahole we're wasting space on 64-bit arches: struct ext4_allocation_request { struct inode * inode; /* 0 8 / ext4_lblk_t logical; / 8 4 / / XXX 4 bytes hole, try to pack / ext4_fsblk_t goal; / 16 8 / ext4_lblk_t lleft; / 24 4 / / XXX 4 bytes hole, try to pack / ext4_fsblk_t pleft; / 32 8 / ext4_lblk_t lright; / 40 4 / / XXX 4 bytes hole, try to pack / ext4_fsblk_t pright; / 48 8 / unsigned int len; / 56 4 / unsigned int flags; / 60 4 / / --- cacheline 1 boundary (64 bytes) --- / / size: 64, cachelines: 1, members: 9 / / sum members: 52, holes: 3, sum holes: 12 */ }; Grouping 32-bit members together closes these holes and shrinks the structure by 12 bytes. which is important since ext4 can get on the hairy edge of stack overruns. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-13 10:24:17 -04:00
Eric Sandeen	089ceecc1e	ext4: mark several more functions in mballoc.c as noinline Ted noticed a stack-deep callchain through writepages->ext4_mb_regular_allocator->ext4_mb_init_cache->submit_bh ... With all the static functions in mballoc.c, gcc helpfully inlines for us, and we get something like this: ext4_mb_regular_allocator (232 bytes stack) ext4_mb_init_cache (232 bytes stack) submit_bh (starts 464 deeper) the 2 ext4 functions here get several others inlined; by telling gcc not to inline them, we can save stack space for when we head off into submit_bh land and associated block layer callchains. The following noinlined functions are only called once, so this won't impact any other callchains: ext4_mb_regular_allocator (104) (was 232) ext4_mb_find_by_goal (56) (noinlined) ext4_mb_init_group (24) (noinlined) ext4_mb_init_cache (136) (was 232) ext4_mb_generate_buddy (88) (noinlined) ext4_mb_generate_from_pa (40) (noinlined) submit_bh ext4_mb_simple_scan_group (24) (noinlined) ext4_mb_scan_aligned (56) (noinlined) ext4_mb_complex_scan_group (40) (noinlined) ext4_mb_try_best_found (24) (noinlined) now when we head off into submit_bh() we're only 264 bytes deeper in stack than when we entered ext4_mb_regular_allocator() (vs. 464 bytes before). Every 200 bytes helps. :) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 22:17:31 -04:00
Theodore Ts'o	f4a01017d6	ext4: Fix potential reclaim deadlock when truncating partial block The ext4_block_truncate_page() function previously called grab_cache_page(), which called find_or_create_page() with the __GFP_FS flag potentially set. This could cause a deadlock if the system is low on memory and it attempts a memory reclaim, which could potentially call back into ext4. So we need to call find_or_create_page() directly, and remove the __GFP_FP flag to avoid this potential deadlock. Thanks to Roland Dreier for reporting a lockdep warning which showed this problem. [20786.363249] ================================= [20786.363257] [ INFO: inconsistent lock state ] [20786.363265] 2.6.31-2-generic #14~rbd4gitd960eea9 [20786.363270] --------------------------------- [20786.363276] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. [20786.363285] http/8397 [HC0[0]:SC0[0]:HE1:SE1] takes: [20786.363291] (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150 [20786.363314] {IN-RECLAIM_FS-W} state was registered at: [20786.363320] [<ffffffff8108bef6>] mark_irqflags+0xc6/0x1a0 [20786.363334] [<ffffffff8108d347>] __lock_acquire+0x287/0x430 [20786.363345] [<ffffffff8108d595>] lock_acquire+0xa5/0x150 [20786.363355] [<ffffffff812008da>] jbd2_journal_start+0xfa/0x150 [20786.363365] [<ffffffff811d98a8>] ext4_journal_start_sb+0x58/0x90 [20786.363377] [<ffffffff811cce85>] ext4_delete_inode+0xc5/0x2c0 [20786.363389] [<ffffffff81146fa3>] generic_delete_inode+0xd3/0x1a0 [20786.363401] [<ffffffff81147095>] generic_drop_inode+0x25/0x30 [20786.363411] [<ffffffff81145ce2>] iput+0x62/0x70 [20786.363420] [<ffffffff81142878>] dentry_iput+0x98/0x110 [20786.363429] [<ffffffff81142a00>] d_kill+0x50/0x80 [20786.363438] [<ffffffff811444c5>] dput+0x95/0x180 [20786.363447] [<ffffffff8120de4b>] ecryptfs_d_release+0x2b/0x70 [20786.363459] [<ffffffff81142978>] d_free+0x28/0x60 [20786.363468] [<ffffffff81142a18>] d_kill+0x68/0x80 [20786.363477] [<ffffffff81142ad3>] prune_one_dentry+0xa3/0xc0 [20786.363487] [<ffffffff81142d61>] __shrink_dcache_sb+0x271/0x290 [20786.363497] [<ffffffff81142e89>] prune_dcache+0x109/0x1b0 [20786.363506] [<ffffffff81142f6f>] shrink_dcache_memory+0x3f/0x50 [20786.363516] [<ffffffff810f6d3d>] shrink_slab+0x12d/0x190 [20786.363527] [<ffffffff810f97d7>] balance_pgdat+0x4d7/0x640 [20786.363537] [<ffffffff810f9a57>] kswapd+0x117/0x170 [20786.363546] [<ffffffff810773ce>] kthread+0x9e/0xb0 [20786.363558] [<ffffffff8101430a>] child_rip+0xa/0x20 [20786.363569] [<ffffffffffffffff>] 0xffffffffffffffff [20786.363598] irq event stamp: 15997 [20786.363603] hardirqs last enabled at (15997): [<ffffffff81125f9d>] kmem_cache_alloc+0xfd/0x1a0 [20786.363617] hardirqs last disabled at (15996): [<ffffffff81125f01>] kmem_cache_alloc+0x61/0x1a0 [20786.363628] softirqs last enabled at (15966): [<ffffffff810631ea>] __do_softirq+0x14a/0x220 [20786.363641] softirqs last disabled at (15861): [<ffffffff8101440c>] call_softirq+0x1c/0x30 [20786.363651] [20786.363653] other info that might help us debug this: [20786.363660] 3 locks held by http/8397: [20786.363665] #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<ffffffff8112ed24>] do_truncate+0x64/0x90 [20786.363685] #1: (&sb->s_type->i_alloc_sem_key#5){+++++.}, at: [<ffffffff81147f90>] notify_change+0x250/0x350 [20786.363707] #2: (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150 [20786.363724] [20786.363726] stack backtrace: [20786.363734] Pid: 8397, comm: http Tainted: G C 2.6.31-2-generic #14~rbd4gitd960eea9 [20786.363741] Call Trace: [20786.363752] [<ffffffff8108ad7c>] print_usage_bug+0x18c/0x1a0 [20786.363763] [<ffffffff8108b0c0>] ? check_usage_backwards+0x0/0xb0 [20786.363773] [<ffffffff8108bad2>] mark_lock_irq+0xf2/0x280 [20786.363783] [<ffffffff8108bd97>] mark_lock+0x137/0x1d0 [20786.363793] [<ffffffff8108c03c>] mark_held_locks+0x6c/0xa0 [20786.363803] [<ffffffff8108c11f>] lockdep_trace_alloc+0xaf/0xe0 [20786.363813] [<ffffffff810efbac>] __alloc_pages_nodemask+0x7c/0x180 [20786.363824] [<ffffffff810e9411>] ? find_get_page+0x91/0xf0 [20786.363835] [<ffffffff8111d3b7>] alloc_pages_current+0x87/0xd0 [20786.363845] [<ffffffff810e9827>] __page_cache_alloc+0x67/0x70 [20786.363856] [<ffffffff810eb7df>] find_or_create_page+0x4f/0xb0 [20786.363867] [<ffffffff811cb3be>] ext4_block_truncate_page+0x3e/0x460 [20786.363876] [<ffffffff812008da>] ? jbd2_journal_start+0xfa/0x150 [20786.363885] [<ffffffff812008bb>] ? jbd2_journal_start+0xdb/0x150 [20786.363895] [<ffffffff811c6415>] ? ext4_meta_trans_blocks+0x75/0xf0 [20786.363905] [<ffffffff811e8d8b>] ext4_ext_truncate+0x1bb/0x1e0 [20786.363916] [<ffffffff811072c5>] ? unmap_mapping_range+0x75/0x290 [20786.363926] [<ffffffff811ccc28>] ext4_truncate+0x498/0x630 [20786.363938] [<ffffffff8129b4ce>] ? _raw_spin_unlock+0x5e/0xb0 [20786.363947] [<ffffffff81107306>] ? unmap_mapping_range+0xb6/0x290 [20786.363957] [<ffffffff8108c3ad>] ? trace_hardirqs_on+0xd/0x10 [20786.363966] [<ffffffff811ffe58>] ? jbd2_journal_stop+0x1f8/0x2e0 [20786.363976] [<ffffffff81107690>] vmtruncate+0xb0/0x110 [20786.363986] [<ffffffff81147c05>] inode_setattr+0x35/0x170 [20786.363995] [<ffffffff811c9906>] ext4_setattr+0x186/0x370 [20786.364005] [<ffffffff81147eab>] notify_change+0x16b/0x350 [20786.364014] [<ffffffff8112ed30>] do_truncate+0x70/0x90 [20786.364021] [<ffffffff8112f48b>] T.657+0xeb/0x110 [20786.364021] [<ffffffff8112f4be>] sys_ftruncate+0xe/0x10 [20786.364021] [<ffffffff81013132>] system_call_fastpath+0x16/0x1b Reported-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 22:08:16 -04:00
Andreas Dilger	0b8e58a140	ext4: super.c whitespace cleanup Cleanup of whitespace and formatting. Initially driven by confusing indents for the ext4_{block,inode}_bitmap() et. al. helper routines, but figured I'd cleanup some other 80-column wrapping and other indenting problems at the same time. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-03 17:59:28 -04:00
Theodore Ts'o	88b6edd17c	ext4: Clean up calls to ext4_get_group_desc() If the caller isn't planning on modifying the block group descriptors, there's no need to pass in a pointer to a struct buffer_head. Nuking this saves a tiny amount of CPU time and stack space usage. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-25 11:50:39 -04:00
Theodore Ts'o	759d427aa5	ext4: remove unused function __ext4_write_dirty_metadata The __ext4_write_dirty_metadata() function was introduced by commit `0390131b`, "ext4: Allow ext4 to run without a journal", but nothing ever used the function, either then or since. So let's remove it and save a bit of space. Cc: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-25 11:51:00 -04:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
Manish Katiyar	f68301656b	ext4: Fix memory leak in ext4_fill_super() in case of a failed mount Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-17 23:52:44 -04:00
Theodore Ts'o	0568c51893	ext4: down i_data_sem only for read when walking tree for fiemap Not sure why I put this in as down_write originally; all we are doing is walking the tree, nothing will change under us and concurrent reads should be no problem. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-17 23:31:23 -04:00
Theodore Ts'o	6fd058f779	ext4: Add a comprehensive block validity check to ext4_get_blocks() To catch filesystem bugs or corruption which could lead to the filesystem getting severly damaged, this patch adds a facility for tracking all of the filesystem metadata blocks by contiguous regions in a red-black tree. This allows quick searching of the tree to locate extents which might overlap with filesystem metadata blocks. This facility is also used by the multi-block allocator to assure that it is not allocating blocks out of the system zone, as well as by the routines used when reading indirect blocks and extents information from disk to make sure their contents are valid. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-17 15:38:01 -04:00
Theodore Ts'o	2ec0ae3ace	ext4: Fix race in ext4_inode_info.i_cached_extent If two CPU's simultaneously call ext4_ext_get_blocks() at the same time, there is nothing protecting the i_cached_extent structure from being used and updated at the same time. This could potentially cause the wrong location on disk to be read or written to, including potentially causing the corruption of the block group descriptors and/or inode table. This bug has been in the ext4 code since almost the very beginning of ext4's development. Fortunately once the data is stored in the page cache cache, ext4_get_blocks() doesn't need to be called, so trying to replicate this problem to the point where we could identify its root cause was extremely difficult. Many thanks to Kevin Shanahan for working over several months to be able to reproduce this easily so we could finally nail down the cause of the corruption. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>	2009-05-15 09:07:28 -04:00
Aneesh Kumar K.V	2a8964d63d	ext4: Clear the unwritten buffer_head flag after the extent is initialized The BH_Unwritten flag indicates that the buffer is allocated on disk but has not been written; that is, the disk was part of a persistent preallocation area. That flag should only be set when a get_blocks() function is looking up a inode's logical to physical block mapping. When ext4_get_blocks_wrap() is called with create=1, the uninitialized extent is converted into an initialized one, so the BH_Unwritten flag is no longer appropriate. Hence, we need to make sure the BH_Unwritten is not left set, since the combination of BH_Mapped and BH_Unwritten is not allowed; among other things, it will result ext4's get_block() to be called over and over again during the write_begin phase of write(2). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-14 17:05:39 -04:00
Theodore Ts'o	2ac3b6e00a	ext4: Clean up ext4_get_blocks() so it does not depend on bh_result->b_state The ext4_get_blocks() function was depending on the value of bh_result->b_state as an input parameter to decide whether or not update the delalloc accounting statistics by calling ext4_da_update_reserve_space(). We now use a separate flag, EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE, to requests this update, so that all callers of ext4_get_blocks() can clear map_bh.b_state before calling ext4_get_blocks() without worrying about any consistency issues. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-14 13:57:08 -04:00
Theodore Ts'o	2fa3cdfb31	ext4: Merge ext4_da_get_block_write() into mpage_da_map_blocks() The static function ext4_da_get_block_write() was only used by mpage_da_map_blocks(). So to simplify the code, merge that function into mpage_da_map_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-14 09:29:45 -04:00
Aneesh Kumar K.V	33b9817e2a	ext4: Use a fake block number for delayed new buffer_head Use a very large unsigned number (~0xffff) as as the fake block number for the delayed new buffer. The VFS should never try to write out this number, but if it does, this will make it obvious. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-12 14:40:37 -04:00
Aneesh Kumar K.V	9c1ee184a3	ext4: Fix sub-block zeroing for writes into preallocated extents We need to mark the buffer_head mapping preallocated space as new during write_begin. Otherwise we don't zero out the page cache content properly for a partial write. This will cause file corruption with preallocation. Now that we mark the buffer_head new we also need to have a valid buffer_head blocknr so that unmap_underlying_metadata() unmaps the correct block. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-13 18:36:58 -04:00
Theodore Ts'o	a2dc52b5d1	ext4: Add BUG_ON debugging checks to noalloc_get_block_write() Enforce that noalloc_get_block_write() is only called to map one block at a time, and that it always is successful in finding a mapping for given an inode's logical block block number if it is called with create == 1. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-12 13:51:29 -04:00
Theodore Ts'o	b920c75502	ext4: Add documentation to the ext4_get_block functions This adds more documentation to various internal functions in fs/ext4/inode.c, most notably ext4_ind_get_blocks(), ext4_da_get_block_write(), ext4_da_get_block_prep(), ext4_normal_get_block_write(). In addition, the static function ext4_normal_get_block_write() has been renamed noalloc_get_block_write(), since it is used in many places far beyond ext4_normal_writepage(). Plenty of warnings have been added to the noalloc_get_block_write() function, since the way it is used is amazingly fragile. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-14 00:54:29 -04:00
Theodore Ts'o	c217705733	ext4: Define a new set of flags for ext4_get_blocks() The functions ext4_get_blocks(), ext4_ext_get_blocks(), and ext4_ind_get_blocks() used an ad-hoc set of integer variables used as boolean flags passed in as arguments. Use a single flags parameter and a setandard set of bitfield flags instead. This saves space on the call stack, and it also makes the code a bit more understandable. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-14 00:58:52 -04:00
Theodore Ts'o	12b7ac1768	ext4: Rename ext4_get_blocks_wrap() to be ext4_get_blocks() Another function rename for clarity's sake. The _wrap prefix simply confuses people, and didn't add much people trying to follow the code paths. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-14 00:57:44 -04:00
Theodore Ts'o	e4d996ca80	ext4: Rename ext4_get_blocks_handle() to be ext4_ind_get_blocks() The static function ext4_get_blocks_handle() is badly named. Of course it takes a handle. Since its counterpart for extent-based file is ext4_ext_get_blocks(), rename it to be ext4_ind_get_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-12 00:25:28 -04:00
Theodore Ts'o	f888e652d7	ext4: Simplify function signature for ext4_da_get_block_write() The function ext4_da_get_block_write() is called in exactly one write, and the last argument, create, is always 1. Remove it to simplify the code slightly. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-12 00:21:29 -04:00
Vincent Minet	bc8e67409c	ext4: Fix spinlock assertions on UP systems On UP systems without DEBUG_SPINLOCK, ext4_is_group_locked always fails which triggers a BUG_ON() call. This patch fixes it by using assert_spin_locked instead. Signed-off-by: Vincent Minet <vincent@vincent-minet.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-15 08:33:18 -04:00
Aneesh Kumar K.V	955ce5f5be	ext4: Convert ext4_lock_group to use sb_bgl_lock We have sb_bgl_lock() and ext4_group_info.bb_state bit spinlock to protech group information. The later is only used within mballoc code. Consolidate them to use sb_bgl_lock(). This makes the mballoc.c code much simpler and also avoid confusion with two locks protecting same info. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-02 20:35:09 -04:00
Theodore Ts'o	eefd7f03b8	ext4: fix the length returned by fiemap for an unallocated extent If the file's blocks have not yet been allocated because of delayed allocation, the length of the extent returned by fiemap is incorrect. This commit fixes this bug. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-02 19:05:37 -04:00
Eric Sandeen	c9877b205f	ext4: fix for fiemap last-block test Carl Henrik Lunde reported and debugged this; the test for the last allocated block was comparing bytes to blocks in this test: if (logical + length - 1 == EXT_MAX_BLOCK \|\| ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK) flags \|= FIEMAP_EXTENT_LAST; so any extent which ended right at 4G was stopping the extent walk. Just replacing these values with the extent block & length should fix it. Also give blksize_bits a saner type, and reverse the order of the tests to make the more likely case tested first. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Tested-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 23:32:06 -04:00
Aneesh Kumar K.V	abc8746eb9	ext4: hook fiemap operation for directories Add fiemap callback for directories Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-02 22:54:32 -04:00
Curt Wohlgemuth	f40339031b	ext4: Make the length of the mb_history file tunable In memory-constrained systems with many partitions, the ~68K for each partition for the mb_history buffer can be excessive. This patch adds a new mount option, mb_history_length, as well as a way of setting the default via a module parameter (or via a sysfs parameter in /sys/module/ext4/parameter/default_mb_history_length). If the mb_history_length is set to zero, the mb_history facility is disabled entirely. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 20:27:20 -04:00
Theodore Ts'o	bb23c20a85	ext4: Move fs/ext4/group.h into ext4.h Move the function prototypes in group.h into ext4.h so they are all defined in one place. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 19:44:44 -04:00
Theodore Ts'o	596397b77c	ext4: Move fs/ext4/namei.h into ext4.h The fs/ext4/namei.h header file had only a single function declaration, and should have never been a standalone file. Move it into ext4.h, where should have been from the beginning. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 13:49:15 -04:00
Theodore Ts'o	ca0faba0e8	ext4: Move the ext4_sb.h header file into ext4.h There is no longer a reason for a separate ext4_sb.h header file, so move it into ext4.h just to make life easier for developers to find the relevant data structures and typedefs. Should also speed up compiles slightly, too. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-03 16:33:44 -04:00
Theodore Ts'o	d444c3c381	ext4: Move the ext4_i.h header file into ext4.h There is no longer a reason for a separate ext4_i.h header file, so move it into ext4.h just to make life easier for developers to find the relevant data structures and typedefs. Should also speed up compiles slightly, too. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 13:44:33 -04:00
Theodore Ts'o	75507efb13	ext4: Don't avoid using BLOCK_UNINIT block groups in mballoc By avoiding the use of not-yet-used block groups (i.e., block groups with the BLOCK_UNINIT flag), mballoc had a tendency to create large files with large non-contiguous gaps. In addition avoiding the use of new block groups had a tendency to push regular file data into the first block group in a flex_bg group, which slows down the speed of e2fsck pass 2, since it has a tendency to seek much more. For example: Before Patch After Patch Time in seconds Time in seconds Real / User/ Sys MB/s Real / User/ Sys MB/s Pass 1 8.52 / 2.21 / 0.46 20.43 8.84 / 4.97 / 1.11 19.68 Pass 2 21.16 / 1.02 / 1.86 11.30 6.54 / 1.77 / 1.78 36.39 Pass 3 0.01 / 0.00 / 0.00 139.00 0.01 / 0.01 / 0.00 128.90 Pass 4 0.16 / 0.15 / 0.00 0.00 0.17 / 0.17 / 0.00 0.00 Pass 5 2.52 / 1.99 / 0.09 0.79 2.31 / 1.78 / 0.06 0.86 Total 32.40 / 5.11 / 2.49 12.81 17.99 / 8.75 / 2.98 23.01 This was on a sample 80 gig root filesystem which was approximately 50% full. Note the improved e2fsck pass 2 performance, by over a factor of 3, due to a decreased number of seeks. (The total amount of I/O in pass 2 was unchanged; the layout of the directory blocks was simply much better from e2fsck's's perspective.) Other changes as a result of this patch on this sample filesystem: Before Patch After Patch # of non-contig files 762 779 # of non-contig directories 571 570 # of BLOCK_UNINIT bg's 307 293 # of INODE_UNINIT bg's 503 503 Out of 640 block groups, of which 333 were in use, this patch caused an extra 14 block groups to be utilized. The number of non-contiguous files did go up slightly, but when measured against the 99.9% of the files (603,154) which were contiguously allocated, this is pretty insignificant. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andreas Dilger <adilger@sun.com>	2009-05-01 12:58:36 -04:00
Theodore Ts'o	8b0f9e8f78	ext4: avoid unnecessary spinlock in critical POSIX ACL path If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem to do POSIX ACL checks on any files not owned by the caller, and it does this for every single pathname component that it looks up. That obviously can be pretty expensive if the filesystem isn't careful about it, especially with locking. That's doubly sad, since the common case tends to be that there are no ACL's associated with the files in question. ext4 already caches the ACL data so that it doesn't have to look it up over and over again, but it does so by taking the inode->i_lock spinlock on every lookup. Which is a noticeable overhead even if it's a private lock, especially on CPU's where the serialization is expensive (eg Intel Netburst aka 'P4'). For the special case of not actually having any ACL's, all that locking is unnecessary. Even if somebody else were to be changing the ACL's on another CPU, we simply don't care - if we've seen a NULL ACL, we might as well use it. So just load the ACL speculatively without any locking, and if it was NULL, just use it. If it's non-NULL (either because we had a cached entry, or because the cache hasn't been filled in at all), it means that we'll need to get the lock and re-load it properly. (This commit was ported from a patch originally authored by Linus for ext3.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-27 17:33:23 -04:00
Theodore Ts'o	9bffad1ed2	ext4: convert instrumentation from markers to tracepoints Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 11:48:11 -04:00
Theodore Ts'o	32ed5058ce	ext4: Replace lock/unlock_super() with an explicit lock for resizing Use a separate lock to protect s_groups_count and the other block group descriptors which get changed via an on-line resize operation, so we can stop overloading the use of lock_super(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 22:53:39 -04:00
Theodore Ts'o	3b9d4ed266	ext4: Replace lock/unlock_super() with an explicit lock for the orphan list Use a separate lock to protect the orphan list, so we can stop overloading the use of lock_super(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 22:54:04 -04:00
Theodore Ts'o	a63c9eb2ce	ext4: ext4_mark_recovery_complete() doesn't need to use lock_super The function ext4_mark_recovery_complete() is called from two call paths: either (a) while mounting the filesystem, in which case there's no danger of any other CPU calling write_super() until the mount is completed, and (b) while remounting the filesystem read-write, in which case the fs core has already locked the superblock. This also allows us to take out a very vile unlock_super()/lock_super() pair in ext4_remount(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 01:59:42 -04:00
Theodore Ts'o	114e9fc907	ext4: Remove outdated comment about lock_super() ext4_fill_super() is no longer called by read_super(), and it is no longer called with the superblock locked. The unlock_super()/lock_super() is no longer present, so this comment is entirely superfluous. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-25 15:48:07 -04:00
Theodore Ts'o	8df9675f8b	ext4: Avoid races caused by on-line resizing and SMP memory reordering Ext4's on-line resizing adds a new block group and then, only at the last step adjusts s_groups_count. However, it's possible on SMP systems that another CPU could see the updated the s_group_count and not see the newly initialized data structures for the just-added block group. For this reason, it's important to insert a SMP read barrier after reading s_groups_count and before reading any (for example) the new block group descriptors allowed by the increased value of s_groups_count. Unfortunately, we rather blatently violate this locking protocol documented in fs/ext4/resize.c. Fortunately, (1) on-line resizes happen relatively rarely, and (2) it seems rare that the filesystem code will immediately try to use just-added block group before any memory ordering issues resolve themselves. So apparently problems here are relatively hard to hit, since ext3 has been vulnerable to the same issue for years with no one apparently complaining. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 08:50:38 -04:00
Theodore Ts'o	9ca92389c5	ext4: Use separate super_operations structure for no_journal filesystems By using a separate super_operations structure for filesystems that have and don't have journals, we can simply ext4_write_super() --- which is only needed when no journal is present --- and ext4_freeze(), ext4_unfreeze(), and ext4_sync_fs(), which are only needed when the journal is present. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 12:52:25 -04:00
Theodore Ts'o	7234ab2a55	ext4: Fix and simplify s_dirt handling The s_dirt flag wasn't completely handled correctly, but it didn't really matter when journalling was enabled. It turns out that when ext4 runs without a journal, we don't clear s_dirt in places where we should have, with the result that the high-level write_super() function was writing the superblock when it wasn't necessary. So we fix this by making ext4_commit_super() clear the s_dirt flag, and removing many of the other places where s_dirt is manipulated. When journalling is enabled, the s_dirt flag might be left set more often, but s_dirt really doesn't matter when journalling is enabled. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-30 21:24:04 -04:00
Theodore Ts'o	e2d670523c	ext4: Simplify ext4_commit_super()'s function signature The ext4_commit_super() function took both a struct super_block * and a struct ext4_super_block *, but the struct ext4_super_block can be derived from the struct super_block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 00:33:44 -04:00
Theodore Ts'o	f7c439504c	ext4: Use is_power_of_2() for clarity Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-24 23:31:59 -04:00
Theodore Ts'o	c5ca7c7636	ext4: Fallback to vmalloc if kmalloc can't allocate s_flex_groups array For very large filesystems, the s_flex_groups array can get quite big. For example, a filesystem that can be resized up to 16TB will have 8192 flex groups (assuming the default flex_bg size of 16), so the array is 96k, which is very marginal for kmalloc(). On the other hand, a 160GB filesystem without the resize_inode feature will only require 960 bytes. So we try to allocate the array first using kmalloc(), and if that fails, we'll try to use vmalloc() instead. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-27 22:48:48 -04:00
Aneesh Kumar K.V	29fa89d088	ext4: Mark the unwritten buffer_head as mapped during write_begin Setting BH_Unwritten buffer_heads as BH_Mapped avoids multiple (unnecessary) calls to get_block() during the call to the write(2) system call. Setting BH_Unwritten buffer heads as BH_Mapped requires that the writepages() functions can handle BH_Unwritten buffer_heads. After this commit, things work as follows: ext4_ext_get_block() returns unmapped, unwritten, buffer head when called with create = 0 for prealloc space. This makes sure we handle the read path and non-delayed allocation case correctly. Even though the buffer head is marked unmapped we have valid b_blocknr and b_bdev values in the buffer_head. ext4_da_get_block_prep() called for block resrevation will now return mapped, unwritten, new buffer_head for prealloc space. This avoids multiple calls to get_block() for write to same offset. By making such buffers as BH_New, we also assure that sub-block zeroing of buffered writes happens correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-12 16:30:27 -04:00
Aneesh Kumar K.V	79ffab3439	ext4: Properly initialize the buffer_head state These struct buffer_heads are allocated on the stack (and hence are initialized with stack garbage). They are only used to call a get_blocks() function, so that's mostly OK, but b_state must be initialized to be 0 so we don't have any unexpected BH_* flags set by accident, such as BH_Unwritten or BH_Delay. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-13 15:13:42 -04:00
Theodore Ts'o	c4b5a61431	ext4: Do not try to validate extents on special files The EXTENTS_FL flag should never be set on special files, but if it is, don't bother trying to validate that the extents tree is valid, since only files, directories, and non-fast symlinks will ever have an extent data structure. We perhaps should flag the filesystem as being corrupted if we see a special file (named pipes, device nodes, Unix domain sockets, etc.) with the EXTENTS_FL flag, but e2fsck doesn't currently check this case, so we'll just ignore this for now, since it's harmless. Without this fix, a special device with the extents flag is flagged as an error by the kernel, so it is impossible to access or delete the inode, but e2fsck doesn't see it as a problem, leading to confused/frustrated users. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-24 18:45:35 -04:00
Theodore Ts'o	a9e817425d	ext4: Ignore i_file_acl_high unless EXT4_FEATURE_INCOMPAT_64BIT is present Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature bit is set. The field is normally zero, but older versions of e2fsck didn't automatically check to make sure of this, so in the spirit of "be liberal in what you accept", don't look at i_file_acl_high unless we are using a 64-bit filesystem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-24 16:11:18 -04:00
Theodore Ts'o	485c26ec70	ext4: Fix softlockup caused by illegal i_file_acl value in on-disk inode If the block containing external extended attributes (which is stored in i_file_acl and i_file_acl_high) is larger than the on-disk filesystem, the process which tried to access the extended attributes will endlessly issue kernel printks complaining that "__find_get_block_slow() failed", locking up that CPU until the system is forcibly rebooted. So when we read in the inode, make sure the i_file_acl value is legal, and if not, flag the filesystem as being corrupted. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-24 13:43:20 -04:00
Linus Torvalds	a4277bf122	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix potential inode allocation soft lockup in Orlov allocator ext4: Make the extent validity check more paranoid jbd: use SWRITE_SYNC_PLUG when writing synchronous revoke records jbd2: use SWRITE_SYNC_PLUG when writing synchronous revoke records ext4: really print the find_group_flex fallback warning only once	2009-04-24 08:37:40 -07:00

1 2 3 4 5 ...

691 commits