This patch doesn't make any changes to the ordering of the various
operations related to glocking, but it does tidy up the calls to the
glops.c functions to make the structure more obvious.
The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
made static within glock.c since they are called by every set of glock
operations. The xmote_th and drop_th glock operations are then made
conditional upon those two routines existing and called from the
previously mentioned functions in glock.c respectively.
Also it can be seen that the go_sync operation isn't needed since it can
easily be replaced by calls to xmote_bh and drop_bh respectively. This
results in no longer (confusingly) calling back into routines in glock.c
from glops.c and also reducing the glock operations by one member.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Here is a patch for GFS2 to remove the local exclusive flag. In
the places it was used, mutex's are always held earlier in the
call path, so it appears redundant in the LM_ST_SHARED case.
Also, the GFS2 holders were setting local exclusive in any case where
the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
code where the flag was tested have been replaced with tests for the
lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
as globally exclusive).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The "greedy" code was an attempt to retain glocks for a minimum length
of time when they relate to mmap()ed files. The current implementation
of this feature is not, however, ideal in that it required allocating
memory in order to do this and its overly complicated.
It also misses the mark by ignoring the other I/O operations which are
just as likely to suffer from the same problem. So the plan is to remove
this now and then add the functionality back as part of the glock state
machine at a later date (and thus take into account all the possible
users of this feature)
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Here is something I spotted (while looking for something entirely
different) the other day.
Rather than using a completion in each and every struct gfs2_holder,
this removes it in favour of hashed wait queues, thus saving a
considerable amount of memory both on the stack (where a number of
gfs2_holder structures are allocated) and in particular in the
gfs2_inode which has 8 gfs2_holder structures embedded within it.
As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
In actual practice we get a much better result than that since
now that a gfs2_inode is under the 2048 byte barrier, we get two
per 4k slab page effectively halving the amount of memory required
to store gfs2_inodes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.
As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.
Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.
Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
It occurred to me that although a gfs2 specific writepages for ordered
writes and journaled data would be tricky, by hooking writepages only
for "data=writeback" mounts we could take advantage of not needing
buffer heads (we don't use them on the read side, nor have we for some
time) and create much larger I/Os for the block layer.
Using blktrace both before and after, its possible to see that for large
I/Os, most of the requests generated through writepages are now 1024
sectors after this patch is applied as opposed to 8 sectors before.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc3-mm1:
>...
> git-gfs2-nmw.patch
>...
> git trees
>...
This patch makes the needlessly globlal gfs2_change_nlink_i() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is for Red Hat bugzilla bug bz #222302:
Moving a virtual IP from node to node between two NFS-over-GFS2
servers was causing one of the GFS2 servers to become confused and
reference a deleted inode. The problem was due to vfs dentries that did
not reference the gfs2_dops and therefore didn't call the gfs2 revalidate
code to revalidate a dentry after a directory had been deleted & recreated.
This patch is a crosswrite from a RHEL4 bug found in GFS1 as
bz #190756 and it is against the latest -nmw git tree.
Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Second round of gfs2_rename lock re-ordering to allow Anaconda adding
root partition on top of gfs2. Previous to this patch the recursive
lock detector in glock.c can be triggered due to attempting to lock
the rgrp twice. This fixes it by checking to see whether the rgrp
is already locked.
This fixes Red Hat bugzilla #221237
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Update the quilt header comments to match the
code changes.
Change gfs2_lookup_simple to return an error in the case
of a NULL inode.
The callers of gfs2_lookup_simple do not check for NULL
in the no entry case and such would end up dereferencing a NULL ptr.
This fixes:
http://projects.info-pull.com/mokb/MOKB-15-11-2006.html
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In case of unlinked files with dirty pages GFS2 wasn't clearing
the pages in quite the right order. This patch clears the pages
earlier (before the qlock_dq) to avoid the situation that the
release of the glock results in attempting to write back data that
has already been deallocated.
This fixes Red Hat bugzilla: #220117
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Bugzilla 215088
Fix deadlock in gfs2_change_nlink() while installing RHEL5 into GFS2
partition. The gfs2_rename() apparently needs block allocation for the
new name (into the directory) where it requires rg locks. At the same
time, while updating the nlink count for the replaced file,
gfs2_change_nlink() tries to return the inode meta-data back to resource
group where it needs rg locks too. Our logic doesn't allow process to
acquire these locks recursively by the same process (RHEL installer)
that results a BUG call. This only happens within rename code path and
only if the destination file exists before the rename operation.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is partially derrived from a patch written by Russell Cattelan.
It fixes a bug where there is a race between readpages and truncate
by ignoring readpages for stuffed files. This is ok because a stuffed
file will never be more than one block (minus sizeof(struct gfs2_dinode))
in size and block size is always less than page size, so we do not lose
anything efficiency-wise by not doing readahead for stuffed files. They
will have already been "read ahead" by the action of reading the inode
in, in the first place.
This is the remaining part of the fix for Red Hat bugzilla #218966
which had not yet made it upstream.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Russell Cattelan <cattelan@redhat.com>
This patch fixes Red Hat bugzilla #212627 in which a deadlock occurs
due to trying to take the i_mutex while holding a glock. The correct
locking order is defined as i_mutex -> glock in all cases.
I've left dealing with allocating writes. I know that we need to do
that, but for now this should do the trick. We don't need to take the
i_mutex on write, because the VFS has already taken it for us. On read
we don't need it since the glock is enough protection. The reason that
I've made some of the checks into a separate function is that we'll need
to do the checks again in the allocating write case eventually, so this
is partly in preparation for this. Likewise the return value test of !=
1 might look a bit odd and thats because we'll need a third return value
in case of requiring an allocation.
I've made the change to deferred mode on the glock to ensure flushing
read caches on other nodes. I notice that (using blktrace to look at
whats going on) we appear to do a better job of large I/Os than ext3
after this patch (in terms of not splitting up the I/Os).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Wendy Cheng <wcheng@redhat.com>
If an fs has already been shut down, a lockfs callback should do nothing.
An fs that's been shut down can't acquire locks or do anything with
respect to the cluster.
Also, remove FIXME comment in withdraw function. The missing bits of the
withdraw procedure are now all done by user space.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest;
xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings.
(XFS unlocks the semaphore from a different task, by design. The mutex
code warns about this)
Signed-off-by: Dave Chinner <dgc@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a patch to fix up the Kconfig so that we don't land up with
problems when people disable the NET subsystem. Thanks for all the hints and
suggestions that people have sent me regarding this.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Aleksandr Koltsoff <czr@iki.fi>
Cc: Toralf Förster <toralf.foerster@gmx.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: Chris Zubrzycki <chris@middle--earth.org>
Cc: Patrick Caulfield <pcaulfie@redhat.com>
* master.kernel.org:/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (73 commits)
[DLM] Clean up lowcomms
[GFS2] Change gfs2_fsync() to use write_inode_now()
[GFS2] Fix indent in recovery.c
[GFS2] Don't flush everything on fdatasync
[GFS2] Add a comment about reading the super block
[GFS2] Mount problem with the GFS2 code
[GFS2] Remove gfs2_check_acl()
[DLM] fix format warnings in rcom.c and recoverd.c
[GFS2] lock function parameter
[DLM] don't accept replies to old recovery messages
[DLM] fix size of STATUS_REPLY message
[GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning
[DLM] fix add_requestqueue checking nodes list
[GFS2] Fix recursive locking in gfs2_getattr
[GFS2] Fix recursive locking in gfs2_permission
[GFS2] Reduce number of arguments to meta_io.c:getbuf()
[GFS2] Move gfs2_meta_syncfs() into log.c
[GFS2] Fix journal flush problem
[GFS2] mark_inode_dirty after write to stuffed file
[GFS2] Fix glock ordering on inode creation
...
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a bit better than the previous version of gfs2_fsync()
although it would be better still if we were able to call a
function which only wrote the inode & metadata. Its no big deal
though that this will potentially write the data as well since
the VFS has already done that before calling gfs2_fsync(). I've
also added a comment to explain whats going on here.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
As per comments from Andrew Morton and Jan Engelhardt, this fixes the
indent and removes the "static" from a variable declaration since its
not needed in this case (now allocated on the stack of the function
in question).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Andrew Morton <akpm@osdl.org>
Conflicts:
drivers/ata/libata-scsi.c
include/linux/libata.h
Futher merge of Linus's head and compilation fixups.
Signed-Off-By: David Howells <dhowells@redhat.com>
The gfs2_fsync() function was doing a journal flush on each
and every call. While this is correct, its also a lot of
overhead. This patch means that on fdatasync flushes we
rely on the VFS to flush the data for us and we don't do
a journal flush unless we really need to.
We have to do a journal flush for stuffed files though because
they have the data and the inode metadata in the same block.
Journaled files also need a journal flush too of course.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The comment explains why we use the bio functions to read
the super block.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Srinivasa Ds <srinivasa@in.ibm.com>
While mounting the gfs2 filesystem,our test team had a problem and we
got this error message.
=======================================================
GFS2: fsid=: Trying to join cluster "lock_nolock", "dasde1"
GFS2: fsid=dasde1.0: Joined cluster. Now mounting FS...
GFS2: not a GFS2 filesystem
GFS2: fsid=dasde1.0: can't read superblock: -22
==========================================================================
On debugging further we found that problem is while reading the super
block(gfs2_read_super) and comparing the magic number in it.
When I replace the submit_bio() call(present in gfs2_read_super) with
the sb_getblk() and ll_rw_block(), mount operation succeded.
On further analysis we found that before calling submit_bio(),
bio->bi_sector was set to "sector" variable. This "sector" variable has
the same value of bh->b_blocknr(block number). Hence there is a need to
multiply this valuwith (blocksize >> 9)(9 because,sector size
2^9,samething happens in ll_rw_block also, before calling submit_bio()).
So I have developed the patch which solves this problem. Please let me
know your comments.
================================================================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As pointed out by Adrian Bunk, the gfs2_check_acl() function is no
longer used. This patch removes it and renamed gfs2_check_acl_locked()
to gfs2_check_acl() since we only need one variant of that function now.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Adrian Bunk <bunk@stusta.de>
Fix function parameter typing:
fs/gfs2/glock.c💯 warning: function declaration isn't a prototype
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix a printk format warning in fs/gfs2/log.c:
fs/gfs2/log.c:322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t'
Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The readdirplus NFS operation can result in gfs2_getattr being
called with the glock already held. In this case we do not want
to try and grab the lock again.
This fixes Red Hat bugzilla #215727
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since gfs2_permission may be called either from the VFS (in which case
we need to obtain a shared glock) or from GFS2 (in which case we already
have a glock) we need to test to see whether or not a lock is required.
The original test was buggy due to a potential race. This one should
be safe.
This fixes Red Hat bugzilla #217129
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since the superblock and the address_space are determined by the
glock, we might as well just pass that as the argument since all
the callers already have that available.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes a bug which resulted in poor performance due to flushing
the journal too often. The code path in question was via the inode_go_sync()
function in glops.c. The solution is not to flush the journal immediately
when inodes are ejected from memory, but batch up the work for glockd to
deal with later on. This means that glocks may now live on beyond the end of
the lifetime of their inodes (but not very much longer in the normal case).
Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in
calculation of the number of free journal blocks.
The gfs2_logd process has been altered to be more responsive to the journal
filling up. We now wake it up when the number of uncommitted journal blocks
has reached the threshold level rather than trying to flush directly at the
end of each transaction. This again means doing fewer, but larger, log
flushes in general.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The go_sync callback took two flags, but one of them was set on every
call, so this patch removes once of the flags and makes the previously
conditional operations (on this flag), unconditional.
The go_inval callback took three flags, each of which was set on every
call to it. This patch removes the flags and makes the operations
unconditional, which makes the logic rather more obvious.
Two now unused flags are also removed from incore.h.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2 requires the CRC32 library function. This was reported by
Toralf Förster.
Cc: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
When deleting directory entries, we set the inum.no_addr to zero
in a dirent when its the first dirent in a block and thus cannot
be merged into the previous dirent as is the usual case. In gfs1,
inum.no_formal_ino was used instead.
This patch changes gfs2 to set both inum.no_addr and inum.no_formal_ino
to zero. It also changes the test from just looking at inum.no_addr to
look at both inum.no_addr and inum.no_formal_ino and a sentinel is
now considered to be a dirent in which _either_ (or both) of them
is set to zero.
This resolves Red Hat bugzillas: #215809, #211465
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The gfs2_glock_nq_m_atime function is unused in so far as its only
ever called with num_gh = 1, and this falls through to the
gfs2_glock_nq_atime function, so we might as well call that directly.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This moves the locking for bmap into the bmap function itself
rather than using a wrapper function. It also fixes a bug where
the boundary flag was set on the wrong bh. Also the flags on
the mapped bh are reset earlier in the function to ensure that
they are 100% correct on the error path.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Change from GFP_KERNEL to GFP_NOFS as this was causing a
slow down when trying to push inodes from cache.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Stuffed files only consist of a maximum of
(gfs2 block size - sizeof(struct gfs2_dinode)) bytes. Since the
gfs2 block size is always less than page size, we will never see
a call to stuffed_readpage for anything other than the first page
in the file.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The log lock is dropped prior to io submittion, but
this exposes a hole in which the log data structures
may be going away due to a truncate.
Store the buffer head in a local pointer prior to
dropping the lock and relay on the buffer_head lock
for consitency on the buffer head.
Signed-Off-By: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This function wasn't really doing the right thing. There was no need
to update the inode size at this point and the updating of the
i_blocks field has now been moved to the places where di_blocks is
updated. A result of this patch and some those preceeding it is that
unlocking a glock is now a much more efficient process, since there
is no longer any requirement to copy data from the gfs2 inode into
the vfs inode at this point.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since the inode number is constant, we don't need to keep updating
it everytime we refresh the other inode fields.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We were setting the inode flags from GFS2's flags far too often, even when they
couldn't possibly have changed. This patch reduces the amount of flag
setting going on so that we do it only when the inode is read in or
when the flags have changed. The create case is covered by the "when
the inode is read in" case.
This also fixes a bug where we didn't set S_SYNC correctly.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes a race between the glock and the page lock encountered
during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages
function doesn't need the same fix since it only uses a try lock anyway, so
it will fail back to gfs2_readpage in the case of a potential deadlock.
This bug was spotted by Russell Cattelan.
Cc: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There is no way to set the GL_DUMP flag, and in any case the
same thing can be done with systemtap if required for debugging,
so this removes it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The meta_header for an ondisk rgrp never changes, so there is no point
copying it in and back out to disk. Also there is no reason to keep
a copy for each rgrp in memory.
The code already checks to ensure that the header is correct before
it calls the routine to copy the data in, so that we don't even need
to check whether its correct on disk in the functions in ondisk.c
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
We don't need to use endian conversions for 0 initialisations
when creating a new on-disk inode.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This shrinks the size of the gfs2_inode by 8 bytes by
replacing the version counter with a one bit valid/invalid
flag.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is almost never used. Its there for backward
compatibility with GFS1. It doesn't need its own
field since it can always be calculated from the
inode mode & flags. This saves a bit more space
in the gfs2_inode.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Remove the di_[amc]time fields and use inode->i_[amc]time
fields instead. This saves 24 bytes from the gfs2_inode.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Remove the di_nlink field in favour of inode->i_nlink and
update the nlink handling to use the proper macros. This
saves 4 bytes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Remove duplicate di_uid/di_gid fields in favour of using
inode->i_uid/inode->i_gid instead. This saves 8 bytes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes the duplicate di_mode field in favour of using the
inode->i_mode field. This saves 4 bytes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes the device numbers from this structure by using
inode->i_rdev instead. It also cleans up the code in gfs2_mknod.
It results in shrinking the gfs2_inode by 8 bytes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The metadata header doesn't need to be stored in the incore
struct gfs2_inode since its constant, and this patch removes it.
Also, there is already a field for the inode's number in the
struct gfs2_inode, so we don't need one in struct gfs2_dinode_host
as well.
This saves 28 bytes of space in the struct gfs2_inode.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Change argument for gfs2_dinode_print in order to prepare
for removal of duplicate fields between struct inode and
struct gfs2_dinode_host.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
gfs2_dinode_in() is only ever called from one place, so move it
to that place (in inode.c) and make it static.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is a preliminary patch to enable the removal of fields
in gfs2_dinode_host which are duplicated in struct inode.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Everywhere this was called, a struct gfs2_inode was available,
but despite that, it was always called with a struct gfs2_dinode
as an argument. By making this change it paves the way to start
eliminating fields duplicated between the kernel's struct inode
and the struct gfs2_dinode.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Commit "[GFS2] split and annotate gfs2_log_head" resulted in an incorrect
checksum calculation for log headers. This patch corrects the
problem without resorting to copying the whole log header as
the previous code used to.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Annotated scalar fields, dropped unused ones. Note that
it's not at all obvious that we want to convert all of them
to host-endian...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The latter is used as part of gfs2-private part of struct inode.
It actually stores a lot of fields differently; for now the
declaration is just cloned, inode field is swtiched and changes
propagated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix the OOM error handling in inode.c where it was possible for
a NULL pointer to be dereferenced.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This adds a sync_fs superblock operation for GFS2 and removes
the journal flush from write_super in favour of sync_fs where it
ought to be. This is more or less identical to the way in which ext3
does this.
This bug was pointed out by Russell Cattelan <cattelan@redhat.com>
Cc: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
First, SLAB_PANIC is unjustified. Second, all error propagating and backing out
is in place.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This just ignore the remaining pages, and remove unneeded unlock_pages().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the "if (extlen)" case, "bh" was used uninitialized.
This patch changes the code to what seems to have been intended.
Spotted by the Coverity checker.
This patch also removes a pointless "bh = NULL" asignment (the variable
is never accessed again after this point).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Don't dereference new->s_root when we do know it's NULL.
Spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In the "if (extlen)" case, "new" might be used uninitialized.
Looking at the code, it should be initialized to 0.
Spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The Coverity checker spotted this obviously dead code.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fix means that bmap will map extents of the length requested
by the VFS rather than guessing at it, or just mapping one block
at a time. The other callers of gfs2_block_map are audited to ensure
they send the correct max extent lengths (i.e. set bh->b_size correctly).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Pass kaddr rather than (incorrect) struct page to kunmap_atomic.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The log lock needs to be held when manipulating the counter
for the number of free journal blocks.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This fixes a bug where, in certain cases an uninitialised variable
could cause a dereference of a NULL pointer in gfs2_commit_write().
Also a typo in a comment is fixed at the same time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix a size calculation error.
The size was incorrect being computed as a
negative length and then being passed to an
unsigned parameter.
This in turn would cause the allocator to
think it needed enough meta data to store
a gigabyte file for every file created.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch has gfs2_security_init declared as a static function, which
is correct. As a result, the declaration of this function in inode.h is
removed (and thus inode.h is unchanged). Also removed #include eaops.h,
which is not needed.
Signed-Off-By: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This moves the logging code from meta_io.c into log.c and glops.c. As a
result the routines can now be static and all the logging code is together
in log.c, leaving meta_io.c with just metadata i/o code in it.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
In many places GFS2 was calling the endian conversion routines
for an inode even when only a single field, or a few fields might
have changed. As a result we were copying lots of data needlessly.
This patch replaces those calls with conversion of just the
required fields in each case. This should be faster and easier
to understand. There are still other places which suffer from this
problem, but this is a start in the right direction.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
For some reason we had two different sets of code for reading in the
superblock. This removes one of them in favour of the other. Also we
don't need the temporary buffer for the sb since we already have one
in the gfs2 sb itself.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Update GFS2 in the light of David Howells' patch:
[PATCH] BLOCK: Move common FS-specific ioctls to linux/fs.h [try #6]
36695673b0
which calls the filesystem independant flags FS_..._FL. As a result
we no longer need the flags.h file and the conversion routine is
moved into the GFS2 source code.
Userland programs which used to include iflags.h should now include
fs.h and use the new flag names.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix GFS for streamline-generic_file_-interfaces-and-filemap.patch
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).
This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The Makefile had the wrong CONFIG_ variable in it so that in
case GFS2 was y and the lock modules were m, they were not
getting built properly.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix a bug in the directory reading code, where we might have dereferenced
a NULL pointer in case of OOM. Updated the directory code to use the new
& improved version of gfs2_meta_ra() which now returns the first block
that was being read. Previously it was releasing it requiring following
code to grab the block again at each point it was called.
Also turned off readahead on directory lookups since we are reading a
hash table, and therefore reading the entries in order is very
unlikely. Readahead is still used for all other calls to the
directory reading function (e.g. when growing the hash table).
Removed the DIO_START constant. Everywhere this was used, it was
used to unconditionally start i/o aside from a couple of places, so
I've removed it and made the couple of exceptions to this rule into
separate functions.
Also hunted through the other DIO flags and removed them as arguments
from functions which were always called with the same combination of
arguments.
Updated gfs2_meta_indirect_buffer to be a bit more efficient and
hopefully also be a bit easier to read.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is an attempt to fix Red Hat bz 204364. I don't hit it all
the time, but with these changes, running postmark which used to
trigger it on a regular basis no longer appears to. So I'm not
saying that its 100% certain that its fixed, but it does look
promising at the moment.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
lm_interface.h has a few out of the tree clients such as GFS1
and userland tools.
Right now, these clients keeps a copy of the file in their build tree
that can go out of sync.
Move lm_interface.h to include/linux, export it to userland and
clean up fs/gfs2 to use the new location.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
i_blksize got removed in -mm.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Fix for Red Hat bz 205307. Don't need to lock in readpage if
the higher level code has already grabbed the lock.
Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is a tidy up of the GFS2 bmap code. The main change is that the
bh is passed to gfs2_block_map allowing the flags to be set directly
rather than having to repeat that code several times in ops_address.c.
At the same time, the extent mapping code from gfs2_extent_map has
been moved into gfs2_block_map. This allows all calls to gfs2_block_map
to map extents in the case that no allocation is taking place. As a
result reads and non-allocating writes should be faster. A quick test
with postmark appears to support this.
There is a limit on the number of blocks mapped in a single bmap
call in that it will only ever map blocks which are pointed to
from a single pointer block. So in other words, it will never try
to do additional i/o in order to satisfy read-ahead. The maximum
number of blocks is thus somewhat less than 512 (the GFS2 4k block
size minus the header divided by sizeof(u64)). I've further limited
the mapping of "normal" blocks to 32 blocks (to avoid extra work)
since readpages() will currently read a maximum of 32 blocks ahead (128k).
Some further work will probably be needed to set a suitable value
for DIO as well, but for now thats left at the maximum 512 (see
ops_address.c:gfs2_get_block_direct).
There is probably a lot more that can be done to improve bmap for GFS2,
but this is a good first step.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Print an error message if mount fails in setting up the sysfs files.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
A one liner bug fix to prevent the return value being
wrong when more than one superblock is mounted.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Based upon previous feedback from lkml and also removing some
commented out debugging which is no longer needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use atomic_t as the ref count in glocks rather than a kref.
This is another step towards using RCU for the glock hash.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This results in smaller list heads, so that we can have more chains
in the same amount of memory (twice as many). I've multiplied the
size of the table by four though - this is because we are saving
memory by not having one lock per chain any more. So we land up
using about the same amount of memory for the hash table as we
did before I started these changes, the difference being that we
now have four times as many hash chains.
The reason that I say "about the same amount of memory" is that the
actual amount now depends upon the NR_CPUS and some of the config
variables, so that its not exact and in some cases we do use more
memory. Eventually we might want to scale the hash table size
according to the size of physical ram as measured on module load.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The existing implementation of this function in glock.c was not
very efficient as it relied upon keeping a cursor element upon the
hash chain in question and moving it along. This new version improves
upon this by using the current element as a cursor. This is possible
since we only look at the "next" element in the list after we've
taken the read_lock() subsequent to calling the examiner function.
Obviously we have to eventually drop the ref count that we are then
left with and we cannot do that while holding the read_lock, so we
do that next time we drop the lock. That means either just before
we examine another glock, or when the loop has terminated.
The new implementation has several advantages: it uses only a
read_lock() rather than a write_lock(), so it can run simnultaneously
with other code, it doesn't need a "plug" element, so that it removes
a test not only from this list iterator, but from all the other glock
list iterators too. So it makes things faster and smaller.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Add back the consts which were casted away in the glock sorting
function. Also add early exit code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Make the number of locks used for hash chains in glock.c
proportional to NR_CPUS. Also move constants for the number
of hash chains into glock.c from incore.h since they are
not used outside of glock.c.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This splits the rwlocks guarding the hash chains of the glock hash
table into their own array. This will reduce memory usage in some
cases due to better alignment, although the real reason for doing it
is to allow the two tables to be different sizes in future (i.e.
the locks will be sized proportionally with the max number of CPUs
and the hash chains sized proportinally with the size of physical memory)
In order to allow this, the gl_bucket member of struct gfs2_glock has
now become gl_hash, so we record the hash rather than a pointer to the
bucket itself.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As requested by Jan Engelhardt, this removes the typedefs in the
locking module interface and replaces them with void *. Also
since we are changing the interface, I've added a few consts
as well.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This was missed in an earlier patch when changing over from vmalloc
to kmalloc for the superblock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This code is no longer used for anything and can be removed
from the locking modules. The sync_lvb function is not required
as this happens automatically with the current locking system.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This removes one of the typedefs from the locking interface. It
is replaced by a forward declaration of the gfs2 superblock. The
other two are not so easy to solve since in their case, they
can refer to one of two possible structures.
Cc: David Teigland <teigland@redhat.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
There are several reasons why we want to do this:
- Firstly its large and thus we'll scale better with multiple
GFS2 fs mounted at the same time
- Secondly its easier to scale its size as required (thats a plan
for later patches)
- Thirdly, we can use kzalloc rather than vmalloc when allocating
the superblock (its now only 4888 bytes)
- Fourth its all part of my plan to eventually be able to use RCU
with the glock hash.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
This is another patch preparing for sharing of the glock hash
table between different gfs2 mounts.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Use snprintf(buf, PAGE_SIZE, ...) instead of sprintf for sysfs show
methods. Per instructions in Documentation/filesystems/sysfs.txt
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Ass a comment explaining the slightly odd construct used to
pass error values back.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's follow up emails, here are a few small
fixes which were missed earlier.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request, some unused code is removed and
some consts added in the quota code.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's comments, removed some unused code and
removed some brackets which were not required.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
As per Jan Engelhardt's request and also a few of my own. It has
been possible to add a few most const to the code as a result of
the change in gfs2_ea_name2type.
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>