The direct I/O write codepath for CIFS is done through
cifs_user_write(). That function does not currently call
generic_write_checks() so the file position isn't being properly set
when the file is opened with O_APPEND. It's also not doing the other
"normal" checks that should be done for a write call.
The problem is currently that when you open a file with O_APPEND on a
mount with the directio mount option, the file position is set to the
beginning of the file. This makes any subsequent writes clobber the data
in the file starting at the beginning.
This seems to fix the problem in cursory testing. It is, however
important to note that NFS disallows the combination of
(O_DIRECT|O_APPEND). If my understanding is correct, the concern is
races with multiple clients appending to a file clobbering each others'
data. Since the write model for CIFS and NFS is pretty similar in this
regard, CIFS is probably subject to the same sort of races. What's
unclear to me is why this is a particular problem with O_DIRECT and not
with buffered writes...
Regardless, disallowing O_APPEND on an entire mount is probably not
reasonable, so we'll probably just have to deal with it and reevaluate
this flag combination when we get proper support for O_DIRECT. In the
meantime this patch at least fixes the existing problem.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] list entry can not return null
turn cifs_setattr into a multiplexor that calls the correct function
move file time and dos attribute setting logic into new function
spin off cifs_setattr with unix extensions to its own function
[CIFS] Code cleanup in old sessionsetup code
[CIFS] cifs_mkdir and cifs_create should respect the setgid bit on parent dir
Rename CIFSSMBSetFileTimes to CIFSSMBSetFileInfo and add PID arg
change CIFSSMBSetTimes to CIFSSMBSetPathInfo
[CIFS] fix trailing whitespace
bundle up Unix SET_PATH_INFO args into a struct and change name
Fix missing braces in cifs_revalidate()
remove locking around tcpSesAllocCount atomic variable
[CIFS] properly account for new user= field in SPNEGO upcall string allocation
[CIFS] remove level of indentation from decode_negTokenInit
[CIFS] cifs send2 not retrying enough in some cases on full socket
[CIFS] oid should also be checked against class in cifs asn
We'd like to be able to use the unix SET_PATH_INFO_BASIC args to set
file times as well, but that makes the argument list rather long. Bundle
up the args for unix SET_PATH_INFO call into a struct. For now, we don't
actually use the times fields anywhere. That will be done in a follow-on
patch.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).
This also facilitates lockdeping of page lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cifs_convert_flags returns 0x20197 in the default case. It's not
immediately evident where that number comes from, so change it
to be an or'ed set of flags. The compiler will boil it down anyway.
(Thanks to Guenter Kukkukk for clarifying the flags).
Signed-off-by: Steve French <sfrench@us.ibm.com>
Shirish Pargaonkar noted:
With cifsacl mount option, when a file is created on the Windows server,
exclusive oplock is broken right away because the get cifs acl code
again opens the file to obtain security descriptor.
The client does not have the newly created file handle or inode in any
of its lists yet so it does not respond to oplock break and server waits for
its duration and then responds to the second open. This slows down file
creation signficantly. The fix is to pass the file descriptor to the get
cifsacl code wherever available so that get cifs acl code does not send
second open (NT Create ANDX) and oplock is not broken.
CC: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Christoph had noticed too many ifdefs in the CIFS code making it
hard to read. This patch removes about a quarter of them from
the C files in cifs by improving a few key ifdefs in the .h files.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fix RedHat bug 329431
The idea here is separate "conscious" from "unconscious" flushes.
Conscious flushes are those due to a fsync() or close(). Unconscious
ones are flushes that occur as a side effect of some other operation or
due to memory pressure.
Currently, when an error occurs during an unconscious flush (ENOSPC or
EIO), we toss out the page and don't preserve that error to report to
the user when a conscious flush occurs. If after the unconscious flush,
there are no more dirty pages for the inode, the conscious flush will
simply return success even though there were previous errors when writing
out pages. This can lead to data corruption.
The easiest way to reproduce this is to mount up a CIFS share that's
very close to being full or where the user is very close to quota. mv
a file to the share that's slightly larger than the quota allows. The
writes will all succeed (since they go to pagecache). The mv will do a
setattr to set the new file's attributes. This calls
filemap_write_and_wait,
which will return an error since all of the pages can't be written out.
Then later, when the flush and release ops occur, there are no more
dirty pages in pagecache for the file and those operations return 0. mv
then assumes that the file was written out correctly and deletes the
original.
CIFS already has a write_behind_rc variable where it stores the results
from earlier flushes, but that value is only reported in cifs_close.
Since the VFS ignores the return value from the release operation, this
isn't helpful. We should be reporting this error during the flush
operation.
This patch does the following:
1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also
sync to check its return code. If it returns successful, they then check
the value of write_behind_rc to see if an earlier flush had reported any
errors. If so, they return that error and clear write_behind_rc.
2) sets write_behind_rc in a few other places where pages are written
out as a side effect of other operations and the code waits on them.
3) changes cifs_setattr to only call filemap_write_and_wait for
ATTR_SIZE changes.
4) makes cifs_writepages accurately distinguish between EIO and ENOSPC
errors when writing out pages.
Some simple testing indicates that the patch works as expected and that
it fixes the reproduceable known problem.
Acked-by: Dave Kleikamp <shaggy@austin.rr.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
request
In SendReceive() function in transport.c - it memcpy's
message payload into a buffer passed via out_buf param. The function
assumes that all buffers are of size (CIFSMaxBufSize +
MAX_CIFS_HDR_SIZE) , unfortunately it is also called with smaller
(MAX_CIFS_SMALL_BUFFER_SIZE) buffers. There are eight callers
(SMB worker functions) which are primarily affected by this change:
TreeDisconnect, uLogoff, Close, findClose, SetFileSize, SetFileTimes,
Lock and PosixLock
CC: Dave Kleikamp <shaggy@austin.ibm.com>
CC: Przemyslaw Wegrzyn <czajnik@czajsoft.pl>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When find_writable_file is racing with close and the session
to the server goes down, Shaggy noticed that there was a
chance that an open file in the list of files off the inode
could have been freed by close since cifs_reconnect can
block (the spinlock thus not held). This means that
we have to start over at the beginning of the list in some
cases.
There is a 2nd change that needs to be made later
(pointed out by Jeremy Allison and Shaggy) in order to
prevent cifs_close ever freeing the cifs per file info
when a write is pending. Although we delay close from
freeing this memory for sufficiently long for all known
cases, ultimately on a very, very slow write
overlapping a close pending we need to allow close to return
(without freeing the cifs file info) and defer freeing the
memory to be the responsibility of the (sloooow) write
thread (presumably have to look at every place wrtPending
is decremented - and add a flag for deferred free for
after wrtPending goes to zero).
Acked-by: Shaggy <shaggy@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Harmless since it only protected turning off caching for the
inode, but cleaner to lock around this in case we have a close
racing with open.
Signed-off-by: Shaggy <shaggy@us.ibm.com>
CC: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
There was a case in which find_writable_file was not waiting long enough
under heavy stress when writepages was racing with close of the file
handle being used by the write.
Signed-off-by: Steve French <sfrench@us.ibm.com>
On a mount without posix extensions enabled, when an unlock request is
made, the client can release more than is intended. To reproduce, on a
CIFS mount without posix extensions enabled:
1) open file
2) do fcntl lock: start=0 len=1
3) do fcntl lock: start=2 len=1
4) do fcntl unlock: start=0 len=1
...on the unlock call the client sends an unlock request to the server
for both locks. The problem is a bad test in cifs_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Caused by unneeded reopen during reconnect while spinlock held.
Fixes kernel bugzilla bug #7903
Thanks to Lin Feng Shen for testing this, and Amit Arora for
some nice problem determination to narrow this down.
Acked-by: Dave Kleikamp <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Previously the only way to do this was to umount all mounts to that server,
turn off a proc setting (/proc/fs/cifs/LinuxExtensionsEnabled).
Fixes Samba bugzilla bug number: 4582 (and also 2008)
Signed-off-by: Steve French <sfrench@us.ibm.com>
It's common for file systems to need to zero data on either side of a
write, if a page is not Uptodate during prepare_write. It just so happens
that simple_prepare_write() in libfs.c does exactly that, so we can avoid
duplication and just call that function to zero page data.
Signed-off-by: Nate Diller <nate.diller@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This should be the last big batch of whitespace/formatting fixes.
checkpatch warnings for the cifs directory are down about 90% and
many of the remaining ones are harder to remove or make the code
harder to read.
Signed-off-by: Steve French <sfrench@us.ibm.com>
nfsd is passing null nameidata (probably the only one doing that)
on call to create - cifs was missing one check for this.
Note that running nfsd over a cifs mount requires specifying fsid on
the nfs exports entry and requires mounting cifs with serverino mount
option.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.
Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally at http://lkml.org/lkml/2006/9/2/86
The recent change to "allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call" introduced a new semaphore in struct cifsFileInfo,
lock_sem. However, semaphores used as mutexes are deprecated these days,
and there's no reason to add a new one to the kernel. Therefore, convert
lock_sem to a struct mutex (and also fix one indentation glitch on one of
the lines changed anyway).
Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Also expand debug entry to show which character on a failed Unicode
mapping.
Acked-by: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
file->f_path.dentry or file->f_path.dentry.d_inode can't be NULL since at
least ten years, similar for all but very few arguments passed in from the
VFS.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Could cause hangs on smp systems in i_size_read on a cifs inode
whose size has been previously simultaneously updated from
different processes.
Thanks to Brian Wang for some great testing/debugging on this
hard problem.
Fixes kernel bugzilla #7903
CC: Shirish Pargoankar <shirishp@us.ibm.com>
CC: Shaggy <shaggy@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
The two cifs functions that used the most stack according
to "make checkstack" have been changed to use less stack.
Thanks to jra and Shaggy for helpful ideas
Signed-off-by: Steve French <sfrench@us.ibm.com>
cc: jra@samba.org
cc: shaggy@us.ibm.com
This also adds he required page "writeback" flag handling, that cifs
hasn't been doing and that the page dirty flag changes made obvious.
Acked-by: Steve French <smfltc@us.ibm.com>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
CIFS implements ->readpages and doesn't use read_cache_pages(). So wire the
read IO accounting up within CIFS.
Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: David Wright <daw@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the cifs
filesystem.
Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Informational/debug message was being logged too often. The error
case of logging having to send a close with (presumably stuck on buggy
server) pending writes is still logged.
Signed-off-by: Steve French <sfrench@us.ibm.com>
This just ignore the remaining pages, and will fix a forgot put_pages_list().
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove inclusions of linux/mpage.h that are no longer necessary due to the
transfer of generic_writepages().
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow Windows blocking locks to be cancelled via a
CANCEL_LOCK call. TODO - restrict this to servers
that support NT_STATUS codes (Win9x will probably
not support this call).
Signed-off-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 570d4d2d895569825d0d017d4e76b51138f68864 commit)
request and do not time out slow requests to a server that is still responding
well to other threads
Suggested by jra of Samba team
Signed-off-by: Steve French <sfrench@us.ibm.com>
(cherry picked from 89b57148115479eef074b8d3f86c4c86c96ac969 commit)
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>