License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default, all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied to
a file was done in a spreadsheet of side-by-side results of the output of two
independent scanners (ScanCode & Windriver) producing SPDX tag:value files
created by Philippe Ombredanne. Philippe prepared the base worksheet, and did
an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file-by-file comparison of the scanner results
in the spreadsheet to determine which SPDX license identifier(s) should be
applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
considered to have no license information in it, and the top-level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". Results of that were:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL-family license was found in the file or if it had no licensing
in it (per the prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later.
In total, over 70 hours of logged manual review was done on the
spreadsheet by Kate, Philippe and Thomas to determine the SPDX license
identifiers to apply to the source files, with confirmation by lawyers
working with the Linux Foundation in some cases.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors; they have been fixed to reflect the
correct identifier.
Additionally, Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version posted early this week, with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the .csv files and add the proper SPDX tag to each file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally, Greg ran the script using the .csv files to
generate the patches.
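For illustration (this example is not part of the original changelog), the
emitted tag differs only in comment style between C source files and headers,
roughly:

  // SPDX-License-Identifier: GPL-2.0        (first line of a .c file)
  /* SPDX-License-Identifier: GPL-2.0 */     (first line of a header)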
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

// SPDX-License-Identifier: GPL-2.0
/*
 *  linux/fs/read_write.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 */

#include <linux/slab.h>
#include <linux/stat.h>
#include <linux/sched/xacct.h>
#include <linux/fcntl.h>
#include <linux/file.h>
#include <linux/uio.h>

[PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd for each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
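
A minimal userspace sketch of the interface described above (illustrative
only; error handling omitted, not part of this file):

#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
        char buf[4096];
        char *p;
        int fd = inotify_init();                /* one fd covers all watches */
        int wd = inotify_add_watch(fd, "/tmp", IN_CREATE | IN_DELETE | IN_MODIFY);
        ssize_t len = read(fd, buf, sizeof(buf));   /* blocks; fd is also select()-able */

        for (p = buf; p < buf + len; ) {
                struct inotify_event *ev = (struct inotify_event *)p;

                printf("wd=%d mask=0x%x name=%s\n", ev->wd,
                       (unsigned)ev->mask, ev->len ? ev->name : "");
                p += sizeof(*ev) + ev->len;
        }
        inotify_rm_watch(fd, wd);
        close(fd);
        return 0;
}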
#include <linux/fsnotify.h>
#include <linux/security.h>
#include <linux/export.h>
#include <linux/syscalls.h>
#include <linux/pagemap.h>
#include <linux/splice.h>
#include <linux/compat.h>
#include <linux/mount.h>
#include <linux/fs.h>
#include "internal.h"

#include <linux/uaccess.h>
#include <asm/unistd.h>

const struct file_operations generic_ro_fops = {
        .llseek         = generic_file_llseek,
        .read_iter      = generic_file_read_iter,
        .mmap           = generic_file_readonly_mmap,
        .splice_read    = generic_file_splice_read,
};

EXPORT_SYMBOL(generic_ro_fops);

static inline bool unsigned_offsets(struct file *file)
{
        return file->f_mode & FMODE_UNSIGNED_OFFSET;
}

/**
 * vfs_setpos - update the file offset for lseek
 * @file: file structure in question
 * @offset: file offset to seek to
 * @maxsize: maximum file size
 *
 * This is a low-level filesystem helper for updating the file offset to
 * the value specified by @offset if the given offset is valid and it is
 * not equal to the current file offset.
 *
 * Return the specified offset on success and -EINVAL on invalid offset.
 */
loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize)
{
        if (offset < 0 && !unsigned_offsets(file))
                return -EINVAL;
        if (offset > maxsize)
                return -EINVAL;

        if (offset != file->f_pos) {
                file->f_pos = offset;
                file->f_version = 0;
        }
        return offset;
}
EXPORT_SYMBOL(vfs_setpos);

vfs: allow custom EOF in generic_file_llseek code
For ext3/4 htree directories, using the vfs llseek function with
SEEK_END goes to i_size like for any other file, but in reality
we want the maximum possible hash value. Recent changes
in ext4 have cut & pasted generic_file_llseek() back into fs/ext4/dir.c,
but replicating this core code seems like a bad idea, especially
since the copy has already diverged from the vfs.
This patch updates generic_file_llseek_size to accept
both a custom maximum offset, and a custom EOF position. With this
in place, ext4_dir_llseek can pass in the appropriate maximum hash
position for both maxsize and eof, and get what it wants.
As far as I know, this does not fix any bugs - nfs in the kernel
doesn't use SEEK_END, and I don't know of any user who does. But
some ext4 folks seem keen on doing the right thing here, and I can't
really argue.
(Patch also fixes up some comments slightly)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
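
A sketch of the interface this change describes (hypothetical filesystem and
helper names, not part of this file): a directory llseek that seeks within
hash space can pass its maximum hash offset as both limits:

static loff_t exfs_dir_llseek(struct file *file, loff_t offset, int whence)
{
        loff_t htree_max = exfs_dir_max_hash(file);     /* hypothetical helper */

        /* custom maxsize and custom SEEK_END anchor, as described above */
        return generic_file_llseek_size(file, offset, whence,
                                        htree_max, htree_max);
}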

/**
 * generic_file_llseek_size - generic llseek implementation for regular files
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @whence: type of seek
 * @size: max size of this file in file system
 * @eof: offset used for SEEK_END position
 *
 * This is a variant of generic_file_llseek that allows passing in a custom
 * maximum file size and a custom EOF position, for e.g. hashed directories
 *
 * Synchronization:
 * SEEK_SET and SEEK_END are unsynchronized (but atomic on 64bit platforms)
 * SEEK_CUR is synchronized against other SEEK_CURs, but not read/writes.
 * read/writes behave like SEEK_SET against seeks.
 */
loff_t
generic_file_llseek_size(struct file *file, loff_t offset, int whence,
                loff_t maxsize, loff_t eof)
{
        switch (whence) {
        case SEEK_END:
                offset += eof;
                break;
        case SEEK_CUR:
                /*
                 * Here we special-case the lseek(fd, 0, SEEK_CUR)
                 * position-querying operation.  Avoid rewriting the "same"
                 * f_pos value back to the file because a concurrent read(),
                 * write() or lseek() might have altered it
                 */
                if (offset == 0)
                        return file->f_pos;
                /*
                 * f_lock protects against read/modify/write race with other
                 * SEEK_CURs. Note that parallel writes and reads behave
                 * like SEEK_SET.
                 */
                spin_lock(&file->f_lock);
                offset = vfs_setpos(file, file->f_pos + offset, maxsize);
                spin_unlock(&file->f_lock);
                return offset;
        case SEEK_DATA:
                /*
                 * In the generic case the entire file is data, so as long as
                 * offset isn't at the end of the file then the offset is data.
                 */
                if ((unsigned long long)offset >= eof)
                        return -ENXIO;
                break;
        case SEEK_HOLE:
                /*
                 * There is a virtual hole at the end of the file, so as long as
                 * offset isn't i_size or larger, return i_size.
                 */
                if ((unsigned long long)offset >= eof)
                        return -ENXIO;
                offset = eof;
                break;
        }

        return vfs_setpos(file, offset, maxsize);
}
EXPORT_SYMBOL(generic_file_llseek_size);

/**
 * generic_file_llseek - generic llseek implementation for regular files
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @whence: type of seek
 *
 * This is a generic implemenation of ->llseek useable for all normal local
 * filesystems. It just updates the file offset to the value specified by
 * @offset and @whence.
 */
loff_t generic_file_llseek(struct file *file, loff_t offset, int whence)
{
        struct inode *inode = file->f_mapping->host;

        return generic_file_llseek_size(file, offset, whence,
                                        inode->i_sb->s_maxbytes,
                                        i_size_read(inode));
}
EXPORT_SYMBOL(generic_file_llseek);

/**
 * fixed_size_llseek - llseek implementation for fixed-sized devices
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @whence: type of seek
 * @size: size of the file
 *
 */
loff_t fixed_size_llseek(struct file *file, loff_t offset, int whence, loff_t size)
{
        switch (whence) {
        case SEEK_SET: case SEEK_CUR: case SEEK_END:
                return generic_file_llseek_size(file, offset, whence,
                                                size, size);
        default:
                return -EINVAL;
        }
}
EXPORT_SYMBOL(fixed_size_llseek);

/**
 * no_seek_end_llseek - llseek implementation for fixed-sized devices
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @whence: type of seek
 *
 */
loff_t no_seek_end_llseek(struct file *file, loff_t offset, int whence)
{
        switch (whence) {
        case SEEK_SET: case SEEK_CUR:
                return generic_file_llseek_size(file, offset, whence,
                                                OFFSET_MAX, 0);
        default:
                return -EINVAL;
        }
}
EXPORT_SYMBOL(no_seek_end_llseek);

/**
 * no_seek_end_llseek_size - llseek implementation for fixed-sized devices
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @whence: type of seek
 * @size: maximal offset allowed
 *
 */
loff_t no_seek_end_llseek_size(struct file *file, loff_t offset, int whence, loff_t size)
{
        switch (whence) {
        case SEEK_SET: case SEEK_CUR:
                return generic_file_llseek_size(file, offset, whence,
                                                size, 0);
        default:
                return -EINVAL;
        }
}
EXPORT_SYMBOL(no_seek_end_llseek_size);

/**
 * noop_llseek - No Operation Performed llseek implementation
 * @file: file structure to seek on
 * @offset: file offset to seek to
 * @whence: type of seek
 *
 * This is an implementation of ->llseek useable for the rare special case when
 * userspace expects the seek to succeed but the (device) file is actually not
 * able to perform the seek. In this case you use noop_llseek() instead of
 * falling back to the default implementation of ->llseek.
 */
loff_t noop_llseek(struct file *file, loff_t offset, int whence)
{
        return file->f_pos;
}
EXPORT_SYMBOL(noop_llseek);
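
/*
 * Illustrative sketch, not part of this file: a device driver whose users
 * expect lseek() to "succeed" even though seeking has no effect could wire
 * the helper above into its file_operations (hypothetical names):
 *
 *      static const struct file_operations example_dev_fops = {
 *              .owner  = THIS_MODULE,
 *              .read   = example_dev_read,
 *              .llseek = noop_llseek,
 *      };
 */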

loff_t no_llseek(struct file *file, loff_t offset, int whence)
{
        return -ESPIPE;
}
EXPORT_SYMBOL(no_llseek);

loff_t default_llseek(struct file *file, loff_t offset, int whence)
{
        struct inode *inode = file_inode(file);
        loff_t retval;

        inode_lock(inode);
        switch (whence) {
        case SEEK_END:
                offset += i_size_read(inode);
                break;
        case SEEK_CUR:
                if (offset == 0) {
                        retval = file->f_pos;
                        goto out;
                }
                offset += file->f_pos;
                break;
        case SEEK_DATA:
                /*
                 * In the generic case the entire file is data, so as
                 * long as offset isn't at the end of the file then the
                 * offset is data.
                 */
                if (offset >= inode->i_size) {
                        retval = -ENXIO;
                        goto out;
                }
                break;
        case SEEK_HOLE:
                /*
                 * There is a virtual hole at the end of the file, so
                 * as long as offset isn't i_size or larger, return
                 * i_size.
                 */
                if (offset >= inode->i_size) {
                        retval = -ENXIO;
                        goto out;
                }
                offset = inode->i_size;
                break;
        }
        retval = -EINVAL;
        if (offset >= 0 || unsigned_offsets(file)) {
                if (offset != file->f_pos) {
                        file->f_pos = offset;
                        file->f_version = 0;
                }
                retval = offset;
        }
out:
        inode_unlock(inode);
        return retval;
}
EXPORT_SYMBOL(default_llseek);

loff_t vfs_llseek(struct file *file, loff_t offset, int whence)
{
        loff_t (*fn)(struct file *, loff_t, int);

        fn = no_llseek;
        if (file->f_mode & FMODE_LSEEK) {
                if (file->f_op->llseek)
                        fn = file->f_op->llseek;
        }
        return fn(file, offset, whence);
}
EXPORT_SYMBOL(vfs_llseek);

off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
{
        off_t retval;
        struct fd f = fdget_pos(fd);
        if (!f.file)
                return -EBADF;

        retval = -EINVAL;
        if (whence <= SEEK_MAX) {
                loff_t res = vfs_llseek(f.file, offset, whence);
                retval = res;
                if (res != (loff_t)retval)
                        retval = -EOVERFLOW;    /* LFS: should only happen on 32 bit platforms */
        }
        fdput_pos(f);
        return retval;
}

SYSCALL_DEFINE3(lseek, unsigned int, fd, off_t, offset, unsigned int, whence)
{
        return ksys_lseek(fd, offset, whence);
}

#ifdef CONFIG_COMPAT
COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned int, whence)
{
        return ksys_lseek(fd, offset, whence);
}
#endif

#ifdef __ARCH_WANT_SYS_LLSEEK
SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
                unsigned long, offset_low, loff_t __user *, result,
                unsigned int, whence)
{
        int retval;
        struct fd f = fdget_pos(fd);
        loff_t offset;

        if (!f.file)
                return -EBADF;

        retval = -EINVAL;
        if (whence > SEEK_MAX)
                goto out_putf;

        offset = vfs_llseek(f.file, ((loff_t) offset_high << 32) | offset_low,
                        whence);

        retval = (int)offset;
        if (offset >= 0) {
                retval = -EFAULT;
                if (!copy_to_user(result, &offset, sizeof(offset)))
                        retval = 0;
        }
out_putf:
        fdput_pos(f);
        return retval;
}
#endif

int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t count)
{
        struct inode *inode;
        loff_t pos;
        int retval = -EINVAL;

        inode = file_inode(file);
        if (unlikely((ssize_t) count < 0))
                return retval;
        pos = *ppos;
        if (unlikely(pos < 0)) {
                if (!unsigned_offsets(file))
                        return retval;
                if (count >= -pos) /* both values are in 0..LLONG_MAX */
                        return -EOVERFLOW;
        } else if (unlikely((loff_t) (pos + count) < 0)) {
                if (!unsigned_offsets(file))
                        return retval;
        }

        if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
                retval = locks_mandatory_area(inode, file, pos, pos + count - 1,
                                read_write == READ ? F_RDLCK : F_WRLCK);
                if (retval < 0)
                        return retval;
        }
        return security_file_permission(file,
                                read_write == READ ? MAY_READ : MAY_WRITE);
}

static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
{
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct kiocb kiocb;
        struct iov_iter iter;
        ssize_t ret;

        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
        iov_iter_init(&iter, READ, &iov, 1, len);

        ret = call_read_iter(filp, &kiocb, &iter);
        BUG_ON(ret == -EIOCBQUEUED);
        *ppos = kiocb.ki_pos;
        return ret;
}

ssize_t __vfs_read(struct file *file, char __user *buf, size_t count,
                   loff_t *pos)
{
        if (file->f_op->read)
                return file->f_op->read(file, buf, count, pos);
        else if (file->f_op->read_iter)
                return new_sync_read(file, buf, count, pos);
        else
                return -EINVAL;
}

ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
        mm_segment_t old_fs;
        ssize_t result;

        old_fs = get_fs();
        set_fs(get_ds());
        /* The cast to a user pointer is valid due to the set_fs() */
        result = vfs_read(file, (void __user *)buf, count, pos);
        set_fs(old_fs);
        return result;
}
EXPORT_SYMBOL(kernel_read);
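
/*
 * Illustrative sketch, not part of this file: reading the beginning of a
 * file from kernel context with the helper above (error handling trimmed):
 *
 *      struct file *f = filp_open("/etc/hostname", O_RDONLY, 0);
 *      char buf[64];
 *      loff_t pos = 0;
 *
 *      if (!IS_ERR(f)) {
 *              ssize_t n = kernel_read(f, buf, sizeof(buf), &pos);
 *              filp_close(f, NULL);
 *      }
 */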

ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
        ssize_t ret;

        if (!(file->f_mode & FMODE_READ))
                return -EBADF;
        if (!(file->f_mode & FMODE_CAN_READ))
                return -EINVAL;
        if (unlikely(!access_ok(VERIFY_WRITE, buf, count)))
                return -EFAULT;

        ret = rw_verify_area(READ, file, pos, count);
        if (!ret) {
                if (count > MAX_RW_COUNT)
                        count = MAX_RW_COUNT;
                ret = __vfs_read(file, buf, count, pos);
                if (ret > 0) {
                        fsnotify_access(file);
                        add_rchar(current, ret);
                }
                inc_syscr(current);
        }

        return ret;
}
EXPORT_SYMBOL_GPL(vfs_read);

static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
{
        struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
        struct kiocb kiocb;
        struct iov_iter iter;
        ssize_t ret;

        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
        iov_iter_init(&iter, WRITE, &iov, 1, len);

        ret = call_write_iter(filp, &kiocb, &iter);
        BUG_ON(ret == -EIOCBQUEUED);
        if (ret > 0)
                *ppos = kiocb.ki_pos;
        return ret;
}

ssize_t __vfs_write(struct file *file, const char __user *p, size_t count,
                    loff_t *pos)
{
        if (file->f_op->write)
                return file->f_op->write(file, p, count, pos);
        else if (file->f_op->write_iter)
                return new_sync_write(file, p, count, pos);
        else
                return -EINVAL;
}

ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
{
        mm_segment_t old_fs;
        const char __user *p;
        ssize_t ret;

        if (!(file->f_mode & FMODE_CAN_WRITE))
                return -EINVAL;

        old_fs = get_fs();
        set_fs(get_ds());
        p = (__force const char __user *)buf;
        if (count > MAX_RW_COUNT)
                count = MAX_RW_COUNT;
        ret = __vfs_write(file, p, count, pos);
        set_fs(old_fs);
        if (ret > 0) {
                fsnotify_modify(file);
                add_wchar(current, ret);
        }
        inc_syscw(current);
        return ret;
}
EXPORT_SYMBOL(__kernel_write);

ssize_t kernel_write(struct file *file, const void *buf, size_t count,
                            loff_t *pos)
{
        mm_segment_t old_fs;
        ssize_t res;

        old_fs = get_fs();
        set_fs(get_ds());
        /* The cast to a user pointer is valid due to the set_fs() */
        res = vfs_write(file, (__force const char __user *)buf, count, pos);
        set_fs(old_fs);

        return res;
}
EXPORT_SYMBOL(kernel_write);

ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
        ssize_t ret;

        if (!(file->f_mode & FMODE_WRITE))
                return -EBADF;
        if (!(file->f_mode & FMODE_CAN_WRITE))
                return -EINVAL;
        if (unlikely(!access_ok(VERIFY_READ, buf, count)))
                return -EFAULT;

        ret = rw_verify_area(WRITE, file, pos, count);
        if (!ret) {
                if (count > MAX_RW_COUNT)
                        count = MAX_RW_COUNT;
                file_start_write(file);
                ret = __vfs_write(file, buf, count, pos);
                if (ret > 0) {
                        fsnotify_modify(file);
                        add_wchar(current, ret);
                }
                inc_syscw(current);
                file_end_write(file);
        }

        return ret;
}
EXPORT_SYMBOL_GPL(vfs_write);

fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
[ Upstream commit 10dce8af34226d90fa56746a934f8da5dcdba3df ]
Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.
This caused a regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.
The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.
However, even though the 2006 version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655e3 -
is doing so regardless of whether a file is seekable or not.
See
https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
https://lwn.net/Articles/180387
https://lwn.net/Articles/180396
for historic context.
The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:
kernel/power/user.c snapshot_read
fs/debugfs/file.c u32_array_read
fs/fuse/control.c fuse_conn_waiting_read + ...
drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
arch/s390/hypfs/inode.c hypfs_read_iter
...
Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.
Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):
drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.
FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:
See
https://github.com/libfuse/osspd
https://lwn.net/Articles/308445
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216
I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:
https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
Let's fix this regression. The plan is:
1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
doing so would break many in-kernel nonseekable_open users which
actually use ppos in read/write handlers.
2. Add stream_open() to kernel to open stream-like non-seekable file
descriptors. Read and write on such file descriptors would never use
nor change ppos. And with that property on stream-like files read and
write will be running without taking f_pos lock - i.e. read and write
could be running simultaneously.
3. With semantic patch search and convert to stream_open all in-kernel
nonseekable_open users for which read and write actually do not
depend on ppos and where there is no other methods in file_operations
which assume @offset access.
4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
stream_open if that bit is present in the filesystem open reply.
It was tempting to change fs/fuse/ open handler to use stream_open
instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and
write handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
from v3.14+ (the kernel where 9c225f2655 first appeared).
This will allow patching OSSPD and other FUSE filesystems that
provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
in their open handler and this way avoid the deadlock on all kernel
versions. This should work because fs/fuse/ ignores unknown open
flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel
that is not aware of FOPEN_STREAM will be < v3.14 where just
FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
write deadlock.
This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.
Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.
The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
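
A minimal sketch of the conversion this change enables (hypothetical driver
name, not part of this file): a purely stream-like device opts out of f_pos
locking by calling stream_open() from its open handler instead of
nonseekable_open():

static int example_stream_dev_open(struct inode *inode, struct file *filp)
{
        /* reads and writes will neither use nor update filp->f_pos */
        return stream_open(inode, filp);
}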

static inline loff_t file_pos_read(struct file *file)
{
        return file->f_mode & FMODE_STREAM ? 0 : file->f_pos;
}

static inline void file_pos_write(struct file *file, loff_t pos)
{
fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
[ Upstream commit 10dce8af34226d90fa56746a934f8da5dcdba3df ]
Commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.
This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.
The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.
However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655e3 -
is doing so irregardless of whether a file is seekable or not.
See
https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
https://lwn.net/Articles/180387
https://lwn.net/Articles/180396
for historic context.
The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:
kernel/power/user.c snapshot_read
fs/debugfs/file.c u32_array_read
fs/fuse/control.c fuse_conn_waiting_read + ...
drivers/hwmon/asus_atk0110.c atk_debugfs_ggrp_read
arch/s390/hypfs/inode.c hypfs_read_iter
...
Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.
Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):
drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()
In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.
FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f715 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:
See
https://github.com/libfuse/osspd
https://lwn.net/Articles/308445
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510
Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216
I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:
https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169
Let's fix this regression. The plan is:
1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
doing so would break many in-kernel nonseekable_open users which
actually use ppos in read/write handlers.
2. Add stream_open() to kernel to open stream-like non-seekable file
descriptors. Read and write on such file descriptors would never use
nor change ppos. And with that property on stream-like files read and
write will be running without taking f_pos lock - i.e. read and write
could be running simultaneously.
3. With semantic patch search and convert to stream_open all in-kernel
nonseekable_open users for which read and write actually do not
depend on ppos and where there is no other methods in file_operations
which assume @offset access.
4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
steam_open if that bit is present in filesystem open reply.
It was tempting to change fs/fuse/ open handler to use stream_open
instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and
write handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
from v3.14+ (the kernel where 9c225f2655 first appeared).
This will allow to patch OSSPD and other FUSE filesystems that
provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
in their open handler and this way avoid the deadlock on all kernel
versions. This should work because fs/fuse/ ignores unknown open
flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel
that is not aware of FOPEN_STREAM will be < v3.14 where just
FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
write deadlock.
This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.
Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.
The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-26 16:20:43 -06:00
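As an illustration of the intended in-kernel usage (a minimal sketch, not
part of the patch itself; the mydev_* names are invented), a driver whose
read and write handlers never touch *ppos can opt in to stream semantics
by calling stream_open() from its open handler:

    /* Hypothetical example - mydev_* are illustrative names only. */
    #include <linux/fs.h>
    #include <linux/module.h>
    #include <linux/uaccess.h>

    static int mydev_open(struct inode *inode, struct file *filp)
    {
            /* Mark the file as stream-like: ppos is never used. */
            return stream_open(inode, filp);
    }

    static ssize_t mydev_read(struct file *filp, char __user *buf,
                              size_t count, loff_t *ppos)
    {
            /* *ppos is neither read nor updated on a stream-like file. */
            return 0;       /* nothing to read in this sketch */
    }

    static ssize_t mydev_write(struct file *filp, const char __user *buf,
                               size_t count, loff_t *ppos)
    {
            return count;   /* pretend everything was consumed */
    }

    static const struct file_operations mydev_fops = {
            .owner  = THIS_MODULE,
            .open   = mydev_open,
            .read   = mydev_read,
            .write  = mydev_write,
            /* no .llseek - a stream has no position to seek on */
    };

With FMODE_STREAM set this way the VFS skips the f_pos locking and
position updates for the file, so a blocked read no longer prevents a
concurrent write on the same file description.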
|
|
|
if ((file->f_mode & FMODE_STREAM) == 0)
|
|
|
|
file->f_pos = pos;
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
2018-03-13 14:56:26 -06:00
|
|
|
ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2014-03-03 10:36:58 -07:00
|
|
|
struct fd f = fdget_pos(fd);
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file) {
|
|
|
|
loff_t pos = file_pos_read(f.file);
|
|
|
|
ret = vfs_read(f.file, buf, count, &pos);
|
2013-06-14 19:49:36 -06:00
|
|
|
if (ret >= 0)
|
|
|
|
file_pos_write(f.file, pos);
|
2014-03-03 10:36:58 -07:00
|
|
|
fdput_pos(f);
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-03-13 14:56:26 -06:00
|
|
|
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
|
|
|
|
{
|
|
|
|
return ksys_read(fd, buf, count);
|
|
|
|
}
|
|
|
|
|
2018-03-11 04:34:41 -06:00
|
|
|
ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2014-03-03 10:36:58 -07:00
|
|
|
struct fd f = fdget_pos(fd);
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file) {
|
|
|
|
loff_t pos = file_pos_read(f.file);
|
|
|
|
ret = vfs_write(f.file, buf, count, &pos);
|
2013-06-14 19:49:36 -06:00
|
|
|
if (ret >= 0)
|
|
|
|
file_pos_write(f.file, pos);
|
2014-03-03 10:36:58 -07:00
|
|
|
fdput_pos(f);
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-03-11 04:34:41 -06:00
|
|
|
SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
|
|
|
|
size_t, count)
|
|
|
|
{
|
|
|
|
return ksys_write(fd, buf, count);
|
|
|
|
}
|
|
|
|
|
2018-03-19 10:38:31 -06:00
|
|
|
ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count,
|
|
|
|
loff_t pos)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2012-08-28 10:52:22 -06:00
|
|
|
struct fd f;
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
|
|
|
if (pos < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
f = fdget(fd);
|
|
|
|
if (f.file) {
|
2005-04-16 16:20:36 -06:00
|
|
|
ret = -ESPIPE;
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file->f_mode & FMODE_PREAD)
|
|
|
|
ret = vfs_read(f.file, buf, count, &pos);
|
|
|
|
fdput(f);
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-03-19 10:38:31 -06:00
|
|
|
SYSCALL_DEFINE4(pread64, unsigned int, fd, char __user *, buf,
|
|
|
|
size_t, count, loff_t, pos)
|
|
|
|
{
|
|
|
|
return ksys_pread64(fd, buf, count, pos);
|
|
|
|
}
|
|
|
|
|
|
|
|
ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf,
|
|
|
|
size_t count, loff_t pos)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2012-08-28 10:52:22 -06:00
|
|
|
struct fd f;
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
|
|
|
if (pos < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
f = fdget(fd);
|
|
|
|
if (f.file) {
|
2005-04-16 16:20:36 -06:00
|
|
|
ret = -ESPIPE;
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file->f_mode & FMODE_PWRITE)
|
|
|
|
ret = vfs_write(f.file, buf, count, &pos);
|
|
|
|
fdput(f);
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-03-19 10:38:31 -06:00
|
|
|
SYSCALL_DEFINE4(pwrite64, unsigned int, fd, const char __user *, buf,
|
|
|
|
size_t, count, loff_t, pos)
|
|
|
|
{
|
|
|
|
return ksys_pwrite64(fd, buf, count, pos);
|
|
|
|
}
|
|
|
|
|
2015-03-20 18:10:21 -06:00
|
|
|
static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
|
2017-07-06 10:58:37 -06:00
|
|
|
loff_t *ppos, int type, rwf_t flags)
|
2014-02-11 16:37:41 -07:00
|
|
|
{
|
|
|
|
struct kiocb kiocb;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
init_sync_kiocb(&kiocb, filp);
|
2017-06-20 06:05:40 -06:00
|
|
|
ret = kiocb_set_rw_flags(&kiocb, flags);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2014-02-11 16:37:41 -07:00
|
|
|
kiocb.ki_pos = *ppos;
|
|
|
|
|
2017-02-20 08:51:23 -07:00
|
|
|
if (type == READ)
|
2017-02-20 08:51:23 -07:00
|
|
|
ret = call_read_iter(filp, &kiocb, iter);
|
2017-02-20 08:51:23 -07:00
|
|
|
else
|
2017-02-20 08:51:23 -07:00
|
|
|
ret = call_write_iter(filp, &kiocb, iter);
|
2015-02-11 11:59:44 -07:00
|
|
|
BUG_ON(ret == -EIOCBQUEUED);
|
2014-02-11 16:37:41 -07:00
|
|
|
*ppos = kiocb.ki_pos;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2006-10-01 00:28:47 -06:00
|
|
|
/* Do it by hand, with file-ops */
|
2015-03-20 18:10:21 -06:00
|
|
|
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
|
2017-07-06 10:58:37 -06:00
|
|
|
loff_t *ppos, int type, rwf_t flags)
|
2006-10-01 00:28:47 -06:00
|
|
|
{
|
|
|
|
ssize_t ret = 0;
|
|
|
|
|
2016-03-03 08:04:01 -07:00
|
|
|
if (flags & ~RWF_HIPRI)
|
2016-03-03 08:03:58 -07:00
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
2015-03-20 18:10:21 -06:00
|
|
|
while (iov_iter_count(iter)) {
|
|
|
|
struct iovec iovec = iov_iter_iovec(iter);
|
2006-10-01 00:28:47 -06:00
|
|
|
ssize_t nr;
|
|
|
|
|
2017-02-20 08:51:23 -07:00
|
|
|
if (type == READ) {
|
|
|
|
nr = filp->f_op->read(filp, iovec.iov_base,
|
|
|
|
iovec.iov_len, ppos);
|
|
|
|
} else {
|
|
|
|
nr = filp->f_op->write(filp, iovec.iov_base,
|
|
|
|
iovec.iov_len, ppos);
|
|
|
|
}
|
2006-10-01 00:28:47 -06:00
|
|
|
|
|
|
|
if (nr < 0) {
|
|
|
|
if (!ret)
|
|
|
|
ret = nr;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
ret += nr;
|
2015-03-20 18:10:21 -06:00
|
|
|
if (nr != iovec.iov_len)
|
2006-10-01 00:28:47 -06:00
|
|
|
break;
|
2015-03-20 18:10:21 -06:00
|
|
|
iov_iter_advance(iter, nr);
|
2006-10-01 00:28:47 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2005-04-16 16:20:36 -06:00
|
|
|
/* A write operation does a read from user space and vice versa */
|
|
|
|
#define vrfy_dir(type) ((type) == READ ? VERIFY_WRITE : VERIFY_READ)
|
|
|
|
|
2016-10-08 03:18:07 -06:00
|
|
|
/**
|
|
|
|
* rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
|
|
|
|
* into the kernel and check that it is valid.
|
|
|
|
*
|
|
|
|
* @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
|
|
|
|
* @uvector: Pointer to the userspace array.
|
|
|
|
* @nr_segs: Number of elements in userspace array.
|
|
|
|
* @fast_segs: Number of elements in @fast_pointer.
|
|
|
|
* @fast_pointer: Pointer to (usually small on-stack) kernel array.
|
|
|
|
* @ret_pointer: (output parameter) Pointer to a variable that will point to
|
|
|
|
* either @fast_pointer, a newly allocated kernel array, or NULL,
|
|
|
|
* depending on which array was used.
|
|
|
|
*
|
|
|
|
* This function copies an array of &struct iovec of @nr_segs from
|
|
|
|
* userspace into the kernel and checks that each element is valid (e.g.
|
|
|
|
* it does not point to a kernel address or cause overflow by being too
|
|
|
|
* large, etc.).
|
|
|
|
*
|
|
|
|
* As an optimization, the caller may provide a pointer to a small
|
|
|
|
* on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
|
|
|
|
* (the size of this array, or 0 if unused, should be given in @fast_segs).
|
|
|
|
*
|
|
|
|
* @ret_pointer will always point to the array that was used, so the
|
|
|
|
* caller must take care not to call kfree() on it e.g. in case the
|
|
|
|
* @fast_pointer array was used and it was allocated on the stack.
|
|
|
|
*
|
|
|
|
* Return: The total number of bytes covered by the iovec array on success
|
|
|
|
* or a negative error code on error.
|
|
|
|
*/
|
2006-10-01 00:28:49 -06:00
|
|
|
ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
|
|
|
|
unsigned long nr_segs, unsigned long fast_segs,
|
|
|
|
struct iovec *fast_pointer,
|
2012-05-31 17:26:42 -06:00
|
|
|
struct iovec **ret_pointer)
|
2010-10-29 11:36:49 -06:00
|
|
|
{
|
2006-10-01 00:28:49 -06:00
|
|
|
unsigned long seg;
|
2010-10-29 11:36:49 -06:00
|
|
|
ssize_t ret;
|
2006-10-01 00:28:49 -06:00
|
|
|
struct iovec *iov = fast_pointer;
|
|
|
|
|
2010-10-29 11:36:49 -06:00
|
|
|
/*
|
|
|
|
* SuS says "The readv() function *may* fail if the iovcnt argument
|
|
|
|
* was less than or equal to 0, or greater than {IOV_MAX}. Linux has
|
|
|
|
* traditionally returned zero for zero segments, so...
|
|
|
|
*/
|
2006-10-01 00:28:49 -06:00
|
|
|
if (nr_segs == 0) {
|
|
|
|
ret = 0;
|
2010-10-29 11:36:49 -06:00
|
|
|
goto out;
|
2006-10-01 00:28:49 -06:00
|
|
|
}
|
|
|
|
|
2010-10-29 11:36:49 -06:00
|
|
|
/*
|
|
|
|
* First get the "struct iovec" from user memory and
|
|
|
|
* verify all the pointers
|
|
|
|
*/
|
2006-10-01 00:28:49 -06:00
|
|
|
if (nr_segs > UIO_MAXIOV) {
|
|
|
|
ret = -EINVAL;
|
2010-10-29 11:36:49 -06:00
|
|
|
goto out;
|
2006-10-01 00:28:49 -06:00
|
|
|
}
|
|
|
|
if (nr_segs > fast_segs) {
|
treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:
kmalloc(a * b, gfp)
with:
kmalloc_array(a * b, gfp)
as well as handling cases of:
kmalloc(a * b * c, gfp)
with:
kmalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kmalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kmalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 14:55:00 -06:00
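For readers unfamiliar with the transformation, the net effect of the rules
above on a typical allocation looks like this (an illustrative sketch only;
struct item and the alloc_items_* helpers are invented names):

    #include <linux/slab.h>

    struct item {
            int a;
            int b;
    };

    /* Before: open-coded multiplication, which can overflow unnoticed. */
    static struct item *alloc_items_old(unsigned long n)
    {
            return kmalloc(sizeof(struct item) * n, GFP_KERNEL);
    }

    /*
     * After: kmalloc_array() performs the same allocation but checks
     * n * sizeof(struct item) for overflow and fails the allocation
     * instead of silently returning a too-small buffer.
     */
    static struct item *alloc_items_new(unsigned long n)
    {
            return kmalloc_array(n, sizeof(struct item), GFP_KERNEL);
    }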
|
|
|
iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
|
2006-10-01 00:28:49 -06:00
|
|
|
if (iov == NULL) {
|
|
|
|
ret = -ENOMEM;
|
2010-10-29 11:36:49 -06:00
|
|
|
goto out;
|
2006-10-01 00:28:49 -06:00
|
|
|
}
|
2010-10-29 11:36:49 -06:00
|
|
|
}
|
2006-10-01 00:28:49 -06:00
|
|
|
if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
|
|
|
|
ret = -EFAULT;
|
2010-10-29 11:36:49 -06:00
|
|
|
goto out;
|
2006-10-01 00:28:49 -06:00
|
|
|
}
|
|
|
|
|
2010-10-29 11:36:49 -06:00
|
|
|
/*
|
2006-10-01 00:28:49 -06:00
|
|
|
* According to the Single Unix Specification we should return EINVAL
|
|
|
|
* if an element length is < 0 when cast to ssize_t or if the
|
|
|
|
* total length would overflow the ssize_t return value of the
|
|
|
|
* system call.
|
2010-10-29 11:36:49 -06:00
|
|
|
*
|
|
|
|
* Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
|
|
|
|
* overflow case.
|
|
|
|
*/
|
2006-10-01 00:28:49 -06:00
|
|
|
ret = 0;
|
2010-10-29 11:36:49 -06:00
|
|
|
for (seg = 0; seg < nr_segs; seg++) {
|
|
|
|
void __user *buf = iov[seg].iov_base;
|
|
|
|
ssize_t len = (ssize_t)iov[seg].iov_len;
|
2006-10-01 00:28:49 -06:00
|
|
|
|
|
|
|
/* see if we we're about to use an invalid len or if
|
|
|
|
* it's about to overflow ssize_t */
|
2010-10-29 11:36:49 -06:00
|
|
|
if (len < 0) {
|
2006-10-01 00:28:49 -06:00
|
|
|
ret = -EINVAL;
|
2010-10-29 11:36:49 -06:00
|
|
|
goto out;
|
2006-10-01 00:28:49 -06:00
|
|
|
}
|
2012-05-31 17:26:42 -06:00
|
|
|
if (type >= 0
|
2011-10-31 18:06:39 -06:00
|
|
|
&& unlikely(!access_ok(vrfy_dir(type), buf, len))) {
|
2006-10-01 00:28:49 -06:00
|
|
|
ret = -EFAULT;
|
2010-10-29 11:36:49 -06:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
if (len > MAX_RW_COUNT - ret) {
|
|
|
|
len = MAX_RW_COUNT - ret;
|
|
|
|
iov[seg].iov_len = len;
|
2006-10-01 00:28:49 -06:00
|
|
|
}
|
|
|
|
ret += len;
|
2010-10-29 11:36:49 -06:00
|
|
|
}
|
2006-10-01 00:28:49 -06:00
|
|
|
out:
|
|
|
|
*ret_pointer = iov;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-04-08 16:18:48 -06:00
|
|
|
#ifdef CONFIG_COMPAT
|
|
|
|
ssize_t compat_rw_copy_check_uvector(int type,
|
|
|
|
const struct compat_iovec __user *uvector, unsigned long nr_segs,
|
|
|
|
unsigned long fast_segs, struct iovec *fast_pointer,
|
|
|
|
struct iovec **ret_pointer)
|
|
|
|
{
|
|
|
|
compat_ssize_t tot_len;
|
|
|
|
struct iovec *iov = *ret_pointer = fast_pointer;
|
|
|
|
ssize_t ret = 0;
|
|
|
|
int seg;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* SuS says "The readv() function *may* fail if the iovcnt argument
|
|
|
|
* was less than or equal to 0, or greater than {IOV_MAX}. Linux has
|
|
|
|
* traditionally returned zero for zero segments, so...
|
|
|
|
*/
|
|
|
|
if (nr_segs == 0)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
ret = -EINVAL;
|
|
|
|
if (nr_segs > UIO_MAXIOV)
|
|
|
|
goto out;
|
|
|
|
if (nr_segs > fast_segs) {
|
|
|
|
ret = -ENOMEM;
|
2018-06-12 14:55:00 -06:00
|
|
|
iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
|
2017-04-08 16:18:48 -06:00
|
|
|
if (iov == NULL)
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
*ret_pointer = iov;
|
|
|
|
|
|
|
|
ret = -EFAULT;
|
|
|
|
if (!access_ok(VERIFY_READ, uvector, nr_segs*sizeof(*uvector)))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Single unix specification:
|
|
|
|
* We should -EINVAL if an element length is not >= 0 and fitting an
|
|
|
|
* ssize_t.
|
|
|
|
*
|
|
|
|
* In Linux, the total length is limited to MAX_RW_COUNT, there is
|
|
|
|
* no overflow possibility.
|
|
|
|
*/
|
|
|
|
tot_len = 0;
|
|
|
|
ret = -EINVAL;
|
|
|
|
for (seg = 0; seg < nr_segs; seg++) {
|
|
|
|
compat_uptr_t buf;
|
|
|
|
compat_ssize_t len;
|
|
|
|
|
|
|
|
if (__get_user(len, &uvector->iov_len) ||
|
|
|
|
__get_user(buf, &uvector->iov_base)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
if (len < 0) /* size_t not fitting in compat_ssize_t .. */
|
|
|
|
goto out;
|
|
|
|
if (type >= 0 &&
|
|
|
|
!access_ok(vrfy_dir(type), compat_ptr(buf), len)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
if (len > MAX_RW_COUNT - tot_len)
|
|
|
|
len = MAX_RW_COUNT - tot_len;
|
|
|
|
tot_len += len;
|
|
|
|
iov->iov_base = compat_ptr(buf);
|
|
|
|
iov->iov_len = (compat_size_t) len;
|
|
|
|
uvector++;
|
|
|
|
iov++;
|
|
|
|
}
|
|
|
|
ret = tot_len;
|
|
|
|
|
|
|
|
out:
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2017-05-27 02:16:48 -06:00
|
|
|
static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
|
2017-07-06 10:58:37 -06:00
|
|
|
loff_t *pos, rwf_t flags)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
|
|
|
size_t tot_len;
|
2017-02-20 08:51:23 -07:00
|
|
|
ssize_t ret = 0;
|
2005-04-16 16:20:36 -06:00
|
|
|
|
2017-05-27 02:16:49 -06:00
|
|
|
if (!(file->f_mode & FMODE_READ))
|
|
|
|
return -EBADF;
|
|
|
|
if (!(file->f_mode & FMODE_CAN_READ))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2017-02-20 08:51:23 -07:00
|
|
|
tot_len = iov_iter_count(iter);
|
2015-03-21 17:40:11 -06:00
|
|
|
if (!tot_len)
|
|
|
|
goto out;
|
2017-05-27 02:16:48 -06:00
|
|
|
ret = rw_verify_area(READ, file, pos, tot_len);
|
2006-01-04 17:20:40 -07:00
|
|
|
if (ret < 0)
|
2017-05-27 02:16:48 -06:00
|
|
|
return ret;
|
2005-04-16 16:20:36 -06:00
|
|
|
|
2017-05-27 02:16:48 -06:00
|
|
|
if (file->f_op->read_iter)
|
|
|
|
ret = do_iter_readv_writev(file, iter, pos, READ, flags);
|
2006-10-01 00:28:47 -06:00
|
|
|
else
|
2017-05-27 02:16:48 -06:00
|
|
|
ret = do_loop_readv_writev(file, iter, pos, READ, flags);
|
2005-04-16 16:20:36 -06:00
|
|
|
out:
|
2017-05-27 02:16:48 -06:00
|
|
|
if (ret >= 0)
|
|
|
|
fsnotify_access(file);
|
2005-04-16 16:20:36 -06:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-05-27 02:16:51 -06:00
|
|
|
ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
|
2017-07-06 10:58:37 -06:00
|
|
|
rwf_t flags)
|
2017-02-20 08:51:23 -07:00
|
|
|
{
|
2017-05-27 02:16:51 -06:00
|
|
|
if (!file->f_op->read_iter)
|
|
|
|
return -EINVAL;
|
|
|
|
return do_iter_read(file, iter, ppos, flags);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(vfs_iter_read);
|
2017-02-20 08:51:23 -07:00
|
|
|
|
2017-05-27 02:16:48 -06:00
|
|
|
static ssize_t do_iter_write(struct file *file, struct iov_iter *iter,
|
2017-07-06 10:58:37 -06:00
|
|
|
loff_t *pos, rwf_t flags)
|
2017-05-27 02:16:48 -06:00
|
|
|
{
|
|
|
|
size_t tot_len;
|
|
|
|
ssize_t ret = 0;
|
2013-03-20 11:04:20 -06:00
|
|
|
|
2017-05-27 02:16:49 -06:00
|
|
|
if (!(file->f_mode & FMODE_WRITE))
|
|
|
|
return -EBADF;
|
|
|
|
if (!(file->f_mode & FMODE_CAN_WRITE))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2017-05-27 02:16:48 -06:00
|
|
|
tot_len = iov_iter_count(iter);
|
|
|
|
if (!tot_len)
|
|
|
|
return 0;
|
|
|
|
ret = rw_verify_area(WRITE, file, pos, tot_len);
|
2017-02-20 08:51:23 -07:00
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
2017-05-27 02:16:48 -06:00
|
|
|
if (file->f_op->write_iter)
|
|
|
|
ret = do_iter_readv_writev(file, iter, pos, WRITE, flags);
|
|
|
|
else
|
|
|
|
ret = do_loop_readv_writev(file, iter, pos, WRITE, flags);
|
|
|
|
if (ret > 0)
|
|
|
|
fsnotify_modify(file);
|
2017-02-20 08:51:23 -07:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2017-05-27 02:16:52 -06:00
|
|
|
ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
|
2017-07-06 10:58:37 -06:00
|
|
|
rwf_t flags)
|
2017-05-27 02:16:52 -06:00
|
|
|
{
|
|
|
|
if (!file->f_op->write_iter)
|
|
|
|
return -EINVAL;
|
|
|
|
return do_iter_write(file, iter, ppos, flags);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL(vfs_iter_write);
|
|
|
|
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t *pos, rwf_t flags)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2017-02-20 08:51:23 -07:00
|
|
|
struct iovec iovstack[UIO_FASTIOV];
|
|
|
|
struct iovec *iov = iovstack;
|
|
|
|
struct iov_iter iter;
|
|
|
|
ssize_t ret;
|
2005-04-16 16:20:36 -06:00
|
|
|
|
2017-05-27 02:16:46 -06:00
|
|
|
ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
|
2017-05-27 02:16:49 -06:00
|
|
|
if (ret >= 0) {
|
|
|
|
ret = do_iter_read(file, &iter, pos, flags);
|
|
|
|
kfree(iov);
|
|
|
|
}
|
2005-04-16 16:20:36 -06:00
|
|
|
|
2017-05-27 02:16:46 -06:00
|
|
|
return ret;
|
|
|
|
}
|
2005-04-16 16:20:36 -06:00
|
|
|
|
2017-09-01 09:39:25 -06:00
|
|
|
static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t *pos, rwf_t flags)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2017-05-27 02:16:46 -06:00
|
|
|
struct iovec iovstack[UIO_FASTIOV];
|
|
|
|
struct iovec *iov = iovstack;
|
|
|
|
struct iov_iter iter;
|
|
|
|
ssize_t ret;
|
2005-04-16 16:20:36 -06:00
|
|
|
|
2017-05-27 02:16:46 -06:00
|
|
|
ret = import_iovec(WRITE, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
|
2017-05-27 02:16:49 -06:00
|
|
|
if (ret >= 0) {
|
2017-07-06 07:15:47 -06:00
|
|
|
file_start_write(file);
|
2017-05-27 02:16:49 -06:00
|
|
|
ret = do_iter_write(file, &iter, pos, flags);
|
2017-07-06 07:15:47 -06:00
|
|
|
file_end_write(file);
|
2017-05-27 02:16:49 -06:00
|
|
|
kfree(iov);
|
|
|
|
}
|
2017-05-27 02:16:46 -06:00
|
|
|
return ret;
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, rwf_t flags)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2014-03-03 10:36:58 -07:00
|
|
|
struct fd f = fdget_pos(fd);
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file) {
|
|
|
|
loff_t pos = file_pos_read(f.file);
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = vfs_readv(f.file, vec, vlen, &pos, flags);
|
2013-06-14 19:49:36 -06:00
|
|
|
if (ret >= 0)
|
|
|
|
file_pos_write(f.file, pos);
|
2014-03-03 10:36:58 -07:00
|
|
|
fdput_pos(f);
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
if (ret > 0)
|
[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct
They are fat: 4x8 bytes in task_struct.
They are unconditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".
And please, please, please, read(2) knows about bytes, not characters,
why it is called "rchar"?
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-10 02:46:45 -07:00
|
|
|
add_rchar(current, ret);
|
|
|
|
inc_syscr(current);
|
2005-04-16 16:20:36 -06:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, rwf_t flags)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2014-03-03 10:36:58 -07:00
|
|
|
struct fd f = fdget_pos(fd);
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file) {
|
|
|
|
loff_t pos = file_pos_read(f.file);
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = vfs_writev(f.file, vec, vlen, &pos, flags);
|
2013-06-14 19:49:36 -06:00
|
|
|
if (ret >= 0)
|
|
|
|
file_pos_write(f.file, pos);
|
2014-03-03 10:36:58 -07:00
|
|
|
fdput_pos(f);
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
if (ret > 0)
|
2007-02-10 02:46:45 -07:00
|
|
|
add_wchar(current, ret);
|
|
|
|
inc_syscw(current);
|
2005-04-16 16:20:36 -06:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
Make non-compat preadv/pwritev use native register size
Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.
This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).
This also changes the order of 'high' and 'low' to be "low first". Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.
This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code. On x86-64, we now
generate
testq %rcx, %rcx # pos_l
js .L122 #,
movq %rcx, -48(%rbp) # pos_l, pos
from the C source
loff_t pos = pos_from_hilo(pos_h, pos_l);
...
if (pos < 0)
return -EINVAL;
and the 'pos_h' register isn't even touched. It used to generate code
like
mov %r8d, %r8d # pos_low, pos_low
salq $32, %rcx #, tmp71
movq %r8, %rax # pos_low, pos.386
orq %rcx, %rax # tmp71, pos.386
js .L122 #,
movq %rax, -48(%rbp) # pos.386, pos
which isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)
Note: in all cases the user code wrapper can again be the same. You can
just do
#define HALF_BITS (sizeof(unsigned long)*4)
__syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);
or something like that. That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.
And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit lloff_t, this particular system call might
be left alone. Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.
[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-03 09:03:22 -06:00
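A rough userspace sketch of the wrapper described above (assuming glibc's
syscall(2) and the SYS_preadv constant; my_preadv is an invented name):

    #define _GNU_SOURCE
    #include <sys/syscall.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #define HALF_BITS (sizeof(unsigned long) * 4)

    /*
     * Split the 64-bit offset into two natural-word-size halves: on a
     * 64-bit build pos_h is simply 0, on a 32-bit build it is the upper
     * 32 bits of the offset.
     */
    static ssize_t my_preadv(int fd, const struct iovec *iov, int iovcnt,
                             long long offset)
    {
            unsigned long pos_l = (unsigned long)offset;
            unsigned long pos_h = (unsigned long)
                    (((unsigned long long)offset >> HALF_BITS) >> HALF_BITS);

            return syscall(SYS_preadv, fd, iov, iovcnt, pos_l, pos_h);
    }

The kernel side reassembles the offset with pos_from_hilo(), shown in the
code below.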
|
|
|
static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
|
|
|
|
{
|
|
|
|
#define HALF_LONG_BITS (BITS_PER_LONG / 2)
|
|
|
|
return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
|
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t pos, rwf_t flags)
|
2009-04-02 17:59:23 -06:00
|
|
|
{
|
2012-08-28 10:52:22 -06:00
|
|
|
struct fd f;
|
2009-04-02 17:59:23 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
|
|
|
if (pos < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
f = fdget(fd);
|
|
|
|
if (f.file) {
|
2009-04-02 17:59:23 -06:00
|
|
|
ret = -ESPIPE;
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file->f_mode & FMODE_PREAD)
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = vfs_readv(f.file, vec, vlen, &pos, flags);
|
2012-08-28 10:52:22 -06:00
|
|
|
fdput(f);
|
2009-04-02 17:59:23 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
if (ret > 0)
|
|
|
|
add_rchar(current, ret);
|
|
|
|
inc_syscr(current);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t pos, rwf_t flags)
|
2009-04-02 17:59:23 -06:00
|
|
|
{
|
2012-08-28 10:52:22 -06:00
|
|
|
struct fd f;
|
2009-04-02 17:59:23 -06:00
|
|
|
ssize_t ret = -EBADF;
|
|
|
|
|
|
|
|
if (pos < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2012-08-28 10:52:22 -06:00
|
|
|
f = fdget(fd);
|
|
|
|
if (f.file) {
|
2009-04-02 17:59:23 -06:00
|
|
|
ret = -ESPIPE;
|
2012-08-28 10:52:22 -06:00
|
|
|
if (f.file->f_mode & FMODE_PWRITE)
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = vfs_writev(f.file, vec, vlen, &pos, flags);
|
2012-08-28 10:52:22 -06:00
|
|
|
fdput(f);
|
2009-04-02 17:59:23 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
if (ret > 0)
|
|
|
|
add_wchar(current, ret);
|
|
|
|
inc_syscw(current);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
|
|
|
|
unsigned long, vlen)
|
|
|
|
{
|
|
|
|
return do_readv(fd, vec, vlen, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
|
|
|
|
unsigned long, vlen)
|
|
|
|
{
|
|
|
|
return do_writev(fd, vec, vlen, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
|
|
|
|
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
|
|
|
|
{
|
|
|
|
loff_t pos = pos_from_hilo(pos_h, pos_l);
|
|
|
|
|
|
|
|
return do_preadv(fd, vec, vlen, pos, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
|
|
|
|
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
|
2017-07-06 10:58:37 -06:00
|
|
|
rwf_t, flags)
|
2016-03-03 08:03:59 -07:00
|
|
|
{
|
|
|
|
loff_t pos = pos_from_hilo(pos_h, pos_l);
|
|
|
|
|
|
|
|
if (pos == -1)
|
|
|
|
return do_readv(fd, vec, vlen, flags);
|
|
|
|
|
|
|
|
return do_preadv(fd, vec, vlen, pos, flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
|
|
|
|
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
|
|
|
|
{
|
|
|
|
loff_t pos = pos_from_hilo(pos_h, pos_l);
|
|
|
|
|
|
|
|
return do_pwritev(fd, vec, vlen, pos, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
|
|
|
|
unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
|
2017-07-06 10:58:37 -06:00
|
|
|
rwf_t, flags)
|
2016-03-03 08:03:59 -07:00
|
|
|
{
|
|
|
|
loff_t pos = pos_from_hilo(pos_h, pos_l);
|
|
|
|
|
|
|
|
if (pos == -1)
|
|
|
|
return do_writev(fd, vec, vlen, flags);
|
|
|
|
|
|
|
|
return do_pwritev(fd, vec, vlen, pos, flags);
|
|
|
|
}
|
|
|
|
|
2013-03-20 08:42:10 -06:00
|
|
|
#ifdef CONFIG_COMPAT
|
|
|
|
static size_t compat_readv(struct file *file,
|
|
|
|
const struct compat_iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t *pos, rwf_t flags)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
|
|
|
struct iovec iovstack[UIO_FASTIOV];
|
|
|
|
struct iovec *iov = iovstack;
|
2015-03-20 18:10:21 -06:00
|
|
|
struct iov_iter iter;
|
2013-03-20 08:42:10 -06:00
|
|
|
ssize_t ret;
|
|
|
|
|
2017-05-27 02:16:47 -06:00
|
|
|
ret = compat_import_iovec(READ, vec, vlen, UIO_FASTIOV, &iov, &iter);
|
2017-05-27 02:16:49 -06:00
|
|
|
if (ret >= 0) {
|
|
|
|
ret = do_iter_read(file, &iter, pos, flags);
|
|
|
|
kfree(iov);
|
|
|
|
}
|
2013-03-20 08:42:10 -06:00
|
|
|
if (ret > 0)
|
|
|
|
add_rchar(current, ret);
|
|
|
|
inc_syscr(current);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
static size_t do_compat_readv(compat_ulong_t fd,
|
|
|
|
const struct compat_iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
compat_ulong_t vlen, rwf_t flags)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
2014-03-03 10:36:58 -07:00
|
|
|
struct fd f = fdget_pos(fd);
|
2013-03-20 08:42:10 -06:00
|
|
|
ssize_t ret;
|
|
|
|
loff_t pos;
|
|
|
|
|
|
|
|
if (!f.file)
|
|
|
|
return -EBADF;
|
|
|
|
pos = f.file->f_pos;
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = compat_readv(f.file, vec, vlen, &pos, flags);
|
2013-06-14 19:49:36 -06:00
|
|
|
if (ret >= 0)
|
|
|
|
f.file->f_pos = pos;
|
2014-03-03 10:36:58 -07:00
|
|
|
fdput_pos(f);
|
2013-03-20 08:42:10 -06:00
|
|
|
return ret;
|
2016-03-03 08:03:59 -07:00
|
|
|
|
2013-03-20 08:42:10 -06:00
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
|
|
|
|
const struct compat_iovec __user *,vec,
|
|
|
|
compat_ulong_t, vlen)
|
|
|
|
{
|
|
|
|
return do_compat_readv(fd, vec, vlen, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static long do_compat_preadv64(unsigned long fd,
|
2014-03-05 02:43:51 -07:00
|
|
|
const struct compat_iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t pos, rwf_t flags)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
|
|
|
struct fd f;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
if (pos < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
f = fdget(fd);
|
|
|
|
if (!f.file)
|
|
|
|
return -EBADF;
|
|
|
|
ret = -ESPIPE;
|
|
|
|
if (f.file->f_mode & FMODE_PREAD)
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = compat_readv(f.file, vec, vlen, &pos, flags);
|
2013-03-20 08:42:10 -06:00
|
|
|
fdput(f);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-03-05 02:43:51 -07:00
|
|
|
#ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
|
|
|
|
COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
|
|
|
|
const struct compat_iovec __user *,vec,
|
|
|
|
unsigned long, vlen, loff_t, pos)
|
|
|
|
{
|
2016-03-03 08:03:59 -07:00
|
|
|
return do_compat_preadv64(fd, vec, vlen, pos, 0);
|
2014-03-05 02:43:51 -07:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2014-01-29 15:05:44 -07:00
|
|
|
COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
|
2013-03-20 08:42:10 -06:00
|
|
|
const struct compat_iovec __user *,vec,
|
2014-01-29 15:05:44 -07:00
|
|
|
compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
|
|
|
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
|
2014-03-05 02:43:51 -07:00
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
return do_compat_preadv64(fd, vec, vlen, pos, 0);
|
|
|
|
}
|
|
|
|
|
2016-07-14 13:31:53 -06:00
|
|
|
#ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
|
|
|
|
COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
|
|
|
|
const struct compat_iovec __user *,vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long, vlen, loff_t, pos, rwf_t, flags)
|
2016-07-14 13:31:53 -06:00
|
|
|
{
|
2018-12-06 12:05:34 -07:00
|
|
|
if (pos == -1)
|
|
|
|
return do_compat_readv(fd, vec, vlen, flags);
|
|
|
|
|
2016-07-14 13:31:53 -06:00
|
|
|
return do_compat_preadv64(fd, vec, vlen, pos, flags);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
|
|
|
|
const struct compat_iovec __user *,vec,
|
|
|
|
compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
|
2017-07-06 10:58:37 -06:00
|
|
|
rwf_t, flags)
|
2016-03-03 08:03:59 -07:00
|
|
|
{
|
|
|
|
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
|
|
|
|
|
|
|
|
if (pos == -1)
|
|
|
|
return do_compat_readv(fd, vec, vlen, flags);
|
|
|
|
|
|
|
|
return do_compat_preadv64(fd, vec, vlen, pos, flags);
|
2013-03-20 08:42:10 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
static size_t compat_writev(struct file *file,
|
|
|
|
const struct compat_iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t *pos, rwf_t flags)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
2017-05-27 02:16:47 -06:00
|
|
|
struct iovec iovstack[UIO_FASTIOV];
|
|
|
|
struct iovec *iov = iovstack;
|
|
|
|
struct iov_iter iter;
|
2017-05-27 02:16:49 -06:00
|
|
|
ssize_t ret;
|
2013-03-20 08:42:10 -06:00
|
|
|
|
2017-05-27 02:16:47 -06:00
|
|
|
ret = compat_import_iovec(WRITE, vec, vlen, UIO_FASTIOV, &iov, &iter);
|
2017-05-27 02:16:49 -06:00
|
|
|
if (ret >= 0) {
|
2017-07-06 07:15:47 -06:00
|
|
|
file_start_write(file);
|
2017-05-27 02:16:49 -06:00
|
|
|
ret = do_iter_write(file, &iter, pos, flags);
|
2017-07-06 07:15:47 -06:00
|
|
|
file_end_write(file);
|
2017-05-27 02:16:49 -06:00
|
|
|
kfree(iov);
|
|
|
|
}
|
2013-03-20 08:42:10 -06:00
|
|
|
if (ret > 0)
|
|
|
|
add_wchar(current, ret);
|
|
|
|
inc_syscw(current);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
static size_t do_compat_writev(compat_ulong_t fd,
|
|
|
|
const struct compat_iovec __user* vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
compat_ulong_t vlen, rwf_t flags)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
2014-03-03 10:36:58 -07:00
|
|
|
struct fd f = fdget_pos(fd);
|
2013-03-20 08:42:10 -06:00
|
|
|
ssize_t ret;
|
|
|
|
loff_t pos;
|
|
|
|
|
|
|
|
if (!f.file)
|
|
|
|
return -EBADF;
|
|
|
|
pos = f.file->f_pos;
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = compat_writev(f.file, vec, vlen, &pos, flags);
|
2013-06-14 19:49:36 -06:00
|
|
|
if (ret >= 0)
|
|
|
|
f.file->f_pos = pos;
|
2014-03-03 10:36:58 -07:00
|
|
|
fdput_pos(f);
|
2013-03-20 08:42:10 -06:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
|
|
|
|
const struct compat_iovec __user *, vec,
|
|
|
|
compat_ulong_t, vlen)
|
|
|
|
{
|
|
|
|
return do_compat_writev(fd, vec, vlen, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static long do_compat_pwritev64(unsigned long fd,
|
2014-03-05 02:43:51 -07:00
|
|
|
const struct compat_iovec __user *vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long vlen, loff_t pos, rwf_t flags)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
|
|
|
struct fd f;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
if (pos < 0)
|
|
|
|
return -EINVAL;
|
|
|
|
f = fdget(fd);
|
|
|
|
if (!f.file)
|
|
|
|
return -EBADF;
|
|
|
|
ret = -ESPIPE;
|
|
|
|
if (f.file->f_mode & FMODE_PWRITE)
|
2016-03-03 08:03:59 -07:00
|
|
|
ret = compat_writev(f.file, vec, vlen, &pos, flags);
|
2013-03-20 08:42:10 -06:00
|
|
|
fdput(f);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2014-03-05 02:43:51 -07:00
|
|
|
#ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64
|
|
|
|
COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
|
|
|
|
const struct compat_iovec __user *,vec,
|
|
|
|
unsigned long, vlen, loff_t, pos)
|
|
|
|
{
|
2016-03-03 08:03:59 -07:00
|
|
|
return do_compat_pwritev64(fd, vec, vlen, pos, 0);
|
2014-03-05 02:43:51 -07:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2014-01-29 15:05:44 -07:00
|
|
|
COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
|
2013-03-20 08:42:10 -06:00
|
|
|
const struct compat_iovec __user *,vec,
|
2014-01-29 15:05:44 -07:00
|
|
|
compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
|
2013-03-20 08:42:10 -06:00
|
|
|
{
|
|
|
|
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
|
2014-03-05 02:43:51 -07:00
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
return do_compat_pwritev64(fd, vec, vlen, pos, 0);
|
2013-03-20 08:42:10 -06:00
|
|
|
}
|
2016-03-03 08:03:59 -07:00
|
|
|
|
2016-07-14 13:31:53 -06:00
|
|
|
#ifdef __ARCH_WANT_COMPAT_SYS_PWRITEV64V2
|
|
|
|
COMPAT_SYSCALL_DEFINE5(pwritev64v2, unsigned long, fd,
|
|
|
|
const struct compat_iovec __user *,vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
unsigned long, vlen, loff_t, pos, rwf_t, flags)
|
2016-07-14 13:31:53 -06:00
|
|
|
{
|
2018-12-06 12:05:34 -07:00
|
|
|
if (pos == -1)
|
|
|
|
return do_compat_writev(fd, vec, vlen, flags);
|
|
|
|
|
2016-07-14 13:31:53 -06:00
|
|
|
return do_compat_pwritev64(fd, vec, vlen, pos, flags);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2016-03-03 08:03:59 -07:00
|
|
|
COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
|
|
|
|
const struct compat_iovec __user *,vec,
|
2017-07-06 10:58:37 -06:00
|
|
|
compat_ulong_t, vlen, u32, pos_low, u32, pos_high, rwf_t, flags)
|
2016-03-03 08:03:59 -07:00
|
|
|
{
|
|
|
|
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
|
|
|
|
|
|
|
|
if (pos == -1)
|
|
|
|
return do_compat_writev(fd, vec, vlen, flags);
|
|
|
|
|
|
|
|
return do_compat_pwritev64(fd, vec, vlen, pos, flags);
|
2013-03-20 08:42:10 -06:00
|
|
|
}
|
2016-03-03 08:03:59 -07:00
|
|
|
|
2013-03-20 08:42:10 -06:00
|
|
|
#endif
|
|
|
|
|
2013-02-24 00:17:03 -07:00
|
|
|
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
|
|
|
|
size_t count, loff_t max)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
2012-08-28 10:52:22 -06:00
|
|
|
struct fd in, out;
|
|
|
|
struct inode *in_inode, *out_inode;
|
2005-04-16 16:20:36 -06:00
|
|
|
loff_t pos;
|
2013-06-20 08:58:36 -06:00
|
|
|
loff_t out_pos;
|
2005-04-16 16:20:36 -06:00
|
|
|
ssize_t retval;
|
2012-08-28 10:52:22 -06:00
|
|
|
int fl;
|
2005-04-16 16:20:36 -06:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Get input file, and verify that it is ok..
|
|
|
|
*/
|
|
|
|
retval = -EBADF;
|
2012-08-28 10:52:22 -06:00
|
|
|
in = fdget(in_fd);
|
|
|
|
if (!in.file)
|
2005-04-16 16:20:36 -06:00
|
|
|
goto out;
|
2012-08-28 10:52:22 -06:00
|
|
|
if (!(in.file->f_mode & FMODE_READ))
|
2005-04-16 16:20:36 -06:00
|
|
|
goto fput_in;
|
|
|
|
retval = -ESPIPE;
|
2013-06-20 08:58:36 -06:00
|
|
|
if (!ppos) {
|
|
|
|
pos = in.file->f_pos;
|
|
|
|
} else {
|
|
|
|
pos = *ppos;
|
2012-08-28 10:52:22 -06:00
|
|
|
if (!(in.file->f_mode & FMODE_PREAD))
|
2005-04-16 16:20:36 -06:00
|
|
|
goto fput_in;
|
2013-06-20 08:58:36 -06:00
|
|
|
}
|
|
|
|
retval = rw_verify_area(READ, in.file, &pos, count);
|
2006-01-04 17:20:40 -07:00
|
|
|
if (retval < 0)
|
2005-04-16 16:20:36 -06:00
|
|
|
goto fput_in;
|
2016-03-31 19:48:20 -06:00
|
|
|
if (count > MAX_RW_COUNT)
|
|
|
|
count = MAX_RW_COUNT;
|
2005-04-16 16:20:36 -06:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Get output file, and verify that it is ok..
|
|
|
|
*/
|
|
|
|
retval = -EBADF;
|
2012-08-28 10:52:22 -06:00
|
|
|
out = fdget(out_fd);
|
|
|
|
if (!out.file)
|
2005-04-16 16:20:36 -06:00
|
|
|
goto fput_in;
|
2012-08-28 10:52:22 -06:00
|
|
|
if (!(out.file->f_mode & FMODE_WRITE))
|
2005-04-16 16:20:36 -06:00
|
|
|
goto fput_out;
|
|
|
|
retval = -EINVAL;
|
2013-01-23 15:07:38 -07:00
|
|
|
in_inode = file_inode(in.file);
|
|
|
|
out_inode = file_inode(out.file);
|
2013-06-20 08:58:36 -06:00
|
|
|
out_pos = out.file->f_pos;
|
|
|
|
retval = rw_verify_area(WRITE, out.file, &out_pos, count);
|
2006-01-04 17:20:40 -07:00
|
|
|
if (retval < 0)
|
2005-04-16 16:20:36 -06:00
|
|
|
goto fput_out;
|
|
|
|
|
|
|
|
if (!max)
|
|
|
|
max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
|
|
|
|
|
|
|
|
if (unlikely(pos + count > max)) {
|
|
|
|
retval = -EOVERFLOW;
|
|
|
|
if (pos >= max)
|
|
|
|
goto fput_out;
|
|
|
|
count = max - pos;
|
|
|
|
}
|
|
|
|
|
2007-06-11 04:18:52 -06:00
|
|
|
fl = 0;
|
2007-06-01 06:52:37 -06:00
|
|
|
#if 0
|
2007-06-11 04:18:52 -06:00
|
|
|
/*
|
|
|
|
* We need to debate whether we can enable this or not. The
|
|
|
|
* man page documents EAGAIN return for the output at least,
|
|
|
|
* and the application is arguably buggy if it doesn't expect
|
|
|
|
* EAGAIN on a non-blocking file descriptor.
|
|
|
|
*/
|
2012-08-28 10:52:22 -06:00
|
|
|
if (in.file->f_flags & O_NONBLOCK)
|
2007-06-11 04:18:52 -06:00
|
|
|
fl = SPLICE_F_NONBLOCK;
|
2007-06-01 06:52:37 -06:00
|
|
|
#endif
|
2013-05-23 18:10:34 -06:00
|
|
|
file_start_write(out.file);
|
2013-06-20 08:58:36 -06:00
|
|
|
retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
|
2013-05-23 18:10:34 -06:00
|
|
|
file_end_write(out.file);
|
2005-04-16 16:20:36 -06:00
|
|
|
|
|
|
|
if (retval > 0) {
|
2007-02-10 02:46:45 -07:00
|
|
|
add_rchar(current, retval);
|
|
|
|
add_wchar(current, retval);
|
2012-12-20 16:05:52 -07:00
|
|
|
fsnotify_access(in.file);
|
|
|
|
fsnotify_modify(out.file);
|
2013-06-20 08:58:36 -06:00
|
|
|
out.file->f_pos = out_pos;
|
|
|
|
if (ppos)
|
|
|
|
*ppos = pos;
|
|
|
|
else
|
|
|
|
in.file->f_pos = pos;
|
2005-04-16 16:20:36 -06:00
|
|
|
}
|
|
|
|
|
2007-02-10 02:46:45 -07:00
|
|
|
inc_syscr(current);
|
|
|
|
inc_syscw(current);
|
2013-06-20 08:58:36 -06:00
|
|
|
if (pos > max)
|
2005-04-16 16:20:36 -06:00
|
|
|
retval = -EOVERFLOW;
|
|
|
|
|
|
|
|
fput_out:
|
2012-08-28 10:52:22 -06:00
|
|
|
fdput(out);
|
2005-04-16 16:20:36 -06:00
|
|
|
fput_in:
|
2012-08-28 10:52:22 -06:00
|
|
|
fdput(in);
|
2005-04-16 16:20:36 -06:00
|
|
|
out:
|
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
2009-01-14 06:14:18 -07:00
|
|
|
SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd, off_t __user *, offset, size_t, count)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
|
|
|
loff_t pos;
|
|
|
|
off_t off;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
if (offset) {
|
|
|
|
if (unlikely(get_user(off, offset)))
|
|
|
|
return -EFAULT;
|
|
|
|
pos = off;
|
|
|
|
ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
|
|
|
|
if (unlikely(put_user(pos, offset)))
|
|
|
|
return -EFAULT;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return do_sendfile(out_fd, in_fd, NULL, count, 0);
|
|
|
|
}
|
|
|
|
|
2009-01-14 06:14:18 -07:00
|
|
|
SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, loff_t __user *, offset, size_t, count)
|
2005-04-16 16:20:36 -06:00
|
|
|
{
|
|
|
|
loff_t pos;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
if (offset) {
|
|
|
|
if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
|
|
|
|
return -EFAULT;
|
|
|
|
ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
|
|
|
|
if (unlikely(put_user(pos, offset)))
|
|
|
|
return -EFAULT;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return do_sendfile(out_fd, in_fd, NULL, count, 0);
|
|
|
|
}
|
2013-02-24 00:17:03 -07:00
|
|
|
|
|
|
|
#ifdef CONFIG_COMPAT
|
|
|
|
COMPAT_SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd,
|
|
|
|
compat_off_t __user *, offset, compat_size_t, count)
|
|
|
|
{
|
|
|
|
loff_t pos;
|
|
|
|
off_t off;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
if (offset) {
|
|
|
|
if (unlikely(get_user(off, offset)))
|
|
|
|
return -EFAULT;
|
|
|
|
pos = off;
|
|
|
|
ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
|
|
|
|
if (unlikely(put_user(pos, offset)))
|
|
|
|
return -EFAULT;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return do_sendfile(out_fd, in_fd, NULL, count, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
|
|
|
|
compat_loff_t __user *, offset, compat_size_t, count)
|
|
|
|
{
|
|
|
|
loff_t pos;
|
|
|
|
ssize_t ret;
|
|
|
|
|
|
|
|
if (offset) {
|
|
|
|
if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
|
|
|
|
return -EFAULT;
|
|
|
|
ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
|
|
|
|
if (unlikely(put_user(pos, offset)))
|
|
|
|
return -EFAULT;
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
return do_sendfile(out_fd, in_fd, NULL, count, 0);
|
|
|
|
}
|
|
|
|
#endif
|
2015-11-10 14:53:30 -07:00
|
|
|
|
|
|
|
/*
 * copy_file_range() differs from regular file read and write in that it
 * specifically allows returning partial success.  When it does so is up to
 * the copy_file_range method.
 */
ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
                            struct file *file_out, loff_t pos_out,
                            size_t len, unsigned int flags)
{
        struct inode *inode_in = file_inode(file_in);
        struct inode *inode_out = file_inode(file_out);
        ssize_t ret;

        if (flags != 0)
                return -EINVAL;

        if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
                return -EISDIR;
        if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
                return -EINVAL;

        ret = rw_verify_area(READ, file_in, &pos_in, len);
        if (unlikely(ret))
                return ret;

        ret = rw_verify_area(WRITE, file_out, &pos_out, len);
        if (unlikely(ret))
                return ret;

        if (!(file_in->f_mode & FMODE_READ) ||
            !(file_out->f_mode & FMODE_WRITE) ||
            (file_out->f_flags & O_APPEND))
                return -EBADF;

        /* this could be relaxed once a method supports cross-fs copies */
        if (inode_in->i_sb != inode_out->i_sb)
                return -EXDEV;

        if (len == 0)
                return 0;

        file_start_write(file_out);

        /*
         * Try cloning first, this is supported by more file systems, and
         * more efficient if both clone and copy are supported (e.g. NFS).
         */
        if (file_in->f_op->clone_file_range) {
                ret = file_in->f_op->clone_file_range(file_in, pos_in,
                                file_out, pos_out, len);
                if (ret == 0) {
                        ret = len;
                        goto done;
                }
        }

        if (file_out->f_op->copy_file_range) {
                ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
                                                      pos_out, len, flags);
                if (ret != -EOPNOTSUPP)
                        goto done;
        }

        ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
                        len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);

done:
        if (ret > 0) {
                fsnotify_access(file_in);
                add_rchar(current, ret);
                fsnotify_modify(file_out);
                add_wchar(current, ret);
        }

        inc_syscr(current);
        inc_syscw(current);

        file_end_write(file_out);

        return ret;
}
EXPORT_SYMBOL(vfs_copy_file_range);

SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
                int, fd_out, loff_t __user *, off_out,
                size_t, len, unsigned int, flags)
{
        loff_t pos_in;
        loff_t pos_out;
        struct fd f_in;
        struct fd f_out;
        ssize_t ret = -EBADF;

        f_in = fdget(fd_in);
        if (!f_in.file)
                goto out2;

        f_out = fdget(fd_out);
        if (!f_out.file)
                goto out1;

        ret = -EFAULT;
        if (off_in) {
                if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
                        goto out;
        } else {
                pos_in = f_in.file->f_pos;
        }

        if (off_out) {
                if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
                        goto out;
        } else {
                pos_out = f_out.file->f_pos;
        }

        ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
                                  flags);
        if (ret > 0) {
                pos_in += ret;
                pos_out += ret;

                if (off_in) {
                        if (copy_to_user(off_in, &pos_in, sizeof(loff_t)))
                                ret = -EFAULT;
                } else {
                        f_in.file->f_pos = pos_in;
                }

                if (off_out) {
                        if (copy_to_user(off_out, &pos_out, sizeof(loff_t)))
                                ret = -EFAULT;
                } else {
                        f_out.file->f_pos = pos_out;
                }
        }

out:
        fdput(f_out);
out1:
        fdput(f_in);
out2:
        return ret;
}
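
Since copy_file_range() is allowed to return partial success, userspace is expected to retry until the requested length has been copied. A hedged sketch (illustration only, not part of this file), assuming a libc that exposes the copy_file_range() wrapper; the helper name copy_range is made up:

#define _GNU_SOURCE
#include <unistd.h>

/* Copy len bytes from the current position of fd_in to fd_out. */
static int copy_range(int fd_in, int fd_out, size_t len)
{
        while (len > 0) {
                /* NULL offsets: use and advance both file positions. */
                ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);

                if (n < 0)
                        return -1;      /* errno is set, e.g. EXDEV */
                if (n == 0)
                        break;          /* hit EOF on the source */
                len -= n;
        }
        return 0;
}
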
static int clone_verify_area(struct file *file, loff_t pos, u64 len, bool write)
{
        struct inode *inode = file_inode(file);

        if (unlikely(pos < 0))
                return -EINVAL;

        if (unlikely((loff_t) (pos + len) < 0))
                return -EINVAL;

        if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
                loff_t end = len ? pos + len - 1 : OFFSET_MAX;
                int retval;

                retval = locks_mandatory_area(inode, file, pos, end,
                                write ? F_WRLCK : F_RDLCK);
                if (retval < 0)
                        return retval;
        }

        return security_file_permission(file, write ? MAY_WRITE : MAY_READ);
}

/*
 * Ensure that we don't remap a partial EOF block in the middle of something
 * else.  Assume that the offsets have already been checked for block
 * alignment.
 *
 * For deduplication we always scale down to the previous block because we
 * can't meaningfully compare post-EOF contents.
 *
 * For clone we only link a partial EOF block above the destination file's EOF.
 */
static int generic_remap_check_len(struct inode *inode_in,
                                   struct inode *inode_out,
                                   loff_t pos_out,
                                   u64 *len,
                                   bool is_dedupe)
{
        u64 blkmask = i_blocksize(inode_in) - 1;

        if ((*len & blkmask) == 0)
                return 0;

        if (is_dedupe)
                *len &= ~blkmask;
        else if (pos_out + *len < i_size_read(inode_out))
                return -EINVAL;

        return 0;
}
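
To make the rounding above concrete, a small illustrative calculation (the 4096-byte block size and the 10000-byte request are made-up numbers, not taken from this file): dedupe silently trims the unaligned tail, whereas clone refuses it unless the range ends at or beyond the destination's EOF.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t blkmask = 4096 - 1;    /* assumed i_blocksize() of 4096 */
        uint64_t len = 10000;           /* requested length, not aligned */

        if (len & blkmask)
                len &= ~blkmask;        /* dedupe scales down to 8192 */

        printf("dedupe compares %llu bytes\n", (unsigned long long)len);
        return 0;
}
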
/*
 * Check that the two inodes are eligible for cloning, the ranges make
 * sense, and then flush all dirty data.  Caller must ensure that the
 * inodes have been locked against any other modifications.
 *
 * Returns: 0 for "nothing to clone", 1 for "something to clone", or
 * the usual negative error code.
 */
int vfs_clone_file_prep_inodes(struct inode *inode_in, loff_t pos_in,
                               struct inode *inode_out, loff_t pos_out,
                               u64 *len, bool is_dedupe)
{
        loff_t bs = inode_out->i_sb->s_blocksize;
        loff_t blen;
        loff_t isize;
        bool same_inode = (inode_in == inode_out);
        int ret;

        /* Don't touch certain kinds of inodes */
        if (IS_IMMUTABLE(inode_out))
                return -EPERM;

        if (IS_SWAPFILE(inode_in) || IS_SWAPFILE(inode_out))
                return -ETXTBSY;

        /* Don't reflink dirs, pipes, sockets... */
        if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
                return -EISDIR;
        if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
                return -EINVAL;

        /* Are we going all the way to the end? */
        isize = i_size_read(inode_in);
        if (isize == 0)
                return 0;

        /* Zero length dedupe exits immediately; reflink goes to EOF. */
        if (*len == 0) {
                if (is_dedupe || pos_in == isize)
                        return 0;
                if (pos_in > isize)
                        return -EINVAL;
                *len = isize - pos_in;
        }

        /* Ensure offsets don't wrap and the input is inside i_size */
        if (pos_in + *len < pos_in || pos_out + *len < pos_out ||
            pos_in + *len > isize)
                return -EINVAL;

        /* Don't allow dedupe past EOF in the dest file */
        if (is_dedupe) {
                loff_t disize;

                disize = i_size_read(inode_out);
                if (pos_out >= disize || pos_out + *len > disize)
                        return -EINVAL;
        }

        /* If we're linking to EOF, continue to the block boundary. */
        if (pos_in + *len == isize)
                blen = ALIGN(isize, bs) - pos_in;
        else
                blen = *len;

        /* Only reflink if we're aligned to block boundaries */
        if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
            !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
                return -EINVAL;

        /* Don't allow overlapped reflink within the same file */
        if (same_inode) {
                if (pos_out + blen > pos_in && pos_out < pos_in + blen)
                        return -EINVAL;
        }

        /* Wait for the completion of any pending IOs on both files */
        inode_dio_wait(inode_in);
        if (!same_inode)
                inode_dio_wait(inode_out);

        ret = filemap_write_and_wait_range(inode_in->i_mapping,
                        pos_in, pos_in + *len - 1);
        if (ret)
                return ret;

        ret = filemap_write_and_wait_range(inode_out->i_mapping,
                        pos_out, pos_out + *len - 1);
        if (ret)
                return ret;

        /*
         * Check that the extents are the same.
         */
        if (is_dedupe) {
                bool is_same = false;

                ret = vfs_dedupe_file_range_compare(inode_in, pos_in,
                                inode_out, pos_out, *len, &is_same);
                if (ret)
                        return ret;
                if (!is_same)
                        return -EBADE;
        }

        ret = generic_remap_check_len(inode_in, inode_out, pos_out, len,
                        is_dedupe);
        if (ret)
                return ret;

        return 1;
}
EXPORT_SYMBOL(vfs_clone_file_prep_inodes);
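
As a rough, hedged sketch of how a filesystem would typically sit on top of the helper above (illustration only: everything except vfs_clone_file_prep_inodes() and the generic VFS locking helpers is made up; real implementations such as xfs or ocfs2 add their own locking and extent-sharing code):

static int examplefs_clone_file_range(struct file *file_in, loff_t pos_in,
                                      struct file *file_out, loff_t pos_out,
                                      u64 len)
{
        struct inode *inode_in = file_inode(file_in);
        struct inode *inode_out = file_inode(file_out);
        int ret;

        /* Serialize against other modifications of either inode. */
        lock_two_nondirectories(inode_in, inode_out);

        ret = vfs_clone_file_prep_inodes(inode_in, pos_in, inode_out, pos_out,
                                         &len, false);
        if (ret <= 0)           /* error, or nothing to clone */
                goto out_unlock;

        /* ... filesystem-specific sharing of 'len' bytes of extents ... */
        ret = 0;

out_unlock:
        unlock_two_nondirectories(inode_in, inode_out);
        return ret;
}
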
int do_clone_file_range(struct file *file_in, loff_t pos_in,
                        struct file *file_out, loff_t pos_out, u64 len)
{
        struct inode *inode_in = file_inode(file_in);
        struct inode *inode_out = file_inode(file_out);
        int ret;

        if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
                return -EISDIR;
        if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
                return -EINVAL;

        /*
         * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
         * the same mount.  Practically, they only need to be on the same file
         * system.
         */
        if (inode_in->i_sb != inode_out->i_sb)
                return -EXDEV;

        if (!(file_in->f_mode & FMODE_READ) ||
            !(file_out->f_mode & FMODE_WRITE) ||
            (file_out->f_flags & O_APPEND))
                return -EBADF;

        if (!file_in->f_op->clone_file_range)
                return -EOPNOTSUPP;

        ret = clone_verify_area(file_in, pos_in, len, false);
        if (ret)
                return ret;

        ret = clone_verify_area(file_out, pos_out, len, true);
        if (ret)
                return ret;

        if (pos_in + len > i_size_read(inode_in))
                return -EINVAL;

        ret = file_in->f_op->clone_file_range(file_in, pos_in,
                        file_out, pos_out, len);
        if (!ret) {
                fsnotify_access(file_in);
                fsnotify_modify(file_out);
        }

        return ret;
}
EXPORT_SYMBOL(do_clone_file_range);

int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
                         struct file *file_out, loff_t pos_out, u64 len)
{
        int ret;

        file_start_write(file_out);
        ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len);
        file_end_write(file_out);

        return ret;
}
EXPORT_SYMBOL(vfs_clone_file_range);
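
From userspace this path is normally reached through the FICLONE/FICLONERANGE ioctls, supported by reflink-capable filesystems such as btrfs and XFS. A hedged sketch, not part of this file; the helper name reflink_range is made up:

#include <sys/ioctl.h>
#include <linux/fs.h>           /* FICLONERANGE, struct file_clone_range */
#include <string.h>

/* Share len bytes of src_fd at src_offset into dest_fd at dest_offset. */
static int reflink_range(int src_fd, int dest_fd,
                         __u64 src_offset, __u64 dest_offset, __u64 len)
{
        struct file_clone_range fcr;

        memset(&fcr, 0, sizeof(fcr));
        fcr.src_fd = src_fd;
        fcr.src_offset = src_offset;
        fcr.src_length = len;           /* 0 means "to EOF of the source" */
        fcr.dest_offset = dest_offset;

        /* The ioctl is issued on the destination file descriptor. */
        return ioctl(dest_fd, FICLONERANGE, &fcr);
}
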
/* Read a page's worth of file data into the page cache. */
static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
{
        struct address_space *mapping;
        struct page *page;
        pgoff_t n;

        n = offset >> PAGE_SHIFT;
        mapping = inode->i_mapping;
        page = read_mapping_page(mapping, n, NULL);
        if (IS_ERR(page))
                return page;
        if (!PageUptodate(page)) {
                put_page(page);
                return ERR_PTR(-EIO);
        }
        return page;
}

/*
 * Lock two pages, ensuring that we lock in offset order if the pages are from
 * the same file.
 */
static void vfs_lock_two_pages(struct page *page1, struct page *page2)
{
        /* Always lock in order of increasing index. */
        if (page1->index > page2->index)
                swap(page1, page2);

        lock_page(page1);
        if (page1 != page2)
                lock_page(page2);
}

/* Unlock two pages, being careful not to unlock the same page twice. */
static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
{
        unlock_page(page1);
        if (page1 != page2)
                unlock_page(page2);
}

/*
 * Compare extents of two files to see if they are the same.
 * Caller must have locked both inodes to prevent write races.
 */
int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
                                  struct inode *dest, loff_t destoff,
                                  loff_t len, bool *is_same)
{
        loff_t src_poff;
        loff_t dest_poff;
        void *src_addr;
        void *dest_addr;
        struct page *src_page;
        struct page *dest_page;
        loff_t cmp_len;
        bool same;
        int error;

        error = -EINVAL;
        same = true;
        while (len) {
                src_poff = srcoff & (PAGE_SIZE - 1);
                dest_poff = destoff & (PAGE_SIZE - 1);
                cmp_len = min(PAGE_SIZE - src_poff,
                              PAGE_SIZE - dest_poff);
                cmp_len = min(cmp_len, len);
                if (cmp_len <= 0)
                        goto out_error;

                src_page = vfs_dedupe_get_page(src, srcoff);
                if (IS_ERR(src_page)) {
                        error = PTR_ERR(src_page);
                        goto out_error;
                }
                dest_page = vfs_dedupe_get_page(dest, destoff);
                if (IS_ERR(dest_page)) {
                        error = PTR_ERR(dest_page);
                        put_page(src_page);
                        goto out_error;
                }

                vfs_lock_two_pages(src_page, dest_page);

                /*
                 * Now that we've locked both pages, make sure they're still
                 * mapped to the file data we're interested in.  If not,
                 * someone is invalidating pages on us and we lose.
                 */
                if (!PageUptodate(src_page) || !PageUptodate(dest_page) ||
                    src_page->mapping != src->i_mapping ||
                    dest_page->mapping != dest->i_mapping) {
                        same = false;
                        goto unlock;
                }

                src_addr = kmap_atomic(src_page);
                dest_addr = kmap_atomic(dest_page);

                flush_dcache_page(src_page);
                flush_dcache_page(dest_page);

                if (memcmp(src_addr + src_poff, dest_addr + dest_poff, cmp_len))
                        same = false;

                kunmap_atomic(dest_addr);
                kunmap_atomic(src_addr);
unlock:
                vfs_unlock_two_pages(src_page, dest_page);
                put_page(dest_page);
                put_page(src_page);

                if (!same)
                        break;

                srcoff += cmp_len;
                destoff += cmp_len;
                len -= cmp_len;
        }

        *is_same = same;
        return 0;

out_error:
        return error;
}
EXPORT_SYMBOL(vfs_dedupe_file_range_compare);

int vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
                              struct file *dst_file, loff_t dst_pos, u64 len)
{
        s64 ret;

        ret = mnt_want_write_file(dst_file);
        if (ret)
                return ret;

        ret = clone_verify_area(dst_file, dst_pos, len, true);
        if (ret < 0)
                goto out_drop_write;

        ret = -EINVAL;
        if (!(capable(CAP_SYS_ADMIN) || (dst_file->f_mode & FMODE_WRITE)))
                goto out_drop_write;

        ret = -EXDEV;
        if (src_file->f_path.mnt != dst_file->f_path.mnt)
                goto out_drop_write;

        ret = -EISDIR;
        if (S_ISDIR(file_inode(dst_file)->i_mode))
                goto out_drop_write;

        ret = -EINVAL;
        if (!dst_file->f_op->dedupe_file_range)
                goto out_drop_write;

        ret = dst_file->f_op->dedupe_file_range(src_file, src_pos,
                                                dst_file, dst_pos, len);
out_drop_write:
        mnt_drop_write_file(dst_file);

        return ret;
}
EXPORT_SYMBOL(vfs_dedupe_file_range_one);

int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
{
        struct file_dedupe_range_info *info;
        struct inode *src = file_inode(file);
        u64 off;
        u64 len;
        int i;
        int ret;
        u16 count = same->dest_count;
        int deduped;

        if (!(file->f_mode & FMODE_READ))
                return -EINVAL;

        if (same->reserved1 || same->reserved2)
                return -EINVAL;

        off = same->src_offset;
        len = same->src_length;

        ret = -EISDIR;
        if (S_ISDIR(src->i_mode))
                goto out;

        ret = -EINVAL;
        if (!S_ISREG(src->i_mode))
                goto out;

        ret = clone_verify_area(file, off, len, false);
        if (ret < 0)
                goto out;
        ret = 0;

        if (off + len > i_size_read(src))
                return -EINVAL;

        /* Arbitrary 1G limit on a single dedupe request, can be raised. */
        len = min_t(u64, len, 1 << 30);

        /* pre-format output fields to sane values */
        for (i = 0; i < count; i++) {
                same->info[i].bytes_deduped = 0ULL;
                same->info[i].status = FILE_DEDUPE_RANGE_SAME;
        }

        for (i = 0, info = same->info; i < count; i++, info++) {
                struct fd dst_fd = fdget(info->dest_fd);
                struct file *dst_file = dst_fd.file;

                if (!dst_file) {
                        info->status = -EBADF;
                        goto next_loop;
                }

                if (info->reserved) {
                        info->status = -EINVAL;
                        goto next_fdput;
                }

                deduped = vfs_dedupe_file_range_one(file, off, dst_file,
                                                    info->dest_offset, len);
                if (deduped == -EBADE)
                        info->status = FILE_DEDUPE_RANGE_DIFFERS;
                else if (deduped < 0)
                        info->status = deduped;
                else
                        info->bytes_deduped = len;

next_fdput:
                fdput(dst_fd);
next_loop:
                if (fatal_signal_pending(current))
                        goto out;
        }

out:
        return ret;
}
EXPORT_SYMBOL(vfs_dedupe_file_range);
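
Finally, the usual way into vfs_dedupe_file_range() from userspace is the FIDEDUPERANGE ioctl, whose variable-length argument carries one file_dedupe_range_info per destination. A hedged sketch, not part of this file; the helper name dedupe_one is made up:

#include <sys/ioctl.h>
#include <linux/fs.h>           /* FIDEDUPERANGE, struct file_dedupe_range */
#include <stdlib.h>

/* Ask the kernel to share len bytes of src_fd at src_off with dst_fd at
 * dst_off, but only if the two ranges already hold identical data. */
static int dedupe_one(int src_fd, __u64 src_off, int dst_fd, __u64 dst_off,
                      __u64 len)
{
        struct file_dedupe_range *arg;
        int ret;

        arg = calloc(1, sizeof(*arg) + sizeof(struct file_dedupe_range_info));
        if (!arg)
                return -1;

        arg->src_offset = src_off;
        arg->src_length = len;
        arg->dest_count = 1;
        arg->info[0].dest_fd = dst_fd;
        arg->info[0].dest_offset = dst_off;

        ret = ioctl(src_fd, FIDEDUPERANGE, arg);
        if (ret == 0 && arg->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
                ret = 1;        /* contents differed, nothing was shared */

        free(arg);
        return ret;
}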