kernel-fxtec-pro1x/fs
Kent Overstreet df2cb6daa4 block: Avoid deadlocks with bio allocation by stacking drivers
Previously, if we ever try to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risk deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until after we return -
this means if we allocate a second bio, we're blocking the first one
from ever being freed.

Thus if enough threads call into a stacking block driver at the same
time with bios that need multiple splits, and the bio_set's reserve gets
used up, we deadlock.

This can be worked around in the driver code - we could check if we're
running under generic_make_request(), then mask out __GFP_WAIT when we
go to allocate a bio, and if the allocation fails punt to workqueue and
retry the allocation.

But this is tricky and not a generic solution. This patch solves it for
all users by inverting the previously described technique. We allocate a
rescuer workqueue for each bio_set, and then in the allocation code if
there are bios on current->bio_list we would be blocking, we punt them
to the rescuer workqueue to be submitted.

This guarantees forward progress for bio allocations under
generic_make_request() provided each bio is submitted before allocating
the next, and provided the bios are freed after they complete.

Note that this doesn't do anything for allocation from other mempools.
Instead of allocating per bio data structures from a mempool, code
should use bio_set's front_pad.

Tested it by forcing the rescue codepath to be taken (by disabling the
first GFP_NOWAIT) attempt, and then ran it with bcache (which does a lot
of arbitrary bio splitting) and verified that the rescuer was being
invoked.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Muthukumar Ratty <muthur@gmail.com>
2013-03-23 14:15:26 -07:00
..
9p fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
adfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
affs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
afs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
autofs4 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
befs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
bfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-03-17 11:04:14 -07:00
cachefiles FS-Cache: Mark cancellation of in-progress operation 2012-12-20 22:34:00 +00:00
ceph fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
cifs fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
coda fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
configfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
cramfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
debugfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
devpts fs: Limit sys_mount to only request filesystem modules (Part 2). 2013-03-07 01:08:55 -08:00
dlm hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-03-09 16:51:13 -08:00
efs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
exofs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
exportfs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ext2 ext2: Fix BUG_ON in evict() on inode deletion 2013-03-13 15:23:44 +01:00
ext3 ext3: Fix format string issues 2013-03-11 22:05:56 +01:00
ext4 fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
f2fs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
fat fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
freevxfs fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
fscache hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
fuse fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
gfs2 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
hfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
hfsplus fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
hostfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-03-13 15:47:50 -07:00
hpfs fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
hppfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
hugetlbfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
isofs fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
jbd jbd: don't wake kjournald unnecessarily 2013-01-14 22:50:45 +01:00
jbd2 jbd2: fix ERR_PTR dereference in jbd2__journal_start 2013-03-02 17:08:46 -05:00
jffs2 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
jfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
lockd Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux 2013-02-28 18:02:55 -08:00
logfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
minix fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
ncpfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
nfs fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
nfs_common nfs_common: Update the translation between nfsv3 acls linux posix acls 2013-02-13 06:15:14 -08:00
nfsd nfsd: convert to idr_alloc() 2013-03-13 15:21:45 -07:00
nilfs2 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
nls
notify hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ntfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
ocfs2 fs: Limit sys_mount to only request filesystem modules (Part 2). 2013-03-07 01:08:55 -08:00
omfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
openpromfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
proc proc: Use nd_jump_link in proc_ns_follow_link 2013-03-09 00:14:45 -08:00
pstore A few fixes to reduce places where pstore might hang 2013-02-21 09:38:18 -08:00
qnx4 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
qnx6 fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
quota quota: add missing use of dq_data_lock in __dquot_initialize 2013-03-11 22:05:56 +01:00
ramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
reiserfs reiserfs: Use kstrdup instead of kmalloc/strcpy 2013-03-11 22:05:57 +01:00
romfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
squashfs fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
sysfs hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
sysv fs: Readd the fs module aliases. 2013-03-12 18:55:21 -07:00
ubifs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
udf fs: Limit sys_mount to only request filesystem modules. (Part 3) 2013-03-11 07:09:48 -07:00
ufs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
xfs fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
aio.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
anon_inodes.c get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero 2013-02-26 02:46:11 -05:00
attr.c userns: Allow chown and setgid preservation 2012-11-20 04:17:24 -08:00
bad_inode.c lseek: the "whence" argument is called "whence" 2012-12-17 17:15:12 -08:00
binfmt_aout.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
binfmt_elf.c ImgTec Meta architecture changes for v3.9-rc1 2013-03-03 12:06:09 -08:00
binfmt_elf_fdpic.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
binfmt_em86.c exec: use -ELOOP for max recursion depth 2012-12-17 17:15:23 -08:00
binfmt_flat.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
binfmt_misc.c fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
binfmt_script.c exec: do not leave bprm->interp on stack 2012-12-20 17:40:19 -08:00
binfmt_som.c get rid of pt_regs argument of ->load_binary() 2012-11-28 21:53:38 -05:00
bio-integrity.c block: Ues bi_pool for bio_integrity_alloc() 2012-09-09 10:35:38 +02:00
bio.c block: Avoid deadlocks with bio allocation by stacking drivers 2013-03-23 14:15:26 -07:00
block_dev.c Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block 2013-02-28 12:52:24 -08:00
buffer.c Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block 2013-02-28 12:52:24 -08:00
char_dev.c char_dev: pin parent kobject 2012-10-22 08:50:37 +03:00
compat.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
compat_binfmt_elf.c coredump: extend core dump note section to contain file names of mapped files 2012-10-06 03:05:17 +09:00
compat_ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
coredump.c coredump: remove redundant defines for dumpable states 2013-02-27 19:10:11 -08:00
coredump.h coredump: update coredump-related headers 2012-10-06 03:05:15 +09:00
dcache.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
dcookies.c
direct-io.c fs: Fix possible use-after-free with AIO 2013-02-22 23:31:36 -05:00
drop_caches.c
eventfd.c fs, eventfd: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
eventpoll.c epoll: prevent missed events on EPOLL_CTL_MOD 2013-01-02 09:16:43 -08:00
exec.c coredump: remove redundant defines for dumpable states 2013-02-27 19:10:11 -08:00
fcntl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
fhandle.c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux 2012-12-20 14:04:11 -08:00
fifo.c
file.c locking: Various static lock initializer fixes 2013-02-19 08:42:45 +01:00
file_table.c cache the value of file_inode() in struct file 2013-03-01 19:48:30 -05:00
filesystems.c fs: Limit sys_mount to only request filesystem modules. 2013-03-03 19:36:31 -08:00
fs-writeback.c 2 writeback fixes 2013-02-28 13:21:44 -08:00
fs_struct.c constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
generic_acl.c userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr 2012-09-18 01:01:35 -07:00
inode.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
internal.h constify path_get/path_put and fs_struct.c stuff 2013-03-01 23:51:07 -05:00
ioctl.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
ioprio.c
Kconfig fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig 2013-01-17 13:08:45 +01:00
Kconfig.binfmt coredump: make core dump functionality optional 2012-10-06 03:05:15 +09:00
libfs.c vfs: drop vmtruncate 2012-12-20 18:46:29 -05:00
locks.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
Makefile f2fs: update Kconfig and Makefile 2012-12-11 13:43:42 +09:00
mbcache.c
mount.h proc: Usable inode numbers for the namespace file descriptors. 2012-11-20 04:19:49 -08:00
mpage.c
namei.c vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlink 2013-03-08 09:03:07 -08:00
namespace.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
no-block.c
open.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
pipe.c vfs: fix pipe counter breakage 2013-03-12 08:29:17 -07:00
pnode.c
pnode.h vfs: Only support slave subtrees across different user namespaces 2012-11-19 05:59:20 -08:00
posix_acl.c userns: Convert vfs posix_acl support to use kuids and kgids 2012-09-18 01:01:35 -07:00
proc_namespace.c
read_write.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-03-02 08:34:06 -08:00
read_write.h compat: fs: Generic compat_sys_sendfile implementation 2012-10-02 21:35:55 -04:00
readdir.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
select.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
seq_file.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-03-03 13:23:03 -08:00
signalfd.c fs, epoll: add procfs fdinfo helper 2012-12-17 17:15:27 -08:00
splice.c export kernel_write(), convert open-coded instances 2013-02-26 02:46:11 -05:00
stack.c
stat.c switch vfs_getattr() to struct path 2013-02-26 02:46:08 -05:00
statfs.c vfs: fix user_statfs to retry once on ESTALE errors 2012-12-20 18:50:07 -05:00
super.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
sync.c new helper: file_inode(file) 2013-02-22 23:31:31 -05:00
timerfd.c compat: restore timerfd settime and gettime compat syscalls 2013-03-02 09:35:13 -05:00
utimes.c vfs: allow utimensat() calls to retry once on an ESTALE error 2012-12-20 18:50:08 -05:00
xattr.c vfs: make lremovexattr retry once on ESTALE error 2012-12-20 18:50:11 -05:00
xattr_acl.c userns: Fix posix_acl_file_xattr_userns gid conversion 2012-10-12 13:16:48 -07:00