kernel-fxtec-pro1x/fs
Douglas Anderson b6256c2966 bdev: Reduce time holding bd_mutex in sync in blkdev_close()
[ Upstream commit b849dd84b6ccfe32622988b79b7b073861fcf9f7 ]

While trying to "dd" to the block device for a USB stick, I
encountered a hung task warning (blocked for > 120 seconds).  I
managed to come up with an easy way to reproduce this on my system
(where /dev/sdb is the block device for my USB stick) with:

  while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done

With my reproduction here are the relevant bits from the hung task
detector:

 INFO: task udevd:294 blocked for more than 122 seconds.
 ...
 udevd           D    0   294      1 0x00400008
 Call trace:
  ...
  mutex_lock_nested+0x40/0x50
  __blkdev_get+0x7c/0x3d4
  blkdev_get+0x118/0x138
  blkdev_open+0x94/0xa8
  do_dentry_open+0x268/0x3a0
  vfs_open+0x34/0x40
  path_openat+0x39c/0xdf4
  do_filp_open+0x90/0x10c
  do_sys_open+0x150/0x3c8
  ...

 ...
 Showing all locks held in the system:
 ...
 1 lock held by dd/2798:
  #0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
 ...
 dd              D    0  2798   2764 0x00400208
 Call trace:
  ...
  schedule+0x8c/0xbc
  io_schedule+0x1c/0x40
  wait_on_page_bit_common+0x238/0x338
  __lock_page+0x5c/0x68
  write_cache_pages+0x194/0x500
  generic_writepages+0x64/0xa4
  blkdev_writepages+0x24/0x30
  do_writepages+0x48/0xa8
  __filemap_fdatawrite_range+0xac/0xd8
  filemap_write_and_wait+0x30/0x84
  __blkdev_put+0x88/0x204
  blkdev_put+0xc4/0xe4
  blkdev_close+0x28/0x38
  __fput+0xe0/0x238
  ____fput+0x1c/0x28
  task_work_run+0xb0/0xe4
  do_notify_resume+0xfc0/0x14bc
  work_pending+0x8/0x14

The problem appears related to the fact that my USB disk is terribly
slow and that I have a lot of RAM in my system to cache things.
Specifically my writes seem to be happening at ~15 MB/s and I've got
~4 GB of RAM in my system that can be used for buffering.  To write 4
GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.

The 267 second number is a problem because in __blkdev_put() we call
sync_blockdev() while holding the bd_mutex.  Any other callers who
want the bd_mutex will be blocked for the whole time.

The problem is made worse because I believe blkdev_put() specifically
tells other tasks (namely udev) to go try to access the device at right
around the same time we're going to hold the mutex for a long time.

Putting some traces around this (after disabling the hung task detector),
I could confirm:
 dd:    437.608600: __blkdev_put() right before sync_blockdev() for sdb
 udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
 dd:    661.468451: __blkdev_put() right after sync_blockdev() for sdb
 udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb

A simple fix for this is to realize that sync_blockdev() works fine if
you're not holding the mutex.  Also, it's not the end of the world if
you sync a little early (though it can have performance impacts).
Thus we can make a guess that we're going to need to do the sync and
then do it without holding the mutex.  We still do one last sync with
the mutex but it should be much, much faster.

With this, my hung task warnings for my test case are gone.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:42 +02:00
..
9p 9p: Fix memory leak in v9fs_mount 2020-08-19 08:15:06 +02:00
adfs fs/adfs: super: fix use-after-free bug 2019-08-06 19:06:49 +02:00
affs affs: fix basic permission bits to actually work 2020-09-09 19:04:30 +02:00
afs afs: Fix NULL deref in afs_dynroot_depopulate() 2020-08-26 10:31:05 +02:00
autofs autofs: fix a leak in autofs_expire_indirect() 2019-12-13 08:51:01 +01:00
befs fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
bfs bfs: add sanity check at bfs_fill_super() 2018-12-01 09:37:27 +01:00
btrfs btrfs: fix wrong address when faulting in pages in the search ioctl 2020-09-17 13:45:28 +02:00
cachefiles cachefiles: Fix race between read_waiter and read_copier involving op->to_do 2020-06-03 08:19:29 +02:00
ceph ceph: ensure we have a new cap before continuing in fill_inode 2020-10-01 13:14:31 +02:00
cifs CIFS: Properly process SMB3 lease breaks 2020-10-01 13:14:29 +02:00
coda coda: add error handling for fget 2019-08-06 19:06:51 +02:00
configfs configfs: fix config_item refcnt leak in configfs_rmdir() 2020-05-27 17:37:32 +02:00
cramfs Cramfs: fix abad comparison when wrap-arounds occur 2018-11-13 11:08:55 -08:00
crypto fscrypt: clean up some BUG_ON()s in block encryption/decryption 2019-07-26 09:14:02 +02:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-05-08 07:21:48 +02:00
devpts fs/devpts: always delete dcache dentry-s in dput() 2019-03-23 20:09:59 +01:00
dlm dlm: Fix kobject memleak 2020-08-19 08:15:02 +02:00
ecryptfs ecryptfs: replace BUG_ON with error handling code 2020-02-28 16:38:59 +01:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exofs exofs_mount(): fix leaks on failure exits 2019-12-05 09:20:32 +01:00
exportfs exportfs: fix 'passing zero to ERR_PTR()' warning 2020-01-27 14:50:02 +01:00
ext2 ext2: don't update mtime on COW faults 2020-09-09 19:04:28 +02:00
ext4 ext4: mark block bitmap corrupted when found instead of BUGON 2020-10-01 13:14:37 +02:00
f2fs f2fs: Return EOF on unaligned end of file DIO read 2020-09-23 12:10:59 +02:00
fat fat: don't allow to mount if the FAT length == 0 2020-06-22 09:05:08 +02:00
freevxfs freevxfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
fscache fscache: fix race between enablement and dropping of object 2018-12-17 09:24:40 +01:00
fuse fuse: fix weird page warning 2020-07-29 10:16:46 +02:00
gfs2 gfs2: clean up iopen glock mess in gfs2_create_inode 2020-10-01 13:14:28 +02:00
hfs fs/hfs/extent.c: fix array out of bounds read of array extent 2019-12-01 09:17:10 +01:00
hfsplus hfsplus: fix crash and filesystem corruption when deleting files 2020-04-17 10:48:52 +02:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs hpfs: remove unnecessary checks on the value of r when assigning error code 2018-08-25 12:42:33 -07:00
hugetlbfs hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
isofs isofs: reject hardware sector size > 2048 bytes 2018-08-21 11:37:41 +02:00
jbd2 jbd2: abort journal if free a async write error metadata buffer 2020-09-03 11:24:24 +02:00
jffs2 jffs2: fix UAF problem 2020-08-26 10:31:01 +02:00
jfs jfs: fix bogus variable self-initialization 2020-01-27 14:50:33 +01:00
kernfs kernfs: fix ino wrap-around detection 2019-12-13 08:52:43 +01:00
lockd lockd: fix decoding of TEST results 2019-12-13 08:51:59 +01:00
minix fs/minix: remove expected error message in block_to_path() 2020-08-21 11:05:38 +02:00
nfs NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests() 2020-10-01 13:14:41 +02:00
nfs_common
nfsd nfsd: Don't add locks to closed or closing open stateids 2020-10-01 13:14:38 +02:00
nilfs2 nilfs2: fix null pointer dereference at nilfs_segctor_do_construct() 2020-06-22 09:05:03 +02:00
nls
notify fanotify: fix ignore mask logic for events on child and on dir 2020-06-30 23:17:00 -04:00
ntfs ntfs: mft: remove VLA usage 2018-08-17 16:20:27 -07:00
ocfs2 ocfs2: change slot number type s16 to u16 2020-08-21 11:05:33 +02:00
omfs omfs_lookup(): report IO errors, use d_splice_alias() 2018-05-22 14:27:58 -04:00
openpromfs openpromfs: switch to d_splice_alias() 2018-05-22 14:27:57 -04:00
orangefs help_next should increase position index 2020-02-24 08:34:53 +01:00
overlayfs ovl: fix unneeded call to ovl_change_flags() 2020-07-22 09:32:10 +02:00
proc proc: Use new_inode not new_inode_pseudo 2020-06-22 09:05:06 +02:00
pstore pstore: Fix linking when crypto API disabled 2020-08-19 08:15:04 +02:00
qnx4 qnx4_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
qnx6 qnx6_lookup: switch to d_splice_alias() 2018-05-22 14:27:54 -04:00
quota fs: avoid softlockups in s_inodes iterators 2020-01-12 12:17:20 +01:00
ramfs
reiserfs reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() 2020-02-24 08:34:52 +01:00
romfs romfs: fix uninitialized memory leak in romfs_dev_read() 2020-08-26 10:30:59 +02:00
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs Driver core patches for 4.19-rc1 2018-08-18 11:44:53 -07:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-12-17 09:24:30 +01:00
tracefs tracefs: Annotate tracefs_ops with __ro_after_init 2018-07-31 11:32:44 -04:00
ubifs ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len 2020-10-01 13:14:40 +02:00
udf udf: Fix free space reporting for metadata and virtual partitions 2020-02-24 08:34:45 +01:00
ufs fs/ufs: avoid potential u32 multiplication overflow 2020-08-21 11:05:38 +02:00
xfs xfs: mark dir corrupt when lookup-by-hash fails 2020-10-01 13:14:37 +02:00
aio.c aio: fix async fsync creds 2020-06-22 09:05:01 +02:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c fs: Fix attr.c kernel-doc 2018-07-03 16:44:45 -04:00
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c
binfmt_elf.c fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info() 2020-06-03 08:19:41 +02:00
binfmt_elf_fdpic.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make load_flat_shared_library() work 2019-07-03 13:14:44 +02:00
binfmt_misc.c turn filp_clone_open() into inline wrapper for dentry_open() 2018-07-10 23:29:03 -04:00
binfmt_script.c exec: load_script: Do not exec truncated interpreter path 2019-11-06 13:05:37 +01:00
block_dev.c bdev: Reduce time holding bd_mutex in sync in blkdev_close() 2020-10-01 13:14:42 +02:00
buffer.c fs: prevent BUG_ON in submit_bh_wbc() 2020-09-03 11:24:24 +02:00
char_dev.c chardev: Avoid potential use-after-free in 'chrdev_open()' 2020-01-14 20:06:57 +01:00
compat.c ncpfs: remove compat functionality 2018-06-05 19:23:26 +02:00
compat_binfmt_elf.c
compat_ioctl.c fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP 2020-01-09 10:19:07 +01:00
coredump.c coredump: fix crash when umh is disabled 2020-05-14 07:57:21 +02:00
d_path.c
dax.c dax: pass NOWAIT flag to iomap_apply 2020-03-05 16:42:12 +01:00
dcache.c fix dget_parent() fastpath race 2020-10-01 13:14:27 +02:00
dcookies.c
direct-io.c direct-io: allow direct writes to empty inodes 2019-03-05 17:58:50 +01:00
drop_caches.c fs: avoid softlockups in s_inodes iterators 2020-01-12 12:17:20 +01:00
eventfd.c eventfd: track eventfd_signal() recursion depth 2020-02-11 04:34:08 -08:00
eventpoll.c fix regression in "epoll: Keep a reference on files added to the check list" 2020-09-09 19:04:27 +02:00
exec.c exec: Move would_dump into flush_old_exec 2020-05-20 08:18:50 +02:00
fcntl.c signal: Don't send signals to tasks that don't exist 2018-08-15 23:03:20 -05:00
fhandle.c
file.c fix multiplication overflow in copy_fdtable() 2020-05-27 17:37:29 +02:00
file_table.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
filesystems.c fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() 2020-04-17 10:48:51 +02:00
fs-writeback.c writeback: Fix sync livelock due to b_dirty_time processing 2020-09-03 11:24:28 +02:00
fs_pin.c
fs_struct.c
inode.c futex: Fix inode life-time issue 2020-03-25 08:06:14 +01:00
internal.h acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
ioctl.c vfs: fix FIGETBSZ ioctl on an overlayfs file 2018-11-21 09:19:14 +01:00
iomap.c iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) 2019-12-13 08:52:56 +01:00
Kconfig autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
libfs.c libfs: fix infoleak in simple_attr_read() 2020-04-02 15:28:21 +02:00
locks.c locks: print unsigned ino in /proc/locks 2020-01-09 10:19:00 +01:00
Makefile autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
mbcache.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
mount.h
mpage.c mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
namei.c namei: only return -ECHILD from follow_dotdot_rcu() 2020-03-05 16:42:20 +01:00
namespace.c fs/namespace.c: fix mountpoint reference counter race 2020-04-29 16:31:26 +02:00
no-block.c
nsfs.c dcache: sort the freeing-without-RCU-delay mess for good. 2019-05-25 18:23:26 +02:00
open.c cifs_atomic_open(): fix double-put on late allocation failure 2020-03-18 07:14:21 +01:00
pipe.c fs: prevent page refcount overflow in pipe_buf_get 2019-05-04 09:20:11 +02:00
pnode.c propagate_one(): mnt_set_mountpoint() needs mount_lock 2020-05-02 17:26:01 +02:00
pnode.h
posix_acl.c
proc_namespace.c
read_write.c vfs: avoid problematic remapping requests into partial EOF block 2019-12-01 09:17:04 +01:00
readdir.c filldir[64]: remove WARN_ON_ONCE() for bad directory entries 2020-01-04 19:13:26 +01:00
select.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
seq_file.c seq_file: fix problem when seeking mid-record 2019-08-25 10:47:43 +02:00
signalfd.c fs/signalfd.c: fix inconsistent return codes for signalfd4 2020-08-26 10:31:02 +02:00
splice.c splice: only read in as much information as there is pipe buffer space 2019-12-17 20:35:43 +01:00
stack.c
stat.c
statfs.c vfs: Fix EOVERFLOW testing in put_compat_statfs64 2019-10-11 18:21:39 +02:00
super.c Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax 2018-08-26 11:48:42 -07:00
sync.c
timerfd.c Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 20:56:23 -07:00
userfaultfd.c userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK 2020-01-04 19:13:18 +01:00
utimes.c
xattr.c xattr: break delegations in {set,remove}xattr 2020-08-11 15:32:34 +02:00