kernel-fxtec-pro1x

Author	SHA1	Message	Date
Greg Kroah-Hartman	af4136af6b	This is the 4.19.145 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl9ctJoACgkQONu9yGCS aT408xAAgPLoJt28UWiyWqcMQyQgvZQVJ8Y5nD66pZ53PtcY2V7MMLbeXFY8I8Xy U0p5HnGEHeYUHsXcZo7nNlkg+aBFZw9kBWne9OExPWw4NTjGmUXs2AusRSFkEY22 sLYPXwnStOZzyuakAPRJ2Q3EEz2+tdhZiTMOeAT1BKqcLfNBebsZjNgc7i4ogFty UaGYnrhDzYcj0jf1uzPNeANXqK2czvbB1d/KLdUKINvaCaSpw65xfjhfjciAlRTH 3uGINBAOw5npYMBK3nNW6hgNNeyKvVVRbZf239x7d6xzQnVNfEgAdQ1Cg2lbTgDB M9I8uO5jRM8i3mKNdjffuBnfagCJwlVs56G6Unp8dt8f8vOncVZ+B7hTPLaCUPN/ LVbF4QYGWXBE7V8VVtPMoKz1V5FgTTeshQsRmu6EXihxQK2HgPnV2HUyfkt132kI arWkeW3Qsl2TmGBk0nCXbikWAp2E6ToiC2dX0MWsyA2US6DqaZvIyaqmI6E9QCPM o+PJwzePcUD7CUnOqxWN6OHulQ9FNBRs0BUdxwcNIU4YO8ne6lx2rh95MtQk4Bvf qEFYZFHZDLtJojsY5bOQ9z+ChowKyMDcZVY+/X+C1PEDq7cjlwdLfJWoF5PIltb9 /OjQH3Ox9yBw1ECA4F4ObcnTEs8bd8/VnfMayLpMhVrlh5kLAAY= =hMRJ -----END PGP SIGNATURE----- Merge 4.19.145 into android-4.19-stable Changes in 4.19.145 ALSA; firewire-tascam: exclude Tascam FE-8 from detection block: ensure bdi->io_pages is always initialized netlabel: fix problems with mapping removal net: usb: dm9601: Add USB ID of Keenetic Plus DSL sctp: not disable bh in the whole sctp_get_port_local() tipc: fix shutdown() of connectionless socket net: disable netpoll on fresh napis net/mlx5e: Don't support phys switch id if not in switchdev mode Linux 4.19.145 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I84b193f4c87a87efbba84219856368d3cfba907b	2020-09-12 14:23:25 +02:00
Jens Axboe	732fd460bb	block: ensure bdi->io_pages is always initialized [ Upstream commit de1b0ee490eafdf65fac9eef9925391a8369f2dc ] If a driver leaves the limit settings as the defaults, then we don't initialize bdi->io_pages. This means that file systems may need to work around bdi->io_pages == 0, which is somewhat messy. Initialize the default value just like we do for ->ra_pages. Cc: stable@vger.kernel.org Fixes: `9491ae4aad` ("mm: don't cap request size based on read-ahead setting") Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-09-12 13:40:22 +02:00
Satya Tangirala	b01c73ea71	BACKPORT: FROMLIST: Update Inline Encryption from v5 to v6 of patch series Changes v5 => v6: - Blk-crypto's kernel crypto API fallback is no longer restricted to 8-byte DUNs. It's also now separately configurable from blk-crypto, and can be disabled entirely, while still allowing the kernel to use inline encryption hardware. Further, struct bio_crypt_ctx takes up less space, and no longer contains the information needed by the crypto API fallback - the fallback allocates the required memory when necessary. - Blk-crypto now supports all file content encryption modes supported by fscrypt. - Fixed bio merging logic in blk-merge.c - Fscrypt now supports inline encryption with the direct key policy, since blk-crypto now has support for larger DUNs. - Keyslot manager now uses a hashtable to lookup which keyslot contains any particular key (thanks Eric!) - Fscrypt support for inline encryption now handles filesystems with multiple underlying block devices (thanks Eric!) - Numerous cleanups Bug: 137270441 Test: refer to I26376479ee38259b8c35732cb3a1d7e15f9b05a3 Change-Id: I13e2e327e0b4784b394cb1e7cf32a04856d95f01 Link: https://lore.kernel.org/linux-block/20191218145136.172774-1-satyat@google.com/ Signed-off-by: Satya Tangirala <satyat@google.com>	2020-01-13 07:11:38 -08:00
Greg Kroah-Hartman	2700cf837e	This is the 4.19.87 stable release -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3jdysACgkQONu9yGCS aT49mhAAxl50oRh0bSk/SikbVCn7kRaoRNlCBsPtvtvbDDIpyREIzMRmIbH2OZjS ji1umUu8iMj+HXW72dWNqPo2K337UzcNRsUXWAdJH8Et+ao+xOUV1jps/Zr3D9ca 2Cw6iTn2QFhKQythMlCxb30sf+kN4cAU1XY3M8xEWXB+4nAqc/aFW7mRAP1jusRb 1DAW+xqPiwCbaag1v5OzumAOGBhpmTcX8sfEYM+3DKcPgGL1jPyYeWlXA26nih8/ LQR6r2tAb454pipV0uApJ2u7V5nNxprcrfUNDmAfap2q/eF1w5pBbZoS5sqpf1eZ 2ycZ36w0ThE7lJKvNrfjq13Su+bGtpENxHlwesNPbsvz0F4xoEkelYSCE1gJBaHX CZvq1Dhrk5DBvkCCElV8+CJxxuhMUZwzOwrz2iBLdPnpCpSgj8uNPLXMJno6D9fH PZMCcBFf4WnCUBc06fB7qG+Z7y0TeAuLsNQmK3zYQoEc1gz4Yk5aFo7NgTyajqbD YKINVP2Wj11TDBssolIA58EYcc2J38As54wuOvYtwy+k/mVkvZVhCPOtI+h3UYm4 lX2ROCHTzt+Av5qFlM8aSBcIlm1qihzSHEdnyqX2EZvUGC4C5Mc5/Eml3QJxAnVh SzUfLZGzzfntpnWn2cJZ/EA/p6hXujG5k95LEwtAnxxYBFpLTKc= =d8kA -----END PGP SIGNATURE----- Merge 4.19.87 into android-4.19 Changes in 4.19.87 mlxsw: spectrum_router: Fix determining underlay for a GRE tunnel net/mlx4_en: fix mlx4 ethtool -N insertion net/mlx4_en: Fix wrong limitation for number of TX rings net: rtnetlink: prevent underflows in do_setvfinfo() net/sched: act_pedit: fix WARN() in the traffic path net: sched: ensure opts_len <= IP_TUNNEL_OPTS_MAX in act_tunnel_key sfc: Only cancel the PPS workqueue if it exists net/mlx5e: Fix set vf link state error flow net/mlxfw: Verify FSM error code translation doesn't exceed array size net/mlx5: Fix auto group size calculation vhost/vsock: split packets to send using multiple buffers gpio: max77620: Fixup debounce delays tools: gpio: Correctly add make dependencies for gpio_utils nbd:fix memory leak in nbd_get_socket() virtio_console: allocate inbufs in add_port() only if it is needed Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()" mm/ksm.c: don't WARN if page is still mapped in remove_stable_node() drm/amd/powerplay: issue no PPSMC_MSG_GetCurrPkgPwr on unsupported ASICs drm/i915/pmu: "Frequency" is reported as accumulated cycles drm/i915/userptr: Try to acquire the page lock around set_page_dirty() mwifiex: Fix NL80211_TX_POWER_LIMITED ALSA: isight: fix leak of reference to firewire unit in error path of .probe callback crypto: testmgr - fix sizeof() on COMP_BUF_SIZE printk: lock/unlock console only for new logbuf entries printk: fix integer overflow in setup_log_buf() pinctrl: madera: Fix uninitialized variable bug in madera_mux_set_mux PCI: cadence: Write MSI data with 32bits gfs2: Fix marking bitmaps non-full pty: fix compat ioctls synclink_gt(): fix compat_ioctl() powerpc: Fix signedness bug in update_flash_db() powerpc/boot: Fix opal console in boot wrapper powerpc/boot: Disable vector instructions powerpc/eeh: Fix null deref for devices removed during EEH powerpc/eeh: Fix use of EEH_PE_KEEP on wrong field EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr() mt76: do not store aggregation sequence number for null-data frames mt76x0: phy: fix restore phase in mt76x0_phy_recalibrate_after_assoc brcmsmac: AP mode: update beacon when TIM changes ath10k: set probe request oui during driver start ath10k: allocate small size dma memory in ath10k_pci_diag_write_mem skd: fixup usage of legacy IO API cdrom: don't attempt to fiddle with cdo->capability spi: sh-msiof: fix deferred probing mmc: mediatek: fill the actual clock for mmc debugfs mmc: mediatek: fix cannot receive new request when msdc_cmd_is_ready fail PCI: mediatek: Fix class type for MT7622 to PCI_CLASS_BRIDGE_PCI btrfs: defrag: use btrfs_mod_outstanding_extents in cluster_pages_for_defrag btrfs: handle error of get_old_root gsmi: Fix bug in append_to_eventlog sysfs handler misc: mic: fix a DMA pool free failure w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size). m68k: fix command-line parsing when passed from u-boot scsi: hisi_sas: Feed back linkrate(max/min) when re-attached scsi: hisi_sas: Fix the race between IO completion and timeout for SMP/internal IO scsi: hisi_sas: Free slot later in slot_complete_vx_hw() RDMA/bnxt_re: Avoid NULL check after accessing the pointer RDMA/bnxt_re: Fix qp async event reporting RDMA/bnxt_re: Avoid resource leak in case the NQ registration fails pinctrl: sunxi: Fix a memory leak in 'sunxi_pinctrl_build_state()' pwm: lpss: Only set update bit if we are actually changing the settings amiflop: clean up on errors during setup qed: Align local and global PTT to propagate through the APIs. scsi: ips: fix missing break in switch nfp: bpf: protect against mis-initializing atomic counters KVM: nVMX: reset cache/shadows when switching loaded VMCS KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode() KVM/x86: Fix invvpid and invept register operand size in 64-bit mode clk: tegra: Fixes for MBIST work around scsi: isci: Use proper enumerated type in atapi_d2h_reg_frame_handler scsi: isci: Change sci_controller_start_task's return type to sci_status scsi: bfa: Avoid implicit enum conversion in bfad_im_post_vendor_event scsi: iscsi_tcp: Explicitly cast param in iscsi_sw_tcp_host_get_param crypto: ccree - avoid implicit enum conversion nvmet: avoid integer overflow in the discard code nvmet-fcloop: suppress a compiler warning nvme-pci: fix hot removal during error handling PCI: mediatek: Fixup MSI enablement logic by enabling MSI before clocks clk: mmp2: fix the clock id for sdh2_clk and sdh3_clk clk: at91: audio-pll: fix audio pmc type ASoC: tegra_sgtl5000: fix device_node refcounting scsi: dc395x: fix dma API usage in srb_done scsi: dc395x: fix DMA API usage in sg_update_list scsi: zorro_esp: Limit DMA transfers to 65535 bytes net: dsa: mv88e6xxx: Fix 88E6141/6341 2500mbps SERDES speed net: fix warning in af_unix net: ena: Fix Kconfig dependency on X86 xfs: fix use-after-free race in xfs_buf_rele xfs: clear ail delwri queued bufs on unmount of shutdown fs kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack ACPI / scan: Create platform device for INT33FE ACPI nodes PM / Domains: Deal with multiple states but no governor in genpd ALSA: i2c/cs8427: Fix int to char conversion macintosh/windfarm_smu_sat: Fix debug output PCI: vmd: Detach resources after stopping root bus USB: misc: appledisplay: fix backlight update_status return code usbip: tools: fix atoi() on non-null terminated string sctp: use sk_wmem_queued to check for writable space dm raid: avoid bitmap with raid4/5/6 journal device selftests/bpf: fix file resource leak in load_kallsyms SUNRPC: Fix a compile warning for cmpxchg64() sunrpc: safely reallow resvport min/max inversion atm: zatm: Fix empty body Clang warnings s390/perf: Return error when debug_register fails swiotlb: do not panic on mapping failures spi: omap2-mcspi: Set FIFO DMA trigger level to word length x86/intel_rdt: Prevent pseudo-locking from using stale pointers sparc: Fix parport build warnings. scsi: hisi_sas: Fix NULL pointer dereference powerpc/pseries: Export raw per-CPU VPA data via debugfs powerpc/mm/radix: Fix off-by-one in split mapping logic powerpc/mm/radix: Fix overuse of small pages in splitting logic powerpc/mm/radix: Fix small page at boundary when splitting powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd selftests/bpf: fix return value comparison for tests in test_libbpf.sh tools: bpftool: fix completion for "bpftool map update" ceph: fix dentry leak in ceph_readdir_prepopulate ceph: only allow punch hole mode in fallocate rtc: s35390a: Change buf's type to u8 in s35390a_init RISC-V: Avoid corrupting the upper 32-bit of phys_addr_t in ioremap thermal: armada: fix a test in probe() f2fs: fix to spread clear_cold_data() f2fs: spread f2fs_set_inode_flags() mISDN: Fix type of switch control variable in ctrl_teimanager qlcnic: fix a return in qlcnic_dcb_get_capability() net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode mfd: arizona: Correct calling of runtime_put_sync mfd: mc13xxx-core: Fix PMIC shutdown when reading ADC values mfd: intel_soc_pmic_bxtwc: Chain power button IRQs as well mfd: max8997: Enale irq-wakeup unconditionally net: socionext: Stop PHY before resetting netsec fs/cifs: fix uninitialised variable warnings spi: uniphier: fix incorrect property items selftests/ftrace: Fix to test kprobe $comm arg only if available selftests: watchdog: fix message when /dev/watchdog open fails selftests: watchdog: Fix error message. selftests: kvm: Fix -Wformat warnings selftests: fix warning: "_GNU_SOURCE" redefined thermal: rcar_thermal: fix duplicate IRQ request thermal: rcar_thermal: Prevent hardware access during system suspend net: ethernet: cadence: fix socket buffer corruption problem bpf: devmap: fix wrong interface selection in notifier_call bpf, btf: fix a missing check bug in btf_parse powerpc/process: Fix flush_all_to_thread for SPE sparc64: Rework xchg() definition to avoid warnings. arm64: lib: use C string functions with KASAN enabled fs/ocfs2/dlm/dlmdebug.c: fix a sleep-in-atomic-context bug in dlm_print_one_mle() mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition macsec: update operstate when lower device changes macsec: let the administrator set UP state even if lowerdev is down block: fix the DISCARD request merge i2c: uniphier-f: make driver robust against concurrency i2c: uniphier-f: fix occasional timeout error i2c: uniphier-f: fix race condition when IRQ is cleared um: Make line/tty semantics use true write IRQ vfs: avoid problematic remapping requests into partial EOF block ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12 powerpc/xmon: Relax frame size for clang selftests/powerpc/ptrace: Fix out-of-tree build selftests/powerpc/signal: Fix out-of-tree build selftests/powerpc/switch_endian: Fix out-of-tree build selftests/powerpc/cache_shape: Fix out-of-tree build block: call rq_qos_exit() after queue is frozen mm/gup_benchmark.c: prevent integer overflow in ioctl linux/bitmap.h: handle constant zero-size bitmaps correctly linux/bitmap.h: fix type of nbits in bitmap_shift_right() lib/bitmap.c: fix remaining space computation in bitmap_print_to_pagebuf hfsplus: fix BUG on bnode parent update hfs: fix BUG on bnode parent update hfsplus: prevent btree data loss on ENOSPC hfs: prevent btree data loss on ENOSPC hfsplus: fix return value of hfsplus_get_block() hfs: fix return value of hfs_get_block() hfsplus: update timestamps on truncate() hfs: update timestamp on truncate() fs/hfs/extent.c: fix array out of bounds read of array extent kernel/panic.c: do not append newline to the stack protector panic string mm/memory_hotplug: make add_memory() take the device_hotplug_lock mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock powerpc/powernv: hold device_hotplug_lock when calling device_online() igb: shorten maximum PHC timecounter update interval fm10k: ensure completer aborts are marked as non-fatal after a resume net: hns3: bugfix for buffer not free problem during resetting net: hns3: bugfix for reporting unknown vector0 interrupt repeatly problem net: hns3: bugfix for is_valid_csq_clean_head() net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read ntb_netdev: fix sleep time mismatch ntb: intel: fix return value for ndev_vec_mask() irq/matrix: Fix memory overallocation nvme-pci: fix conflicting p2p resource adds arm64: makefile fix build of .i file in external module case tools/power turbosat: fix AMD APIC-id output mm: handle no memcg case in memcg_kmem_charge() properly ocfs2: without quota support, avoid calling quota recovery ocfs2: don't use iocb when EIOCBQUEUED returns ocfs2: don't put and assigning null to bh allocated outside ocfs2: fix clusters leak in ocfs2_defrag_extent() net: do not abort bulk send on BQL status sched/topology: Fix off by one bug sched/fair: Don't increase sd->balance_interval on newidle balance openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS ARM: dts: imx6sx-sdb: Fix enet phy regulator clk: sunxi-ng: enable so-said LDOs for A64 SoC's pll-mipi clock soc: bcm: brcmstb: Fix re-entry point with a THUMB2_KERNEL audit: print empty EXECVE args sock_diag: fix autoloading of the raw_diag module net: bpfilter: fix iptables failure if bpfilter_umh is disabled nds32: Fix bug in bitfield.h media: ov13858: Check for possible null pointer btrfs: avoid link error with CONFIG_NO_AUTO_INLINE wil6210: fix debugfs memory access alignment wil6210: fix L2 RX status handling wil6210: fix RGF_CAF_ICR address for Talyn-MB wil6210: fix locking in wmi_call ath10k: snoc: fix unbalanced clock error handling wlcore: Fix the return value in case of error in 'wlcore_vendor_cmd_smart_config_start()' rtl8xxxu: Fix missing break in switch brcmsmac: never log "tid x is not agg'able" by default wireless: airo: potential buffer overflow in sprintf() rtlwifi: rtl8192de: Fix misleading REG_MCUFWDL information net: dsa: bcm_sf2: Turn on PHY to allow successful registration scsi: mpt3sas: Fix Sync cache command failure during driver unload scsi: mpt3sas: Don't modify EEDPTagMode field setting on SAS3.5 HBA devices scsi: mpt3sas: Fix driver modifying persistent data in Manufacturing page11 scsi: megaraid_sas: Fix msleep granularity scsi: megaraid_sas: Fix goto labels in error handling scsi: lpfc: fcoe: Fix link down issue after 1000+ link bounces scsi: lpfc: Fix odd recovery in duplicate FLOGIs in point-to-point scsi: lpfc: Correct loss of fc4 type on remote port address change usb: typec: tcpm: charge current handling for sink during hard reset dlm: fix invalid free dlm: don't leak kernel pointer to userspace vrf: mark skb for multicast or link-local as enslaved to VRF clk: tegra20: Turn EMC clock gate into divider ACPICA: Use %d for signed int print formatting instead of %u net: bcmgenet: return correct value 'ret' from bcmgenet_power_down of: unittest: allow base devicetree to have symbol metadata of: unittest: initialize args before calling of_parse_() tools: bpftool: pass an argument to silence open_obj_pinned() cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces pinctrl: qcom: spmi-gpio: fix gpio-hog related boot issues pinctrl: bcm2835: Use define directive for BCM2835_PINCONF_PARAM_PULL pinctrl: lpc18xx: Use define directive for PIN_CONFIG_GPIO_PIN_INT pinctrl: zynq: Use define directive for PIN_CONFIG_IO_STANDARD PCI: keystone: Use quirk to limit MRRS for K2G nvme-pci: fix surprise removal spi: omap2-mcspi: Fix DMA and FIFO event trigger size mismatch i2c: uniphier-f: fix timeout error after reading 8 bytes mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lock ipv6: Fix handling of LLA with VRF and sockets bound to VRF cfg80211: call disconnect_wk when AP stops mm/page_io.c: do not free shared swap slots Bluetooth: Fix invalid-free in bcsp_close() KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved ath10k: Fix a NULL-ptr-deref bug in ath10k_usb_alloc_urb_from_pipe ath9k_hw: fix uninitialized variable data md/raid10: prevent access of uninitialized resync_pages offset mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span() net: phy: dp83867: fix speed 10 in sgmii mode net: phy: dp83867: increase SGMII autoneg timer duration ocfs2: remove ocfs2_is_o2cb_active() ARM: 8904/1: skip nomap memblocks while finding the lowmem/highmem boundary ARC: perf: Accommodate big-endian CPU x86/insn: Fix awk regexp warnings x86/speculation: Fix incorrect MDS/TAA mitigation status x86/speculation: Fix redundant MDS mitigation message nbd: prevent memory leak y2038: futex: Move compat implementation into futex.c futex: Prevent robust futex exit race ALSA: usb-audio: Fix NULL dereference at parsing BADD nfc: port100: handle command failure cleanly media: vivid: Set vid_cap_streaming and vid_out_streaming to true media: vivid: Fix wrong locking that causes race conditions on streaming stop media: usbvision: Fix races among open, close, and disconnect cpufreq: Add NULL checks to show() and store() methods of cpufreq media: uvcvideo: Fix error path in control parsing failure media: b2c2-flexcop-usb: add sanity checking media: cxusb: detect cxusb_ctrl_msg error in query media: imon: invalid dereference in imon_touch_event virtio_ring: fix return code on DMA mapping fails USBIP: add config dependency for SGL_ALLOC usbip: tools: fix fd leakage in the function of read_attr_usbip_status usbip: Fix uninitialized symbol 'nents' in stub_recv_cmd_submit() usb-serial: cp201x: support Mark-10 digital force gauge USB: chaoskey: fix error case of a timeout appledisplay: fix error handling in the scheduled work USB: serial: mos7840: add USB ID to support Moxa UPort 2210 USB: serial: mos7720: fix remote wakeup USB: serial: mos7840: fix remote wakeup USB: serial: option: add support for DW5821e with eSIM support USB: serial: option: add support for Foxconn T77W968 LTE modules staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error powerpc/64s: support nospectre_v2 cmdline option powerpc/book3s64: Fix link stack flush on context switch KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel PM / devfreq: Fix kernel oops on governor module load Linux 4.19.87 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Id8c8d4cd92227f8f46c48c05440e09da957fa687	2019-12-01 09:53:43 +01:00
Ming Lei	9663d294ae	block: call rq_qos_exit() after queue is frozen [ Upstream commit c57cdf7a9e51d97a43e29b8f4a04157875104000 ] rq_qos_exit() removes the current q->rq_qos, this action has to be done after queue is frozen, otherwise the IO queue path may never be waken up, then IO hang is caused. So fixes this issue by moving rq_qos_exit() after queue is frozen. Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-01 09:17:06 +01:00
Satya Tangirala	392ad89e96	BACKPORT: FROMLIST: block: blk-crypto for Inline Encryption We introduce blk-crypto, which manages programming keyslots for struct bios. With blk-crypto, filesystems only need to call bio_crypt_set_ctx with the encryption key, algorithm and data_unit_num; they don't have to worry about getting a keyslot for each encryption context, as blk-crypto handles that. Blk-crypto also makes it possible for layered devices like device mapper to make use of inline encryption hardware. Blk-crypto delegates crypto operations to inline encryption hardware when available, and also contains a software fallback to the kernel crypto API. For more details, refer to Documentation/block/inline-encryption.rst. Bug: 137270441 Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec Change-Id: I6a98e518e5de50f1d4110441568ecd142a02e900 Signed-off-by: Satya Tangirala <satyat@google.com> Link: https://patchwork.kernel.org/patch/11214731/	2019-11-14 14:47:49 -08:00
Satya Tangirala	20efc30a3e	BACKPORT: FROMLIST: block: Add encryption context to struct bio We must have some way of letting a storage device driver know what encryption context it should use for en/decrypting a request. However, it's the filesystem/fscrypt that knows about and manages encryption contexts. As such, when the filesystem layer submits a bio to the block layer, and this bio eventually reaches a device driver with support for inline encryption, the device driver will need to have been told the encryption context for that bio. We want to communicate the encryption context from the filesystem layer to the storage device along with the bio, when the bio is submitted to the block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can represent an encryption context (note that we can't use the bi_private field in struct bio to do this because that field does not function to pass information across layers in the storage stack). We also introduce various functions to manipulate the bio_crypt_ctx and make the bio/request merging logic aware of the bio_crypt_ctx. Bug: 137270441 Test: tested as series; see I26aac0ac7845a9064f28bb1421eb2522828a6dec Change-Id: I16d99bb97f8cd7971cc11281a0d7120c5f87d83c Signed-off-by: Satya Tangirala <satyat@google.com> Link: https://patchwork.kernel.org/patch/11214719/	2019-11-14 14:47:49 -08:00
Johannes Weiner	1bae2175c2	BACKPORT: block: annotate refault stalls from IO submission psi tracks the time tasks wait for refaulting pages to become uptodate, but it does not track the time spent submitting the IO. The submission part can be significant if backing storage is contended or when cgroup throttling (io.latency) is in effect - a lot of time is spent in submit_bio(). In that case, we underreport memory pressure. Annotate submit_bio() to account submission time as memory stall when the bio is reading userspace workingset pages. Tested-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit b8e24a9300b0836a9d39f6b20746766b3b81f1bd) Conflicts: include/linux/blk_types.h (1. Manually resolved BIO_WORKINGSET being definition instead of enum.) Bug: 141131229 Test: boot and run act test suite Change-Id: I99cef039844e219f1dc8196feead54b6f5fb26bb Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-10-07 18:04:48 +00:00
Jianchao Wang	313efb253d	blk-mq: change gfp flags to GFP_NOIO in blk_mq_realloc_hw_ctxs [ Upstream commit 5b202853ffbc54b29f23c4b1b5f3948efab489a2 ] blk_mq_realloc_hw_ctxs could be invoked during update hw queues. At the momemt, IO is blocked. Change the gfp flags from GFP_KERNEL to GFP_NOIO to avoid forever hang during memory allocation in blk_mq_realloc_hw_ctxs. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-10-01 08:26:10 +02:00
Ming Lei	e238e6dc22	blk-mq: free hw queue's resource in hctx's release handler [ Upstream commit c7e2d94b3d1634988a95ac4d77a72dc7487ece06 ] Once blk_cleanup_queue() returns, tags shouldn't be used any more, because blk_mq_free_tag_set() may be called. Commit `45a9c9d909` ("blk-mq: Fix a use-after-free") fixes this issue exactly. However, that commit introduces another issue. Before `45a9c9d909`, we are allowed to run queue during cleaning up queue if the queue's kobj refcount is held. After that commit, queue can't be run during queue cleaning up, otherwise oops can be triggered easily because some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue(). We have invented ways for addressing this kind of issue before, such as: 8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done") `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue") But still can't cover all cases, recently James reports another such kind of issue: https://marc.info/?l=linux-scsi&m=155389088124782&w=2 This issue can be quite hard to address by previous way, given scsi_run_queue() may run requeues for other LUNs. Fixes the above issue by freeing hctx's resources in its release handler, and this way is safe becasue tags isn't needed for freeing such hctx resource. This approach follows typical design pattern wrt. kobject's release handler. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reported-by: James Smart <james.smart@broadcom.com> Fixes: `45a9c9d909` ("blk-mq: Fix a use-after-free") Cc: stable@vger.kernel.org Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-09-16 08:22:13 +02:00
Bart Van Assche	c58a650736	block, scsi: Change the preempt-only flag into a counter commit cd84a62e0078dce09f4ed349bec84f86c9d54b30 upstream. The RQF_PREEMPT flag is used for three purposes: - In the SCSI core, for making sure that power management requests are executed even if a device is in the "quiesced" state. - For domain validation by SCSI drivers that use the parallel port. - In the IDE driver, for IDE preempt requests. Rename "preempt-only" into "pm-only" because the primary purpose of this mode is power management. Since the power management core may but does not have to resume a runtime suspended device before performing system-wide suspend and since a later patch will set "pm-only" mode as long as a block device is runtime suspended, make it possible to set "pm-only" mode from more than one context. Since with this change scsi_device_quiesce() is no longer idempotent, make that function return early if it is called for a quiesced queue. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-08-04 09:30:57 +02:00
Josef Bacik	8a1a3d3839	block: init flush rq ref count to 1 [ Upstream commit b554db147feea39617b533ab6bca247c91c6198a ] We discovered a problem in newer kernels where a disconnect of a NBD device while the flush request was pending would result in a hang. This is because the blk mq timeout handler does if (!refcount_inc_not_zero(&rq->ref)) return true; to determine if it's ok to run the timeout handler for the request. Flush_rq's don't have a ref count set, so we'd skip running the timeout handler for this request and it would just sit there in limbo forever. Fix this by always setting the refcount of any request going through blk_init_rq() to 1. I tested this with a nbd-server that dropped flush requests to verify that it hung, and then tested with this patch to verify I got the timeout as expected and the error handling kicked in. Thanks, Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-07-31 07:27:07 +02:00
Guilherme G. Piccoli	c9d8d3e9d7	block: Fix a NULL pointer dereference in generic_make_request() ----------------------------------------------------------------- This patch is not on mainline and is meant to 4.19 stable only. After the patch description there's a reasoning about that. ----------------------------------------------------------------- Commit `37f9579f4c` ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash") introduced a NULL pointer dereference in generic_make_request(). The patch sets q to NULL and enter_succeeded to false; right after, there's an 'if (enter_succeeded)' which is not taken, and then the 'else' will dereference q in blk_queue_dying(q). This patch just moves the 'q = NULL' to a point in which it won't trigger the oops, although the semantics of this NULLification remains untouched. A simple test case/reproducer is as follows: a) Build kernel v4.19.56-stable with CONFIG_BLK_CGROUP=n. b) Create a raid0 md array with 2 NVMe devices as members, and mount it with an ext4 filesystem. c) Run the following oneliner (supposing the raid0 is mounted in /mnt): (dd of=/mnt/tmp if=/dev/zero bs=1M count=999 &); sleep 0.3; echo 1 > /sys/block/nvme1n1/device/device/remove (whereas nvme1n1 is the 2nd array member) This will trigger the following oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI RIP: 0010:generic_make_request+0x32b/0x400 Call Trace: submit_bio+0x73/0x140 ext4_io_submit+0x4d/0x60 ext4_writepages+0x626/0xe90 do_writepages+0x4b/0xe0 [...] This patch has no functional changes and preserves the md/raid0 behavior when a member is removed before kernel v4.17. ---------------------------- Why this is not on mainline? ---------------------------- The patch was originally submitted upstream in linux-raid and linux-block mailing-lists - it was initially accepted by Song Liu, but Christoph Hellwig[0] observed that there was a clean-up series ready to be accepted from Ming Lei[1] that fixed the same issue. The accepted patches from Ming's series in upstream are: commit 47cdee29ef9d ("block: move blk_exit_queue into __blk_release_queue") and commit fe2008640ae3 ("block: don't protect generic_make_request_checks with blk_queue_enter"). Those patches basically do a clean-up in the block layer involving: 1) Putting back blk_exit_queue() logic into __blk_release_queue(); that path was changed in the past and the logic from blk_exit_queue() was added to blk_cleanup_queue(). 2) Removing the guard/protection in generic_make_request_checks() with blk_queue_enter(). The problem with Ming's series for -stable is that it relies in the legacy request IO path removal. So it's "backport-able" to v5.0+, but doing that for early versions (like 4.19) would incur in complex code changes. Hence, it was suggested by Christoph and Song Liu that this patch was submitted to stable only; otherwise merging it upstream would add code to fix a path removed in a subsequent commit. [0] lore.kernel.org/linux-block/20190521172258.GA32702@infradead.org [1] lore.kernel.org/linux-block/20190515030310.20393-1-ming.lei@redhat.com Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Eric Ren <renzhengeek@gmail.com> Fixes: `37f9579f4c` ("blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash") Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-07-10 09:53:30 +02:00
Ming Lei	525b5265fd	blk-mq: move cancel of requeue_work into blk_mq_release [ Upstream commit fbc2a15e3433058582e5635aabe48a3011a644a8 ] With holding queue's kobject refcount, it is safe for driver to schedule requeue. However, blk_mq_kick_requeue_list() may be called after blk_sync_queue() is done because of concurrent requeue activities, then requeue work may not be completed when freeing queue, and kernel oops is triggered. So moving the cancel of requeue_work into blk_mq_release() for avoiding race between requeue and freeing queue. Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-06-15 11:54:06 +02:00
Tetsuo Handa	96e4471d38	block: pass no-op callback to INIT_WORK(). [ Upstream commit 2e3c18d0ada16f145087b2687afcad1748c0827c ] syzbot is hitting flush_work() warning caused by commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without INIT_WORK().") [1]. Although that commit did not expect INIT_WORK(NULL) case, calling flush_work() without setting a valid callback should be avoided anyway. Fix this problem by setting a no-op callback instead of NULL. [1] https://syzkaller.appspot.com/bug?id=e390366bc48bc82a7c668326e0663be3b91cbd29 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: syzbot <syzbot+ba2a929dcf8e704c180e@syzkaller.appspotmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> [sl: rename blk_timeout_work] Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-05-08 07:21:51 +02:00
Ming Lei	410306a0f2	SCSI: fix queue cleanup race before queue initialization is done commit 8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a upstream. `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then `1311326cf4` ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: `1311326cf4` ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <drjones@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Cc: jianchao.wang <jianchao.w.wang@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-11-21 09:19:18 +01:00
Omar Sandoval	b57e99b4b8	block: use nanosecond resolution for iostat Klaus Kusche reported that the I/O busy time in /proc/diskstats was not updating properly on 4.18. This is because we started using ktime to track elapsed time, and we convert nanoseconds to jiffies when we update the partition counter. However, this gets rounded down, so any I/Os that take less than a jiffy are not accounted for. Previously in this case, the value of jiffies would sometimes increment while we were doing I/O, so at least some I/Os were accounted for. Let's convert the stats to use nanoseconds internally. We still report milliseconds as before, now more accurately than ever. The value is still truncated to 32 bits for backwards compatibility. Fixes: `522a777566` ("block: consolidate struct request timestamp fields") Cc: stable@vger.kernel.org Reported-by: Klaus Kusche <klaus.kusche@computerix.info> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-21 20:26:59 -06:00
Mikulas Patocka	8b2ded1c94	block: don't warn when doing fsync on read-only devices It is possible to call fsync on a read-only handle (for example, fsck.ext2 does it when doing read-only check), and this call results in kernel warning. The patch `b089cfd95d` ("block: don't warn for flush on read-only device") attempted to disable the warning, but it is buggy and it doesn't (op_is_flush tests flags, but bio_op strips off the flags). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: `721c7fc701` ("block: fail op_is_write() requests to read-only partitions") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-05 16:14:36 -06:00
Linus Torvalds	5bed49adfe	for-4.19/post-20180822 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlt9on8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpj1xEADKBmJlV9aVyxc5w6XggqAGeHqI4afFrl+v 9fW6WUQMAaBUrr7PMIEJQ0Zm4B7KxgBaEWNtuuj4ULkjpgYm2AuGUuTJSyKz41rS Ma+KNyCA2Zmq4SvwGFbcdCuCbUqnoxTycscAgCjuDvIYLW0+nFGNc47ibmu9lZIV 33Ef5LrxuCjhC2zyNxEdWpUxDCjoYzock85LW+wYyIYLU9uKdoExS+YmT8U+ebA/ AkXBcxPztNDxwkcsIwgGVoTjwxiowqGz3uueWfyEmYgaCPiNOsxkoNQAtjX4ykQE hnqnHWyzJkRwbYo7Vd/bRAZXvszKGYE1YcJmu5QrNf0dK5MSq2o5OYJAEJWbucPj m0R2u7O9qbS2JEnxGrm5+oYJwBzwNY5/Lajr15WkljTqobKnqcvn/Hdgz/XdGtek 0S1QHkkBsF7e+cax8sePWK+O3ilY7pl9CzyZKB/tJngl8A45Jv8xVojg0v3O7oS+ zZib0rwWg/bwR/uN6uPCDcEsQusqL5YovB7m6NRVshwz6cV1zVNp2q+iOulk7KuC MprW4Du9CJf8HA19XtyJIG1XLstnuz+Exy+i5BiimUJ5InoEFDuj/6OZa6Qaczbo SrDDvpGtSf4h7czKpE5kV4uZiTOrjuI30TrI+4csdZ7HQIlboxNL72seNTLJs55F nbLjRM8L6g== =FS7e -----END PGP SIGNATURE----- Merge tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block Pull more block updates from Jens Axboe: - Set of bcache fixes and changes (Coly) - The flush warn fix (me) - Small series of BFQ fixes (Paolo) - wbt hang fix (Ming) - blktrace fix (Steven) - blk-mq hardware queue count update fix (Jianchao) - Various little fixes * tag 'for-4.19/post-20180822' of git://git.kernel.dk/linux-block: (31 commits) block/DAC960.c: make some arrays static const, shrinks object size blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter blk-mq: init hctx sched after update ctx and hctx mapping block: remove duplicate initialization tracing/blktrace: Fix to allow setting same value pktcdvd: fix setting of 'ret' error return for a few cases block: change return type to bool block, bfq: return nbytes and not zero from struct cftype .write() method block, bfq: improve code of bfq_bfqq_charge_time block, bfq: reduce write overcharge block, bfq: always update the budget of an entity when needed block, bfq: readd missing reset of parent-entity service blk-wbt: fix IO hang in wbt_wait() block: don't warn for flush on read-only device bcache: add the missing comments for smp_mb()/smp_wmb() bcache: remove unnecessary space before ioctl function pointer arguments bcache: add missing SPDX header bcache: move open brace at end of function definitions to next line bcache: add static const prefix to char * array declarations bcache: fix code comments style ...	2018-08-22 13:38:05 -07:00
Chaitanya Kulkarni	fcedba42d9	block: remove duplicate initialization This patch removes the duplicate initialization of q->queue_head in the blk_alloc_queue_node(). This removes the 2nd initialization so that we preserve the initialization order same as declaration present in struct request_queue. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-17 12:33:36 -06:00
Linus Torvalds	73ba2fb33c	for-4.19/block-20180812 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J 5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8 6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X 2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l 5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5 Zolos5H+JA== =b9ib -----END PGP SIGNATURE----- Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...	2018-08-14 10:23:25 -07:00
Jens Axboe	b089cfd95d	block: don't warn for flush on read-only device Don't warn for a flush issued to a read-only device. It's not strictly a writable command, as it doesn't change any on-media data by itself. Reported-by: Stefan Agner <stefan@agner.ch> Fixes: `721c7fc701` ("block: fail op_is_write() requests to read-only partitions") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-14 10:52:40 -06:00
Bart Van Assche	4cf6324b17	block: Introduce blk_exit_queue() This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-09 09:12:59 -06:00
Linus Torvalds	a32e236eb9	Partially revert "block: fail op_is_write() requests to read-only partitions" It turns out that commit `721c7fc701` ("block: fail op_is_write() requests to read-only partitions"), while obviously correct, causes problems for some older lvm2 installations. The reason is that the lvm snapshotting will continue to write to the snapshow COW volume, even after the volume has been marked read-only. End result: snapshot failure. This has actually been fixed in newer version of the lvm2 tool, but the old tools still exist, and the breakage was reported both in the kernel bugzilla and in the Debian bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200439 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442 The lvm2 fix is here https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3 but until everybody has updated to recent versions, we'll have to weaken the "never write to read-only partitions" check. It now allows the write to happen, but causes a warning, something like this: generic_make_request: Trying to write to read-only block-device dm-3 (partno X) Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3 Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016 Workqueue: ksnaphd do_metadata RIP: 0010:generic_make_request_checks+0x4ac/0x600 ... Call Trace: generic_make_request+0x64/0x400 submit_bio+0x6c/0x140 dispatch_io+0x287/0x430 sync_io+0xc3/0x120 dm_io+0x1f8/0x220 do_metadata+0x1d/0x30 process_one_work+0x1b9/0x3e0 worker_thread+0x2b/0x3c0 kthread+0x113/0x130 ret_from_fork+0x35/0x40 Note that this is a "revert" in behavior only. I'm leaving alone the actual code cleanups in commit `721c7fc701`, but letting the previously uncaught request go through with a warning instead of stopping it. Fixes: `721c7fc701` ("block: fail op_is_write() requests to read-only partitions") Reported-and-tested-by: WGH <wgh@torlan.ru> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-08-04 18:19:55 -07:00
Ming Lei	b233f12704	block: really disable runtime-pm for blk-mq Runtime PM isn't ready for blk-mq yet, and commit `765e40b675` ("block: disable runtime-pm for blk-mq") tried to disable it. Unfortunately, it can't take effect in that way since user space still can switch it on via 'echo auto > /sys/block/sdN/device/power/control'. This patch disables runtime-pm for blk-mq really by pm_runtime_disable() and fixes all kinds of PM related kernel crash. Cc: Tomas Janousek <tomi@nomi.cz> Cc: Przemek Socha <soprwa@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-08-02 10:36:02 -06:00
xiao jin	54648cf1ec	block: blk_init_allocated_queue() set q->fq as NULL in the fail case We find the memory use-after-free issue in __blk_drain_queue() on the kernel 4.14. After read the latest kernel 4.18-rc6 we think it has the same problem. Memory is allocated for q->fq in the blk_init_allocated_queue(). If the elevator init function called with error return, it will run into the fail case to free the q->fq. Then the __blk_drain_queue() uses the same memory after the free of the q->fq, it will lead to the unpredictable event. The patch is to set q->fq as NULL in the fail case of blk_init_allocated_queue(). Fixes: commit `7c94e1c157` ("block: introduce blk_flush_queue to drive flush machinery") Cc: <stable@vger.kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: xiao jin <jin.xiao@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-30 08:28:39 -06:00
Michael Callahan	ddcf35d397	block: Add and use op_stat_group() for indexing disk_stat fields. Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should et updated. In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit and it uses op_stat_group() on the parameter to determine the stat group. Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. It's now indexed by op_is_write(). tj: Refreshed on top of v4.17. Updated to pass around REQ_OP. Signed-off-by: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-18 08:44:20 -06:00
Geert Uytterhoeven	e9a8385330	block: Add default switch case to blk_pm_allow_request() to kill warning With gcc 4.9.0 and 7.3.0: block/blk-core.c: In function 'blk_pm_allow_request': block/blk-core.c:2747:2: warning: enumeration value 'RPM_ACTIVE' not handled in switch [-Wswitch] switch (rq->q->rpm_status) { ^ Convert the return statement below the switch() block into a default case to fix this. Fixes: `e4f36b249b` ("block: fix peeking requests during PM") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	c1c80384c8	block: remove external dependency on wbt_flags We don't really need to save this stuff in the core block code, we can just pass the bio back into the helpers later on to derive the same flags and update the rq->wbt_flags appropriately. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Josef Bacik	a79050434b	blk-rq-qos: refactor out common elements of blk-wbt blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:54 -06:00
Bart Van Assche	1954e9a998	block: Document how blk_update_request() handles RQF_SPECIAL_PAYLOAD requests The payload of struct request is stored in the request.bio chain if the RQF_SPECIAL_PAYLOAD flag is not set and in request.special_vec if RQF_SPECIAL_PAYLOAD has been set. However, blk_update_request() iterates over req->bio whether or not RQF_SPECIAL_PAYLOAD has been set. Additionally, the RQF_SPECIAL_PAYLOAD flag is ignored by blk_rq_bytes() which means that the value returned by that function is incorrect if the RQF_SPECIAL_PAYLOAD flag has been set. It is not clear to me whether this is an oversight or whether this happened on purpose. Anyway, document that it is known that both functions ignore RQF_SPECIAL_PAYLOAD. See also commit `f9d03f96b9` ("block: improve handling of the magic discard payload"). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Ming Lei	1311326cf4	blk-mq: avoid to synchronize rcu inside blk_cleanup_queue() SCSI probing may synchronously create and destroy a lot of request_queues for non-existent devices. Any synchronize_rcu() in queue creation or destroy path may introduce long latency during booting, see detailed description in comment of blk_register_queue(). This patch removes one synchronize_rcu() inside blk_cleanup_queue() for this case, commit c2856ae2f315d75(blk-mq: quiesce queue before freeing queue) needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but when queue isn't initialized, it isn't necessary to do that since only pass-through requests are involved, no original issue in scsi_execute() at all. Without this patch and previous one, it may take more 20+ seconds for virtio-scsi to complete disk probe. With the two patches, the time becomes less than 100ms. Fixes: `c2856ae2f3` ("blk-mq: quiesce queue before freeing queue") Reported-by: Andrew Jones <drjones@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-09 09:07:52 -06:00
Bart Van Assche	297ba57dcd	block: Fix cloning of requests with a special payload This patch avoids that removing a path controlled by the dm-mpath driver while mkfs is running triggers the following kernel bug: kernel BUG at block/blk-core.c:3347! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 20 PID: 24369 Comm: mkfs.ext4 Not tainted 4.18.0-rc1-dbg+ #2 RIP: 0010:blk_end_request_all+0x68/0x70 Call Trace: <IRQ> dm_softirq_done+0x326/0x3d0 [dm_mod] blk_done_softirq+0x19b/0x1e0 __do_softirq+0x128/0x60d irq_exit+0x100/0x110 smp_call_function_single_interrupt+0x90/0x330 call_function_single_interrupt+0xf/0x20 </IRQ> Fixes: `f9d03f96b9` ("block: improve handling of the magic discard payload") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-28 09:51:30 -06:00
Bart Van Assche	9c24c10a2c	Revert "block: Add warning for bi_next not NULL in bio_endio()" Commit `0ba99ca483` ("block: Add warning for bi_next not NULL in bio_endio()") breaks the dm driver. end_clone_bio() detects whether or not a bio is the last bio associated with a request by checking the .bi_next field. Commit `0ba99ca483` clears that field before end_clone_bio() has had a chance to inspect that field. Hence revert commit `0ba99ca483`. This patch avoids that KASAN reports the following complaint when running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq): ================================================================== BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0 Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 bio_advance+0x11b/0x1d0 blk_update_request+0xa7/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d run_ksoftirqd+0x3f/0x60 smpboot_thread_fn+0x352/0x460 kthread+0x1c1/0x1e0 ret_from_fork+0x24/0x30 Allocated by task 1918: save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0xfe/0x350 mempool_alloc_slab+0x15/0x20 mempool_alloc+0xfb/0x270 bio_alloc_bioset+0x244/0x350 submit_bh_wbc+0x9c/0x2f0 __block_write_full_page+0x299/0x5a0 block_write_full_page+0x16b/0x180 blkdev_writepage+0x18/0x20 __writepage+0x42/0x80 write_cache_pages+0x376/0x8a0 generic_writepages+0xbe/0x110 blkdev_writepages+0xe/0x10 do_writepages+0x9b/0x180 __filemap_fdatawrite_range+0x178/0x1c0 file_write_and_wait_range+0x59/0xc0 blkdev_fsync+0x46/0x80 vfs_fsync_range+0x66/0x100 do_fsync+0x3d/0x70 __x64_sys_fsync+0x21/0x30 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9: save_stack+0x43/0xd0 __kasan_slab_free+0x137/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xd3/0x380 mempool_free_slab+0x17/0x20 mempool_free+0x63/0x160 bio_free+0x81/0xa0 bio_put+0x59/0x60 end_bio_bh_io_sync+0x5d/0x70 bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 end_clone_bio+0xa3/0xd0 [dm_mod] bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d The buggy address belongs to the object at ffff8801300e0640 which belongs to the cache bio-0 of size 200 The buggy address is located 144 bytes inside of 200-byte region [ffff8801300e0640, ffff8801300e0708) The buggy address belongs to the page: page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0 flags: 0x8000000000008100(slab\|head) raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00 raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Cc: Kent Overstreet <kent.overstreet@gmail.com> Fixes: `0ba99ca483` ("block: Add warning for bi_next not NULL in bio_endio()") Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-19 11:59:47 -06:00
Hannes Reinecke	c04fa44b76	block: always set partition number to '0' in blk_partition_remap() blk_partition_remap() will only clear bi_partno if an actual remapping has happened. But flush request et al don't have an actual size, so the remapping doesn't happen and bi_partno is never cleared. So for stacked devices blk_partition_remap() will be called on each level. If (as is the case for native nvme multipathing) one of the lower-level devices do _not_support partitioning a spurious I/O error is generated. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-07 06:56:01 -06:00
Jens Axboe	cd4a4ae468	block: don't use blocking queue entered for recursive bio submits If we end up splitting a bio and the queue goes away between the initial submission and the later split submission, then we can block forever in blk_queue_enter() waiting for the reference to drop to zero. This will never happen, since we already hold a reference. Mark a split bio as already having entered the queue, so we can just use the live non-blocking queue enter variant. Thanks to Tetsuo Handa for the analysis. Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-02 20:35:00 -06:00
Christoph Hellwig	acddf3b308	block: move sysfs_lock into elevator_init Both callers take just around so function call, so move it in. Also remove the now pointless blk_mq_sched_init wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-01 07:38:19 -06:00
Christoph Hellwig	ddb7253254	block: remove the always unused name argument to elevator_init Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-01 07:38:17 -06:00
Christoph Hellwig	cbf62af353	block: move initialization of elevator-related fields to blk_alloc_queue_node No point in doing this in elevator_init. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-06-01 07:38:14 -06:00
Kent Overstreet	338aa96d56	block: convert bounce, q->bio_split to bioset_init()/mempool_init() Convert the core block functionality to embedded bio sets. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-30 15:33:32 -06:00
Keith Busch	12f5b93145	blk-mq: Remove generation seqeunce This patch simplifies the timeout handling by relying on the request reference counting to ensure the iterator is operating on an inflight and truly timed out request. Since the reference counting prevents the tag from being reallocated, the block layer no longer needs to prevent drivers from completing their requests while the timeout handler is operating on it: a driver completing a request is allowed to proceed to the next state without additional syncronization with the block layer. This also removes any need for generation sequence numbers since the request lifetime is prevented from being reallocated as a new sequence while timeout handling is operating on it. To enables this a refcount is added to struct request so that request users can be sure they're operating on the same request without it changing while they're processing it. The request's tag won't be released for reuse until both the timeout handler and the completion are done with it. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: slight cleanups, added back submission side hctx lock, use cmpxchg for completions] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-29 08:59:21 -06:00
Kent Overstreet	0ba99ca483	block: Add warning for bi_next not NULL in bio_endio() Recently found a bug where a driver left bi_next not NULL and then called bio_endio(), and then the submitter of the bio used bio_copy_data() which was treating src and dst as lists of bios. Fixed that bug by splitting out bio_list_copy_data(), but in case other things are depending on bi_next in weird ways, add a warning to help avoid more bugs like that in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-14 13:16:13 -06:00
Kent Overstreet	f4f8154a08	block: Use bioset_init() for fs_bio_set Minor optimization - remove a pointer indirection when using fs_bio_set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-14 13:16:06 -06:00
Christoph Hellwig	c3036021c7	block: use GFP_NOIO instead of __GFP_DIRECT_RECLAIM We just can't do I/O when doing block layer requests allocations, so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-14 08:55:16 -06:00
Christoph Hellwig	4accf5fc79	block: pass an explicit gfp_t to get_request blk_old_get_request already has it at hand, and in blk_queue_bio, which is the fast path, it is constant. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-14 08:55:14 -06:00
Christoph Hellwig	ff005a0662	block: sanitize blk_get_request calling conventions Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-14 08:55:12 -06:00
Christoph Hellwig	a9a14d3671	block: fix __get_request documentation Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-14 08:55:11 -06:00
Omar Sandoval	522a777566	block: consolidate struct request timestamp fields Currently, struct request has four timestamp fields: - A start time, set at get_request time, in jiffies, used for iostats - An I/O start time, set at start_request time, in ktime nanoseconds, used for blk-stats (i.e., wbt, kyber, hybrid polling) - Another start time and another I/O start time, used for cfq and bfq These can all be consolidated into one start time and one I/O start time, both in ktime nanoseconds, shaving off up to 16 bytes from struct request depending on the kernel config. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-09 08:33:09 -06:00
Omar Sandoval	544ccc8dc9	block: get rid of struct blk_issue_stat struct blk_issue_stat squashes three things into one u64: - The time the driver started working on a request - The original size of the request (for the io.low controller) - Flags for writeback throttling It turns out that on x86_64, we have a 4 byte hole in struct request which we can fill with the non-timestamp fields from blk_issue_stat, simplifying things quite a bit. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-09 08:33:05 -06:00
Omar Sandoval	a8a4594170	block: pass struct request instead of struct blk_issue_stat to wbt issue_stat is going to go away, so first make writeback throttling take the containing request, update the internal wbt helpers accordingly, and change rwb->sync_cookie to be the request pointer instead of the issue_stat pointer. No functional change. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-05-09 08:33:02 -06:00

1 2 3 4 5 ...

702 commits