Commit graph

17 commits

Author SHA1 Message Date
Greg Kroah-Hartman
8ca5759502 This is the 4.19.73 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1/KiEACgkQONu9yGCS
 aT49JBAAy7b3wv1WXAtg9wsyS1JL4HbMXt3YjtokIX+UpkznoqII4B85QftPBbiD
 9zDuTWPjhrqKv1GsMkFRCqBVp5wGVik1MIbjVuKdstFN5W8KQybpbYnSW4T52+wS
 cs6oOPkLydAfWzKeq+ekEeU8yr5dua+Ui3huundZ49wseJWQP3fh9T+ToUx8V/cr
 tsLiRRgI0djj7KQWVuM1j8YGKT/6qk/UL0HMVZyoIdLmsxpLap+LWe0+CRXn8rvs
 eJJlVQTVtYf/ySoHkpnwR12VsjRYjx6pNkm/GrebMCkM7wF/4RMqxk7j9EU0PENH
 VUdRrUd+j/YPp6QzjSFMK0+0eb7Gm3X0FEN0IGZshu1r/CDnoj/7hqnBmOlYIbhv
 pdteYaLqWq7JjAHu7vF+S4aNQRGpAZb05LsbTJ39Eu3FbdVTLXsAuUveZ7Y4/y0X
 ri2M3d/sF/cjc3C+V7Y7h422SM36jSAK6496VAoRyqqjX/3JyROhgfU9NAMzVr83
 4uI904z9lH4TZGOd5YQgX2VuOtBcGwa7+g6fy97u1tp8UxSWFZRGDDLRysF/dIJO
 Wi51UK0Q7EWnqBTe0TFF6TjE5tC7R3ZgzqEQ1MU4eLI5mqokg82DAK4Ub2Wk5Qch
 CGs5/d16OOrLtG2RoaOGz9UdQR7IHUXLSqkKbaEdstc16MXNXns=
 =cmGh
 -----END PGP SIGNATURE-----

Merge 4.19.73 into android-4.19

Changes in 4.19.73
	ALSA: hda - Fix potential endless loop at applying quirks
	ALSA: hda/realtek - Fix overridden device-specific initialization
	ALSA: hda/realtek - Add quirk for HP Pavilion 15
	ALSA: hda/realtek - Enable internal speaker & headset mic of ASUS UX431FL
	ALSA: hda/realtek - Fix the problem of two front mics on a ThinkCentre
	sched/fair: Don't assign runtime for throttled cfs_rq
	drm/vmwgfx: Fix double free in vmw_recv_msg()
	vhost/test: fix build for vhost test
	vhost/test: fix build for vhost test - again
	powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
	batman-adv: fix uninit-value in batadv_netlink_get_ifindex()
	batman-adv: Only read OGM tvlv_len after buffer len check
	hv_sock: Fix hang when a connection is closed
	Blk-iolatency: warn on negative inflight IO counter
	blk-iolatency: fix STS_AGAIN handling
	{nl,mac}80211: fix interface combinations on crypto controlled devices
	timekeeping: Use proper ktime_add when adding nsecs in coarse offset
	selftests: fib_rule_tests: use pre-defined DEV_ADDR
	x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()
	powerpc/64: mark start_here_multiplatform as __ref
	media: stm32-dcmi: fix irq = 0 case
	arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64
	scripts/decode_stacktrace: match basepath using shell prefix operator, not regex
	riscv: remove unused variable in ftrace
	nvme-fc: use separate work queue to avoid warning
	clk: s2mps11: Add used attribute to s2mps11_dt_match
	remoteproc: qcom: q6v5: shore up resource probe handling
	modules: always page-align module section allocations
	kernel/module: Fix mem leak in module_add_modinfo_attrs
	drm/i915: Re-apply "Perform link quality check, unconditionally during long pulse"
	media: cec/v4l2: move V4L2 specific CEC functions to V4L2
	media: cec: remove cec-edid.c
	scsi: qla2xxx: Move log messages before issuing command to firmware
	keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
	Drivers: hv: kvp: Fix two "this statement may fall through" warnings
	x86, hibernate: Fix nosave_regions setup for hibernation
	remoteproc: qcom: q6v5-mss: add SCM probe dependency
	drm/amdgpu/gfx9: Update gfx9 golden settings.
	drm/amdgpu: Update gc_9_0 golden settings.
	KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS
	KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables
	KVM: x86: hyperv: keep track of mismatched VP indexes
	KVM: hyperv: define VP assist page helpers
	x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinit
	drm/i915: Fix intel_dp_mst_best_encoder()
	drm/i915: Rename PLANE_CTL_DECOMPRESSION_ENABLE
	drm/i915/gen9+: Fix initial readout for Y tiled framebuffers
	drm/atomic_helper: Disallow new modesets on unregistered connectors
	Drivers: hv: kvp: Fix the indentation of some "break" statements
	Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up
	powerplay: Respect units on max dcfclk watermark
	drm/amd/pp: Fix truncated clock value when set watermark
	drm/amd/dm: Understand why attaching path/tile properties are needed
	ARM: davinci: da8xx: define gpio interrupts as separate resources
	ARM: davinci: dm365: define gpio interrupts as separate resources
	ARM: davinci: dm646x: define gpio interrupts as separate resources
	ARM: davinci: dm355: define gpio interrupts as separate resources
	ARM: davinci: dm644x: define gpio interrupts as separate resources
	s390/zcrypt: reinit ap queue state machine during device probe
	media: vim2m: use workqueue
	media: vim2m: use cancel_delayed_work_sync instead of flush_schedule_work
	drm/i915: Restore sane defaults for KMS on GEM error load
	drm/i915: Cleanup gt powerstate from gem
	KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch
	Btrfs: clean up scrub is_dev_replace parameter
	Btrfs: fix deadlock with memory reclaim during scrub
	btrfs: Remove extent_io_ops::fill_delalloc
	btrfs: Fix error handling in btrfs_cleanup_ordered_extents
	scsi: megaraid_sas: Fix combined reply queue mode detection
	scsi: megaraid_sas: Add check for reset adapter bit
	scsi: megaraid_sas: Use 63-bit DMA addressing
	powerpc/pkeys: Fix handling of pkey state across fork()
	btrfs: volumes: Make sure no dev extent is beyond device boundary
	btrfs: Use real device structure to verify dev extent
	media: vim2m: only cancel work if it is for right context
	ARC: show_regs: lockdep: re-enable preemption
	ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault
	IB/uverbs: Fix OOPs upon device disassociation
	crypto: ccree - fix resume race condition on init
	crypto: ccree - add missing inline qualifier
	drm/vblank: Allow dynamic per-crtc max_vblank_count
	drm/i915/ilk: Fix warning when reading emon_status with no output
	mfd: Kconfig: Fix I2C_DESIGNWARE_PLATFORM dependencies
	tpm: Fix some name collisions with drivers/char/tpm.h
	bcache: replace hard coded number with BUCKET_GC_GEN_MAX
	bcache: treat stale && dirty keys as bad keys
	KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run
	iio: adc: exynos-adc: Add S5PV210 variant
	dt-bindings: iio: adc: exynos-adc: Add S5PV210 variant
	iio: adc: exynos-adc: Use proper number of channels for Exynos4x12
	mt76: fix corrupted software generated tx CCMP PN
	drm/nouveau: Don't WARN_ON VCPI allocation failures
	iwlwifi: fix devices with PCI Device ID 0x34F0 and 11ac RF modules
	iwlwifi: add new card for 9260 series
	x86/kvmclock: set offset for kvm unstable clock
	spi: spi-gpio: fix SPI_CS_HIGH capability
	powerpc/kvm: Save and restore host AMR/IAMR/UAMOR
	mmc: renesas_sdhi: Fix card initialization failure in high speed mode
	btrfs: scrub: pass fs_info to scrub_setup_ctx
	btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
	btrfs: scrub: fix circular locking dependency warning
	btrfs: init csum_list before possible free
	PCI: qcom: Fix error handling in runtime PM support
	PCI: qcom: Don't deassert reset GPIO during probe
	drm: add __user attribute to ptr_to_compat()
	CIFS: Fix error paths in writeback code
	CIFS: Fix leaking locked VFS cache pages in writeback retry
	drm/i915: Handle vm_mmap error during I915_GEM_MMAP ioctl with WC set
	drm/i915: Sanity check mmap length against object size
	usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps
	arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
	IB/mlx5: Reset access mask when looping inside page fault handler
	kvm: mmu: Fix overflow on kvm mmu page limit calculation
	x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
	KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
	cifs: Fix lease buffer length error
	media: i2c: tda1997x: select V4L2_FWNODE
	ext4: protect journal inode's blocks using block_validity
	ARM: dts: qcom: ipq4019: fix PCI range
	ARM: dts: qcom: ipq4019: Fix MSI IRQ type
	ARM: dts: qcom: ipq4019: enlarge PCIe BAR range
	dt-bindings: mmc: Add supports-cqe property
	dt-bindings: mmc: Add disable-cqe-dcmd property.
	PCI: Add macro for Switchtec quirk declarations
	PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary
	dm mpath: fix missing call of path selector type->end_io
	blk-mq: free hw queue's resource in hctx's release handler
	mmc: sdhci-pci: Add support for Intel CML
	PCI: dwc: Use devm_pci_alloc_host_bridge() to simplify code
	cifs: smbd: take an array of reqeusts when sending upper layer data
	dm crypt: move detailed message into debug level
	signal/arc: Use force_sig_fault where appropriate
	ARC: mm: fix uninitialised signal code in do_page_fault
	ARC: mm: SIGSEGV userspace trying to access kernel virtual memory
	drm/amdkfd: Add missing Polaris10 ID
	kvm: Check irqchip mode before assign irqfd
	drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)
	drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
	Btrfs: fix race between block group removal and block group allocation
	cifs: add spinlock for the openFileList to cifsInodeInfo
	clk: tegra: Fix maximum audio sync clock for Tegra124/210
	clk: tegra210: Fix default rates for HDA clocks
	IB/hfi1: Avoid hardlockup with flushlist_lock
	apparmor: reset pos on failure to unpack for various functions
	scsi: target/core: Use the SECTOR_SHIFT constant
	scsi: target/iblock: Fix overrun in WRITE SAME emulation
	staging: wilc1000: fix error path cleanup in wilc_wlan_initialize()
	scsi: zfcp: fix request object use-after-free in send path causing wrong traces
	cifs: Properly handle auto disabling of serverino option
	ALSA: hda - Don't resume forcibly i915 HDMI/DP codec
	ceph: use ceph_evict_inode to cleanup inode's resource
	KVM: x86: optimize check for valid PAT value
	KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value
	KVM: VMX: Fix handling of #MC that occurs during VM-Entry
	KVM: VMX: check CPUID before allowing read/write of IA32_XSS
	KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct
	KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation
	ARM: dts: gemini: Set DIR-685 SPI CS as active low
	RDMA/srp: Document srp_parse_in() arguments
	RDMA/srp: Accept again source addresses that do not have a port number
	btrfs: correctly validate compression type
	resource: Include resource end in walk_*() interfaces
	resource: Fix find_next_iomem_res() iteration issue
	resource: fix locking in find_next_iomem_res()
	pstore: Fix double-free in pstore_mkfile() failure path
	dm thin metadata: check if in fail_io mode when setting needs_check
	drm/panel: Add support for Armadeus ST0700 Adapt
	ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips
	powerpc/mm: Limit rma_size to 1TB when running without HV mode
	iommu/iova: Remove stale cached32_node
	gpio: don't WARN() on NULL descs if gpiolib is disabled
	i2c: at91: disable TXRDY interrupt after sending data
	i2c: at91: fix clk_offset for sama5d2
	mm/migrate.c: initialize pud_entry in migrate_vma()
	iio: adc: gyroadc: fix uninitialized return code
	NFSv4: Fix delegation state recovery
	bcache: only clear BTREE_NODE_dirty bit when it is set
	bcache: add comments for mutex_lock(&b->write_lock)
	bcache: fix race in btree_flush_write()
	drm/i915: Make sure cdclk is high enough for DP audio on VLV/CHV
	virtio/s390: fix race on airq_areas[]
	drm/atomic_helper: Allow DPMS On<->Off changes for unregistered connectors
	ext4: don't perform block validity checks on the journal inode
	ext4: fix block validity checks for journal inodes using indirect blocks
	ext4: unsigned int compared against zero
	PCI: Reset both NVIDIA GPU and HDA in ThinkPad P50 workaround
	powerpc/tm: Remove msr_tm_active()
	powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
	vhost: make sure log_num < in_num
	Linux 4.19.73

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I7bc57825aeb36759bb8e8726888da9af06392c09
2019-09-16 09:35:02 +02:00
Dennis Zhou
178d1337a5 blk-iolatency: fix STS_AGAIN handling
[ Upstream commit c9b3007feca018d3f7061f5d5a14cb00766ffe9b ]

The iolatency controller is based on rq_qos. It increments on
rq_qos_throttle() and decrements on either rq_qos_cleanup() or
rq_qos_done_bio(). a3fb01ba5af0 fixes the double accounting issue where
blk_mq_make_request() may call both rq_qos_cleanup() and
rq_qos_done_bio() on REQ_NO_WAIT. So checking STS_AGAIN prevents the
double decrement.

The above works upstream as the only way we can get STS_AGAIN is from
blk_mq_get_request() failing. The STS_AGAIN handling isn't a real
problem as bio_endio() skipping only happens on reserved tag allocation
failures which can only be caused by driver bugs and already triggers
WARN.

However, the fix creates a not so great dependency on how STS_AGAIN can
be propagated. Internally, we (Facebook) carry a patch that kills read
ahead if a cgroup is io congested or a fatal signal is pending. This
combined with chained bios progagate their bi_status to the parent is
not already set can can cause the parent bio to not clean up properly
even though it was successful. This consequently leaks the inflight
counter and can hang all IOs under that blkg.

To nip the adverse interaction early, this removes the rq_qos_cleanup()
callback in iolatency in favor of cleaning up always on the
rq_qos_done_bio() path.

Fixes: a3fb01ba5af0 ("blk-iolatency: only account submitted bios")
Debugged-by: Tejun Heo <tj@kernel.org>
Debugged-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16 08:21:41 +02:00
Liu Bo
5f33e81250 Blk-iolatency: warn on negative inflight IO counter
[ Upstream commit 391f552af213985d3d324c60004475759a7030c5 ]

This is to catch any unexpected negative value of inflight IO counter.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16 08:21:41 +02:00
Greg Kroah-Hartman
71ce27c31a This is the 4.19.61 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl06qFcACgkQONu9yGCS
 aT6O9A/+JZqoVYnItpOnT8Hu//0mYEKvREWqsoTJNpZJhLWtGjPTT9ospHNpVgfC
 GUkFqngWzXHpzCgTYHUV3Mm+SIiVXCM3nkCU1+2YOsPzrKo/lJSfFt3wOYGpKO5V
 qratAQLra5TqR0teR00aQblqKqfmrux05uL9dNcVIwve813m00jFALcpjrXnanpP
 tx5cqCo3uHOou5XLraHx/CMPnfJI/mLegBUTM4DxAmN2vG4gQck2gnrU7s1eg4cy
 1Fqh0Oo2Ycj5p9yoGss02JqR3wGZHOEmF55j2JcTZAPvW6/c55iPd52Trn8kPOHB
 Awq/VwJmP4p10a4TWoZpv7VqpL3PzO8/AW7QWOER8QnDzfOTHGae7YT8LVp5Xqj5
 1NqowuP/Tm0yaZSaDLqkdvhVqTi0oGL8OCYLErpeR9PQ3P+p3paaswopsPqnXURj
 Q4Pahe1vm9WG2NpKh2bHVmmVkQmvwuxxxnaa31HI/IyLd5bYFV1/LbEa/XrSK36W
 VJtO+0AjERO9uTVP/YDloDkQ4R3+3W+m520jYsgf1OwY7v/Kc6iLb7cDwci/ZWMy
 YSMm8hrO0nzuT0SI25TKLDvxjGbANKvxytzOQMOTb8NsIWwaoEKWh+4r9XkdUXNa
 +dx72I5J2Be+3hk+eaDNzCdEae5pgVTxBpwJbzI4RfnK1Doa4uE=
 =hJdd
 -----END PGP SIGNATURE-----

Merge 4.19.61 into android-4.19

Changes in 4.19.61
	MIPS: ath79: fix ar933x uart parity mode
	MIPS: fix build on non-linux hosts
	arm64/efi: Mark __efistub_stext_offset as an absolute symbol explicitly
	scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supported
	dmaengine: imx-sdma: fix use-after-free on probe error path
	wil6210: fix potential out-of-bounds read
	ath10k: Do not send probe response template for mesh
	ath9k: Check for errors when reading SREV register
	ath6kl: add some bounds checking
	ath10k: add peer id check in ath10k_peer_find_by_id
	wil6210: fix spurious interrupts in 3-msi
	ath: DFS JP domain W56 fixed pulse type 3 RADAR detection
	regmap: debugfs: Fix memory leak in regmap_debugfs_init
	batman-adv: fix for leaked TVLV handler.
	media: dvb: usb: fix use after free in dvb_usb_device_exit
	media: spi: IR LED: add missing of table registration
	crypto: talitos - fix skcipher failure due to wrong output IV
	media: ov7740: avoid invalid framesize setting
	media: marvell-ccic: fix DMA s/g desc number calculation
	media: vpss: fix a potential NULL pointer dereference
	media: media_device_enum_links32: clean a reserved field
	net: stmmac: dwmac1000: Clear unused address entries
	net: stmmac: dwmac4/5: Clear unused address entries
	qed: Set the doorbell address correctly
	signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig
	af_key: fix leaks in key_pol_get_resp and dump_sp.
	xfrm: Fix xfrm sel prefix length validation
	fscrypt: clean up some BUG_ON()s in block encryption/decryption
	perf annotate TUI browser: Do not use member from variable within its own initialization
	media: mc-device.c: don't memset __user pointer contents
	media: saa7164: fix remove_proc_entry warning
	media: staging: media: davinci_vpfe: - Fix for memory leak if decoder initialization fails.
	net: phy: Check against net_device being NULL
	crypto: talitos - properly handle split ICV.
	crypto: talitos - Align SEC1 accesses to 32 bits boundaries.
	tua6100: Avoid build warnings.
	batman-adv: Fix duplicated OGMs on NETDEV_UP
	locking/lockdep: Fix merging of hlocks with non-zero references
	media: wl128x: Fix some error handling in fm_v4l2_init_video_device()
	net: hns3: set ops to null when unregister ad_dev
	cpupower : frequency-set -r option misses the last cpu in related cpu list
	arm64: mm: make CONFIG_ZONE_DMA32 configurable
	perf jvmti: Address gcc string overflow warning for strncpy()
	net: stmmac: dwmac4: fix flow control issue
	net: stmmac: modify default value of tx-frames
	crypto: inside-secure - do not rely on the hardware last bit for result descriptors
	net: fec: Do not use netdev messages too early
	net: axienet: Fix race condition causing TX hang
	s390/qdio: handle PENDING state for QEBSM devices
	RAS/CEC: Fix pfn insertion
	net: sfp: add mutex to prevent concurrent state checks
	ipset: Fix memory accounting for hash types on resize
	perf cs-etm: Properly set the value of 'old' and 'head' in snapshot mode
	perf test 6: Fix missing kvm module load for s390
	perf report: Fix OOM error in TUI mode on s390
	irqchip/meson-gpio: Add support for Meson-G12A SoC
	media: uvcvideo: Fix access to uninitialized fields on probe error
	media: fdp1: Support M3N and E3 platforms
	iommu: Fix a leak in iommu_insert_resv_region
	gpio: omap: fix lack of irqstatus_raw0 for OMAP4
	gpio: omap: ensure irq is enabled before wakeup
	regmap: fix bulk writes on paged registers
	bpf: silence warning messages in core
	media: s5p-mfc: fix reading min scratch buffer size on MFC v6/v7
	selinux: fix empty write to keycreate file
	x86/cpu: Add Ice Lake NNPI to Intel family
	ASoC: meson: axg-tdm: fix sample clock inversion
	rcu: Force inlining of rcu_read_lock()
	x86/cpufeatures: Add FDP_EXCPTN_ONLY and ZERO_FCS_FDS
	qed: iWARP - Fix tc for MPA ll2 connection
	net: hns3: fix for skb leak when doing selftest
	block: null_blk: fix race condition for null_del_dev
	blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration
	xfrm: fix sa selector validation
	sched/core: Add __sched tag for io_schedule()
	sched/fair: Fix "runnable_avg_yN_inv" not used warnings
	perf/x86/intel/uncore: Handle invalid event coding for free-running counter
	x86/atomic: Fix smp_mb__{before,after}_atomic()
	perf evsel: Make perf_evsel__name() accept a NULL argument
	vhost_net: disable zerocopy by default
	ipoib: correcly show a VF hardware address
	x86/cacheinfo: Fix a -Wtype-limits warning
	blk-iolatency: only account submitted bios
	ACPICA: Clear status of GPEs on first direct enable
	EDAC/sysfs: Fix memory leak when creating a csrow object
	nvme: fix possible io failures when removing multipathed ns
	nvme-pci: properly report state change failure in nvme_reset_work
	nvme-pci: set the errno on ctrl state change error
	lightnvm: pblk: fix freeing of merged pages
	arm64: Do not enable IRQs for ct_user_exit
	ipsec: select crypto ciphers for xfrm_algo
	ipvs: defer hook registration to avoid leaks
	media: s5p-mfc: Make additional clocks optional
	media: i2c: fix warning same module names
	ntp: Limit TAI-UTC offset
	timer_list: Guard procfs specific code
	acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
	media: coda: fix mpeg2 sequence number handling
	media: coda: fix last buffer handling in V4L2_ENC_CMD_STOP
	media: coda: increment sequence offset for the last returned frame
	media: vimc: cap: check v4l2_fill_pixfmt return value
	media: hdpvr: fix locking and a missing msleep
	net: stmmac: sun8i: force select external PHY when no internal one
	rtlwifi: rtl8192cu: fix error handle when usb probe failed
	mt7601u: do not schedule rx_tasklet when the device has been disconnected
	x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
	mt7601u: fix possible memory leak when the device is disconnected
	ipvs: fix tinfo memory leak in start_sync_thread
	ath10k: add missing error handling
	ath10k: fix PCIE device wake up failed
	perf tools: Increase MAX_NR_CPUS and MAX_CACHES
	ASoC: Intel: hdac_hdmi: Set ops to NULL on remove
	libata: don't request sense data on !ZAC ATA devices
	clocksource/drivers/exynos_mct: Increase priority over ARM arch timer
	xsk: Properly terminate assignment in xskq_produce_flush_desc
	rslib: Fix decoding of shortened codes
	rslib: Fix handling of of caller provided syndrome
	ixgbe: Check DDM existence in transceiver before access
	crypto: serpent - mark __serpent_setkey_sbox noinline
	crypto: asymmetric_keys - select CRYPTO_HASH where needed
	wil6210: drop old event after wmi_call timeout
	EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec
	bcache: check CACHE_SET_IO_DISABLE in allocator code
	bcache: check CACHE_SET_IO_DISABLE bit in bch_journal()
	bcache: acquire bch_register_lock later in cached_dev_free()
	bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush()
	bcache: fix potential deadlock in cached_def_free()
	net: hns3: fix a -Wformat-nonliteral compile warning
	net: hns3: add some error checking in hclge_tm module
	ath10k: destroy sdio workqueue while remove sdio module
	net: mvpp2: prs: Don't override the sign bit in SRAM parser shift
	igb: clear out skb->tstamp after reading the txtime
	iwlwifi: mvm: Drop large non sta frames
	bpf: fix uapi bpf_prog_info fields alignment
	perf stat: Make metric event lookup more robust
	perf stat: Fix group lookup for metric group
	bnx2x: Prevent ptp_task to be rescheduled indefinitely
	net: usb: asix: init MAC address buffers
	rxrpc: Fix oops in tracepoint
	bpf, libbpf, smatch: Fix potential NULL pointer dereference
	selftests: bpf: fix inlines in test_lwt_seg6local
	bonding: validate ip header before check IPPROTO_IGMP
	gpiolib: Fix references to gpiod_[gs]et_*value_cansleep() variants
	tools: bpftool: Fix json dump crash on powerpc
	Bluetooth: hci_bcsp: Fix memory leak in rx_skb
	Bluetooth: Add new 13d3:3491 QCA_ROME device
	Bluetooth: Add new 13d3:3501 QCA_ROME device
	Bluetooth: 6lowpan: search for destination address in all peers
	perf tests: Fix record+probe_libc_inet_pton.sh for powerpc64
	Bluetooth: Check state in l2cap_disconnect_rsp
	gtp: add missing gtp_encap_disable_sock() in gtp_encap_enable()
	Bluetooth: validate BLE connection interval updates
	gtp: fix suspicious RCU usage
	gtp: fix Illegal context switch in RCU read-side critical section.
	gtp: fix use-after-free in gtp_encap_destroy()
	gtp: fix use-after-free in gtp_newlink()
	net: mvmdio: defer probe of orion-mdio if a clock is not ready
	iavf: fix dereference of null rx_buffer pointer
	floppy: fix div-by-zero in setup_format_params
	floppy: fix out-of-bounds read in next_valid_format
	floppy: fix invalid pointer dereference in drive_name
	floppy: fix out-of-bounds read in copy_buffer
	xen: let alloc_xenballooned_pages() fail if not enough memory free
	scsi: NCR5380: Reduce goto statements in NCR5380_select()
	scsi: NCR5380: Always re-enable reselection interrupt
	Revert "scsi: ncr5380: Increase register polling limit"
	scsi: core: Fix race on creating sense cache
	scsi: megaraid_sas: Fix calculation of target ID
	scsi: mac_scsi: Increase PIO/PDMA transfer length threshold
	scsi: mac_scsi: Fix pseudo DMA implementation, take 2
	crypto: ghash - fix unaligned memory access in ghash_setkey()
	crypto: ccp - Validate the the error value used to index error messages
	crypto: arm64/sha1-ce - correct digest for empty data in finup
	crypto: arm64/sha2-ce - correct digest for empty data in finup
	crypto: chacha20poly1305 - fix atomic sleep when using async algorithm
	crypto: crypto4xx - fix AES CTR blocksize value
	crypto: crypto4xx - fix blocksize for cfb and ofb
	crypto: crypto4xx - block ciphers should only accept complete blocks
	crypto: ccp - memset structure fields to zero before reuse
	crypto: ccp/gcm - use const time tag comparison.
	crypto: crypto4xx - fix a potential double free in ppc4xx_trng_probe
	Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()"
	bcache: Revert "bcache: fix high CPU occupancy during journal"
	bcache: Revert "bcache: free heap cache_set->flush_btree in bch_journal_free"
	bcache: ignore read-ahead request failure on backing device
	bcache: fix mistaken sysfs entry for io_error counter
	bcache: destroy dc->writeback_write_wq if failed to create dc->writeback_thread
	Input: gtco - bounds check collection indent level
	Input: alps - don't handle ALPS cs19 trackpoint-only device
	Input: synaptics - whitelist Lenovo T580 SMBus intertouch
	Input: alps - fix a mismatch between a condition check and its comment
	regulator: s2mps11: Fix buck7 and buck8 wrong voltages
	arm64: tegra: Update Jetson TX1 GPU regulator timings
	iwlwifi: pcie: don't service an interrupt that was masked
	iwlwifi: pcie: fix ALIVE interrupt handling for gen2 devices w/o MSI-X
	iwlwifi: don't WARN when calling iwl_get_shared_mem_conf with RF-Kill
	iwlwifi: fix RF-Kill interrupt while FW load for gen2 devices
	NFSv4: Handle the special Linux file open access mode
	pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error
	pNFS: Fix a typo in pnfs_update_layout
	pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
	lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
	ASoC: dapm: Adapt for debugfs API change
	raid5-cache: Need to do start() part job after adding journal device
	ALSA: seq: Break too long mutex context in the write loop
	ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell platform
	ALSA: hda/realtek: apply ALC891 headset fixup to one Dell machine
	media: v4l2: Test type instead of cfg->type in v4l2_ctrl_new_custom()
	media: coda: Remove unbalanced and unneeded mutex unlock
	media: videobuf2-core: Prevent size alignment wrapping buffer size to 0
	media: videobuf2-dma-sg: Prevent size from overflowing
	KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed
	arm64: tegra: Fix AGIC register range
	fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
	kconfig: fix missing choice values in auto.conf
	drm/nouveau/i2c: Enable i2c pads & busses during preinit
	padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
	dm zoned: fix zone state management race
	xen/events: fix binding user event channels to cpus
	9p/xen: Add cleanup path in p9_trans_xen_init
	9p/virtio: Add cleanup path in p9_virtio_init
	x86/boot: Fix memory leak in default_get_smp_config()
	perf/x86/intel: Fix spurious NMI on fixed counter
	perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs
	perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs
	drm/edid: parse CEA blocks embedded in DisplayID
	intel_th: pci: Add Ice Lake NNPI support
	PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
	PCI: Do not poll for PME if the device is in D3cold
	PCI: qcom: Ensure that PERST is asserted for at least 100 ms
	Btrfs: fix data loss after inode eviction, renaming it, and fsync it
	Btrfs: fix fsync not persisting dentry deletions due to inode evictions
	Btrfs: add missing inode version, ctime and mtime updates when punching hole
	IB/mlx5: Report correctly tag matching rendezvous capability
	HID: wacom: generic: only switch the mode on devices with LEDs
	HID: wacom: generic: Correct pad syncing
	HID: wacom: correct touch resolution x/y typo
	libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
	coda: pass the host file in vma->vm_file on mmap
	include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
	xfs: fix pagecache truncation prior to reflink
	xfs: flush removing page cache in xfs_reflink_remap_prep
	xfs: don't overflow xattr listent buffer
	xfs: rename m_inotbt_nores to m_finobt_nores
	xfs: don't ever put nlink > 0 inodes on the unlinked list
	xfs: reserve blocks for ifree transaction during log recovery
	xfs: fix reporting supported extra file attributes for statx()
	xfs: serialize unaligned dio writes against all other dio writes
	xfs: abort unaligned nowait directio early
	gpu: ipu-v3: ipu-ic: Fix saturation bit offset in TPMEM
	crypto: caam - limit output IV to CBC to work around CTR mode DMA issue
	parisc: Ensure userspace privilege for ptraced processes in regset functions
	parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1
	powerpc/32s: fix suspend/resume when IBATs 4-7 are used
	powerpc/watchpoint: Restore NV GPRs while returning from exception
	powerpc/powernv/npu: Fix reference leak
	powerpc/pseries: Fix oops in hotplug memory notifier
	mmc: sdhci-msm: fix mutex while in spinlock
	eCryptfs: fix a couple type promotion bugs
	mtd: rawnand: mtk: Correct low level time calculation of r/w cycle
	mtd: spinand: read returns badly if the last page has bitflips
	intel_th: msu: Fix single mode with disabled IOMMU
	Bluetooth: Add SMP workaround Microsoft Surface Precision Mouse bug
	usb: Handle USB3 remote wakeup for LPM enabled devices correctly
	blk-throttle: fix zero wait time for iops throttled group
	blk-iolatency: clear use_delay when io.latency is set to zero
	blkcg: update blkcg_print_stat() to handle larger outputs
	net: mvmdio: allow up to four clocks to be specified for orion-mdio
	dt-bindings: allow up to four clocks for orion-mdio
	dm bufio: fix deadlock with loop device
	Linux 4.19.61

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2f565111b1c16f369fa86e0481527fcc6357fe1b
2019-07-26 10:31:53 +02:00
Tejun Heo
73efdc5d7d blk-iolatency: clear use_delay when io.latency is set to zero
commit 5de0073fcd50cc1f150895a7bb04d3cf8067b1d7 upstream.

If use_delay was non-zero when the latency target of a cgroup was set
to zero, it will stay stuck until io.latency is enabled on the cgroup
again.  This keeps readahead disabled for the cgroup impacting
performance negatively.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <jbacik@fb.com>
Fixes: d706751215 ("block: introduce blk-iolatency io controller")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-26 09:14:30 +02:00
Dennis Zhou
3ae98dc2db blk-iolatency: only account submitted bios
[ Upstream commit a3fb01ba5af066521f3f3421839e501bb2c71805 ]

As is, iolatency recognizes done_bio and cleanup as ending paths. If a
request is marked REQ_NOWAIT and fails to get a request, the bio is
cleaned up via rq_qos_cleanup() and ended in bio_wouldblock_error().
This results in underflowing the inflight counter. Fix this by only
accounting bios that were actually submitted.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-26 09:14:09 +02:00
Greg Kroah-Hartman
10f41ccfc7 This is the 4.19.36 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAly6xzUACgkQONu9yGCS
 aT5sIA//b7nAk2zuhmbkonsBfzFq5uBJmqXcCrOgy3XHMs4fE+Q11kLd1wMAV7dx
 U7FNHe4PIJ8Rczxgqr2VP3VmFbV6UuTK+UTclJKfbV3ouIAQiQBuutABBmbDUj2p
 FInc/yAYyhVc9n7gX78czTiUxKnKi4+sisUYDCZPr3hr6jDPcLvm/WVWdyrcXJje
 rYFNmE/2MBH1NofG+MOpq+ILhKHXlf2APN2/spl+I42a8bwodiSl9g+dhuWr7wgT
 Ln2Ocf7BZ6BPCQKoveZdD1Gd56NNR/lJh4ulqpuhaZw4Yp+B/C7GmrBtdPzVSGka
 IwPWoSc9/9VSUl+ooSZHms78VLbqq0rNNclskL2bN6m962u04Eu7sB2Tg/bwUs52
 Wkcw0DY4J/oMJtj/CMHcQOUPsk6vwHxqnjsj+LYJ1ZjHO68tUshnENxXrbAoDc45
 2fuY3TCA+XqFvqNt5HbkLPtFR78u8QmZ1lP/Pkri6xoG/GA6O0EAxhS0Z9hncGK7
 8wJNuxLMd2UX94wlajQ+DF7yyCU4HOFEdeSEOwlHHBid/fckXsGzL2tKJUAbbUPP
 ux3An8kJHni8nQrmUkyy1Nx29ROyAFxBLOQshWGpXgJrV3qRMYLyB2Icv0WYCGFk
 zZCTupPgvb46u81VzqxrLH4RZdy4Ar4uB3BQGPKs596rlYmvnSo=
 =CArs
 -----END PGP SIGNATURE-----

Merge 4.19.36 into android-4.19

Changes in 4.19.36
	ARC: u-boot args: check that magic number is correct
	arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM
	inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch()
	perf/core: Restore mmap record type correctly
	ext4: avoid panic during forced reboot
	ext4: add missing brelse() in add_new_gdb_meta_bg()
	ext4: report real fs size after failed resize
	ALSA: echoaudio: add a check for ioremap_nocache
	ALSA: sb8: add a check for request_region
	auxdisplay: hd44780: Fix memory leak on ->remove()
	drm/udl: use drm_gem_object_put_unlocked.
	IB/mlx4: Fix race condition between catas error reset and aliasguid flows
	i40iw: Avoid panic when handling the inetdev event
	mmc: davinci: remove extraneous __init annotation
	ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
	thermal/intel_powerclamp: fix __percpu declaration of worker_data
	thermal: samsung: Fix incorrect check after code merge
	thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs
	thermal/int340x_thermal: Add additional UUIDs
	thermal/int340x_thermal: fix mode setting
	thermal/intel_powerclamp: fix truncated kthread name
	scsi: iscsi: flush running unbind operations when removing a session
	sched/cpufreq: Fix 32-bit math overflow
	sched/core: Fix buffer overflow in cgroup2 property cpu.max
	x86/mm: Don't leak kernel addresses
	tools/power turbostat: return the exit status of a command
	perf list: Don't forget to drop the reference to the allocated thread_map
	perf config: Fix an error in the config template documentation
	perf config: Fix a memory leak in collect_config()
	perf build-id: Fix memory leak in print_sdt_events()
	perf top: Fix error handling in cmd_top()
	perf hist: Add missing map__put() in error case
	perf evsel: Free evsel->counts in perf_evsel__exit()
	perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
	perf tests: Fix memory leak by expr__find_other() in test__expr()
	perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
	ACPI / utils: Drop reference in test for device presence
	PM / Domains: Avoid a potential deadlock
	blk-iolatency: #include "blk.h"
	drm/exynos/mixer: fix MIXER shadow registry synchronisation code
	irqchip/stm32: Don't clear rising/falling config registers at init
	irqchip/mbigen: Don't clear eventid when freeing an MSI
	x86/hpet: Prevent potential NULL pointer dereference
	x86/hyperv: Prevent potential NULL pointer dereference
	x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
	drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
	iommu/vt-d: Check capability before disabling protected memory
	x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
	fix incorrect error code mapping for OBJECTID_NOT_FOUND
	x86/gart: Exclude GART aperture from kcore
	ext4: prohibit fstrim in norecovery mode
	drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up
	gpio: pxa: handle corner case of unprobed device
	rsi: improve kernel thread handling to fix kernel panic
	f2fs: fix to avoid NULL pointer dereference on se->discard_map
	9p: do not trust pdu content for stat item size
	9p locks: add mount option for lock retry interval
	ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx()
	f2fs: fix to do sanity check with current segment number
	netfilter: xt_cgroup: shrink size of v2 path
	serial: uartps: console_setup() can't be placed to init section
	powerpc/pseries: Remove prrn_work workqueue
	media: au0828: cannot kfree dev before usb disconnect
	Bluetooth: Fix debugfs NULL pointer dereference
	HID: i2c-hid: override HID descriptors for certain devices
	pinctrl: core: make sure strcmp() doesn't get a null parameter
	ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
	usbip: fix vhci_hcd controller counting
	ACPI / SBS: Fix GPE storm on recent MacBookPro's
	HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2
	KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail
	compiler.h: update definition of unreachable()
	netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine
	f2fs: cleanup dirty pages if recover failed
	net: stmmac: Set OWN bit for jumbo frames
	cifs: fallback to older infolevels on findfirst queryinfo retry
	kernel: hung_task.c: disable on suspend
	platform/x86: Add Intel AtomISP2 dummy / power-management driver
	drm/ttm: Fix bo_global and mem_global kfree error
	ALSA: hda: fix front speakers on Huawei MBXP
	ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle
	net/rds: fix warn in rds_message_alloc_sgs
	xfrm: destroy xfrm_state synchronously on net exit path
	crypto: sha256/arm - fix crash bug in Thumb2 build
	crypto: sha512/arm - fix crash bug in Thumb2 build
	net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version
	iommu/dmar: Fix buffer overflow during PCI bus notification
	scsi: core: Avoid that system resume triggers a kernel warning
	soc/tegra: pmc: Drop locking from tegra_powergate_is_powered()
	lkdtm: Print real addresses
	lkdtm: Add tests for NULL pointer dereference
	drm/panel: panel-innolux: set display off in innolux_panel_unprepare
	crypto: axis - fix for recursive locking from bottom half
	Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk"
	coresight: cpu-debug: Support for CA73 CPUs
	PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports
	drm/nouveau/volt/gf117: fix speedo readout register
	ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
	drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI)
	appletalk: Fix use-after-free in atalk_proc_exit
	lib/div64.c: off by one in shift
	rxrpc: Fix client call connect/disconnect race
	f2fs: fix to dirty inode for i_mode recovery
	include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
	bpf: fix use after free in bpf_evict_inode
	IB/hfi1: Failed to drain send queue when QP is put into error state
	mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo
	mm: hide incomplete nr_indirectly_reclaimable in sysfs
	appletalk: Fix compile regression
	Linux 4.19.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-20 15:53:36 +02:00
Bart Van Assche
bde271d1ad blk-iolatency: #include "blk.h"
[ Upstream commit 373e915cd8e84544609eced57a44fbc084f8d60f ]

This patch avoids that the following warning is reported when building
with W=1:

block/blk-iolatency.c:734:5: warning: no previous prototype for 'blk_iolatency_init' [-Wmissing-prototypes]

Cc: Josef Bacik <jbacik@fb.com>
Fixes: d706751215 ("block: introduce blk-iolatency io controller") # v4.19
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 09:15:58 +02:00
Johannes Weiner
2ba18b41d3 BACKPORT: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 8508cf3ffad4defa202b303e5b6379efc4cd9054)

Conflicts:
        block/blk-iolatency.c

(1. manual merge to replace stat->rqs.mean with stat.mean)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I716b4874491cff75a2355c6d95c64cf02d05e7ee
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:26 -07:00
Liu Bo
6d482bc569 blk-iolatency: fix IO hang due to negative inflight counter
[ Upstream commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b ]

Our test reported the following stack, and vmcore showed that
->inflight counter is -1.

[ffffc9003fcc38d0] __schedule at ffffffff8173d95d
[ffffc9003fcc3958] schedule at ffffffff8173de26
[ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
[ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
[ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
[ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
[ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
[ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
[ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
[ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
[ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
[ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
[ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
[ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
[ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
[ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
[ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
[ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
[ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e

The ->inflight counter may be negative (-1) if

1) blk-iolatency was disabled when the IO was issued,

2) blk-iolatency was enabled before this IO reached its endio,

3) the ->inflight counter is decreased from 0 to -1 in endio()

In fact the hang can be easily reproduced by the below script,

H=/sys/fs/cgroup/unified/
P=/sys/fs/cgroup/unified/test

echo "+io" > $H/cgroup.subtree_control
mkdir -p $P

echo $$ > $P/cgroup.procs

xfs_io -f -d -c "pwrite 0 4k" /dev/sdg

echo "`cat /sys/block/sdg/dev` target=1000000" > $P/io.latency

xfs_io -f -d -c "pwrite 0 4k" /dev/sdg

This fixes the problem by freezing the queue so that while
enabling/disabling iolatency, there is no inflight rq running.

Note that quiesce_queue is not needed as this only updating iolatency
configuration about which dispatching request_queue doesn't care.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-03-13 14:02:38 -07:00
Dennis Zhou (Facebook)
c480bcf97b block: make iolatency avg_lat exponentially decay
Currently, avg_lat is calculated by accumulating the mean of every
window in a long running cumulative average. As time goes on, the metric
becomes less and less useful due to the accumulated history.

This patch reuses the same calculation done in load averages to make the
avg_lat metric more lively. Unlike load averages, the avg only advances
when a window elapses (due to an io). Idle periods extend the most
recent window. Bucketing is used to limit the history of avg_lat by
binding it to the window size. So, the window range for 1/exp (decay
rate) is [1 min, 2.5 min) when windows elapse immediately.

The current sample window size is exposed in the debug info to enable
calculation of the window range.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02 09:58:14 -06:00
Josef Bacik
52a1199ccd blk-iolatency: fix blkg leak in timer_fn
At this point we have a ref on the blkg, we need to drop it if we don't
have a iolat.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-01 09:16:01 -06:00
Josef Bacik
71e9690b59 blk-iolatency: truncate our current time
In our longer tests we noticed that some boxes would degrade to the
point of uselessness.  This is because we truncate the current time when
saving it in our bio, but I was using the raw current time to subtract
from.  So once the box had been up a certain amount of time it would
appear as if our IO's were taking several years to complete.  Fix this
by truncating the current time so it matches the issue time.  Verified
this worked by running with this patch for a week on our test tier.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-16 10:15:19 -06:00
Josef Bacik
d607eefa3b blk-iolatency: don't change the latency window
Early versions of these patches had us waiting for seconds at a time
during submission, so we had to adjust the timing window we monitored
for latency.  Now we don't do things like that so this is unnecessary
code.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-16 10:15:17 -06:00
Josef Bacik
a284390b39 blk-iolatency: fix max_depth comparisons
max_depth used to be a u64, but I changed it to a unsigned int but
didn't convert my comparisons over everywhere.  Fix by using UINT_MAX
everywhere instead of (u64)-1.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-11 08:37:38 -06:00
Arnd Bergmann
88b7210c81 block: iolatency: avoid 64-bit division
On 32-bit architectures, dividing a 64-bit number needs to use the
do_div() function or something like it to avoid a link failure:

block/blk-iolatency.o: In function `iolatency_prfill_limit':
blk-iolatency.c:(.text+0x8cc): undefined reference to `__aeabi_uldivmod'

Using div_u64() gives us the best output and avoids the need for an
explicit cast.

Fixes: d706751215 ("block: introduce blk-iolatency io controller")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-10 12:26:09 -06:00
Josef Bacik
d706751215 block: introduce blk-iolatency io controller
Current IO controllers for the block layer are less than ideal for our
use case.  The io.max controller is great at hard limiting, but it is
not work conserving.  This patch introduces io.latency.  You provide a
latency target for your group and we monitor the io in short windows to
make sure we are not exceeding those latency targets.  This makes use of
the rq-qos infrastructure and works much like the wbt stuff.  There are
a few differences from wbt

 - It's bio based, so the latency covers the whole block layer in addition to
   the actual io.
 - We will throttle all IO types that comes in here if we need to.
 - We use the mean latency over the 100ms window.  This is because writes can
   be particularly fast, which could give us a false sense of the impact of
   other workloads on our protected workload.
 - By default there's no throttling, we set the queue_depth to INT_MAX so that
   we can have as many outstanding bio's as we're allowed to.  Only at
   throttle time do we pay attention to the actual queue depth.
 - We backcharge cgroups for root cg issued IO and induce artificial
   delays in order to deal with cases like metadata only or swap heavy
   workloads.

In testing this has worked out relatively well.  Protected workloads
will throttle noisy workloads down to 1 io at time if they are doing
normal IO on their own, or induce up to a 1 second delay per syscall if
they are doing a lot of root issued IO (metadata/swap IO).

Our testing has revolved mostly around our production web servers where
we have hhvm (the web server application) in a protected group and
everything else in another group.  We see slightly higher requests per
second (RPS) on the test tier vs the control tier, and much more stable
RPS across all machines in the test tier vs the control tier.

Another test we run is a slow memory allocator in the unprotected group.
Before this would eventually push us into swap and cause the whole box
to die and not recover at all.  With these patches we see slight RPS
drops (usually 10-15%) before the memory consumer is properly killed and
things recover within seconds.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-09 09:07:54 -06:00