Commit graph

29354 commits

Author SHA1 Message Date
Paul E. McKenney
9c85fc004e locktorture: Print ratio of acquisitions, not failures
commit 80c503e0e68fbe271680ab48f0fe29bc034b01b7 upstream.

The __torture_print_stats() function in locktorture.c carefully
initializes local variable "min" to statp[0].n_lock_acquired, but
then compares it to statp[i].n_lock_fail.  Given that the .n_lock_fail
field should normally be zero, and given the initialization, it seems
reasonable to display the maximum and minimum number acquisitions
instead of miscomputing the maximum and minimum number of failures.
This commit therefore switches from failures to acquisitions.

And this turns out to be not only a day-zero bug, but entirely my
own fault.  I hate it when that happens!

Fixes: 0af3fe1efa ("locktorture: Add a lock-torture kernel module")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23 10:30:23 +02:00
Anil Kumar Mamidala
8660cb6674 ANDROID: GKI: qos: Register irq notify after adding the qos request
Before adding the irq affinity based qos request to the list, if
the affinity of the interrupt changes it will trigger notify call.
This notifier call will try to update the qos request. Accessing
the qos request which is not yet added to the list leads to a
NULL pointer exception.

Avoid this race by registering the notifier after adding the
qos request.

Test: build, boot
Bug: 150901210
Change-Id: I99869cc233573b5db10e4f3224d65c29511050ea
Signed-off-by: Anil Kumar Mamidala <amami@codeaurora.org>
(cherry picked from commit 5db62557cb)
Signed-off-by: Hridya Valsaraju <hridya@google.com>
2020-04-23 00:35:53 +00:00
Greg Kroah-Hartman
fd8a9d61cf This is the 4.19.117 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6emrgACgkQONu9yGCS
 aT7KHhAAnWFfpGr89QEPUIDcdYNqmjnBlf7WRmVqQxbM+umD5AWo8fdLkKA43Fsx
 nNdMP6POYUwMqXahNOYwxCfRuw5sqsz/5bZO8O5p6fIXk1WhtW6Nzw78DHmDpQSj
 Cdfo92dJVhRcsCOElhrdsIypuBr7LoAOFjTGIzx4OZVXM3VJhWPpIgDEtU5yy/+S
 ym9TSU1RyQ9C/mIev3z6AXTAzAzWKdHXKtkWf3YW/7Mgr2QCcwmZxDlp9L1+L6e3
 lLn2IMcFH91Wj0hJX98OhkmjA0EJ/LNU4LaaIe/DxGBEtzyLjn+aoxGIEREnU/Y6
 36+3neWC3tJmUIzgyoRgVby+Jti3APEq3ncD0xzD8MAKitxihru7vKdTyfSWwmY0
 xSz2UbCbbF1BeG3MZQNzgdSQCn4o21Iyxu+aQVGSvVd4k43x4jbtNedLqA6mHmkz
 7I/V7UXyyzztDwlgT+DZa3LT6j4iv8VI6rPl7Evm3b5Iu9un3KLjnOEsXnvxjx9D
 o8dsPkK/pqbIW75bfThkoo8llmm/SsQ0n5GTKbITx9x0jU9E3VlQNHv+DUkT2CEn
 1cY4hsVNql475RsOabhXbfOXI7+uwUCxKEOVN7DysT8UGARGIXZOkrGLr4UqjQHI
 B4J8oKBPPS5ZQKEIC7j/h4V/exqtSZYTQ1GWUNj4uo9X7KnJ+K4=
 =kytQ
 -----END PGP SIGNATURE-----

Merge 4.19.117 into android-4.19

Changes in 4.19.117
	amd-xgbe: Use __napi_schedule() in BH context
	hsr: check protocol version in hsr_newlink()
	net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin
	net: ipv6: do not consider routes via gateways for anycast address check
	net: qrtr: send msgs from local of same id as broadcast
	net: revert default NAPI poll timeout to 2 jiffies
	net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
	net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
	ovl: fix value of i_ino for lower hardlink corner case
	scsi: ufs: Fix ufshcd_hold() caused scheduling while atomic
	jbd2: improve comments about freeing data buffers whose page mapping is NULL
	pwm: pca9685: Fix PWM/GPIO inter-operation
	ext4: fix incorrect group count in ext4_fill_super error message
	ext4: fix incorrect inodes per group in error message
	ASoC: Intel: mrfld: fix incorrect check on p->sink
	ASoC: Intel: mrfld: return error codes when an error occurs
	ALSA: usb-audio: Filter error from connector kctl ops, too
	ALSA: usb-audio: Don't override ignore_ctl_error value from the map
	ALSA: usb-audio: Don't create jack controls for PCM terminals
	ALSA: usb-audio: Check mapping at creating connector controls, too
	keys: Fix proc_keys_next to increase position index
	tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
	btrfs: check commit root generation in should_ignore_root
	mac80211_hwsim: Use kstrndup() in place of kasprintf()
	usb: dwc3: gadget: don't enable interrupt when disabling endpoint
	usb: dwc3: gadget: Don't clear flags before transfer ended
	drm/amd/powerplay: force the trim of the mclk dpm_levels if OD is enabled
	ext4: do not zeroout extents beyond i_disksize
	kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
	scsi: target: remove boilerplate code
	scsi: target: fix hang when multiple threads try to destroy the same iscsi session
	x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
	x86/resctrl: Preserve CDP enable over CPU hotplug
	x86/resctrl: Fix invalid attempt at removing the default resource group
	wil6210: check rx_buff_mgmt before accessing it
	wil6210: ignore HALP ICR if already handled
	wil6210: add general initialization/size checks
	wil6210: make sure Rx ring sizes are correlated
	wil6210: remove reset file from debugfs
	mm/vmalloc.c: move 'area->pages' after if statement
	Linux 4.19.117

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib4ab9aa34c22c034887be15902a625ecc5622b35
2020-04-21 10:20:12 +02:00
Xiao Yang
57f2a2ad73 tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
commit 0bbe7f719985efd9adb3454679ecef0984cb6800 upstream.

Traced event can trigger 'snapshot' operation(i.e. calls snapshot_trigger()
or snapshot_count_trigger()) when register_snapshot_trigger() has completed
registration but doesn't allocate buffer for 'snapshot' event trigger.  In
the rare case, 'snapshot' operation always detects the lack of allocated
buffer so make register_snapshot_trigger() allocate buffer first.

trigger-snapshot.tc in kselftest reproduces the issue on slow vm:
-----------------------------------------------------------
cat trace
...
ftracetest-3028  [002] ....   236.784290: sched_process_fork: comm=ftracetest pid=3028 child_comm=ftracetest child_pid=3036
     <...>-2875  [003] ....   240.460335: tracing_snapshot_instance_cond: *** SNAPSHOT NOT ALLOCATED ***
     <...>-2875  [003] ....   240.460338: tracing_snapshot_instance_cond: *** stopping trace here!   ***
-----------------------------------------------------------

Link: http://lkml.kernel.org/r/20200414015145.66236-1-yangx.jy@cn.fujitsu.com

Cc: stable@vger.kernel.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21 09:03:09 +02:00
Greg Kroah-Hartman
95bff4cdab This is the 4.19.116 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6ZbYYACgkQONu9yGCS
 aT76ohAAn4lIjSuMRCILy/lq0DXVWDy7q6YdfyzNBITxc86tVfnfjMeQxUBviE/1
 OzShWgMRXeqrb0xJTJ5Rv6mt5Kf9a3DpPWt2jwo1iqWkl4AihDtDV7Z2Bh+QdnSX
 +lQ1xGPqDi4MMgoYlpMtlFc3wq/pJV0i8Q7amXC/KbsDkt5dlDrQYeEZHe2P7pR9
 ZljKLHEdGRE3uGqXmEM8qb6aLjQudnHmH/9uChP4UX6b+ZADDCc05DMhEkhEoCZT
 jdxiqVZvRdiiXTc1r6ckGv0xae77s0IAAZMQAd+24zFK94QByi6d9Cw0y6qyyDi7
 1rfHIWSjvetY3+4DCQDOu/k2/pLt/Vqh9zuvtaf8Tu8cKM9rxow0Hl9FlL3fZpBN
 btpqeCY6twFxApHoAp9ZDK6otaVEOtbg1MCsmpUbVxWIF9IR8cPqMGyYK3lR2Ao1
 HgdKEFkYOycAOu51ujuHsDLx/9k2ZqeSPyh0yrdVpFUVvMV/YqoYP9X3jzGRVllL
 hgYfFcywgrVgxK4c02/6cPiJNbFskTpLllDPVVXGIjO+9R4vTRUgJ74CNrqL25aT
 ioSFWJA00UvXObnbCDdA+otYYWAmYOJX7HVvEieb0oDqPYHZHa1UW6+1WlYSAQLm
 WAsHiejOv6PwzRmCDI6RyuZKQjjX6bppAWFq0/RLPO0uEqjXlxc=
 =Iq3k
 -----END PGP SIGNATURE-----

Merge 4.19.116 into android-4.19

Changes in 4.19.116
	ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
	bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
	net: vxge: fix wrong __VA_ARGS__ usage
	hinic: fix a bug of waitting for IO stopped
	hinic: fix wrong para of wait_for_completion_timeout
	cxgb4/ptp: pass the sign of offset delta in FW CMD
	qlcnic: Fix bad kzalloc null test
	i2c: st: fix missing struct parameter description
	cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
	media: venus: hfi_parser: Ignore HEVC encoding for V1
	firmware: arm_sdei: fix double-lock on hibernate with shared events
	null_blk: Fix the null_add_dev() error path
	null_blk: Handle null_add_dev() failures properly
	null_blk: fix spurious IO errors after failed past-wp access
	xhci: bail out early if driver can't accress host in resume
	x86: Don't let pgprot_modify() change the page encryption bit
	block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices
	irqchip/versatile-fpga: Handle chained IRQs properly
	sched: Avoid scale real weight down to zero
	selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
	PCI/switchtec: Fix init_completion race condition with poll_wait()
	media: i2c: video-i2c: fix build errors due to 'imply hwmon'
	libata: Remove extra scsi_host_put() in ata_scsi_add_hosts()
	pstore/platform: fix potential mem leak if pstore_init_fs failed
	gfs2: Don't demote a glock until its revokes are written
	x86/boot: Use unsigned comparison for addresses
	efi/x86: Ignore the memory attributes table on i386
	genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy()
	block: Fix use-after-free issue accessing struct io_cq
	media: i2c: ov5695: Fix power on and off sequences
	usb: dwc3: core: add support for disabling SS instances in park mode
	irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency
	md: check arrays is suspended in mddev_detach before call quiesce operations
	firmware: fix a double abort case with fw_load_sysfs_fallback
	locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
	block, bfq: fix use-after-free in bfq_idle_slice_timer_body
	btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
	btrfs: remove a BUG_ON() from merge_reloc_roots()
	btrfs: track reloc roots based on their commit root bytenr
	IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
	uapi: rename ext2_swab() to swab() and share globally in swab.h
	slub: improve bit diffusion for freelist ptr obfuscation
	ASoC: fix regwmask
	ASoC: dapm: connect virtual mux with default value
	ASoC: dpcm: allow start or stop during pause for backend
	ASoC: topology: use name_prefix for new kcontrol
	usb: gadget: f_fs: Fix use after free issue as part of queue failure
	usb: gadget: composite: Inform controller driver of self-powered
	ALSA: usb-audio: Add mixer workaround for TRX40 and co
	ALSA: hda: Add driver blacklist
	ALSA: hda: Fix potential access overflow in beep helper
	ALSA: ice1724: Fix invalid access for enumerated ctl items
	ALSA: pcm: oss: Fix regression by buffer overflow fix
	ALSA: doc: Document PC Beep Hidden Register on Realtek ALC256
	ALSA: hda/realtek - Set principled PC Beep configuration for ALC256
	ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups
	ALSA: hda/realtek - Add quirk for MSI GL63
	media: ti-vpe: cal: fix disable_irqs to only the intended target
	acpi/x86: ignore unspecified bit positions in the ACPI global lock field
	thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
	nvme-fc: Revert "add module to ops template to allow module references"
	nvme: Treat discovery subsystems as unique subsystems
	PCI: pciehp: Fix indefinite wait on sysfs requests
	PCI/ASPM: Clear the correct bits when enabling L1 substates
	PCI: Add boot interrupt quirk mechanism for Xeon chipsets
	PCI: endpoint: Fix for concurrent memory allocation in OB address region
	tpm: Don't make log failures fatal
	tpm: tpm1_bios_measurements_next should increase position index
	tpm: tpm2_bios_measurements_next should increase position index
	KEYS: reaching the keys quotas correctly
	irqchip/versatile-fpga: Apply clear-mask earlier
	pstore: pstore_ftrace_seq_next should increase position index
	MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3
	MIPS: OCTEON: irq: Fix potential NULL pointer dereference
	ath9k: Handle txpower changes even when TPC is disabled
	signal: Extend exec_id to 64bits
	x86/entry/32: Add missing ASM_CLAC to general_protection entry
	KVM: nVMX: Properly handle userspace interrupt window request
	KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
	KVM: s390: vsie: Fix delivery of addressing exceptions
	KVM: x86: Allocate new rmap and large page tracking when moving memslot
	KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
	KVM: x86: Gracefully handle __vmalloc() failure during VM allocation
	KVM: VMX: fix crash cleanup when KVM wasn't used
	CIFS: Fix bug which the return value by asynchronous read is error
	mtd: spinand: Stop using spinand->oobbuf for buffering bad block markers
	mtd: spinand: Do not erase the block before writing a bad block marker
	Btrfs: fix crash during unmount due to race with delayed inode workers
	btrfs: set update the uuid generation as soon as possible
	btrfs: drop block from cache on error in relocation
	btrfs: fix missing file extent item for hole after ranged fsync
	btrfs: fix missing semaphore unlock in btrfs_sync_file
	crypto: mxs-dcp - fix scatterlist linearization for hash
	erofs: correct the remaining shrink objects
	powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init()
	x86/speculation: Remove redundant arch_smt_update() invocation
	tools: gpio: Fix out-of-tree build regression
	mm: Use fixed constant in page_frag_alloc instead of size + 1
	net: qualcomm: rmnet: Allow configuration updates to existing devices
	arm64: dts: allwinner: h6: Fix PMU compatible
	dm writecache: add cond_resched to avoid CPU hangs
	dm verity fec: fix memory leak in verity_fec_dtr
	scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point
	arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
	selftests: vm: drop dependencies on page flags from mlock2 tests
	rtc: omap: Use define directive for PIN_CONFIG_ACTIVE_HIGH
	drm/etnaviv: rework perfmon query infrastructure
	powerpc/pseries: Avoid NULL pointer dereference when drmem is unavailable
	NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
	ext4: fix a data race at inode->i_blocks
	fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
	ocfs2: no need try to truncate file beyond i_size
	perf tools: Support Python 3.8+ in Makefile
	s390/diag: fix display of diagnose call statistics
	Input: i8042 - add Acer Aspire 5738z to nomux list
	clk: ingenic/jz4770: Exit with error if CGU init failed
	kmod: make request_module() return an error when autoloading is disabled
	cpufreq: powernv: Fix use-after-free
	hfsplus: fix crash and filesystem corruption when deleting files
	libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
	ipmi: fix hung processes in __get_guid()
	xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
	powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle
	powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
	powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap PTE entries
	powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
	powerpc/kprobes: Ignore traps that happened in real mode
	scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
	powerpc: Add attributes for setjmp/longjmp
	powerpc: Make setjmp/longjmp signature standard
	btrfs: use nofs allocations for running delayed items
	dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()
	crypto: caam - update xts sector size for large input length
	crypto: ccree - improve error handling
	crypto: ccree - zero out internal struct before use
	crypto: ccree - don't mangle the request assoclen
	crypto: ccree - dec auth tag size from cryptlen map
	crypto: ccree - only try to map auth tag if needed
	Revert "drm/dp_mst: Remove VCPI while disabling topology mgr"
	drm/dp_mst: Fix clearing payload state on topology disable
	drm: Remove PageReserved manipulation from drm_pci_alloc
	ftrace/kprobe: Show the maxactive number on kprobe_events
	powerpc/fsl_booke: Avoid creating duplicate tlb1 entry
	misc: echo: Remove unnecessary parentheses and simplify check for zero
	etnaviv: perfmon: fix total and idle HI cyleces readout
	mfd: dln2: Fix sanity checking for endpoints
	efi/x86: Fix the deletion of variables in mixed mode
	Linux 4.19.116

Change-Id: If09fbb53fcb11ea01eaaa7fee7ed21ed6234f352
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-04-18 13:33:51 +02:00
Minchan Kim
c27ea19ea4 ANDROID: GKI: attribute page lock and waitqueue functions as sched
trace_sched_blocked_trace in CFS is really useful for debugging via
trace because it tell where the process was stuck on callstack.

For example,
           <...>-6143  ( 6136) [005] d..2    50.278987: sched_blocked_reason: pid=6136 iowait=0 caller=SyS_mprotect+0x88/0x208
           <...>-6136  ( 6136) [005] d..2    50.278990: sched_blocked_reason: pid=6142 iowait=0 caller=do_page_fault+0x1f4/0x3b0
           <...>-6142  ( 6136) [006] d..2    50.278996: sched_blocked_reason: pid=6144 iowait=0 caller=SyS_prctl+0x52c/0xb58
           <...>-6144  ( 6136) [006] d..2    50.279007: sched_blocked_reason: pid=6136 iowait=0 caller=vm_mmap_pgoff+0x74/0x104

However, sometime it gives pointless information like this.
    RenderThread-2322  ( 1805) [006] d.s3    50.319046: sched_blocked_reason: pid=6136 iowait=1 caller=__lock_page_killable+0x17c/0x220
     logd.writer-594   (  587) [002] d.s3    50.334011: sched_blocked_reason: pid=6126 iowait=1 caller=wait_on_page_bit+0x194/0x208
  kworker/u16:13-333   (  333) [007] d.s4    50.343161: sched_blocked_reason: pid=6136 iowait=1 caller=__lock_page_killable+0x17c/0x220

Such wait_on_page_bit, __lock_page_killable are pointless because it doesn't
carry on higher information to identify the callstack.

The reason is page_lock and waitqueue are special synchronization method unlike
other normal locks(mutex, spinlock).
Let's mark them as "__sched" so get_wchan which used in trace_sched_blocked_trace
could detect it and skip them. It will produce more meaningful callstack
function like this.

           <...>-2867  ( 1068) [002] d.h4   124.209701: sched_blocked_reason: pid=329 iowait=0 caller=worker_thread+0x378/0x470
           <...>-2867  ( 1068) [002] d.s3   124.209763: sched_blocked_reason: pid=8454 iowait=1 caller=__filemap_fdatawait_range+0xa0/0x104
           <...>-2867  ( 1068) [002] d.s4   124.209803: sched_blocked_reason: pid=869 iowait=0 caller=worker_thread+0x378/0x470
 ScreenDecoratio-2364  ( 1867) [002] d.s3   124.209973: sched_blocked_reason: pid=8454 iowait=1 caller=f2fs_wait_on_page_writeback+0x84/0xcc
 ScreenDecoratio-2364  ( 1867) [002] d.s4   124.209986: sched_blocked_reason: pid=869 iowait=0 caller=worker_thread+0x378/0x470
           <...>-329   (  329) [000] d..3   124.210435: sched_blocked_reason: pid=538 iowait=0 caller=worker_thread+0x378/0x470
  kworker/u16:13-538   (  538) [007] d..3   124.210450: sched_blocked_reason: pid=6 iowait=0 caller=worker_thread+0x378/0x470

Bug: 144961676
Bug: 144713689
Change-Id: I30397400c5d056946bdfbc86c9ef5f4d7e6c98fe
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
Bug: 152417756
(cherry picked from commit 8a780c0eb6800cecbfce21362c2d2a3bcab14e1c)
Signed-off-by: Saravana Kannan <saravanak@google.com>
2020-04-17 22:51:44 -07:00
Masami Hiramatsu
52f1c4257c ftrace/kprobe: Show the maxactive number on kprobe_events
[ Upstream commit 6a13a0d7b4d1171ef9b80ad69abc37e1daa941b3 ]

Show maxactive parameter on kprobe_events.
This allows user to save the current configuration and
restore it without losing maxactive parameter.

Link: http://lkml.kernel.org/r/4762764a-6df7-bc93-ed60-e336146dce1f@gmail.com
Link: http://lkml.kernel.org/r/158503528846.22706.5549974121212526020.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 696ced4fb1 ("tracing/kprobes: expose maxactive for kretprobe in kprobe_events")
Reported-by: Taeung Song <treeze.taeung@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:48:55 +02:00
Eric Biggers
2a87b491b7 kmod: make request_module() return an error when autoloading is disabled
commit d7d27cfc5cf0766a26a8f56868c5ad5434735126 upstream.

Patch series "module autoloading fixes and cleanups", v5.

This series fixes a bug where request_module() was reporting success to
kernel code when module autoloading had been completely disabled via
'echo > /proc/sys/kernel/modprobe'.

It also addresses the issues raised on the original thread
(https://lkml.kernel.org/lkml/20200310223731.126894-1-ebiggers@kernel.org/T/#u)
bydocumenting the modprobe sysctl, adding a self-test for the empty path
case, and downgrading a user-reachable WARN_ONCE().

This patch (of 4):

It's long been possible to disable kernel module autoloading completely
(while still allowing manual module insertion) by setting
/proc/sys/kernel/modprobe to the empty string.

This can be preferable to setting it to a nonexistent file since it
avoids the overhead of an attempted execve(), avoids potential
deadlocks, and avoids the call to security_kernel_module_request() and
thus on SELinux-based systems eliminates the need to write SELinux rules
to dontaudit module_request.

However, when module autoloading is disabled in this way,
request_module() returns 0.  This is broken because callers expect 0 to
mean that the module was successfully loaded.

Apparently this was never noticed because this method of disabling
module autoloading isn't used much, and also most callers don't use the
return value of request_module() since it's always necessary to check
whether the module registered its functionality or not anyway.

But improperly returning 0 can indeed confuse a few callers, for example
get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:

	if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
		fs = __get_fs_type(name, len);
		WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
	}

This is easily reproduced with:

	echo > /proc/sys/kernel/modprobe
	mount -t NONEXISTENT none /

It causes:

	request_module fs-NONEXISTENT succeeded, but still no fs?
	WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
	[...]

This should actually use pr_warn_once() rather than WARN_ONCE(), since
it's also user-reachable if userspace immediately unloads the module.
Regardless, request_module() should correctly return an error when it
fails.  So let's make it return -ENOENT, which matches the error when
the modprobe binary doesn't exist.

I've also sent patches to document and test this case.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Ben Hutchings <benh@debian.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200310223731.126894-1-ebiggers@kernel.org
Link: http://lkml.kernel.org/r/20200312202552.241885-1-ebiggers@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17 10:48:52 +02:00
Zhenzhong Duan
6209e0981b x86/speculation: Remove redundant arch_smt_update() invocation
commit 34d66caf251df91ff27b24a3a786810d29989eca upstream.

With commit a74cfffb03b7 ("x86/speculation: Rework SMT state change"),
arch_smt_update() is invoked from each individual CPU hotplug function.

Therefore the extra arch_smt_update() call in the sysfs SMT control is
redundant.

Fixes: a74cfffb03b7 ("x86/speculation: Rework SMT state change")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <konrad.wilk@oracle.com>
Cc: <dwmw@amazon.co.uk>
Cc: <bp@suse.de>
Cc: <srinivas.eeda@oracle.com>
Cc: <peterz@infradead.org>
Cc: <hpa@zytor.com>
Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17 10:48:50 +02:00
Eric W. Biederman
a2a1be2de7 signal: Extend exec_id to 64bits
commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream.

Replace the 32bit exec_id with a 64bit exec_id to make it impossible
to wrap the exec_id counter.  With care an attacker can cause exec_id
wrap and send arbitrary signals to a newly exec'd parent.  This
bypasses the signal sending checks if the parent changes their
credentials during exec.

The severity of this problem can been seen that in my limited testing
of a 32bit exec_id it can take as little as 19s to exec 65536 times.
Which means that it can take as little as 14 days to wrap a 32bit
exec_id.  Adam Zabrocki has succeeded wrapping the self_exe_id in 7
days.  Even my slower timing is in the uptime of a typical server.
Which means self_exec_id is simply a speed bump today, and if exec
gets noticably faster self_exec_id won't even be a speed bump.

Extending self_exec_id to 64bits introduces a problem on 32bit
architectures where reading self_exec_id is no longer atomic and can
take two read instructions.  Which means that is is possible to hit
a window where the read value of exec_id does not match the written
value.  So with very lucky timing after this change this still
remains expoiltable.

I have updated the update of exec_id on exec to use WRITE_ONCE
and the read of exec_id in do_notify_parent to use READ_ONCE
to make it clear that there is no locking between these two
locations.

Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
Fixes: 2.3.23pre2
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17 10:48:47 +02:00
Boqun Feng
c6090fe788 locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
[ Upstream commit 25016bd7f4caf5fc983bbab7403d08e64cba3004 ]

Qian Cai reported a bug when PROVE_RCU_LIST=y, and read on /proc/lockdep
triggered a warning:

  [ ] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
  ...
  [ ] Call Trace:
  [ ]  lock_is_held_type+0x5d/0x150
  [ ]  ? rcu_lockdep_current_cpu_online+0x64/0x80
  [ ]  rcu_read_lock_any_held+0xac/0x100
  [ ]  ? rcu_read_lock_held+0xc0/0xc0
  [ ]  ? __slab_free+0x421/0x540
  [ ]  ? kasan_kmalloc+0x9/0x10
  [ ]  ? __kmalloc_node+0x1d7/0x320
  [ ]  ? kvmalloc_node+0x6f/0x80
  [ ]  __bfs+0x28a/0x3c0
  [ ]  ? class_equal+0x30/0x30
  [ ]  lockdep_count_forward_deps+0x11a/0x1a0

The warning got triggered because lockdep_count_forward_deps() call
__bfs() without current->lockdep_recursion being set, as a result
a lockdep internal function (__bfs()) is checked by lockdep, which is
unexpected, and the inconsistency between the irq-off state and the
state traced by lockdep caused the warning.

Apart from this warning, lockdep internal functions like __bfs() should
always be protected by current->lockdep_recursion to avoid potential
deadlocks and data inconsistency, therefore add the
current->lockdep_recursion on-and-off section to protect __bfs() in both
lockdep_count_forward_deps() and lockdep_count_backward_deps()

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200312151258.128036-1-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:48:42 +02:00
Alexander Sverdlin
1b16ddb28b genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy()
[ Upstream commit 87f2d1c662fa1761359fdf558246f97e484d177a ]

irq_domain_alloc_irqs_hierarchy() has 3 call sites in the compilation unit
but only one of them checks for the pointer which is being dereferenced
inside the called function. Move the check into the function. This allows
for catching the error instead of the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at 0x0
LR is at gpiochip_hierarchy_irq_domain_alloc+0x11f/0x140
...
[<c06c23ff>] (gpiochip_hierarchy_irq_domain_alloc)
[<c0462a89>] (__irq_domain_alloc_irqs)
[<c0462dad>] (irq_create_fwspec_mapping)
[<c06c2251>] (gpiochip_to_irq)
[<c06c1c9b>] (gpiod_to_irq)
[<bf973073>] (gpio_irqs_init [gpio_irqs])
[<bf974048>] (gpio_irqs_exit+0xecc/0xe84 [gpio_irqs])
Code: bad PC value

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200306174720.82604-1-alexander.sverdlin@nokia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:48:41 +02:00
Michael Wang
2851621747 sched: Avoid scale real weight down to zero
[ Upstream commit 26cf52229efc87e2effa9d788f9b33c40fb3358a ]

During our testing, we found a case that shares no longer
working correctly, the cgroup topology is like:

  /sys/fs/cgroup/cpu/A		(shares=102400)
  /sys/fs/cgroup/cpu/A/B	(shares=2)
  /sys/fs/cgroup/cpu/A/B/C	(shares=1024)

  /sys/fs/cgroup/cpu/D		(shares=1024)
  /sys/fs/cgroup/cpu/D/E	(shares=1024)
  /sys/fs/cgroup/cpu/D/E/F	(shares=1024)

The same benchmark is running in group C & F, no other tasks are
running, the benchmark is capable to consumed all the CPUs.

We suppose the group C will win more CPU resources since it could
enjoy all the shares of group A, but it's F who wins much more.

The reason is because we have group B with shares as 2, since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
so A->cfs_rq.load.weight become very small.

And in calc_group_shares() we calculate shares as:

  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
  shares = (tg_shares * load) / tg_weight;

Since the 'cfs_rq->load.weight' is too small, the load become 0
after scale down, although 'tg_shares' is 102400, shares of the se
which stand for group A on root cfs_rq become 2.

While the se of D on root cfs_rq is far more bigger than 2, so it
wins the battle.

Thus when scale_load_down() scale real weight down to 0, it's no
longer telling the real story, the caller will have the wrong
information and the calculation will be buggy.

This patch add check in scale_load_down(), so the real weight will
be >= MIN_SHARES after scale, after applied the group C wins as
expected.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17 10:48:40 +02:00
Aaro Koskinen
4a853c72f4 UPSTREAM: GKI: panic/reboot: allow specifying reboot_mode for panic only
Allow specifying reboot_mode for panic only.  This is needed on systems
where ramoops is used to store panic logs, and user wants to use warm
reset to preserve those, while still having cold reset on normal
reboots.

Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b287a25a7148a89d977c819c1f7d6584f875b682)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 154175554
Change-Id: Id1075f4d97eddb818aa495903a7643958e6c73d6
2020-04-17 05:00:40 +00:00
Mark Salyzyn
a6a25a9d07 ANDROID: GKI: qcom: Fix compile issue when setting msm_lmh_dcvs as a module
Partial cherry picked from commit 5cb59eb5283f6f5a900c3c4971f7efbd83b7e43a
("qcom: Fix compile issue when setting msm_lmh_dcvs as a module")
adjusted: kernel/trace/power-traces.c
 skipped: drivers/thermal/qcom/lmh_dbg.h
          drivers/thermal/qcom/msm_lmh_dcvs.c

Export the trace symbol -- clock_set_rate -- for the msm_lmh_dcvs
driver.

Signed-off-by: Will McVicker <willmcvicker@google.com>
Change-Id: Ic68bc07997d73ba55f9ba7deff7dc7eef320e4bf
(cherry picked from commit 5cb59eb5283f6f5a900c3c4971f7efbd83b7e43a)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 154153737
2020-04-17 03:24:20 +00:00
Masahiro Yamada
9cd9f31cf2 UPSTREAM: kheaders: include only headers into kheaders_data.tar.xz
Currently, kheaders_data.tar.xz contains some build scripts as well as
headers. None of them is needed in the header archive.

For ARCH=x86, this commit excludes the following from the archive:

  arch/x86/include/asm/Kbuild
  arch/x86/include/uapi/asm/Kbuild
  include/asm-generic/Kbuild
  include/config/auto.conf
  include/config/kernel.release
  include/config/tristate.conf
  include/uapi/asm-generic/Kbuild
  include/uapi/Kbuild
  kernel/gen_kheaders.sh

This change is actually motivated for the planned header compile-testing
because it will generate more build artifacts, which should not be
included in the archive.

Change-Id: I688e041842740216cace0373ca9f358bc7704809
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
(cherry picked from commit 7199ff7d74003b5aad1e6328bf6128cd8ceea735)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
2020-04-16 18:00:25 +00:00
Masahiro Yamada
667cbc0e2a UPSTREAM: kheaders: remove meaningless -R option of 'ls'
The -R option of 'ls' is supposed to be used for directories.

   -R, --recursive
          list subdirectories recursively

Since 'find ... -type f' only matches to regular files, we do not
expect directories passed to the 'ls' command here.

Giving -R is harmless at least, but unneeded.

Change-Id: I73588f18e40824ccecc4149fbc467015b5c5e142
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
(cherry picked from commit b60b7c2ea9b7f854d457fefd592c77f621a86580)
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
2020-04-16 18:00:19 +00:00
Saravana Kannan
cb2fe03684 ANDROID: GKI: Add vendor fields to root_domain
This is needed for ABI compatibility

Bug: 153905799
Change-Id: Idd802feeb29652e9f575faff8a0770af5697eedb
Signed-off-by: Saravana Kannan <saravanak@google.com>
2020-04-14 16:21:06 +00:00
Mark Salyzyn
35be952ae0 Revert "BACKPORT: mm: reclaim small amounts of memory when an external fragmentation event occurs"
This reverts commit 5cbbeadd5a.

Reason for revert: revert customized code
Bug: 140544941
Test: boot
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I65735f27f6a44a112957bcec07e2f63f2d8ccff6
2020-04-14 19:35:27 +08:00
Will McVicker
56ebfff5eb ANDROID: GKI: panic: add vendor callback function in panic()
Each vendor might want to implement some debug code when the kernel
panics. So, add a vendor_panic_cb callback for vendors to implement.

Bug: 149258398
Test: compile
Change-Id: I7a374b0089f72c2511db6fe3b8cdd18f41a1eb6c
Signed-off-by: Saravana Kannan <saravanak@google.com>
(cherry picked from commit 911d9c70c2c50b0383ed0b652bb84ca8832e4a2b)
Signed-off-by: Will McVicker <willmcvicker@google.com>
[willmcvicker: only pulled in the ABI diffs]
2020-04-14 05:24:52 +00:00
Will McVicker
0a2394dc5a ANDROID: GKI: export symbols from abi_gki_aarch64_qcom_whitelist
Run the script,
  $ ../build/gki/add_EXPORT_SYMBOL_GPL < abi_gki_aarch64_qcom_whitelist

This will export all the required symbols that are in this kernel.

Signed-off-by: Will McVicker <willmcvicker@google.com>
Bug: 153886473
Test: compile
Change-Id: I703509d75104cd86f472481346e3efbd235121ab
2020-04-13 21:36:41 +00:00
Greg Kroah-Hartman
2d2af525a7 This is the 4.19.115 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6USBUACgkQONu9yGCS
 aT6dNxAA1BJKHbO1TOMTYi21N8XNbMOOblVxrLDe+Y2nEj2KIqiehlsoreV34F/g
 IswNAuA3JXp7pU53RIKsTIWvx9CvNit55sJ1eKWTfFZGCotsBWH9Xzeh9Ao1wURG
 vhE5tX8PUwEzZ/sFphVmVvv5oUkQyHYKpEosyVOqL5eIQe5E430PxB/xvz4I0Vyq
 HHiXmrNekXi5kY156k1RqLQ/RhKMFPNi7swm1uFKLS1qrcIlQzgq5MFk5l59oEMo
 xob25EeeddVa/4roNSVk9IZGZjXpRPsvRM8kxjSXn2KVz1aO8TgYXF1RyWeNthsZ
 VXf6XkasSh3bwMX6imhV1fGmepG3OvSZg0k2EvRTpcY84kFlIrC1l2YuOHrCETgL
 GkptUtGK0a2FEiyBK/0nxvf2E6iaoT4NeTYlyTkL8iOgJ+xMvuSzCpFfQjfkOjGz
 h3AD+Twqu7lqY54nOvyAkA94joEFzVuzSoYCABAImFq4kvu4khhWBXTmkqUf47aI
 1O3m4bMEMLDBRwiBpRsu5c0C+ghHHQtOWTH/UjyOI1aGEKFZyBe5CHYmRo2W9tDg
 rrlymg1iVMR1o9pvzzRroCokKCzBSirEWKxyyMIFWko5xQvTvae5fTIaAWlvBGjP
 oH3eIPDWw1ZD1WxiSGzM2Wx4AyumZ1y3pnOHV3uUnYb3cM0l9g8=
 =bfll
 -----END PGP SIGNATURE-----

Merge 4.19.115 into android-4.19

Changes in 4.19.115
	ipv4: fix a RCU-list lock in fib_triestat_seq_show
	net, ip_tunnel: fix interface lookup with no key
	sctp: fix refcount bug in sctp_wfree
	sctp: fix possibly using a bad saddr with a given dst
	nvme-rdma: Avoid double freeing of async event data
	drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017
	drm/bochs: downgrade pci_request_region failure from error to warning
	initramfs: restore default compression behavior
	drm/amdgpu: fix typo for vcn1 idle check
	tools/power turbostat: Fix gcc build warnings
	tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks
	drm/etnaviv: replace MMU flush marker with flush sequence
	media: rc: IR signal for Panasonic air conditioner too long
	misc: rtsx: set correct pcr_ops for rts522A
	misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices
	misc: pci_endpoint_test: Avoid using module parameter to determine irqtype
	coresight: do not use the BIT() macro in the UAPI header
	mei: me: add cedar fork device ids
	extcon: axp288: Add wakeup support
	power: supply: axp288_charger: Add special handling for HP Pavilion x2 10
	ALSA: hda/ca0132 - Add Recon3Di quirk to handle integrated sound on EVGA X99 Classified motherboard
	rxrpc: Fix sendmsg(MSG_WAITALL) handling
	net: Fix Tx hash bound checking
	padata: always acquire cpu_hotplug_lock before pinst->lock
	bitops: protect variables in set_mask_bits() macro
	include/linux/notifier.h: SRCU: fix ctags
	mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
	ipv6: don't auto-add link-local address to lag ports
	net: dsa: bcm_sf2: Do not register slave MDIO bus with OF
	net: dsa: bcm_sf2: Ensure correct sub-node is parsed
	net: phy: micrel: kszphy_resume(): add delay after genphy_resume() before accessing PHY registers
	net: stmmac: dwmac1000: fix out-of-bounds mac address reg setting
	slcan: Don't transmit uninitialized stack data in padding
	mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE
	random: always use batched entropy for get_random_u{32,64}
	usb: dwc3: gadget: Wrap around when skip TRBs
	tools/accounting/getdelays.c: fix netlink attribute length
	hwrng: imx-rngc - fix an error path
	ASoC: jz4740-i2s: Fix divider written at incorrect offset in register
	IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
	IB/hfi1: Fix memory leaks in sysfs registration and unregistration
	ceph: remove the extra slashes in the server path
	ceph: canonicalize server path in place
	RDMA/ucma: Put a lock around every call to the rdma_cm layer
	RDMA/cma: Teach lockdep about the order of rtnl and lock
	Bluetooth: RFCOMM: fix ODEBUG bug in rfcomm_dev_ioctl
	RDMA/cm: Update num_paths in cma_resolve_iboe_route error flow
	fbcon: fix null-ptr-deref in fbcon_switch
	clk: qcom: rcg: Return failure for RCG update
	drm/msm: stop abusing dma_map/unmap for cache
	arm64: Fix size of __early_cpu_boot_status
	rpmsg: glink: Remove chunk size word align warning
	usb: dwc3: don't set gadget->is_otg flag
	drm_dp_mst_topology: fix broken drm_dp_sideband_parse_remote_dpcd_read()
	drm/msm: Use the correct dma_sync calls in msm_gem
	Linux 4.19.115

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idc17d8aa387491167efc60df0a9764b82e4344da
2020-04-13 13:09:17 +02:00
Daniel Jordan
bf498d6b8d padata: always acquire cpu_hotplug_lock before pinst->lock
commit 38228e8848cd7dd86ccb90406af32de0cad24be3 upstream.

lockdep complains when padata's paths to update cpumasks via CPU hotplug
and sysfs are both taken:

  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # echo ff > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

  ======================================================
  WARNING: possible circular locking dependency detected
  5.4.0-rc8-padata-cpuhp-v3+ #1 Not tainted
  ------------------------------------------------------
  bash/205 is trying to acquire lock:
  ffffffff8286bcd0 (cpu_hotplug_lock.rw_sem){++++}, at: padata_set_cpumask+0x2b/0x120

  but task is already holding lock:
  ffff8880001abfa0 (&pinst->lock){+.+.}, at: padata_set_cpumask+0x26/0x120

  which lock already depends on the new lock.

padata doesn't take cpu_hotplug_lock and pinst->lock in a consistent
order.  Which should be first?  CPU hotplug calls into padata with
cpu_hotplug_lock already held, so it should have priority.

Fixes: 6751fb3c0e ("padata: Use get_online_cpus/put_online_cpus")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-13 10:45:05 +02:00
Tri Vo
e64d8dcc68 ANDROID: GKI: kernel/dma, mm/cma: Export symbols needed by vendor modules
This allows them to work with a GKI kernel.

Bug: 140651649
Bug: 140651863
Change-Id: I41ae14d90df31d552b2a0eab89a20d7ba8a9243d
Signed-off-by: Tri Vo <trong@google.com>
[saravanak partial cherry-pick and dropped a ton of changes]
Signed-off-by: Saravana Kannan <saravanak@google.com>
2020-04-10 02:39:49 +00:00
Saravana Kannan
61a3356ad1 ANDROID: GKI: Add stub __cpu_isolated_mask symbol
Some vendor modules might want to keep track of isolated CPUs. Add a
stub symbol that never isolates any CPU.

Bug: 149816871
Signed-off-by: Saravana Kannan <saravanak@google.com>
Change-Id: Ia494314168e94d72b0c1e8b786c150b9403dbf1a
2020-04-10 00:48:38 +00:00
Quentin Perret
eead51495c ANDROID: GKI: sched: stub sched_isolate symbols
These are needed to let modules load during compliance testing, but the
underlying core-isolation feature is not necessary for android-common.

Bug: 149816871
Test: compiled, checked abi diff for missing sched_*isolate* symbols
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iaece1e98f821c50f2497b4a47b60714f49272750
2020-04-10 00:48:27 +00:00
Kelly Rossmoyer
e7b509cf04 ANDROID: power: wakeup_reason: wake reason enhancements
These changes build upon the existing Android kernel wakeup reason code
to:
* improve the positioning of suspend abort logging calls in suspend flow
* add logging of abnormal wakeup reasons like unexpected HW IRQs and
  IRQs configured as both wake-enabled and no-suspend
* add support for capturing deferred-processing threaded nested IRQs as
  wakeup reasons rather than their synchronously-processed parents

Bug: 150970830
Bug: 140217217

Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
Change-Id: I903b811a0fe11a605a25815c3a341668a23de700
2020-04-09 15:27:37 +00:00
Lina Iyer
723feab600 ANDROID: GKI: QoS: Enhance framework to support cpu/irq specific QoS requests
QoS request for CPU_DMA_LATENCY can be better optimized if the request
can be set only for the required cpus and not all cpus. This helps save
power on other cores, while still gauranteeing the quality of service.

Enhance the QoS constraints data structures to support target value for
each core. Requests specify if the QoS is applicable to all cores
(default) or to a selective subset of the cores or to a core(s), that the
IRQ is affine to.

QoS requests that need to track an IRQ can be set to apply only on the
cpus to which the IRQ's smp_affinity attribute is set to. The QoS
framework will automatically track IRQ migration between the cores. The
QoS is updated to be applied only to the core(s) that the IRQ has been
migrated to.

Idle and interested drivers can request a PM QoS value for a constraint
across all cpus, or a specific cpu or a set of cpus. Separate APIs have
been added to request for individual cpu or a cpumask.  The default
behaviour of PM QoS is maintained i.e, requests that do not specify a
type of the request will continue to be effected on all cores.  Requests
that want to specify an affinity of cpu(s) or an irq, can modify the PM
QoS request data structures by specifying the type of the request and
either the mask of the cpus or the IRQ number depending on the type.
Updating the request does not reset the type of the request.

The userspace sysfs interface does not support CPU/IRQ affinity.

Signed-off-by: Lina Iyer <ilina@codeaurora.org>
(cherry picked from commit 16cafdb44f)

Bug: 153463922
Test: build
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I1aad836a985fd303f2254fe607bb909a6b720dd5
2020-04-08 16:37:25 +00:00
Eric Biggers
a898a4764c FROMLIST: kmod: make request_module() return an error when autoloading is disabled
It's long been possible to disable kernel module autoloading completely
(while still allowing manual module insertion) by setting
/proc/sys/kernel/modprobe to the empty string.  This can be preferable
to setting it to a nonexistent file since it avoids the overhead of an
attempted execve(), avoids potential deadlocks, and avoids the call to
security_kernel_module_request() and thus on SELinux-based systems
eliminates the need to write SELinux rules to dontaudit module_request.

However, when module autoloading is disabled in this way,
request_module() returns 0.  This is broken because callers expect 0 to
mean that the module was successfully loaded.

Apparently this was never noticed because this method of disabling
module autoloading isn't used much, and also most callers don't use the
return value of request_module() since it's always necessary to check
whether the module registered its functionality or not anyway.  But
improperly returning 0 can indeed confuse a few callers, for example
get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit:

	if (!fs && (request_module("fs-%.*s", len, name) == 0)) {
		fs = __get_fs_type(name, len);
		WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name);
	}

This is easily reproduced with:

	echo > /proc/sys/kernel/modprobe
	mount -t NONEXISTENT none /

It causes:

	request_module fs-NONEXISTENT succeeded, but still no fs?
	WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0
	[...]

This should actually use pr_warn_once() rather than WARN_ONCE(), since
it's also user-reachable if userspace immediately unloads the module.
Regardless, request_module() should correctly return an error when it
fails.  So let's make it return -ENOENT, which matches the error when
the modprobe binary doesn't exist.

I've also sent patches to document and test this case.

Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: NeilBrown <neilb@suse.com>
Link: https://lore.kernel.org/r/20200318230515.171692-2-ebiggers@kernel.org
Bug: 151589316
Change-Id: I5e04f85e12a4f85da23e53bc11da1ade565abcd6
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-04-06 10:44:34 -07:00
Alistair Delva
182b38fc31 ANDROID: Fix wq fp check for CFI builds
A previous change added a test on the wrong config flag; rename
CFI to CFI_CLANG.

Bug: 145210207
Change-Id: Id8aead2eb2c75ad6442d10165f6cb86ccfb9c2f9
Signed-off-by: Alistair Delva <adelva@google.com>
2020-04-04 16:30:40 +00:00
Greg Kroah-Hartman
6ca29140d7 This is the 4.19.114 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6F6HkACgkQONu9yGCS
 aT7FkxAAgZOwRDVRkqjfSE+MBAqbE41sO3iAWmv9gQazdK+APGdQaasQ73gBdcuQ
 wliG5W9k9J0qkcnUIAnEgooAWXB9+7p4NF1BZHmpmYleXZckmXtaDK3cKgFWAOVD
 KMQgiEYHgdm6otlNf328uOmoaggN1wRqmMsW/PZys0AvQ183oTsidhQwfOofCt3k
 LwJiu5o+gJCIePrqKuHtkteKmjFR1KQ2RZHPmJ2ApoxVymBreJWKMl8ZVCRyteDx
 JoWZfprPnZZaqb83ylkpE/lXyut0etT2zmI+W/Bg4LFDZTVfqw+HPB7opvITfP0p
 6H0YwH9Qn/BiOcP6JncVUPLe8/bEiOJ/jsJwPRCcl0C7PmDrn6uhBNVfrY4CreAL
 h38/vKSwK8iduyPpne6zq6hQDYBTdEpBDtXFsnElNBmyIE7yIH3ta8qDYsW13Fr7
 x9U7F9KagIR1AH2b/uMzjlTDv85hvzGP8vS06S1gJn6RJP0WSDtpE7RNT6MkfMIw
 Ti16a9nEJ3H+Zn76vdvlLirmziETsIVpxHSDRu/X9QfxJmXHnXg7581bu8aGZ1zN
 6xwWP9mWA8KJzbX5mxXChHoZ9qQ/o4D10MxS+7DXFYiya4prHWphyTS2MYbzMzIl
 TIOJ54FVg01QiQbh29X05hvd3RMOkdzJ9Tggq8oTSLvgTIUSmi0=
 =jtGQ
 -----END PGP SIGNATURE-----

Merge 4.19.114 into android-4.19

Changes in 4.19.114
	mmc: core: Allow host controllers to require R1B for CMD6
	mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard
	mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command
	mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
	mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
	Revert "drm/dp_mst: Skip validating ports during destruction, just ref"
	geneve: move debug check after netdev unregister
	hsr: fix general protection fault in hsr_addr_is_self()
	macsec: restrict to ethernet devices
	mlxsw: spectrum_mr: Fix list iteration in error path
	net: cbs: Fix software cbs to consider packet sending time
	net: dsa: Fix duplicate frames flooded by learning
	net: mvneta: Fix the case where the last poll did not process all rx
	net/packet: tpacket_rcv: avoid a producer race condition
	net: qmi_wwan: add support for ASKEY WWHC050
	net_sched: cls_route: remove the right filter from hashtable
	net_sched: keep alloc_hash updated after hash allocation
	net: stmmac: dwmac-rk: fix error path in rk_gmac_probe
	NFC: fdp: Fix a signedness bug in fdp_nci_send_patch()
	slcan: not call free_netdev before rtnl_unlock in slcan_open
	bnxt_en: fix memory leaks in bnxt_dcbnl_ieee_getets()
	bnxt_en: Reset rings if ring reservation fails during open()
	net: ip_gre: Separate ERSPAN newlink / changelink callbacks
	net: ip_gre: Accept IFLA_INFO_DATA-less configuration
	net: dsa: mt7530: Change the LINK bit to reflect the link status
	net: phy: mdio-mux-bcm-iproc: check clk_prepare_enable() return value
	r8169: re-enable MSI on RTL8168c
	tcp: repair: fix TCP_QUEUE_SEQ implementation
	vxlan: check return value of gro_cells_init()
	hsr: use rcu_read_lock() in hsr_get_node_{list/status}()
	hsr: add restart routine into hsr_get_node_list()
	hsr: set .netnsok flag
	cgroup-v1: cgroup_pidlist_next should update position index
	nfs: add minor version to nfs_server_key for fscache
	cpupower: avoid multiple definition with gcc -fno-common
	drivers/of/of_mdio.c:fix of_mdiobus_register()
	cgroup1: don't call release_agent when it is ""
	dt-bindings: net: FMan erratum A050385
	arm64: dts: ls1043a: FMan erratum A050385
	fsl/fman: detect FMan erratum A050385
	s390/qeth: handle error when backing RX buffer
	scsi: ipr: Fix softlockup when rescanning devices in petitboot
	mac80211: Do not send mesh HWMP PREQ if HWMP is disabled
	dpaa_eth: Remove unnecessary boolean expression in dpaa_get_headroom
	sxgbe: Fix off by one in samsung driver strncpy size arg
	ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare()
	i2c: hix5hd2: add missed clk_disable_unprepare in remove
	Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
	Input: synaptics - enable RMI on HP Envy 13-ad105ng
	Input: avoid BIT() macro usage in the serio.h UAPI header
	ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
	ARM: dts: dra7: Add bus_dma_limit for L3 bus
	ARM: dts: omap5: Add bus_dma_limit for L3 bus
	perf probe: Do not depend on dwfl_module_addrsym()
	tools: Let O= makes handle a relative path with -C option
	scripts/dtc: Remove redundant YYLOC global declaration
	scsi: sd: Fix optimal I/O size for devices that change reported values
	nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type
	mac80211: mark station unauthorized before key removal
	gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
	gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
	gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
	RDMA/core: Ensure security pkey modify is not lost
	genirq: Fix reference leaks on irq affinity notifiers
	xfrm: handle NETDEV_UNREGISTER for xfrm device
	vti[6]: fix packet tx through bpf_redirect() in XinY cases
	RDMA/mlx5: Block delay drop to unprivileged users
	xfrm: fix uctx len check in verify_sec_ctx_len
	xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
	xfrm: policy: Fix doulbe free in xfrm_policy_timer
	afs: Fix some tracing details
	netfilter: flowtable: reload ip{v6}h in nf_flow_tuple_ip{v6}
	netfilter: nft_fwd_netdev: validate family and chain type
	bpf/btf: Fix BTF verification of enum members in struct/union
	vti6: Fix memory leak of skb if input policy check fails
	Revert "r8169: check that Realtek PHY driver module is loaded"
	mac80211: add option for setting control flags
	mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX
	USB: serial: option: add support for ASKEY WWHC050
	USB: serial: option: add BroadMobi BM806U
	USB: serial: option: add Wistron Neweb D19Q1
	USB: cdc-acm: restore capability check order
	USB: serial: io_edgeport: fix slab-out-of-bounds read in edge_interrupt_callback
	usb: musb: fix crash with highmen PIO and usbmon
	media: flexcop-usb: fix endpoint sanity check
	media: usbtv: fix control-message timeouts
	staging: rtl8188eu: Add ASUS USB-N10 Nano B1 to device table
	staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
	staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
	ahci: Add Intel Comet Lake H RAID PCI ID
	libfs: fix infoleak in simple_attr_read()
	media: ov519: add missing endpoint sanity checks
	media: dib0700: fix rc endpoint lookup
	media: stv06xx: add missing descriptor sanity checks
	media: xirlink_cit: add missing descriptor sanity checks
	mac80211: Check port authorization in the ieee80211_tx_dequeue() case
	mac80211: fix authentication with iwlwifi/mvm
	vt: selection, introduce vc_is_sel
	vt: ioctl, switch VT_IS_IN_USE and VT_BUSY to inlines
	vt: switch vt_dont_switch to bool
	vt: vt_ioctl: remove unnecessary console allocation checks
	vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
	vt: vt_ioctl: fix use-after-free in vt_in_use()
	platform/x86: pmc_atom: Add Lex 2I385SW to critclk_systems DMI table
	bpf: Explicitly memset the bpf_attr structure
	bpf: Explicitly memset some bpf info structures declared on the stack
	gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
	net: ks8851-ml: Fix IO operations, again
	arm64: alternative: fix build with clang integrated assembler
	perf map: Fix off by one in strncpy() size argument
	ARM: dts: oxnas: Fix clear-mask property
	ARM: bcm2835-rpi-zero-w: Add missing pinctrl name
	ARM: dts: imx6: phycore-som: fix arm and soc minimum voltage
	ARM: dts: N900: fix onenand timings
	arm64: dts: ls1043a-rdb: correct RGMII delay mode to rgmii-id
	arm64: dts: ls1046ardb: set RGMII interfaces to RGMII_ID mode
	Linux 4.19.114

Change-Id: Icc165d2e49aba750e1b5a8856d9774c149e59ce7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-04-03 08:17:23 +02:00
Saravana Kannan
1d887ea976 ANDROID: GKI: kernel: Export task and IRQ affinity symbols
A module uses these symbols. So, export them to allow loading of that
module.

Bug: 149816871
Bug: 149256712
Signed-off-by: Saravana Kannan <saravanak@google.com>
Change-Id: I949da5d091894ea3d79a6c9244bfc2f8426eee71
(cherry picked from commit dc928ba3bdfb4527e0ffca7c491d946a02e5bd11)
[ qperret: made changes to commit message for AOSP compliance ]
Signed-off-by: Quentin Perret <qperret@google.com>
2020-04-02 16:27:12 -07:00
Greg Kroah-Hartman
638d8c748e bpf: Explicitly memset some bpf info structures declared on the stack
commit 5c6f25887963f15492b604dd25cb149c501bbabf upstream.

Trying to initialize a structure with "= {};" will not always clean out
all padding locations in a structure. So be explicit and call memset to
initialize everything for a number of bpf information structures that
are then copied from userspace, sometimes from smaller memory locations
than the size of the structure.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320162258.GA794295@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02 15:28:23 +02:00
Greg Kroah-Hartman
aca6a9b098 bpf: Explicitly memset the bpf_attr structure
commit 8096f229421f7b22433775e928d506f0342e5907 upstream.

For the bpf syscall, we are relying on the compiler to properly zero out
the bpf_attr union that we copy userspace data into. Unfortunately that
doesn't always work properly, padding and other oddities might not be
correctly zeroed, and in some tests odd things have been found when the
stack is pre-initialized to other values.

Fix this by explicitly memsetting the structure to 0 before using it.

Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://android-review.googlesource.com/c/kernel/common/+/1235490
Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02 15:28:23 +02:00
Yoshiki Komachi
fb957d1003 bpf/btf: Fix BTF verification of enum members in struct/union
commit da6c7faeb103c493e505e87643272f70be586635 upstream.

btf_enum_check_member() was currently sure to recognize the size of
"enum" type members in struct/union as the size of "int" even if
its size was packed.

This patch fixes BTF enum verification to use the correct size
of member in BPF programs.

Fixes: 179cde8cef ("bpf: btf: Check members of struct/union")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1583825550-18606-2-git-send-email-komachi.yoshiki@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02 15:28:19 +02:00
Edward Cree
277db1b634 genirq: Fix reference leaks on irq affinity notifiers
commit df81dfcfd6991d547653d46c051bac195cd182c1 upstream.

The handling of notify->work did not properly maintain notify->kref in two
 cases:
1) where the work was already scheduled, another irq_set_affinity_locked()
   would get the ref and (no-op-ly) schedule the work.  Thus when
   irq_affinity_notify() ran, it would drop the original ref but not the
   additional one.
2) when cancelling the (old) work in irq_set_affinity_notifier(), if there
   was outstanding work a ref had been got for it but was never put.
Fix both by checking the return values of the work handling functions
 (schedule_work() for (1) and cancel_work_sync() for (2)) and put the
 extra ref if the return value indicates preexisting work.

Fixes: cd7eab44e9 ("genirq: Add IRQ affinity notifiers")
Fixes: 59c39840f5ab ("genirq: Prevent use-after-free and work list corruption")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Link: https://lkml.kernel.org/r/24f5983f-2ab5-e83a-44ee-a45b5f9300f5@solarflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02 15:28:18 +02:00
Tycho Andersen
5a8a69435d cgroup1: don't call release_agent when it is ""
[ Upstream commit 2e5383d7904e60529136727e49629a82058a5607 ]

Older (and maybe current) versions of systemd set release_agent to "" when
shutting down, but do not set notify_on_release to 0.

Since 64e90a8acb ("Introduce STATIC_USERMODEHELPER to mediate
call_usermodehelper()"), we filter out such calls when the user mode helper
path is "". However, when used in conjunction with an actual (i.e. non "")
STATIC_USERMODEHELPER, the path is never "", so the real usermode helper
will be called with argv[0] == "".

Let's avoid this by not invoking the release_agent when it is "".

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-02 15:28:14 +02:00
Vasily Averin
967e97461e cgroup-v1: cgroup_pidlist_next should update position index
[ Upstream commit db8dd9697238be70a6b4f9d0284cd89f59c0e070 ]

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

 # mount | grep cgroup
 # dd if=/mnt/cgroup.procs bs=1  # normal output
...
1294
1295
1296
1304
1382
584+0 records in
584+0 records out
584 bytes copied

dd: /mnt/cgroup.procs: cannot skip to specified offset
83  <<< generates end of last line
1383  <<< ... and whole last line once again
0+1 records in
0+1 records out
8 bytes copied

dd: /mnt/cgroup.procs: cannot skip to specified offset
1386  <<< generates last line anyway
0+1 records in
0+1 records out
5 bytes copied

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-02 15:28:14 +02:00
Qais Yousef
b38208f02b FROMGIT: sched/rt: cpupri_find: Trigger a full search as fallback
If we failed to find a fitting CPU, in cpupri_find(), we only fallback
to the level we found a hit at.

But Steve suggested to fallback to a second full scan instead as this
could be a better effort.

	https://lore.kernel.org/lkml/20200304135404.146c56eb@gandalf.local.home/

We trigger the 2nd search unconditionally since the argument about
triggering a full search is that the recorded fall back level might have
become empty by then. Which means storing any data about what happened
would be meaningless and stale.

I had a humble try at timing it and it seemed okay for the small 6 CPUs
system I was running on

	https://lore.kernel.org/lkml/20200305124324.42x6ehjxbnjkklnh@e107158-lin.cambridge.arm.com/

On large system this second full scan could be expensive. But there are
no users outside capacity awareness for this fitness function at the
moment. Heterogeneous systems tend to be small with 8cores in total.

Bug: 120440300
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20200310142219.syxzn5ljpdxqtbgx@e107158-lin.cambridge.arm.com
(cherry picked from commit e94f80f6c49020008e6fa0f3d4b806b8595d17d8
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ib20d400be47cd913a43a5c71fafee6a7fffb78aa
2020-03-26 12:04:38 +00:00
Qais Yousef
32061ff6b5 FROMGIT: sched/rt: Remove unnecessary push for unfit tasks
In task_woken_rt() and switched_to_rto() we try trigger push-pull if the
task is unfit.

But the logic is found lacking because if the task was the only one
running on the CPU, then rt_rq is not in overloaded state and won't
trigger a push.

The necessity of this logic was under a debate as well, a summary of
the discussion can be found in the following thread:

  https://lore.kernel.org/lkml/20200226160247.iqvdakiqbakk2llz@e107158-lin.cambridge.arm.com/

Remove the logic for now until a better approach is agreed upon.

Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-6-qais.yousef@arm.com
(cherry picked from commit d94a9df49069ba8ff7c4aaeca1229e6471a01a15
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Id120ada4a89972b3feb8d8b022babb98db1a157f
2020-03-26 12:04:38 +00:00
Qais Yousef
8efce17187 BACKPORT: FROMGIT: sched/rt: Allow pulling unfitting task
When implemented RT Capacity Awareness; the logic was done such that if
a task was running on a fitting CPU, then it was sticky and we would try
our best to keep it there.

But as Steve suggested, to adhere to the strict priority rules of RT
class; allow pulling an RT task to unfitting CPU to ensure it gets a
chance to run ASAP.

Bug: 120440300
LINK: https://lore.kernel.org/lkml/20200203111451.0d1da58f@oasis.local.home/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-5-qais.yousef@arm.com
(cherry picked from commit 98ca645f824301bde72e0a51cdc8bdbbea6774a5
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
[Trivial merge conflict]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ie25fa5a4f3b0979ed06df8d156e5586b2928479e
2020-03-26 12:04:38 +00:00
Qais Yousef
2bf4a52c7a FROMGIT: sched/rt: Optimize cpupri_find() on non-heterogenous systems
By introducing a new cpupri_find_fitness() function that takes the
fitness_fn as an argument and only called when asym_system static key is
enabled.

cpupri_find() is now a wrapper function that calls cpupri_find_fitness()
passing NULL as a fitness_fn, hence disabling the logic that handles
fitness by default.

Bug: 120440300
LINK: https://lore.kernel.org/lkml/c0772fca-0a4b-c88d-fdf2-5715fcf8447b@arm.com/
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-4-qais.yousef@arm.com
(cherry picked from commit a1bd02e1f28b1939cac8c64072a0e578c3cbc345
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I8ad4d9e391030ae499f7a1805485147de64abcdf
2020-03-26 12:04:38 +00:00
Qais Yousef
27d84b63b0 FROMGIT: sched/rt: Re-instate old behavior in select_task_rq_rt()
When RT Capacity Aware support was added, the logic in select_task_rq_rt
was modified to force a search for a fitting CPU if the task currently
doesn't run on one.

But if the search failed, and the search was only triggered to fulfill
the fitness request; we could end up selecting a new CPU unnecessarily.

Fix this and re-instate the original behavior by ensuring we bail out
in that case.

This behavior change only affected asymmetric systems that are using
util_clamp to implement capacity aware. None asymmetric systems weren't
affected.

Bug: 120440300
LINK: https://lore.kernel.org/lkml/20200218041620.GD28029@codeaurora.org/
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-3-qais.yousef@arm.com
(cherry picked from commit b28bc1e002c23ff8a4999c4a2fb1d4d412bc6f5e
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I670ab7f95a3bd8b4790e1cafe89308ead524367e
2020-03-26 12:04:38 +00:00
Qais Yousef
ca79ac3cba BACKPORT: FROMGIT: sched/rt: cpupri_find: Implement fallback mechanism for !fit case
When searching for the best lowest_mask with a fitness_fn passed, make
sure we record the lowest_level that returns a valid lowest_mask so that
we can use that as a fallback in case we fail to find a fitting CPU at
all levels.

The intention in the original patch was not to allow a down migration to
unfitting CPU. But this missed the case where we are already running on
unfitting one.

With this change now RT tasks can still move between unfitting CPUs when
they're already running on such CPU.

And as Steve suggested; to adhere to the strict priority rules of RT, if
a task is already running on a fitting CPU but due to priority it can't
run on it, allow it to downmigrate to unfitting CPU so it can run.

Bug: 120440300
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: 804d402fb6f6 ("sched/rt: Make RT capacity-aware")
Link: https://lkml.kernel.org/r/20200302132721.8353-2-qais.yousef@arm.com
Link: https://lore.kernel.org/lkml/20200203142712.a7yvlyo2y3le5cpn@e107158-lin/
(cherry picked from commit d9cb236b9429044dc694ea70a50163ddd283cea6
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
[Trivial merge conflict]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I3430e9624f8f7b11d3875c39c5765a51aec4a6f5
2020-03-26 12:04:38 +00:00
Greg Kroah-Hartman
248555d63c This is the 4.19.113 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl57BzQACgkQONu9yGCS
 aT4sXg/9ERHCo0CNoF+KpzfPcH718NEzICFa5pHVE5OJnjGQ+W+AUx4ERkU9gMrk
 W4N+zAFH+v6D/ejNVUPKVB+XMXeZo+QXVBfLme/N4VlrxaSeA2pMxiRrXVY1UX3/
 mb6eShhpBn5q942c8QkzcQJbA1TUKnrqGuqq6rfDQTnjm6OcU/PbPHWqOBVIuOVk
 Op9VSCcTC41N8sCdsjg411fRBue24+zRU05mw4leeqh0f/XaLjZ9xJyEAuW+6zIz
 Vu/8c/vzb4j9TPBg0FaEzgDrR7TUwde7F/O9eejY/GWdQngyfPxjOGpKdakrQOOA
 qRnPv0Fou8hFlKZWFKJhAYcPpTc7KP/y9o1aUsNDrIeA00t9R3nDKng+ewW09hfk
 K75Znyw6yrWlc5nHotd1pG2NEEeDvSKdXVbyarnaTwqunla9WAdQ0IDFwdaiTklt
 CfrJ+AJcd5Smnuo+JfljqF4oF88UJSzhI5hp0Zi9w0JROfbPOYFK0JM2DeEOE27J
 IFm1Z5lvTj4VEJmyLL7CvJSM23yjK5todlG3+zFJt2ZncY2Kw1eHEOIvIwbBtxBp
 2AWRkco+hsf+GToJNQopxGYyTjMI3NDy/FocAVIJ8wMSEWZkyS8NkIlgjPjTC2dk
 ygJ0ZDDiPU0pKouZofQhzGR/Esv/phjWvTLPerFkKIIuiaG+uUo=
 =1W2f
 -----END PGP SIGNATURE-----

Merge 4.19.113 into android-4.19

Changes in 4.19.113
	drm/mediatek: Find the cursor plane instead of hard coding it
	spi: qup: call spi_qup_pm_resume_runtime before suspending
	powerpc: Include .BTF section
	ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
	spi: pxa2xx: Add CS control clock quirk
	spi/zynqmp: remove entry that causes a cs glitch
	drm/exynos: dsi: propagate error value and silence meaningless warning
	drm/exynos: dsi: fix workaround for the legacy clock name
	drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
	altera-stapl: altera_get_note: prevent write beyond end of 'key'
	dm bio record: save/restore bi_end_io and bi_integrity
	dm integrity: use dm_bio_record and dm_bio_restore
	riscv: avoid the PIC offset of static percpu data in module beyond 2G limits
	drm/amd/display: Clear link settings on MST disable connector
	drm/amd/display: fix dcc swath size calculations on dcn1
	xenbus: req->body should be updated before req->state
	xenbus: req->err should be updated before req->state
	block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
	parse-maintainers: Mark as executable
	USB: Disable LPM on WD19's Realtek Hub
	usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
	USB: serial: option: add ME910G1 ECM composition 0x110b
	usb: host: xhci-plat: add a shutdown
	USB: serial: pl2303: add device-id for HP LD381
	usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
	ALSA: line6: Fix endless MIDI read loop
	ALSA: seq: virmidi: Fix running status after receiving sysex
	ALSA: seq: oss: Fix running status after receiving sysex
	ALSA: pcm: oss: Avoid plugin buffer overflow
	ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
	iio: st_sensors: remap SMO8840 to LIS2DH12
	iio: trigger: stm32-timer: disable master mode when stopping
	iio: magnetometer: ak8974: Fix negative raw values in sysfs
	iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
	mmc: rtsx_pci: Fix support for speed-modes that relies on tuning
	mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2
	staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
	staging: greybus: loopback_test: fix poll-mask build breakage
	staging/speakup: fix get_word non-space look-ahead
	intel_th: Fix user-visible error codes
	intel_th: pci: Add Elkhart Lake CPU support
	rtc: max8907: add missing select REGMAP_IRQ
	xhci: Do not open code __print_symbolic() in xhci trace events
	btrfs: fix log context list corruption after rename whiteout error
	drm/amd/amdgpu: Fix GPR read from debugfs (v2)
	drm/lease: fix WARNING in idr_destroy
	memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
	mm: slub: be more careful about the double cmpxchg of freelist
	mm, slub: prevent kmalloc_node crashes and memory leaks
	page-flags: fix a crash at SetPageError(THP_SWAP)
	x86/mm: split vmalloc_sync_all()
	USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
	USB: cdc-acm: fix rounding error in TIOCSSERIAL
	iio: light: vcnl4000: update sampling periods for vcnl4200
	kbuild: Disable -Wpointer-to-enum-cast
	futex: Fix inode life-time issue
	futex: Unbreak futex hashing
	Revert "vrf: mark skb for multicast or link-local as enslaved to VRF"
	Revert "ipv6: Fix handling of LLA with VRF and sockets bound to VRF"
	ALSA: hda/realtek: Fix pop noise on ALC225
	arm64: smp: fix smp_send_stop() behaviour
	arm64: smp: fix crash_smp_send_stop() behaviour
	drm/bridge: dw-hdmi: fix AVI frame colorimetry
	staging: greybus: loopback_test: fix potential path truncation
	staging: greybus: loopback_test: fix potential path truncations
	Linux 4.19.113

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I90c48cd7189a964e59d199ecc0f32c0a68688ec5
2020-03-25 09:50:38 +01:00
Thomas Gleixner
17a8ca79a5 futex: Unbreak futex hashing
commit 8d67743653dce5a0e7aa500fcccb237cde7ad88e upstream.

The recent futex inode life time fix changed the ordering of the futex key
union struct members, but forgot to adjust the hash function accordingly,

As a result the hashing omits the leading 64bit and even hashes beyond the
futex key causing a bad hash distribution which led to a ~100% performance
regression.

Hand in the futex key pointer instead of a random struct member and make
the size calculation based of the struct offset.

Fixes: 8019ad13ef7f ("futex: Fix inode life-time issue")
Reported-by: Rong Chen <rong.a.chen@intel.com>
Decoded-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rong Chen <rong.a.chen@intel.com>
Link: https://lkml.kernel.org/r/87h7yy90ve.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 08:06:14 +01:00
Peter Zijlstra
e6d506cd22 futex: Fix inode life-time issue
commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream.

As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.

This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.

Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 08:06:14 +01:00
Joerg Roedel
6c1051ffc7 x86/mm: split vmalloc_sync_all()
commit 763802b53a427ed3cbd419dbba255c414fdd9e7c upstream.

Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

	* vmalloc_sync_mappings(), and
	* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 08:06:13 +01:00
Mahesh Sivasubramanian
db6999554d ANDROID: GKI: kernel: tick-sched: Add API to get the next wakeup for a CPU
Add get_next_event_cpu to get the next wakeup time for the CPU. This
is used by the sleep driver if it has to query the next wakeup for a
CPU other than the thread that its running on.

Test: make
Bug: 150895657
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
Change-Id: I0f0347f9648932a55cb64c630694d0a2e290b633
(cherry picked from commit e948653a6a)
[hridya: also added an EXPORT_SYMBOL_GPL(get_next_event_cpu) to make
the symbol available to kernel modules.]
Signed-off-by: Hridya Valsaraju <hridya@google.com>
2020-03-23 12:21:04 -07:00
Greg Kroah-Hartman
cb57b8b85f UPSTREAM: bpf: Explicitly memset some bpf info structures declared on the stack
Trying to initialize a structure with "= {};" will not always clean out
all padding locations in a structure. So be explicit and call memset to
initialize everything for a number of bpf information structures that
are then copied from userspace, sometimes from smaller memory locations
than the size of the structure.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320162258.GA794295@kroah.com
(cherry picked from commit 269efb7fc478563a7e7b22590d8076823f4ac82a)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I52a2cab20aa310085ec104bd811ac4f2b83657b6
2020-03-23 16:40:45 +01:00
Greg Kroah-Hartman
de2b205fa7 UPSTREAM: bpf: Explicitly memset the bpf_attr structure
For the bpf syscall, we are relying on the compiler to properly zero out
the bpf_attr union that we copy userspace data into. Unfortunately that
doesn't always work properly, padding and other oddities might not be
correctly zeroed, and in some tests odd things have been found when the
stack is pre-initialized to other values.

Fix this by explicitly memsetting the structure to 0 before using it.

Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://android-review.googlesource.com/c/kernel/common/+/1235490
Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com
(cherry picked from commit 8096f229421f7b22433775e928d506f0342e5907)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2dc28cd45024da5cc6861ff4a9b25fae389cc6d8
2020-03-23 16:39:13 +01:00
Greg Kroah-Hartman
417d28a44d This is the 4.19.112 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl50oUYACgkQONu9yGCS
 aT5w2Q//TEz14rFZh31L+QyAmW2lPGb9UnukLgUjs5RQKbRL/Bl3h5hU0EbYMl4i
 fdb09WsU3Ns2twelCnzWqsXPgc3GiDrSMgOfDR1jJpELzAk4WGgxdYoVj2Wkyi6x
 Uus1KDH5tEgwvHMMjwszDxi8N3behr3IidcvxN6EnTRbmSmLck2rlfp1Y0q4hqF9
 /CDIHxYfArXFVS0sxG3tGduf2wK2XieHr2YLTetcXs8W0GV7KmrSxmcEOsQOjtnv
 aifFuuRY7T4XLfkxoWephKl/YZfVsCAlgOpLn5BgwfSQyXHr/X4GUG+MI7/rsQ6l
 X5cZiqoKAM8K4olnYJB23PHTmSK3AEp8sWQBRS6WxZPcZlu/eq6qnwxhnVBrDNKP
 l40CarJECD8bc3cZnKRZuQjPC2s7ay8KpsLFfSSn0bAmPNVaTF65oqfccA1OMtJd
 nuyBOQNQ4LATemtkloDI+Xxpkhng/klXH+/yVyhbaNFXkCE8XsPaAIClpLf5rVWx
 ojREnXSNfCr3GhPPskVLfjiYhBEEdwlmOB+dBUUqxSNL3IZkdRXPy2Hpv2KFmuUR
 9VNyndsFUFjKraZR29+CBIgFjgsAInvdqvyJkBAtqaAgH3u0nTK/zA/mx6yOGA6o
 ayFotBONmvfGDSRZUNoSkSHcEXHUSNucEzrR/1aMEAb+zqCEgjk=
 =nZnf
 -----END PGP SIGNATURE-----

Merge 4.19.112 into android-4.19

Changes in 4.19.112
	perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
	mmc: sdhci-omap: Add platform specific reset callback
	mmc: sdhci-omap: Workaround errata regarding SDR104/HS200 tuning failures (i929)
	mmc: host: Fix Kconfig warnings on keystone_defconfig
	ACPI: watchdog: Allow disabling WDAT at boot
	HID: apple: Add support for recent firmware on Magic Keyboards
	HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
	cfg80211: check reg_rule for NULL in handle_channel_custom()
	scsi: libfc: free response frame from GPN_ID
	net: usb: qmi_wwan: restore mtu min/max values after raw_ip switch
	net: ks8851-ml: Fix IRQ handling and locking
	mac80211: rx: avoid RCU list traversal under mutex
	signal: avoid double atomic counter increments for user accounting
	slip: not call free_netdev before rtnl_unlock in slip_open
	hinic: fix a irq affinity bug
	hinic: fix a bug of setting hw_ioctxt
	net: rmnet: fix NULL pointer dereference in rmnet_newlink()
	net: rmnet: fix NULL pointer dereference in rmnet_changelink()
	net: rmnet: fix suspicious RCU usage
	net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device()
	net: rmnet: do not allow to change mux id if mux id is duplicated
	net: rmnet: use upper/lower device infrastructure
	net: rmnet: fix bridge mode bugs
	net: rmnet: fix packet forwarding in rmnet bridge mode
	sfc: fix timestamp reconstruction at 16-bit rollover points
	jbd2: fix data races at struct journal_head
	wimax: i2400: fix memory leak
	wimax: i2400: Fix memory leak in i2400m_op_rfkill_sw_toggle
	mmc: sdhci-omap: Don't finish_mrq() on a command error during tuning
	mmc: sdhci-omap: Fix Tuning procedure for temperatures < -20C
	driver core: Remove the link if there is no driver with AUTO flag
	driver core: Fix adding device links to probing suppliers
	driver core: Make driver core own stateful device links
	driver core: Add device link flag DL_FLAG_AUTOPROBE_CONSUMER
	driver core: Remove device link creation limitation
	driver core: Fix creation of device links with PM-runtime flags
	net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue
	ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional()
	ARM: 8958/1: rename missed uaccess .fixup section
	mm: slub: add missing TID bump in kmem_cache_alloc_bulk()
	HID: google: add moonball USB id
	efi: Fix debugobjects warning on 'efi_rts_work'
	ipv4: ensure rcu_read_lock() in cipso_v4_error()
	Linux 4.19.112

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I68bb3ea9d74f698994a1b958d112827a0873a0f7
2020-03-21 08:37:27 +01:00
Linus Torvalds
797479da0a signal: avoid double atomic counter increments for user accounting
[ Upstream commit fda31c50292a5062332fa0343c084bd9f46604d9 ]

When queueing a signal, we increment both the users count of pending
signals (for RLIMIT_SIGPENDING tracking) and we increment the refcount
of the user struct itself (because we keep a reference to the user in
the signal structure in order to correctly account for it when freeing).

That turns out to be fairly expensive, because both of them are atomic
updates, and particularly under extreme signal handling pressure on big
machines, you can get a lot of cache contention on the user struct.
That can then cause horrid cacheline ping-pong when you do these
multiple accesses.

So change the reference counting to only pin the user for the _first_
pending signal, and to unpin it when the last pending signal is
dequeued.  That means that when a user sees a lot of concurrent signal
queuing - which is the only situation when this matters - the only
atomic access needed is generally the 'sigpending' count update.

This was noticed because of a particularly odd timing artifact on a
dual-socket 96C/192T Cascade Lake platform: when you get into bad
contention, on that machine for some reason seems to be much worse when
the contention happens in the upper 32-byte half of the cacheline.

As a result, the kernel test robot will-it-scale 'signal1' benchmark had
an odd performance regression simply due to random alignment of the
'struct user_struct' (and pointed to a completely unrelated and
apparently nonsensical commit for the regression).

Avoiding the double increments (and decrements on the dequeueing side,
of course) makes for much less contention and hugely improved
performance on that will-it-scale microbenchmark.

Quoting Feng Tang:

 "It makes a big difference, that the performance score is tripled! bump
  from original 17000 to 54000. Also the gap between 5.0-rc6 and
  5.0-rc6+Jiri's patch is reduced to around 2%"

[ The "2% gap" is the odd cacheline placement difference on that
  platform: under the extreme contention case, the effect of which half
  of the cacheline was hot was 5%, so with the reduced contention the
  odd timing artifact is reduced too ]

It does help in the non-contended case too, but is not nearly as
noticeable.

Reported-and-tested-by: Feng Tang <feng.tang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Philip Li <philip.li@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-20 11:55:53 +01:00
Greg Kroah-Hartman
bfe2901c20 This is the 4.19.111 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5xvNEACgkQONu9yGCS
 aT6VUg//SJSSC5IX7gulaIm8IzvVijE7EKkdkjukJ4TD672J1QqzXVlhKp8tSvAV
 ZBknOar0AP5sDNtvF3cgz0t6w6IJHrLWGyWqcMfUTC75M9HVZH6YUgHDkPmi0g8f
 dyTrVe20/lC5yBNAFmS0pnYB+UfL8biJEF6N++pULZQhOY0eRr6BMKdl2npxH7D3
 YL/jipdGHmwkr/OgOtRaOBgEP6HIu1xKnZUkGzvhF0BOxAM/ib/5lQognOD6x4Hm
 9vHzc8+nBXlWj6N7XkE+I3RiZumUx+vEr2kLljdrTE7cH7ALzJQl4GQ6Db6lbd0E
 q78Y44FhrfKiwxDeGPHKOX39sgzVwCsKhwTg3a4Rq4Aq0I7QQoPikAyCUj9kaeFq
 q8bI0Wub+4nQhzuyv6UgRWaQnIBZxXe56M8z3u4CTy6ljwvn4hXeZ9bkVRyXdQtS
 D4h3WtxFwBed0tQGb5ypv83Wg/lwK8bQHab4LDV9AZNZ3Jrbg70ldlea0GiA8Csc
 Y3MncS6zF9mnAU8ZdsYT3GNkRQS3OTFNeb7+V5MRgdCnG3xk5GltHTy0JYhZKmXH
 8zMXlUgUeyihFx6f7LwFhYk8NTSg3+W700SKND/zd+VK8m7mqT7PB1bkny5zJ6aC
 teehBWmHlxZlL1ENXya8lUEEsOieAWxi3IMlhOEo2roidPW1N0o=
 =ryZW
 -----END PGP SIGNATURE-----

Merge 4.19.111 into android-4.19

Changes in 4.19.111
	phy: Revert toggling reset changes.
	net: phy: Avoid multiple suspends
	cgroup, netclassid: periodically release file_lock on classid updating
	gre: fix uninit-value in __iptunnel_pull_header
	inet_diag: return classid for all socket types
	ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface
	ipvlan: add cond_resched_rcu() while processing muticast backlog
	ipvlan: do not add hardware address of master to its unicast filter list
	ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast()
	ipvlan: don't deref eth hdr before checking it's set
	net/ipv6: use configured metric when add peer route
	netlink: Use netlink header as base to calculate bad attribute offset
	net: macsec: update SCI upon MAC address change.
	net: nfc: fix bounds checking bugs on "pipe"
	net/packet: tpacket_rcv: do not increment ring index on drop
	net: stmmac: dwmac1000: Disable ACS if enhanced descs are not used
	net: systemport: fix index check to avoid an array out of bounds access
	r8152: check disconnect status after long sleep
	sfc: detach from cb_page in efx_copy_channel()
	bnxt_en: reinitialize IRQs when MTU is modified
	cgroup: memcg: net: do not associate sock with unrelated cgroup
	net: memcg: late association of sock to memcg
	net: memcg: fix lockdep splat in inet_csk_accept()
	devlink: validate length of param values
	fib: add missing attribute validation for tun_id
	nl802154: add missing attribute validation
	nl802154: add missing attribute validation for dev_type
	can: add missing attribute validation for termination
	macsec: add missing attribute validation for port
	net: fq: add missing attribute validation for orphan mask
	team: add missing attribute validation for port ifindex
	team: add missing attribute validation for array index
	nfc: add missing attribute validation for SE API
	nfc: add missing attribute validation for deactivate target
	nfc: add missing attribute validation for vendor subcommand
	net: phy: fix MDIO bus PM PHY resuming
	selftests/net/fib_tests: update addr_metric_test for peer route testing
	net/ipv6: need update peer route when modify metric
	net/ipv6: remove the old peer route if change it to a new one
	tipc: add missing attribute validation for MTU property
	devlink: validate length of region addr/len
	bonding/alb: make sure arp header is pulled before accessing it
	slip: make slhc_compress() more robust against malicious packets
	net: fec: validate the new settings in fec_enet_set_coalesce()
	macvlan: add cond_resched() during multicast processing
	cgroup: cgroup_procs_next should increase position index
	cgroup: Iterate tasks that did not finish do_exit()
	iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices
	virtio-blk: fix hw_queue stopped on arbitrary error
	iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
	netfilter: nf_conntrack: ct_cpu_seq_next should increase position index
	netfilter: synproxy: synproxy_cpu_seq_next should increase position index
	netfilter: xt_recent: recent_seq_next should increase position index
	netfilter: x_tables: xt_mttg_seq_next should increase position index
	workqueue: don't use wq_select_unbound_cpu() for bound works
	drm/amd/display: remove duplicated assignment to grph_obj_type
	ktest: Add timeout for ssh sync testing
	cifs_atomic_open(): fix double-put on late allocation failure
	gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
	KVM: x86: clear stale x86_emulate_ctxt->intercept value
	ARC: define __ALIGN_STR and __ALIGN symbols for ARC
	macintosh: windfarm: fix MODINFO regression
	efi: Fix a race and a buffer overflow while reading efivars via sysfs
	efi: Make efi_rts_work accessible to efi page fault handler
	mt76: fix array overflow on receiving too many fragments for a packet
	x86/mce: Fix logic and comments around MSR_PPIN_CTL
	iommu/dma: Fix MSI reservation allocation
	iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
	iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
	batman-adv: Don't schedule OGM for disabled interface
	pinctrl: meson-gxl: fix GPIOX sdio pins
	pinctrl: core: Remove extra kref_get which blocks hogs being freed
	drm/i915/gvt: Fix unnecessary schedule timer when no vGPU exits
	i2c: gpio: suppress error on probe defer
	nl80211: add missing attribute validation for critical protocol indication
	nl80211: add missing attribute validation for beacon report scanning
	nl80211: add missing attribute validation for channel switch
	perf bench futex-wake: Restore thread count default to online CPU count
	netfilter: cthelper: add missing attribute validation for cthelper
	netfilter: nft_payload: add missing attribute validation for payload csum flags
	netfilter: nft_tunnel: add missing attribute validation for tunnels
	iommu/vt-d: Fix the wrong printing in RHSA parsing
	iommu/vt-d: Ignore devices with out-of-spec domain number
	i2c: acpi: put device when verifying client fails
	ipv6: restrict IPV6_ADDRFORM operation
	net/smc: check for valid ib_client_data
	net/smc: cancel event worker during device removal
	efi: Add a sanity check to efivar_store_raw()
	batman-adv: Avoid free/alloc race when handling OGM2 buffer
	Linux 4.19.111

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ide220f0b6a12d291bda4a83f17cde25bbe64e2ff
2020-03-18 08:19:47 +01:00
Hillf Danton
3cd2a91a88 workqueue: don't use wq_select_unbound_cpu() for bound works
commit aa202f1f56960c60e7befaa0f49c72b8fa11b0a8 upstream.

wq_select_unbound_cpu() is designed for unbound workqueues only, but
it's wrongly called when using a bound workqueue too.

Fixing this ensures work queued to a bound workqueue with
cpu=WORK_CPU_UNBOUND always runs on the local CPU.

Before, that would happen only if wq_unbound_cpumask happened to include
it (likely almost always the case), or was empty, or we got lucky with
forced round-robin placement.  So restricting
/sys/devices/virtual/workqueue/cpumask to a small subset of a machine's
CPUs would cause some bound work items to run unexpectedly there.

Fixes: ef55718044 ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Hillf Danton <hdanton@sina.com>
[dj: massage changelog]
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:14:20 +01:00
Michal Koutný
ab3e3b23d8 cgroup: Iterate tasks that did not finish do_exit()
commit 9c974c77246460fa6a92c18554c3311c8c83c160 upstream.

PF_EXITING is set earlier than actual removal from css_set when a task
is exitting. This can confuse cgroup.procs readers who see no PF_EXITING
tasks, however, rmdir is checking against css_set membership so it can
transitionally fail with EBUSY.

Fix this by listing tasks that weren't unlinked from css_set active
lists.
It may happen that other users of the task iterator (without
CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
is equal to the state before commit c03cd7738a83 ("cgroup: Include dying
leaders with live threads in PROCS iterations") but it may be reviewed
later.

Reported-by: Suren Baghdasaryan <surenb@google.com>
Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:14:19 +01:00
Vasily Averin
ff79a4a75c cgroup: cgroup_procs_next should increase position index
commit 2d4ecb030dcc90fb725ecbfc82ce5d6c37906e0e upstream.

If seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output:

1) dd bs=1 skip output of each 2nd elements
$ dd if=/sys/fs/cgroup/cgroup.procs bs=8 count=1
2
3
4
5
1+0 records in
1+0 records out
8 bytes copied, 0,000267297 s, 29,9 kB/s
[test@localhost ~]$ dd if=/sys/fs/cgroup/cgroup.procs bs=1 count=8
2
4 <<< NB! 3 was skipped
6 <<<    ... and 5 too
8 <<<    ... and 7
8+0 records in
8+0 records out
8 bytes copied, 5,2123e-05 s, 153 kB/s

 This happen because __cgroup_procs_start() makes an extra
 extra cgroup_procs_next() call

2) read after lseek beyond end of file generates whole last line.
3) read after lseek into middle of last line generates
expected rest of last line and unexpected whole line once again.

Additionally patch removes an extra position index changes in
__cgroup_procs_start()

Cc: stable@vger.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:14:19 +01:00
Shakeel Butt
941464dcbc cgroup: memcg: net: do not associate sock with unrelated cgroup
[ Upstream commit e876ecc67db80dfdb8e237f71e5b43bb88ae549c ]

We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.

mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.

Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.

WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:

CPU: 70 PID: 12720 Comm: ssh Tainted:  5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
 <IRQ>
 dump_stack+0x57/0x75
 mem_cgroup_sk_alloc+0xe9/0xf0
 sk_clone_lock+0x2a7/0x420
 inet_csk_clone_lock+0x1b/0x110
 tcp_create_openreq_child+0x23/0x3b0
 tcp_v6_syn_recv_sock+0x88/0x730
 tcp_check_req+0x429/0x560
 tcp_v6_rcv+0x72d/0xa40
 ip6_protocol_deliver_rcu+0xc9/0x400
 ip6_input+0x44/0xd0
 ? ip6_protocol_deliver_rcu+0x400/0x400
 ip6_rcv_finish+0x71/0x80
 ipv6_rcv+0x5b/0xe0
 ? ip6_sublist_rcv+0x2e0/0x2e0
 process_backlog+0x108/0x1e0
 net_rx_action+0x26b/0x460
 __do_softirq+0x104/0x2a6
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.19+0x40/0x50
 __local_bh_enable_ip+0x51/0x60
 ip6_finish_output2+0x23d/0x520
 ? ip6table_mangle_hook+0x55/0x160
 __ip6_finish_output+0xa1/0x100
 ip6_finish_output+0x30/0xd0
 ip6_output+0x73/0x120
 ? __ip6_finish_output+0x100/0x100
 ip6_xmit+0x2e3/0x600
 ? ipv6_anycast_cleanup+0x50/0x50
 ? inet6_csk_route_socket+0x136/0x1e0
 ? skb_free_head+0x1e/0x30
 inet6_csk_xmit+0x95/0xf0
 __tcp_transmit_skb+0x5b4/0xb20
 __tcp_send_ack.part.60+0xa3/0x110
 tcp_send_ack+0x1d/0x20
 tcp_rcv_state_process+0xe64/0xe80
 ? tcp_v6_connect+0x5d1/0x5f0
 tcp_v6_do_rcv+0x1b1/0x3f0
 ? tcp_v6_do_rcv+0x1b1/0x3f0
 __release_sock+0x7f/0xd0
 release_sock+0x30/0xa0
 __inet_stream_connect+0x1c3/0x3b0
 ? prepare_to_wait+0xb0/0xb0
 inet_stream_connect+0x3b/0x60
 __sys_connect+0x101/0x120
 ? __sys_getsockopt+0x11b/0x140
 __x64_sys_connect+0x1a/0x20
 do_syscall_64+0x51/0x200
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d75807383 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:14:14 +01:00
Michal Koutný
6f2a9a3536 UPSTREAM: cgroup: Iterate tasks that did not finish do_exit()
PF_EXITING is set earlier than actual removal from css_set when a task
is exitting. This can confuse cgroup.procs readers who see no PF_EXITING
tasks, however, rmdir is checking against css_set membership so it can
transitionally fail with EBUSY.

Fix this by listing tasks that weren't unlinked from css_set active
lists.
It may happen that other users of the task iterator (without
CSS_TASK_ITER_PROCS) spot a PF_EXITING task before cgroup_exit(). This
is equal to the state before commit c03cd7738a83 ("cgroup: Include dying
leaders with live threads in PROCS iterations") but it may be reviewed
later.

Reported-by: Suren Baghdasaryan <surenb@google.com>
Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
(cherry picked from commit 9c974c77246460fa6a92c18554c3311c8c83c160)
Bug: 141213848
Bug: 146758430
Test: test_cgcore_destroy from linux-kselftest
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iac57661b931129ed1e44b89675f8115bb89084ff
(cherry picked from commit 21ee296526c70d6dc3c64639406f156f39b80fd0)
2020-03-16 21:15:43 +00:00
Laura Abbott
c9a574054d ANDROID: GKI: drivers: Add dma removed ops
The current DMA coherent pool assumes that there is a kernel
mapping at all times for the entire pool. This may not be
what we want for the entire times. Add the dma_removed ops to
support this use case.

Bug: 145617272
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org>

[surenb Squashed the following commits:
a478a8bf78 "drivers: Add dma removed ops"
8510985ae3 "dma: removed: Merge dma removed changes from 4.14"
and removed-dma-pool driver changes from
9d4b7d6415 "iommu: arm-smmu: Merge for ..."]

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ie4f1e9bdf57b79699fa8fa7e7a6087e6d88ebbfa
2020-03-16 18:10:36 +00:00
Will Deacon
8e37367a32 FROMGIT: kallsyms: unexport kallsyms_lookup_name() and kallsyms_on_each_symbol()
kallsyms_lookup_name() and kallsyms_on_each_symbol() are exported to
modules despite having no in-tree users and being wide open to abuse by
out-of-tree modules that can use them as a method to invoke arbitrary
non-exported kernel functions.

Unexport kallsyms_lookup_name() and kallsyms_on_each_symbol().

Bug: 149978696
Change-Id: I8f3c1b5222939c46901f4d149d4c7bb63916ff04
Link: http://lkml.kernel.org/r/20200221114404.14641-4-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Quentin Perret <qperret@google.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ab3e66797c7fddbf80fbba31c5bf4574ad52f320
 https://github.com/hnaz/linux-mm.git master)
Signed-off-by: Quentin Perret <qperret@google.com>
2020-03-12 11:18:50 +00:00
Greg Kroah-Hartman
ca0a95ff50 This is the 4.19.109 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5o5HYACgkQONu9yGCS
 aT4/tQ//Xsg40emvKL+hfz22PB5OccTYr4LFGIiVFq5kOGbwM7/oXkVwzD3tk948
 0gGbad65hzi5pKnuQOhNgOtkIEtheub/lf0lUJPf+TM9xUj6Vi0/KjkODfJ01O1+
 OiUy/ZoqL1NB1GQxXMUdtZwayQJkIdVq/taralbfxFwrJlffhjjBg2/I7N+C/SPw
 LrlYzbtIqS8wu/d4xPwsEGm4vnhqb0jLiJ42/kb3+Ts21/FhUge8+lkwTGq8JyH1
 QeRtPUHEJxJ2hA6H2T9CJg4fiJYhD96tLZUKYz57A95z20uqVszvFWVGApz5uLi1
 n1BUkYFAOmJ+H4hWcT1kiYMhxA7iMk1JTbEs9EJOwq1CFfEK2LRfY7Xze3XqCDOd
 eujecPlpqji+7wxTCd5XkOKAwjVdBLo7faZypQCUai8A6ca9D/rrxerglAa/VSsj
 KfTaTThIxKHg8bskDSBJqe9rPZg92u7LEVMj+EE05CcfNevBvKDwV6I4F7cKZV2X
 Y7w76OaYgm8e+H6w6ryvCN8d/T5tSGNly2wJ+rHBl5kc3GD9NPUXRUJKKePU55/3
 SuD6q/8gnGp2upqC6FFNTdFMzAar8vIbvHYh9vMA8ISMfM5ShibE/V06PxaupmV6
 RWCiAOiBnOu5wi/hDQ8wrlWOI2+H8fUWUldy2LIp78HAS4OWnUw=
 =VaEt
 -----END PGP SIGNATURE-----

Merge 4.19.109 into android-4.19

Changes in 4.19.109
	EDAC/amd64: Set grain per DIMM
	ALSA: hda/realtek - Fix a regression for mute led on Lenovo Carbon X1
	net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec
	RDMA/core: Fix pkey and port assignment in get_new_pps
	RDMA/core: Fix use of logical OR in get_new_pps
	kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
	ALSA: hda: do not override bus codec_mask in link_get()
	serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
	selftests: fix too long argument
	usb: gadget: composite: Support more than 500mA MaxPower
	usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags
	usb: gadget: serial: fix Tx stall after buffer overflow
	drm/msm/mdp5: rate limit pp done timeout warnings
	drm: msm: Fix return type of dsi_mgr_connector_mode_valid for kCFI
	scsi: megaraid_sas: silence a warning
	drm/msm/dsi: save pll state before dsi host is powered off
	drm/msm/dsi/pll: call vco set rate explicitly
	selftests: forwarding: use proto icmp for {gretap, ip6gretap}_mac testing
	net: dsa: b53: Ensure the default VID is untagged
	net: ks8851-ml: Remove 8-bit bus accessors
	net: ks8851-ml: Fix 16-bit data access
	net: ks8851-ml: Fix 16-bit IO operation
	watchdog: da9062: do not ping the hw during stop()
	s390/cio: cio_ignore_proc_seq_next should increase position index
	s390: make 'install' not depend on vmlinux
	x86/boot/compressed: Don't declare __force_order in kaslr_64.c
	s390/qdio: fill SL with absolute addresses
	nvme: Fix uninitialized-variable warning
	ice: Don't tell the OS that link is going down
	x86/xen: Distribute switch variables for initialization
	net: thunderx: workaround BGX TX Underflow issue
	ALSA: hda/realtek - Add Headset Mic supported
	ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master
	cifs: don't leak -EAGAIN for stat() during reconnect
	usb: storage: Add quirk for Samsung Fit flash
	usb: quirks: add NO_LPM quirk for Logitech Screen Share
	usb: dwc3: gadget: Update chain bit correctly when using sg list
	usb: core: hub: fix unhandled return by employing a void function
	usb: core: hub: do error out if usb_autopm_get_interface() fails
	usb: core: port: do error out if usb_autopm_get_interface() fails
	vgacon: Fix a UAF in vgacon_invert_region
	mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa
	mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
	fat: fix uninit-memory access for partial initialized inode
	arm: dts: dra76x: Fix mmc3 max-frequency
	tty:serial:mvebu-uart:fix a wrong return
	serial: 8250_exar: add support for ACCES cards
	vt: selection, close sel_buffer race
	vt: selection, push console lock down
	vt: selection, push sel_lock up
	media: v4l2-mem2mem.c: fix broken links
	x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
	dmaengine: tegra-apb: Fix use-after-free
	dmaengine: tegra-apb: Prevent race conditions of tasklet vs free list
	dm cache: fix a crash due to incorrect work item cancelling
	dm: report suspended device during destroy
	dm writecache: verify watermark during resume
	ARM: dts: ls1021a: Restore MDIO compatible to gianfar
	spi: bcm63xx-hsspi: Really keep pll clk enabled
	ASoC: topology: Fix memleak in soc_tplg_link_elems_load()
	ASoC: topology: Fix memleak in soc_tplg_manifest_load()
	ASoC: intel: skl: Fix pin debug prints
	ASoC: intel: skl: Fix possible buffer overflow in debug outputs
	dmaengine: imx-sdma: remove dma_slave_config direction usage and leave sdma_event_enable()
	ASoC: pcm: Fix possible buffer overflow in dpcm state sysfs output
	ASoC: pcm512x: Fix unbalanced regulator enable call in probe error path
	ASoC: dapm: Correct DAPM handling of active widgets during shutdown
	drm/sun4i: Fix DE2 VI layer format support
	drm/sun4i: de2/de3: Remove unsupported VI layer formats
	phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling
	phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval
	ARM: dts: imx6: phycore-som: fix emmc supply
	RDMA/iwcm: Fix iwcm work deallocation
	RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
	IB/hfi1, qib: Ensure RCU is locked when accessing list
	ARM: imx: build v7_cpu_resume() unconditionally
	ARM: dts: am437x-idk-evm: Fix incorrect OPP node names
	ARM: dts: imx7-colibri: Fix frequency for sd/mmc
	hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()
	dmaengine: coh901318: Fix a double lock bug in dma_tc_handle()
	powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
	efi/x86: Align GUIDs to their size in the mixed mode runtime wrapper
	efi/x86: Handle by-ref arguments covering multiple pages in mixed mode
	dm integrity: fix a deadlock due to offloading to an incorrect workqueue
	scsi: pm80xx: Fixed kernel panic during error recovery for SATA drive
	Linux 4.19.109

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iae5cc72b8c7c96b0a15c76657b9c3bcc4341a7aa
2020-03-11 17:10:39 +01:00
Masami Hiramatsu
38d3707340 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic
[ Upstream commit e4add247789e4ba5e08ad8256183ce2e211877d4 ]

optimize_kprobe() and unoptimize_kprobe() cancels if a given kprobe
is on the optimizing_list or unoptimizing_list already. However, since
the following commit:

  f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")

modified the update timing of the KPROBE_FLAG_OPTIMIZED, it doesn't
work as expected anymore.

The optimized_kprobe could be in the following states:

- [optimizing]: Before inserting jump instruction
  op.kp->flags has KPROBE_FLAG_OPTIMIZED and
  op->list is not empty.

- [optimized]: jump inserted
  op.kp->flags has KPROBE_FLAG_OPTIMIZED and
  op->list is empty.

- [unoptimizing]: Before removing jump instruction (including unused
  optprobe)
  op.kp->flags has KPROBE_FLAG_OPTIMIZED and
  op->list is not empty.

- [unoptimized]: jump removed
  op.kp->flags doesn't have KPROBE_FLAG_OPTIMIZED and
  op->list is empty.

Current code mis-expects [unoptimizing] state doesn't have
KPROBE_FLAG_OPTIMIZED, and that can cause incorrect results.

To fix this, introduce optprobe_queued_unopt() to distinguish [optimizing]
and [unoptimizing] states and fixes the logic in optimize_kprobe() and
unoptimize_kprobe().

[ mingo: Cleaned up the changelog and the code a bit. ]

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bristot@redhat.com
Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
Link: https://lkml.kernel.org/r/157840814418.7181.13478003006386303481.stgit@devnote2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-11 14:14:47 +01:00
Saravana Kannan
ebb43b6aeb ANDROID: GKI: genirq: Export symbols to compile irqchip drivers as modules
We want to allow compiling irqchip drivers as modules. So export the
necessary symbols.

Bug: 148105066
Change-Id: Id3de4b8451bed1af9b0afeb5863493697730acb6
Signed-off-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Will McVicker <willmcvicker@google.com>
(cherry picked from commit cfc69e9b2fe82a46addfcb1912bd642456548baa)
2020-03-09 11:32:04 -07:00
Will McVicker
e0bd5f70e2 ANDROID: GKI: genirq/irqdomain: add export symbols for modularizing
These symbols are needed for modularizing pinctrl.

Signed-off-by: Will McVicker <willmcvicker@google.com>
Bug: 145771121
Test: compile, boot
Change-Id: I8693c3a41b5fcab05b8e4a8a82f4057205bafd3b
(cherry picked from commit 9d2cbb36a60747e885f77d776a3ec2bf7523e2e6)
2020-03-09 11:32:04 -07:00
Maulik Shah
657d3fdc70 ANDROID: GKI: genirq: Introduce irq_chip_get/set_parent_state calls
On certain QTI chipsets some GPIOs are direct-connect interrupts
to the GIC.

Even when GPIOs are not used for interrupt generation and interrupt
line is disabled, it does not prevent interrupt to get pending at
GIC_ISPEND. When drivers call enable_irq unwanted interrupt occures.

Introduce irq_chip_get/set_parent_state calls to clear pending irq
which can get called within irq_enable of child irq chip to clear
any pending irq before enabling.

Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
Bug: 150233439
Change-Id: Ie8559657bd8da926cc741514809ffe9adbd73a80
Signed-off-by: Will McVicker <willmcvicker@google.com>
(cherry picked from commit d923314622)
2020-03-09 11:32:04 -07:00
Greg Kroah-Hartman
8290fa4ad8 This is the 4.19.108 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5hHeEACgkQONu9yGCS
 aT7uFQ/+KcS1brUUid3C+zewoJ7vvB7wspMRogdJk5/9Y/ty4uxolFRNxM7Fq2Sj
 2Uq0jyt8TiiOQBguqpRfJN/GqPX7HBLPR6e9B4Pq67BR34VT6azqb5F7tKjhCa4I
 CGmA5XvCCQSGwqmwyP4biV0yOdN6Fy0B/9q7+7RSOsY/Mr86RyEHQgarRxaMy2QW
 v/dh/yIsdPXG27ZicETudKqIWaMiXL5k0zXr81HY4TcOzBrKW66nuqcI0uXZ6r54
 RwqxfGVTGQeGIN4bBAFGTlEvvMDO0NAENGA0vOpt8Mqf7yRIye78GCmn8A/nOgd/
 +ZsrS9y+baJun0O/7zmuYSFd37GDecRu6kNYI+fc1Hbf784wLj05A52kNZ5ndYPB
 CdHgcow63QV3DGTXsfQOi/yZEDm/YMUzhMoL2/KP/LlJzq8raXMf95pB3fgs6zmX
 HI3sA4AuyWQaQb/ogzW+8m8v1oHzT4+aNaBi9rBS1uqCg5q6AhTBRApUNpbybgsG
 vkiTwhIc2y74Y7M5wV0Fp29pQBPPn033smIq3V/qxgyMvoBbMXxNGZ7jTK882h5g
 UBjprtX/wyHgVLEXITiz1BPJTinweJarRCL6iGn5w7IOfd3enSamfph5wh5vuXR6
 ea0SCw3Dni5G930BMldxubZRtiYTiqvDCeC/IpG7trP9mpczGeE=
 =/2bv
 -----END PGP SIGNATURE-----

Merge 4.19.108 into android-4.19

Changes in 4.19.108
	irqchip/gic-v3-its: Fix misuse of GENMASK macro
	iwlwifi: pcie: fix rb_allocator workqueue allocation
	ipmi:ssif: Handle a possible NULL pointer reference
	drm/msm: Set dma maximum segment size for mdss
	dax: pass NOWAIT flag to iomap_apply
	mac80211: consider more elements in parsing CRC
	cfg80211: check wiphy driver existence for drvinfo report
	s390/zcrypt: fix card and queue total counter wrap
	qmi_wwan: re-add DW5821e pre-production variant
	qmi_wwan: unconditionally reject 2 ep interfaces
	ARM: dts: sti: fixup sound frame-inversion for stihxxx-b2120.dtsi
	soc/tegra: fuse: Fix build with Tegra194 configuration
	net: ena: fix potential crash when rxfh key is NULL
	net: ena: fix uses of round_jiffies()
	net: ena: add missing ethtool TX timestamping indication
	net: ena: fix incorrect default RSS key
	net: ena: rss: fix failure to get indirection table
	net: ena: rss: store hash function as values and not bits
	net: ena: fix incorrectly saving queue numbers when setting RSS indirection table
	net: ena: ethtool: use correct value for crc32 hash
	net: ena: ena-com.c: prevent NULL pointer dereference
	cifs: Fix mode output in debugging statements
	cfg80211: add missing policy for NL80211_ATTR_STATUS_CODE
	sysrq: Restore original console_loglevel when sysrq disabled
	sysrq: Remove duplicated sysrq message
	net: fib_rules: Correctly set table field when table number exceeds 8 bits
	net: mscc: fix in frame extraction
	net: phy: restore mdio regs in the iproc mdio driver
	net: sched: correct flower port blocking
	nfc: pn544: Fix occasional HW initialization failure
	sctp: move the format error check out of __sctp_sf_do_9_1_abort
	ipv6: Fix route replacement with dev-only route
	ipv6: Fix nlmsg_flags when splitting a multipath route
	qede: Fix race between rdma destroy workqueue and link change event
	net/tls: Fix to avoid gettig invalid tls record
	ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
	audit: fix error handling in audit_data_to_entry()
	ACPICA: Introduce ACPI_ACCESS_BYTE_WIDTH() macro
	ACPI: watchdog: Fix gas->access_width usage
	KVM: VMX: check descriptor table exits on instruction emulation
	HID: ite: Only bind to keyboard USB interface on Acer SW5-012 keyboard dock
	HID: core: fix off-by-one memset in hid_report_raw_event()
	HID: core: increase HID report buffer size to 8KiB
	macintosh: therm_windtunnel: fix regression when instantiating devices
	tracing: Disable trace_printk() on post poned tests
	Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
	amdgpu/gmc_v9: save/restore sdpif regs during S3
	vhost: Check docket sk_family instead of call getname
	HID: alps: Fix an error handling path in 'alps_input_configured()'
	HID: hiddev: Fix race in in hiddev_disconnect()
	MIPS: VPE: Fix a double free and a memory leak in 'release_vpe()'
	i2c: altera: Fix potential integer overflow
	i2c: jz4780: silence log flood on txabrt
	drm/i915/gvt: Fix orphan vgpu dmabuf_objs' lifetime
	drm/i915/gvt: Separate display reset from ALL_ENGINES reset
	hv_netvsc: Fix unwanted wakeup in netvsc_attach()
	usb: charger: assign specific number for enum value
	s390/qeth: vnicc Fix EOPNOTSUPP precedence
	net: netlink: cap max groups which will be considered in netlink_bind()
	net: atlantic: fix use after free kasan warn
	net: atlantic: fix potential error handling
	net/smc: no peer ID in CLC decline for SMCD
	net: ena: make ena rxfh support ETH_RSS_HASH_NO_CHANGE
	namei: only return -ECHILD from follow_dotdot_rcu()
	mwifiex: drop most magic numbers from mwifiex_process_tdls_action_frame()
	mwifiex: delete unused mwifiex_get_intf_num()
	KVM: SVM: Override default MMIO mask if memory encryption is enabled
	KVM: Check for a bad hva before dropping into the ghc slow path
	sched/fair: Optimize update_blocked_averages()
	sched/fair: Fix O(nr_cgroups) in the load balancing path
	perf stat: Use perf_evsel__is_clocki() for clock events
	perf stat: Fix shadow stats for clock events
	drivers: net: xgene: Fix the order of the arguments of 'alloc_etherdev_mqs()'
	kprobes: Set unoptimized flag after unoptimizing code
	pwm: omap-dmtimer: put_device() after of_find_device_by_node()
	perf hists browser: Restore ESC as "Zoom out" of DSO/thread/etc
	KVM: x86: Remove spurious kvm_mmu_unload() from vcpu destruction path
	KVM: x86: Remove spurious clearing of async #PF MSR
	thermal: brcmstb_thermal: Do not use DT coefficients
	netfilter: nft_tunnel: no need to call htons() when dumping ports
	netfilter: nf_flowtable: fix documentation
	mm/huge_memory.c: use head to check huge zero page
	mm, thp: fix defrag setting if newline is not used
	audit: always check the netlink payload length in audit_receive_msg()
	Linux 4.19.108

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib98db500eded0a83d89c38900bbdf9ff5d6a37e0
2020-03-05 17:40:55 +01:00
Paul Moore
9d2fdc4c7e audit: always check the netlink payload length in audit_receive_msg()
[ Upstream commit 756125289285f6e55a03861bf4b6257aa3d19a93 ]

This patch ensures that we always check the netlink payload length
in audit_receive_msg() before we take any action on the payload
itself.

Cc: stable@vger.kernel.org
Reported-by: syzbot+399c44bf1f43b8747403@syzkaller.appspotmail.com
Reported-by: syzbot+e4b12d8d202701f08b6d@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-05 16:42:23 +01:00
Masami Hiramatsu
39af044d1c kprobes: Set unoptimized flag after unoptimizing code
commit f66c0447cca1281116224d474cdb37d6a18e4b5b upstream.

Set the unoptimized flag after confirming the code is completely
unoptimized. Without this fix, when a kprobe hits the intermediate
modified instruction (the first byte is replaced by an INT3, but
later bytes can still be a jump address operand) while unoptimizing,
it can return to the middle byte of the modified code, which causes
an invalid instruction exception in the kernel.

Usually, this is a rare case, but if we put a probe on the function
call while text patching, it always causes a kernel panic as below:

 # echo p text_poke+5 > kprobe_events
 # echo 1 > events/kprobes/enable
 # echo 0 > events/kprobes/enable

invalid opcode: 0000 [#1] PREEMPT SMP PTI
 RIP: 0010:text_poke+0x9/0x50
 Call Trace:
  arch_unoptimize_kprobe+0x22/0x28
  arch_unoptimize_kprobes+0x39/0x87
  kprobe_optimizer+0x6e/0x290
  process_one_work+0x2a0/0x610
  worker_thread+0x28/0x3d0
  ? process_one_work+0x610/0x610
  kthread+0x10d/0x130
  ? kthread_park+0x80/0x80
  ret_from_fork+0x3a/0x50

text_poke() is used for patching the code in optprobes.

This can happen even if we blacklist text_poke() and other functions,
because there is a small time window during which we show the intermediate
code to other CPUs.

 [ mingo: Edited the changelog. ]

Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bristot@redhat.com
Fixes: 6274de4984 ("kprobes: Support delayed unoptimizing")
Link: https://lkml.kernel.org/r/157483422375.25881.13508326028469515760.stgit@devnote2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05 16:42:22 +01:00
Vincent Guittot
d71744b5c1 sched/fair: Fix O(nr_cgroups) in the load balancing path
commit 039ae8bcf7a5f4476f4487e6bf816885fb3fb617 upstream.

This re-applies the commit reverted here:

  commit c40f7d74c741 ("sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c")

I.e. now that cfs_rq can be safely removed/added in the list, we can re-apply:

 commit a9e7f6544b ("sched/fair: Fix O(nr_cgroups) in load balance path")

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sargun@sargun.me
Cc: tj@kernel.org
Cc: xiexiuqi@huawei.com
Cc: xiezhipeng1@huawei.com
Link: https://lkml.kernel.org/r/1549469662-13614-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Vishnu Rangayyan <vishnu.rangayyan@apple.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05 16:42:21 +01:00
Vincent Guittot
a1f1a978a7 sched/fair: Optimize update_blocked_averages()
commit 31bc6aeaab1d1de8959b67edbed5c7a4b3cdbe7c upstream.

Removing a cfs_rq from rq->leaf_cfs_rq_list can break the parent/child
ordering of the list when it will be added back. In order to remove an
empty and fully decayed cfs_rq, we must remove its children too, so they
will be added back in the right order next time.

With a normal decay of PELT, a parent will be empty and fully decayed
if all children are empty and fully decayed too. In such a case, we just
have to ensure that the whole branch will be added when a new task is
enqueued. This is default behavior since :

  commit f6783319737f ("sched/fair: Fix insertion in rq->leaf_cfs_rq_list")

In case of throttling, the PELT of throttled cfs_rq will not be updated
whereas the parent will. This breaks the assumption made above unless we
remove the children of a cfs_rq that is throttled. Then, they will be
added back when unthrottled and a sched_entity will be enqueued.

As throttled cfs_rq are now removed from the list, we can remove the
associated test in update_blocked_averages().

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sargun@sargun.me
Cc: tj@kernel.org
Cc: xiexiuqi@huawei.com
Cc: xiezhipeng1@huawei.com
Link: https://lkml.kernel.org/r/1549469662-13614-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Vishnu Rangayyan <vishnu.rangayyan@apple.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05 16:42:21 +01:00
Steven Rostedt (VMware)
91495e01e8 tracing: Disable trace_printk() on post poned tests
commit 78041c0c9e935d9ce4086feeff6c569ed88ddfd4 upstream.

The tracing seftests checks various aspects of the tracing infrastructure,
and one is filtering. If trace_printk() is active during a self test, it can
cause the filtering to fail, which will disable that part of the trace.

To keep the selftests from failing because of trace_printk() calls,
trace_printk() checks the variable tracing_selftest_running, and if set, it
does not write to the tracing buffer.

As some tracers were registered earlier in boot, the selftest they triggered
would fail because not all the infrastructure was set up for the full
selftest. Thus, some of the tests were post poned to when their
infrastructure was ready (namely file system code). The postpone code did
not set the tracing_seftest_running variable, and could fail if a
trace_printk() was added and executed during their run.

Cc: stable@vger.kernel.org
Fixes: 9afecfbb95 ("tracing: Postpone tracer start-up tests till the system is more robust")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05 16:42:18 +01:00
Paul Moore
c24d457a82 audit: fix error handling in audit_data_to_entry()
commit 2ad3e17ebf94b7b7f3f64c050ff168f9915345eb upstream.

Commit 219ca39427 ("audit: use union for audit_field values since
they are mutually exclusive") combined a number of separate fields in
the audit_field struct into a single union.  Generally this worked
just fine because they are generally mutually exclusive.
Unfortunately in audit_data_to_entry() the overlap can be a problem
when a specific error case is triggered that causes the error path
code to attempt to cleanup an audit_field struct and the cleanup
involves attempting to free a stored LSM string (the lsm_str field).
Currently the code always has a non-NULL value in the
audit_field.lsm_str field as the top of the for-loop transfers a
value into audit_field.val (both .lsm_str and .val are part of the
same union); if audit_data_to_entry() fails and the audit_field
struct is specified to contain a LSM string, but the
audit_field.lsm_str has not yet been properly set, the error handling
code will attempt to free the bogus audit_field.lsm_str value that
was set with audit_field.val at the top of the for-loop.

This patch corrects this by ensuring that the audit_field.val is only
set when needed (it is cleared when the audit_field struct is
allocated with kcalloc()).  It also corrects a few other issues to
ensure that in case of error the proper error code is returned.

Cc: stable@vger.kernel.org
Fixes: 219ca39427 ("audit: use union for audit_field values since they are mutually exclusive")
Reported-by: syzbot+1f4d90ead370d72e450b@syzkaller.appspotmail.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05 16:42:17 +01:00
Mark Salyzyn
5cbbeadd5a BACKPORT: mm: reclaim small amounts of memory when an external fragmentation event occurs
An external fragmentation event was previously described as

    When the page allocator fragments memory, it records the event using
    the mm_page_alloc_extfrag event. If the fallback_order is smaller
    than a pageblock order (order-9 on 64-bit x86) then it's considered
    an event that will cause external fragmentation issues in the future.

The kernel reduces the probability of such events by increasing the
watermark sizes by calling set_recommended_min_free_kbytes early in the
lifetime of the system.  This works reasonably well in general but if
there are enough sparsely populated pageblocks then the problem can still
occur as enough memory is free overall and kswapd stays asleep.

This patch introduces a watermark_boost_factor sysctl that allows a zone
watermark to be temporarily boosted when an external fragmentation causing
events occurs.  The boosting will stall allocations that would decrease
free memory below the boosted low watermark and kswapd is woken if the
calling context allows to reclaim an amount of memory relative to the size
of the high watermark and the watermark_boost_factor until the boost is
cleared.  When kswapd finishes, it wakes kcompactd at the pageblock order
to clean some of the pageblocks that may have been affected by the
fragmentation event.  kswapd avoids any writeback, slab shrinkage and swap
from reclaim context during this operation to avoid excessive system
disruption in the name of fragmentation avoidance.  Care is taken so that
kswapd will do normal reclaim work if the system is really low on memory.

This was evaluated using the same workloads as "mm, page_alloc: Spread
allocations across zones before introducing fragmentation".

1-socket Skylake machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 1 THP allocating thread
--------------------------------------

4.20-rc3 extfrag events < order 9:   804694
4.20-rc3+patch:                      408912 (49% reduction)
4.20-rc3+patch1-4:                    18421 (98% reduction)

                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Amean     fault-base-1      653.58 (   0.00%)      652.71 (   0.13%)
Amean     fault-huge-1        0.00 (   0.00%)      178.93 * -99.00%*

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-1        0.00 (   0.00%)        5.12 ( 100.00%)

Note that external fragmentation causing events are massively reduced by
this path whether in comparison to the previous kernel or the vanilla
kernel.  The fault latency for huge pages appears to be increased but that
is only because THP allocations were successful with the patch applied.

1-socket Skylake machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc3 extfrag events < order 9:  291392
4.20-rc3+patch:                     191187 (34% reduction)
4.20-rc3+patch1-4:                   13464 (95% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Min       fault-base-1      912.00 (   0.00%)      905.00 (   0.77%)
Min       fault-huge-1      127.00 (   0.00%)      135.00 (  -6.30%)
Amean     fault-base-1     1467.55 (   0.00%)     1481.67 (  -0.96%)
Amean     fault-huge-1     1127.11 (   0.00%)     1063.88 *   5.61%*

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-1       77.64 (   0.00%)       83.46 (   7.49%)

As before, massive reduction in external fragmentation events, some jitter
on latencies and an increase in THP allocation success rates.

2-socket Haswell machine
config-global-dhp__workload_thpfioscale XFS (no special madvise)
4 fio threads, 5 THP allocating threads
----------------------------------------------------------------

4.20-rc3 extfrag events < order 9:  215698
4.20-rc3+patch:                     200210 (7% reduction)
4.20-rc3+patch1-4:                   14263 (93% reduction)

                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Amean     fault-base-5     1346.45 (   0.00%)     1306.87 (   2.94%)
Amean     fault-huge-5     3418.60 (   0.00%)     1348.94 (  60.54%)

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-5        0.78 (   0.00%)        7.91 ( 910.64%)

There is a 93% reduction in fragmentation causing events, there is a big
reduction in the huge page fault latency and allocation success rate is
higher.

2-socket Haswell machine
global-dhp__workload_thpfioscale-madvhugepage-xfs (MADV_HUGEPAGE)
-----------------------------------------------------------------

4.20-rc3 extfrag events < order 9: 166352
4.20-rc3+patch:                    147463 (11% reduction)
4.20-rc3+patch1-4:                  11095 (93% reduction)

thpfioscale Fault Latencies
                                   4.20.0-rc3             4.20.0-rc3
                                 lowzone-v5r8             boost-v5r8
Amean     fault-base-5     6217.43 (   0.00%)     7419.67 * -19.34%*
Amean     fault-huge-5     3163.33 (   0.00%)     3263.80 (  -3.18%)

                              4.20.0-rc3             4.20.0-rc3
                            lowzone-v5r8             boost-v5r8
Percentage huge-5       95.14 (   0.00%)       87.98 (  -7.53%)

There is a large reduction in fragmentation events with some jitter around
the latencies and success rates.  As before, the high THP allocation
success rate does mean the system is under a lot of pressure.  However, as
the fragmentation events are reduced, it would be expected that the
long-term allocation success rate would be higher.

Link: http://lkml.kernel.org/r/20181123114528.28802-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ied06272defcdbf3fff07b7ebccb46c68ce081e1e
Git-commit: 1c30844d2dfe272d58c8fc000960b835d13aa2ac
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
(cherry picked from commit 1c30844d2dfe272d58c8fc000960b835d13aa2ac)
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 150378964
2020-03-04 12:01:17 -08:00
Qais Yousef
cb0a2cdd48 UPSTREAM: sched/uclamp: Reject negative values in cpu_uclamp_write()
The check to ensure that the new written value into cpu.uclamp.{min,max}
is within range, [0:100], wasn't working because of the signed
comparison

 7301                 if (req.percent > UCLAMP_PERCENT_SCALE) {
 7302                         req.ret = -ERANGE;
 7303                         return req;
 7304                 }

	# echo -1 > cpu.uclamp.min
	# cat cpu.uclamp.min
	42949671.96

Cast req.percent into u64 to force the comparison to be unsigned and
work as intended in capacity_from_percent().

	# echo -1 > cpu.uclamp.min
	sh: write error: Numerical result out of range

Bug: 120440300
Fixes: 2480c093130f ("sched/uclamp: Extend CPU's cgroup controller")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200114210947.14083-1-qais.yousef@arm.com
(cherry picked from commit b562d140649966d4daedd0483a8fe59ad3bb465a)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I17fc2b119dcbffb212e130ed2c37ae3a8d5bbb61
2020-03-03 11:41:08 +00:00
Greg Kroah-Hartman
7cd2c86c50 This is the 4.19.107 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5ZNBUACgkQONu9yGCS
 aT6vMhAAoNfLw1JEqsOgplIUKuLJnIBOldyJeZ8HCrR9yhIEDgevHQzaWutyD6H4
 2AzImhL8YBwAw+9UHq5Z1PT3PluKt78vRr1ZxDyNniHGJdDsoWTed9h+QjyRkDFl
 KZSV30GraO8/P6e9Ep5CgKLiCID7m2U9jYZkb6QL21wawprEi6dgSOb21prPyN1d
 SKCtcrhUQFqDPOgqU3Cyv9t/vxzrgBKSZRKOXZON5gBlwmFHuPk7lcSB80DKd+7S
 Um7oatwFBhQwKyuhJARXbrhIw2z6Y+xf1wJF+yNW9v/VpR4NE+SkzX2SaX7lercF
 JigVmtpth1KBa2wGw3N0XOdNG6NYrLtzeBW+o7mlZk4D2OKCeUoZEdM5RiVJNLCK
 Ze1soQtHoRFViqPx5Or06pOsMagKRNxzjkFPd1cfA7vpRw2KRNKCFXec/Ms8coUd
 /WslTHkyfryRfzFDtyyCATVXHPizkZqJyrR/3pes4sGITIpFczWVHiQ3mqUIrdXN
 d08CwsYS0ivQwvl5hZzxyqUlUWVhGccT1PpO6+SZp2IuGT3YWZzpQKDh0+IlIsv0
 TUvEtz3xjzL5EDUmUFsRUy5hBINdzjE/iKb3KOHw0y8xik5Rp0LkHtMRPmro5+TT
 A4JqVfxTGdTprRXPeCS/7X1jOoOxnxm06QZ+HqbHCL5CUi6nZFk=
 =XhCO
 -----END PGP SIGNATURE-----

Merge 4.19.107 into android-4.19

Changes in 4.19.107
	iommu/qcom: Fix bogus detach logic
	ALSA: hda: Use scnprintf() for printing texts for sysfs/procfs
	ALSA: hda/realtek - Apply quirk for MSI GP63, too
	ALSA: hda/realtek - Apply quirk for yet another MSI laptop
	ASoC: sun8i-codec: Fix setting DAI data format
	ecryptfs: fix a memory leak bug in parse_tag_1_packet()
	ecryptfs: fix a memory leak bug in ecryptfs_init_messaging()
	thunderbolt: Prevent crash if non-active NVMem file is read
	USB: misc: iowarrior: add support for 2 OEMed devices
	USB: misc: iowarrior: add support for the 28 and 28L devices
	USB: misc: iowarrior: add support for the 100 device
	floppy: check FDC index for errors before assigning it
	vt: fix scrollback flushing on background consoles
	vt: selection, handle pending signals in paste_selection
	vt: vt_ioctl: fix race in VT_RESIZEX
	staging: android: ashmem: Disallow ashmem memory from being remapped
	staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
	xhci: Force Maximum Packet size for Full-speed bulk devices to valid range.
	xhci: fix runtime pm enabling for quirky Intel hosts
	xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2
	usb: host: xhci: update event ring dequeue pointer on purpose
	USB: core: add endpoint-blacklist quirk
	USB: quirks: blacklist duplicate ep on Sound Devices USBPre2
	usb: uas: fix a plug & unplug racing
	USB: Fix novation SourceControl XL after suspend
	USB: hub: Don't record a connect-change event during reset-resume
	USB: hub: Fix the broken detection of USB3 device in SMSC hub
	usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows
	usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields
	staging: rtl8188eu: Fix potential security hole
	staging: rtl8188eu: Fix potential overuse of kernel memory
	staging: rtl8723bs: Fix potential security hole
	staging: rtl8723bs: Fix potential overuse of kernel memory
	powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
	jbd2: fix ocfs2 corrupt when clearing block group bits
	x86/mce/amd: Publish the bank pointer only after setup has succeeded
	x86/mce/amd: Fix kobject lifetime
	x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF
	serial: 8250: Check UPF_IRQ_SHARED in advance
	tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode
	tty: serial: imx: setup the correct sg entry for tx dma
	serdev: ttyport: restore client ops on deregistration
	MAINTAINERS: Update drm/i915 bug filing URL
	Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"
	mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()
	nvme-multipath: Fix memory leak with ana_log_buf
	genirq/irqdomain: Make sure all irq domain flags are distinct
	mm/vmscan.c: don't round up scan size for online memory cgroup
	drm/amdgpu/soc15: fix xclk for raven
	xhci: apply XHCI_PME_STUCK_QUIRK to Intel Comet Lake platforms
	KVM: nVMX: Don't emulate instructions in guest mode
	KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI
	tty: serial: qcom_geni_serial: Fix UART hang
	tty: serial: qcom_geni_serial: Remove interrupt storm
	tty: serial: qcom_geni_serial: Remove use of *_relaxed() and mb()
	tty: serial: qcom_geni_serial: Remove set_rfr_wm() and related variables
	tty: serial: qcom_geni_serial: Remove xfer_mode variable
	tty: serial: qcom_geni_serial: Fix RX cancel command failure
	lib/stackdepot.c: fix global out-of-bounds in stack_slabs
	drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets
	ext4: fix a data race in EXT4_I(inode)->i_disksize
	ext4: add cond_resched() to __ext4_find_entry()
	ext4: fix potential race between online resizing and write operations
	ext4: fix potential race between s_group_info online resizing and access
	ext4: fix potential race between s_flex_groups online resizing and access
	ext4: fix mount failure with quota configured as module
	ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
	ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
	KVM: nVMX: Refactor IO bitmap checks into helper function
	KVM: nVMX: Check IO instruction VM-exit conditions
	KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
	KVM: apic: avoid calculating pending eoi from an uninitialized val
	btrfs: fix bytes_may_use underflow in prealloc error condtition
	btrfs: reset fs_root to NULL on error in open_ctree
	btrfs: do not check delayed items are empty for single transaction cleanup
	Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
	Revert "dmaengine: imx-sdma: Fix memory leak"
	scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
	scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
	usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus
	usb: dwc2: Fix in ISOC request length checking
	staging: rtl8723bs: fix copy of overlapping memory
	staging: greybus: use after free in gb_audio_manager_remove_all()
	ecryptfs: replace BUG_ON with error handling code
	iommu/vt-d: Fix compile warning from intel-svm.h
	genirq/proc: Reject invalid affinity masks (again)
	bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill
	ALSA: rawmidi: Avoid bit fields for state flags
	ALSA: seq: Avoid concurrent access to queue flags
	ALSA: seq: Fix concurrent access to queue current tick/time
	netfilter: xt_hashlimit: limit the max size of hashtable
	rxrpc: Fix call RCU cleanup using non-bh-safe locks
	ata: ahci: Add shutdown to freeze hardware resources of ahci
	xen: Enable interrupts when calling _cond_resched()
	s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range
	Revert "char/random: silence a lockdep splat with printk()"
	Linux 4.19.107

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I74e3d49c54d4afcfa4049042163cb879c3de3100
2020-03-03 07:33:01 +01:00
Johannes Krude
bf3043d277 bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill
commit e20d3a055a457a10a4c748ce5b7c2ed3173a1324 upstream.

This if guards whether user-space wants a copy of the offload-jited
bytecode and whether this bytecode exists. By erroneously doing a bitwise
AND instead of a logical AND on user- and kernel-space buffer-size can lead
to no data being copied to user-space especially when user-space size is a
power of two and bigger then the kernel-space buffer.

Fixes: fcfb126def ("bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info")
Signed-off-by: Johannes Krude <johannes@krude.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20200212193227.GA3769@phlox.h.transitiv.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28 16:38:59 +01:00
Thomas Gleixner
3132696dd7 genirq/proc: Reject invalid affinity masks (again)
commit cba6437a1854fde5934098ec3bd0ee83af3129f5 upstream.

Qian Cai reported that the WARN_ON() in the x86/msi affinity setting code,
which catches cases where the affinity setting is not done on the CPU which
is the current target of the interrupt, triggers during CPU hotplug stress
testing.

It turns out that the warning which was added with the commit addressing
the MSI affinity race unearthed yet another long standing bug.

If user space writes a bogus affinity mask, i.e. it contains no online CPUs,
then it calls irq_select_affinity_usr(). This was introduced for ALPHA in

  eee45269b0 ("[PATCH] Alpha: convert to generic irq framework (generic part)")

and subsequently made available for all architectures in

  1840475676 ("genirq: Expose default irq affinity mask (take 3)")

which introduced the circumvention of the affinity setting restrictions for
interrupt which cannot be moved in process context.

The whole exercise is bogus in various aspects:

  1) If the interrupt is already started up then there is absolutely
     no point to honour a bogus interrupt affinity setting from user
     space. The interrupt is already assigned to an online CPU and it
     does not make any sense to reassign it to some other randomly
     chosen online CPU.

  2) If the interupt is not yet started up then there is no point
     either. A subsequent startup of the interrupt will invoke
     irq_setup_affinity() anyway which will chose a valid target CPU.

So the only correct solution is to just return -EINVAL in case user space
wrote an affinity mask which does not contain any online CPUs, except for
ALPHA which has it's own magic sauce for this.

Fixes: 1840475676 ("genirq: Expose default irq affinity mask (take 3)")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/878sl8xdbm.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28 16:38:59 +01:00
Suren Baghdasaryan
67e4408599 UPSTREAM: sched/psi: Fix OOB write when writing 0 bytes to PSI files
Issuing write() with count parameter set to 0 on any file under
/proc/pressure/ will cause an OOB write because of the access to
buf[buf_size-1] when NUL-termination is performed. Fix this by checking
for buf_size to be non-zero.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20200203212216.7076-1-surenb@google.com

(cherry picked from commit 6fcca0fa48118e6d63733eb4644c6cd880c15b8f)

Bug: 148159562
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9ec7acfc6e1083c677a95b0ea1c559ab50152873
2020-02-28 15:30:25 +00:00
Johannes Weiner
cf46cf40bc UPSTREAM: psi: Fix a division error in psi poll()
The psi window size is a u64 an can be up to 10 seconds right now,
which exceeds the lower 32 bits of the variable. We currently use
div_u64 for it, which is meant only for 32-bit divisors. The result is
garbage pressure sampling values and even potential div0 crashes.

Use div64_u64.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Jingfeng Xie <xiejingfeng@linux.alibaba.com>
Link: https://lkml.kernel.org/r/20191203183524.41378-3-hannes@cmpxchg.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

(cherry picked from commit c3466952ca1514158d7c16c9cfc48c27d5c5dc0f)

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49fdfd55751d1a2cde19666624c9c5d76dc78dad
2020-02-28 15:30:04 +00:00
Johannes Weiner
55013802e8 UPSTREAM: sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime
Jingfeng reports rare div0 crashes in psi on systems with some uptime:

[58914.066423] divide error: 0000 [#1] SMP
[58914.070416] Modules linked in: ipmi_poweroff ipmi_watchdog toa overlay fuse tcp_diag inet_diag binfmt_misc aisqos(O) aisqos_hotfixes(O)
[58914.083158] CPU: 94 PID: 140364 Comm: kworker/94:2 Tainted: G W OE K 4.9.151-015.ali3000.alios7.x86_64 #1
[58914.093722] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 3.23.34 02/14/2019
[58914.102728] Workqueue: events psi_update_work
[58914.107258] task: ffff8879da83c280 task.stack: ffffc90059dcc000
[58914.113336] RIP: 0010:[] [] psi_update_stats+0x1c1/0x330
[58914.122183] RSP: 0018:ffffc90059dcfd60 EFLAGS: 00010246
[58914.127650] RAX: 0000000000000000 RBX: ffff8858fe98be50 RCX: 000000007744d640
[58914.134947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00003594f700648e
[58914.142243] RBP: ffffc90059dcfdf8 R08: 0000359500000000 R09: 0000000000000000
[58914.149538] R10: 0000000000000000 R11: 0000000000000000 R12: 0000359500000000
[58914.156837] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8858fe98bd78
[58914.164136] FS: 0000000000000000(0000) GS:ffff887f7f380000(0000) knlGS:0000000000000000
[58914.172529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58914.178467] CR2: 00007f2240452090 CR3: 0000005d5d258000 CR4: 00000000007606f0
[58914.185765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58914.193061] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58914.200360] PKRU: 55555554
[58914.203221] Stack:
[58914.205383] ffff8858fe98bd48 00000000000002f0 0000002e81036d09 ffffc90059dcfde8
[58914.213168] ffff8858fe98bec8 0000000000000000 0000000000000000 0000000000000000
[58914.220951] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[58914.228734] Call Trace:
[58914.231337] [] psi_update_work+0x22/0x60
[58914.237067] [] process_one_work+0x189/0x420
[58914.243063] [] worker_thread+0x4e/0x4b0
[58914.248701] [] ? process_one_work+0x420/0x420
[58914.254869] [] kthread+0xe6/0x100
[58914.259994] [] ? kthread_park+0x60/0x60
[58914.265640] [] ret_from_fork+0x39/0x50
[58914.271193] Code: 41 29 c3 4d 39 dc 4d 0f 42 dc <49> f7 f1 48 8b 13 48 89 c7 48 c1
[58914.279691] RIP [] psi_update_stats+0x1c1/0x330

The crashing instruction is trying to divide the observed stall time
by the sampling period. The period, stored in R8, is not 0, but we are
dividing by the lower 32 bits only, which are all 0 in this instance.

We could switch to a 64-bit division, but the period shouldn't be that
big in the first place. It's the time between the last update and the
next scheduled one, and so should always be around 2s and comfortably
fit into 32 bits.

The bug is in the initialization of new cgroups: we schedule the first
sampling event in a cgroup as an offset of sched_clock(), but fail to
initialize the last_update timestamp, and it defaults to 0. That
results in a bogusly large sampling period the first time we run the
sampling code, and consequently we underreport pressure for the first
2s of a cgroup's life. But worse, if sched_clock() is sufficiently
advanced on the system, and the user gets unlucky, the period's lower
32 bits can all be 0 and the sampling division will crash.

Fix this by initializing the last update timestamp to the creation
time of the cgroup, thus correctly marking the start of the first
pressure sampling period in a new cgroup.

Reported-by: Jingfeng Xie <xiejingfeng@linux.alibaba.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Link: https://lkml.kernel.org/r/20191203183524.41378-2-hannes@cmpxchg.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

(cherry picked from commit 3dfbe25c27eab7c90c8a7e97b4c354a9d24dd985)

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iaada5c2f1a03cf38cbb053adde478f762ce40843
2020-02-28 15:29:50 +00:00
Miles Chen
88a47f1659 UPSTREAM: sched/psi: Correct overly pessimistic size calculation
When passing a equal or more then 32 bytes long string to psi_write(),
psi_write() copies 31 bytes to its buf and overwrites buf[30]
with '\0'. Which makes the input string 1 byte shorter than
it should be.

Fix it by copying sizeof(buf) bytes when nbytes >= sizeof(buf).

This does not cause problems in normal use case like:
"some 500000 10000000" or "full 500000 10000000" because they
are less than 32 bytes in length.

	/* assuming nbytes == 35 */
	char buf[32];

	buf_size = min(nbytes, (sizeof(buf) - 1)); /* buf_size = 31 */
	if (copy_from_user(buf, user_buf, buf_size))
		return -EFAULT;

	buf[buf_size - 1] = '\0'; /* buf[30] = '\0' */

Before:

 %cd /proc/pressure/
 %echo "123456789|123456789|123456789|1234" > memory
 [   22.473497] nbytes=35,buf_size=31
 [   22.473775] 123456789|123456789|123456789| (print 30 chars)
 %sh: write error: Invalid argument

 %echo "123456789|123456789|123456789|1" > memory
 [   64.916162] nbytes=32,buf_size=31
 [   64.916331] 123456789|123456789|123456789| (print 30 chars)
 %sh: write error: Invalid argument

After:

 %cd /proc/pressure/
 %echo "123456789|123456789|123456789|1234" > memory
 [  254.837863] nbytes=35,buf_size=32
 [  254.838541] 123456789|123456789|123456789|1 (print 31 chars)
 %sh: write error: Invalid argument

 %echo "123456789|123456789|123456789|1" > memory
 [ 9965.714935] nbytes=32,buf_size=32
 [ 9965.715096] 123456789|123456789|123456789|1 (print 31 chars)
 %sh: write error: Invalid argument

Also remove the superfluous parentheses.

Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Cc: <linux-mediatek@lists.infradead.org>
Cc: <wsd_upstream@mediatek.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190912103452.13281-1-miles.chen@mediatek.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

(cherry picked from commit 4adcdcea717cb2d8436bef00dd689aa5bc76f11b)

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9371b4d5e465bb8b84ff7adf5f40f30696c6ff70
2020-02-28 15:28:23 +00:00
Sami Tolvanen
8028f78053 ANDROID: Disable wq fp check in CFI builds
With non-canonical CFI, LLVM generates jump table entries for external
symbols in modules and as a result, a function pointer passed from a
module to the core kernel will have a different address.

Disable the warning for now.

Bug: 145210207
Change-Id: Ifdcee3479280f7b97abdee6b4c746f447e0944e6
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Alistair Delva <adelva@google.com>
2020-02-27 00:09:59 +00:00
Todd Kjos
08256862e0 ANDROID: increase limit on sched-tune boost groups
Some devices need an additional sched-tune boost group to
optimize performance for key tasks

Bug: 150302001
Change-Id: I392c8cc05a8851f1d416c381b4a27242924c2c27
Signed-off-by: Todd Kjos <tkjos@google.com>
2020-02-26 21:55:29 +00:00
Greg Kroah-Hartman
4dc4199770 This is the 4.19.106 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5TfLwACgkQONu9yGCS
 aT5wlRAAhZELK39c78NMCTZKHtKGLsGb2os2IiI7zIRbqNNwnvJi+jAc3kgbS9jP
 +W+wnhYFtFisDvqdCQ009I6A0NA1p3Nqy166JplW0iIg1e7rgUKKUfabCN9sJmjh
 HGK913cJlHwGmkSxq//sBucBwWhYYGaHec28pZ7uCFATjWrTaH3G4VrvLStuicYR
 YgS9MH261tWJKJm5+V2MxnOOI0103+Uey+xVqwSnLlV+qmasxwDCMU5ae+SK7e7f
 cXIkNZwvDph1zunekHg+jd64GN3GYswXVcRighWP0n7Lr+0tGPN7SY5pvZIjZLv/
 sdroyrqAxytTYP32hypIUgsToVvJr7zXD09LGdsgOCKVwFVn8yl1e4zgGKH3L9Xu
 OK2krI90v1MVevibyaNndZ4UDKilF75oE2YYDOFW/BU1lorFAIzk4hh15CfKc8s1
 KHRjePfcgQREs/SGK8k2BAmf/JwxFN1/Ro5dl7MvKn07ZYqx6QOwUoMhgxspIntN
 9TlFw6elu1RSwu2BFts9wvoHO1tr7GZBa1cVkNF8qV1rzaGVY68aLDvvHGdffD6W
 JgX+BCfr6vcN7R4izak1RxzAoqDrRxS0vWoC1vVsPqeIIZydSxpYDquaFnbZm+Wc
 MRuh5gpQ2PzTXuMLeBB+ig6UnzsAO3x+3yIG/l5ZmmYxJbMFBKU=
 =zE/i
 -----END PGP SIGNATURE-----

Merge 4.19.106 into android-4.19

Changes in 4.19.106
	core: Don't skip generic XDP program execution for cloned SKBs
	enic: prevent waking up stopped tx queues over watchdog reset
	net/smc: fix leak of kernel memory to user space
	net: dsa: tag_qca: Make sure there is headroom for tag
	net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGS
	net/sched: flower: add missing validation of TCA_FLOWER_FLAGS
	Revert "KVM: nVMX: Use correct root level for nested EPT shadow page tables"
	Revert "KVM: VMX: Add non-canonical check on writes to RTIT address MSRs"
	KVM: nVMX: Use correct root level for nested EPT shadow page tables
	drm/gma500: Fixup fbdev stolen size usage evaluation
	cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
	brcmfmac: Fix use after free in brcmf_sdio_readframes()
	leds: pca963x: Fix open-drain initialization
	ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT
	ALSA: ctl: allow TLV read operation for callback type of element in locked case
	gianfar: Fix TX timestamping with a stacked DSA driver
	pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs
	pxa168fb: Fix the function used to release some memory in an error handling path
	media: i2c: mt9v032: fix enum mbus codes and frame sizes
	powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
	gpio: gpio-grgpio: fix possible sleep-in-atomic-context bugs in grgpio_irq_map/unmap()
	iommu/vt-d: Fix off-by-one in PASID allocation
	char/random: silence a lockdep splat with printk()
	media: sti: bdisp: fix a possible sleep-in-atomic-context bug in bdisp_device_run()
	pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
	efi/x86: Map the entire EFI vendor string before copying it
	MIPS: Loongson: Fix potential NULL dereference in loongson3_platform_init()
	sparc: Add .exit.data section.
	uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()
	usb: gadget: udc: fix possible sleep-in-atomic-context bugs in gr_probe()
	usb: dwc2: Fix IN FIFO allocation
	clocksource/drivers/bcm2835_timer: Fix memory leak of timer
	kselftest: Minimise dependency of get_size on C library interfaces
	jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
	x86/sysfb: Fix check for bad VRAM size
	pwm: omap-dmtimer: Simplify error handling
	s390/pci: Fix possible deadlock in recover_store()
	powerpc/iov: Move VF pdev fixup into pcibios_fixup_iov()
	tracing: Fix tracing_stat return values in error handling paths
	tracing: Fix very unlikely race of registering two stat tracers
	ARM: 8952/1: Disable kmemleak on XIP kernels
	ext4, jbd2: ensure panic when aborting with zero errno
	ath10k: Correct the DMA direction for management tx buffers
	drm/amd/display: Retrain dongles when SINK_COUNT becomes non-zero
	nbd: add a flush_workqueue in nbd_start_device
	KVM: s390: ENOTSUPP -> EOPNOTSUPP fixups
	kconfig: fix broken dependency in randconfig-generated .config
	clk: qcom: rcg2: Don't crash if our parent can't be found; return an error
	drm/amdgpu: remove 4 set but not used variable in amdgpu_atombios_get_connector_info_from_object_table
	drm/amdgpu: Ensure ret is always initialized when using SOC15_WAIT_ON_RREG
	regulator: rk808: Lower log level on optional GPIOs being not available
	net/wan/fsl_ucc_hdlc: reject muram offsets above 64K
	NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu().
	selinux: fall back to ref-walk if audit is required
	arm64: dts: allwinner: H6: Add PMU mode
	arm: dts: allwinner: H3: Add PMU node
	selinux: ensure we cleanup the internal AVC counters on error in avc_insert()
	arm64: dts: qcom: msm8996: Disable USB2 PHY suspend by core
	ARM: dts: imx6: rdu2: Disable WP for USDHC2 and USDHC3
	ARM: dts: imx6: rdu2: Limit USBH1 to Full Speed
	PCI: iproc: Apply quirk_paxc_bridge() for module as well as built-in
	media: cx23885: Add support for AVerMedia CE310B
	PCI: Add generic quirk for increasing D3hot delay
	PCI: Increase D3 delay for AMD Ryzen5/7 XHCI controllers
	media: v4l2-device.h: Explicitly compare grp{id,mask} to zero in v4l2_device macros
	reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
	r8169: check that Realtek PHY driver module is loaded
	fore200e: Fix incorrect checks of NULL pointer dereference
	netfilter: nft_tunnel: add the missing ERSPAN_VERSION nla_policy
	ALSA: usx2y: Adjust indentation in snd_usX2Y_hwdep_dsp_status
	b43legacy: Fix -Wcast-function-type
	ipw2x00: Fix -Wcast-function-type
	iwlegacy: Fix -Wcast-function-type
	rtlwifi: rtl_pci: Fix -Wcast-function-type
	orinoco: avoid assertion in case of NULL pointer
	ACPICA: Disassembler: create buffer fields in ACPI_PARSE_LOAD_PASS1
	scsi: ufs: Complete pending requests in host reset and restore path
	scsi: aic7xxx: Adjust indentation in ahc_find_syncrate
	drm/mediatek: handle events when enabling/disabling crtc
	ARM: dts: r8a7779: Add device node for ARM global timer
	selinux: ensure we cleanup the internal AVC counters on error in avc_update()
	dmaengine: Store module owner in dma_device struct
	dmaengine: imx-sdma: Fix memory leak
	crypto: chtls - Fixed memory leak
	x86/vdso: Provide missing include file
	PM / devfreq: rk3399_dmc: Add COMPILE_TEST and HAVE_ARM_SMCCC dependency
	pinctrl: sh-pfc: sh7269: Fix CAN function GPIOs
	reset: uniphier: Add SCSSI reset control for each channel
	RDMA/rxe: Fix error type of mmap_offset
	clk: sunxi-ng: add mux and pll notifiers for A64 CPU clock
	ALSA: sh: Fix unused variable warnings
	clk: uniphier: Add SCSSI clock gate for each channel
	ALSA: sh: Fix compile warning wrt const
	tools lib api fs: Fix gcc9 stringop-truncation compilation error
	ACPI: button: Add DMI quirk for Razer Blade Stealth 13 late 2019 lid switch
	mlx5: work around high stack usage with gcc
	drm: remove the newline for CRC source name.
	ARM: dts: stm32: Add power-supply for DSI panel on stm32f469-disco
	usbip: Fix unsafe unaligned pointer usage
	udf: Fix free space reporting for metadata and virtual partitions
	staging: rtl8188: avoid excessive stack usage
	IB/hfi1: Add software counter for ctxt0 seq drop
	soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
	efi/x86: Don't panic or BUG() on non-critical error conditions
	rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls
	Input: edt-ft5x06 - work around first register access error
	x86/nmi: Remove irq_work from the long duration NMI handler
	wan: ixp4xx_hss: fix compile-testing on 64-bit
	ASoC: atmel: fix build error with CONFIG_SND_ATMEL_SOC_DMA=m
	tty: synclinkmp: Adjust indentation in several functions
	tty: synclink_gt: Adjust indentation in several functions
	visorbus: fix uninitialized variable access
	driver core: platform: Prevent resouce overflow from causing infinite loops
	driver core: Print device when resources present in really_probe()
	bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_map
	vme: bridges: reduce stack usage
	drm/nouveau/secboot/gm20b: initialize pointer in gm20b_secboot_new()
	drm/nouveau/gr/gk20a,gm200-: add terminators to method lists read from fw
	drm/nouveau: Fix copy-paste error in nouveau_fence_wait_uevent_handler
	drm/nouveau/drm/ttm: Remove set but not used variable 'mem'
	drm/nouveau/fault/gv100-: fix memory leak on module unload
	drm/vmwgfx: prevent memory leak in vmw_cmdbuf_res_add
	usb: musb: omap2430: Get rid of musb .set_vbus for omap2430 glue
	iommu/arm-smmu-v3: Use WRITE_ONCE() when changing validity of an STE
	f2fs: set I_LINKABLE early to avoid wrong access by vfs
	f2fs: free sysfs kobject
	scsi: iscsi: Don't destroy session if there are outstanding connections
	arm64: fix alternatives with LLVM's integrated assembler
	drm/amd/display: fixup DML dependencies
	watchdog/softlockup: Enforce that timestamp is valid on boot
	f2fs: fix memleak of kobject
	x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd
	pwm: omap-dmtimer: Remove PWM chip in .remove before making it unfunctional
	cmd64x: potential buffer overflow in cmd64x_program_timings()
	ide: serverworks: potential overflow in svwks_set_pio_mode()
	pwm: Remove set but not set variable 'pwm'
	btrfs: fix possible NULL-pointer dereference in integrity checks
	btrfs: safely advance counter when looking up bio csums
	btrfs: device stats, log when stats are zeroed
	module: avoid setting info->name early in case we can fall back to info->mod->name
	remoteproc: Initialize rproc_class before use
	irqchip/mbigen: Set driver .suppress_bind_attrs to avoid remove problems
	ALSA: hda/hdmi - add retry logic to parse_intel_hdmi()
	kbuild: use -S instead of -E for precise cc-option test in Kconfig
	x86/decoder: Add TEST opcode to Group3-2
	s390: adjust -mpacked-stack support check for clang 10
	s390/ftrace: generate traced function stack frame
	driver core: platform: fix u32 greater or equal to zero comparison
	ALSA: hda - Add docking station support for Lenovo Thinkpad T420s
	drm/nouveau/mmu: fix comptag memory leak
	powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
	bcache: cached_dev_free needs to put the sb page
	iommu/vt-d: Remove unnecessary WARN_ON_ONCE()
	selftests: bpf: Reset global state between reuseport test runs
	jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
	jbd2: make sure ESHUTDOWN to be recorded in the journal superblock
	ARM: 8951/1: Fix Kexec compilation issue.
	hostap: Adjust indentation in prism2_hostapd_add_sta
	iwlegacy: ensure loop counter addr does not wrap and cause an infinite loop
	cifs: fix NULL dereference in match_prepath
	bpf: map_seq_next should always increase position index
	ceph: check availability of mds cluster on mount after wait timeout
	rbd: work around -Wuninitialized warning
	irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
	drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
	ftrace: fpid_next() should increase position index
	trigger_next should increase position index
	radeon: insert 10ms sleep in dce5_crtc_load_lut
	ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
	lib/scatterlist.c: adjust indentation in __sg_alloc_table
	reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
	bcache: explicity type cast in bset_bkey_last()
	irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL
	iwlwifi: mvm: Fix thermal zone registration
	microblaze: Prevent the overflow of the start
	brd: check and limit max_part par
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_latency
	drm/amdgpu/smu10: fix smu10_get_clock_by_type_with_voltage
	NFS: Fix memory leaks
	help_next should increase position index
	cifs: log warning message (once) if out of disk space
	virtio_balloon: prevent pfn array overflow
	mlxsw: spectrum_dpipe: Add missing error path
	drm/amdgpu/display: handle multiple numbers of fclks in dcn_calcs.c (v2)
	Linux 4.19.106

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia1032b50dd82b42e13973120dcbf94ae7b864648
2020-02-24 09:13:25 +01:00
Vasily Averin
9ed840b756 trigger_next should increase position index
[ Upstream commit 6722b23e7a2ace078344064a9735fb73e554e9ef ]

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

Without patch:
 # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
 dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
 n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
 # Available triggers:
 # traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
 6+1 records in
 6+1 records out
 206 bytes copied, 0.00027916 s, 738 kB/s

Notice the printing of "# Available triggers:..." after the line.

With the patch:
 # dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
 dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
 n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
 2+1 records in
 2+1 records out
 88 bytes copied, 0.000526867 s, 167 kB/s

It only prints the end of the file, and does not restart.

Link: http://lkml.kernel.org/r/3c35ee24-dd3a-8119-9c19-552ed253388a@virtuozzo.com

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:52 +01:00
Vasily Averin
ddb005d906 ftrace: fpid_next() should increase position index
[ Upstream commit e4075e8bdffd93a9b6d6e1d52fabedceeca5a91b ]

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

Without patch:
 # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
 dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
 id
 no pid
 2+1 records in
 2+1 records out
 10 bytes copied, 0.000213285 s, 46.9 kB/s

Notice the "id" followed by "no pid".

With the patch:
 # dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
 dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
 id
 0+1 records in
 0+1 records out
 3 bytes copied, 0.000202112 s, 14.8 kB/s

Notice that it only prints "id" and not the "no pid" afterward.

Link: http://lkml.kernel.org/r/4f87c6ad-f114-30bb-8506-c32274ce2992@virtuozzo.com

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:52 +01:00
Vasily Averin
ca2b459365 bpf: map_seq_next should always increase position index
[ Upstream commit 90435a7891a2259b0f74c5a1bc5600d0d64cba8f ]

If seq_file .next fuction does not change position index,
read after some lseek can generate an unexpected output.

See also: https://bugzilla.kernel.org/show_bug.cgi?id=206283

v1 -> v2: removed missed increment in end of function

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/eca84fdd-c374-a154-d874-6c7b55fc3bc4@virtuozzo.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:51 +01:00
Jessica Yu
c371b1e41f module: avoid setting info->name early in case we can fall back to info->mod->name
[ Upstream commit 708e0ada1916be765b7faa58854062f2bc620bbf ]

In setup_load_info(), info->name (which contains the name of the module,
mostly used for early logging purposes before the module gets set up)
gets unconditionally assigned if .modinfo is missing despite the fact
that there is an if (!info->name) check near the end of the function.
Avoid assigning a placeholder string to info->name if .modinfo doesn't
exist, so that we can fall back to info->mod->name later on.

Fixes: 5fdc7db644 ("module: setup load info before module_sig_check()")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:49 +01:00
Thomas Gleixner
c2913e2c50 watchdog/softlockup: Enforce that timestamp is valid on boot
[ Upstream commit 11e31f608b499f044f24b20be73f1dcab3e43f8a ]

Robert reported that during boot the watchdog timestamp is set to 0 for one
second which is the indicator for a watchdog reset.

The reason for this is that the timestamp is in seconds and the time is
taken from sched clock and divided by ~1e9. sched clock starts at 0 which
means that for the first second during boot the watchdog timestamp is 0,
i.e. reset.

Use ULONG_MAX as the reset indicator value so the watchdog works correctly
right from the start. ULONG_MAX would only conflict with a real timestamp
if the system reaches an uptime of 136 years on 32bit and almost eternity
on 64bit.

Reported-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87o8v3uuzl.fsf@nanos.tec.linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:49 +01:00
Steven Rostedt (VMware)
56d3793229 tracing: Fix very unlikely race of registering two stat tracers
[ Upstream commit dfb6cd1e654315168e36d947471bd2a0ccd834ae ]

Looking through old emails in my INBOX, I came across a patch from Luis
Henriques that attempted to fix a race of two stat tracers registering the
same stat trace (extremely unlikely, as this is done in the kernel, and
probably doesn't even exist). The submitted patch wasn't quite right as it
needed to deal with clean up a bit better (if two stat tracers were the
same, it would have the same files).

But to make the code cleaner, all we needed to do is to keep the
all_stat_sessions_mutex held for most of the registering function.

Link: http://lkml.kernel.org/r/1410299375-20068-1-git-send-email-luis.henriques@canonical.com

Fixes: 002bb86d8d ("tracing/ftrace: separate events tracing and stats tracing engine")
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:39 +01:00
Luis Henriques
fb0085070a tracing: Fix tracing_stat return values in error handling paths
[ Upstream commit afccc00f75bbbee4e4ae833a96c2d29a7259c693 ]

tracing_stat_init() was always returning '0', even on the error paths.  It
now returns -ENODEV if tracing_init_dentry() fails or -ENOMEM if it fails
to created the 'trace_stat' debugfs directory.

Link: http://lkml.kernel.org/r/1410299381-20108-1-git-send-email-luis.henriques@canonical.com

Fixes: ed6f1c996b ("tracing: Check return value of tracing_init_dentry()")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
[ Pulled from the archeological digging of my INBOX ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:39 +01:00
Peter Zijlstra
b9dc4d61b5 cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
[ Upstream commit 45178ac0cea853fe0e405bf11e101bdebea57b15 ]

Paul reported a very sporadic, rcutorture induced, workqueue failure.
When the planets align, the workqueue rescuer's self-migrate fails and
then triggers a WARN for running a work on the wrong CPU.

Tejun then figured that set_cpus_allowed_ptr()'s stop_one_cpu() call
could be ignored! When stopper->enabled is false, stop_machine will
insta complete the work, without actually doing the work. Worse, it
will not WARN about this (we really should fix this).

It turns out there is a small window where a freshly online'ed CPU is
marked 'online' but doesn't yet have the stopper task running:

	BP				AP

	bringup_cpu()
	  __cpu_up(cpu, idle)	 -->	start_secondary()
					...
					cpu_startup_entry()
	  bringup_wait_for_ap()
	    wait_for_ap_thread() <--	  cpuhp_online_idle()
					  while (1)
					    do_idle()

					... available to run kthreads ...

	    stop_machine_unpark()
	      stopper->enable = true;

Close this by moving the stop_machine_unpark() into
cpuhp_online_idle(), such that the stopper thread is ready before we
start the idle loop and schedule.

Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Debugged-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24 08:34:35 +01:00
Quentin Perret
2a557de670 UPSTREAM: sched/topology: Introduce a sysctl for Energy Aware Scheduling
In its current state, Energy Aware Scheduling (EAS) starts automatically
on asymmetric platforms having an Energy Model (EM). However, there are
users who want to have an EM (for thermal management for example), but
don't want EAS with it.

In order to let users disable EAS explicitly, introduce a new sysctl
called 'sched_energy_aware'. It is enabled by default so that EAS can
start automatically on platforms where it makes sense. Flipping it to 0
rebuilds the scheduling domains and disables EAS.

Bug: 120440300
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-11-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 8d5d0cfb63cbcb4005e19a332b31d687b1d01e58)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I4ca842d07b82869cfab7542c8c4351f631e1024d
2020-02-19 10:50:59 +00:00
Greg Kroah-Hartman
4eee97caec This is the 4.19.104 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5HEigACgkQONu9yGCS
 aT7Gcw/6AkDGkK5U/aDpKMqWiRmZUqDIg8U9xR+44Gl57Q71vicrzq8NGPHxxbsF
 slWoCyXLVSD7bMWGsTD0qJR8muROAraMxDl8dCxojEXHnXFMx4A4Cf0h1E0lY0mu
 Jq/O9m33ZMSppjio88sCcLpo0pbXF+cCX1CY87NI5QUitUzHgRh18W8BtyFpMMI8
 eC0Fc+hMWax3+qqHt/hFVpufaTKm35zLCpGjGAJiHd7GFvqUJnuAzBYCs1Cf8NO1
 KrrL3l/IWk8z3Z0Wc9PbBz309a9H6FVpjrXSXj6URkxjtqJ0F0mBMaIYxhaUF8PD
 CHY5xLyqKodC8/7O5zNOrP80oT9nqJvsmKwUwlG34IJuMVaq/o+hZu+88JVB02Yw
 v9XVcaQda5aZgWF9cBWzFQEcNwHFDCQ9VNidLDcHJLGPyFo/BogvMo8T4yPM9tI0
 O0PSFm/yYu0airZSCzIbPzuF2Iv+iilVtq+o10VRDsGtEYAOzTL7nA01MkdXFhwy
 4V+Q51C90TGo13BnnZ6xpEqjspuDWgeOD71/xkQ5cnyFgam0XQq/5R6JJghJIHOP
 7p8NMMyNhK2FnOGrFUgqvwBCp6Dap1ISZyKvie1Z8vuCJsZcwMVIw8fxAzoZWOjj
 MlmmePjlbC7XTFxjdo0jrQTdvBwq+gFgNitD7UAlfHAdqKJKKA4=
 =8ktI
 -----END PGP SIGNATURE-----

Merge 4.19.104 into android-4.19

Changes in 4.19.104
	ASoC: pcm: update FE/BE trigger order based on the command
	hv_sock: Remove the accept port restriction
	IB/mlx4: Fix memory leak in add_gid error flow
	RDMA/netlink: Do not always generate an ACK for some netlink operations
	RDMA/core: Fix locking in ib_uverbs_event_read
	RDMA/uverbs: Verify MR access flags
	scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
	PCI/IOV: Fix memory leak in pci_iov_add_virtfn()
	ath10k: pci: Only dump ATH10K_MEM_REGION_TYPE_IOREG when safe
	PCI/switchtec: Fix vep_vector_number ioread width
	PCI: Don't disable bridge BARs when assigning bus resources
	nfs: NFS_SWAP should depend on SWAP
	NFS: Revalidate the file size on a fatal write error
	NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()
	NFSv4: try lease recovery on NFS4ERR_EXPIRED
	serial: uartps: Add a timeout to the tx empty wait
	gpio: zynq: Report gpio direction at boot
	spi: spi-mem: Add extra sanity checks on the op param
	spi: spi-mem: Fix inverted logic in op sanity check
	rtc: hym8563: Return -EINVAL if the time is known to be invalid
	rtc: cmos: Stop using shared IRQ
	ARC: [plat-axs10x]: Add missing multicast filter number to GMAC node
	platform/x86: intel_mid_powerbtn: Take a copy of ddata
	ARM: dts: at91: Reenable UART TX pull-ups
	ARM: dts: am43xx: add support for clkout1 clock
	ARM: dts: at91: sama5d3: fix maximum peripheral clock rates
	ARM: dts: at91: sama5d3: define clock rate range for tcb1
	tools/power/acpi: fix compilation error
	powerpc/pseries/vio: Fix iommu_table use-after-free refcount warning
	powerpc/pseries: Allow not having ibm, hypertas-functions::hcall-multi-tce for DDW
	iommu/arm-smmu-v3: Populate VMID field for CMDQ_OP_TLBI_NH_VA
	KVM: arm/arm64: vgic-its: Fix restoration of unmapped collections
	ARM: 8949/1: mm: mark free_memmap as __init
	arm64: cpufeature: Fix the type of no FP/SIMD capability
	arm64: ptrace: nofpsimd: Fail FP/SIMD regset operations
	KVM: arm/arm64: Fix young bit from mmu notifier
	KVM: arm: Fix DFSR setting for non-LPAE aarch32 guests
	KVM: arm: Make inject_abt32() inject an external abort instead
	KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
	mtd: onenand_base: Adjust indentation in onenand_read_ops_nolock
	mtd: sharpslpart: Fix unsigned comparison to zero
	crypto: artpec6 - return correct error code for failed setkey()
	crypto: atmel-sha - fix error handling when setting hmac key
	media: i2c: adv748x: Fix unsafe macros
	pinctrl: sh-pfc: r8a7778: Fix duplicate SDSELF_B and SD1_CLK_B
	mwifiex: Fix possible buffer overflows in mwifiex_ret_wmm_get_status()
	mwifiex: Fix possible buffer overflows in mwifiex_cmd_append_vsie_tlv()
	libertas: don't exit from lbs_ibss_join_existing() with RCU read lock held
	libertas: make lbs_ibss_join_existing() return error code on rates overflow
	scsi: megaraid_sas: Do not initiate OCR if controller is not in ready state
	x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h
	x86/stackframe, x86/ftrace: Add pt_regs frame annotations
	serial: uartps: Move the spinlock after the read of the tx empty
	padata: fix null pointer deref of pd->pinst
	Linux 4.19.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I42a465b140183dcc8cf49e19903d0e8f4b688930
2020-02-19 08:31:05 +01:00
Daniel Jordan
cad926f70b padata: fix null pointer deref of pd->pinst
The 4.19 backport dc34710a7a ("padata: Remove broken queue flushing")
removed padata_alloc_pd()'s assignment to pd->pinst, resulting in:

    Unable to handle kernel NULL pointer dereference ...
    ...
    pc : padata_reorder+0x144/0x2e0
    ...
    Call trace:
     padata_reorder+0x144/0x2e0
     padata_do_serial+0xc8/0x128
     pcrypt_aead_enc+0x60/0x70 [pcrypt]
     padata_parallel_worker+0xd8/0x138
     process_one_work+0x1bc/0x4b8
     worker_thread+0x164/0x580
     kthread+0x134/0x138
     ret_from_fork+0x10/0x18

This happened because the backport was based on an enhancement that
moved this assignment but isn't in 4.19:

  bfde23ce200e ("padata: unbind parallel jobs from specific CPUs")

Simply restore the assignment to fix the crash.

Fixes: dc34710a7a ("padata: Remove broken queue flushing")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14 16:33:28 -05:00
Greg Kroah-Hartman
3389e56d31 This is the 4.19.103 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl5Cn0wACgkQONu9yGCS
 aT584xAAtePSlzTxst/jukREoyrpAfTM1BeovMdsZEBpKh+/F3n1udqHeo+iNAAN
 qSOig012aW2qP7b5/4CrEU9ZRTvd0AM4fog7ABLJVahMYMqoJgod8TRaE4v0nVut
 eRans6w3NbZJCZwdw2aiu5gwFfjwJLSUckBNmj4XVYdyfh7q0BgnZV5OY0V+zhuG
 1MWXaylbRqjguR/ZFk0UPAmRaqNKHbwfCJ1V0ygL9xQkJM0cUn7hX9/CqM4aYnm6
 m1oux4ektLAmF1XK4NiQEuRBMeFO74XlKcsZqQHf/b4FZfcPergcPwIj8ugtCHzJ
 kx2QgURDjgH4Tnu+Q0ScPrjj2kjU8rWmjqlcv1PcUyOWm+MR0OK9bW7TLEntMSF8
 HOEe9j6SsjQNIOoYh1YcMnuGjKNIZjl2L3VbDzpVN2GxZxwAutY6G68tV7sbA2pu
 wtsrAVOqdcjoo0ruRmwognBqQAdNdsbiBx7bgcNjVEXWL0N3Ddiv6CNYwnehA5Hq
 cvQwVQpFGP9ZGYUcCMbdwR+7kJzVy6V2S615M8GkE9FouOwTfV60zM/sZ1rFVt1J
 70zxfRX5ys19aTAVkbi6pHHCUJ0ZAiTgWujp5Hp4kPt7gEz01Ur0s1kI3b7b6iWh
 cuycRFULvqeXCApQacs//lOVDoUV20uFcL/zqOFM33v/+YzkyjA=
 =3D8z
 -----END PGP SIGNATURE-----

Merge 4.19.103 into android-4.19

Changes in 4.19.103
	Revert "drm/sun4i: dsi: Change the start delay calculation"
	ovl: fix lseek overflow on 32bit
	kernel/module: Fix memleak in module_add_modinfo_attrs()
	media: iguanair: fix endpoint sanity check
	ocfs2: fix oops when writing cloned file
	x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
	udf: Allow writing to 'Rewritable' partitions
	printk: fix exclusive_console replaying
	iwlwifi: mvm: fix NVM check for 3168 devices
	sparc32: fix struct ipc64_perm type definition
	cls_rsvp: fix rsvp_policy
	gtp: use __GFP_NOWARN to avoid memalloc warning
	l2tp: Allow duplicate session creation with UDP
	net: hsr: fix possible NULL deref in hsr_handle_frame()
	net_sched: fix an OOB access in cls_tcindex
	net: stmmac: Delete txtimer in suspend()
	bnxt_en: Fix TC queue mapping.
	tcp: clear tp->total_retrans in tcp_disconnect()
	tcp: clear tp->delivered in tcp_disconnect()
	tcp: clear tp->data_segs{in|out} in tcp_disconnect()
	tcp: clear tp->segs_{in|out} in tcp_disconnect()
	rxrpc: Fix use-after-free in rxrpc_put_local()
	rxrpc: Fix insufficient receive notification generation
	rxrpc: Fix missing active use pinning of rxrpc_local object
	rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect
	media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
	mfd: dln2: More sanity checking for endpoints
	ipc/msg.c: consolidate all xxxctl_down() functions
	tracing: Fix sched switch start/stop refcount racy updates
	rcu: Avoid data-race in rcu_gp_fqs_check_wake()
	brcmfmac: Fix memory leak in brcmf_usbdev_qinit
	usb: typec: tcpci: mask event interrupts when remove driver
	usb: gadget: legacy: set max_speed to super-speed
	usb: gadget: f_ncm: Use atomic_t to track in-flight request
	usb: gadget: f_ecm: Use atomic_t to track in-flight request
	ALSA: usb-audio: Fix endianess in descriptor validation
	ALSA: dummy: Fix PCM format loop in proc output
	mm/memory_hotplug: fix remove_memory() lockdep splat
	mm: move_pages: report the number of non-attempted pages
	media/v4l2-core: set pages dirty upon releasing DMA buffers
	media: v4l2-core: compat: ignore native command codes
	media: v4l2-rect.h: fix v4l2_rect_map_inside() top/left adjustments
	lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more()
	irqdomain: Fix a memory leak in irq_domain_push_irq()
	platform/x86: intel_scu_ipc: Fix interrupt support
	ALSA: hda: Add Clevo W65_67SB the power_save blacklist
	KVM: arm64: Correct PSTATE on exception entry
	KVM: arm/arm64: Correct CPSR on exception entry
	KVM: arm/arm64: Correct AArch32 SPSR on exception entry
	KVM: arm64: Only sign-extend MMIO up to register width
	MIPS: fix indentation of the 'RELOCS' message
	MIPS: boot: fix typo in 'vmlinux.lzma.its' target
	s390/mm: fix dynamic pagetable upgrade for hugetlbfs
	powerpc/xmon: don't access ASDR in VMs
	powerpc/pseries: Advance pfn if section is not present in lmb_is_removable()
	smb3: fix signing verification of large reads
	PCI: tegra: Fix return value check of pm_runtime_get_sync()
	mmc: spi: Toggle SPI polarity, do not hardcode it
	ACPI: video: Do not export a non working backlight interface on MSI MS-7721 boards
	ACPI / battery: Deal with design or full capacity being reported as -1
	ACPI / battery: Use design-cap for capacity calculations if full-cap is not available
	ACPI / battery: Deal better with neither design nor full capacity not being reported
	alarmtimer: Unregister wakeup source when module get fails
	ubifs: Reject unsupported ioctl flags explicitly
	ubifs: don't trigger assertion on invalid no-key filename
	ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag
	ubifs: Fix deadlock in concurrent bulk-read and writepage
	crypto: geode-aes - convert to skcipher API and make thread-safe
	PCI: keystone: Fix link training retries initiation
	mmc: sdhci-of-at91: fix memleak on clk_get failure
	hv_balloon: Balloon up according to request page number
	mfd: axp20x: Mark AXP20X_VBUS_IPSOUT_MGMT as volatile
	crypto: api - Check spawn->alg under lock in crypto_drop_spawn
	crypto: ccree - fix backlog memory leak
	crypto: ccree - fix pm wrongful error reporting
	crypto: ccree - fix PM race condition
	scripts/find-unused-docs: Fix massive false positives
	scsi: qla2xxx: Fix mtcp dump collection failure
	power: supply: ltc2941-battery-gauge: fix use-after-free
	ovl: fix wrong WARN_ON() in ovl_cache_update_ino()
	f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
	f2fs: fix miscounted block limit in f2fs_statfs_project()
	f2fs: code cleanup for f2fs_statfs_project()
	PM: core: Fix handling of devices deleted during system-wide resume
	of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
	dm zoned: support zone sizes smaller than 128MiB
	dm space map common: fix to ensure new block isn't already in use
	dm crypt: fix benbi IV constructor crash if used in authenticated mode
	dm: fix potential for q->make_request_fn NULL pointer
	dm writecache: fix incorrect flush sequence when doing SSD mode commit
	padata: Remove broken queue flushing
	tracing: Annotate ftrace_graph_hash pointer with __rcu
	tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
	ftrace: Add comment to why rcu_dereference_sched() is open coded
	ftrace: Protect ftrace_graph_hash with ftrace_sync
	samples/bpf: Don't try to remove user's homedir on clean
	crypto: ccp - set max RSA modulus size for v3 platform devices as well
	crypto: pcrypt - Do not clear MAY_SLEEP flag in original request
	crypto: atmel-aes - Fix counter overflow in CTR mode
	crypto: api - Fix race condition in crypto_spawn_alg
	crypto: picoxcell - adjust the position of tasklet_init and fix missed tasklet_kill
	scsi: qla2xxx: Fix unbound NVME response length
	NFS: Fix memory leaks and corruption in readdir
	NFS: Directory page cache pages need to be locked when read
	jbd2_seq_info_next should increase position index
	Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES
	btrfs: set trans->drity in btrfs_commit_transaction
	Btrfs: fix race between adding and putting tree mod seq elements and nodes
	ARM: tegra: Enable PLLP bypass during Tegra124 LP1
	iwlwifi: don't throw error when trying to remove IGTK
	mwifiex: fix unbalanced locking in mwifiex_process_country_ie()
	sunrpc: expiry_time should be seconds not timeval
	gfs2: move setting current->backing_dev_info
	gfs2: fix O_SYNC write handling
	drm/rect: Avoid division by zero
	media: rc: ensure lirc is initialized before registering input device
	tools/kvm_stat: Fix kvm_exit filter name
	xen/balloon: Support xend-based toolstack take two
	watchdog: fix UAF in reboot notifier handling in watchdog core code
	bcache: add readahead cache policy options via sysfs interface
	eventfd: track eventfd_signal() recursion depth
	aio: prevent potential eventfd recursion on poll
	KVM: x86: Refactor picdev_write() to prevent Spectre-v1/L1TF attacks
	KVM: x86: Refactor prefix decoding to prevent Spectre-v1/L1TF attacks
	KVM: x86: Protect pmu_intel.c from Spectre-v1/L1TF attacks
	KVM: x86: Protect DR-based index computations from Spectre-v1/L1TF attacks
	KVM: x86: Protect kvm_lapic_reg_write() from Spectre-v1/L1TF attacks
	KVM: x86: Protect kvm_hv_msr_[get|set]_crash_data() from Spectre-v1/L1TF attacks
	KVM: x86: Protect ioapic_write_indirect() from Spectre-v1/L1TF attacks
	KVM: x86: Protect MSR-based index computations in pmu.h from Spectre-v1/L1TF attacks
	KVM: x86: Protect ioapic_read_indirect() from Spectre-v1/L1TF attacks
	KVM: x86: Protect MSR-based index computations from Spectre-v1/L1TF attacks in x86.c
	KVM: x86: Protect x86_decode_insn from Spectre-v1/L1TF attacks
	KVM: x86: Protect MSR-based index computations in fixed_msr_to_seg_unit() from Spectre-v1/L1TF attacks
	KVM: x86: Fix potential put_fpu() w/o load_fpu() on MPX platform
	KVM: PPC: Book3S HV: Uninit vCPU if vcore creation fails
	KVM: PPC: Book3S PR: Free shared page if mmu initialization fails
	x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
	KVM: x86: Don't let userspace set host-reserved cr4 bits
	KVM: x86: Free wbinvd_dirty_mask if vCPU creation fails
	KVM: s390: do not clobber registers during guest reset/store status
	clk: tegra: Mark fuse clock as critical
	drm/amd/dm/mst: Ignore payload update failures
	percpu: Separate decrypted varaibles anytime encryption can be enabled
	scsi: qla2xxx: Fix the endianness of the qla82xx_get_fw_size() return type
	scsi: csiostor: Adjust indentation in csio_device_reset
	scsi: qla4xxx: Adjust indentation in qla4xxx_mem_free
	scsi: ufs: Recheck bkops level if bkops is disabled
	phy: qualcomm: Adjust indentation in read_poll_timeout
	ext2: Adjust indentation in ext2_fill_super
	powerpc/44x: Adjust indentation in ibm4xx_denali_fixup_memsize
	drm: msm: mdp4: Adjust indentation in mdp4_dsi_encoder_enable
	NFC: pn544: Adjust indentation in pn544_hci_check_presence
	ppp: Adjust indentation into ppp_async_input
	net: smc911x: Adjust indentation in smc911x_phy_configure
	net: tulip: Adjust indentation in {dmfe, uli526x}_init_module
	IB/mlx5: Fix outstanding_pi index for GSI qps
	IB/core: Fix ODP get user pages flow
	nfsd: fix delay timer on 32-bit architectures
	nfsd: fix jiffies/time_t mixup in LRU list
	nfsd: Return the correct number of bytes written to the file
	ubi: fastmap: Fix inverted logic in seen selfcheck
	ubi: Fix an error pointer dereference in error handling code
	mfd: da9062: Fix watchdog compatible string
	mfd: rn5t618: Mark ADC control register volatile
	bonding/alb: properly access headers in bond_alb_xmit()
	net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port
	net: mvneta: move rx_dropped and rx_errors in per-cpu stats
	net_sched: fix a resource leak in tcindex_set_parms()
	net: systemport: Avoid RBUF stuck in Wake-on-LAN mode
	net/mlx5: IPsec, Fix esp modify function attribute
	net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx
	net: macb: Remove unnecessary alignment check for TSO
	net: macb: Limit maximum GEM TX length in TSO
	net: dsa: b53: Always use dev->vlan_enabled in b53_configure_vlan()
	ext4: fix deadlock allocating crypto bounce page from mempool
	btrfs: use bool argument in free_root_pointers()
	btrfs: free block groups after free'ing fs trees
	drm: atmel-hlcdc: enable clock before configuring timing engine
	drm/dp_mst: Remove VCPI while disabling topology mgr
	btrfs: flush write bio if we loop in extent_write_cache_pages
	KVM: x86/mmu: Apply max PA check for MMIO sptes to 32-bit KVM
	KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM
	KVM: VMX: Add non-canonical check on writes to RTIT address MSRs
	KVM: nVMX: vmread should not set rflags to specify success in case of #PF
	KVM: Use vcpu-specific gva->hva translation when querying host page size
	KVM: Play nice with read-only memslots when querying host page size
	mm: zero remaining unavailable struct pages
	mm: return zero_resv_unavail optimization
	mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
	cifs: fail i/o on soft mounts if sessionsetup errors out
	x86/apic/msi: Plug non-maskable MSI affinity race
	clocksource: Prevent double add_timer_on() for watchdog_timer
	perf/core: Fix mlock accounting in perf_mmap()
	rxrpc: Fix service call disconnection
	Linux 4.19.103

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0d7f09085c3541373e0fd6b2e3ffacc5e34f7d55
2020-02-11 15:05:03 -08:00
Song Liu
a3623db43a perf/core: Fix mlock accounting in perf_mmap()
commit 003461559ef7a9bd0239bae35a22ad8924d6e9ad upstream.

Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of
a perf ring buffer may lead to an integer underflow in locked memory
accounting. This may lead to the undesired behaviors, such as failures in
BPF map creation.

Address this by adjusting the accounting logic to take into account the
possibility that the amount of already locked memory may exceed the
current limit.

Fixes: c4b75479741c ("perf/core: Make the mlock accounting simple again")
Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lkml.kernel.org/r/20200123181146.2238074-1-songliubraving@fb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:19 -08:00
Konstantin Khlebnikov
6284d30e96 clocksource: Prevent double add_timer_on() for watchdog_timer
commit febac332a819f0e764aa4da62757ba21d18c182b upstream.

Kernel crashes inside QEMU/KVM are observed:

  kernel BUG at kernel/time/timer.c:1154!
  BUG_ON(timer_pending(timer) || !timer->function) in add_timer_on().

At the same time another cpu got:

  general protection fault: 0000 [#1] SMP PTI of poinson pointer 0xdead000000000200 in:

  __hlist_del at include/linux/list.h:681
  (inlined by) detach_timer at kernel/time/timer.c:818
  (inlined by) expire_timers at kernel/time/timer.c:1355
  (inlined by) __run_timers at kernel/time/timer.c:1686
  (inlined by) run_timer_softirq at kernel/time/timer.c:1699

Unfortunately kernel logs are badly scrambled, stacktraces are lost.

Printing the timer->function before the BUG_ON() pointed to
clocksource_watchdog().

The execution of clocksource_watchdog() can race with a sequence of
clocksource_stop_watchdog() .. clocksource_start_watchdog():

expire_timers()
 detach_timer(timer, true);
  timer->entry.pprev = NULL;
 raw_spin_unlock_irq(&base->lock);
 call_timer_fn
  clocksource_watchdog()

					clocksource_watchdog_kthread() or
					clocksource_unbind()

					spin_lock_irqsave(&watchdog_lock, flags);
					clocksource_stop_watchdog();
					 del_timer(&watchdog_timer);
					 watchdog_running = 0;
					spin_unlock_irqrestore(&watchdog_lock, flags);

					spin_lock_irqsave(&watchdog_lock, flags);
					clocksource_start_watchdog();
					 add_timer_on(&watchdog_timer, ...);
					 watchdog_running = 1;
					spin_unlock_irqrestore(&watchdog_lock, flags);

  spin_lock(&watchdog_lock);
  add_timer_on(&watchdog_timer, ...);
   BUG_ON(timer_pending(timer) || !timer->function);
    timer_pending() -> true
    BUG()

I.e. inside clocksource_watchdog() watchdog_timer could be already armed.

Check timer_pending() before calling add_timer_on(). This is sufficient as
all operations are synchronized by watchdog_lock.

Fixes: 75c5158f70 ("timekeeping: Update clocksource with stop_machine")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/158048693917.4378.13823603769948933793.stgit@buzz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:18 -08:00
Thomas Gleixner
032a2bf978 x86/apic/msi: Plug non-maskable MSI affinity race
commit 6f1a4891a5928a5969c87fa5a584844c983ec823 upstream.

Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes then a MSI
message is sent built from half updated state.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI accross an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is supressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see an pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:34:18 -08:00
Steven Rostedt (VMware)
0948d6294d ftrace: Protect ftrace_graph_hash with ftrace_sync
[ Upstream commit 54a16ff6f2e50775145b210bcd94d62c3c2af117 ]

As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.

Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:05 -08:00
Steven Rostedt (VMware)
c03d235980 ftrace: Add comment to why rcu_dereference_sched() is open coded
[ Upstream commit 16052dd5bdfa16dbe18d8c1d4cde2ddab9d23177 ]

Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Amol Grover
30afa80b0f tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
[ Upstream commit fd0e6852c407dd9aefc594f54ddcc21d84803d3b ]

Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison

Use rcu_dereference_protected to dereference the newly annotated pointer.

Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Amol Grover
f144ad2e84 tracing: Annotate ftrace_graph_hash pointer with __rcu
[ Upstream commit 24a9729f831462b1d9d61dc85ecc91c59037243f ]

Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison

Use rcu_dereference_protected to access the __rcu annotated pointer.

Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Herbert Xu
dc34710a7a padata: Remove broken queue flushing
[ Upstream commit 07928d9bfc81640bab36f5190e8725894d93b659 ]

The function padata_flush_queues is fundamentally broken because
it cannot force padata users to complete the request that is
underway.  IOW padata has to passively wait for the completion
of any outstanding work.

As it stands flushing is used in two places.  Its use in padata_stop
is simply unnecessary because nothing depends on the queues to
be flushed afterwards.

The other use in padata_replace is more substantial as we depend
on it to free the old pd structure.  This patch instead uses the
pd->refcnt to dynamically free the pd structure once all requests
are complete.

Fixes: 2b73b07ab8 ("padata: Flush the padata queues actively")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:34:04 -08:00
Stephen Boyd
b522ff023e alarmtimer: Unregister wakeup source when module get fails
commit 6b6d188aae79a630957aefd88ff5c42af6553ee3 upstream.

The alarmtimer_rtc_add_device() function creates a wakeup source and then
tries to grab a module reference. If that fails the function returns early
with an error code, but fails to remove the wakeup source.

Cleanup this exit path so there is no dangling wakeup source, which is
named 'alarmtime' left allocated which will conflict with another RTC
device that may be registered later.

Fixes: 51218298a2 ("alarmtimer: Ensure RTC module is not unloaded")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200109155910.907-2-swboyd@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:59 -08:00
Kevin Hao
4f7d834cec irqdomain: Fix a memory leak in irq_domain_push_irq()
commit 0f394daef89b38d58c91118a2b08b8a1b316703b upstream.

Fix a memory leak reported by kmemleak:
unreferenced object 0xffff000bc6f50e80 (size 128):
  comm "kworker/23:2", pid 201, jiffies 4294894947 (age 942.132s)
  hex dump (first 32 bytes):
    00 00 00 00 41 00 00 00 86 c0 03 00 00 00 00 00  ....A...........
    00 a0 b2 c6 0b 00 ff ff 40 51 fd 10 00 80 ff ff  ........@Q......
  backtrace:
    [<00000000e62d2240>] kmem_cache_alloc_trace+0x1a4/0x320
    [<00000000279143c9>] irq_domain_push_irq+0x7c/0x188
    [<00000000d9f4c154>] thunderx_gpio_probe+0x3ac/0x438
    [<00000000fd09ec22>] pci_device_probe+0xe4/0x198
    [<00000000d43eca75>] really_probe+0xdc/0x320
    [<00000000d3ebab09>] driver_probe_device+0x5c/0xf0
    [<000000005b3ecaa0>] __device_attach_driver+0x88/0xc0
    [<000000004e5915f5>] bus_for_each_drv+0x7c/0xc8
    [<0000000079d4db41>] __device_attach+0xe4/0x140
    [<00000000883bbda9>] device_initial_probe+0x18/0x20
    [<000000003be59ef6>] bus_probe_device+0x98/0xa0
    [<0000000039b03d3f>] deferred_probe_work_func+0x74/0xa8
    [<00000000870934ce>] process_one_work+0x1c8/0x470
    [<00000000e3cce570>] worker_thread+0x1f8/0x428
    [<000000005d64975e>] kthread+0xfc/0x128
    [<00000000f0eaa764>] ret_from_fork+0x10/0x18

Fixes: 495c38d300 ("irqdomain: Add irq_domain_{push,pop}_irq() functions")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200120043547.22271-1-haokexin@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:57 -08:00
Eric Dumazet
00b13445f9 rcu: Avoid data-race in rcu_gp_fqs_check_wake()
commit 6935c3983b246d5fbfebd3b891c825e65c118f2d upstream.

The rcu_gp_fqs_check_wake() function uses rcu_preempt_blocked_readers_cgp()
to read ->gp_tasks while other cpus might overwrite this field.

We need READ_ONCE()/WRITE_ONCE() pairs to avoid compiler
tricks and KCSAN splats like the following :

BUG: KCSAN: data-race in rcu_gp_fqs_check_wake / rcu_preempt_deferred_qs_irqrestore

write to 0xffffffff85a7f190 of 8 bytes by task 7317 on cpu 0:
 rcu_preempt_deferred_qs_irqrestore+0x43d/0x580 kernel/rcu/tree_plugin.h:507
 rcu_read_unlock_special+0xec/0x370 kernel/rcu/tree_plugin.h:659
 __rcu_read_unlock+0xcf/0xe0 kernel/rcu/tree_plugin.h:394
 rcu_read_unlock include/linux/rcupdate.h:645 [inline]
 __ip_queue_xmit+0x3b0/0xa40 net/ipv4/ip_output.c:533
 ip_queue_xmit+0x45/0x60 include/net/ip.h:236
 __tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158
 __tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685
 tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691
 tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575
 tcp_recvmsg+0x633/0x1a30 net/ipv4/tcp.c:2179
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1864 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414

read to 0xffffffff85a7f190 of 8 bytes by task 10 on cpu 1:
 rcu_gp_fqs_check_wake kernel/rcu/tree.c:1556 [inline]
 rcu_gp_fqs_check_wake+0x93/0xd0 kernel/rcu/tree.c:1546
 rcu_gp_fqs_loop+0x36c/0x580 kernel/rcu/tree.c:1611
 rcu_gp_kthread+0x143/0x220 kernel/rcu/tree.c:1768
 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10 Comm: rcu_preempt Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
[ paulmck:  Added another READ_ONCE() for RCU CPU stall warnings. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:55 -08:00
Mathieu Desnoyers
62bfa26e4d tracing: Fix sched switch start/stop refcount racy updates
commit 64ae572bc7d0060429e40e1c8d803ce5eb31a0d6 upstream.

Reading the sched_cmdline_ref and sched_tgid_ref initial state within
tracing_start_sched_switch without holding the sched_register_mutex is
racy against concurrent updates, which can lead to tracepoint probes
being registered more than once (and thus trigger warnings within
tracepoint.c).

[ May be the fix for this bug ]
Link: https://lore.kernel.org/r/000000000000ab6f84056c786b93@google.com

Link: http://lkml.kernel.org/r/20190817141208.15226-1-mathieu.desnoyers@efficios.com

Cc: stable@vger.kernel.org
CC: Steven Rostedt (VMware) <rostedt@goodmis.org>
CC: Joel Fernandes (Google) <joel@joelfernandes.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul E. McKenney <paulmck@linux.ibm.com>
Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 04:33:55 -08:00
John Ogness
8360063bfa printk: fix exclusive_console replaying
[ Upstream commit def97da136515cb289a14729292c193e0a93bc64 ]

Commit f92b070f2dc8 ("printk: Do not miss new messages when replaying
the log") introduced a new variable @exclusive_console_stop_seq to
store when an exclusive console should stop printing. It should be
set to the @console_seq value at registration. However, @console_seq
is previously set to @syslog_seq so that the exclusive console knows
where to begin. This results in the exclusive console immediately
reactivating all the other consoles and thus repeating the messages
for those consoles.

Set @console_seq after @exclusive_console_stop_seq has stored the
current @console_seq value.

Fixes: f92b070f2dc8 ("printk: Do not miss new messages when replaying the log")
Link: http://lkml.kernel.org/r/20191219115322.31160-1-john.ogness@linutronix.de
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:33:51 -08:00
YueHaibing
bdfaaf35ac kernel/module: Fix memleak in module_add_modinfo_attrs()
[ Upstream commit f6d061d617124abbd55396a3bc37b9bf7d33233c ]

In module_add_modinfo_attrs() if sysfs_create_file() fails
on the first iteration of the loop (so i = 0), we forget to
free the modinfo_attrs.

Fixes: bc6f2a757d52 ("kernel/module: Fix mem leak in module_add_modinfo_attrs")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11 04:33:50 -08:00
Greg Kroah-Hartman
83b584a64c This is the 4.19.102 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl461NIACgkQONu9yGCS
 aT6Mqw//W5xIIcs0Ut+P+QYNN6lCTRJ0AvFUolz79M3pyK/rHUluwTvYJbDAeGE3
 sckv96rE1pxj5ZSf6LegXIoALrA4RlYHS8xXkYnRrt6xfrb7UwpqsJtt4Mx+IrJ3
 9uFfaWRSvuDfRCraZxLiE2Bl9xVYvaPfFJYBmH383VB+deYNfpwORFsqNDQT+gR6
 PZLuV0x//Kerwmd4OvaaHR/fIl8YVKmIz5lu3+3WIuVKxTK6Bbd3YzVu13dhVaX2
 mETflLEAO/sYsUQiS4SO22ejLAiWyD8LyMV8s9KeTFQXzML3JpibKnt3ySDfzsFE
 m8VRlaLcQwB0Ca2AVGHA5QV0+V+2+6qh/IcZl630feBueGQX59qLQkOurD4e/9lm
 Na6ZkLPTh9UipIfTu9fvA5HY5lPt2VcSWwG2nLluckfJIpKNFVQEB7vuk9zd7468
 qkXmj/J1YDdJzt2YgD0WZuKu3f1/No7rXbNmT2Oj0+HNWWvIU9xFNFlIPAxo7pJy
 kwekd9+gHI0n1OhLRjzYUyf0pD+j0o75ZHsYYsUW0y6cGoWX/LmQ8JPFi+waHiov
 FOe8FJz/uDtfQnJ4+izAM5Jjbu1LE+L8uGoIExYAv4DuXgPZtI2wtHvP4HHM3Aov
 mDWLesMgizsroViv57aXC0C1ZPksPpGeHT+HcH7RnDQ0kQmpe3E=
 =2XGW
 -----END PGP SIGNATURE-----

Merge 4.19.102 into android-4.19

Changes in 4.19.102
	vfs: fix do_last() regression
	x86/resctrl: Fix use-after-free when deleting resource groups
	x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
	x86/resctrl: Fix a deadlock due to inaccurate reference
	crypto: pcrypt - Fix user-after-free on module unload
	rsi: add hci detach for hibernation and poweroff
	rsi: fix use-after-free on failed probe and unbind
	perf c2c: Fix return type for histogram sorting comparision functions
	PM / devfreq: Add new name attribute for sysfs
	tools lib: Fix builds when glibc contains strlcpy()
	arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean'
	ext4: validate the debug_want_extra_isize mount option at parse time
	mm/mempolicy.c: fix out of bounds write in mpol_parse_str()
	reiserfs: Fix memory leak of journal device string
	media: digitv: don't continue if remote control state can't be read
	media: af9005: uninitialized variable printked
	media: vp7045: do not read uninitialized values if usb transfer fails
	media: gspca: zero usb_buf
	media: dvb-usb/dvb-usb-urb.c: initialize actlen to 0
	tomoyo: Use atomic_t for statistics counter
	ttyprintk: fix a potential deadlock in interrupt context issue
	Bluetooth: Fix race condition in hci_release_sock()
	cgroup: Prevent double killing of css when enabling threaded cgroup
	media: si470x-i2c: Move free() past last use of 'radio'
	ARM: dts: sun8i: a83t: Correct USB3503 GPIOs polarity
	ARM: dts: am57xx-beagle-x15/am57xx-idk: Remove "gpios" for endpoint dt nodes
	ARM: dts: beagle-x15-common: Model 5V0 regulator
	soc: ti: wkup_m3_ipc: Fix race condition with rproc_boot
	tools lib traceevent: Fix memory leakage in filter_event
	rseq: Unregister rseq for clone CLONE_VM
	clk: sunxi-ng: h6-r: Fix AR100/R_APB2 parent order
	mac80211: mesh: restrict airtime metric to peered established plinks
	clk: mmp2: Fix the order of timer mux parents
	ASoC: rt5640: Fix NULL dereference on module unload
	ixgbevf: Remove limit of 10 entries for unicast filter list
	ixgbe: Fix calculation of queue with VFs and flow director on interface flap
	igb: Fix SGMII SFP module discovery for 100FX/LX.
	platform/x86: GPD pocket fan: Allow somewhat lower/higher temperature limits
	ASoC: sti: fix possible sleep-in-atomic
	qmi_wwan: Add support for Quectel RM500Q
	parisc: Use proper printk format for resource_size_t
	wireless: fix enabling channel 12 for custom regulatory domain
	cfg80211: Fix radar event during another phy CAC
	mac80211: Fix TKIP replay protection immediately after key setup
	wireless: wext: avoid gcc -O3 warning
	netfilter: nft_tunnel: ERSPAN_VERSION must not be null
	net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec
	bnxt_en: Fix ipv6 RFS filter matching logic.
	riscv: delete temporary files
	iwlwifi: Don't ignore the cap field upon mcc update
	ARM: dts: am335x-boneblack-common: fix memory size
	vti[6]: fix packet tx through bpf_redirect()
	xfrm interface: fix packet tx through bpf_redirect()
	xfrm: interface: do not confirm neighbor when do pmtu update
	scsi: fnic: do not queue commands during fwreset
	ARM: 8955/1: virt: Relax arch timer version check during early boot
	tee: optee: Fix compilation issue with nommu
	airo: Fix possible info leak in AIROOLDIOCTL/SIOCDEVPRIVATE
	airo: Add missing CAP_NET_ADMIN check in AIROOLDIOCTL/SIOCDEVPRIVATE
	r8152: get default setting of WOL before initializing
	ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1
	qlcnic: Fix CPU soft lockup while collecting firmware dump
	powerpc/fsl/dts: add fsl,erratum-a011043
	net/fsl: treat fsl,erratum-a011043
	net: fsl/fman: rename IF_MODE_XGMII to IF_MODE_10G
	seq_tab_next() should increase position index
	l2t_seq_next should increase position index
	net: Fix skb->csum update in inet_proto_csum_replace16().
	btrfs: do not zero f_bavail if we have available space
	perf report: Fix no libunwind compiled warning break s390 issue
	mm/migrate.c: also overwrite error when it is bigger than zero
	Linux 4.19.102

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia9b63c7932b66f469ab0e88467e1e07741408f0b
2020-02-05 19:20:26 +00:00
Michal Koutný
6d26630912 cgroup: Prevent double killing of css when enabling threaded cgroup
commit 3bc0bb36fa30e95ca829e9cf480e1ef7f7638333 upstream.

The test_cgcore_no_internal_process_constraint_on_threads selftest when
running with subsystem controlling noise triggers two warnings:

> [  597.443115] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3131 cgroup_apply_control_enable+0xe0/0x3f0
> [  597.443413] WARNING: CPU: 1 PID: 28167 at kernel/cgroup/cgroup.c:3177 cgroup_apply_control_disable+0xa6/0x160

Both stem from a call to cgroup_type_write. The first warning was also
triggered by syzkaller.

When we're switching cgroup to threaded mode shortly after a subsystem
was disabled on it, we can see the respective subsystem css dying there.

The warning in cgroup_apply_control_enable is harmless in this case
since we're not adding new subsys anyway.
The warning in cgroup_apply_control_disable indicates an attempt to kill
css of recently disabled subsystem repeatedly.

The commit prevents these situations by making cgroup_type_write wait
for all dying csses to go away before re-applying subtree controls.
When at it, the locations of WARN_ON_ONCE calls are moved so that
warning is triggered only when we are about to misuse the dying css.

Reported-by: syzbot+5493b2a54d31d6aea629@syzkaller.appspotmail.com
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-05 14:43:39 +00:00
Greg Kroah-Hartman
1b44c9bd91 This is the 4.19.101 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl41RsgACgkQONu9yGCS
 aT4P7A/+PZVt4c6phHZ9tj0OV4TjAWfu3IX9nLypzyBxjmBeJu8yt1pkNrfKj6fT
 +N3MjDlmAYss5CV6SOACPWXdhAQF3SsM6PR+CSrzwpS3+iAVTqNTaHpZqJFBgr3R
 cDe+MksbMLDpw3x+hXWV1E6WKcJZZJVeANuaD09HQDRVqKw1hRGxGEdyPChEjT71
 Ml3o9a2TYzOvRClBtBHPRQNy/MP4cVv06kS7jefDNh1z9PMsD2w01W54ur44WFJb
 aujt6bLyJlcs0cPdSkU7D8pmgzs/0cxW8N+4gCpfW66P6FJL8SU4RDTujUARlyvC
 oP5d62XrARXAO0hh1NYdWyUqpQjOFJRTWfEqW+lFGo5s9yL9oPW8vcCBKBuZfg+u
 HlVCCTCyU/IJN0DMeqdneThDg8sxirlzHu/NllgGIf7rhyMRqRmruQZXc1W3/7e8
 UgqqAEFkgVmJgq3mVWlHsV5Fmgb+PQlqj4rSB05wlAbXsQwF0nbSS/lsvwDR8qqE
 8nO/PQoxpQyAOYJ+iyaCsq51IsJUCwWOto8L/RpdYSbFpLTn+BRmNdDr7jHOVnPl
 FshugoXijE6IrVGIJhJBGGy/E+eG8Dru7IZEsi2UZLsw+bBvucqv7raIHAJ2YRaL
 8ZuwwmvpZpCOdYSWa7lIgqZb0qOTyR+b6UQ57X8hS5U3MZ2jMOE=
 =+bpt
 -----END PGP SIGNATURE-----

Merge 4.19.101 into android-4.19

Changes in 4.19.101
	orinoco_usb: fix interface sanity check
	rsi_91x_usb: fix interface sanity check
	usb: dwc3: pci: add ID for the Intel Comet Lake -V variant
	USB: serial: ir-usb: add missing endpoint sanity check
	USB: serial: ir-usb: fix link-speed handling
	USB: serial: ir-usb: fix IrLAP framing
	usb: dwc3: turn off VBUS when leaving host mode
	staging: most: net: fix buffer overflow
	staging: wlan-ng: ensure error return is actually returned
	staging: vt6656: correct packet types for CTS protect, mode.
	staging: vt6656: use NULLFUCTION stack on mac80211
	staging: vt6656: Fix false Tx excessive retries reporting.
	serial: 8250_bcm2835aux: Fix line mismatch on driver unbind
	component: do not dereference opaque pointer in debugfs
	mei: me: add comet point (lake) H device ids
	iio: st_gyro: Correct data for LSM9DS0 gyro
	crypto: chelsio - fix writing tfm flags to wrong place
	cifs: Fix memory allocation in __smb2_handle_cancelled_cmd()
	ath9k: fix storage endpoint lookup
	brcmfmac: fix interface sanity check
	rtl8xxxu: fix interface sanity check
	zd1211rw: fix storage endpoint lookup
	net_sched: ematch: reject invalid TCF_EM_SIMPLE
	net_sched: fix ops->bind_class() implementations
	HID: multitouch: Add LG MELF0410 I2C touchscreen support
	arc: eznps: fix allmodconfig kconfig warning
	HID: Add quirk for Xin-Mo Dual Controller
	HID: ite: Add USB id match for Acer SW5-012 keyboard dock
	HID: Add quirk for incorrect input length on Lenovo Y720
	drivers/hid/hid-multitouch.c: fix a possible null pointer access.
	phy: qcom-qmp: Increase PHY ready timeout
	phy: cpcap-usb: Prevent USB line glitches from waking up modem
	watchdog: max77620_wdt: fix potential build errors
	watchdog: rn5t618_wdt: fix module aliases
	spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls
	drivers/net/b44: Change to non-atomic bit operations on pwol_mask
	net: wan: sdla: Fix cast from pointer to integer of different size
	gpio: max77620: Add missing dependency on GPIOLIB_IRQCHIP
	atm: eni: fix uninitialized variable warning
	HID: steam: Fix input device disappearing
	platform/x86: dell-laptop: disable kbd backlight on Inspiron 10xx
	PCI: Add DMA alias quirk for Intel VCA NTB
	iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping
	ARM: OMAP2+: SmartReflex: add omap_sr_pdata definition
	usb-storage: Disable UAS on JMicron SATA enclosure
	sched/fair: Add tmp_alone_branch assertion
	sched/fair: Fix insertion in rq->leaf_cfs_rq_list
	rsi: fix use-after-free on probe errors
	rsi: fix memory leak on failed URB submission
	rsi: fix non-atomic allocation in completion handler
	crypto: af_alg - Use bh_lock_sock in sk_destruct
	random: try to actively add entropy rather than passively wait for it
	block: cleanup __blkdev_issue_discard()
	block: fix 32 bit overflow in __blkdev_issue_discard()
	KVM: arm64: Write arch.mdcr_el2 changes since last vcpu_load on VHE
	Linux 4.19.101

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I801cd8d04eea35b4b53957cc69c0987d88094992
2020-02-02 20:22:38 +00:00
Patrick Bellasi
8b2fbd9076 UPSTREAM: sched/fair/util_est: Implement faster ramp-up EWMA on utilization increases
The estimated utilization for a task:

   util_est = max(util_avg, est.enqueue, est.ewma)

is defined based on:

 - util_avg: the PELT defined utilization
 - est.enqueued: the util_avg at the end of the last activation
 - est.ewma:     a exponential moving average on the est.enqueued samples

According to this definition, when a task suddenly changes its bandwidth
requirements from small to big, the EWMA will need to collect multiple
samples before converging up to track the new big utilization.

This slow convergence towards bigger utilization values is not
aligned to the default scheduler behavior, which is to optimize for
performance. Moreover, the est.ewma component fails to compensate for
temporarely utilization drops which spans just few est.enqueued samples.

To let util_est do a better job in the scenario depicted above, change
its definition by making util_est directly follow upward motion and
only decay the est.ewma on downward.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@matbug.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <qperret@google.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191023205630.14469-1-patrick.bellasi@matbug.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit b8c96361402aa3e74ad48ceef18aed99153d8da8)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I5c0bdd401f3fe599a2b7b9215c9a3a621f91002d
2020-02-01 17:35:00 +00:00
Qais Yousef
f503db1178 ANDROID: Re-use SUGOV_RT_MAX_FREQ to control uclamp rt behavior
By default uclamp RT tasks will use the max frequency, which is not the
desired default behavior on mobile devices.

Re-use the SUGOV_RT_MAX_FREQ sched_feat to control the default behavior.

When SUGOV_RT_MAX_FREQ is NOT selected, the uclamp_min value of the RT
tasks will be 0.

Note, since now we use SUGOV_RT_MAX_FREQ to enforce the default max
frequency for RT when uclamp is compiled in; the condition in
schedutil_cpu_util() needs to be inverted so that max no longer
unconditionally applied when uclamp is compiled in && SUGOV_RT_MAX_FREQ
is true. This unconditional application means uclamp values are always
ignored which is not what we want when uclamp is compiled in.

Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I3d36f1ebed6ef35a6299af32bbf4462d0353e783
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 16:14:12 +00:00
Valentin Schneider
ecce1cf84a BACKPORT: sched/fair: Make EAS wakeup placement consider uclamp restrictions
task_fits_capacity() has just been made uclamp-aware, and
find_energy_efficient_cpu() needs to go through the same treatment.

Things are somewhat different here however - using the task max clamp isn't
sufficient. Consider the following setup:

  The target runqueue, rq:
    rq.cpu_capacity_orig = 512
    rq.cfs.avg.util_avg = 200
    rq.uclamp.max = 768 // the max p.uclamp.max of all enqueued p's is 768

  The waking task, p (not yet enqueued on rq):
    p.util_est = 600
    p.uclamp.max = 100

Now, consider the following code which doesn't use the rq clamps:

  util = uclamp_task_util(p);
  // Does the task fit in the spare CPU capacity?
  cpu = cpu_of(rq);
  fits_capacity(util, cpu_capacity(cpu) - cpu_util(cpu))

This would lead to:

  util = 100;
  fits_capacity(100, 512 - 200)

fits_capacity() would return true. However, enqueuing p on that CPU *will*
cause it to become overutilized since rq clamp values are max-aggregated,
so we'd remain with

  rq.uclamp.max = 768

which comes from the other tasks already enqueued on rq. Thus, we could
select a high enough frequency to reach beyond 0.8 * 512 utilization
(== overutilized) after enqueuing p on rq. What find_energy_efficient_cpu()
needs here is uclamp_rq_util_with() which lets us peek at the future
utilization landscape, including rq-wide uclamp values.

Make find_energy_efficient_cpu() use uclamp_rq_util_with() for its
fits_capacity() check. This is in line with what compute_energy() ends up
using for estimating utilization.

[QP: moved changes to select_cpu_candidates(), which is the equivalent
to the mainline path, and fix missing dependency on fits_capacity() by
using the open coded version]

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-6-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1d42509e475cdc8542aa5b3e03a7e845244f4f57)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Ibe1643cd5e6c97daceceae9733344e54bf4a4857
2020-02-01 16:14:11 +00:00
Valentin Schneider
50262f741b BACKPORT: sched/fair: Make task_fits_capacity() consider uclamp restrictions
task_fits_capacity() drives CPU selection at wakeup time, and is also used
to detect misfit tasks. Right now it does so by comparing task_util_est()
with a CPU's capacity, but doesn't take into account uclamp restrictions.

There's a few interesting uses that can come out of doing this. For
instance, a low uclamp.max value could prevent certain tasks from being
flagged as misfit tasks, so they could merrily remain on low-capacity CPUs.
Similarly, a high uclamp.min value would steer tasks towards high capacity
CPUs at wakeup (and, should that fail, later steered via misfit balancing),
so such "boosted" tasks would favor CPUs of higher capacity.

Introduce uclamp_task_util() and make task_fits_capacity() use it.

[QP: fixed missing dependency on fits_capacity() by using the open coded
alternative]

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-5-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a7008c07a568278ed2763436404752a98004c7ff)
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: Iabde2eda7252c3bcc273e61260a7a12a7de991b1
2020-02-01 16:14:11 +00:00
Patrick Bellasi
f609a2239f ANDROID: sched/core: Move SchedTune task API into UtilClamp wrappers
The main SchedTune API calls realted to task tuning attributes are now
wrapped by more generic and mainlinish UtilClamp calls.

The new APIs are:

 - uclamp_task(p)               <= boosted_task_util(p)
 - uclamp_boosted(p)            <= schedtune_task_boost(p) > 0
 - uclamp_latency_sensitive(p)  <= schedtune_prefer_idle(p)

Let's provide also an implementation of the same API based on the new
uclamp.uclamp_latency_sensitive flag.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
[Modified the patch to use uclamp.latency_sensitive instead mainline
attributes]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ib1a6902e1c07a82a370e36bf1776d895b7528cbc
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 16:14:11 +00:00
Quentin Perret
752b47b84d ANDROID: sched/core: Add a latency-sensitive flag to uclamp
Add a 'latency_sensitive' flag to uclamp in order to express the need
for some tasks to find a CPU where they can wake-up quickly. This is not
expected to be used without cgroup support, so add solely a cgroup
interface for it.

As this flag represents a boolean attribute and not an amount of
resources to be shared, it is not clear what the delegation logic should
be. As such, it is kept simple: every new cgroup starts with
latency_sensitive set to false, regardless of the parent.

In essence, this is similar to SchedTune's prefer-idle flag which was
used in android-4.19 and prior.

Bug: 120440300
Change-Id: I722d8ecabb428bb7b95a5b54bc70a87f182dde2a
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
(cherry picked from commit ad7dd648fc7dbe11f23673a3463af2468a274998)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:17 +00:00
Patrick Bellasi
9a05300da0 ANDROID: sched/tune: Move SchedTune cpu API into UtilClamp wrappers
The SchedTune CPU boosting API is currently used from sugov_get_util()
to get the boosted utilization and to pass it into schedutil_cpu_util().

When UtilClamp is in use instead we call schedutil_cpu_util() by
passing in just the CFS utilization and the clamping is done internally
on the aggregated CFS+RT utilization for FREQUENCY_UTIL calls.

This asymmetry is not required moreover, schedutil code is polluted by
non-mainline SchedTune code.

Wrap SchedTune API call related to cpu utilization boosting with a more
generic and mainlinish UtilClamp call:

 - uclamp_rq_util_with(cpu, util, p)  <= boosted_cpu_util(cpu)

This new API is already used in schedutil_cpu_util() to clamp the
aggregated RT+CFS utilization on FREQUENCY_UTIL calls.

Move the cpu boosting into uclamp_rq_util_with() so that we remove any
SchedTune specific bit from kernel/sched/cpufreq_schedutil.c.

Get rid of the no more required boosted_cpu_util(cpu) method and replace
it with a stune_util(cpu, util) which signature is better aligned with
its uclamp_rq_util_with(cpu, util, p) counterpart.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I45b0f0f54123fe0a2515fa9f1683842e6b99234f
[Removed superfluous __maybe_unused for capacity_orig_of]
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:17 +00:00
Li Guanglei
7e1c333ed1 FROMGIT: sched/core: Fix size of rq::uclamp initialization
rq::uclamp is an array of struct uclamp_rq, make sure we clear the
whole thing.

Bug: 120440300
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcountinga")
Signed-off-by: Li Guanglei <guanglei.li@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lkml.kernel.org/r/1577259844-12677-1-git-send-email-guangleix.li@gmail.com
(cherry picked from commit dcd6dffb0a75741471297724640733fa4e958d72
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Id36a2b77c45e586535e8fadfb7d66868ca8fe8c7
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Qais Yousef
45b9d34bec FROMGIT: sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
When a new cgroup is created, the effective uclamp value wasn't updated
with a call to cpu_util_update_eff() that looks at the hierarchy and
update to the most restrictive values.

Fix it by ensuring to call cpu_util_update_eff() when a new cgroup
becomes online.

Without this change, the newly created cgroup uses the default
root_task_group uclamp values, which is 1024 for both uclamp_{min, max},
which will cause the rq to to be clamped to max, hence cause the
system to run at max frequency.

The problem was observed on Ubuntu server and was reproduced on Debian
and Buildroot rootfs.

By default, Ubuntu and Debian create a cpu controller cgroup hierarchy
and add all tasks to it - which creates enough noise to keep the rq
uclamp value at max most of the time. Imitating this behavior makes the
problem visible in Buildroot too which otherwise looks fine since it's a
minimal userspace.

Bug: 120440300
Fixes: 0b60ba2dd342 ("sched/uclamp: Propagate parent clamps")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Doug Smythies <dsmythies@telus.net>
Link: https://lore.kernel.org/lkml/000701d5b965$361b6c60$a2524520$@net/
(cherry picked from commit 7226017ad37a888915628e59a84a2d1e57b40707
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I9636c60e04d58bbfc5041df1305b34a12b5a3f46
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
f59dfad8f9 FROMGIT: sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
The current helper returns (CPU) rq utilization with uclamp restrictions
taken into account. A uclamp task utilization helper would be quite
helpful, but this requires some renaming.

Prepare the code for the introduction of a uclamp_task_util() by renaming
the existing uclamp_util_with() to uclamp_rq_util_with().

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-4-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit d2b58a286e89824900d501db0be1d4f6aed474fc
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I3e7146b788e079e400167203df5e5dadee2fd232
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
254e090f3a FROMGIT: sched/uclamp: Make uclamp util helpers use and return UL values
Vincent pointed out recently that the canonical type for utilization
values is 'unsigned long'. Internally uclamp uses 'unsigned int' values for
cache optimization, but this doesn't have to be exported to its users.

Make the uclamp helpers that deal with utilization use and return unsigned
long values.

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 686516b55e98edf18c2a02d36aaaa6f4c0f6c39c
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Id3837f12237e5b77eb3a236bd32457dcd7de743e
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
6477d90135 FROMGIT: sched/uclamp: Remove uclamp_util()
The sole user of uclamp_util(), schedutil_cpu_util(), was made to use
uclamp_util_with() instead in commit:

  af24bde8df20 ("sched/uclamp: Add uclamp support to energy_compute()")

From then on, uclamp_util() has remained unused. Being a simple wrapper
around uclamp_util_with(), we can get rid of it and win back a few lines.

Bug: 120440300
Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com>
Suggested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191211113851.24241-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 59fe675248ffc37d4167e9ec6920a2f3d5ec67bb
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I11dbff80c6c4be9666438800b2527aca8cd24025
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Qais Yousef
cdadd91444 BACKPORT: sched/rt: Make RT capacity-aware
Capacity Awareness refers to the fact that on heterogeneous systems
(like Arm big.LITTLE), the capacity of the CPUs is not uniform, hence
when placing tasks we need to be aware of this difference of CPU
capacities.

In such scenarios we want to ensure that the selected CPU has enough
capacity to meet the requirement of the running task. Enough capacity
means here that capacity_orig_of(cpu) >= task.requirement.

The definition of task.requirement is dependent on the scheduling class.

For CFS, utilization is used to select a CPU that has >= capacity value
than the cfs_task.util.

	capacity_orig_of(cpu) >= cfs_task.util

DL isn't capacity aware at the moment but can make use of the bandwidth
reservation to implement that in a similar manner CFS uses utilization.
The following patchset implements that:

https://lore.kernel.org/lkml/20190506044836.2914-1-luca.abeni@santannapisa.it/

	capacity_orig_of(cpu)/SCHED_CAPACITY >= dl_deadline/dl_runtime

For RT we don't have a per task utilization signal and we lack any
information in general about what performance requirement the RT task
needs. But with the introduction of uclamp, RT tasks can now control
that by setting uclamp_min to guarantee a minimum performance point.

ATM the uclamp value are only used for frequency selection; but on
heterogeneous systems this is not enough and we need to ensure that the
capacity of the CPU is >= uclamp_min. Which is what implemented here.

	capacity_orig_of(cpu) >= rt_task.uclamp_min

Note that by default uclamp.min is 1024, which means that RT tasks will
always be biased towards the big CPUs, which make for a better more
predictable behavior for the default case.

Must stress that the bias acts as a hint rather than a definite
placement strategy. For example, if all big cores are busy executing
other RT tasks we can't guarantee that a new RT task will be placed
there.

On non-heterogeneous systems the original behavior of RT should be
retained. Similarly if uclamp is not selected in the config.

[ mingo: Minor edits to comments. ]

Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191009104611.15363-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 804d402fb6f6487b825aae8cf42fda6426c62867
 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git)
[Qais: resolved minor conflict in kernel/sched/cpupri.c]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ifc9da1c47de1aec9b4d87be2614e4c8968366900
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:16 +00:00
Valentin Schneider
ea9ce42997 UPSTREAM: sched/uclamp: Fix overzealous type replacement
Some uclamp helpers had their return type changed from 'unsigned int' to
'enum uclamp_id' by commit

  0413d7f33e60 ("sched/uclamp: Always use 'enum uclamp_id' for clamp_id values")

but it happens that some do return a value in the [0, SCHED_CAPACITY_SCALE]
range, which should really be unsigned int. The affected helpers are
uclamp_none(), uclamp_rq_max_value() and uclamp_eff_value(). Fix those up.

Note that this doesn't lead to any obj diff using a relatively recent
aarch64 compiler (8.3-2019.03). The current code of e.g. uclamp_eff_value()
properly returns an 11 bit value (bits_per(1024)) and doesn't seem to do
anything funny. I'm still marking this as fixing the above commit to be on
the safe side.

Bug: 120440300
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: patrick.bellasi@matbug.net
Cc: qperret@google.com
Cc: surenb@google.com
Cc: tj@kernel.org
Fixes: 0413d7f33e60 ("sched/uclamp: Always use 'enum uclamp_id' for clamp_id values")
Link: https://lkml.kernel.org/r/20191115103908.27610-1-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7763baace1b738d65efa46d68326c9406311c6bf)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I924a99c125372a8fca81cb4bc0c82e6a7183fc8a
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Qais Yousef
7125c7cfca UPSTREAM: sched/uclamp: Fix incorrect condition
uclamp_update_active() should perform the update when
p->uclamp[clamp_id].active is true. But when the logic was inverted in
[1], the if condition wasn't inverted correctly too.

[1] https://lore.kernel.org/lkml/20190902073836.GO2369@hirez.programming.kicks-ass.net/

Bug: 120440300
Reported-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
Link: https://lkml.kernel.org/r/20191114211052.15116-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6e1ff0773f49c7d38e8b4a9df598def6afb9f415)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I51b58a6089290277e08a0aaa72b86f852eec1512
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Qais Yousef
64bf81cac2 UPSTREAM: sched/core: Fix compilation error when cgroup not selected
When cgroup is disabled the following compilation error was hit

	kernel/sched/core.c: In function ‘uclamp_update_active_tasks’:
	kernel/sched/core.c:1081:23: error: storage size of ‘it’ isn’t known
	  struct css_task_iter it;
			       ^~
	kernel/sched/core.c:1084:2: error: implicit declaration of function ‘css_task_iter_start’; did you mean ‘__sg_page_iter_start’? [-Werror=implicit-function-declaration]
	  css_task_iter_start(css, 0, &it);
	  ^~~~~~~~~~~~~~~~~~~
	  __sg_page_iter_start
	kernel/sched/core.c:1085:14: error: implicit declaration of function ‘css_task_iter_next’; did you mean ‘__sg_page_iter_next’? [-Werror=implicit-function-declaration]
	  while ((p = css_task_iter_next(&it))) {
		      ^~~~~~~~~~~~~~~~~~
		      __sg_page_iter_next
	kernel/sched/core.c:1091:2: error: implicit declaration of function ‘css_task_iter_end’; did you mean ‘get_task_cred’? [-Werror=implicit-function-declaration]
	  css_task_iter_end(&it);
	  ^~~~~~~~~~~~~~~~~
	  get_task_cred
	kernel/sched/core.c:1081:23: warning: unused variable ‘it’ [-Wunused-variable]
	  struct css_task_iter it;
			       ^~
	cc1: some warnings being treated as errors
	make[2]: *** [kernel/sched/core.o] Error 1

Fix by protetion uclamp_update_active_tasks() with
CONFIG_UCLAMP_TASK_GROUP

Bug: 120440300
Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20191105112212.596-1-qais.yousef@arm.com
(cherry picked from commit e3b8b6a0d12cccf772113d6b5c1875192186fbd4)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ia4c0f801d68050526f9f117ec9189e448b01345a
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Ingo Molnar
7f682d7abc UPSTREAM: sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

  $ chrt -p $$
  chrt: failed to get pid 26306's policy: Argument list too long

and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:

  a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.

Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:

 - if user-attributes have the same size as kernel attributes then the
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply
   skip the extra bits, but set attr->size to the (smaller) kernel size
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This
   code is so far away from VFS read() semantics that the naming is
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr->size assignment from the head of sched_getattr() to the
   sched_attr_copy_to_user() function. Nothing else within the kernel
   should care about the size of the structure.

With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.

As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.

Bug: 120440300
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1251201c0d34fadf69d56efa675c2b7dd0a90eca)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I67e653c4f69db0140e9651c125b60e2b8cfd62f1
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Patrick Bellasi
53a73b1f35 UPSTREAM: sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
The supported clamp indexes are defined in 'enum clamp_id', however, because
of the code logic in some of the first utilization clamping series version,
sometimes we needed to use 'unsigned int' to represent indices.

This is not more required since the final version of the uclamp_* APIs can
always use the proper enum uclamp_id type.

Fix it with a bulk rename now that we have all the bits merged.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-7-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0413d7f33e60751570fd6c179546bde2f7d82dcb)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I0be680b2489fa07244bac63b5c6fe1a79a53bef7
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Patrick Bellasi
d286ec414a UPSTREAM: sched/uclamp: Update CPU's refcount on TG's clamp changes
On updates of task group (TG) clamp values, ensure that these new values
are enforced on all RUNNABLE tasks of the task group, i.e. all RUNNABLE
tasks are immediately boosted and/or capped as requested.

Do that each time we update effective clamps from cpu_util_update_eff().
Use the *cgroup_subsys_state (css) to walk the list of tasks in each
affected TG and update their RUNNABLE tasks.
Update each task by using the same mechanism used for cpu affinity masks
updates, i.e. by taking the rq lock.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-6-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit babbe170e053c6ec2343751749995b7b9fd5fd2c)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I5e48891bd48c266dd282e1bab8f60533e4e29b48
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Patrick Bellasi
a1f3376922 UPSTREAM: sched/uclamp: Use TG's clamps to restrict TASK's clamps
When a task specific clamp value is configured via sched_setattr(2), this
value is accounted in the corresponding clamp bucket every time the task is
{en,de}qeued. However, when cgroups are also in use, the task specific
clamp values could be restricted by the task_group (TG) clamp values.

Update uclamp_cpu_inc() to aggregate task and TG clamp values. Every time a
task is enqueued, it's accounted in the clamp bucket tracking the smaller
clamp between the task specific value and its TG effective value. This
allows to:

1. ensure cgroup clamps are always used to restrict task specific requests,
   i.e. boosted not more than its TG effective protection and capped at
   least as its TG effective limit.

2. implement a "nice-like" policy, where tasks are still allowed to request
   less than what enforced by their TG effective limits and protections

Do this by exploiting the concept of "effective" clamp, which is already
used by a TG to track parent enforced restrictions.

Apply task group clamp restrictions only to tasks belonging to a child
group. While, for tasks in the root group or in an autogroup, system
defaults are still enforced.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-5-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3eac870a324728e5d17118888840dad70bcd37f3)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I0215e0a68cc0fa7c441e33052757f8571b7c99b9
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:15 +00:00
Patrick Bellasi
c4c03cf9bf UPSTREAM: sched/uclamp: Propagate system defaults to the root group
The clamp values are not tunable at the level of the root task group.
That's for two main reasons:

 - the root group represents "system resources" which are always
   entirely available from the cgroup standpoint.

 - when tuning/restricting "system resources" makes sense, tuning must
   be done using a system wide API which should also be available when
   control groups are not.

When a system wide restriction is available, cgroups should be aware of
its value in order to know exactly how much "system resources" are
available for the subgroups.

Utilization clamping supports already the concepts of:

 - system defaults: which define the maximum possible clamp values
   usable by tasks.

 - effective clamps: which allows a parent cgroup to constraint (maybe
   temporarily) its descendants without losing the information related
   to the values "requested" from them.

Exploit these two concepts and bind them together in such a way that,
whenever system default are tuned, the new values are propagated to
(possibly) restrict or relax the "effective" value of nested cgroups.

When cgroups are in use, force an update of all the RUNNABLE tasks.
Otherwise, keep things simple and do just a lazy update next time each
task will be enqueued.
Do that since we assume a more strict resource control is required when
cgroups are in use. This allows also to keep "effective" clamp values
updated in case we need to expose them to user-space.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-4-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7274a5c1bbec45f06f1fff4b8c8b5855b6cc189d)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ibf7ce5c46b67c79765b56b792ee22ed9595802c3
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:14 +00:00
Patrick Bellasi
77a413e758 UPSTREAM: sched/uclamp: Propagate parent clamps
In order to properly support hierarchical resources control, the cgroup
delegation model requires that attribute writes from a child group never
fail but still are locally consistent and constrained based on parent's
assigned resources. This requires to properly propagate and aggregate
parent attributes down to its descendants.

Implement this mechanism by adding a new "effective" clamp value for each
task group. The effective clamp value is defined as the smaller value
between the clamp value of a group and the effective clamp value of its
parent. This is the actual clamp value enforced on tasks in a task group.

Since it's possible for a cpu.uclamp.min value to be bigger than the
cpu.uclamp.max value, ensure local consistency by restricting each
"protection" (i.e. min utilization) with the corresponding "limit"
(i.e. max utilization).

Do that at effective clamps propagation to ensure all user-space write
never fails while still always tracking the most restrictive values.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-3-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 0b60ba2dd342016e4e717dbaa4ca9af3a43f4434)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: If1cc136e1fb4a8f4c6ea15dc440b28d833a8d7e7
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:14 +00:00
Patrick Bellasi
19718921c3 UPSTREAM: sched/uclamp: Extend CPU's cgroup controller
The cgroup CPU bandwidth controller allows to assign a specified
(maximum) bandwidth to the tasks of a group. However this bandwidth is
defined and enforced only on a temporal base, without considering the
actual frequency a CPU is running on. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency the CPU is running that task.
The amount of computation can be affected also by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Giving the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better defined the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes uclamp.{min,max}
which allow to enforce utilization boosting and capping for all the
tasks in a group.

Specifically:

- uclamp.min: defines the minimum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run at least at a
	      minimum frequency which corresponds to the uclamp.min
	      utilization

- uclamp.max: defines the maximum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run up to a
	      maximum frequency which corresponds to the uclamp.max
	      utilization

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depends on cgroups. This system wide
   interface enforces constraints on tasks in the root node.

b) enforce effective constraints at each level of the hierarchy which
   are a restriction of the group requests considering its parent's
   effective constraints. Root group effective constraints are defined
   by the system wide interface.
   This mechanism allows each (non-root) level of the hierarchy to:
   - request whatever clamp values it would like to get
   - effectively get only up to the maximum amount allowed by its parent

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus allowing to control and restrict task requests.

Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Keep it simple by not caring now about "effective" values computation
and propagation along the hierarchy.

Update sysctl_sched_uclamp_handler() to use the newly introduced
uclamp_mutex so that we serialize system default updates with cgroup
relate updates.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2480c093130f64ac3a410504fa8b3db1fc4b87ce)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I0285c44910bf073b80d7996361e6698bc5aedfae
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:14 +00:00
Patrick Bellasi
9a843ff48d BACKPORT: sched/uclamp: Add uclamp support to energy_compute()
The Energy Aware Scheduler (EAS) estimates the energy impact of waking
up a task on a given CPU. This estimation is based on:

 a) an (active) power consumption defined for each CPU frequency
 b) an estimation of which frequency will be used on each CPU
 c) an estimation of the busy time (utilization) of each CPU

Utilization clamping can affect both b) and c).

A CPU is expected to run:

 - on an higher than required frequency, but for a shorter time, in case
   its estimated utilization will be smaller than the minimum utilization
   enforced by uclamp
 - on a smaller than required frequency, but for a longer time, in case
   its estimated utilization is bigger than the maximum utilization
   enforced by uclamp

While compute_energy() already accounts clamping effects on busy time,
the clamping effects on frequency selection are currently ignored.

Fix it by considering how CPU clamp values will be affected by a
task waking up and being RUNNABLE on that CPU.

Do that by refactoring schedutil_freq_util() to take an additional
task_struct* which allows EAS to evaluate the impact on clamp values of
a task being eventually queued in a CPU. Clamp values are applied to the
RT+CFS utilization only when a FREQUENCY_UTIL is required by
compute_energy().

Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the
computation of the cpu_util signal implies that we are more likely to
estimate the highest OPP when a RT task is running in another CPU of
the same performance domain. This can have an impact on energy
estimation but:

 - it's not easy to say which approach is better, since it depends on
   the use case
 - the original approach could still be obtained by setting a smaller
   task-specific util_min whenever required

Since we are at that:

 - rename schedutil_freq_util() into schedutil_cpu_util(),
   since it's not only used for frequency selection.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-12-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit af24bde8df2029f067dc46aff0393c8f18ff6e2f)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
[Moved cpu_util_cfs() outside of CONFIG_CPU_FREQ_GOV_SCHEDUTIL]
Change-Id: Idc4933f44be746ce35c1181a9288e6cb5d9607b2
[Protect cpu_util_cfs() with CONFIG_SMP ifdefery]
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 15:03:14 +00:00
Patrick Bellasi
814a151015 UPSTREAM: sched/uclamp: Add uclamp_util_with()
So far uclamp_util() allows to clamp a specified utilization considering
the clamp values requested by RUNNABLE tasks in a CPU. For the Energy
Aware Scheduler (EAS) it is interesting to test how clamp values will
change when a task is becoming RUNNABLE on a given CPU.
For example, EAS is interested in comparing the energy impact of
different scheduling decisions and the clamp values can play a role on
that.

Add uclamp_util_with() which allows to clamp a given utilization by
considering the possible impact on CPU clamp values of a specified task.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-11-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 9d20ad7dfc9a5cc64e33d725902d3863d350a66a)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ida153a3526b87f5674a6e037d4725d99eec7b478
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:38 +00:00
Patrick Bellasi
61d44b22d9 BACKPORT: sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
Each time a frequency update is required via schedutil, a frequency is
selected to (possibly) satisfy the utilization reported by each
scheduling class and irqs. However, when utilization clamping is in use,
the frequency selection should consider userspace utilization clamping
hints.  This will allow, for example, to:

 - boost tasks which are directly affecting the user experience
   by running them at least at a minimum "requested" frequency

 - cap low priority tasks not directly affecting the user experience
   by running them only up to a maximum "allowed" frequency

These constraints are meant to support a per-task based tuning of the
frequency selection thus supporting a fine grained definition of
performance boosting vs energy saving strategies in kernel space.

Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
within the boundaries defined by their aggregated utilization clamp
constraints.

Do that by considering the max(min_util, max_util) to give boosted tasks
the performance they need even when they happen to be co-scheduled with
other capped tasks.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-10-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 982d9cdc22c9f6df5ad790caa229ff74fb1d95e7)

Conflicts:
	kernel/sched/cpufreq_schedutil.c

	1. Merged the if condition to include the non-upstream
	   sched_feat(SUGOV_RT_MAX_FREQ) check

	2. Change the function signature to pass util_cfs and define
	   util as an automatic variable.

Bug: 120440300
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ie222c9ad84776fc2948e30c116eee876df697a17
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:38 +00:00
Patrick Bellasi
fa167475e1 UPSTREAM: sched/uclamp: Set default clamps for RT tasks
By default FAIR tasks start without clamps, i.e. neither boosted nor
capped, and they run at the best frequency matching their utilization
demand.  This default behavior does not fit RT tasks which instead are
expected to run at the maximum available frequency, if not otherwise
required by explicitly capping them.

Enforce the correct behavior for RT tasks by setting util_min to max
whenever:

 1. the task is switched to the RT class and it does not already have a
    user-defined clamp value assigned.

 2. an RT task is forked from a parent with RESET_ON_FORK set.

NOTE: utilization clamp values are cross scheduling class attributes and
thus they are never changed/reset once a value has been explicitly
defined from user-space.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-9-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1a00d999971c78ab024a17b0efc37d78404dd120)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I81fcadaea34f557e531fa5ac6aab84fcb0ee37c7
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:38 +00:00
Patrick Bellasi
341f61099d UPSTREAM: sched/uclamp: Reset uclamp values on RESET_ON_FORK
A forked tasks gets the same clamp values of its parent however, when
the RESET_ON_FORK flag is set on parent, e.g. via:

   sys_sched_setattr()
      sched_setattr()
         __sched_setscheduler(attr::SCHED_FLAG_RESET_ON_FORK)

the new forked task is expected to start with all attributes reset to
default values.

Do that for utilization clamp values too by checking the reset request
from the existing uclamp_fork() call which already provides the required
initialization for other uclamp related bits.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-8-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a87498ace58e23b62a572dc7267579ede4c8495c)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: If7bda202707aac3a2696a42f8146f607cdd36905
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:38 +00:00
Patrick Bellasi
e6056b2a5b UPSTREAM: sched/uclamp: Extend sched_setattr() to support utilization clamping
The SCHED_DEADLINE scheduling class provides an advanced and formal
model to define tasks requirements that can translate into proper
decisions for both task placements and frequencies selections. Other
classes have a more simplified model based on the POSIX concept of
priorities.

Such a simple priority based model however does not allow to exploit
most advanced features of the Linux scheduler like, for example, driving
frequencies selection via the schedutil cpufreq governor. However, also
for non SCHED_DEADLINE tasks, it's still interesting to define tasks
properties to support scheduler decisions.

Utilization clamping exposes to user-space a new set of per-task
attributes the scheduler can use as hints about the expected/required
utilization for a task. This allows to implement a "proactive" per-task
frequency control policy, a more advanced policy than the current one
based just on "passive" measured task utilization. For example, it's
possible to boost interactive tasks (e.g. to get better performance) or
cap background tasks (e.g. to be more energy/thermal efficient).

Introduce a new API to set utilization clamping values for a specified
task by extending sched_setattr(), a syscall which already allows to
define task specific properties for different scheduling classes. A new
pair of attributes allows to specify a minimum and maximum utilization
the scheduler can consider for a task.

Do that by validating the required clamp values before and then applying
the required changes using _the_ same pattern already in use for
__setscheduler(). This ensures that the task is re-enqueued with the new
clamp values.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-7-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit a509a7cd79747074a2c018a45bbbc52d1f4aed44)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I420e7ece5628bc639811a79654c35135a65bfd02
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Patrick Bellasi
258e6b82dd UPSTREAM: sched/core: Allow sched_setattr() to use the current policy
The sched_setattr() syscall mandates that a policy is always specified.
This requires to always know which policy a task will have when
attributes are configured and this makes it impossible to add more
generic task attributes valid across different scheduling policies.
Reading the policy before setting generic tasks attributes is racy since
we cannot be sure it is not changed concurrently.

Introduce the required support to change generic task attributes without
affecting the current task policy. This is done by adding an attribute flag
(SCHED_FLAG_KEEP_POLICY) to enforce the usage of the current policy.

Add support for the SETPARAM_POLICY policy, which is already used by the
sched_setparam() POSIX syscall, to the sched_setattr() non-POSIX
syscall.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-6-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1d6362fa0cfc8c7b243fa92924429d826599e691)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I41cbe73d7aa30123adbd757fa30e346938651784
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Patrick Bellasi
613eecebf9 UPSTREAM: sched/uclamp: Add system default clamps
Tasks without a user-defined clamp value are considered not clamped
and by default their utilization can have any value in the
[0..SCHED_CAPACITY_SCALE] range.

Tasks with a user-defined clamp value are allowed to request any value
in that range, and the required clamp is unconditionally enforced.
However, a "System Management Software" could be interested in limiting
the range of clamp values allowed for all tasks.

Add a privileged interface to define a system default configuration via:

  /proc/sys/kernel/sched_uclamp_util_{min,max}

which works as an unconditional clamp range restriction for all tasks.

With the default configuration, the full SCHED_CAPACITY_SCALE range of
values is allowed for each clamp index. Otherwise, the task-specific
clamp is capped by the corresponding system default value.

Do that by tracking, for each task, the "effective" clamp value and
bucket the task has been refcounted in at enqueue time. This
allows to lazy aggregate "requested" and "system default" values at
enqueue time and simplifies refcounting updates at dequeue time.

The cached bucket ids are used to avoid (relatively) more expensive
integer divisions every time a task is enqueued.

An active flag is used to report when the "effective" value is valid and
thus the task is actually refcounted in the corresponding rq's bucket.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e8f14172c6b11e9a86c65532497087f8eb0f91b1)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I4f014c5ec9c312aaad606518f6e205fd0cfbcaa2
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Patrick Bellasi
c659be787e UPSTREAM: sched/uclamp: Enforce last task's UCLAMP_MAX
When a task sleeps it removes its max utilization clamp from its CPU.
However, the blocked utilization on that CPU can be higher than the max
clamp value enforced while the task was running. This allows undesired
CPU frequency increases while a CPU is idle, for example, when another
CPU on the same frequency domain triggers a frequency update, since
schedutil can now see the full not clamped blocked utilization of the
idle CPU.

Fix this by using:

  uclamp_rq_dec_id(p, rq, UCLAMP_MAX)
    uclamp_rq_max_value(rq, UCLAMP_MAX, clamp_value)

to detect when a CPU has no more RUNNABLE clamped tasks and to flag this
condition.

Don't track any minimum utilization clamps since an idle CPU never
requires a minimum frequency. The decay of the blocked utilization is
good enough to reduce the CPU frequency.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-4-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e496187da71070687b55ff455e7d8d7d7f0ae0b9)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ie9eab897eb654ec9d4fba5eda20f66a91a712817
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Patrick Bellasi
ad20939c13 UPSTREAM: sched/uclamp: Add bucket local max tracking
Because of bucketization, different task-specific clamp values are
tracked in the same bucket.  For example, with 20% bucket size and
assuming to have:

  Task1: util_min=25%
  Task2: util_min=35%

both tasks will be refcounted in the [20..39]% bucket and always boosted
only up to 20% thus implementing a simple floor aggregation normally
used in histograms.

In systems with only few and well-defined clamp values, it would be
useful to track the exact clamp value required by a task whenever
possible. For example, if a system requires only 23% and 47% boost
values then it's possible to track the exact boost required by each
task using only 3 buckets of ~33% size each.

Introduce a mechanism to max aggregate the requested clamp values of
RUNNABLE tasks in the same bucket. Keep it simple by resetting the
bucket value to its base value only when a bucket becomes inactive.
Allow a limited and controlled overboosting margin for tasks recounted
in the same bucket.

In systems where the boost values are not known in advance, it is still
possible to control the maximum acceptable overboosting margin by tuning
the number of clamp groups. For example, 20 groups ensure a 5% maximum
overboost.

Remove the rq bucket initialization code since a correct bucket value
is now computed when a task is refcounted into a CPU's rq.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-3-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 60daf9c19410604f08c99e146bc378c8a64f4ccd)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I8782971f8867033cee5aaf981c96f9de33a5288c
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Patrick Bellasi
d96bd1d5fc UPSTREAM: sched/uclamp: Add CPU's clamp buckets refcounting
Utilization clamping allows to clamp the CPU's utilization within a
[util_min, util_max] range, depending on the set of RUNNABLE tasks on
that CPU. Each task references two "clamp buckets" defining its minimum
and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
bucket is active if there is at least one RUNNABLE tasks enqueued on
that CPU and refcounting that bucket.

When a task is {en,de}queued {on,from} a rq, the set of active clamp
buckets on that CPU can change. If the set of active clamp buckets
changes for a CPU a new "aggregated" clamp value is computed for that
CPU. This is because each clamp bucket enforces a different utilization
clamp value.

Clamp values are always MAX aggregated for both util_min and util_max.
This ensures that no task can affect the performance of other
co-scheduled tasks which are more boosted (i.e. with higher util_min
clamp) or less capped (i.e. with higher util_max clamp).

A task has:
   task_struct::uclamp[clamp_id]::bucket_id
to track the "bucket index" of the CPU's clamp bucket it refcounts while
enqueued, for each clamp index (clamp_id).

A runqueue has:
   rq::uclamp[clamp_id]::bucket[bucket_id].tasks
to track how many RUNNABLE tasks on that CPU refcount each
clamp bucket (bucket_id) of a clamp index (clamp_id).
It also has a:
   rq::uclamp[clamp_id]::bucket[bucket_id].value
to track the clamp value of each clamp bucket (bucket_id) of a clamp
index (clamp_id).

The rq::uclamp::bucket[clamp_id][] array is scanned every time it's
needed to find a new MAX aggregated clamp value for a clamp_id. This
operation is required only when it's dequeued the last task of a clamp
bucket tracking the current MAX aggregated clamp value. In this case,
the CPU is either entering IDLE or going to schedule a less boosted or
more clamped task.
The expected number of different clamp values configured at build time
is small enough to fit the full unordered array into a single cache
line, for configurations of up to 7 buckets.

Add to struct rq the basic data structures required to refcount the
number of RUNNABLE tasks for each clamp bucket. Add also the max
aggregation required to update the rq's clamp value at each
enqueue/dequeue event.

Use a simple linear mapping of clamp values into clamp buckets.
Pre-compute and cache bucket_id to avoid integer divisions at
enqueue/dequeue time.

Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 69842cba9ace84849bb9b8edcdf2cefccd97901c)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I2c2c23572fb82e004f815cc9c783881355df6836
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Tejun Heo
3703043afb UPSTREAM: cgroup: add cgroup_parse_float()
cgroup already uses floating point for percent[ile] numbers and there
are several controllers which want to take them as input.  Add a
generic parse helper to handle inputs.

Update the interface convention documentation about the use of
percentage numbers.  While at it, also clarify the default time unit.

Bug: 120440300
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit a5e112e6424adb77d953eac20e6936b952fd6b32)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ic1fcf21d7955eb8edd2e8e91517bca6aef41694f
Signed-off-by: Quentin Perret <qperret@google.com>
2020-02-01 14:39:37 +00:00
Vincent Guittot
2d935df7b2 sched/fair: Fix insertion in rq->leaf_cfs_rq_list
commit f6783319737f28e4436a69611853a5a098cbe974 upstream.

Sargun reported a crash:

  "I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
   infinite loop in update_blocked_averages() by reverting a9e7f6544b
   and put it on top of 4.19.13. In addition to this, I uninlined
   list_add_leaf_cfs_rq for debugging.

   This revealed a new bug that we didn't get to because we kept getting
   crashes from the previous issue. When we are running with cgroups that
   are rapidly changing, with CFS bandwidth control, and in addition
   using the cpusets cgroup, we see this crash. Specifically, it seems to
   occur with cgroups that are throttled and we change the allowed
   cpuset."

The algorithm used to order cfs_rq in rq->leaf_cfs_rq_list assumes that
it will walk down to root the 1st time a cfs_rq is used and we will finish
to add either a cfs_rq without parent or a cfs_rq with a parent that is
already on the list. But this is not always true in presence of throttling.
Because a cfs_rq can be throttled even if it has never been used but other CPUs
of the cgroup have already used all the bandwdith, we are not sure to go down to
the root and add all cfs_rq in the list.

Ensure that all cfs_rq will be added in the list even if they are throttled.

[ mingo: Fix !CGROUPS build. ]

Reported-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: tj@kernel.org
Fixes: 9c2791f936 ("Fix hierarchical order in rq->leaf_cfs_rq_list")
Link: https://lkml.kernel.org/r/1548825767-10799-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Janne Huttunen <janne.huttunen@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-01 09:37:10 +00:00
Peter Zijlstra
6c11530ea4 sched/fair: Add tmp_alone_branch assertion
commit 5d299eabea5a251fbf66e8277704b874bbba92dc upstream.

The magic in list_add_leaf_cfs_rq() requires that at the end of
enqueue_task_fair():

  rq->tmp_alone_branch == &rq->lead_cfs_rq_list

If this is violated, list integrity is compromised for list entries
and the tmp_alone_branch pointer might dangle.

Also, reflow list_add_leaf_cfs_rq() while there. This looses one
indentation level and generates a form that's convenient for the next
patch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Janne Huttunen <janne.huttunen@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-01 09:37:10 +00:00
Sami Tolvanen
4596eee0c8 ANDROID: kallsyms: strip hashes from function names with ThinLTO
With CONFIG_THINLTO and CFI both enabled, LLVM appends a hash to the
names of all static functions. This breaks userspace tools, so strip
out the hash from output.

Bug: 147422318
Change-Id: Ibea6be089d530e92dcd191481cb02549041203f6
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2020-01-31 16:50:06 +00:00
Greg Kroah-Hartman
654c66e990 This is the 4.19.100 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4xqFEACgkQONu9yGCS
 aT4pWhAAnBOvHPDEBjQzrvrhQAEZkT421Plew1Z1E/RL0CgbYisuaUMmNhGppfAu
 MFt3TXt7sbQ9XfzfRGhiuY3jv2pOBiKlu3DdmKanLGjeCXGSPFPXf+UL/m4utD2F
 /XvtWTQwOakrghfJn93iF01nF6pSc7IIe7hBgptyc0C0TZXvPy7FC03JxCiepW/8
 XEsXYbth6jTEaWwwFZf/QK9sYh7BThm/CmK8UsIdG1kZMW8I9jpAp+1m2DqCB4Je
 KACR4IEfWGEvipw8r0tCDjbSeo8LlKkbb3Kiz/yPZemECX/MEeN3ErGLQT+eut5a
 G6Bs5QJfgoYaq3/XjhRp3IQhM8OFEFe9Z0rm7mikRbKDmPp24f9cQ8OUVDEUSBWK
 zaTi5U7K5jEAI1/PNn2ZSKWMKya2AP5awX48jV2e6bHDo6AXK/JPr2omu2WqpT9f
 SbPa47cFmgR11oFWpCmLFG//sL5oB5djP1blAhnMxExVlzBpMOmJi4jCd461hTaA
 mQ+E5WDOMoRTxC2G+SoBoyYprsNK0PIPZilIs6hPz9kSJan7EXKeqfA5ZFjwY3mW
 tb+kS5XpllfWvajAkWskZpre2NITUV3ybIywm8pPaX2OSvC3zodaglHm76yj32Vk
 qkfS3psVnpQTKepip5a1y6pkB951jE4hx/zofmW3tMivQGOFSCU=
 =dyoL
 -----END PGP SIGNATURE-----

Merge 4.19.100 into android-4.19

Changes in 4.19.100
	can, slip: Protect tty->disc_data in write_wakeup and close with RCU
	firestream: fix memory leaks
	gtp: make sure only SOCK_DGRAM UDP sockets are accepted
	ipv6: sr: remove SKB_GSO_IPXIP6 on End.D* actions
	net: bcmgenet: Use netif_tx_napi_add() for TX NAPI
	net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM
	net: ip6_gre: fix moving ip6gre between namespaces
	net, ip6_tunnel: fix namespaces move
	net, ip_tunnel: fix namespaces move
	net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()
	net_sched: fix datalen for ematch
	net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject
	net-sysfs: fix netdev_queue_add_kobject() breakage
	net-sysfs: Call dev_hold always in netdev_queue_add_kobject
	net-sysfs: Call dev_hold always in rx_queue_add_kobject
	net-sysfs: Fix reference count leak
	net: usb: lan78xx: Add .ndo_features_check
	Revert "udp: do rmem bulk free even if the rx sk queue is empty"
	tcp_bbr: improve arithmetic division in bbr_update_bw()
	tcp: do not leave dangling pointers in tp->highest_sack
	tun: add mutex_unlock() call and napi.skb clearing in tun_get_user()
	afs: Fix characters allowed into cell names
	hwmon: (adt7475) Make volt2reg return same reg as reg2volt input
	hwmon: (core) Do not use device managed functions for memory allocations
	PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken
	tracing: trigger: Replace unneeded RCU-list traversals
	Input: keyspan-remote - fix control-message timeouts
	Revert "Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers"
	ARM: 8950/1: ftrace/recordmcount: filter relocation types
	mmc: tegra: fix SDR50 tuning override
	mmc: sdhci: fix minimum clock rate for v3 controller
	Documentation: Document arm64 kpti control
	Input: pm8xxx-vib - fix handling of separate enable register
	Input: sur40 - fix interface sanity checks
	Input: gtco - fix endpoint sanity check
	Input: aiptek - fix endpoint sanity check
	Input: pegasus_notetaker - fix endpoint sanity check
	Input: sun4i-ts - add a check for devm_thermal_zone_of_sensor_register
	netfilter: nft_osf: add missing check for DREG attribute
	hwmon: (nct7802) Fix voltage limits to wrong registers
	scsi: RDMA/isert: Fix a recently introduced regression related to logout
	tracing: xen: Ordered comparison of function pointers
	do_last(): fetch directory ->i_mode and ->i_uid before it's too late
	net/sonic: Add mutual exclusion for accessing shared state
	net/sonic: Clear interrupt flags immediately
	net/sonic: Use MMIO accessors
	net/sonic: Fix interface error stats collection
	net/sonic: Fix receive buffer handling
	net/sonic: Avoid needless receive descriptor EOL flag updates
	net/sonic: Improve receive descriptor status flag check
	net/sonic: Fix receive buffer replenishment
	net/sonic: Quiesce SONIC before re-initializing descriptor memory
	net/sonic: Fix command register usage
	net/sonic: Fix CAM initialization
	net/sonic: Prevent tx watchdog timeout
	tracing: Use hist trigger's var_ref array to destroy var_refs
	tracing: Remove open-coding of hist trigger var_ref management
	tracing: Fix histogram code when expression has same var as value
	sd: Fix REQ_OP_ZONE_REPORT completion handling
	crypto: geode-aes - switch to skcipher for cbc(aes) fallback
	coresight: etb10: Do not call smp_processor_id from preemptible
	coresight: tmc-etf: Do not call smp_processor_id from preemptible
	libertas: Fix two buffer overflows at parsing bss descriptor
	media: v4l2-ioctl.c: zero reserved fields for S/TRY_FMT
	scsi: iscsi: Avoid potential deadlock in iscsi_if_rx func
	netfilter: ipset: use bitmap infrastructure completely
	netfilter: nf_tables: add __nft_chain_type_get()
	net/x25: fix nonblocking connect
	mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
	mm, sparse: drop pgdat_resize_lock in sparse_add/remove_one_section()
	mm, sparse: pass nid instead of pgdat to sparse_add_one_section()
	drivers/base/memory.c: remove an unnecessary check on NR_MEM_SECTIONS
	mm, memory_hotplug: add nid parameter to arch_remove_memory
	mm/memory_hotplug: release memory resource after arch_remove_memory()
	drivers/base/memory.c: clean up relics in function parameters
	mm, memory_hotplug: update a comment in unregister_memory()
	mm/memory_hotplug: make unregister_memory_section() never fail
	mm/memory_hotplug: make __remove_section() never fail
	powerpc/mm: Fix section mismatch warning
	mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never fail
	s390x/mm: implement arch_remove_memory()
	mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE
	drivers/base/memory: pass a block_id to init_memory_block()
	mm/memory_hotplug: create memory block devices after arch_add_memory()
	mm/memory_hotplug: remove memory block devices before arch_remove_memory()
	mm/memory_hotplug: make unregister_memory_block_under_nodes() never fail
	mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section
	mm/hotplug: kill is_dev_zone() usage in __remove_pages()
	drivers/base/node.c: simplify unregister_memory_block_under_nodes()
	mm/memunmap: don't access uninitialized memmap in memunmap_pages()
	mm/memory_hotplug: fix try_offline_node()
	mm/memory_hotplug: shrink zones when offlining memory
	Linux 4.19.100

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I1664d6d4de9358bff5632c291a26e1401ec7b5f1
2020-01-29 17:10:45 +01:00
David Hildenbrand
86834898d5 mm/memory_hotplug: shrink zones when offlining memory
commit feee6b2989165631b17ac6d4ccdbf6759254e85a upstream.

-- snip --

- Missing arm64 hot(un)plug support
- Missing some vmem_altmap_offset() cleanups
- Missing sub-section hotadd support
- Missing unification of mm/hmm.c and kernel/memremap.c

-- snip --

We currently try to shrink a single zone when removing memory.  We use
the zone of the first page of the memory we are removing.  If that
memmap was never initialized (e.g., memory was never onlined), we will
read garbage and can trigger kernel BUGs (due to a stale pointer):

    BUG: unable to handle page fault for address: 000000000000353d
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP PTI
    CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    RIP: 0010:clear_zone_contiguous+0x5/0x10
    Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
    RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
    RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
    RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
    R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
    FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     __remove_pages+0x4b/0x640
     arch_remove_memory+0x63/0x8d
     try_remove_memory+0xdb/0x130
     __remove_memory+0xa/0x11
     acpi_memory_device_remove+0x70/0x100
     acpi_bus_trim+0x55/0x90
     acpi_device_hotplug+0x227/0x3a0
     acpi_hotplug_work_fn+0x1a/0x30
     process_one_work+0x221/0x550
     worker_thread+0x50/0x3b0
     kthread+0x105/0x140
     ret_from_fork+0x3a/0x50
    Modules linked in:
    CR2: 000000000000353d

Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone(() for that.  We now
properly shrink the zones, even if we have DIMMs whereby

 - Some memory blocks fall into no zone (never onlined)

 - Some memory blocks fall into multiple zones (offlined+re-onlined)

 - Multiple memory blocks that fall into different zones

Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().

Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:27 +01:00
Aneesh Kumar K.V
f291080659 mm/memunmap: don't access uninitialized memmap in memunmap_pages()
commit 77e080e7680e1e615587352f70c87b9e98126d03 upstream.

-- snip --

- Missing mm/hmm.c and kernel/memremap.c unification.
-- hmm code does not need fixes (no altmap)
- Missing 7cc7867fb061 ("mm/devm_memremap_pages: enable sub-section remap")

-- snip --

Patch series "mm/memory_hotplug: Shrink zones before removing memory",
v6.

This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory.  Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.

We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect
the ZONE_DEVICE as contiguous).

We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount
of code to a minimum.  Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of
DIMMs at zone boundaries.

--------------------------------------------------------------------------

Zones are now properly shrunk when offlining memory blocks or when
onlining failed.  This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.

Example:

  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0
  :/# echo "online_movable" > /sys/devices/system/memory/memory41/state
  :/# echo "online_movable" > /sys/devices/system/memory/memory43/state
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  98304
          present  65536
          managed  65536
  :/# echo 0 > /sys/devices/system/memory/memory43/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  32768
          present  32768
          managed  32768
  :/# echo 0 > /sys/devices/system/memory/memory41/online
  :/# cat /proc/zoneinfo
  Node 1, zone  Movable
          spanned  0
          present  0
          managed  0

This patch (of 10):

With an altmap, the memmap falling into the reserved altmap space are not
initialized and, therefore, contain a garbage NID and a garbage zone.
Make sure to read the NID/zone from a memmap that was initialized.

This fixes a kernel crash that is observed when destroying a namespace:

  kernel BUG at include/linux/mm.h:1107!
  cpu 0x1: Vector: 700 (Program Check) at [c000000274087890]
      pc: c0000000004b9728: memunmap_pages+0x238/0x340
      lr: c0000000004b9724: memunmap_pages+0x234/0x340
  ...
      pid   = 3669, comm = ndctl
  kernel BUG at include/linux/mm.h:1107!
    devm_action_release+0x30/0x50
    release_nodes+0x268/0x2d0
    device_release_driver_internal+0x174/0x240
    unbind_store+0x13c/0x190
    drv_attr_store+0x44/0x60
    sysfs_kf_write+0x70/0xa0
    kernfs_fop_write+0x1ac/0x290
    __vfs_write+0x3c/0x70
    vfs_write+0xe4/0x200
    ksys_write+0x7c/0x140
    system_call+0x5c/0x68

The "page_zone(pfn_to_page(pfn)" was introduced by 69324b8f4833 ("mm,
devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I
think we will never have driver reserved memory with
MEMORY_DEVICE_PRIVATE (no altmap AFAIKS).

[david@redhat.com: minimze code changes, rephrase description]
Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com
Fixes: 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to arch_remove_memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Damian Tometzki <damian.tometzki@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Rich Felker <dalias@libc.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:27 +01:00
Oscar Salvador
5c1f8f5358 mm, memory_hotplug: add nid parameter to arch_remove_memory
commit 2c2a5af6fed20cf74401c9d64319c76c5ff81309 upstream.

-- snip --

Missing unification of mm/hmm.c and kernel/memremap.c

-- snip --

Patch series "Do not touch pages in hot-remove path", v2.

This patchset aims for two things:

 1) A better definition about offline and hot-remove stage
 2) Solving bugs where we can access non-initialized pages
    during hot-remove operations [2] [3].

This is achieved by moving all page/zone handling to the offline
stage, so we do not need to access pages when hot-removing memory.

[1] https://patchwork.kernel.org/cover/10691415/
[2] https://patchwork.kernel.org/patch/10547445/
[3] https://www.spinics.net/lists/linux-mm/msg161316.html

This patch (of 5):

This is a preparation for the following-up patches.  The idea of passing
the nid is that it will allow us to get rid of the zone parameter
afterwards.

Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:25 +01:00
Steven Rostedt (VMware)
ce28d66405 tracing: Fix histogram code when expression has same var as value
commit 8bcebc77e85f3d7536f96845a0fe94b1dddb6af0 upstream.

While working on a tool to convert SQL syntex into the histogram language of
the kernel, I discovered the following bug:

 # echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
 # echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
 # echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

Would not display any histograms in the sched_switch histogram side.

But if I were to swap the location of

  "delta=common_timestamp-$start" with "start2=$start"

Such that the last line had:

 # echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger

The histogram works as expected.

What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:

  delta=common_timestamp-$start

The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.

When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.

From Tom Zanussi:

 "Just to clarify some more about what the problem was is that without
  your patch, we would have two separate references to the same variable,
  and during resolve_var_refs(), they'd both want to be resolved
  separately, so in this case, since the first reference to start wasn't
  part of an expression, it wouldn't get the read-once flag set, so would
  be read normally, and then the second reference would do the read-once
  read and also be read but using read-once.  So everything worked and
  you didn't see a problem:

   from: start2=$start,delta=common_timestamp-$start

  In the second case, when you switched them around, the first reference
  would be resolved by doing the read-once, and following that the second
  reference would try to resolve and see that the variable had already
  been read, so failed as unset, which caused it to short-circuit out and
  not do the trigger action to generate the synthetic event:

   to: delta=common_timestamp-$start,start2=$start

  With your patch, we only have the single resolution which happens
  correctly the one time it's resolved, so this can't happen."

Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanuss <zanussi@kernel.org>
Tested-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:23 +01:00
Tom Zanussi
cbb042fd87 tracing: Remove open-coding of hist trigger var_ref management
commit de40f033d4e84e843d6a12266e3869015ea9097c upstream.

Have create_var_ref() manage the hist trigger's var_ref list, rather
than having similar code doing it in multiple places.  This cleans up
the code and makes sure var_refs are always accounted properly.

Also, document the var_ref-related functions to make what their
purpose clearer.

Link: http://lkml.kernel.org/r/05ddae93ff514e66fc03897d6665231892939913.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:23 +01:00
Tom Zanussi
836717841a tracing: Use hist trigger's var_ref array to destroy var_refs
commit 656fe2ba85e81d00e4447bf77b8da2be3c47acb2 upstream.

Since every var ref for a trigger has an entry in the var_ref[] array,
use that to destroy the var_refs, instead of piecemeal via the field
expressions.

This allows us to avoid having to keep and treat differently separate
lists for the action-related references, which future patches will
remove.

Link: http://lkml.kernel.org/r/fad1a164f0e257c158e70d6eadbf6c586e04b2a2.1545161087.git.tom.zanussi@linux.intel.com

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:23 +01:00
Masami Hiramatsu
47eb3574d0 tracing: trigger: Replace unneeded RCU-list traversals
commit aeed8aa3874dc15b9d82a6fe796fd7cfbb684448 upstream.

With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings
when I ran ftracetest trigger testcases.

-----
  # dmesg -c > /dev/null
  # ./ftracetest test.d/trigger
  ...
  # dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " "
  kernel/trace/trace_events_hist.c:6070
  kernel/trace/trace_events_hist.c:1760
  kernel/trace/trace_events_hist.c:5911
  kernel/trace/trace_events_trigger.c:504
  kernel/trace/trace_events_hist.c:1810
  kernel/trace/trace_events_hist.c:3158
  kernel/trace/trace_events_hist.c:3105
  kernel/trace/trace_events_hist.c:5518
  kernel/trace/trace_events_hist.c:5998
  kernel/trace/trace_events_hist.c:6019
  kernel/trace/trace_events_hist.c:6044
  kernel/trace/trace_events_trigger.c:1500
  kernel/trace/trace_events_trigger.c:1540
  kernel/trace/trace_events_trigger.c:539
  kernel/trace/trace_events_trigger.c:584
-----

I investigated those warnings and found that the RCU-list
traversals in event trigger and hist didn't need to use
RCU version because those were called only under event_mutex.

I also checked other RCU-list traversals related to event
trigger list, and found that most of them were called from
event_hist_trigger_func() or hist_unregister_trigger() or
register/unregister functions except for a few cases.

Replace these unneeded RCU-list traversals with normal list
traversal macro and lockdep_assert_held() to check the
event_mutex is held.

Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 30350d65ac ("tracing: Add variable support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29 16:43:18 +01:00
Greg Kroah-Hartman
1fca2c99f4 This is the 4.19.99 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4u6tsACgkQONu9yGCS
 aT693A//TExeDRnNnf+2v4TJorylyRr17BMxk/Ie2L5E6d2n/RWodsrOThAPU9tx
 5alNUkXCT8Jd31BUVnUoPoAQ4zSymSVi++XEf05wDeO0tQ982IESGaLmu9EC1uMF
 nnM5y4IdRYmFI1Zji4h5vRJckoYUlB6Mdg4BgMr4Q1KX7RkZYfe6bjs7DwM/uyMx
 jVXdFaQBD1H6F5W6A+GmgUZ36g9uNqzcBxxWwv5URj+q816NdI4bsxIJMF0v0WC+
 S54fmpS07QWIYKKsQBUepeSgEF4ECESOE2VoF1ICcnfakdPnDBmNgyPJPSrLmVf+
 itRUxoH1MewaOvoJrv+xsGBPmM29LcKH2oBmj5DR2Xstp7ACPs+OtXJEU9dUTDN4
 NhaSts5fIp0f4Y5mMn508pDUwYDAWDt99ZJWdx6aK/TRyUsHBgpxBQDt37BE3U5W
 PCBnObNe2b2KDAsVXLjX5iDYoA0+usFreveMo8uEP+ohfh0ANvJlRkzedYw7NquI
 ZCcT+I1P9q8aa0528tR332VLrQeYg+kG6LVi2kAabmRA/VtEsT0w90MY/eo2vuTU
 WlPmbs2yerv2HTm050e6MOgBZfPh7wP/FpbjsSXufj7EDywlfxF+1hXdwfrpPJeN
 fN3g0kepeUp7+kLzO40FLam/z5ndjAUhyN2SBaPzGsXjMkZdETk=
 =zvlh
 -----END PGP SIGNATURE-----

Merge 4.19.99 into android-4.19

Changes in 4.19.99
	Revert "efi: Fix debugobjects warning on 'efi_rts_work'"
	xfs: Sanity check flags of Q_XQUOTARM call
	i2c: stm32f7: rework slave_id allocation
	i2c: i2c-stm32f7: fix 10-bits check in slave free id search loop
	mfd: intel-lpss: Add default I2C device properties for Gemini Lake
	SUNRPC: Fix svcauth_gss_proxy_init()
	powerpc/pseries: Enable support for ibm,drc-info property
	powerpc/archrandom: fix arch_get_random_seed_int()
	tipc: update mon's self addr when node addr generated
	tipc: fix wrong timeout input for tipc_wait_for_cond()
	mt7601u: fix bbp version check in mt7601u_wait_bbp_ready
	crypto: sun4i-ss - fix big endian issues
	perf map: No need to adjust the long name of modules
	soc: aspeed: Fix snoop_file_poll()'s return type
	watchdog: sprd: Fix the incorrect pointer getting from driver data
	ipmi: Fix memory leak in __ipmi_bmc_register
	drm/sti: do not remove the drm_bridge that was never added
	ARM: dts: at91: nattis: set the PRLUD and HIPOW signals low
	ARM: dts: at91: nattis: make the SD-card slot work
	ixgbe: don't clear IPsec sa counters on HW clearing
	drm/virtio: fix bounds check in virtio_gpu_cmd_get_capset()
	iio: fix position relative kernel version
	apparmor: Fix network performance issue in aa_label_sk_perm
	ALSA: hda: fix unused variable warning
	apparmor: don't try to replace stale label in ptrace access check
	ARM: qcom_defconfig: Enable MAILBOX
	firmware: coreboot: Let OF core populate platform device
	PCI: iproc: Remove PAXC slot check to allow VF support
	bridge: br_arp_nd_proxy: set icmp6_router if neigh has NTF_ROUTER
	drm/hisilicon: hibmc: Don't overwrite fb helper surface depth
	signal/ia64: Use the generic force_sigsegv in setup_frame
	signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
	ASoC: wm9712: fix unused variable warning
	mailbox: mediatek: Add check for possible failure of kzalloc
	IB/rxe: replace kvfree with vfree
	IB/hfi1: Add mtu check for operational data VLs
	genirq/debugfs: Reinstate full OF path for domain name
	usb: dwc3: add EXTCON dependency for qcom
	usb: gadget: fsl_udc_core: check allocation return value and cleanup on failure
	cfg80211: regulatory: make initialization more robust
	mei: replace POLL* with EPOLL* for write queues.
	drm/msm: fix unsigned comparison with less than zero
	of: Fix property name in of_node_get_device_type
	ALSA: usb-audio: update quirk for B&W PX to remove microphone
	iwlwifi: nvm: get num of hw addresses from firmware
	staging: comedi: ni_mio_common: protect register write overflow
	netfilter: nft_osf: usage from output path is not valid
	pwm: lpss: Release runtime-pm reference from the driver's remove callback
	powerpc/pseries/memory-hotplug: Fix return value type of find_aa_index
	rtlwifi: rtl8821ae: replace _rtl8821ae_mrate_idx_to_arfr_id with generic version
	RDMA/bnxt_re: Add missing spin lock initialization
	netfilter: nf_flow_table: do not remove offload when other netns's interface is down
	powerpc/kgdb: add kgdb_arch_set/remove_breakpoint()
	tipc: eliminate message disordering during binding table update
	net: socionext: Add dummy PHY register read in phy_write()
	drm/sun4i: hdmi: Fix double flag assignation
	net: hns3: add error handler for hns3_nic_init_vector_data()
	mlxsw: reg: QEEC: Add minimum shaper fields
	mlxsw: spectrum: Set minimum shaper on MC TCs
	NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks
	ASoC: wm97xx: fix uninitialized regmap pointer problem
	ARM: dts: bcm283x: Correct mailbox register sizes
	pcrypt: use format specifier in kobject_add
	ASoC: sun8i-codec: add missing route for ADC
	pinctrl: meson-gxl: remove invalid GPIOX tsin_a pins
	bus: ti-sysc: Add mcasp optional clocks flag
	exportfs: fix 'passing zero to ERR_PTR()' warning
	drm: rcar-du: Fix the return value in case of error in 'rcar_du_crtc_set_crc_source()'
	drm: rcar-du: Fix vblank initialization
	net: always initialize pagedlen
	drm/dp_mst: Skip validating ports during destruction, just ref
	arm64: dts: meson-gx: Add hdmi_5v regulator as hdmi tx supply
	arm64: dts: renesas: r8a7795-es1: Add missing power domains to IPMMU nodes
	net: phy: Fix not to call phy_resume() if PHY is not attached
	IB/hfi1: Correctly process FECN and BECN in packets
	OPP: Fix missing debugfs supply directory for OPPs
	IB/rxe: Fix incorrect cache cleanup in error flow
	mailbox: ti-msgmgr: Off by one in ti_msgmgr_of_xlate()
	staging: bcm2835-camera: Abort probe if there is no camera
	staging: bcm2835-camera: fix module autoloading
	switchtec: Remove immediate status check after submitting MRPC command
	ipv6: add missing tx timestamping on IPPROTO_RAW
	pinctrl: sh-pfc: r8a7740: Add missing REF125CK pin to gether_gmii group
	pinctrl: sh-pfc: r8a7740: Add missing LCD0 marks to lcd0_data24_1 group
	pinctrl: sh-pfc: r8a7791: Remove bogus ctrl marks from qspi_data4_b group
	pinctrl: sh-pfc: r8a7791: Remove bogus marks from vin1_b_data18 group
	pinctrl: sh-pfc: sh73a0: Add missing TO pin to tpu4_to3 group
	pinctrl: sh-pfc: r8a7794: Remove bogus IPSR9 field
	pinctrl: sh-pfc: r8a77970: Add missing MOD_SEL0 field
	pinctrl: sh-pfc: r8a77980: Add missing MOD_SEL0 field
	pinctrl: sh-pfc: sh7734: Add missing IPSR11 field
	pinctrl: sh-pfc: r8a77995: Remove bogus SEL_PWM[0-3]_3 configurations
	pinctrl: sh-pfc: sh7269: Add missing PCIOR0 field
	pinctrl: sh-pfc: sh7734: Remove bogus IPSR10 value
	net: hns3: fix error handling int the hns3_get_vector_ring_chain
	vxlan: changelink: Fix handling of default remotes
	Input: nomadik-ske-keypad - fix a loop timeout test
	fork,memcg: fix crash in free_thread_stack on memcg charge fail
	clk: highbank: fix refcount leak in hb_clk_init()
	clk: qoriq: fix refcount leak in clockgen_init()
	clk: ti: fix refcount leak in ti_dt_clocks_register()
	clk: socfpga: fix refcount leak
	clk: samsung: exynos4: fix refcount leak in exynos4_get_xom()
	clk: imx6q: fix refcount leak in imx6q_clocks_init()
	clk: imx6sx: fix refcount leak in imx6sx_clocks_init()
	clk: imx7d: fix refcount leak in imx7d_clocks_init()
	clk: vf610: fix refcount leak in vf610_clocks_init()
	clk: armada-370: fix refcount leak in a370_clk_init()
	clk: kirkwood: fix refcount leak in kirkwood_clk_init()
	clk: armada-xp: fix refcount leak in axp_clk_init()
	clk: mv98dx3236: fix refcount leak in mv98dx3236_clk_init()
	clk: dove: fix refcount leak in dove_clk_init()
	MIPS: BCM63XX: drop unused and broken DSP platform device
	arm64: defconfig: Re-enable bcm2835-thermal driver
	remoteproc: qcom: q6v5-mss: Add missing clocks for MSM8996
	remoteproc: qcom: q6v5-mss: Add missing regulator for MSM8996
	drm: Fix error handling in drm_legacy_addctx
	ARM: dts: r8a7743: Remove generic compatible string from iic3
	drm/etnaviv: fix some off by one bugs
	drm/fb-helper: generic: Fix setup error path
	fork, memcg: fix cached_stacks case
	IB/usnic: Fix out of bounds index check in query pkey
	RDMA/ocrdma: Fix out of bounds index check in query pkey
	RDMA/qedr: Fix out of bounds index check in query pkey
	drm/shmob: Fix return value check in shmob_drm_probe
	arm64: dts: apq8016-sbc: Increase load on l11 for SDCARD
	spi: cadence: Correct initialisation of runtime PM
	RDMA/iw_cxgb4: Fix the unchecked ep dereference
	net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ9031
	memory: tegra: Don't invoke Tegra30+ specific memory timing setup on Tegra20
	drm/etnaviv: NULL vs IS_ERR() buf in etnaviv_core_dump()
	media: s5p-jpeg: Correct step and max values for V4L2_CID_JPEG_RESTART_INTERVAL
	kbuild: mark prepare0 as PHONY to fix external module build
	crypto: brcm - Fix some set-but-not-used warning
	crypto: tgr192 - fix unaligned memory access
	ASoC: imx-sgtl5000: put of nodes if finding codec fails
	IB/iser: Pass the correct number of entries for dma mapped SGL
	net: hns3: fix wrong combined count returned by ethtool -l
	media: tw9910: Unregister subdevice with v4l2-async
	IB/mlx5: Don't override existing ip_protocol
	rtc: cmos: ignore bogus century byte
	spi/topcliff_pch: Fix potential NULL dereference on allocation error
	net: hns3: fix bug of ethtool_ops.get_channels for VF
	ARM: dts: sun8i-a23-a33: Move NAND controller device node to sort by address
	clk: sunxi-ng: sun8i-a23: Enable PLL-MIPI LDOs when ungating it
	iwlwifi: mvm: avoid possible access out of array.
	net/mlx5: Take lock with IRQs disabled to avoid deadlock
	ip_tunnel: Fix route fl4 init in ip_md_tunnel_xmit
	arm64: dts: allwinner: h6: Move GIC device node fix base address ordering
	iwlwifi: mvm: fix A-MPDU reference assignment
	bus: ti-sysc: Fix timer handling with drop pm_runtime_irq_safe()
	tty: ipwireless: Fix potential NULL pointer dereference
	driver: uio: fix possible memory leak in __uio_register_device
	driver: uio: fix possible use-after-free in __uio_register_device
	crypto: crypto4xx - Fix wrong ppc4xx_trng_probe()/ppc4xx_trng_remove() arguments
	driver core: Fix DL_FLAG_AUTOREMOVE_SUPPLIER device link flag handling
	driver core: Avoid careless re-use of existing device links
	driver core: Do not resume suppliers under device_links_write_lock()
	driver core: Fix handling of runtime PM flags in device_link_add()
	driver core: Do not call rpm_put_suppliers() in pm_runtime_drop_link()
	ARM: dts: lpc32xx: add required clocks property to keypad device node
	ARM: dts: lpc32xx: reparent keypad controller to SIC1
	ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller variant
	ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller clocks property
	ARM: dts: lpc32xx: phy3250: fix SD card regulator voltage
	drm/xen-front: Fix mmap attributes for display buffers
	iwlwifi: mvm: fix RSS config command
	staging: most: cdev: add missing check for cdev_add failure
	clk: ingenic: jz4740: Fix gating of UDC clock
	rtc: ds1672: fix unintended sign extension
	thermal: mediatek: fix register index error
	arm64: dts: msm8916: remove bogus argument to the cpu clock
	ath10k: fix dma unmap direction for management frames
	net: phy: fixed_phy: Fix fixed_phy not checking GPIO
	rtc: ds1307: rx8130: Fix alarm handling
	net/smc: original socket family in inet_sock_diag
	rtc: 88pm860x: fix unintended sign extension
	rtc: 88pm80x: fix unintended sign extension
	rtc: pm8xxx: fix unintended sign extension
	fbdev: chipsfb: remove set but not used variable 'size'
	iw_cxgb4: use tos when importing the endpoint
	iw_cxgb4: use tos when finding ipv6 routes
	ipmi: kcs_bmc: handle devm_kasprintf() failure case
	xsk: add missing smp_rmb() in xsk_mmap
	drm/etnaviv: potential NULL dereference
	ntb_hw_switchtec: debug print 64bit aligned crosslink BAR Numbers
	ntb_hw_switchtec: NT req id mapping table register entry number should be 512
	pinctrl: sh-pfc: emev2: Add missing pinmux functions
	pinctrl: sh-pfc: r8a7791: Fix scifb2_data_c pin group
	pinctrl: sh-pfc: r8a7792: Fix vin1_data18_b pin group
	pinctrl: sh-pfc: sh73a0: Fix fsic_spdif pin groups
	RDMA/mlx5: Fix memory leak in case we fail to add an IB device
	driver core: Fix possible supplier PM-usage counter imbalance
	PCI: endpoint: functions: Use memcpy_fromio()/memcpy_toio()
	usb: phy: twl6030-usb: fix possible use-after-free on remove
	block: don't use bio->bi_vcnt to figure out segment number
	keys: Timestamp new keys
	net: dsa: b53: Fix default VLAN ID
	net: dsa: b53: Properly account for VLAN filtering
	net: dsa: b53: Do not program CPU port's PVID
	mt76: usb: fix possible memory leak in mt76u_buf_free
	media: sh: migor: Include missing dma-mapping header
	vfio_pci: Enable memory accesses before calling pci_map_rom
	hwmon: (pmbus/tps53679) Fix driver info initialization in probe routine
	mdio_bus: Fix PTR_ERR() usage after initialization to constant
	KVM: PPC: Release all hardware TCE tables attached to a group
	staging: r8822be: check kzalloc return or bail
	dmaengine: mv_xor: Use correct device for DMA API
	cdc-wdm: pass return value of recover_from_urb_loss
	brcmfmac: create debugfs files for bus-specific layer
	regulator: pv88060: Fix array out-of-bounds access
	regulator: pv88080: Fix array out-of-bounds access
	regulator: pv88090: Fix array out-of-bounds access
	net: dsa: qca8k: Enable delay for RGMII_ID mode
	net/mlx5: Delete unused FPGA QPN variable
	drm/nouveau/bios/ramcfg: fix missing parentheses when calculating RON
	drm/nouveau/pmu: don't print reply values if exec is false
	drm/nouveau: fix missing break in switch statement
	driver core: Fix PM-runtime for links added during consumer probe
	ASoC: qcom: Fix of-node refcount unbalance in apq8016_sbc_parse_of()
	net: dsa: fix unintended change of bridge interface STP state
	fs/nfs: Fix nfs_parse_devname to not modify it's argument
	staging: rtlwifi: Use proper enum for return in halmac_parse_psd_data_88xx
	powerpc/64s: Fix logic when handling unknown CPU features
	NFS: Fix a soft lockup in the delegation recovery code
	perf: Copy parent's address filter offsets on clone
	perf, pt, coresight: Fix address filters for vmas with non-zero offset
	clocksource/drivers/sun5i: Fail gracefully when clock rate is unavailable
	clocksource/drivers/exynos_mct: Fix error path in timer resources initialization
	platform/x86: wmi: fix potential null pointer dereference
	NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount
	mmc: sdhci-brcmstb: handle mmc_of_parse() errors during probe
	iommu: Fix IOMMU debugfs fallout
	ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used
	ARM: 8848/1: virt: Align GIC version check with arm64 counterpart
	ARM: 8849/1: NOMMU: Fix encodings for PMSAv8's PRBAR4/PRLAR4
	regulator: wm831x-dcdc: Fix list of wm831x_dcdc_ilim from mA to uA
	ath10k: Fix length of wmi tlv command for protected mgmt frames
	netfilter: nft_set_hash: fix lookups with fixed size hash on big endian
	netfilter: nft_set_hash: bogus element self comparison from deactivation path
	net: sched: act_csum: Fix csum calc for tagged packets
	hwrng: bcm2835 - fix probe as platform device
	iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm()
	NFS: Add missing encode / decode sequence_maxsz to v4.2 operations
	NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE()
	net: aquantia: fixed instack structure overflow
	powerpc/mm: Check secondary hash page table
	media: dvb/earth-pt1: fix wrong initialization for demod blocks
	rbd: clear ->xferred on error from rbd_obj_issue_copyup()
	PCI: Fix "try" semantics of bus and slot reset
	nios2: ksyms: Add missing symbol exports
	x86/mm: Remove unused variable 'cpu'
	scsi: megaraid_sas: reduce module load time
	nfp: fix simple vNIC mailbox length
	drivers/rapidio/rio_cm.c: fix potential oops in riocm_ch_listen()
	xen, cpu_hotplug: Prevent an out of bounds access
	net/mlx5: Fix multiple updates of steering rules in parallel
	net/mlx5e: IPoIB, Fix RX checksum statistics update
	net: sh_eth: fix a missing check of of_get_phy_mode
	regulator: lp87565: Fix missing register for LP87565_BUCK_0
	soc: amlogic: gx-socinfo: Add mask for each SoC packages
	media: ivtv: update *pos correctly in ivtv_read_pos()
	media: cx18: update *pos correctly in cx18_read_pos()
	media: wl128x: Fix an error code in fm_download_firmware()
	media: cx23885: check allocation return
	regulator: tps65086: Fix tps65086_ldoa1_ranges for selector 0xB
	crypto: ccree - reduce kernel stack usage with clang
	jfs: fix bogus variable self-initialization
	tipc: tipc clang warning
	m68k: mac: Fix VIA timer counter accesses
	ARM: dts: sun8i: a33: Reintroduce default pinctrl muxing
	arm64: dts: allwinner: a64: Add missing PIO clocks
	ARM: dts: sun9i: optimus: Fix fixed-regulators
	net: phy: don't clear BMCR in genphy_soft_reset
	ARM: OMAP2+: Fix potentially uninitialized return value for _setup_reset()
	net: dsa: Avoid null pointer when failing to connect to PHY
	soc: qcom: cmd-db: Fix an error code in cmd_db_dev_probe()
	media: davinci-isif: avoid uninitialized variable use
	media: tw5864: Fix possible NULL pointer dereference in tw5864_handle_frame
	spi: tegra114: clear packed bit for unpacked mode
	spi: tegra114: fix for unpacked mode transfers
	spi: tegra114: terminate dma and reset on transfer timeout
	spi: tegra114: flush fifos
	spi: tegra114: configure dma burst size to fifo trig level
	bus: ti-sysc: Fix sysc_unprepare() when no clocks have been allocated
	soc/fsl/qe: Fix an error code in qe_pin_request()
	spi: bcm2835aux: fix driver to not allow 65535 (=-1) cs-gpios
	drm/fb-helper: generic: Call drm_client_add() after setup is done
	arm64/vdso: don't leak kernel addresses
	rtc: Fix timestamp value for RTC_TIMESTAMP_BEGIN_1900
	rtc: mt6397: Don't call irq_dispose_mapping.
	ehea: Fix a copy-paste err in ehea_init_port_res
	bpf: Add missed newline in verifier verbose log
	drm/vmwgfx: Remove set but not used variable 'restart'
	scsi: qla2xxx: Unregister chrdev if module initialization fails
	of: use correct function prototype for of_overlay_fdt_apply()
	net/sched: cbs: fix port_rate miscalculation
	clk: qcom: Skip halt checks on gcc_pcie_0_pipe_clk for 8998
	ACPI: button: reinitialize button state upon resume
	firmware: arm_scmi: fix of_node leak in scmi_mailbox_check
	rxrpc: Fix detection of out of order acks
	scsi: target/core: Fix a race condition in the LUN lookup code
	brcmfmac: fix leak of mypkt on error return path
	ARM: pxa: ssp: Fix "WARNING: invalid free of devm_ allocated data"
	PCI: rockchip: Fix rockchip_pcie_ep_assert_intx() bitwise operations
	net: hns3: fix for vport->bw_limit overflow problem
	hwmon: (w83627hf) Use request_muxed_region for Super-IO accesses
	perf/core: Fix the address filtering fix
	staging: android: vsoc: fix copy_from_user overrun
	PCI: dwc: Fix dw_pcie_ep_find_capability() to return correct capability offset
	soc: amlogic: meson-gx-pwrc-vpu: Fix power on/off register bitmask
	platform/x86: alienware-wmi: fix kfree on potentially uninitialized pointer
	tipc: set sysctl_tipc_rmem and named_timeout right range
	usb: typec: tcpm: Notify the tcpc to start connection-detection for SRPs
	selftests/ipc: Fix msgque compiler warnings
	net: hns3: fix loop condition of hns3_get_tx_timeo_queue_info()
	powerpc: vdso: Make vdso32 installation conditional in vdso_install
	ARM: dts: ls1021: Fix SGMII PCS link remaining down after PHY disconnect
	media: ov2659: fix unbalanced mutex_lock/unlock
	6lowpan: Off by one handling ->nexthdr
	dmaengine: axi-dmac: Don't check the number of frames for alignment
	ALSA: usb-audio: Handle the error from snd_usb_mixer_apply_create_quirk()
	afs: Fix AFS file locking to allow fine grained locks
	afs: Further fix file locking
	NFS: Don't interrupt file writeout due to fatal errors
	coresight: catu: fix clang build warning
	s390/kexec_file: Fix potential segment overlap in ELF loader
	irqchip/gic-v3-its: fix some definitions of inner cacheability attributes
	scsi: qla2xxx: Fix a format specifier
	scsi: qla2xxx: Fix error handling in qlt_alloc_qfull_cmd()
	scsi: qla2xxx: Avoid that qlt_send_resp_ctio() corrupts memory
	KVM: PPC: Book3S HV: Fix lockdep warning when entering the guest
	netfilter: nft_flow_offload: add entry to flowtable after confirmation
	PCI: iproc: Enable iProc config read for PAXBv2
	ARM: dts: logicpd-som-lv: Fix MMC1 card detect
	packet: in recvmsg msg_name return at least sizeof sockaddr_ll
	ASoC: fix valid stream condition
	usb: gadget: fsl: fix link error against usb-gadget module
	dwc2: gadget: Fix completed transfer size calculation in DDMA
	IB/mlx5: Add missing XRC options to QP optional params mask
	RDMA/rxe: Consider skb reserve space based on netdev of GID
	iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU
	net: ena: fix swapped parameters when calling ena_com_indirect_table_fill_entry
	net: ena: fix: Free napi resources when ena_up() fails
	net: ena: fix incorrect test of supported hash function
	net: ena: fix ena_com_fill_hash_function() implementation
	dmaengine: tegra210-adma: restore channel status
	watchdog: rtd119x_wdt: Fix remove function
	mmc: core: fix possible use after free of host
	lightnvm: pblk: fix lock order in pblk_rb_tear_down_check
	ath10k: Fix encoding for protected management frames
	afs: Fix the afs.cell and afs.volume xattr handlers
	vfio/mdev: Avoid release parent reference during error path
	vfio/mdev: Follow correct remove sequence
	vfio/mdev: Fix aborting mdev child device removal if one fails
	l2tp: Fix possible NULL pointer dereference
	ALSA: aica: Fix a long-time build breakage
	media: omap_vout: potential buffer overflow in vidioc_dqbuf()
	media: davinci/vpbe: array underflow in vpbe_enum_outputs()
	platform/x86: alienware-wmi: printing the wrong error code
	crypto: caam - fix caam_dump_sg that iterates through scatterlist
	netfilter: ebtables: CONFIG_COMPAT: reject trailing data after last rule
	pwm: meson: Consider 128 a valid pre-divider
	pwm: meson: Don't disable PWM when setting duty repeatedly
	ARM: riscpc: fix lack of keyboard interrupts after irq conversion
	nfp: bpf: fix static check error through tightening shift amount adjustment
	kdb: do a sanity check on the cpu in kdb_per_cpu()
	netfilter: nf_tables: correct NFT_LOGLEVEL_MAX value
	backlight: lm3630a: Return 0 on success in update_status functions
	thermal: rcar_gen3_thermal: fix interrupt type
	thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power
	EDAC/mc: Fix edac_mc_find() in case no device is found
	afs: Fix key leak in afs_release() and afs_evict_inode()
	afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set
	afs: Fix lock-wait/callback-break double locking
	afs: Fix double inc of vnode->cb_break
	ARM: dts: sun8i-h3: Fix wifi in Beelink X2 DT
	clk: meson: gxbb: no spread spectrum on mpll0
	clk: meson: axg: spread spectrum is on mpll2
	dmaengine: tegra210-adma: Fix crash during probe
	arm64: dts: meson: libretech-cc: set eMMC as removable
	RDMA/qedr: Fix incorrect device rate.
	spi: spi-fsl-spi: call spi_finalize_current_message() at the end
	crypto: ccp - fix AES CFB error exposed by new test vectors
	crypto: ccp - Fix 3DES complaint from ccp-crypto module
	serial: stm32: fix word length configuration
	serial: stm32: fix rx error handling
	serial: stm32: fix rx data length when parity enabled
	serial: stm32: fix transmit_chars when tx is stopped
	serial: stm32: Add support of TC bit status check
	serial: stm32: fix wakeup source initialization
	misc: sgi-xp: Properly initialize buf in xpc_get_rsvd_page_pa
	iommu: Add missing new line for dma type
	iommu: Use right function to get group for device
	signal/bpfilter: Fix bpfilter_kernl to use send_sig not force_sig
	signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig
	inet: frags: call inet_frags_fini() after unregister_pernet_subsys()
	net: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector
	crypto: talitos - fix AEAD processing.
	netvsc: unshare skb in VF rx handler
	net: core: support XDP generic on stacked devices.
	RDMA/uverbs: check for allocation failure in uapi_add_elm()
	net: don't clear sock->sk early to avoid trouble in strparser
	phy: qcom-qusb2: fix missing assignment of ret when calling clk_prepare_enable
	cpufreq: brcmstb-avs-cpufreq: Fix initial command check
	cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency
	clk: sunxi-ng: sun50i-h6-r: Fix incorrect W1 clock gate register
	media: vivid: fix incorrect assignment operation when setting video mode
	crypto: inside-secure - fix zeroing of the request in ahash_exit_inv
	crypto: inside-secure - fix queued len computation
	arm64: dts: renesas: ebisu: Remove renesas, no-ether-link property
	mpls: fix warning with multi-label encap
	serial: stm32: fix a recursive locking in stm32_config_rs485
	arm64: dts: meson-gxm-khadas-vim2: fix gpio-keys-polled node
	arm64: dts: meson-gxm-khadas-vim2: fix Bluetooth support
	iommu/vt-d: Duplicate iommu_resv_region objects per device list
	phy: usb: phy-brcm-usb: Remove sysfs attributes upon driver removal
	firmware: arm_scmi: fix bitfield definitions for SENSOR_DESC attributes
	firmware: arm_scmi: update rate_discrete in clock_describe_rates_get
	ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
	ASoC: meson: axg-tdmin: right_j is not supported
	ASoC: meson: axg-tdmout: right_j is not supported
	qed: iWARP - Use READ_ONCE and smp_store_release to access ep->state
	qed: iWARP - fix uninitialized callback
	powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
	powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration
	bpf: fix the check that forwarding is enabled in bpf_ipv6_fib_lookup
	IB/hfi1: Handle port down properly in pio
	drm/msm/mdp5: Fix mdp5_cfg_init error return
	net: netem: fix backlog accounting for corrupted GSO frames
	net/udp_gso: Allow TX timestamp with UDP GSO
	net/af_iucv: build proper skbs for HiperTransport
	net/af_iucv: always register net_device notifier
	ASoC: ti: davinci-mcasp: Fix slot mask settings when using multiple AXRs
	rtc: pcf8563: Fix interrupt trigger method
	rtc: pcf8563: Clear event flags and disable interrupts before requesting irq
	ARM: dts: iwg20d-q7-common: Fix SDHI1 VccQ regularor
	net/sched: cbs: Fix error path of cbs_module_init
	arm64: dts: allwinner: h6: Pine H64: Add interrupt line for RTC
	drm/msm/a3xx: remove TPL1 regs from snapshot
	ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1()
	perf/ioctl: Add check for the sample_period value
	dmaengine: hsu: Revert "set HSU_CH_MTSR to memory width"
	clk: qcom: Fix -Wunused-const-variable
	nvmem: imx-ocotp: Ensure WAIT bits are preserved when setting timing
	nvmem: imx-ocotp: Change TIMING calculation to u-boot algorithm
	tools: bpftool: use correct argument in cgroup errors
	backlight: pwm_bl: Fix heuristic to determine number of brightness levels
	fork,memcg: alloc_thread_stack_node needs to set tsk->stack
	bnxt_en: Fix ethtool selftest crash under error conditions.
	bnxt_en: Suppress error messages when querying DSCP DCB capabilities.
	iommu/amd: Make iommu_disable safer
	mfd: intel-lpss: Release IDA resources
	rxrpc: Fix uninitialized error code in rxrpc_send_data_packet()
	xprtrdma: Fix use-after-free in rpcrdma_post_recvs
	um: Fix IRQ controller regression on console read
	PM: ACPI/PCI: Resume all devices during hibernation
	ACPI: PM: Simplify and fix PM domain hibernation callbacks
	ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
	fsi/core: Fix error paths on CFAM init
	devres: allow const resource arguments
	fsi: sbefifo: Don't fail operations when in SBE IPL state
	RDMA/hns: Fixs hw access invalid dma memory error
	PCI: mobiveil: Remove the flag MSI_FLAG_MULTI_PCI_MSI
	PCI: mobiveil: Fix devfn check in mobiveil_pcie_valid_device()
	PCI: mobiveil: Fix the valid check for inbound and outbound windows
	ceph: fix "ceph.dir.rctime" vxattr value
	net: pasemi: fix an use-after-free in pasemi_mac_phy_init()
	net/tls: fix socket wmem accounting on fallback with netem
	x86/pgtable/32: Fix LOWMEM_PAGES constant
	xdp: fix possible cq entry leak
	ARM: stm32: use "depends on" instead of "if" after prompt
	scsi: libfc: fix null pointer dereference on a null lport
	xfrm interface: ifname may be wrong in logs
	drm/panel: make drm_panel.h self-contained
	clk: sunxi-ng: v3s: add the missing PLL_DDR1
	PM: sleep: Fix possible overflow in pm_system_cancel_wakeup()
	libertas_tf: Use correct channel range in lbtf_geo_init
	qed: reduce maximum stack frame size
	usb: host: xhci-hub: fix extra endianness conversion
	media: rcar-vin: Clean up correct notifier in error path
	mic: avoid statically declaring a 'struct device'.
	x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI
	crypto: ccp - Reduce maximum stack usage
	ALSA: aoa: onyx: always initialize register read value
	arm64: dts: renesas: r8a77995: Fix register range of display node
	tipc: reduce risk of wakeup queue starvation
	ARM: dts: stm32: add missing vdda-supply to adc on stm32h743i-eval
	net/mlx5: Fix mlx5_ifc_query_lag_out_bits
	cifs: fix rmmod regression in cifs.ko caused by force_sig changes
	iio: tsl2772: Use devm_add_action_or_reset for tsl2772_chip_off
	net: fix bpf_xdp_adjust_head regression for generic-XDP
	spi: bcm-qspi: Fix BSPI QUAD and DUAL mode support when using flex mode
	cxgb4: smt: Add lock for atomic_dec_and_test
	crypto: caam - free resources in case caam_rng registration failed
	ext4: set error return correctly when ext4_htree_store_dirent fails
	RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
	RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
	ASoC: es8328: Fix copy-paste error in es8328_right_line_controls
	ASoC: cs4349: Use PM ops 'cs4349_runtime_pm'
	ASoC: wm8737: Fix copy-paste error in wm8737_snd_controls
	net/rds: Add a few missing rds_stat_names entries
	tools: bpftool: fix arguments for p_err() in do_event_pipe()
	tools: bpftool: fix format strings and arguments for jsonw_printf()
	drm: rcar-du: lvds: Fix bridge_to_rcar_lvds
	bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
	signal: Allow cifs and drbd to receive their terminating signals
	powerpc/64s/radix: Fix memory hot-unplug page table split
	ASoC: sun4i-i2s: RX and TX counter registers are swapped
	dmaengine: dw: platform: Switch to acpi_dma_controller_register()
	rtc: rv3029: revert error handling patch to rv3029_eeprom_write()
	mac80211: minstrel_ht: fix per-group max throughput rate initialization
	i40e: reduce stack usage in i40e_set_fc
	media: atmel: atmel-isi: fix timeout value for stop streaming
	ARM: 8896/1: VDSO: Don't leak kernel addresses
	rtc: pcf2127: bugfix: read rtc disables watchdog
	mips: avoid explicit UB in assignment of mips_io_port_base
	media: em28xx: Fix exception handling in em28xx_alloc_urbs()
	iommu/mediatek: Fix iova_to_phys PA start for 4GB mode
	ahci: Do not export local variable ahci_em_messages
	rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
	Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"
	hwmon: (lm75) Fix write operations for negative temperatures
	net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
	power: supply: Init device wakeup after device_add()
	x86, perf: Fix the dependency of the x86 insn decoder selftest
	staging: greybus: light: fix a couple double frees
	irqdomain: Add the missing assignment of domain->fwnode for named fwnode
	bcma: fix incorrect update of BCMA_CORE_PCI_MDIO_DATA
	usb: typec: tps6598x: Fix build error without CONFIG_REGMAP_I2C
	bcache: Fix an error code in bch_dump_read()
	iio: dac: ad5380: fix incorrect assignment to val
	netfilter: ctnetlink: honor IPS_OFFLOAD flag
	ath9k: dynack: fix possible deadlock in ath_dynack_node_{de}init
	wcn36xx: use dynamic allocation for large variables
	tty: serial: fsl_lpuart: Use appropriate lpuart32_* I/O funcs
	ARM: dts: aspeed-g5: Fixe gpio-ranges upper limit
	xsk: avoid store-tearing when assigning queues
	xsk: avoid store-tearing when assigning umem
	led: triggers: Fix dereferencing of null pointer
	net: sonic: return NETDEV_TX_OK if failed to map buffer
	net: hns3: fix error VF index when setting VLAN offload
	rtlwifi: Fix file release memory leak
	ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
	f2fs: fix wrong error injection path in inc_valid_block_count()
	f2fs: fix error path of f2fs_convert_inline_page()
	scsi: fnic: fix msix interrupt allocation
	Btrfs: fix hang when loading existing inode cache off disk
	Btrfs: fix inode cache waiters hanging on failure to start caching thread
	Btrfs: fix inode cache waiters hanging on path allocation failure
	btrfs: use correct count in btrfs_file_write_iter()
	ixgbe: sync the first fragment unconditionally
	hwmon: (shtc1) fix shtc1 and shtw1 id mask
	net: sonic: replace dev_kfree_skb in sonic_send_packet
	pinctrl: iproc-gpio: Fix incorrect pinconf configurations
	gpio/aspeed: Fix incorrect number of banks
	ath10k: adjust skb length in ath10k_sdio_mbox_rx_packet
	RDMA/cma: Fix false error message
	net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'
	um: Fix off by one error in IRQ enumeration
	bnxt_en: Increase timeout for HWRM_DBG_COREDUMP_XX commands
	f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
	mailbox: qcom-apcs: fix max_register value
	clk: actions: Fix factor clk struct member access
	powerpc/mm/mce: Keep irqs disabled during lockless page table walk
	bpf: fix BTF limits
	crypto: hisilicon - Matching the dma address for dma_pool_free()
	iommu/amd: Wait for completion of IOTLB flush in attach_device
	net: aquantia: Fix aq_vec_isr_legacy() return value
	cxgb4: Signedness bug in init_one()
	net: hisilicon: Fix signedness bug in hix5hd2_dev_probe()
	net: broadcom/bcmsysport: Fix signedness in bcm_sysport_probe()
	net: netsec: Fix signedness bug in netsec_probe()
	net: socionext: Fix a signedness bug in ave_probe()
	net: stmmac: dwmac-meson8b: Fix signedness bug in probe
	net: axienet: fix a signedness bug in probe
	of: mdio: Fix a signedness bug in of_phy_get_and_connect()
	net: nixge: Fix a signedness bug in nixge_probe()
	net: ethernet: stmmac: Fix signedness bug in ipq806x_gmac_of_parse()
	net: sched: cbs: Avoid division by zero when calculating the port rate
	nvme: retain split access workaround for capability reads
	net: stmmac: gmac4+: Not all Unicast addresses may be available
	rxrpc: Fix trace-after-put looking at the put connection record
	mac80211: accept deauth frames in IBSS mode
	llc: fix another potential sk_buff leak in llc_ui_sendmsg()
	llc: fix sk_buff refcounting in llc_conn_state_process()
	ip6erspan: remove the incorrect mtu limit for ip6erspan
	net: stmmac: fix length of PTP clock's name string
	net: stmmac: fix disabling flexible PPS output
	sctp: add chunks to sk_backlog when the newsk sk_socket is not set
	s390/qeth: Fix error handling during VNICC initialization
	s390/qeth: Fix initialization of vnicc cmd masks during set online
	act_mirred: Fix mirred_init_module error handling
	net: avoid possible false sharing in sk_leave_memory_pressure()
	net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head
	tcp: annotate lockless access to tcp_memory_pressure
	net/smc: receive returns without data
	net/smc: receive pending data after RCV_SHUTDOWN
	drm/msm/dsi: Implement reset correctly
	vhost/test: stop device before reset
	dmaengine: imx-sdma: fix size check for sdma script_number
	firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices
	arm64: hibernate: check pgd table allocation
	net: netem: fix error path for corrupted GSO frames
	net: netem: correct the parent's backlog when corrupted packet was dropped
	xsk: Fix registration of Rx-only sockets
	bpf, offload: Unlock on error in bpf_offload_dev_create()
	afs: Fix missing timeout reset
	net: qca_spi: Move reset_count to struct qcaspi
	hv_netvsc: Fix offset usage in netvsc_send_table()
	hv_netvsc: Fix send_table offset in case of a host bug
	afs: Fix large file support
	drm: panel-lvds: Potential Oops in probe error handling
	hwrng: omap3-rom - Fix missing clock by probing with device tree
	dpaa_eth: perform DMA unmapping before read
	dpaa_eth: avoid timestamp read on error paths
	MIPS: Loongson: Fix return value of loongson_hwmon_init
	hv_netvsc: flag software created hash value
	net: neigh: use long type to store jiffies delta
	packet: fix data-race in fanout_flow_is_huge()
	i2c: stm32f7: report dma error during probe
	mmc: sdio: fix wl1251 vendor id
	mmc: core: fix wl1251 sdio quirks
	affs: fix a memory leak in affs_remount
	afs: Remove set but not used variables 'before', 'after'
	dmaengine: ti: edma: fix missed failure handling
	drm/radeon: fix bad DMA from INTERRUPT_CNTL2
	arm64: dts: juno: Fix UART frequency
	samples/bpf: Fix broken xdp_rxq_info due to map order assumptions
	usb: dwc3: Allow building USB_DWC3_QCOM without EXTCON
	IB/iser: Fix dma_nents type definition
	serial: stm32: fix clearing interrupt error flags
	arm64: dts: meson-gxm-khadas-vim2: fix uart_A bluetooth node
	m68k: Call timer_interrupt() with interrupts disabled
	Linux 4.19.99

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ieabeab79ea5c8cb4b6b1552702fa5d6100cea5db
2020-01-27 15:55:44 +01:00
Dan Carpenter
4622676d8f bpf, offload: Unlock on error in bpf_offload_dev_create()
[ Upstream commit d0fbb51dfaa612f960519b798387be436e8f83c5 ]

We need to drop the bpf_devs_lock on error before returning.

Fixes: 9fd7c55591 ("bpf: offload: aggregate offloads per-device")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Link: https://lore.kernel.org/bpf/20191104091536.GB31509@mwanda
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:51:20 +01:00
Dexuan Cui
3f929fe0ac irqdomain: Add the missing assignment of domain->fwnode for named fwnode
[ Upstream commit 711419e504ebd68c8f03656616829c8ad7829389 ]

Recently device pass-through stops working for Linux VM running on Hyper-V.

git-bisect shows the regression is caused by the recent commit
467a3bb97432 ("PCI: hv: Allocate a named fwnode ..."), but the root cause
is that the commit d59f6617ee forgets to set the domain->fwnode for
IRQCHIP_FWNODE_NAMED*, and as a result:

1. The domain->fwnode remains to be NULL.

2. irq_find_matching_fwspec() returns NULL since "h->fwnode == fwnode" is
false, and pci_set_bus_msi_domain() sets the Hyper-V PCI root bus's
msi_domain to NULL.

3. When the device is added onto the root bus, the device's dev->msi_domain
is set to NULL in pci_set_msi_domain().

4. When a device driver tries to enable MSI-X, pci_msi_setup_msi_irqs()
calls arch_setup_msi_irqs(), which uses the native MSI chip (i.e.
arch/x86/kernel/apic/msi.c: pci_msi_controller) to set up the irqs, but
actually pci_msi_setup_msi_irqs() is supposed to call
msi_domain_alloc_irqs() with the hbus->irq_domain, which is created in
hv_pcie_init_irq_domain() and is associated with the Hyper-V chip
hv_msi_irq_chip. Consequently, the irq line is not properly set up, and
the device driver can not receive any interrupt.

Fixes: d59f6617ee ("genirq: Allow fwnode to carry name information only")
Fixes: 467a3bb97432 ("PCI: hv: Allocate a named fwnode instead of an address-based one")
Reported-by: Lili Deng <v-lide@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/PU1P153MB01694D9AF625AC335C600C5FBFBE0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:51:09 +01:00
Eric W. Biederman
6db0e28b89 signal: Allow cifs and drbd to receive their terminating signals
[ Upstream commit 33da8e7c814f77310250bb54a9db36a44c5de784 ]

My recent to change to only use force_sig for a synchronous events
wound up breaking signal reception cifs and drbd.  I had overlooked
the fact that by default kthreads start out with all signals set to
SIG_IGN.  So a change I thought was safe turned out to have made it
impossible for those kernel thread to catch their signals.

Reverting the work on force_sig is a bad idea because what the code
was doing was very much a misuse of force_sig.  As the way force_sig
ultimately allowed the signal to happen was to change the signal
handler to SIG_DFL.  Which after the first signal will allow userspace
to send signals to these kernel threads.  At least for
wake_ack_receiver in drbd that does not appear actively wrong.

So correct this problem by adding allow_kernel_signal that will allow
signals whose siginfo reports they were sent by the kernel through,
but will not allow userspace generated signals, and update cifs and
drbd to call allow_kernel_signal in an appropriate place so that their
thread can receive this signal.

Fixing things this way ensures that userspace won't be able to send
signals and cause problems, that it is clear which signals the
threads are expecting to receive, and it guarantees that nothing
else in the system will be affected.

This change was partly inspired by similar cifs and drbd patches that
added allow_signal.

Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com>
Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: Steve French <smfrench@gmail.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes")
Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig")
Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:51:05 +01:00
Andrea Arcangeli
fde68698dd fork,memcg: alloc_thread_stack_node needs to set tsk->stack
[ Upstream commit 1bf4580e00a248a2c86269125390eb3648e1877c ]

Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") corrected two instances, but there was a third
instance of this bug.

Without setting tsk->stack, if memcg_charge_kernel_stack fails, it'll
execute free_thread_stack() on a dangling pointer.

Enterprise kernels are compiled with VMAP_STACK=y so this isn't
critical, but custom VMAP_STACK=n builds should have some performance
advantage, with the drawback of risking to fail fork because compaction
didn't succeed.  So as long as VMAP_STACK=n is a supported option it's
worth fixing it upstream.

Link: http://lkml.kernel.org/r/20190619011450.28048-1-aarcange@redhat.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:58 +01:00
Ravi Bangoria
574fe4c9a3 perf/ioctl: Add check for the sample_period value
[ Upstream commit 913a90bc5a3a06b1f04c337320e9aeee2328dd77 ]

perf_event_open() limits the sample_period to 63 bits. See:

  0819b2e30c ("perf: Limit perf_event_attr::sample_period to 63 bits")

Make ioctl() consistent with it.

Also on PowerPC, negative sample_period could cause a recursive
PMIs leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: maddy@linux.vnet.ibm.com
Cc: mpe@ellerman.id.au
Fixes: 0819b2e30c ("perf: Limit perf_event_attr::sample_period to 63 bits")
Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:57 +01:00
Dan Carpenter
9245e019e5 kdb: do a sanity check on the cpu in kdb_per_cpu()
[ Upstream commit b586627e10f57ee3aa8f0cfab0d6f7dc4ae63760 ]

The "whichcpu" comes from argv[3].  The cpu_online() macro looks up the
cpu in a bitmap of online cpus, but if the value is too high then it
could read beyond the end of the bitmap and possibly Oops.

Fixes: 5d5314d679 ("kdb: core for kgdb back end (1 of 2)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:48 +01:00
Alexander Shishkin
d6ef9a8fd8 perf/core: Fix the address filtering fix
[ Upstream commit 52a44f83fc2d64a5e74d5d685fad2fecc7b7a321 ]

The following recent commit:

  c60f83b813e5 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset")

changes the address filtering logic to communicate filter ranges to the PMU driver
via a single address range object, instead of having the driver do the final bit of
math.

That change forgets to take into account kernel filters, which are not calculated
the same way as DSO based filters.

Fix that by passing the kernel filters the same way as file-based filters.
This doesn't require any additional changes in the drivers.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: c60f83b813e5 ("perf, pt, coresight: Fix address filters for vmas with non-zero offset")
Link: https://lkml.kernel.org/r/20190329091212.29870-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:39 +01:00
Andrey Ignatov
462c72919b bpf: Add missed newline in verifier verbose log
[ Upstream commit 1fbd20f8b77b366ea4aeb92ade72daa7f36a7e3b ]

check_stack_access() that prints verbose log is used in
adjust_ptr_min_max_vals() that prints its own verbose log and now they
stick together, e.g.:

  variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16
  size=1R2 stack pointer arithmetic goes out of range, prohibited for
  !root

Add missing newline so that log is more readable:
  variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1
  R2 stack pointer arithmetic goes out of range, prohibited for !root

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:37 +01:00
Alexander Shishkin
b34abf24f2 perf, pt, coresight: Fix address filters for vmas with non-zero offset
[ Upstream commit c60f83b813e5b25ccd5de7e8c8925c31b3aebcc1 ]

Currently, the address range calculation for file-based filters works as
long as the vma that maps the matching part of the object file starts
from offset zero into the file (vm_pgoff==0). Otherwise, the resulting
filter range would be off by vm_pgoff pages. Another related problem is
that in case of a partially matching vma, that is, a vma that matches
part of a filter region, the filter range size wouldn't be adjusted.

Fix the arithmetics around address filter range calculations, taking
into account vma offset, so that the entire calculation is done before
the filter configuration is passed to the PMU drivers instead of having
those drivers do the final bit of arithmetics.

Based on the patch by Adrian Hunter <adrian.hunter.intel.com>.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Link: http://lkml.kernel.org/r/20190215115655.63469-3-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:27 +01:00
Alexander Shishkin
673f190df0 perf: Copy parent's address filter offsets on clone
[ Upstream commit 18736eef12137c59f60cc9f56dc5bea05c92e0eb ]

When a child event is allocated in the inherit_event() path, the VMA
based filter offsets are not copied from the parent, even though the
address space mapping of the new task remains the same, which leads to
no trace for the new task until exec.

Reported-by: Mansour Alharthi <malharthi9@gatech.edu>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: 375637bc52 ("perf/core: Introduce address range filtering")
Link: http://lkml.kernel.org/r/20190215115655.63469-2-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:27 +01:00
Shakeel Butt
3ed8ca4d29 fork, memcg: fix cached_stacks case
[ Upstream commit ba4a45746c362b665e245c50b870615f02f34781 ]

Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") fixes a crash caused due to failed memcg charge of
the kernel stack.  However the fix misses the cached_stacks case which
this patch fixes.  So, the same crash can happen if the memcg charge of
a cached stack is failed.

Link: http://lkml.kernel.org/r/20190102180145.57406-1-shakeelb@google.com
Fixes: 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:11 +01:00
Rik van Riel
641164565b fork,memcg: fix crash in free_thread_stack on memcg charge fail
[ Upstream commit 5eed6f1dff87bfb5e545935def3843edf42800f2 ]

Commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.

Unfortunately, it also results in a crash.

This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.

This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:

#5 [ffffc900244efc88] die at ffffffff8101f0ab
 #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
 #7 [ffffc900244efce0] general_protection at ffffffff818ff082
    [exception RIP: llist_add_batch+7]
    RIP: ffffffff8150d487  RSP: ffffc900244efd98  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff88085ef55980  RCX: 0000000000000000
    RDX: ffff88085ef55980  RSI: 343834343531203a  RDI: 343834343531203a
    RBP: ffffc900244efd98   R8: 0000000000000001   R9: ffff8808578c3600
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff88029f6c21c0
    R13: 0000000000000286  R14: ffff880147759b00  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
    RIP: 000000000049b948  RSP: 00007ffcdb307830  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000896030  RCX: 000000000049b948
    RDX: 0000000000000000  RSI: 00007ffcdb307790  RDI: 00000000005d7421
    RBP: 000000000067370f   R8: 00007ffcdb3077b0   R9: 000000000001ed00
    R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000040
    R13: 000000000000000f  R14: 0000000000000000  R15: 000000000088d018
    ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b

The simplest fix is to assign tsk->stack right where it is allocated.

Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:08 +01:00
Marc Zyngier
c153dcfc29 genirq/debugfs: Reinstate full OF path for domain name
[ Upstream commit 94967b55ebf3b603f2fe750ecedd896042585a1c ]

On a DT based system, we use the of_node full name to name the
corresponding irq domain. We expect that name to be unique, so so that
domains with the same base name won't clash (this happens on multi-node
topologies, for example).

Since a7e4cfb0a7 ("of/fdt: only store the device node basename in
full_name"), of_node_full_name() lies and only returns the basename. This
breaks the above requirement, and we end-up with only a subset of the
domains in /sys/kernel/debug/irq/domains.

Let's reinstate the feature by using the fancy new %pOF format specifier,
which happens to do the right thing.

Fixes: a7e4cfb0a7 ("of/fdt: only store the device node basename in full_name")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20181001100522.180054-3-marc.zyngier@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:49:56 +01:00
Jeff Vander Stoep
025a1ee618 Revert "ANDROID: security,perf: Allow further restriction of perf_event_open"
Unfork Android.

This reverts commit 8e5e42d5ae.

Perf_event_paranoid=3 is no longer needed on Android. Access control
of perf events is now done by selinux. See:
https://patchwork.kernel.org/patch/11185793/

Bug: 120445712
Bug: 137092007
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: Iba493424174b30baff460caaa25a54a472c87bd4
2020-01-23 22:02:46 +00:00
Kusanagi Kouichi
3ccf82e899 BACKPORT: tracing: Remove unnecessary DEBUG_FS dependency
Tracing replaced debugfs with tracefs.

Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20191120104350753.EWCT.12796.ppp.dion.ne.jp@dmta0009.auone-net.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0e4a459f56c32d3e52ae69a4b447db2f48a65f44)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Id61dddcb804cf7a5d62d2d04a455d8b84097c967
2020-01-23 11:16:22 +01:00
Greg Kroah-Hartman
8cb4870403 This is the 4.19.98 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4pSYMACgkQONu9yGCS
 aT7Rkg/8C/AXaTp+2HxRj3ZO56uzpMBMb5duBzdzxnEnvFp+DIM7xxRX+NFI5CSK
 4rjnxMd2tPsFtqiWo/bBCUcHh9gu5HJKOMFRZGaRYAXvJ/8hgahgzkBE00JiAB6r
 mrk9Y/pwcKxMFsAHtu3xM0oENeefXOmavVTHc9N3DQLd3hNuyTrPztBMFaDg8djR
 pSwh1uE2G+Z2UOdi2kXmHiEIG6NViIqp+qFYI5CUIyeKfvOEsR5nSQ97LyNQ+dUX
 qshARQFuk78+Ax+GNPTQXiWdzN7+SH5aw5frFtdhAN90F+XrRDj4ZXw+EkX+/M2J
 NZU9P/v41ESG8RWxbAZ6osAUkQ4Dgq2BQpdyRxNNjTchXc0Kr4K6BCKuhY6cGxS7
 0PXPV7MsuAHYIrIvzG2lqif9gmknA0UrGVKuYJIZxBaWlHD2mEkFby0W0HIcBwir
 yKKK3fkFjmsGKYzh+VZVoGySWDbs7qYASWXHOCz0QCLb0CT8/ePbyxLdjY7u5KyX
 wDaDHXG9nm6Nu68HD/9CRnUkiK8dnsODZ0k+sBZfEa+xvHPJCdv3gnrf4SwU7dj7
 ZyhO9XkFzncOJDoxYxiXTfI+zbU1ZhaDw7fk2PFvAI6P1xRS3m6rp8pDWp8iw/MX
 92Sz1YzS68+otHLi+OBGxzu10PwMDtu2nUvqn68SYq6Rp0mZnnE=
 =2O94
 -----END PGP SIGNATURE-----

Merge 4.19.98 into android-4.19

Changes in 4.19.98
	ARM: dts: meson8: fix the size of the PMU registers
	clk: qcom: gcc-sdm845: Add missing flag to votable GDSCs
	dt-bindings: reset: meson8b: fix duplicate reset IDs
	ARM: dts: imx6q-dhcom: fix rtc compatible
	clk: Don't try to enable critical clocks if prepare failed
	ASoC: msm8916-wcd-digital: Reset RX interpolation path after use
	iio: buffer: align the size of scan bytes to size of the largest element
	USB: serial: simple: Add Motorola Solutions TETRA MTP3xxx and MTP85xx
	USB: serial: option: Add support for Quectel RM500Q
	USB: serial: opticon: fix control-message timeouts
	USB: serial: option: add support for Quectel RM500Q in QDL mode
	USB: serial: suppress driver bind attributes
	USB: serial: ch341: handle unbound port at reset_resume
	USB: serial: io_edgeport: handle unbound ports on URB completion
	USB: serial: io_edgeport: add missing active-port sanity check
	USB: serial: keyspan: handle unbound ports
	USB: serial: quatech2: handle unbound ports
	scsi: fnic: fix invalid stack access
	scsi: mptfusion: Fix double fetch bug in ioctl
	ASoC: msm8916-wcd-analog: Fix selected events for MIC BIAS External1
	ASoC: msm8916-wcd-analog: Fix MIC BIAS Internal1
	ARM: dts: imx6q-dhcom: Fix SGTL5000 VDDIO regulator connection
	ALSA: dice: fix fallback from protocol extension into limited functionality
	ALSA: seq: Fix racy access for queue timer in proc read
	ALSA: usb-audio: fix sync-ep altsetting sanity check
	arm64: dts: allwinner: a64: olinuxino: Fix SDIO supply regulator
	Fix built-in early-load Intel microcode alignment
	block: fix an integer overflow in logical block size
	ARM: dts: am571x-idk: Fix gpios property to have the correct gpio number
	LSM: generalize flag passing to security_capable
	ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
	usb: core: hub: Improved device recognition on remote wakeup
	x86/resctrl: Fix an imbalance in domain_remove_cpu()
	x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained
	x86/efistub: Disable paging at mixed mode entry
	drm/i915: Add missing include file <linux/math64.h>
	x86/resctrl: Fix potential memory leak
	perf hists: Fix variable name's inconsistency in hists__for_each() macro
	perf report: Fix incorrectly added dimensions as switch perf data file
	mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
	mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
	btrfs: rework arguments of btrfs_unlink_subvol
	btrfs: fix invalid removal of root ref
	btrfs: do not delete mismatched root refs
	btrfs: fix memory leak in qgroup accounting
	mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
	ARM: dts: imx6qdl: Add Engicam i.Core 1.5 MX6
	ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL
	ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support
	net: stmmac: 16KB buffer must be 16 byte aligned
	net: stmmac: Enable 16KB buffer size
	mm/huge_memory.c: make __thp_get_unmapped_area static
	mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
	arm64: dts: agilex/stratix10: fix pmu interrupt numbers
	bpf: Fix incorrect verifier simulation of ARSH under ALU32
	cfg80211: fix deadlocks in autodisconnect work
	cfg80211: fix memory leak in cfg80211_cqm_rssi_update
	cfg80211: fix page refcount issue in A-MSDU decap
	netfilter: fix a use-after-free in mtype_destroy()
	netfilter: arp_tables: init netns pointer in xt_tgdtor_param struct
	netfilter: nft_tunnel: fix null-attribute check
	netfilter: nf_tables: remove WARN and add NLA_STRING upper limits
	netfilter: nf_tables: store transaction list locally while requesting module
	netfilter: nf_tables: fix flowtable list del corruption
	NFC: pn533: fix bulk-message timeout
	batman-adv: Fix DAT candidate selection on little endian systems
	macvlan: use skb_reset_mac_header() in macvlan_queue_xmit()
	hv_netvsc: Fix memory leak when removing rndis device
	net: dsa: tag_qca: fix doubled Tx statistics
	net: hns: fix soft lockup when there is not enough memory
	net: usb: lan78xx: limit size of local TSO packets
	net/wan/fsl_ucc_hdlc: fix out of bounds write on array utdm_info
	ptp: free ptp device pin descriptors properly
	r8152: add missing endpoint sanity check
	tcp: fix marked lost packets not being retransmitted
	sh_eth: check sh_eth_cpu_data::dual_port when dumping registers
	mlxsw: spectrum: Wipe xstats.backlog of down ports
	mlxsw: spectrum_qdisc: Include MC TCs in Qdisc counters
	xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk
	tcp: refine rule to allow EPOLLOUT generation under mem pressure
	irqchip: Place CONFIG_SIFIVE_PLIC into the menu
	cw1200: Fix a signedness bug in cw1200_load_firmware()
	arm64: dts: meson-gxl-s905x-khadas-vim: fix gpio-keys-polled node
	cfg80211: check for set_wiphy_params
	tick/sched: Annotate lockless access to last_jiffies_update
	arm64: dts: marvell: Fix CP110 NAND controller node multi-line comment alignment
	Revert "arm64: dts: juno: add dma-ranges property"
	mtd: devices: fix mchp23k256 read and write
	drm/nouveau/bar/nv50: check bar1 vmm return value
	drm/nouveau/bar/gf100: ensure BAR is mapped
	drm/nouveau/mmu: qualify vmm during dtor
	reiserfs: fix handling of -EOPNOTSUPP in reiserfs_for_each_xattr
	scsi: esas2r: unlock on error in esas2r_nvram_read_direct()
	scsi: qla4xxx: fix double free bug
	scsi: bnx2i: fix potential use after free
	scsi: target: core: Fix a pr_debug() argument
	scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI
	scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan
	scsi: core: scsi_trace: Use get_unaligned_be*()
	perf probe: Fix wrong address verification
	clk: sprd: Use IS_ERR() to validate the return value of syscon_regmap_lookup_by_phandle()
	regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_id
	hwmon: (pmbus/ibm-cffps) Switch LEDs to blocking brightness call
	Linux 4.19.98

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I74a43a9e60734aec6d24b10374ba97de89172eca
2020-01-23 08:36:16 +01:00
Eric Dumazet
a31889a691 tick/sched: Annotate lockless access to last_jiffies_update
commit de95a991bb72e009f47e0c4bbc90fc5f594588d5 upstream.

syzbot (KCSAN) reported a data-race in tick_do_update_jiffies64():

BUG: KCSAN: data-race in tick_do_update_jiffies64 / tick_do_update_jiffies64

write to 0xffffffff8603d008 of 8 bytes by interrupt on cpu 1:
 tick_do_update_jiffies64+0x100/0x250 kernel/time/tick-sched.c:73
 tick_sched_do_timer+0xd4/0xe0 kernel/time/tick-sched.c:138
 tick_sched_timer+0x43/0xe0 kernel/time/tick-sched.c:1292
 __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
 __hrtimer_run_queues+0x274/0x5f0 kernel/time/hrtimer.c:1576
 hrtimer_interrupt+0x22a/0x480 kernel/time/hrtimer.c:1638
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
 smp_apic_timer_interrupt+0xdc/0x280 arch/x86/kernel/apic/apic.c:1135
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 arch_local_irq_restore arch/x86/include/asm/paravirt.h:756 [inline]
 kcsan_setup_watchpoint+0x1d4/0x460 kernel/kcsan/core.c:436
 check_access kernel/kcsan/core.c:466 [inline]
 __tsan_read1 kernel/kcsan/core.c:593 [inline]
 __tsan_read1+0xc2/0x100 kernel/kcsan/core.c:593
 kallsyms_expand_symbol.constprop.0+0x70/0x160 kernel/kallsyms.c:79
 kallsyms_lookup_name+0x7f/0x120 kernel/kallsyms.c:170
 insert_report_filterlist kernel/kcsan/debugfs.c:155 [inline]
 debugfs_write+0x14b/0x2d0 kernel/kcsan/debugfs.c:256
 full_proxy_write+0xbd/0x100 fs/debugfs/file.c:225
 __vfs_write+0x67/0xc0 fs/read_write.c:494
 vfs_write fs/read_write.c:558 [inline]
 vfs_write+0x18a/0x390 fs/read_write.c:542
 ksys_write+0xd5/0x1b0 fs/read_write.c:611
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x4c/0x60 fs/read_write.c:620
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffffffff8603d008 of 8 bytes by task 0 on cpu 0:
 tick_do_update_jiffies64+0x2b/0x250 kernel/time/tick-sched.c:62
 tick_nohz_update_jiffies kernel/time/tick-sched.c:505 [inline]
 tick_nohz_irq_enter kernel/time/tick-sched.c:1257 [inline]
 tick_irq_enter+0x139/0x1c0 kernel/time/tick-sched.c:1274
 irq_enter+0x4f/0x60 kernel/softirq.c:354
 entering_irq arch/x86/include/asm/apic.h:517 [inline]
 entering_ack_irq arch/x86/include/asm/apic.h:523 [inline]
 smp_apic_timer_interrupt+0x55/0x280 arch/x86/kernel/apic/apic.c:1133
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x1af/0x280 kernel/sched/idle.c:263
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
 rest_init+0xec/0xf6 init/main.c:452
 arch_call_rest_init+0x17/0x37
 start_kernel+0x838/0x85e init/main.c:786
 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
 x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Use READ_ONCE() and WRITE_ONCE() to annotate this expected race.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191205045619.204946-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:21:37 +01:00
Daniel Borkmann
042a3a6d93 bpf: Fix incorrect verifier simulation of ARSH under ALU32
commit 0af2ffc93a4b50948f9dad2786b7f1bd253bf0b9 upstream.

Anatoly has been fuzzing with kBdysch harness and reported a hang in one
of the outcomes:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (85) call bpf_get_socket_cookie#46
  1: R0_w=invP(id=0) R10=fp0
  1: (57) r0 &= 808464432
  2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  2: (14) w0 -= 810299440
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  3: (c4) w0 s>>= 1
  4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  4: (76) if w0 s>= 0x30303030 goto pc+216
  221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  221: (95) exit
  processed 6 insns (limit 1000000) [...]

Taking a closer look, the program was xlated as follows:

  # ./bpftool p d x i 12
  0: (85) call bpf_get_socket_cookie#7800896
  1: (bf) r6 = r0
  2: (57) r6 &= 808464432
  3: (14) w6 -= 810299440
  4: (c4) w6 s>>= 1
  5: (76) if w6 s>= 0x30303030 goto pc+216
  6: (05) goto pc-1
  7: (05) goto pc-1
  8: (05) goto pc-1
  [...]
  220: (05) goto pc-1
  221: (05) goto pc-1
  222: (95) exit

Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix
precision tracking for unbounded scalars"), that is, the fall-through
branch in the instruction 5 is considered to be never taken given the
conclusion from the min/max bounds tracking in w6, and therefore the
dead-code sanitation rewrites it as goto pc-1. However, real-life input
disagrees with verification analysis since a soft-lockup was observed.

The bug sits in the analysis of the ARSH. The definition is that we shift
the target register value right by K bits through shifting in copies of
its sign bit. In adjust_scalar_min_max_vals(), we do first coerce the
register into 32 bit mode, same happens after simulating the operation.
However, for the case of simulating the actual ARSH, we don't take the
mode into account and act as if it's always 64 bit, but location of sign
bit is different:

  dst_reg->smin_value >>= umin_val;
  dst_reg->smax_value >>= umin_val;
  dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);

Consider an unknown R0 where bpf_get_socket_cookie() (or others) would
for example return 0xffff. With the above ARSH simulation, we'd see the
following results:

  [...]
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (57) r0 &= 808464432
    -> R0_runtime = 0x3030
  3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  3: (14) w0 -= 810299440
    -> R0_runtime = 0xcfb40000
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
                              (0xffffffff)
  4: (c4) w0 s>>= 1
    -> R0_runtime = 0xe7da0000
  5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
                              (0x67c00000)           (0x7ffbfff8)
  [...]

In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011
0100 0000 0000 0000 0000', the result after the shift has 0xe7da0000 that
is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly
retained in 32 bit mode. In insn4, the umax was 0xffffffff, and changed into
0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000'
and means here that the simulation didn't retain the sign bit. With above
logic, the updates happen on the 64 bit min/max bounds and given we coerced
the register, the sign bits of the bounds are cleared as well, meaning, we
need to force the simulation into s32 space for 32 bit alu mode.

Verification after the fix below. We're first analyzing the fall-through branch
on 32 bit signed >= test eventually leading to rejection of the program in this
specific case:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r2 = 808464432
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (bf) r6 = r0
  3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
  3: (57) r6 &= 808464432
  4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  4: (14) w6 -= 810299440
  5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  5: (c4) w6 s>>= 1
  6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
                                              (0x67c00000)          (0xfffbfff8)
  6: (76) if w6 s>= 0x30303030 goto pc+216
  7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
  7: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 8 insns (limit 1000000) [...]

Fixes: 9cbe1f5a32 ("bpf/verifier: improve register value range tracking with ARSH")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:21:32 +01:00
Christian Brauner
21cd79a27a ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()
commit 6b3ad6649a4c75504edeba242d3fd36b3096a57f upstream.

Commit 69f594a389 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
introduced the ability to opt out of audit messages for accesses to various
proc files since they are not violations of policy.  While doing so it
somehow switched the check from ns_capable() to
has_ns_capability{_noaudit}(). That means it switched from checking the
subjective credentials of the task to using the objective credentials. This
is wrong since. ptrace_has_cap() is currently only used in
ptrace_may_access() And is used to check whether the calling task (subject)
has the CAP_SYS_PTRACE capability in the provided user namespace to operate
on the target task (object). According to the cred.h comments this would
mean the subjective credentials of the calling task need to be used.
This switches ptrace_has_cap() to use security_capable(). Because we only
call ptrace_has_cap() in ptrace_may_access() and in there we already have a
stable reference to the calling task's creds under rcu_read_lock() there's
no need to go through another series of dereferences and rcu locking done
in ns_capable{_noaudit}().

As one example where this might be particularly problematic, Jann pointed
out that in combination with the upcoming IORING_OP_OPENAT feature, this
bug might allow unprivileged users to bypass the capability checks while
asynchronously opening files like /proc/*/mem, because the capability
checks for this would be performed against kernel credentials.

To illustrate on the former point about this being exploitable: When
io_uring creates a new context it records the subjective credentials of the
caller. Later on, when it starts to do work it creates a kernel thread and
registers a callback. The callback runs with kernel creds for
ktask->real_cred and ktask->cred. To prevent this from becoming a
full-blown 0-day io_uring will call override_cred() and override
ktask->cred with the subjective credentials of the creator of the io_uring
instance. With ptrace_has_cap() currently looking at ktask->real_cred this
override will be ineffective and the caller will be able to open arbitray
proc files as mentioned above.
Luckily, this is currently not exploitable but will turn into a 0-day once
IORING_OP_OPENAT{2} land in v5.6. Fix it now!

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Jann Horn <jannh@google.com>
Fixes: 69f594a389 ("ptrace: do not audit capability check when outputing /proc/pid/stat")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23 08:21:29 +01:00
Micah Morton
87ca9aaf0c LSM: generalize flag passing to security_capable
[ Upstream commit c1a85a00ea66cb6f0bd0f14e47c28c2b0999799f ]

This patch provides a general mechanism for passing flags to the
security_capable LSM hook. It replaces the specific 'audit' flag that is
used to tell security_capable whether it should log an audit message for
the given capability check. The reason for generalizing this flag
passing is so we can add an additional flag that signifies whether
security_capable is being called by a setid syscall (which is needed by
the proposed SafeSetID LSM).

Signed-off-by: Micah Morton <mortonm@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23 08:21:29 +01:00
Andrey Konovalov
a7ed6cc7b7 UPSTREAM: kcov: remote coverage support
(Upstream commit eec028c9386ed1a692aa01a85b55952202b41619.)

Patch series " kcov: collect coverage from usb and vhost", v3.

This patchset extends kcov to allow collecting coverage from backgound
kernel threads.  This extension requires custom annotations for each of
the places where coverage collection is desired.  This patchset
implements this for hub events in the USB subsystem and for vhost
workers.  See the first patch description for details about the kcov
extension.  The other two patches apply this kcov extension to USB and
vhost.

Examples of other subsystems that might potentially benefit from this
when custom annotations are added (the list is based on
process_one_work() callers for bugs recently reported by syzbot):

1. fs: writeback wb_workfn() worker,
2. net: addrconf_dad_work()/addrconf_verify_work() workers,
3. net: neigh_periodic_work() worker,
4. net/p9: p9_write_work()/p9_read_work() workers,
5. block: blk_mq_run_work_fn() worker.

These patches have been used to enable coverage-guided USB fuzzing with
syzkaller for the last few years, see the details here:

  https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md

This patchset has been pushed to the public Linux kernel Gerrit
instance:

  https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1524

This patch (of 3):

Add background thread coverage collection ability to kcov.

With KCOV_ENABLE coverage is collected only for syscalls that are issued
from the current process.  With KCOV_REMOTE_ENABLE it's possible to
collect coverage for arbitrary parts of the kernel code, provided that
those parts are annotated with kcov_remote_start()/kcov_remote_stop().

This allows to collect coverage from two types of kernel background
threads: the global ones, that are spawned during kernel boot in a
limited number of instances (e.g.  one USB hub_event() worker thread is
spawned per USB HCD); and the local ones, that are spawned when a user
interacts with some kernel interface (e.g.  vhost workers).

To enable collecting coverage from a global background thread, a unique
global handle must be assigned and passed to the corresponding
kcov_remote_start() call.  Then a userspace process can pass a list of
such handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field
of the kcov_remote_arg struct.  This will attach the used kcov device to
the code sections, that are referenced by those handles.

Since there might be many local background threads spawned from
different userspace processes, we can't use a single global handle per
annotation.  Instead, the userspace process passes a non-zero handle
through the common_handle field of the kcov_remote_arg struct.  This
common handle gets saved to the kcov_handle field in the current
task_struct and needs to be passed to the newly spawned threads via
custom annotations.  Those threads should in turn be annotated with
kcov_remote_start()/kcov_remote_stop().

Internally kcov stores handles as u64 integers.  The top byte of a
handle is used to denote the id of a subsystem that this handle belongs
to, and the lower 4 bytes are used to denote the id of a thread instance
within that subsystem.  A reserved value 0 is used as a subsystem id for
common handles as they don't belong to a particular subsystem.  The
bytes 4-7 are currently reserved and must be zero.  In the future the
number of bytes used for the subsystem or handle ids might be increased.

When a particular userspace process collects coverage by via a common
handle, kcov will collect coverage for each code section that is
annotated to use the common handle obtained as kcov_handle from the
current task_struct.  However non common handles allow to collect
coverage selectively from different subsystems.

Link: http://lkml.kernel.org/r/e90e315426a384207edbec1d6aa89e43008e4caf.1572366574.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 147413187
Change-Id: I868c4846a412bfbae16086017e113813571df377
2020-01-15 14:51:23 +00:00
Elena Reshetova
7073504599 UPSTREAM: kcov: convert kcov.refcount to refcount_t
(Upstream commit 39e07cb60860e3162fc377380b8a60409315681e.)

atomic_t variables are currently used to implement reference
counters with the following properties:

 - counter is initialized to 1 using atomic_set()

 - a resource is freed upon counter reaching zero

 - once counter reaches zero, its further
   increments aren't allowed

 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided refcount_t
type and API that prevents accidental counter overflows and underflows.
This is important since overflows and underflows can lead to
use-after-free situation and be exploitable.

The variable kcov.refcount is used as pure reference counter.  Convert
it to refcount_t and fix up the operations.

**Important note for maintainers:

Some functions from refcount_t API defined in lib/refcount.c have
different memory ordering guarantees than their atomic counterparts.

The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57
and it is hopefully soon in state to be merged to the documentation
tree.  Normally the differences should not matter since refcount_t
provides enough guarantees to satisfy the refcounting use cases, but in
some rare cases it might matter.  Please double check that you don't
have some undocumented memory guarantees for this variable usage.

For the kcov.refcount it might make a difference
in following places:
 - kcov_put(): decrement in refcount_dec_and_test() only
   provides RELEASE ordering and control dependency on success
   vs. fully ordered atomic counterpart

Link: http://lkml.kernel.org/r/1547634429-772-1-git-send-email-elena.reshetova@intel.com
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 147413187
Change-Id: Ie22524d133af5ab86dcc5cadde4bdca931625d3a
2020-01-15 14:50:51 +00:00
Greg Kroah-Hartman
8e39dd1479 UPSTREAM: kcov: no need to check return value of debugfs_create functions
(Upstream commit ec9672d57670d495404f36ab8b651bfefc0ea10b.)

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Link: http://lkml.kernel.org/r/20190122152151.16139-46-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: I8ed5dc6aeba3dda8b91ceea4fed5cd9ef058461f
Bug: 147413187
2020-01-15 14:50:31 +00:00
Greg Kroah-Hartman
21e3a71314 This is the 4.19.96 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4eEV4ACgkQONu9yGCS
 aT6D6Q//W205i6vBtK440pV659K29fw2+XYhJnG+3kkcXJ1GBEHlS7ddZCzV1lua
 uOLYxG2J0A3D0WC8S78XwE1ZXe1YTor647qW0H6pDeZIj/ID2x4TEN6rZq0YlCb7
 JLx6d+yb7MkK9fHGXVyTZbKESLr7WimDnZckWbO4xWTqtlh195ygMYxdwTxa97ZP
 rMJXR/qS/n5rlgQE8BEFsPZBNcsSVKecIi8ibSWu8zDJerBb2h0TuUMKvn1fteG9
 O69y8MeoJSWGTsorii+f7toGKhECFRWSrDj9GhK4SFTj3eV5evrkozAWJpXuzIOA
 9ou9OHUQnJxpXaVyBxiUDkP58tDYYddsrSCHFSNPtPcsTJSxDUEwZBDXr7V8TyVq
 axjoXgHwXOg9qi5IvEOvOBIahTX9OdKvR578eLQIFinZTRmZGKl2XW08olybcUw3
 ZpTBW6A97Xmhs4szFOQfgxRrc3n0F4+Mk9tFp2HFzSu23y6A/wwctIt4xoOWJ8Nq
 ouZbGwI3Em6rtTWDufNJbagm5QYLEcsS5F4Ala6uGIxBXs8mD7dAisILEXxFfSAn
 aPzPvsKh1vMhYBFmTStsowxy1pjeuUMHukrBQLDtmhzR4aBs5GXCJi/Iq2ojC123
 LLAnG/ytLyYyVUxpYyNMsconLITZ1iS3iUAIcUHeYcEFZ3JAV3E=
 =D3a0
 -----END PGP SIGNATURE-----

Merge 4.19.96 into android-4.19

Changes in 4.19.96
	chardev: Avoid potential use-after-free in 'chrdev_open()'
	i2c: fix bus recovery stop mode timing
	usb: chipidea: host: Disable port power only if previously enabled
	ALSA: usb-audio: Apply the sample rate quirk for Bose Companion 5
	ALSA: hda/realtek - Add new codec supported for ALCS1200A
	ALSA: hda/realtek - Set EAPD control to default for ALC222
	ALSA: hda/realtek - Add quirk for the bass speaker on Lenovo Yoga X1 7th gen
	kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
	tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
	tracing: Change offset type to s32 in preempt/irq tracepoints
	HID: Fix slab-out-of-bounds read in hid_field_extract
	HID: uhid: Fix returning EPOLLOUT from uhid_char_poll
	HID: hid-input: clear unmapped usages
	Input: add safety guards to input_set_keycode()
	Input: input_event - fix struct padding on sparc64
	drm/sun4i: tcon: Set RGB DCLK min. divider based on hardware model
	drm/fb-helper: Round up bits_per_pixel if possible
	drm/dp_mst: correct the shifting in DP_REMOTE_I2C_READ
	can: kvaser_usb: fix interface sanity check
	can: gs_usb: gs_usb_probe(): use descriptors of current altsetting
	can: mscan: mscan_rx_poll(): fix rx path lockup when returning from polling to irq mode
	can: can_dropped_invalid_skb(): ensure an initialized headroom in outgoing CAN sk_buffs
	gpiolib: acpi: Turn dmi_system_id table into a generic quirk table
	gpiolib: acpi: Add honor_wakeup module-option + quirk mechanism
	staging: vt6656: set usb_set_intfdata on driver fail.
	USB: serial: option: add ZLP support for 0x1bc7/0x9010
	usb: musb: fix idling for suspend after disconnect interrupt
	usb: musb: Disable pullup at init
	usb: musb: dma: Correct parameter passed to IRQ handler
	staging: comedi: adv_pci1710: fix AI channels 16-31 for PCI-1713
	staging: rtl8188eu: Add device code for TP-Link TL-WN727N v5.21
	serdev: Don't claim unsupported ACPI serial devices
	tty: link tty and port before configuring it as console
	tty: always relink the port
	mwifiex: fix possible heap overflow in mwifiex_process_country_ie()
	mwifiex: pcie: Fix memory leak in mwifiex_pcie_alloc_cmdrsp_buf
	scsi: bfa: release allocated memory in case of error
	rtl8xxxu: prevent leaking urb
	ath10k: fix memory leak
	HID: hiddev: fix mess in hiddev_open()
	USB: Fix: Don't skip endpoint descriptors with maxpacket=0
	phy: cpcap-usb: Fix error path when no host driver is loaded
	phy: cpcap-usb: Fix flakey host idling and enumerating of devices
	netfilter: arp_tables: init netns pointer in xt_tgchk_param struct
	netfilter: conntrack: dccp, sctp: handle null timeout argument
	netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present
	drm/i915/gen9: Clear residual context state on context switch
	Linux 4.19.96

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I261d6a9e90a5461701f74e3ca1482e3c00939f3e
2020-01-15 08:57:09 +01:00
Steven Rostedt (VMware)
c919096552 tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
commit b8299d362d0837ae39e87e9019ebe6b736e0f035 upstream.

On some archs with some configurations, MCOUNT_INSN_SIZE is not defined, and
this makes the stack tracer fail to compile. Just define it to zero in this
case.

Link: https://lore.kernel.org/r/202001020219.zvE3vsty%lkp@intel.com

Cc: stable@vger.kernel.org
Fixes: 4df297129f ("tracing: Remove most or all of stack tracer stack size from stack_max_size")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14 20:06:59 +01:00
Kaitao Cheng
5ab4bb7b40 kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
commit 50f9ad607ea891a9308e67b81f774c71736d1098 upstream.

In the function, if register_trace_sched_migrate_task() returns error,
sched_switch/sched_wakeup_new/sched_wakeup won't unregister. That is
why fail_deprobe_sched_switch was added.

Link: http://lkml.kernel.org/r/20191231133530.2794-1-pilgrimtao@gmail.com

Cc: stable@vger.kernel.org
Fixes: 478142c39c ("tracing: do not grab lock in wakeup latency function tracing")
Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14 20:06:59 +01:00
Greg Kroah-Hartman
5da11144c3 This is the 4.19.95 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4bAEoACgkQONu9yGCS
 aT70DQ//YgSd3JJ9/d7TLy+Mm8GrlZIZWgRrz8YZdKcJBG+l/c+m6FR0uLDw95nf
 zIFq1GY8DwS3djuNkxPoz8yvIgqoHuSjEUo+YF8v71ZdnGCXIt6SCoKbugAH+azp
 zTEJ4lU3/Wrc1Gh4w5zeSgdQVPIJBQ+d7pKccZHOJ0DFwbU3hQ69vqXcaxrFuhop
 vqKvNNEeCT2l1AxgAhKwNhceFL3Rb/2HxAjUED68ueY/EZ0/OsoinboBu2riSKNu
 NWZTPq7B2Kht19aV48FThwksEvkelz72gkSzKvrsT03tDtNewQ7C/mreUqpU6VZE
 lrcIWOHjXMQyk3g3XZfG3j8ppzHvZPaG/WEqmnJV8txjtaRU8hvj3Q95kHPheTv6
 O/Ds4OpHBaS5g6+dmsUmtfSrbrhm3KKXMn0IJvyYkC/VY0RVdLKTZu0TM5ZS5id/
 zW4AZWWzephUID9LAwRrTWvfGKMBK6gEiv2AtnY4XRiEMYEFw45uP97yWcUCg89L
 a3BCbhhiO4tMqL/lf7KD4vPbXZsRgDM3q1wLitvM2KCVVSA32XGRMT9x8A6uwTJl
 OLL39NSi8h8rqn5S22AowKcR/3VakRNw9pf5Y6AH5xOnP3R/OBHzYJNho2DUc8OI
 6p+zeUZkDEIsFI1wdJSaaSWsNdpJj3Sw6SdB2QJI3c782F4pFUA=
 =RCVS
 -----END PGP SIGNATURE-----

Merge 4.19.95 into android-4.19

Changes in 4.19.95
	USB: dummy-hcd: use usb_urb_dir_in instead of usb_pipein
	USB: dummy-hcd: increase max number of devices to 32
	bpf: Fix passing modified ctx to ld/abs/ind instruction
	regulator: fix use after free issue
	ASoC: max98090: fix possible race conditions
	locking/spinlock/debug: Fix various data races
	netfilter: ctnetlink: netns exit must wait for callbacks
	mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()
	libtraceevent: Fix lib installation with O=
	x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage
	ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89
	efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
	efi/gop: Return EFI_SUCCESS if a usable GOP was found
	efi/gop: Fix memory leak in __gop_query32/64()
	ARM: dts: imx6ul: imx6ul-14x14-evk.dtsi: Fix SPI NOR probing
	ARM: vexpress: Set-up shared OPP table instead of individual for each CPU
	netfilter: uapi: Avoid undefined left-shift in xt_sctp.h
	netfilter: nft_set_rbtree: bogus lookup/get on consecutive elements in named sets
	netfilter: nf_tables: validate NFT_SET_ELEM_INTERVAL_END
	netfilter: nf_tables: validate NFT_DATA_VALUE after nft_data_init()
	ARM: dts: BCM5301X: Fix MDIO node address/size cells
	selftests/ftrace: Fix multiple kprobe testcase
	ARM: dts: Cygnus: Fix MDIO node address/size cells
	spi: spi-cavium-thunderx: Add missing pci_release_regions()
	ASoC: topology: Check return value for soc_tplg_pcm_create()
	ARM: dts: bcm283x: Fix critical trip point
	bnxt_en: Return error if FW returns more data than dump length
	bpf, mips: Limit to 33 tail calls
	spi: spi-ti-qspi: Fix a bug when accessing non default CS
	ARM: dts: am437x-gp/epos-evm: fix panel compatible
	samples: bpf: Replace symbol compare of trace_event
	samples: bpf: fix syscall_tp due to unused syscall
	powerpc: Ensure that swiotlb buffer is allocated from low memory
	btrfs: Fix error messages in qgroup_rescan_init
	bpf: Clear skb->tstamp in bpf_redirect when necessary
	bnx2x: Do not handle requests from VFs after parity
	bnx2x: Fix logic to get total no. of PFs per engine
	cxgb4: Fix kernel panic while accessing sge_info
	net: usb: lan78xx: Fix error message format specifier
	parisc: add missing __init annotation
	rfkill: Fix incorrect check to avoid NULL pointer dereference
	ASoC: wm8962: fix lambda value
	regulator: rn5t618: fix module aliases
	iommu/iova: Init the struct iova to fix the possible memleak
	kconfig: don't crash on NULL expressions in expr_eq()
	perf/x86/intel: Fix PT PMI handling
	fs: avoid softlockups in s_inodes iterators
	net: stmmac: Do not accept invalid MTU values
	net: stmmac: xgmac: Clear previous RX buffer size
	net: stmmac: RX buffer size must be 16 byte aligned
	net: stmmac: Always arm TX Timer at end of transmission start
	s390/purgatory: do not build purgatory with kcov, kasan and friends
	drm/exynos: gsc: add missed component_del
	s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
	s390/dasd: fix memleak in path handling error case
	block: fix memleak when __blk_rq_map_user_iov() is failed
	parisc: Fix compiler warnings in debug_core.c
	llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
	hv_netvsc: Fix unwanted rx_table reset
	powerpc/vcpu: Assume dedicated processors as non-preempt
	powerpc/spinlocks: Include correct header for static key
	cpufreq: imx6q: read OCOTP through nvmem for imx6ul/imx6ull
	ARM: dts: imx6ul: use nvmem-cells for cpu speed grading
	PCI/switchtec: Read all 64 bits of part_event_bitmap
	gtp: fix bad unlock balance in gtp_encap_enable_socket
	macvlan: do not assume mac_header is set in macvlan_broadcast()
	net: dsa: mv88e6xxx: Preserve priority when setting CPU port.
	net: stmmac: dwmac-sun8i: Allow all RGMII modes
	net: stmmac: dwmac-sunxi: Allow all RGMII modes
	net: usb: lan78xx: fix possible skb leak
	pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM
	sch_cake: avoid possible divide by zero in cake_enqueue()
	sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY
	tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
	vxlan: fix tos value before xmit
	vlan: fix memory leak in vlan_dev_set_egress_priority
	vlan: vlan_changelink() should propagate errors
	mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFO
	net: sch_prio: When ungrafting, replace with FIFO
	usb: dwc3: gadget: Fix request complete check
	USB: core: fix check for duplicate endpoints
	USB: serial: option: add Telit ME910G1 0x110a composition
	usb: missing parentheses in USE_NEW_SCHEME
	Linux 4.19.95

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I611f034c58f975a5d2b70eed0a0884f1ff5b09cc
2020-01-12 12:29:19 +01:00
Marco Elver
c7673f0160 locking/spinlock/debug: Fix various data races
[ Upstream commit 1a365e822372ba24c9da0822bc583894f6f3d821 ]

This fixes various data races in spinlock_debug. By testing with KCSAN,
it is observable that the console gets spammed with data races reports,
suggesting these are extremely frequent.

Example data race report:

  read to 0xffff8ab24f403c48 of 4 bytes by task 221 on cpu 2:
   debug_spin_lock_before kernel/locking/spinlock_debug.c:85 [inline]
   do_raw_spin_lock+0x9b/0x210 kernel/locking/spinlock_debug.c:112
   __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
   _raw_spin_lock+0x39/0x40 kernel/locking/spinlock.c:151
   spin_lock include/linux/spinlock.h:338 [inline]
   get_partial_node.isra.0.part.0+0x32/0x2f0 mm/slub.c:1873
   get_partial_node mm/slub.c:1870 [inline]
  <snip>

  write to 0xffff8ab24f403c48 of 4 bytes by task 167 on cpu 3:
   debug_spin_unlock kernel/locking/spinlock_debug.c:103 [inline]
   do_raw_spin_unlock+0xc9/0x1a0 kernel/locking/spinlock_debug.c:138
   __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:159 [inline]
   _raw_spin_unlock_irqrestore+0x2d/0x50 kernel/locking/spinlock.c:191
   spin_unlock_irqrestore include/linux/spinlock.h:393 [inline]
   free_debug_processing+0x1b3/0x210 mm/slub.c:1214
   __slab_free+0x292/0x400 mm/slub.c:2864
  <snip>

As a side-effect, with KCSAN, this eventually locks up the console, most
likely due to deadlock, e.g. .. -> printk lock -> spinlock_debug ->
KCSAN detects data race -> kcsan_print_report() -> printk lock ->
deadlock.

This fix will 1) avoid the data races, and 2) allow using lock debugging
together with KCSAN.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Marco Elver <elver@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/20191120155715.28089-1-elver@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-12 12:17:05 +01:00
Daniel Borkmann
f70280ee89 bpf: Fix passing modified ctx to ld/abs/ind instruction
commit 6d4f151acf9a4f6fab09b615f246c717ddedcf0c upstream.

Anatoly has been fuzzing with kBdysch harness and reported a KASAN
slab oob in one of the outcomes:

  [...]
  [   77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
  [   77.361119]
  [   77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
  [   77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  [   77.362984] Call Trace:
  [   77.363249]  dump_stack+0x97/0xe0
  [   77.363603]  print_address_description.constprop.0+0x1d/0x220
  [   77.364251]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365030]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365860]  __kasan_report.cold+0x37/0x7b
  [   77.366365]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.366940]  kasan_report+0xe/0x20
  [   77.367295]  bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.367821]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.368278]  ? mark_lock+0xa3/0x9b0
  [   77.368641]  ? kvm_sched_clock_read+0x14/0x30
  [   77.369096]  ? sched_clock+0x5/0x10
  [   77.369460]  ? sched_clock_cpu+0x18/0x110
  [   77.369876]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.370330]  ___bpf_prog_run+0x16c0/0x28f0
  [   77.370755]  __bpf_prog_run32+0x83/0xc0
  [   77.371153]  ? __bpf_prog_run64+0xc0/0xc0
  [   77.371568]  ? match_held_lock+0x1b/0x230
  [   77.371984]  ? rcu_read_lock_held+0xa1/0xb0
  [   77.372416]  ? rcu_is_watching+0x34/0x50
  [   77.372826]  sk_filter_trim_cap+0x17c/0x4d0
  [   77.373259]  ? sock_kzfree_s+0x40/0x40
  [   77.373648]  ? __get_filter+0x150/0x150
  [   77.374059]  ? skb_copy_datagram_from_iter+0x80/0x280
  [   77.374581]  ? do_raw_spin_unlock+0xa5/0x140
  [   77.375025]  unix_dgram_sendmsg+0x33a/0xa70
  [   77.375459]  ? do_raw_spin_lock+0x1d0/0x1d0
  [   77.375893]  ? unix_peer_get+0xa0/0xa0
  [   77.376287]  ? __fget_light+0xa4/0xf0
  [   77.376670]  __sys_sendto+0x265/0x280
  [   77.377056]  ? __ia32_sys_getpeername+0x50/0x50
  [   77.377523]  ? lock_downgrade+0x350/0x350
  [   77.377940]  ? __sys_setsockopt+0x2a6/0x2c0
  [   77.378374]  ? sock_read_iter+0x240/0x240
  [   77.378789]  ? __sys_socketpair+0x22a/0x300
  [   77.379221]  ? __ia32_sys_socket+0x50/0x50
  [   77.379649]  ? mark_held_locks+0x1d/0x90
  [   77.380059]  ? trace_hardirqs_on_thunk+0x1a/0x1c
  [   77.380536]  __x64_sys_sendto+0x74/0x90
  [   77.380938]  do_syscall_64+0x68/0x2a0
  [   77.381324]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [   77.381878] RIP: 0033:0x44c070
  [...]

After further debugging, turns out while in case of other helper functions
we disallow passing modified ctx, the special case of ld/abs/ind instruction
which has similar semantics (except r6 being the ctx argument) is missing
such check. Modified ctx is impossible here as bpf_skb_load_helper_8_no_cache()
and others are expecting skb fields in original position, hence, add
check_ctx_reg() to reject any modified ctx. Issue was first introduced back
in f1174f77b5 ("bpf/verifier: rework value tracking").

Fixes: f1174f77b5 ("bpf/verifier: rework value tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12 12:17:04 +01:00
Joel Fernandes (Google)
89ae5a7cad BACKPORT: perf_event: Add support for LSM and SELinux checks
In current mainline, the degree of access to perf_event_open(2) system
call depends on the perf_event_paranoid sysctl.  This has a number of
limitations:

1. The sysctl is only a single value. Many types of accesses are controlled
   based on the single value thus making the control very limited and
   coarse grained.
2. The sysctl is global, so if the sysctl is changed, then that means
   all processes get access to perf_event_open(2) opening the door to
   security issues.

This patch adds LSM and SELinux access checking which will be used in
Android to access perf_event_open(2) for the purposes of attaching BPF
programs to tracepoints, perf profiling and other operations from
userspace. These operations are intended for production systems.

5 new LSM hooks are added:
1. perf_event_open: This controls access during the perf_event_open(2)
   syscall itself. The hook is called from all the places that the
   perf_event_paranoid sysctl is checked to keep it consistent with the
   systctl. The hook gets passed a 'type' argument which controls CPU,
   kernel and tracepoint accesses (in this context, CPU, kernel and
   tracepoint have the same semantics as the perf_event_paranoid sysctl).
   Additionally, I added an 'open' type which is similar to
   perf_event_paranoid sysctl == 3 patch carried in Android and several other
   distros but was rejected in mainline [1] in 2016.

2. perf_event_alloc: This allocates a new security object for the event
   which stores the current SID within the event. It will be useful when
   the perf event's FD is passed through IPC to another process which may
   try to read the FD. Appropriate security checks will limit access.

3. perf_event_free: Called when the event is closed.

4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.

5. perf_event_write: Called from the ioctl(2) syscalls for the event.

[1] https://lwn.net/Articles/696240/

Since Peter had suggest LSM hooks in 2016 [1], I am adding his
Suggested-by tag below.

To use this patch, we set the perf_event_paranoid sysctl to -1 and then
apply selinux checking as appropriate (default deny everything, and then
add policy rules to give access to domains that need it). In the future
we can remove the perf_event_paranoid sysctl altogether.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: James Morris <jmorris@namei.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: rostedt@goodmis.org
Cc: Yonghong Song <yhs@fb.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: jeffv@google.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: primiano@google.com
Cc: Song Liu <songliubraving@fb.com>
Cc: rsavitski@google.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Matthew Garrett <matthewgarrett@google.com>
Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org

(cherry picked from commit da97e18458fb42d7c00fac5fd1c56a3896ec666e)
[ Ryan Savitski: Resolved conflicts with existing code, and folded in
  upstream ae79d5588a04 (perf/core: Fix !CONFIG_PERF_EVENTS build
  warnings and failures). This should fix the build errors from the
  previous backport attempt, where certain configurations would end up
  with functions referring to the perf_event struct prior to its
  declaration (and therefore declaring it with a different scope). ]
Bug: 137092007
Signed-off-by: Ryan Savitski <rsavitski@google.com>
Change-Id: Ief8c669083c81f4ea2fa75d5c0d947d19ea741b3
2020-01-10 15:18:52 +00:00
Greg Kroah-Hartman
ff0e96e80f This is the 4.19.94 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4W8A4ACgkQONu9yGCS
 aT5ZcBAAha0GMcpxm1ettNVMXUVD/Df2pntc3x2G1T+dtI89YwIilJcdQBpbDGB6
 6oNRpnopc+/ynm820SMlhjBNE8KlDzHS3Tmsn1lplru0yOqZMFcFlHSESCAA0b4E
 T21KwQ4rLZTzW4LvTf//4WpJZD1RnVrwKkbgkci9kvCjZsdh2GMK3XkBeVBUdXuX
 3gvW+9zsgmkU3Bhk5Mi8JUmOw2yY5sJt2tDmIyxOtBknAo1TK6n4kqd+NgjfsdcI
 cnNTstDU0Ikmi4UBOZGDmey0THtHdvi/oM3DUkzHtZ68W0rg/gPu4nDR+Fx3sKvo
 y5bI10j4W16PKXyxVehel+lD8XmIV/+zSerS0enGjijBPZKI9FTlGEuczk0x7sj+
 wqMh3WkkPig2bQPrCOIjkA5VW4n/ZL07cjd1nNeZ48MkvA/3k47o4vDV/lPE88ZT
 49qqaJvZ3kAdqtV1pfzpQtrG1Pp8YPcEHAMYIM/6jb6poCro5vFtuRX4tzj2fRSf
 u7jSVPDt7ED9SgHPe+RrGWVIx2/iVnr5mVdi53rjWTWfeTdNIY5bUs/wsTde1k99
 9bnAhwD4ZbFrO240MMYPWpOCr8kl0LXAeyQ104m7ONbMRnLoRp4sQCae252jyHFD
 Qxgez5cDwDQnj2W4/SdXSWytioTnyVHsI89FkWw+f/IU8AsbBuw=
 =mmeT
 -----END PGP SIGNATURE-----

Merge 4.19.94 into android-4.19

Changes in 4.19.94
	nvme_fc: add module to ops template to allow module references
	nvme-fc: fix double-free scenarios on hw queues
	drm/amdgpu: add check before enabling/disabling broadcast mode
	drm/amdgpu: add cache flush workaround to gfx8 emit_fence
	drm/amd/display: Fixed kernel panic when booting with DP-to-HDMI dongle
	iio: adc: max9611: Fix too short conversion time delay
	PM / devfreq: Fix devfreq_notifier_call returning errno
	PM / devfreq: Set scaling_max_freq to max on OPP notifier error
	PM / devfreq: Don't fail devfreq_dev_release if not in list
	afs: Fix afs_find_server lookups for ipv4 peers
	afs: Fix SELinux setting security label on /afs
	RDMA/cma: add missed unregister_pernet_subsys in init failure
	rxe: correctly calculate iCRC for unaligned payloads
	scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
	scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
	scsi: qla2xxx: Don't call qlt_async_event twice
	scsi: qla2xxx: Fix PLOGI payload and ELS IOCB dump length
	scsi: qla2xxx: Configure local loop for N2N target
	scsi: qla2xxx: Send Notify ACK after N2N PLOGI
	scsi: qla2xxx: Ignore PORT UPDATE after N2N PLOGI
	scsi: iscsi: qla4xxx: fix double free in probe
	scsi: libsas: stop discovering if oob mode is disconnected
	drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
	usb: gadget: fix wrong endpoint desc
	net: make socket read/write_iter() honor IOCB_NOWAIT
	afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
	md: raid1: check rdev before reference in raid1_sync_request func
	s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
	s390/cpum_sf: Avoid SBD overflow condition in irq handler
	IB/mlx4: Follow mirror sequence of device add during device removal
	IB/mlx5: Fix steering rule of drop and count
	xen-blkback: prevent premature module unload
	xen/balloon: fix ballooned page accounting without hotplug enabled
	PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
	ALSA: hda/realtek - Add Bass Speaker and fixed dac for bass speaker
	ALSA: hda/realtek - Enable the bass speaker of ASUS UX431FLC
	ALSA: hda - fixup for the bass speaker on Lenovo Carbon X1 7th gen
	xfs: fix mount failure crash on invalid iclog memory access
	taskstats: fix data-race
	drm: limit to INT_MAX in create_blob ioctl
	netfilter: nft_tproxy: Fix port selector on Big Endian
	ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
	ALSA: usb-audio: fix set_format altsetting sanity check
	ALSA: usb-audio: set the interface format after resume on Dell WD19
	ALSA: hda/realtek - Add headset Mic no shutup for ALC283
	drm/sun4i: hdmi: Remove duplicate cleanup calls
	MIPS: Avoid VDSO ABI breakage due to global register variable
	media: pulse8-cec: fix lost cec_transmit_attempt_done() call
	media: cec: CEC 2.0-only bcast messages were ignored
	media: cec: avoid decrementing transmit_queue_sz if it is 0
	media: cec: check 'transmit_in_progress', not 'transmitting'
	mm/zsmalloc.c: fix the migrated zspage statistics.
	memcg: account security cred as well to kmemcg
	mm: move_pages: return valid node id in status if the page is already on the target node
	pstore/ram: Write new dumps to start of recycled zones
	locks: print unsigned ino in /proc/locks
	dmaengine: Fix access to uninitialized dma_slave_caps
	compat_ioctl: block: handle Persistent Reservations
	compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
	ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
	ata: ahci_brcm: Fix AHCI resources management
	ata: ahci_brcm: Allow optional reset controller to be used
	ata: ahci_brcm: Add missing clock management during recovery
	ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINE
	libata: Fix retrieving of active qcs
	gpiolib: fix up emulated open drain outputs
	riscv: ftrace: correct the condition logic in function graph tracer
	rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30
	tracing: Fix lock inversion in trace_event_enable_tgid_record()
	tracing: Avoid memory leak in process_system_preds()
	tracing: Have the histogram compare functions convert to u64 first
	tracing: Fix endianness bug in histogram trigger
	apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
	ALSA: cs4236: fix error return comparison of an unsigned integer
	ALSA: firewire-motu: Correct a typo in the clock proc string
	exit: panic before exit_mm() on global init exit
	arm64: Revert support for execute-only user mappings
	ftrace: Avoid potential division by zero in function profiler
	drm/msm: include linux/sched/task.h
	PM / devfreq: Check NULL governor in available_governors_show
	nfsd4: fix up replay_matches_cache()
	HID: i2c-hid: Reset ALPS touchpads on resume
	ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100
	xfs: don't check for AG deadlock for realtime files in bunmapi
	platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
	Bluetooth: btusb: fix PM leak in error case of setup
	Bluetooth: delete a stray unlock
	Bluetooth: Fix memory leak in hci_connect_le_scan
	media: flexcop-usb: ensure -EIO is returned on error condition
	regulator: ab8500: Remove AB8505 USB regulator
	media: usb: fix memory leak in af9005_identify_state
	dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example
	arm64: dts: meson: odroid-c2: Disable usb_otg bus to avoid power failed warning
	tty: serial: msm_serial: Fix lockup for sysrq and oops
	fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
	bdev: Factor out bdev revalidation into a common helper
	bdev: Refresh bdev size for disks without partitioning
	scsi: qedf: Do not retry ELS request if qedf_alloc_cmd fails
	drm/mst: Fix MST sideband up-reply failure handling
	powerpc/pseries/hvconsole: Fix stack overread via udbg
	selftests: rtnetlink: add addresses with fixed life time
	KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag
	rxrpc: Fix possible NULL pointer access in ICMP handling
	tcp: annotate tp->rcv_nxt lockless reads
	net: core: limit nested device depth
	ath9k_htc: Modify byte order for an error message
	ath9k_htc: Discard undersized packets
	xfs: periodically yield scrub threads to the scheduler
	net: add annotations on hh->hh_len lockless accesses
	ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps
	s390/smp: fix physical to logical CPU map for SMT
	xen/blkback: Avoid unmapping unmapped grant pages
	perf/x86/intel/bts: Fix the use of page_private()
	Linux 4.19.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic3d1a4e10565c38d0e82448f0fb7b6fd1822aab2
2020-01-09 16:14:43 +01:00
Greg Kroah-Hartman
58fd41cb2d Revert "BACKPORT: perf_event: Add support for LSM and SELinux checks"
This reverts commit 8af21ac176 as it
breaks the build :(

Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Ryan Savitski <rsavitski@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-01-09 11:07:21 +01:00
Wen Yang
010a7e846d ftrace: Avoid potential division by zero in function profiler
commit e31f7939c1c27faa5d0e3f14519eaf7c89e8a69d upstream.

The ftrace_profile->counter is unsigned long and
do_div truncates it to 32 bits, which means it can test
non-zero and be truncated to zero for division.
Fix this issue by using div64_ul() instead.

Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com

Cc: stable@vger.kernel.org
Fixes: e330b3bcd8 ("tracing: Show sample std dev in function profiling")
Fixes: 34886c8bc5 ("tracing: add average time in function to function profiler")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:03 +01:00
chenqiwu
b9227aacdc exit: panic before exit_mm() on global init exit
commit 43cf75d96409a20ef06b756877a2e72b10a026fc upstream.

Currently, when global init and all threads in its thread-group have exited
we panic via:
do_exit()
-> exit_notify()
   -> forget_original_parent()
      -> find_child_reaper()
This makes it hard to extract a useable coredump for global init from a
kernel crashdump because by the time we panic exit_mm() will have already
released global init's mm.
This patch moves the panic futher up before exit_mm() is called. As was the
case previously, we only panic when global init and all its threads in the
thread-group have exited.

Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
[christian.brauner@ubuntu.com: fix typo, rewrite commit message]
Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:02 +01:00
Sven Schnelle
1c662483c5 tracing: Fix endianness bug in histogram trigger
commit fe6e096a5bbf73a142f09c72e7aa2835026eb1a3 upstream.

At least on PA-RISC and s390 synthetic histogram triggers are failing
selftests because trace_event_raw_event_synth() always writes a 64 bit
values, but the reader expects a field->size sized value. On little endian
machines this doesn't hurt, but on big endian this makes the reader always
read zero values.

Link: http://lore.kernel.org/linux-trace-devel/20191218074427.96184-4-svens@linux.ibm.com

Cc: stable@vger.kernel.org
Fixes: 4b147936fa ("tracing: Add support for 'synthetic' events")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:02 +01:00
Steven Rostedt (VMware)
0c81595930 tracing: Have the histogram compare functions convert to u64 first
commit 106f41f5a302cb1f36c7543fae6a05de12e96fa4 upstream.

The compare functions of the histogram code would be specific for the size
of the value being compared (byte, short, int, long long). It would
reference the value from the array via the type of the compare, but the
value was stored in a 64 bit number. This is fine for little endian
machines, but for big endian machines, it would end up comparing zeros or
all ones (depending on the sign) for anything but 64 bit numbers.

To fix this, first derference the value as a u64 then convert it to the type
being compared.

Link: http://lkml.kernel.org/r/20191211103557.7bed6928@gandalf.local.home

Cc: stable@vger.kernel.org
Fixes: 08d43a5fa0 ("tracing: Add lock-free tracing_map")
Acked-by: Tom Zanussi <zanussi@kernel.org>
Reported-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:02 +01:00
Keita Suzuki
8595e2aadd tracing: Avoid memory leak in process_system_preds()
commit 79e65c27f09683fbb50c33acab395d0ddf5302d2 upstream.

When failing in the allocation of filter_item, process_system_preds()
goes to fail_mem, where the allocated filter is freed.

However, this leads to memory leak of filter->filter_string and
filter->prog, which is allocated before and in process_preds().
This bug has been detected by kmemleak as well.

Fix this by changing kfree to __free_fiter.

unreferenced object 0xffff8880658007c0 (size 32):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    63 6f 6d 6d 6f 6e 5f 70 69 64 20 20 3e 20 31 30  common_pid  > 10
    00 00 00 00 00 00 00 00 65 73 00 00 00 00 00 00  ........es......
  backtrace:
    [<0000000067441602>] kstrdup+0x2d/0x60
    [<00000000141cf7b7>] apply_subsystem_event_filter+0x378/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888060c22d00 (size 64):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 e8 d7 41 80 88 ff ff  ...........A....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000b8c1b109>] process_preds+0x243/0x1820
    [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888041d7e800 (size 512):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    70 bc 85 97 ff ff ff ff 0a 00 00 00 00 00 00 00  p...............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001e04af34>] process_preds+0x71a/0x1820
    [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/20191211091258.11310-1-keitasuzuki.park@sslab.ics.keio.ac.jp

Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 404a3add43 ("tracing: Only add filter list when needed")
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:02 +01:00
Prateek Sood
0e48d030e3 tracing: Fix lock inversion in trace_event_enable_tgid_record()
commit 3a53acf1d9bea11b57c1f6205e3fe73f9d8a3688 upstream.

       Task T2                             Task T3
trace_options_core_write()            subsystem_open()

 mutex_lock(trace_types_lock)           mutex_lock(event_mutex)

 set_tracer_flag()

   trace_event_enable_tgid_record()       mutex_lock(trace_types_lock)

    mutex_lock(event_mutex)

This gives a circular dependency deadlock between trace_types_lock and
event_mutex. To fix this invert the usage of trace_types_lock and
event_mutex in trace_options_core_write(). This keeps the sequence of
lock usage consistent.

Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com

Cc: stable@vger.kernel.org
Fixes: d914ba37d7 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:02 +01:00
Shakeel Butt
3b677f7543 memcg: account security cred as well to kmemcg
commit 84029fd04c201a4c7e0b07ba262664900f47c6f5 upstream.

The cred_jar kmem_cache is already memcg accounted in the current kernel
but cred->security is not.  Account cred->security to kmemcg.

Recently we saw high root slab usage on our production and on further
inspection, we found a buggy application leaking processes.  Though that
buggy application was contained within its memcg but we observe much
more system memory overhead, couple of GiBs, during that period.  This
overhead can adversely impact the isolation on the system.

One source of high overhead we found was cred->security objects, which
have a lifetime of at least the life of the process which allocated
them.

Link: http://lkml.kernel.org/r/20191205223721.40034-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:00 +01:00