877 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Greg Kroah-Hartman
|
b9a942466b |
This is the 4.19.153 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+ag5UACgkQONu9yGCS aT5O3w//RaOcwQdi47/UJz8zyja1ZG8MSSCGibpwvaDwrsXu9es1QtqLAC38H10o ygxNLBZQHxhScsRpicNc+Dy87+lcSj8cF1ed7sd1LU8rvmQ18uIeFUZxfzYth8jW i6erzas0Ojw8IMy566GDxkfAC6n5GhJuJTVFQWUQpoEbsb5rXcGCLx3u+S3Ew+5t Xb9qE6r5cImYymvMkMy7RQ4Db2qgOwjkaCj+Ol+4BSR0bF4OweMQLPJs9gN8pJpr o2nxHg7wdO8SKJZCBVw8ZmfO4zF6czcKy+KzFajn+4LA2oT5mgiV8y21cd9CWYeQ JQK1jZGwwl/xljrM1yLd+crG8i11DhCStY90+4bxD68r8H+g1kwZ8jELmCwuuyx6 dk1s7jOxyKl9qAnMt6r2HqrjgxGD+2hL+2S84jPGRBow5IYjrdD0REXZjyk1R7Rp 8k00lRk1ATEy7H2lj4JW34tcsTEEDcn8PqUFx7MRKtCUI2uo4Gr5HXqf6wTJDp6S BsDe8mm77jd81vtw/AZ8Fv7Fg42QIPt7G1QV9wBbFvDmKmDa7Gj6SuQqTeu75oU9 M++aWSwyOb08wZEE0y94wsm6r4raN3A8o70Df9FltNFTALowuIcR+CVtOnQfHEuL BUBJcWg3SDsIxkXYgvQ9jO5h38i6dhAIVGAcU4VB0rgP/ePKMQs= =GiLo -----END PGP SIGNATURE----- Merge 4.19.153 into android-4.19-stable Changes in 4.19.153 ibmveth: Switch order of ibmveth_helper calls. ibmveth: Identify ingress large send packets. ipv4: Restore flowi4_oif update before call to xfrm_lookup_route mlx4: handle non-napi callers to napi_poll net: fec: Fix phy_device lookup for phy_reset_after_clk_enable() net: fec: Fix PHY init after phy_reset_after_clk_enable() net: fix pos incrementment in ipv6_route_seq_next net/smc: fix valid DMBE buffer sizes net: usb: qmi_wwan: add Cellient MPL200 card tipc: fix the skb_unshare() in tipc_buf_append() net/ipv4: always honour route mtu during forwarding r8169: fix data corruption issue on RTL8402 net/tls: sendfile fails with ktls offload binder: fix UAF when releasing todo list ALSA: bebob: potential info leak in hwdep_read() chelsio/chtls: fix socket lock chelsio/chtls: correct netdevice for vlan interface chelsio/chtls: correct function return and return type net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME attribute in nfc_genl_fw_download() tcp: fix to update snd_wl1 in bulk receiver fast path r8169: fix operation under forced interrupt threading icmp: randomize the global rate limiter ALSA: hda/realtek: Enable audio jacks of ASUS D700SA with ALC887 cifs: remove bogus debug code cifs: Return the error from crypt_message when enc/dec key not found. KVM: x86/mmu: Commit zap of remaining invalid pages when recovering lpages KVM: SVM: Initialize prev_ga_tag before use ima: Don't ignore errors from crypto_shash_update() crypto: algif_aead - Do not set MAY_BACKLOG on the async path EDAC/i5100: Fix error handling order in i5100_init_one() EDAC/ti: Fix handling of platform_get_irq() error x86/fpu: Allow multiple bits in clearcpuid= parameter drivers/perf: xgene_pmu: Fix uninitialized resource struct x86/nmi: Fix nmi_handle() duration miscalculation x86/events/amd/iommu: Fix sizeof mismatch crypto: algif_skcipher - EBUSY on aio should be an error crypto: mediatek - Fix wrong return value in mtk_desc_ring_alloc() crypto: ixp4xx - Fix the size used in a 'dma_free_coherent()' call crypto: picoxcell - Fix potential race condition bug media: tuner-simple: fix regression in simple_set_radio_freq media: Revert "media: exynos4-is: Add missed check for pinctrl_lookup_state()" media: m5mols: Check function pointer in m5mols_sensor_power media: uvcvideo: Set media controller entity functions media: uvcvideo: Silence shift-out-of-bounds warning media: omap3isp: Fix memleak in isp_probe crypto: omap-sham - fix digcnt register handling with export/import hwmon: (pmbus/max34440) Fix status register reads for MAX344{51,60,61} cypto: mediatek - fix leaks in mtk_desc_ring_alloc media: mx2_emmaprp: Fix memleak in emmaprp_probe media: tc358743: initialize variable media: tc358743: cleanup tc358743_cec_isr media: rcar-vin: Fix a reference count leak. media: rockchip/rga: Fix a reference count leak. media: platform: fcp: Fix a reference count leak. media: camss: Fix a reference count leak. media: s5p-mfc: Fix a reference count leak media: stm32-dcmi: Fix a reference count leak media: ti-vpe: Fix a missing check and reference count leak regulator: resolve supply after creating regulator pinctrl: bcm: fix kconfig dependency warning when !GPIOLIB spi: spi-s3c64xx: swap s3c64xx_spi_set_cs() and s3c64xx_enable_datapath() spi: spi-s3c64xx: Check return values ath10k: provide survey info as accumulated data Bluetooth: hci_uart: Cancel init work before unregistering ath6kl: prevent potential array overflow in ath6kl_add_new_sta() ath9k: Fix potential out of bounds in ath9k_htc_txcompletion_cb() ath10k: Fix the size used in a 'dma_free_coherent()' call in an error handling path wcn36xx: Fix reported 802.11n rx_highest rate wcn3660/wcn3680 ASoC: qcom: lpass-platform: fix memory leak ASoC: qcom: lpass-cpu: fix concurrency issue brcmfmac: check ndev pointer mwifiex: Do not use GFP_KERNEL in atomic context staging: rtl8192u: Do not use GFP_KERNEL in atomic context drm/gma500: fix error check scsi: qla4xxx: Fix an error handling path in 'qla4xxx_get_host_stats()' scsi: qla2xxx: Fix wrong return value in qla_nvme_register_hba() scsi: csiostor: Fix wrong return value in csio_hw_prep_fw() backlight: sky81452-backlight: Fix refcount imbalance on error VMCI: check return value of get_user_pages_fast() for errors tty: serial: earlycon dependency tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup() pty: do tty_flip_buffer_push without port->lock in pty_write pwm: lpss: Fix off by one error in base_unit math in pwm_lpss_prepare() pwm: lpss: Add range limit check for the base_unit register value drivers/virt/fsl_hypervisor: Fix error handling path video: fbdev: vga16fb: fix setting of pixclock because a pass-by-value error video: fbdev: sis: fix null ptr dereference video: fbdev: radeon: Fix memleak in radeonfb_pci_register HID: roccat: add bounds checking in kone_sysfs_write_settings() pinctrl: mcp23s08: Fix mcp23x17_regmap initialiser pinctrl: mcp23s08: Fix mcp23x17 precious range net/mlx5: Don't call timecounter cyc2time directly from 1PPS flow net: stmmac: use netif_tx_start|stop_all_queues() function cpufreq: armada-37xx: Add missing MODULE_DEVICE_TABLE net: dsa: rtl8366: Check validity of passed VLANs net: dsa: rtl8366: Refactor VLAN/PVID init net: dsa: rtl8366: Skip PVID setting if not requested net: dsa: rtl8366rb: Support all 4096 VLANs ath6kl: wmi: prevent a shift wrapping bug in ath6kl_wmi_delete_pstream_cmd() misc: mic: scif: Fix error handling path ALSA: seq: oss: Avoid mutex lock for a long-time ioctl usb: dwc2: Fix parameter type in function pointer prototype quota: clear padding in v2r1_mem2diskdqb() slimbus: core: check get_addr before removing laddr ida slimbus: core: do not enter to clock pause mode in core slimbus: qcom-ngd-ctrl: disable ngd in qmi server down callback HID: hid-input: fix stylus battery reporting qtnfmac: fix resource leaks on unsupported iftype error return path net: enic: Cure the enic api locking trainwreck mfd: sm501: Fix leaks in probe() iwlwifi: mvm: split a print to avoid a WARNING in ROC usb: gadget: f_ncm: fix ncm_bitrate for SuperSpeed and above. usb: gadget: u_ether: enable qmult on SuperSpeed Plus as well nl80211: fix non-split wiphy information usb: dwc2: Fix INTR OUT transfers in DDMA mode. scsi: target: tcmu: Fix warning: 'page' may be used uninitialized scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs() platform/x86: mlx-platform: Remove PSU EEPROM configuration mwifiex: fix double free ipvs: clear skb->tstamp in forwarding path net: korina: fix kfree of rx/tx descriptor array netfilter: nf_log: missing vlan offload tag and proto mm/memcg: fix device private memcg accounting mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary IB/mlx4: Fix starvation in paravirt mux/demux IB/mlx4: Adjust delayed work when a dup is observed powerpc/pseries: Fix missing of_node_put() in rng_init() powerpc/icp-hv: Fix missing of_node_put() in success path RDMA/ucma: Fix locking for ctx->events_reported RDMA/ucma: Add missing locking around rdma_leave_multicast() mtd: lpddr: fix excessive stack usage with clang powerpc/pseries: explicitly reschedule during drmem_lmb list traversal mtd: mtdoops: Don't write panic data twice ARM: 9007/1: l2c: fix prefetch bits init in L2X0_AUX_CTRL using DT values arc: plat-hsdk: fix kconfig dependency warning when !RESET_CONTROLLER xfs: limit entries returned when counting fsmap records xfs: fix high key handling in the rt allocator's query_range function RDMA/qedr: Fix use of uninitialized field RDMA/qedr: Fix inline size returned for iWARP powerpc/tau: Use appropriate temperature sample interval powerpc/tau: Convert from timer to workqueue powerpc/tau: Remove duplicated set_thresholds() call Linux 4.19.153 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I9e85e8ca67ab8e28d04a77339f80fdbf3c568956 |
||
Suren Baghdasaryan
|
a3d0ceee71 |
mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary
[ Upstream commit 67197a4f28d28d0b073ab0427b03cb2ee5382578 ]
Currently __set_oom_adj loops through all processes in the system to keep
oom_score_adj and oom_score_adj_min in sync between processes sharing
their mm. This is done for any task with more that one mm_users, which
includes processes with multiple threads (sharing mm and signals).
However for such processes the loop is unnecessary because their signal
structure is shared as well.
Android updates oom_score_adj whenever a tasks changes its role
(background/foreground/...) or binds to/unbinds from a service, making it
more/less important. Such operation can happen frequently. We noticed
that updates to oom_score_adj became more expensive and after further
investigation found out that the patch mentioned in "Fixes" introduced a
regression. Using Pixel 4 with a typical Android workload, write time to
oom_score_adj increased from ~3.57us to ~362us. Moreover this regression
linearly depends on the number of multi-threaded processes running on the
system.
Mark the mm with a new MMF_MULTIPROCESS flag bit when task is created with
(CLONE_VM && !CLONE_THREAD && !CLONE_VFORK). Change __set_oom_adj to use
MMF_MULTIPROCESS instead of mm_users to decide whether oom_score_adj
update should be synchronized between multiple processes. To prevent
races between clone() and __set_oom_adj(), when oom_score_adj of the
process being cloned might be modified from userspace, we use
oom_adj_mutex. Its scope is changed to global.
The combination of (CLONE_VM && !CLONE_THREAD) is rarely used except for
the case of vfork(). To prevent performance regressions of vfork(), we
skip taking oom_adj_mutex and setting MMF_MULTIPROCESS when CLONE_VFORK is
specified. Clearing the MMF_MULTIPROCESS flag (when the last process
sharing the mm exits) is left out of this patch to keep it simple and
because it is believed that this threading model is rare. Should there
ever be a need for optimizing that case as well, it can be done by hooking
into the exit path, likely following the mm_update_next_owner pattern.
With the combination of (CLONE_VM && !CLONE_THREAD && !CLONE_VFORK) being
quite rare, the regression is gone after the change is applied.
[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20200902012558.2335613-1-surenb@google.com
Fixes:
|
||
Roman Gushchin
|
5318f3163c |
BACKPORT: cgroup: cgroup v2 freezer
Cgroup v1 implements the freezer controller, which provides an ability to stop the workload in a cgroup and temporarily free up some resources (cpu, io, network bandwidth and, potentially, memory) for some other tasks. Cgroup v2 lacks this functionality. This patch implements freezer for cgroup v2. Cgroup v2 freezer tries to put tasks into a state similar to jobctl stop. This means that tasks can be killed, ptraced (using PTRACE_SEIZE*), and interrupted. It is possible to attach to a frozen task, get some information (e.g. read registers) and detach. It's also possible to migrate a frozen tasks to another cgroup. This differs cgroup v2 freezer from cgroup v1 freezer, which mostly tried to imitate the system-wide freezer. However uninterruptible sleep is fine when all tasks are going to be frozen (hibernation case), it's not the acceptable state for some subset of the system. Cgroup v2 freezer is not supporting freezing kthreads. If a non-root cgroup contains kthread, the cgroup still can be frozen, but the kthread will remain running, the cgroup will be shown as non-frozen, and the notification will not be delivered. * PTRACE_ATTACH is not working because non-fatal signal delivery is blocked in frozen state. There are some interface differences between cgroup v1 and cgroup v2 freezer too, which are required to conform the cgroup v2 interface design principles: 1) There is no separate controller, which has to be turned on: the functionality is always available and is represented by cgroup.freeze and cgroup.events cgroup control files. 2) The desired state is defined by the cgroup.freeze control file. Any hierarchical configuration is allowed. 3) The interface is asynchronous. The actual state is available using cgroup.events control file ("frozen" field). There are no dedicated transitional states. 4) It's allowed to make any changes with the cgroup hierarchy (create new cgroups, remove old cgroups, move tasks between cgroups) no matter if some cgroups are frozen. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> No-objection-from-me-by: Oleg Nesterov <oleg@redhat.com> Cc: kernel-team@fb.com Change-Id: I3404119678cbcd7410aa56e9334055cee79d02fa (cherry picked from commit 76f969e8948d82e78e1bc4beb6b9465908e74873) cgroup-defs.h: use the struct cgroup_freezer_state and the freezer field from definitions in I6221a975c04f06249a4f8d693852776ae08a8d8e sched.h: use the frozen field defined in I6221a975c04f06249a4f8d693852776ae08a8d8e Bug: 154548692 Signed-off-by: Marco Ballesio <balejs@google.com> |
||
Greg Kroah-Hartman
|
1fca2c99f4 |
This is the 4.19.99 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl4u6tsACgkQONu9yGCS aT693A//TExeDRnNnf+2v4TJorylyRr17BMxk/Ie2L5E6d2n/RWodsrOThAPU9tx 5alNUkXCT8Jd31BUVnUoPoAQ4zSymSVi++XEf05wDeO0tQ982IESGaLmu9EC1uMF nnM5y4IdRYmFI1Zji4h5vRJckoYUlB6Mdg4BgMr4Q1KX7RkZYfe6bjs7DwM/uyMx jVXdFaQBD1H6F5W6A+GmgUZ36g9uNqzcBxxWwv5URj+q816NdI4bsxIJMF0v0WC+ S54fmpS07QWIYKKsQBUepeSgEF4ECESOE2VoF1ICcnfakdPnDBmNgyPJPSrLmVf+ itRUxoH1MewaOvoJrv+xsGBPmM29LcKH2oBmj5DR2Xstp7ACPs+OtXJEU9dUTDN4 NhaSts5fIp0f4Y5mMn508pDUwYDAWDt99ZJWdx6aK/TRyUsHBgpxBQDt37BE3U5W PCBnObNe2b2KDAsVXLjX5iDYoA0+usFreveMo8uEP+ohfh0ANvJlRkzedYw7NquI ZCcT+I1P9q8aa0528tR332VLrQeYg+kG6LVi2kAabmRA/VtEsT0w90MY/eo2vuTU WlPmbs2yerv2HTm050e6MOgBZfPh7wP/FpbjsSXufj7EDywlfxF+1hXdwfrpPJeN fN3g0kepeUp7+kLzO40FLam/z5ndjAUhyN2SBaPzGsXjMkZdETk= =zvlh -----END PGP SIGNATURE----- Merge 4.19.99 into android-4.19 Changes in 4.19.99 Revert "efi: Fix debugobjects warning on 'efi_rts_work'" xfs: Sanity check flags of Q_XQUOTARM call i2c: stm32f7: rework slave_id allocation i2c: i2c-stm32f7: fix 10-bits check in slave free id search loop mfd: intel-lpss: Add default I2C device properties for Gemini Lake SUNRPC: Fix svcauth_gss_proxy_init() powerpc/pseries: Enable support for ibm,drc-info property powerpc/archrandom: fix arch_get_random_seed_int() tipc: update mon's self addr when node addr generated tipc: fix wrong timeout input for tipc_wait_for_cond() mt7601u: fix bbp version check in mt7601u_wait_bbp_ready crypto: sun4i-ss - fix big endian issues perf map: No need to adjust the long name of modules soc: aspeed: Fix snoop_file_poll()'s return type watchdog: sprd: Fix the incorrect pointer getting from driver data ipmi: Fix memory leak in __ipmi_bmc_register drm/sti: do not remove the drm_bridge that was never added ARM: dts: at91: nattis: set the PRLUD and HIPOW signals low ARM: dts: at91: nattis: make the SD-card slot work ixgbe: don't clear IPsec sa counters on HW clearing drm/virtio: fix bounds check in virtio_gpu_cmd_get_capset() iio: fix position relative kernel version apparmor: Fix network performance issue in aa_label_sk_perm ALSA: hda: fix unused variable warning apparmor: don't try to replace stale label in ptrace access check ARM: qcom_defconfig: Enable MAILBOX firmware: coreboot: Let OF core populate platform device PCI: iproc: Remove PAXC slot check to allow VF support bridge: br_arp_nd_proxy: set icmp6_router if neigh has NTF_ROUTER drm/hisilicon: hibmc: Don't overwrite fb helper surface depth signal/ia64: Use the generic force_sigsegv in setup_frame signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn ASoC: wm9712: fix unused variable warning mailbox: mediatek: Add check for possible failure of kzalloc IB/rxe: replace kvfree with vfree IB/hfi1: Add mtu check for operational data VLs genirq/debugfs: Reinstate full OF path for domain name usb: dwc3: add EXTCON dependency for qcom usb: gadget: fsl_udc_core: check allocation return value and cleanup on failure cfg80211: regulatory: make initialization more robust mei: replace POLL* with EPOLL* for write queues. drm/msm: fix unsigned comparison with less than zero of: Fix property name in of_node_get_device_type ALSA: usb-audio: update quirk for B&W PX to remove microphone iwlwifi: nvm: get num of hw addresses from firmware staging: comedi: ni_mio_common: protect register write overflow netfilter: nft_osf: usage from output path is not valid pwm: lpss: Release runtime-pm reference from the driver's remove callback powerpc/pseries/memory-hotplug: Fix return value type of find_aa_index rtlwifi: rtl8821ae: replace _rtl8821ae_mrate_idx_to_arfr_id with generic version RDMA/bnxt_re: Add missing spin lock initialization netfilter: nf_flow_table: do not remove offload when other netns's interface is down powerpc/kgdb: add kgdb_arch_set/remove_breakpoint() tipc: eliminate message disordering during binding table update net: socionext: Add dummy PHY register read in phy_write() drm/sun4i: hdmi: Fix double flag assignation net: hns3: add error handler for hns3_nic_init_vector_data() mlxsw: reg: QEEC: Add minimum shaper fields mlxsw: spectrum: Set minimum shaper on MC TCs NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks ASoC: wm97xx: fix uninitialized regmap pointer problem ARM: dts: bcm283x: Correct mailbox register sizes pcrypt: use format specifier in kobject_add ASoC: sun8i-codec: add missing route for ADC pinctrl: meson-gxl: remove invalid GPIOX tsin_a pins bus: ti-sysc: Add mcasp optional clocks flag exportfs: fix 'passing zero to ERR_PTR()' warning drm: rcar-du: Fix the return value in case of error in 'rcar_du_crtc_set_crc_source()' drm: rcar-du: Fix vblank initialization net: always initialize pagedlen drm/dp_mst: Skip validating ports during destruction, just ref arm64: dts: meson-gx: Add hdmi_5v regulator as hdmi tx supply arm64: dts: renesas: r8a7795-es1: Add missing power domains to IPMMU nodes net: phy: Fix not to call phy_resume() if PHY is not attached IB/hfi1: Correctly process FECN and BECN in packets OPP: Fix missing debugfs supply directory for OPPs IB/rxe: Fix incorrect cache cleanup in error flow mailbox: ti-msgmgr: Off by one in ti_msgmgr_of_xlate() staging: bcm2835-camera: Abort probe if there is no camera staging: bcm2835-camera: fix module autoloading switchtec: Remove immediate status check after submitting MRPC command ipv6: add missing tx timestamping on IPPROTO_RAW pinctrl: sh-pfc: r8a7740: Add missing REF125CK pin to gether_gmii group pinctrl: sh-pfc: r8a7740: Add missing LCD0 marks to lcd0_data24_1 group pinctrl: sh-pfc: r8a7791: Remove bogus ctrl marks from qspi_data4_b group pinctrl: sh-pfc: r8a7791: Remove bogus marks from vin1_b_data18 group pinctrl: sh-pfc: sh73a0: Add missing TO pin to tpu4_to3 group pinctrl: sh-pfc: r8a7794: Remove bogus IPSR9 field pinctrl: sh-pfc: r8a77970: Add missing MOD_SEL0 field pinctrl: sh-pfc: r8a77980: Add missing MOD_SEL0 field pinctrl: sh-pfc: sh7734: Add missing IPSR11 field pinctrl: sh-pfc: r8a77995: Remove bogus SEL_PWM[0-3]_3 configurations pinctrl: sh-pfc: sh7269: Add missing PCIOR0 field pinctrl: sh-pfc: sh7734: Remove bogus IPSR10 value net: hns3: fix error handling int the hns3_get_vector_ring_chain vxlan: changelink: Fix handling of default remotes Input: nomadik-ske-keypad - fix a loop timeout test fork,memcg: fix crash in free_thread_stack on memcg charge fail clk: highbank: fix refcount leak in hb_clk_init() clk: qoriq: fix refcount leak in clockgen_init() clk: ti: fix refcount leak in ti_dt_clocks_register() clk: socfpga: fix refcount leak clk: samsung: exynos4: fix refcount leak in exynos4_get_xom() clk: imx6q: fix refcount leak in imx6q_clocks_init() clk: imx6sx: fix refcount leak in imx6sx_clocks_init() clk: imx7d: fix refcount leak in imx7d_clocks_init() clk: vf610: fix refcount leak in vf610_clocks_init() clk: armada-370: fix refcount leak in a370_clk_init() clk: kirkwood: fix refcount leak in kirkwood_clk_init() clk: armada-xp: fix refcount leak in axp_clk_init() clk: mv98dx3236: fix refcount leak in mv98dx3236_clk_init() clk: dove: fix refcount leak in dove_clk_init() MIPS: BCM63XX: drop unused and broken DSP platform device arm64: defconfig: Re-enable bcm2835-thermal driver remoteproc: qcom: q6v5-mss: Add missing clocks for MSM8996 remoteproc: qcom: q6v5-mss: Add missing regulator for MSM8996 drm: Fix error handling in drm_legacy_addctx ARM: dts: r8a7743: Remove generic compatible string from iic3 drm/etnaviv: fix some off by one bugs drm/fb-helper: generic: Fix setup error path fork, memcg: fix cached_stacks case IB/usnic: Fix out of bounds index check in query pkey RDMA/ocrdma: Fix out of bounds index check in query pkey RDMA/qedr: Fix out of bounds index check in query pkey drm/shmob: Fix return value check in shmob_drm_probe arm64: dts: apq8016-sbc: Increase load on l11 for SDCARD spi: cadence: Correct initialisation of runtime PM RDMA/iw_cxgb4: Fix the unchecked ep dereference net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ9031 memory: tegra: Don't invoke Tegra30+ specific memory timing setup on Tegra20 drm/etnaviv: NULL vs IS_ERR() buf in etnaviv_core_dump() media: s5p-jpeg: Correct step and max values for V4L2_CID_JPEG_RESTART_INTERVAL kbuild: mark prepare0 as PHONY to fix external module build crypto: brcm - Fix some set-but-not-used warning crypto: tgr192 - fix unaligned memory access ASoC: imx-sgtl5000: put of nodes if finding codec fails IB/iser: Pass the correct number of entries for dma mapped SGL net: hns3: fix wrong combined count returned by ethtool -l media: tw9910: Unregister subdevice with v4l2-async IB/mlx5: Don't override existing ip_protocol rtc: cmos: ignore bogus century byte spi/topcliff_pch: Fix potential NULL dereference on allocation error net: hns3: fix bug of ethtool_ops.get_channels for VF ARM: dts: sun8i-a23-a33: Move NAND controller device node to sort by address clk: sunxi-ng: sun8i-a23: Enable PLL-MIPI LDOs when ungating it iwlwifi: mvm: avoid possible access out of array. net/mlx5: Take lock with IRQs disabled to avoid deadlock ip_tunnel: Fix route fl4 init in ip_md_tunnel_xmit arm64: dts: allwinner: h6: Move GIC device node fix base address ordering iwlwifi: mvm: fix A-MPDU reference assignment bus: ti-sysc: Fix timer handling with drop pm_runtime_irq_safe() tty: ipwireless: Fix potential NULL pointer dereference driver: uio: fix possible memory leak in __uio_register_device driver: uio: fix possible use-after-free in __uio_register_device crypto: crypto4xx - Fix wrong ppc4xx_trng_probe()/ppc4xx_trng_remove() arguments driver core: Fix DL_FLAG_AUTOREMOVE_SUPPLIER device link flag handling driver core: Avoid careless re-use of existing device links driver core: Do not resume suppliers under device_links_write_lock() driver core: Fix handling of runtime PM flags in device_link_add() driver core: Do not call rpm_put_suppliers() in pm_runtime_drop_link() ARM: dts: lpc32xx: add required clocks property to keypad device node ARM: dts: lpc32xx: reparent keypad controller to SIC1 ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller variant ARM: dts: lpc32xx: fix ARM PrimeCell LCD controller clocks property ARM: dts: lpc32xx: phy3250: fix SD card regulator voltage drm/xen-front: Fix mmap attributes for display buffers iwlwifi: mvm: fix RSS config command staging: most: cdev: add missing check for cdev_add failure clk: ingenic: jz4740: Fix gating of UDC clock rtc: ds1672: fix unintended sign extension thermal: mediatek: fix register index error arm64: dts: msm8916: remove bogus argument to the cpu clock ath10k: fix dma unmap direction for management frames net: phy: fixed_phy: Fix fixed_phy not checking GPIO rtc: ds1307: rx8130: Fix alarm handling net/smc: original socket family in inet_sock_diag rtc: 88pm860x: fix unintended sign extension rtc: 88pm80x: fix unintended sign extension rtc: pm8xxx: fix unintended sign extension fbdev: chipsfb: remove set but not used variable 'size' iw_cxgb4: use tos when importing the endpoint iw_cxgb4: use tos when finding ipv6 routes ipmi: kcs_bmc: handle devm_kasprintf() failure case xsk: add missing smp_rmb() in xsk_mmap drm/etnaviv: potential NULL dereference ntb_hw_switchtec: debug print 64bit aligned crosslink BAR Numbers ntb_hw_switchtec: NT req id mapping table register entry number should be 512 pinctrl: sh-pfc: emev2: Add missing pinmux functions pinctrl: sh-pfc: r8a7791: Fix scifb2_data_c pin group pinctrl: sh-pfc: r8a7792: Fix vin1_data18_b pin group pinctrl: sh-pfc: sh73a0: Fix fsic_spdif pin groups RDMA/mlx5: Fix memory leak in case we fail to add an IB device driver core: Fix possible supplier PM-usage counter imbalance PCI: endpoint: functions: Use memcpy_fromio()/memcpy_toio() usb: phy: twl6030-usb: fix possible use-after-free on remove block: don't use bio->bi_vcnt to figure out segment number keys: Timestamp new keys net: dsa: b53: Fix default VLAN ID net: dsa: b53: Properly account for VLAN filtering net: dsa: b53: Do not program CPU port's PVID mt76: usb: fix possible memory leak in mt76u_buf_free media: sh: migor: Include missing dma-mapping header vfio_pci: Enable memory accesses before calling pci_map_rom hwmon: (pmbus/tps53679) Fix driver info initialization in probe routine mdio_bus: Fix PTR_ERR() usage after initialization to constant KVM: PPC: Release all hardware TCE tables attached to a group staging: r8822be: check kzalloc return or bail dmaengine: mv_xor: Use correct device for DMA API cdc-wdm: pass return value of recover_from_urb_loss brcmfmac: create debugfs files for bus-specific layer regulator: pv88060: Fix array out-of-bounds access regulator: pv88080: Fix array out-of-bounds access regulator: pv88090: Fix array out-of-bounds access net: dsa: qca8k: Enable delay for RGMII_ID mode net/mlx5: Delete unused FPGA QPN variable drm/nouveau/bios/ramcfg: fix missing parentheses when calculating RON drm/nouveau/pmu: don't print reply values if exec is false drm/nouveau: fix missing break in switch statement driver core: Fix PM-runtime for links added during consumer probe ASoC: qcom: Fix of-node refcount unbalance in apq8016_sbc_parse_of() net: dsa: fix unintended change of bridge interface STP state fs/nfs: Fix nfs_parse_devname to not modify it's argument staging: rtlwifi: Use proper enum for return in halmac_parse_psd_data_88xx powerpc/64s: Fix logic when handling unknown CPU features NFS: Fix a soft lockup in the delegation recovery code perf: Copy parent's address filter offsets on clone perf, pt, coresight: Fix address filters for vmas with non-zero offset clocksource/drivers/sun5i: Fail gracefully when clock rate is unavailable clocksource/drivers/exynos_mct: Fix error path in timer resources initialization platform/x86: wmi: fix potential null pointer dereference NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount mmc: sdhci-brcmstb: handle mmc_of_parse() errors during probe iommu: Fix IOMMU debugfs fallout ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used ARM: 8848/1: virt: Align GIC version check with arm64 counterpart ARM: 8849/1: NOMMU: Fix encodings for PMSAv8's PRBAR4/PRLAR4 regulator: wm831x-dcdc: Fix list of wm831x_dcdc_ilim from mA to uA ath10k: Fix length of wmi tlv command for protected mgmt frames netfilter: nft_set_hash: fix lookups with fixed size hash on big endian netfilter: nft_set_hash: bogus element self comparison from deactivation path net: sched: act_csum: Fix csum calc for tagged packets hwrng: bcm2835 - fix probe as platform device iommu/vt-d: Fix NULL pointer reference in intel_svm_bind_mm() NFS: Add missing encode / decode sequence_maxsz to v4.2 operations NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE() net: aquantia: fixed instack structure overflow powerpc/mm: Check secondary hash page table media: dvb/earth-pt1: fix wrong initialization for demod blocks rbd: clear ->xferred on error from rbd_obj_issue_copyup() PCI: Fix "try" semantics of bus and slot reset nios2: ksyms: Add missing symbol exports x86/mm: Remove unused variable 'cpu' scsi: megaraid_sas: reduce module load time nfp: fix simple vNIC mailbox length drivers/rapidio/rio_cm.c: fix potential oops in riocm_ch_listen() xen, cpu_hotplug: Prevent an out of bounds access net/mlx5: Fix multiple updates of steering rules in parallel net/mlx5e: IPoIB, Fix RX checksum statistics update net: sh_eth: fix a missing check of of_get_phy_mode regulator: lp87565: Fix missing register for LP87565_BUCK_0 soc: amlogic: gx-socinfo: Add mask for each SoC packages media: ivtv: update *pos correctly in ivtv_read_pos() media: cx18: update *pos correctly in cx18_read_pos() media: wl128x: Fix an error code in fm_download_firmware() media: cx23885: check allocation return regulator: tps65086: Fix tps65086_ldoa1_ranges for selector 0xB crypto: ccree - reduce kernel stack usage with clang jfs: fix bogus variable self-initialization tipc: tipc clang warning m68k: mac: Fix VIA timer counter accesses ARM: dts: sun8i: a33: Reintroduce default pinctrl muxing arm64: dts: allwinner: a64: Add missing PIO clocks ARM: dts: sun9i: optimus: Fix fixed-regulators net: phy: don't clear BMCR in genphy_soft_reset ARM: OMAP2+: Fix potentially uninitialized return value for _setup_reset() net: dsa: Avoid null pointer when failing to connect to PHY soc: qcom: cmd-db: Fix an error code in cmd_db_dev_probe() media: davinci-isif: avoid uninitialized variable use media: tw5864: Fix possible NULL pointer dereference in tw5864_handle_frame spi: tegra114: clear packed bit for unpacked mode spi: tegra114: fix for unpacked mode transfers spi: tegra114: terminate dma and reset on transfer timeout spi: tegra114: flush fifos spi: tegra114: configure dma burst size to fifo trig level bus: ti-sysc: Fix sysc_unprepare() when no clocks have been allocated soc/fsl/qe: Fix an error code in qe_pin_request() spi: bcm2835aux: fix driver to not allow 65535 (=-1) cs-gpios drm/fb-helper: generic: Call drm_client_add() after setup is done arm64/vdso: don't leak kernel addresses rtc: Fix timestamp value for RTC_TIMESTAMP_BEGIN_1900 rtc: mt6397: Don't call irq_dispose_mapping. ehea: Fix a copy-paste err in ehea_init_port_res bpf: Add missed newline in verifier verbose log drm/vmwgfx: Remove set but not used variable 'restart' scsi: qla2xxx: Unregister chrdev if module initialization fails of: use correct function prototype for of_overlay_fdt_apply() net/sched: cbs: fix port_rate miscalculation clk: qcom: Skip halt checks on gcc_pcie_0_pipe_clk for 8998 ACPI: button: reinitialize button state upon resume firmware: arm_scmi: fix of_node leak in scmi_mailbox_check rxrpc: Fix detection of out of order acks scsi: target/core: Fix a race condition in the LUN lookup code brcmfmac: fix leak of mypkt on error return path ARM: pxa: ssp: Fix "WARNING: invalid free of devm_ allocated data" PCI: rockchip: Fix rockchip_pcie_ep_assert_intx() bitwise operations net: hns3: fix for vport->bw_limit overflow problem hwmon: (w83627hf) Use request_muxed_region for Super-IO accesses perf/core: Fix the address filtering fix staging: android: vsoc: fix copy_from_user overrun PCI: dwc: Fix dw_pcie_ep_find_capability() to return correct capability offset soc: amlogic: meson-gx-pwrc-vpu: Fix power on/off register bitmask platform/x86: alienware-wmi: fix kfree on potentially uninitialized pointer tipc: set sysctl_tipc_rmem and named_timeout right range usb: typec: tcpm: Notify the tcpc to start connection-detection for SRPs selftests/ipc: Fix msgque compiler warnings net: hns3: fix loop condition of hns3_get_tx_timeo_queue_info() powerpc: vdso: Make vdso32 installation conditional in vdso_install ARM: dts: ls1021: Fix SGMII PCS link remaining down after PHY disconnect media: ov2659: fix unbalanced mutex_lock/unlock 6lowpan: Off by one handling ->nexthdr dmaengine: axi-dmac: Don't check the number of frames for alignment ALSA: usb-audio: Handle the error from snd_usb_mixer_apply_create_quirk() afs: Fix AFS file locking to allow fine grained locks afs: Further fix file locking NFS: Don't interrupt file writeout due to fatal errors coresight: catu: fix clang build warning s390/kexec_file: Fix potential segment overlap in ELF loader irqchip/gic-v3-its: fix some definitions of inner cacheability attributes scsi: qla2xxx: Fix a format specifier scsi: qla2xxx: Fix error handling in qlt_alloc_qfull_cmd() scsi: qla2xxx: Avoid that qlt_send_resp_ctio() corrupts memory KVM: PPC: Book3S HV: Fix lockdep warning when entering the guest netfilter: nft_flow_offload: add entry to flowtable after confirmation PCI: iproc: Enable iProc config read for PAXBv2 ARM: dts: logicpd-som-lv: Fix MMC1 card detect packet: in recvmsg msg_name return at least sizeof sockaddr_ll ASoC: fix valid stream condition usb: gadget: fsl: fix link error against usb-gadget module dwc2: gadget: Fix completed transfer size calculation in DDMA IB/mlx5: Add missing XRC options to QP optional params mask RDMA/rxe: Consider skb reserve space based on netdev of GID iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU net: ena: fix swapped parameters when calling ena_com_indirect_table_fill_entry net: ena: fix: Free napi resources when ena_up() fails net: ena: fix incorrect test of supported hash function net: ena: fix ena_com_fill_hash_function() implementation dmaengine: tegra210-adma: restore channel status watchdog: rtd119x_wdt: Fix remove function mmc: core: fix possible use after free of host lightnvm: pblk: fix lock order in pblk_rb_tear_down_check ath10k: Fix encoding for protected management frames afs: Fix the afs.cell and afs.volume xattr handlers vfio/mdev: Avoid release parent reference during error path vfio/mdev: Follow correct remove sequence vfio/mdev: Fix aborting mdev child device removal if one fails l2tp: Fix possible NULL pointer dereference ALSA: aica: Fix a long-time build breakage media: omap_vout: potential buffer overflow in vidioc_dqbuf() media: davinci/vpbe: array underflow in vpbe_enum_outputs() platform/x86: alienware-wmi: printing the wrong error code crypto: caam - fix caam_dump_sg that iterates through scatterlist netfilter: ebtables: CONFIG_COMPAT: reject trailing data after last rule pwm: meson: Consider 128 a valid pre-divider pwm: meson: Don't disable PWM when setting duty repeatedly ARM: riscpc: fix lack of keyboard interrupts after irq conversion nfp: bpf: fix static check error through tightening shift amount adjustment kdb: do a sanity check on the cpu in kdb_per_cpu() netfilter: nf_tables: correct NFT_LOGLEVEL_MAX value backlight: lm3630a: Return 0 on success in update_status functions thermal: rcar_gen3_thermal: fix interrupt type thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power EDAC/mc: Fix edac_mc_find() in case no device is found afs: Fix key leak in afs_release() and afs_evict_inode() afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set afs: Fix lock-wait/callback-break double locking afs: Fix double inc of vnode->cb_break ARM: dts: sun8i-h3: Fix wifi in Beelink X2 DT clk: meson: gxbb: no spread spectrum on mpll0 clk: meson: axg: spread spectrum is on mpll2 dmaengine: tegra210-adma: Fix crash during probe arm64: dts: meson: libretech-cc: set eMMC as removable RDMA/qedr: Fix incorrect device rate. spi: spi-fsl-spi: call spi_finalize_current_message() at the end crypto: ccp - fix AES CFB error exposed by new test vectors crypto: ccp - Fix 3DES complaint from ccp-crypto module serial: stm32: fix word length configuration serial: stm32: fix rx error handling serial: stm32: fix rx data length when parity enabled serial: stm32: fix transmit_chars when tx is stopped serial: stm32: Add support of TC bit status check serial: stm32: fix wakeup source initialization misc: sgi-xp: Properly initialize buf in xpc_get_rsvd_page_pa iommu: Add missing new line for dma type iommu: Use right function to get group for device signal/bpfilter: Fix bpfilter_kernl to use send_sig not force_sig signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig inet: frags: call inet_frags_fini() after unregister_pernet_subsys() net: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector crypto: talitos - fix AEAD processing. netvsc: unshare skb in VF rx handler net: core: support XDP generic on stacked devices. RDMA/uverbs: check for allocation failure in uapi_add_elm() net: don't clear sock->sk early to avoid trouble in strparser phy: qcom-qusb2: fix missing assignment of ret when calling clk_prepare_enable cpufreq: brcmstb-avs-cpufreq: Fix initial command check cpufreq: brcmstb-avs-cpufreq: Fix types for voltage/frequency clk: sunxi-ng: sun50i-h6-r: Fix incorrect W1 clock gate register media: vivid: fix incorrect assignment operation when setting video mode crypto: inside-secure - fix zeroing of the request in ahash_exit_inv crypto: inside-secure - fix queued len computation arm64: dts: renesas: ebisu: Remove renesas, no-ether-link property mpls: fix warning with multi-label encap serial: stm32: fix a recursive locking in stm32_config_rs485 arm64: dts: meson-gxm-khadas-vim2: fix gpio-keys-polled node arm64: dts: meson-gxm-khadas-vim2: fix Bluetooth support iommu/vt-d: Duplicate iommu_resv_region objects per device list phy: usb: phy-brcm-usb: Remove sysfs attributes upon driver removal firmware: arm_scmi: fix bitfield definitions for SENSOR_DESC attributes firmware: arm_scmi: update rate_discrete in clock_describe_rates_get ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev() ASoC: meson: axg-tdmin: right_j is not supported ASoC: meson: axg-tdmout: right_j is not supported qed: iWARP - Use READ_ONCE and smp_store_release to access ep->state qed: iWARP - fix uninitialized callback powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration bpf: fix the check that forwarding is enabled in bpf_ipv6_fib_lookup IB/hfi1: Handle port down properly in pio drm/msm/mdp5: Fix mdp5_cfg_init error return net: netem: fix backlog accounting for corrupted GSO frames net/udp_gso: Allow TX timestamp with UDP GSO net/af_iucv: build proper skbs for HiperTransport net/af_iucv: always register net_device notifier ASoC: ti: davinci-mcasp: Fix slot mask settings when using multiple AXRs rtc: pcf8563: Fix interrupt trigger method rtc: pcf8563: Clear event flags and disable interrupts before requesting irq ARM: dts: iwg20d-q7-common: Fix SDHI1 VccQ regularor net/sched: cbs: Fix error path of cbs_module_init arm64: dts: allwinner: h6: Pine H64: Add interrupt line for RTC drm/msm/a3xx: remove TPL1 regs from snapshot ip6_fib: Don't discard nodes with valid routing information in fib6_locate_1() perf/ioctl: Add check for the sample_period value dmaengine: hsu: Revert "set HSU_CH_MTSR to memory width" clk: qcom: Fix -Wunused-const-variable nvmem: imx-ocotp: Ensure WAIT bits are preserved when setting timing nvmem: imx-ocotp: Change TIMING calculation to u-boot algorithm tools: bpftool: use correct argument in cgroup errors backlight: pwm_bl: Fix heuristic to determine number of brightness levels fork,memcg: alloc_thread_stack_node needs to set tsk->stack bnxt_en: Fix ethtool selftest crash under error conditions. bnxt_en: Suppress error messages when querying DSCP DCB capabilities. iommu/amd: Make iommu_disable safer mfd: intel-lpss: Release IDA resources rxrpc: Fix uninitialized error code in rxrpc_send_data_packet() xprtrdma: Fix use-after-free in rpcrdma_post_recvs um: Fix IRQ controller regression on console read PM: ACPI/PCI: Resume all devices during hibernation ACPI: PM: Simplify and fix PM domain hibernation callbacks ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS fsi/core: Fix error paths on CFAM init devres: allow const resource arguments fsi: sbefifo: Don't fail operations when in SBE IPL state RDMA/hns: Fixs hw access invalid dma memory error PCI: mobiveil: Remove the flag MSI_FLAG_MULTI_PCI_MSI PCI: mobiveil: Fix devfn check in mobiveil_pcie_valid_device() PCI: mobiveil: Fix the valid check for inbound and outbound windows ceph: fix "ceph.dir.rctime" vxattr value net: pasemi: fix an use-after-free in pasemi_mac_phy_init() net/tls: fix socket wmem accounting on fallback with netem x86/pgtable/32: Fix LOWMEM_PAGES constant xdp: fix possible cq entry leak ARM: stm32: use "depends on" instead of "if" after prompt scsi: libfc: fix null pointer dereference on a null lport xfrm interface: ifname may be wrong in logs drm/panel: make drm_panel.h self-contained clk: sunxi-ng: v3s: add the missing PLL_DDR1 PM: sleep: Fix possible overflow in pm_system_cancel_wakeup() libertas_tf: Use correct channel range in lbtf_geo_init qed: reduce maximum stack frame size usb: host: xhci-hub: fix extra endianness conversion media: rcar-vin: Clean up correct notifier in error path mic: avoid statically declaring a 'struct device'. x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI crypto: ccp - Reduce maximum stack usage ALSA: aoa: onyx: always initialize register read value arm64: dts: renesas: r8a77995: Fix register range of display node tipc: reduce risk of wakeup queue starvation ARM: dts: stm32: add missing vdda-supply to adc on stm32h743i-eval net/mlx5: Fix mlx5_ifc_query_lag_out_bits cifs: fix rmmod regression in cifs.ko caused by force_sig changes iio: tsl2772: Use devm_add_action_or_reset for tsl2772_chip_off net: fix bpf_xdp_adjust_head regression for generic-XDP spi: bcm-qspi: Fix BSPI QUAD and DUAL mode support when using flex mode cxgb4: smt: Add lock for atomic_dec_and_test crypto: caam - free resources in case caam_rng registration failed ext4: set error return correctly when ext4_htree_store_dirent fails RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver ASoC: es8328: Fix copy-paste error in es8328_right_line_controls ASoC: cs4349: Use PM ops 'cs4349_runtime_pm' ASoC: wm8737: Fix copy-paste error in wm8737_snd_controls net/rds: Add a few missing rds_stat_names entries tools: bpftool: fix arguments for p_err() in do_event_pipe() tools: bpftool: fix format strings and arguments for jsonw_printf() drm: rcar-du: lvds: Fix bridge_to_rcar_lvds bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails signal: Allow cifs and drbd to receive their terminating signals powerpc/64s/radix: Fix memory hot-unplug page table split ASoC: sun4i-i2s: RX and TX counter registers are swapped dmaengine: dw: platform: Switch to acpi_dma_controller_register() rtc: rv3029: revert error handling patch to rv3029_eeprom_write() mac80211: minstrel_ht: fix per-group max throughput rate initialization i40e: reduce stack usage in i40e_set_fc media: atmel: atmel-isi: fix timeout value for stop streaming ARM: 8896/1: VDSO: Don't leak kernel addresses rtc: pcf2127: bugfix: read rtc disables watchdog mips: avoid explicit UB in assignment of mips_io_port_base media: em28xx: Fix exception handling in em28xx_alloc_urbs() iommu/mediatek: Fix iova_to_phys PA start for 4GB mode ahci: Do not export local variable ahci_em_messages rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2] Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()" hwmon: (lm75) Fix write operations for negative temperatures net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate power: supply: Init device wakeup after device_add() x86, perf: Fix the dependency of the x86 insn decoder selftest staging: greybus: light: fix a couple double frees irqdomain: Add the missing assignment of domain->fwnode for named fwnode bcma: fix incorrect update of BCMA_CORE_PCI_MDIO_DATA usb: typec: tps6598x: Fix build error without CONFIG_REGMAP_I2C bcache: Fix an error code in bch_dump_read() iio: dac: ad5380: fix incorrect assignment to val netfilter: ctnetlink: honor IPS_OFFLOAD flag ath9k: dynack: fix possible deadlock in ath_dynack_node_{de}init wcn36xx: use dynamic allocation for large variables tty: serial: fsl_lpuart: Use appropriate lpuart32_* I/O funcs ARM: dts: aspeed-g5: Fixe gpio-ranges upper limit xsk: avoid store-tearing when assigning queues xsk: avoid store-tearing when assigning umem led: triggers: Fix dereferencing of null pointer net: sonic: return NETDEV_TX_OK if failed to map buffer net: hns3: fix error VF index when setting VLAN offload rtlwifi: Fix file release memory leak ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux f2fs: fix wrong error injection path in inc_valid_block_count() f2fs: fix error path of f2fs_convert_inline_page() scsi: fnic: fix msix interrupt allocation Btrfs: fix hang when loading existing inode cache off disk Btrfs: fix inode cache waiters hanging on failure to start caching thread Btrfs: fix inode cache waiters hanging on path allocation failure btrfs: use correct count in btrfs_file_write_iter() ixgbe: sync the first fragment unconditionally hwmon: (shtc1) fix shtc1 and shtw1 id mask net: sonic: replace dev_kfree_skb in sonic_send_packet pinctrl: iproc-gpio: Fix incorrect pinconf configurations gpio/aspeed: Fix incorrect number of banks ath10k: adjust skb length in ath10k_sdio_mbox_rx_packet RDMA/cma: Fix false error message net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names' um: Fix off by one error in IRQ enumeration bnxt_en: Increase timeout for HWRM_DBG_COREDUMP_XX commands f2fs: fix to avoid accessing uninitialized field of inode page in is_alive() mailbox: qcom-apcs: fix max_register value clk: actions: Fix factor clk struct member access powerpc/mm/mce: Keep irqs disabled during lockless page table walk bpf: fix BTF limits crypto: hisilicon - Matching the dma address for dma_pool_free() iommu/amd: Wait for completion of IOTLB flush in attach_device net: aquantia: Fix aq_vec_isr_legacy() return value cxgb4: Signedness bug in init_one() net: hisilicon: Fix signedness bug in hix5hd2_dev_probe() net: broadcom/bcmsysport: Fix signedness in bcm_sysport_probe() net: netsec: Fix signedness bug in netsec_probe() net: socionext: Fix a signedness bug in ave_probe() net: stmmac: dwmac-meson8b: Fix signedness bug in probe net: axienet: fix a signedness bug in probe of: mdio: Fix a signedness bug in of_phy_get_and_connect() net: nixge: Fix a signedness bug in nixge_probe() net: ethernet: stmmac: Fix signedness bug in ipq806x_gmac_of_parse() net: sched: cbs: Avoid division by zero when calculating the port rate nvme: retain split access workaround for capability reads net: stmmac: gmac4+: Not all Unicast addresses may be available rxrpc: Fix trace-after-put looking at the put connection record mac80211: accept deauth frames in IBSS mode llc: fix another potential sk_buff leak in llc_ui_sendmsg() llc: fix sk_buff refcounting in llc_conn_state_process() ip6erspan: remove the incorrect mtu limit for ip6erspan net: stmmac: fix length of PTP clock's name string net: stmmac: fix disabling flexible PPS output sctp: add chunks to sk_backlog when the newsk sk_socket is not set s390/qeth: Fix error handling during VNICC initialization s390/qeth: Fix initialization of vnicc cmd masks during set online act_mirred: Fix mirred_init_module error handling net: avoid possible false sharing in sk_leave_memory_pressure() net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head tcp: annotate lockless access to tcp_memory_pressure net/smc: receive returns without data net/smc: receive pending data after RCV_SHUTDOWN drm/msm/dsi: Implement reset correctly vhost/test: stop device before reset dmaengine: imx-sdma: fix size check for sdma script_number firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices arm64: hibernate: check pgd table allocation net: netem: fix error path for corrupted GSO frames net: netem: correct the parent's backlog when corrupted packet was dropped xsk: Fix registration of Rx-only sockets bpf, offload: Unlock on error in bpf_offload_dev_create() afs: Fix missing timeout reset net: qca_spi: Move reset_count to struct qcaspi hv_netvsc: Fix offset usage in netvsc_send_table() hv_netvsc: Fix send_table offset in case of a host bug afs: Fix large file support drm: panel-lvds: Potential Oops in probe error handling hwrng: omap3-rom - Fix missing clock by probing with device tree dpaa_eth: perform DMA unmapping before read dpaa_eth: avoid timestamp read on error paths MIPS: Loongson: Fix return value of loongson_hwmon_init hv_netvsc: flag software created hash value net: neigh: use long type to store jiffies delta packet: fix data-race in fanout_flow_is_huge() i2c: stm32f7: report dma error during probe mmc: sdio: fix wl1251 vendor id mmc: core: fix wl1251 sdio quirks affs: fix a memory leak in affs_remount afs: Remove set but not used variables 'before', 'after' dmaengine: ti: edma: fix missed failure handling drm/radeon: fix bad DMA from INTERRUPT_CNTL2 arm64: dts: juno: Fix UART frequency samples/bpf: Fix broken xdp_rxq_info due to map order assumptions usb: dwc3: Allow building USB_DWC3_QCOM without EXTCON IB/iser: Fix dma_nents type definition serial: stm32: fix clearing interrupt error flags arm64: dts: meson-gxm-khadas-vim2: fix uart_A bluetooth node m68k: Call timer_interrupt() with interrupts disabled Linux 4.19.99 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ieabeab79ea5c8cb4b6b1552702fa5d6100cea5db |
||
Andrea Arcangeli
|
fde68698dd |
fork,memcg: alloc_thread_stack_node needs to set tsk->stack
[ Upstream commit 1bf4580e00a248a2c86269125390eb3648e1877c ] Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail") corrected two instances, but there was a third instance of this bug. Without setting tsk->stack, if memcg_charge_kernel_stack fails, it'll execute free_thread_stack() on a dangling pointer. Enterprise kernels are compiled with VMAP_STACK=y so this isn't critical, but custom VMAP_STACK=n builds should have some performance advantage, with the drawback of risking to fail fork because compaction didn't succeed. So as long as VMAP_STACK=n is a supported option it's worth fixing it upstream. Link: http://lkml.kernel.org/r/20190619011450.28048-1-aarcange@redhat.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Shakeel Butt
|
3ed8ca4d29 |
fork, memcg: fix cached_stacks case
[ Upstream commit ba4a45746c362b665e245c50b870615f02f34781 ] Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail") fixes a crash caused due to failed memcg charge of the kernel stack. However the fix misses the cached_stacks case which this patch fixes. So, the same crash can happen if the memcg charge of a cached stack is failed. Link: http://lkml.kernel.org/r/20190102180145.57406-1-shakeelb@google.com Fixes: 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on memcg charge fail") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Rik van Riel <riel@surriel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Rik van Riel
|
641164565b |
fork,memcg: fix crash in free_thread_stack on memcg charge fail
[ Upstream commit 5eed6f1dff87bfb5e545935def3843edf42800f2 ] Commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will result in fork failing if allocating a kernel stack for a task in dup_task_struct exceeds the kernel memory allowance for that cgroup. Unfortunately, it also results in a crash. This is due to the code jumping to free_stack and calling free_thread_stack when the memcg kernel stack charge fails, but without tsk->stack pointing at the freshly allocated stack. This in turn results in the vfree_atomic in free_thread_stack oopsing with a backtrace like this: #5 [ffffc900244efc88] die at ffffffff8101f0ab #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86 #7 [ffffc900244efce0] general_protection at ffffffff818ff082 [exception RIP: llist_add_batch+7] RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000 RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0 R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37 #10 [ffffc900244efe98] _do_fork at ffffffff810884e0 #11 [ffffc900244eff10] sys_vfork at ffffffff810887ff #12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43 RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948 RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421 RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00 R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040 R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018 ORIG_RAX: 000000000000003a CS: 0033 SS: 002b The simplest fix is to assign tsk->stack right where it is allocated. Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Greg Kroah-Hartman
|
291d853dff |
This is the 4.19.88 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl3owgEACgkQONu9yGCS aT43zw//SS1As83XXuHr4mdWIVDjXo6RMJ6Ib7YbRi/uhBmQuUuGVFcqGxUIA9Kl eSXu5Kt8TNmInzHq9AMYgegrELAEwPD2XfptALGDwiUHonQuiFaqOQn/bltJOm1L PsG15A7+/gFhuhPJDp2ZfNBmZGdpXdIwD27oUDqF1XD64dMa/HPbFUVgxWn3HHkd sm0J6Ez0eNA+BmLnHXYDiSaEYIiwvy1nN6XpyIfOyb2Tz6kPoe0vVWU00Cmy8KAU EIWB+TBRunspgMsShL5Cl1MSFOxf9QOmgnZxcrODAQfb1TbLMACB1FGMjK4nLm+3 wPlSnC7L49ARl/pvmN5NOUrjHi8S8qq/Od9QW+UIckRI6KzOU832h99v4gFuHjSC KFiLi5K9+uTIMgNOETmINBiKKUcUzYXYVajvm4tuAUq3HO8wy6jeALtt34OiJZQZ DV8wyBdL9NDUFqBymFaMFA4Us/fGIREzvPgI0E0jth2ANuLFLtScrnStuWv8buwJ JT3V9xCxHZtZ3Ctevx/Jp6OaQtnbSnWjMjrO0UDzZ6N7+g5UKmh9/R3xL6sBpFVU Vu49J+qWU3VmbY3EIulel+yARNe7xS4ExK185JmNzpYFyOpXum14FHhhtQ6xNSeu dRqyITI0KYP7jWtBDKCgVAWF5jC9gHP1ksrHSZMhyGrv1dC1XZM= =KnJW -----END PGP SIGNATURE----- Merge 4.19.88 into android-4.19 Changes in 4.19.88 clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate clocksource/drivers/mediatek: Fix error handling ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX ASoC: compress: fix unsigned integer overflow check reset: Fix memory leak in reset_control_array_put() clk: samsung: exynos5433: Fix error paths ASoC: kirkwood: fix external clock probe defer ASoC: kirkwood: fix device remove ordering clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume pinctrl: cherryview: Allocate IRQ chip dynamic ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts reset: fix reset_control_ops kerneldoc comment clk: at91: avoid sleeping early clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18 ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend samples/bpf: fix build by setting HAVE_ATTR_TEST to zero powerpc/bpf: Fix tail call implementation idr: Fix integer overflow in idr_for_each_entry idr: Fix idr_alloc_u32 on 32-bit systems x86/resctrl: Prevent NULL pointer dereference when reading mondata clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call clk: ti: clkctrl: Fix failed to enable error with double udelay timeout net: fec: add missed clk_disable_unprepare in remove bridge: ebtables: don't crash when using dnat target in output chains can: peak_usb: report bus recovery as well can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition watchdog: meson: Fix the wrong value of left time ASoC: stm32: sai: add restriction on mmap support scripts/gdb: fix debugging modules compiled with hot/cold partitioning net: bcmgenet: use RGMII loopback for MAC reset net: bcmgenet: reapply manual settings to the PHY net: mscc: ocelot: fix __ocelot_rmw_ix prototype ceph: return -EINVAL if given fsc mount option on kernel w/o support net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix station inactive_time shortly after boot block: drbd: remove a stray unlock in __drbd_send_protocol() pwm: bcm-iproc: Prevent unloading the driver module while in use scsi: target/tcmu: Fix queue_cmd_ring() declaration scsi: lpfc: Fix kernel Oops due to null pring pointers scsi: lpfc: Fix dif and first burst use in write commands ARM: dts: Fix up SQ201 flash access tracing: Lock event_mutex before synth_event_mutex ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed ARM: dts: imx51: Fix memory node duplication ARM: dts: imx53: Fix memory node duplication ARM: dts: imx31: Fix memory node duplication ARM: dts: imx35: Fix memory node duplication ARM: dts: imx7: Fix memory node duplication ARM: dts: imx6ul: Fix memory node duplication ARM: dts: imx6sx: Fix memory node duplication ARM: dts: imx6sl: Fix memory node duplication ARM: dts: imx50: Fix memory node duplication ARM: dts: imx23: Fix memory node duplication ARM: dts: imx1: Fix memory node duplication ARM: dts: imx27: Fix memory node duplication ARM: dts: imx25: Fix memory node duplication ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication parisc: Fix serio address output parisc: Fix HP SDC hpa address output ARM: dts: Fix hsi gdd range for omap4 arm64: mm: Prevent mismatched 52-bit VA support arm64: smp: Handle errors reported by the firmware bus: ti-sysc: Check for no-reset and no-idle flags at the child level platform/x86: mlx-platform: Fix LED configuration ARM: OMAP1: fix USB configuration for device-only setups RDMA/hns: Fix the bug while use multi-hop of pbl arm64: preempt: Fix big-endian when checking preempt count in assembly RDMA/vmw_pvrdma: Use atomic memory allocation in create AH PM / AVS: SmartReflex: NULL check before some freeing functions is not needed xfs: zero length symlinks are not valid ARM: ks8695: fix section mismatch warning ACPI / LPSS: Ignore acpi_device_fix_up_power() return value scsi: lpfc: Enable Management features for IF_TYPE=6 scsi: qla2xxx: Fix NPIV handling for FC-NVMe scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port nvme: provide fallback for discard alloc failure s390/zcrypt: make sysfs reset attribute trigger queue reset crypto: user - support incremental algorithm dumps arm64: dts: renesas: draak: Fix CVBS input mwifiex: fix potential NULL dereference and use after free mwifiex: debugfs: correct histogram spacing, formatting brcmfmac: set F2 watermark to 256 for 4373 brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373 rtl818x: fix potential use after free bcache: do not check if debug dentry is ERR or NULL explicitly on remove bcache: do not mark writeback_running too early xfs: require both realtime inodes to mount nvme: fix kernel paging oops ubifs: Fix default compression selection in ubifs ubi: Put MTD device after it is not used ubi: Do not drop UBI device reference before using microblaze: adjust the help to the real behavior microblaze: move "... is ready" messages to arch/microblaze/Makefile microblaze: fix multiple bugs in arch/microblaze/boot/Makefile iwlwifi: move iwl_nvm_check_version() into dvm iwlwifi: mvm: force TCM re-evaluation on TCM resume iwlwifi: pcie: fix erroneous print iwlwifi: pcie: set cmd_len in the correct place gpio: pca953x: Fix AI overflow on PCAL6524 gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB kvm: vmx: Set IA32_TSC_AUX for legacy mode guests Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS" Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()" crypto/chelsio/chtls: listen fails with multiadapt VSOCK: bind to random port for VMADDR_PORT_ANY mmc: meson-gx: make sure the descriptor is stopped on errors mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET usb: ehci-omap: Fix deferred probe for phy handling btrfs: Check for missing device before bio submission in btrfs_map_bio btrfs: fix ncopies raid_attr for RAID56 btrfs: dev-replace: set result code of cancel by status of scrub Btrfs: allow clear_extent_dirty() to receive a cached extent state record btrfs: only track ref_heads in delayed_ref_updates serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback HID: intel-ish-hid: fixes incorrect error handling gpio: raspberrypi-exp: decrease refcount on firmware dt node serial: 8250: Rate limit serial port rx interrupts during input overruns kprobes/x86/xen: blacklist non-attachable xen interrupt functions xen/pciback: Check dev_data before using it kprobes: Blacklist symbols in arch-defined prohibited area kprobes/x86: Show x86-64 specific blacklisted symbols correctly vfio-mdev/samples: Use u8 instead of char for handle functions memory: omap-gpmc: Get the header of the enum pinctrl: xway: fix gpio-hog related boot issues net/mlx5: Continue driver initialization despite debugfs failure netfilter: nf_nat_sip: fix RTP/RTCP source port translations exofs_mount(): fix leaks on failure exits bnxt_en: Return linux standard errors in bnxt_ethtool.c bnxt_en: Save ring statistics before reset. bnxt_en: query force speeds before disabling autoneg mode. KVM: s390: unregister debug feature on failing arch init pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10 HID: doc: fix wrong data structure reference for UHID_OUTPUT dm flakey: Properly corrupt multi-page bios. gfs2: take jdata unstuff into account in do_grow dm raid: fix false -EBUSY when handling check/repair message xfs: Align compat attrlist_by_handle with native implementation. xfs: Fix bulkstat compat ioctls on x32 userspace. IB/qib: Fix an error code in qib_sdma_verbs_send() clocksource/drivers/fttmr010: Fix invalid interrupt register access vxlan: Fix error path in __vxlan_dev_create() powerpc/book3s/32: fix number of bats in p/v_block_mapped() powerpc/xmon: fix dump_segments() drivers/regulator: fix a missing check of return value Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading serial: max310x: Fix tx_empty() callback openrisc: Fix broken paths to arch/or32 RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer scsi: qla2xxx: deadlock by configfs_depend_item scsi: csiostor: fix incorrect dma device in case of vport brcmfmac: Fix access point mode ath6kl: Only use match sets when firmware supports it ath6kl: Fix off by one error in scan completion powerpc/perf: Fix unit_sel/cache_sel checks powerpc/32: Avoid unsupported flags with clang powerpc/prom: fix early DEBUG messages powerpc/mm: Make NULL pointer deferences explicit on bad page faults. powerpc/44x/bamboo: Fix PCI range vfio/spapr_tce: Get rid of possible infinite loop powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status drbd: ignore "all zero" peer volume sizes in handshake drbd: reject attach of unsuitable uuids even if connected drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix print_st_err()'s prototype to match the definition IB/rxe: Make counters thread safe bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't regulator: tps65910: fix a missing check of return value powerpc/83xx: handle machine check caused by watchdog timer powerpc/pseries: Fix node leak in update_lmb_associativity_index() powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y crypto: mxc-scc - fix build warnings on ARM64 pwm: clps711x: Fix period calculation net/netlink_compat: Fix a missing check of nla_parse_nested net/net_namespace: Check the return value of register_pernet_subsys() f2fs: fix block address for __check_sit_bitmap f2fs: fix to dirty inode synchronously um: Include sys/uio.h to have writev() um: Make GCOV depend on !KCOV net: (cpts) fix a missing check of clk_prepare net: stmicro: fix a missing check of clk_prepare net: dsa: bcm_sf2: Propagate error value from mdio_write atl1e: checking the status of atl1e_write_phy_reg tipc: fix a missing check of genlmsg_put net: marvell: fix a missing check of acpi_match_device net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe() ocfs2: clear journal dirty flag after shutdown journal vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n mm/page_alloc.c: free order-0 pages through PCP in page_frag_free() mm/page_alloc.c: use a single function to free page mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures netfilter: nf_tables: fix a missing check of nla_put_failure xprtrdma: Prevent leak of rpcrdma_rep objects infiniband: bnxt_re: qplib: Check the return value of send_message infiniband/qedr: Potential null ptr dereference of qp firmware: arm_sdei: fix wrong of_node_put() in init function firmware: arm_sdei: Fix DT platform device creation lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk lib/genalloc.c: use vzalloc_node() to allocate the bitmap fork: fix some -Wmissing-prototypes warnings drivers/base/platform.c: kmemleak ignore a known leak lib/genalloc.c: include vmalloc.h mtd: Check add_mtd_device() ret code tipc: fix memory leak in tipc_nl_compat_publ_dump net/core/neighbour: tell kmemleak about hash tables ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity() net/core/neighbour: fix kmemleak minimal reference count for hash tables serial: 8250: Fix serial8250 initialization crash gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel decnet: fix DN_IFREQ_SIZE net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() net/smc: don't wait for send buffer space when data was already sent mm/hotplug: invalid PFNs from pfn_to_online_page() xfs: end sync buffer I/O properly on shutdown error net/smc: fix sender_free computation blktrace: Show requests without sector net/smc: fix byte_order for rx_curs_confirmed tipc: fix skb may be leaky in tipc_link_input ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI sfc: initialise found bitmap in efx_ef10_mtd_probe geneve: change NET_UDP_TUNNEL dependency to select net: fix possible overflow in __sk_mem_raise_allocated() net: ip_gre: do not report erspan_ver for gre or gretap net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap sctp: don't compare hb_timer expire date before starting it bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() mmc: core: align max segment size with logical block size net: dev: Use unsigned integer as an argument to left-shift kvm: properly check debugfs dentry before using it bpf: drop refcount if bpf_map_new_fd() fails in map_create() net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED net: hns3: fix PFC not setting problem for DCB module net: hns3: fix an issue for hclgevf_ae_get_hdev net: hns3: fix an issue for hns3_update_new_int_gl iommu/amd: Fix NULL dereference bug in match_hid_uid apparmor: delete the dentry in aafs_remove() to avoid a leak scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery ACPI / APEI: Don't wait to serialise with oops messages when panic()ing ACPI / APEI: Switch estatus pool to use vmalloc memory scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned scsi: libsas: Check SMP PHY control function result RDMA/hns: Fix the bug with updating rq head pointer when flush cqe RDMA/hns: Bugfix for the scene without receiver queue RDMA/hns: Fix the state of rereg mr RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() xdp: fix cpumap redirect SKB creation bug mtd: Remove a debug trace in mtdpart.c mm, gup: add missing refcount overflow checks on s390 clk: at91: fix update bit maps on CFG_MOR write clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated() usb: dwc2: use a longer core rest timeout in dwc2_core_reset() staging: rtl8192e: fix potential use after free staging: rtl8723bs: Drop ACPI device ids staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P mei: bus: prefix device names on bus with the bus name mei: me: add comet point V device id thunderbolt: Power cycle the router if NVM authentication fails xfrm: Fix memleak on xfrm state destroy media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE net: macb: fix error format in dev_err() pwm: Clear chip_data in pwm_put() media: atmel: atmel-isc: fix asd memory allocation media: atmel: atmel-isc: fix INIT_WORK misplacement macvlan: schedule bc_work even if error net: psample: fix skb_over_panic openvswitch: fix flow command message size sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook slip: Fix use-after-free Read in slip_open openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() openvswitch: remove another BUG_ON() selftests: bpf: test_sockmap: handle file creation failures gracefully tipc: fix link name length check sctp: cache netns in sctp_ep_common net: sched: fix `tc -s class show` no bstats on class with nolock subqueues net: macb: add missed tasklet_kill ext4: add more paranoia checking in ext4_expand_extra_isize handling watchdog: sama5d4: fix WDD value to be always set to max net: macb: Fix SUBNS increment and increase resolution net: macb driver, check for SKBTX_HW_TSTAMP mtd: rawnand: atmel: Fix spelling mistake in error message mtd: rawnand: atmel: fix possible object reference leak mtd: spi-nor: cast to u64 to avoid uint overflows drm/atmel-hlcdc: revert shift by 8 mailbox: stm32_ipcc: add spinlock to fix channels concurrent access tcp: exit if nothing to retransmit on RTO timeout HID: core: check whether Usage Page item is after Usage ID items crypto: stm32/hash - Fix hmac issue more than 256 bytes media: stm32-dcmi: fix DMA corruption when stopping streaming media: stm32-dcmi: fix check of pm_runtime_get_sync return value hwrng: stm32 - fix unbalanced pm_runtime_enable clk: stm32mp1: fix HSI divider flag clk: stm32mp1: fix mcu divider table clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks clk: stm32mp1: parent clocks update mailbox: mailbox-test: fix null pointer if no mmio pinctrl: stm32: fix memory leak issue ASoC: stm32: i2s: fix dma configuration ASoC: stm32: i2s: fix 16 bit format support ASoC: stm32: i2s: fix IRQ clearing ASoC: stm32: sai: add missing put_device() dmaengine: stm32-dma: check whether length is aligned on FIFO threshold platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size net: fec: fix clock count mis-match Linux 4.19.88 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifd3801a77cb551be72788031e7fcfc8a1d4fd197 |
||
Yi Wang
|
fc38279ec5 |
fork: fix some -Wmissing-prototypes warnings
[ Upstream commit fb5bf31722d0805a3f394f7d59f2e8cd07acccb7 ]
We get a warning when building kernel with W=1:
kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]
Add the missing declaration in head file to fix this.
Also, remove arch_release_thread_stack() completely because no arch
seems to implement it since
|
||
Sami Tolvanen
|
16e13c60ff |
FROMLIST: add support for Clang's Shadow Call Stack (SCS)
This change adds generic support for Clang's Shadow Call Stack, which uses a shadow stack to protect return addresses from being overwritten by an attacker. Details are available here: https://clang.llvm.org/docs/ShadowCallStack.html Note that security guarantees in the kernel differ from the ones documented for user space. The kernel must store addresses of shadow stacks used by other tasks and interrupt handlers in memory, which means an attacker capable reading and writing arbitrary memory may be able to locate them and hijack control flow by modifying shadow stacks that are not currently in use. Bug: 145210207 Change-Id: Ia5f1650593fa95da4efcf86f84830a20989f161c (am from https://lore.kernel.org/patchwork/patch/1149054/) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> |
||
Greg Kroah-Hartman
|
d8a623cfbb |
This is the 4.19.80 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl2o0vkACgkQONu9yGCS aT50rQ//So/ShGy5+D6cmsPVbd8p/9X0fM5D6VjgJ2pj+HbalxW7JhJ0tyVdfkO1 AS6p/24bmUMWI3pxJBbJjfggbyleV2UFqlHIei+SjfHd0sghZ+E/XY7mLdcKvx0a dGdXuxF+enb18pNiUKUrJUgqksgfO40yMXyZhjGf5A+/JmYsNaJhkyCWgPfGIB5i YuowOWC4Rkds/CQxyRSRJl+YSQWh8j7Ke0tdjhMQxnmyzFD0wcE5vUtq2vIP7yxO ypPrz9zWTase3y1yCcPsXd0m4Ji1VpF99hlgAr0HLFNcZn2kNYFpyblfOrI/zfoK RqUJgVVlZaWvhMOrueQO+XbebXdCWTJHiNTUDxIFakEAPNz+XkTQQf6LvzOjcFxB oLHnp1qBCn72y6o6Q3XmauFcmYBokyfBvXDAJsXYr+X5B4akn/fpyzzBCInX0h+r og5zpLbXDLszG3j76TPSpcjd+t4xqMy+MG+jkH/mLtMAFyaVi2sfMzsZJAQCACr2 wNHRFQQGjDmQjovkwSRYENQBUuITSj+VrBUD0M7lnfA/Fni3BAJaxY5I6WeEvbeW ejzPVFZB8dfEy7hAKKiVEuI8/akcp8MUXfLJOfTxytLIpYllOBoVC4LtjgO5FYu3 grSbRuowjrlUCtnCU9H18avLE1ScFFEGTmPOqI6xDqKiZf59QsQ= =sKVA -----END PGP SIGNATURE----- Merge 4.19.80 into android-4.19 Changes in 4.19.80 panic: ensure preemption is disabled during panic() f2fs: use EINVAL for superblock with invalid magic USB: rio500: Remove Rio 500 kernel driver USB: yurex: Don't retry on unexpected errors USB: yurex: fix NULL-derefs on disconnect USB: usb-skeleton: fix runtime PM after driver unbind USB: usb-skeleton: fix NULL-deref on disconnect xhci: Fix false warning message about wrong bounce buffer write length xhci: Prevent device initiated U1/U2 link pm if exit latency is too long xhci: Check all endpoints for LPM timeout xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts usb: xhci: wait for CNR controller not ready bit in xhci resume xhci: Prevent deadlock when xhci adapter breaks during init xhci: Increase STS_SAVE timeout in xhci_suspend() USB: adutux: fix use-after-free on disconnect USB: adutux: fix NULL-derefs on disconnect USB: adutux: fix use-after-free on release USB: iowarrior: fix use-after-free on disconnect USB: iowarrior: fix use-after-free on release USB: iowarrior: fix use-after-free after driver unbind USB: usblp: fix runtime PM after driver unbind USB: chaoskey: fix use-after-free on release USB: ldusb: fix NULL-derefs on driver unbind serial: uartlite: fix exit path null pointer USB: serial: keyspan: fix NULL-derefs on open() and write() USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20 USB: serial: option: add Telit FN980 compositions USB: serial: option: add support for Cinterion CLS8 devices USB: serial: fix runtime PM after driver unbind USB: usblcd: fix I/O after disconnect USB: microtek: fix info-leak at probe USB: dummy-hcd: fix power budget for SuperSpeed mode usb: renesas_usbhs: gadget: Do not discard queues in usb_ep_set_{halt,wedge}() usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior USB: legousbtower: fix slab info leak at probe USB: legousbtower: fix deadlock on disconnect USB: legousbtower: fix potential NULL-deref on disconnect USB: legousbtower: fix open after failed reset request USB: legousbtower: fix use-after-free on release mei: me: add comet point (lake) LP device ids mei: avoid FW version request on Ibex Peak and earlier gpio: eic: sprd: Fix the incorrect EIC offset when toggling Staging: fbtft: fix memory leak in fbtft_framebuffer_alloc staging: vt6655: Fix memory leak in vt6655_probe iio: adc: hx711: fix bug in sampling of data iio: adc: ad799x: fix probe error handling iio: adc: axp288: Override TS pin bias current for some models iio: light: opt3001: fix mutex unlock race efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified perf llvm: Don't access out-of-scope array perf inject jit: Fix JIT_CODE_MOVE filename blk-wbt: fix performance regression in wbt scale_up/scale_down CIFS: Gracefully handle QueryInfo errors during open CIFS: Force revalidate inode when dentry is stale CIFS: Force reval dentry if LOOKUP_REVAL flag is set kernel/sysctl.c: do not override max_threads provided by userspace mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() firmware: google: increment VPD key_len properly gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source iio: adc: stm32-adc: move registers definitions iio: adc: stm32-adc: fix a race when using several adcs with dma and irq cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic btrfs: fix incorrect updating of log root tree btrfs: fix uninitialized ret in ref-verify NFS: Fix O_DIRECT accounting of number of bytes read/written MIPS: Disable Loongson MMI instructions for kernel build MIPS: elf_hwcap: Export userspace ASEs ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags ACPI/PPTT: Add support for ACPI 6.3 thread flag arm64: topology: Use PPTT to determine if PE is a thread Fix the locking in dcache_readdir() and friends media: stkwebcam: fix runtime PM after driver unbind arm64/sve: Fix wrong free for task->thread.sve_state tracing/hwlat: Report total time spent in all NMIs during the sample tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency ftrace: Get a reference counter for the trace_array on filter files tracing: Get trace_array reference for available_tracers files hwmon: Fix HWMON_P_MIN_ALARM mask x86/asm: Fix MWAITX C-state hint value PCI: vmd: Fix config addressing when using bus offsets perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization Linux 4.19.80 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I8ef1835585f79b752712a835852a4bc960ae3b97 |
||
Michal Hocko
|
7bbe6eefdb |
kernel/sysctl.c: do not override max_threads provided by userspace
commit b0f53dbc4bc4c371f38b14c391095a3bb8a0bb40 upstream. Partially revert |
||
Joel Fernandes (Google)
|
b3481301f4 |
UPSTREAM: pidfd: add polling support
This patch adds polling support to pidfd. Android low memory killer (LMK) needs to know when a process dies once it is sent the kill signal. It does so by checking for the existence of /proc/pid which is both racy and slow. For example, if a PID is reused between when LMK sends a kill signal and checks for existence of the PID, since the wrong PID is now possibly checked for existence. Using the polling support, LMK will be able to get notified when a process exists in race-free and fast way, and allows the LMK to do other things (such as by polling on other fds) while awaiting the process being killed to die. For notification to polling processes, we follow the same existing mechanism in the kernel used when the parent of the task group is to be notified of a child's death (do_notify_parent). This is precisely when the tasks waiting on a poll of pidfd are also awakened in this patch. We have decided to include the waitqueue in struct pid for the following reasons: 1. The wait queue has to survive for the lifetime of the poll. Including it in task_struct would not be option in this case because the task can be reaped and destroyed before the poll returns. 2. By including the struct pid for the waitqueue means that during de_thread(), the new thread group leader automatically gets the new waitqueue/pid even though its task_struct is different. Appropriate test cases are added in the second patch to provide coverage of all the cases the patch is handling. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Jonathan Kowalski <bl0pbl33p@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: kernel-team@android.com Reviewed-by: Oleg Nesterov <oleg@redhat.com> Co-developed-by: Daniel Colascione <dancol@google.com> Signed-off-by: Daniel Colascione <dancol@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Christian Brauner <christian@brauner.io> (cherry picked from commit b53b0b9d9a613c418057f6cb921c2f40a6f78c24) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: I0391acb666b34e33ef60a778dffdf12ed3654393 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Christian Brauner
|
5013eebdf4 |
UPSTREAM: fork: do not release lock that wasn't taken
Avoid calling cgroup_threadgroup_change_end() without having called cgroup_threadgroup_change_begin() first. During process creation we need to check whether the cgroup we are in allows us to fork. To perform this check the cgroup needs to guard itself against threadgroup changes and takes a lock. Prior to CLONE_PIDFD the cleanup target "bad_fork_free_pid" would also need to call cgroup_threadgroup_change_end() because said lock had already been taken. However, this is not the case anymore with the addition of CLONE_PIDFD. We are now allocating a pidfd before we check whether the cgroup we're in can fork and thus prior to taking the lock. So when copy_process() fails at the right step it would release a lock we haven't taken. This bug is not even very subtle to be honest. It's just not very clear from the naming of cgroup_threadgroup_change_{begin,end}() that a lock is taken. Here's the relevant splat: entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7fec849 Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078 RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000 RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(depth <= 0) WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 __lock_release kernel/locking/lockdep.c:4052 [inline] WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 7744 Comm: syz-executor007 Not tainted 5.1.0+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 panic+0x2cb/0x65c kernel/panic.c:214 __warn.cold+0x20/0x45 kernel/panic.c:566 report_bug+0x263/0x2b0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:179 [inline] fixup_bug arch/x86/kernel/traps.c:174 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972 RIP: 0010:__lock_release kernel/locking/lockdep.c:4052 [inline] RIP: 0010:lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321 Code: 0f 85 a0 03 00 00 8b 35 77 66 08 08 85 f6 75 23 48 c7 c6 a0 55 6b 87 48 c7 c7 40 25 6b 87 4c 89 85 70 ff ff ff e8 b7 a9 eb ff <0f> 0b 4c 8b 85 70 ff ff ff 4c 89 ea 4c 89 e6 4c 89 c7 e8 52 63 ff RSP: 0018:ffff888094117b48 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 1ffff11012822f6f RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815af236 RDI: ffffed1012822f5b RBP: ffff888094117c00 R08: ffff888092bfc400 R09: fffffbfff113301d R10: fffffbfff113301c R11: ffffffff889980e3 R12: ffffffff8a451df8 R13: ffffffff8142e71f R14: ffffffff8a44cc80 R15: ffff888094117bd8 percpu_up_read.constprop.0+0xcb/0x110 include/linux/percpu-rwsem.h:92 cgroup_threadgroup_change_end include/linux/cgroup-defs.h:712 [inline] copy_process.part.0+0x47ff/0x6710 kernel/fork.c:2222 copy_process kernel/fork.c:1772 [inline] _do_fork+0x25d/0xfd0 kernel/fork.c:2338 __do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline] __se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline] __ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236 do_syscall_32_irqs_on arch/x86/entry/common.c:334 [inline] do_fast_syscall_32+0x281/0xd54 arch/x86/entry/common.c:405 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7fec849 Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078 RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000 RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. Reported-and-tested-by: syzbot+3286e58549edc479faae@syzkaller.appspotmail.com Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Christian Brauner <christian@brauner.io> (cherry picked from commit c3b7112df86b769927a60a6d7175988ca3d60f09) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: I3878b8796347676a4d013180c213e5d4830ee30d Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Christian Brauner
|
66faab946a |
UPSTREAM: clone: add CLONE_PIDFD
This patchset makes it possible to retrieve pid file descriptors at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. As spotted by Linus, there is exactly one bit for clone() left. CLONE_PIDFD creates file descriptors based on the anonymous inode implementation in the kernel that will also be used to implement the new mount api. They serve as a simple opaque handle on pids. Logically, this makes it possible to interpret a pidfd differently, narrowing or widening the scope of various operations (e.g. signal sending). Thus, a pidfd cannot just refer to a tgid, but also a tid, or in theory - given appropriate flag arguments in relevant syscalls - a process group or session. A pidfd does not represent a privilege. This does not imply it cannot ever be that way but for now this is not the case. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the parent_tidptr argument of clone. This has the advantage that we can give back the associated pid and the pidfd at the same time. To remove worries about missing metadata access this patchset comes with a sample program that illustrates how a combination of CLONE_PIDFD, and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. The sample program can easily be translated into a helper that would be suitable for inclusion in libc so that users don't have to worry about writing it themselves. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit b3e5838252665ee4cfa76b82bdf1198dca81e5be) Bug: 135608568 Test: test program using syscall(__NR_sys_pidfd_open,..) and poll() Change-Id: I8a8f87e8fb23de0adb6d6acf2e622926b7a9f55c Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Greg Kroah-Hartman
|
844ecc4634 |
This is the 4.19.64 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GibIACgkQONu9yGCS aT7z2hAAmv8AsH9IG43m7t6zLroJVswr/9594xk7yPBQgcY3/PW2aTFBCFbsdOL4 yXcj2PSwRiq9K6qAJULrvOvncR9fIILHqzWzyXnoaZ30lR/FxaaFmuHZX/5Ix1tB e5EEE/EA49UAEjEDaMLq8g2IvibsReDxmSpnXyBJWoyRAdFIElVnMJ2+zvP/wRhF NKzQj/bj/qecCbis2lUCaVWJFZ6+P/52UbD8lvIwqR3nk2TKsGDcLU6eY3yg4KrB rEHl5T8KIPrkX3KNIEB8EcFREene+rdpZLLVe4fYwf+gOqfiFXSzZZvweauMkplq ehlVHkykvQvlsVM2tjBD379z3C4aasZDuMVNMCbAy2FlruLeBQ7gEn77mCJB9VH5 /n/mlc2yizdoowtARCLWOUMfASpdSbqu2SQ7A/3kwG7l6GrpzKSIU2nQgm+41sUZ QJVtZ3IYsPoYjnU4B3JZzgJnf3M9jcRz/3JegviqhSEbF1gaScJX0cqN8C1idN/v ZAGCJK9S20/EEEsp5jn+bq2grUehvmD4TVDfot4P+5yRYyBIhMFpbM2RpjydOpwy +x8D1Q34LYPFgZfQ0vF62vcSBhMBiJ/7j41rUeo44K+Lg00F3yCOyL6FxK6S8h6j wsD0xLbllMrhV5KRYFizb3QbCHoHYiROIJk76uLvB+Tqq2Jg9VQ= =qIi2 -----END PGP SIGNATURE----- Merge 4.19.64 into android-4.19 Changes in 4.19.64 hv_sock: Add support for delayed close vsock: correct removal of socket from the list NFS: Fix dentry revalidation on NFSv4 lookup NFS: Refactor nfs_lookup_revalidate() NFSv4: Fix lookup revalidate of regular files usb: dwc2: Disable all EP's on disconnect usb: dwc2: Fix disable all EP's on disconnect arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ binder: fix possible UAF when freeing buffer ISDN: hfcsusb: checking idx of ep configuration media: au0828: fix null dereference in error path ath10k: Change the warning message string media: cpia2_usb: first wake up, then free in disconnect media: pvrusb2: use a different format for warnings NFS: Cleanup if nfs_match_client is interrupted media: radio-raremono: change devm_k*alloc to k*alloc iommu/vt-d: Don't queue_iova() if there is no flush queue iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA Bluetooth: hci_uart: check for missing tty operations vhost: introduce vhost_exceeds_weight() vhost_net: fix possible infinite loop vhost: vsock: add weight support vhost: scsi: add weight support sched/fair: Don't free p->numa_faults with concurrent readers sched/fair: Use RCU accessors consistently for ->numa_group /proc/<pid>/cmdline: remove all the special cases /proc/<pid>/cmdline: add back the setproctitle() special case drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl Fix allyesconfig output. ceph: hold i_ceph_lock when removing caps for freeing inode block, scsi: Change the preempt-only flag into a counter scsi: core: Avoid that a kernel warning appears during system resume ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL Linux 4.19.64 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I3e9055b677bd8ad9d5070307fae0bc765d444e9d |
||
Jann Horn
|
48046e092a |
sched/fair: Don't free p->numa_faults with concurrent readers
commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.
During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.
Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes:
|
||
Greg Kroah-Hartman
|
50f91435a2 |
This is the 4.19.45 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzk4CsACgkQONu9yGCS aT5Xaw//UWopx4Yqbiv+4HBgW+2ijP4utxI4lBNYITD44jvkyVJnztUtVkWepu5r Tkl/7zytXOpxbpuhS0xqpWwG7lL5eT4NCG08KSX4lYQVjIWX4YzVkw9gLe9V2AaK IqTzaWtbuagARbnR3UC65TI4kjRGsr9ldY0AbbGGVTM6IwPquHN9Qd9TAzRwRohn CxY94Bwp1RcN2sSPkD3nUCUGOSNh97BXyypeM7FyceOzOpyAdQCXoUPc84cPqdNC 4GBkd5Z1IL/7zX3HDjQeGS0KK6e1enslSmsbSSUVuHI90LCr3CZPJkFF8RFnPnff 2RA7bdhp8C1JPeLDimr+SNSLEl9yywoH6d4UQAnBwoLDjiFCEITVgjDtYzzd81+1 ES6lbUAs8v/LXkaCaExq6pNNd1prg6Mj9Fe6cz+G9V/YV1tLUsoAJHdFucu8Sp7w rwz/PZ6waCf8VRO4aYFF9b+u7PQ/RFZWQYsz22P7PhAYg0CTajV1FWGk1AYi0+wQ 5YCmthbWhDo9U5lAFyQ0pVTXv/UNgEu6MfV1/jKtCk5AzsbE77orj1xusKckHq2e QojgmELmHMlFFajI0h/ddDo7iwz/5OrPVs9D03RysiOciMzdTKPucPyC0Ah4yEBA sJ0cQkaVtqO2Nu3E42lfQTpVIqBgi8NGav+kRwryB1YyKeaXLsM= =HJ7O -----END PGP SIGNATURE----- Merge 4.19.45 into android-4.19 Changes in 4.19.45 locking/rwsem: Prevent decrement of reader count before increment x86/speculation/mds: Revert CPU buffer clear on double fault exit x86/speculation/mds: Improve CPU buffer clear documentation objtool: Fix function fallthrough detection arm64: dts: rockchip: Disable DCMDs on RK3399's eMMC controller. ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260 ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3 mmc: sdhci-of-arasan: Add DTS property to disable DCMDs. ARM: exynos: Fix a leaked reference by adding missing of_node_put power: supply: axp288_charger: Fix unchecked return value power: supply: axp288_fuel_gauge: Add ACEPC T8 and T11 mini PCs to the blacklist arm64: mmap: Ensure file offset is treated as unsigned arm64: arch_timer: Ensure counter register reads occur with seqlock held arm64: compat: Reduce address limit arm64: Clear OSDLR_EL1 on CPU boot arm64: Save and restore OSDLR_EL1 across suspend/resume sched/x86: Save [ER]FLAGS on context switch crypto: crypto4xx - fix ctr-aes missing output IV crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues crypto: salsa20 - don't access already-freed walk.iv crypto: chacha20poly1305 - set cra_name correctly crypto: ccp - Do not free psp_master when PLATFORM_INIT fails crypto: vmx - fix copy-paste error in CTR mode crypto: skcipher - don't WARN on unprocessed data after slow walk step crypto: crct10dif-generic - fix use via crypto_shash_digest() crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest() crypto: arm64/gcm-aes-ce - fix no-NEON fallback code crypto: gcm - fix incompatibility between "gcm" and "gcm_base" crypto: rockchip - update IV buffer to contain the next IV crypto: arm/aes-neonbs - don't access already-freed walk.iv crypto: arm64/aes-neonbs - don't access already-freed walk.iv mmc: core: Fix tag set memory leak ALSA: line6: toneport: Fix broken usage of timer for delayed execution ALSA: usb-audio: Fix a memory leak bug ALSA: hda/hdmi - Read the pin sense from register when repolling ALSA: hda/hdmi - Consider eld_valid when reporting jack event ALSA: hda/realtek - EAPD turn on later ALSA: hdea/realtek - Headset fixup for System76 Gazelle (gaze14) ASoC: max98090: Fix restore of DAPM Muxes ASoC: RT5677-SPI: Disable 16Bit SPI Transfers ASoC: fsl_esai: Fix missing break in switch statement ASoC: codec: hdac_hdmi add device_link to card device bpf, arm64: remove prefetch insn in xadd mapping crypto: ccree - remove special handling of chained sg crypto: ccree - fix mem leak on error path crypto: ccree - don't map MAC key on stack crypto: ccree - use correct internal state sizes for export crypto: ccree - don't map AEAD key and IV on stack crypto: ccree - pm resume first enable the source clk crypto: ccree - HOST_POWER_DOWN_EN should be the last CC access during suspend crypto: ccree - add function to handle cryptocell tee fips error crypto: ccree - handle tee fips error during power management resume mm/mincore.c: make mincore() more conservative mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses mm/hugetlb.c: don't put_page in lock of hugetlb_lock hugetlb: use same fault hash key for shared and private mappings ocfs2: fix ocfs2 read inode data panic in ocfs2_iget userfaultfd: use RCU to free the task struct when fork fails ACPI: PM: Set enable_for_wake for wakeup GPEs during suspend-to-idle mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0 tty/vt: fix write/write race in ioctl(KDSKBSENT) handler jbd2: check superblock mapped prior to committing ext4: make sanity check in mballoc more strict ext4: ignore e_value_offs for xattrs with value-in-ea-inode ext4: avoid drop reference to iloc.bh twice ext4: fix use-after-free race with debug_want_extra_isize ext4: actually request zeroing of inode table after grow ext4: fix ext4_show_options for file systems w/o journal btrfs: Check the first key and level for cached extent buffer btrfs: Correctly free extent buffer in case btree_read_extent_buffer_pages fails btrfs: Honour FITRIM range constraints during free space trim Btrfs: send, flush dellaloc in order to avoid data loss Btrfs: do not start a transaction during fiemap Btrfs: do not start a transaction at iterate_extent_inodes() bcache: fix a race between cache register and cacheset unregister bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim() ipmi:ssif: compare block number correctly for multi-part return messages crypto: ccm - fix incompatibility between "ccm" and "ccm_base" fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount tty: Don't force RISCV SBI console as preferred console ext4: zero out the unused memory region in the extent tree block ext4: fix data corruption caused by overlapping unaligned and aligned IO ext4: fix use-after-free in dx_release() ext4: avoid panic during forced reboot due to aborted journal ALSA: hda/realtek - Corrected fixup for System76 Gazelle (gaze14) ALSA: hda/realtek - Fixup headphone noise via runtime suspend ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug jbd2: fix potential double free KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes KVM: lapic: Busy wait for timer to expire when using hv_timer kbuild: turn auto.conf.cmd into a mandatory include file xen/pvh: set xen_domain_type to HVM in xen_pvh_init libnvdimm/namespace: Fix label tracking error iov_iter: optimize page_copy_sane() pstore: Centralize init/exit routines pstore: Allocate compression during late_initcall() pstore: Refactor compression initialization ext4: fix compile error when using BUFFER_TRACE ext4: don't update s_rev_level if not required Linux 4.19.45 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Andrea Arcangeli
|
8bae439855 |
userfaultfd: use RCU to free the task struct when fork fails
commit c3f3ce049f7d97cc7ec9c01cb51d9ec74e0f37c2 upstream.
The task structure is freed while get_mem_cgroup_from_mm() holds
rcu_read_lock() and dereferences mm->owner.
get_mem_cgroup_from_mm() failing fork()
---- ---
task = mm->owner
mm->owner = NULL;
free(task)
if (task) *task; /* use after free */
The fix consists in freeing the task with RCU also in the fork failure
case, exactly like it always happens for the regular exit(2) path. That
is enough to make the rcu_read_lock hold in get_mem_cgroup_from_mm()
(left side above) effective to avoid a use after free when dereferencing
the task structure.
An alternate possible fix would be to defer the delivery of the
userfaultfd contexts to the monitor until after fork() is guaranteed to
succeed. Such a change would require more changes because it would
create a strict ordering dependency where the uffd methods would need to
be called beyond the last potentially failing branch in order to be
safe. This solution as opposed only adds the dependency to common code
to set mm->owner to NULL and to free the task struct that was pointed by
mm->owner with RCU, if fork ends up failing. The userfaultfd methods
can still be called anywhere during the fork runtime and the monitor
will keep discarding orphaned "mm" coming from failed forks in userland.
This race condition couldn't trigger if CONFIG_MEMCG was set =n at build
time.
[aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal]
Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com
Fixes:
|
||
Johannes Weiner
|
e550f94252 |
UPSTREAM: psi: pressure stall information for CPU, memory, and IO
When systems are overcommitted and resources become contended, it's hard to tell exactly the impact this has on workload productivity, or how close the system is to lockups and OOM kills. In particular, when machines work multiple jobs concurrently, the impact of overcommit in terms of latency and throughput on the individual job can be enormous. In order to maximize hardware utilization without sacrificing individual job health or risk complete machine lockups, this patch implements a way to quantify resource pressure in the system. A kernel built with CONFIG_PSI=y creates files in /proc/pressure/ that expose the percentage of time the system is stalled on CPU, memory, or IO, respectively. Stall states are aggregate versions of the per-task delay accounting delays: cpu: some tasks are runnable but not executing on a CPU memory: tasks are reclaiming, or waiting for swapin or thrashing cache io: tasks are waiting for io completions These percentages of walltime can be thought of as pressure percentages, and they give a general sense of system health and productivity loss incurred by resource overcommit. They can also indicate when the system is approaching lockup scenarios and OOMs. To do this, psi keeps track of the task states associated with each CPU and samples the time they spend in stall states. Every 2 seconds, the samples are averaged across CPUs - weighted by the CPUs' non-idle time to eliminate artifacts from unused CPUs - and translated into percentages of walltime. A running average of those percentages is maintained over 10s, 1m, and 5m periods (similar to the loadaverage). [hannes@cmpxchg.org: doc fixlet, per Randy] Link: http://lkml.kernel.org/r/20180828205625.GA14030@cmpxchg.org [hannes@cmpxchg.org: code optimization] Link: http://lkml.kernel.org/r/20180907175015.GA8479@cmpxchg.org [hannes@cmpxchg.org: rename psi_clock() to psi_update_work(), per Peter] Link: http://lkml.kernel.org/r/20180907145404.GB11088@cmpxchg.org [hannes@cmpxchg.org: fix build] Link: http://lkml.kernel.org/r/20180913014222.GA2370@cmpxchg.org Link: http://lkml.kernel.org/r/20180828172258.3185-9-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit eb414681d5a07d28d2ff90dc05f69ec6b232ebd2) Bug: 127712811 Test: lmkd in PSI mode Change-Id: Id00d23c977169b0c4636d92016fc1fee0274be05 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
Connor O'Brien
|
406d53f0c7 |
ANDROID: cpufreq: track per-task time in state
Add time in state data to task structs, and create /proc/<pid>/time_in_state files to show how long each individual task has run at each frequency. Create a CONFIG_CPU_FREQ_TIMES option to enable/disable this tracking. Bug: 72339335 Bug: 127641090 Test: Read /proc/<pid>/time_in_state Change-Id: Ia6456754f4cb1e83b2bc35efa8fbe9f8696febc8 Signed-off-by: Connor O'Brien <connoro@google.com> [astrachan: Folded the following changes into this patch: a6d3de6a7fba ("ANDROID: Reduce use of #ifdef CONFIG_CPU_FREQ_TIMES") b89ada5d9c09 ("ANDROID: Fix massive cpufreq_times memory leaks")] Signed-off-by: Alistair Strachan <astrachan@google.com> |
||
David Herrmann
|
bc999b5099 |
fork: record start_time late
commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream. This changes the fork(2) syscall to record the process start_time after initializing the basic task structure but still before making the new process visible to user-space. Technically, we could record the start_time anytime during fork(2). But this might lead to scenarios where a start_time is recorded long before a process becomes visible to user-space. For instance, with userfaultfd(2) and TLS, user-space can delay the execution of fork(2) for an indefinite amount of time (and will, if this causes network access, or similar). By recording the start_time late, it much closer reflects the point in time where the process becomes live and can be observed by other processes. Lastly, this makes it much harder for user-space to predict and control the start_time they get assigned. Previously, user-space could fork a process and stall it in copy_thread_tls() before its pid is allocated, but after its start_time is recorded. This can be misused to later-on cycle through PIDs and resume the stalled fork(2) yielding a process that has the same pid and start_time as a process that existed before. This can be used to circumvent security systems that identify processes by their pid+start_time combination. Even though user-space was always aware that start_time recording is flaky (but several projects are known to still rely on start_time-based identification), changing the start_time to be recorded late will help mitigate existing attacks and make it much harder for user-space to control the start_time a process gets assigned. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Nadav Amit
|
1ed0cc5a01 |
mm: respect arch_dup_mmap() return value
Commit |
||
Linus Torvalds
|
cd9b44f907 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: - the rest of MM - procfs updates - various misc things - more y2038 fixes - get_maintainer updates - lib/ updates - checkpatch updates - various epoll updates - autofs updates - hfsplus - some reiserfs work - fatfs updates - signal.c cleanups - ipc/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (166 commits) ipc/util.c: update return value of ipc_getref from int to bool ipc/util.c: further variable name cleanups ipc: simplify ipc initialization ipc: get rid of ids->tables_initialized hack lib/rhashtable: guarantee initial hashtable allocation lib/rhashtable: simplify bucket_table_alloc() ipc: drop ipc_lock() ipc/util.c: correct comment in ipc_obtain_object_check ipc: rename ipcctl_pre_down_nolock() ipc/util.c: use ipc_rcu_putref() for failues in ipc_addid() ipc: reorganize initialization of kern_ipc_perm.seq ipc: compute kern_ipc_perm.id under the ipc lock init/Kconfig: remove EXPERT from CHECKPOINT_RESTORE fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp adfs: use timespec64 for time conversion kernel/sysctl.c: fix typos in comments drivers/rapidio/devices/rio_mport_cdev.c: remove redundant pointer md fork: don't copy inconsistent signal handler state to child signal: make get_signal() return bool signal: make sigkill_pending() return bool ... |
||
Jann Horn
|
06e62a46bb |
fork: don't copy inconsistent signal handler state to child
Before this change, if a multithreaded process forks while one of its threads is changing a signal handler using sigaction(), the memcpy() in copy_sighand() can race with the struct assignment in do_sigaction(). It isn't clear whether this can cause corruption of the userspace signal handler pointer, but it definitely can cause inconsistency between different fields of struct sigaction. Take the appropriate spinlock to avoid this. I have tested that this patch prevents inconsistency between sa_sigaction and sa_flags, which is possible before this patch. Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Dmitry Vyukov
|
a2e5144538 |
kernel/hung_task.c: allow to set checking interval separately from timeout
Currently task hung checking interval is equal to timeout, as the result hung is detected anywhere between timeout and 2*timeout. This is fine for most interactive environments, but this hurts automated testing setups (syzbot). In an automated setup we need to strictly order CPU lockup < RCU stall < workqueue lockup < task hung < silent loss, so that RCU stall is not detected as task hung and task hung is not detected as silent machine loss. The large variance in task hung detection timeout requires setting silent machine loss timeout to a very large value (e.g. if task hung is 3 mins, then silent loss need to be set to ~7 mins). The additional 3 minutes significantly reduce testing efficiency because usually we crash kernel within a minute, and this can add hours to bug localization process as it needs to do dozens of tests. Allow setting checking interval separately from timeout. This allows to set timeout to, say, 3 minutes, but checking interval to 10 secs. The interval is controlled via a new hung_task_check_interval_secs sysctl, similar to the existing hung_task_timeout_secs sysctl. The default value of 0 results in the current behavior: checking interval is equal to timeout. [akpm@linux-foundation.org: update hung_task_timeout_max's comment] Link: http://lkml.kernel.org/r/20180611111004.203513-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrew Morton
|
a670468f5e |
mm: zero out the vma in vma_init()
Rather than in vm_area_alloc(). To ensure that the various oddball stack-based vmas are in a good state. Some of the callers were zeroing them out, others were not. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
0214f46b3a |
Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull core signal handling updates from Eric Biederman: "It was observed that a periodic timer in combination with a sufficiently expensive fork could prevent fork from every completing. This contains the changes to remove the need for that restart. This set of changes is split into several parts: - The first part makes PIDTYPE_TGID a proper pid type instead something only for very special cases. The part starts using PIDTYPE_TGID enough so that in __send_signal where signals are actually delivered we know if the signal is being sent to a a group of processes or just a single process. - With that prep work out of the way the logic in fork is modified so that fork logically makes signals received while it is running appear to be received after the fork completes" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits) signal: Don't send signals to tasks that don't exist signal: Don't restart fork when signals come in. fork: Have new threads join on-going signal group stops fork: Skip setting TIF_SIGPENDING in ptrace_init_task signal: Add calculate_sigpending() fork: Unconditionally exit if a fatal signal is pending fork: Move and describe why the code examines PIDNS_ADDING signal: Push pid type down into complete_signal. signal: Push pid type down into __send_signal signal: Push pid type down into send_signal signal: Pass pid type into do_send_sig_info signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task signal: Pass pid type into group_send_sig_info signal: Pass pid and pid type into send_sigqueue posix-timers: Noralize good_sigevent signal: Use PIDTYPE_TGID to clearly store where file signals will be sent pid: Implement PIDTYPE_TGID pids: Move the pgrp and session pid pointers from task_struct to signal_struct kvm: Don't open code task_pid in kvm_vcpu_ioctl pids: Compute task_tgid using signal->leader_pid ... |
||
Shakeel Butt
|
d46eb14b73 |
fs: fsnotify: account fsnotify metadata to kmemcg
Patch series "Directed kmem charging", v8. The Linux kernel's memory cgroup allows limiting the memory usage of the jobs running on the system to provide isolation between the jobs. All the kernel memory allocated in the context of the job and marked with __GFP_ACCOUNT will also be included in the memory usage and be limited by the job's limit. The kernel memory can only be charged to the memcg of the process in whose context kernel memory was allocated. However there are cases where the allocated kernel memory should be charged to the memcg different from the current processes's memcg. This patch series contains two such concrete use-cases i.e. fsnotify and buffer_head. The fsnotify event objects can consume a lot of system memory for large or unlimited queues if there is either no or slow listener. The events are allocated in the context of the event producer. However they should be charged to the event consumer. Similarly the buffer_head objects can be allocated in a memcg different from the memcg of the page for which buffer_head objects are being allocated. To solve this issue, this patch series introduces mechanism to charge kernel memory to a given memcg. In case of fsnotify events, the memcg of the consumer can be used for charging and for buffer_head, the memcg of the page can be charged. For directed charging, the caller can use the scope API memalloc_[un]use_memcg() to specify the memcg to charge for all the __GFP_ACCOUNT allocations within the scope. This patch (of 2): A lot of memory can be consumed by the events generated for the huge or unlimited queues if there is either no or slow listener. This can cause system level memory pressure or OOMs. So, it's better to account the fsnotify kmem caches to the memcg of the listener. However the listener can be in a different memcg than the memcg of the producer and these allocations happen in the context of the event producer. This patch introduces remote memcg charging API which the producer can use to charge the allocations to the memcg of the listener. There are seven fsnotify kmem caches and among them allocations from dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and inotify_inode_mark_cachep happens in the context of syscall from the listener. So, SLAB_ACCOUNT is enough for these caches. The objects from fsnotify_mark_connector_cachep are not accounted as they are small compared to the notification mark or events and it is unclear whom to account connector to since it is shared by all events attached to the inode. The allocations from the event caches happen in the context of the event producer. For such caches we will need to remote charge the allocations to the listener's memcg. Thus we save the memcg reference in the fsnotify_group structure of the listener. This patch has also moved the members of fsnotify_group to keep the size same, at least for 64 bit build, even with additional member by filling the holes. [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it] Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
73ba2fb33c |
for-4.19/block-20180812
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J 5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8 6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X 2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l 5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5 Zolos5H+JA== =b9ib -----END PGP SIGNATURE----- Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ... |
||
Linus Torvalds
|
203b4fc903 |
Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Thomas Gleixner: - Make lazy TLB mode even lazier to avoid pointless switch_mm() operations, which reduces CPU load by 1-2% for memcache workloads - Small cleanups and improvements all over the place * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove redundant check for kmem_cache_create() arm/asm/tlb.h: Fix build error implicit func declaration x86/mm/tlb: Make clear_asid_other() static x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off() x86/mm/tlb: Always use lazy TLB mode x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs x86/mm/tlb: Make lazy TLB mode lazier x86/mm/tlb: Restructure switch_mm_irqs_off() x86/mm/tlb: Leave lazy TLB mode at page table free time mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids x86/mm: Add TLB purge to free pmd/pte page interfaces ioremap: Update pgtable free interfaces with addr x86/mm: Disable ioremap free page handling on x86-PAE |
||
Eric W. Biederman
|
c3ad2c3b02 |
signal: Don't restart fork when signals come in.
Wen Yang <wen.yang99@zte.com.cn> and majiang <ma.jiang@zte.com.cn>
report that a periodic signal received during fork can cause fork to
continually restart preventing an application from making progress.
The code was being overly pessimistic. Fork needs to guarantee that a
signal sent to multiple processes is logically delivered before the
fork and just to the forking process or logically delivered after the
fork to both the forking process and it's newly spawned child. For
signals like periodic timers that are always delivered to a single
process fork can safely complete and let them appear to logically
delivered after the fork().
While examining this issue I also discovered that fork today will miss
signals delivered to multiple processes during the fork and handled by
another thread. Similarly the current code will also miss blocked
signals that are delivered to multiple process, as those signals will
not appear pending during fork.
Add a list of each thread that is currently forking, and keep on that
list a signal set that records all of the signals sent to multiple
processes. When fork completes initialize the new processes
shared_pending signal set with it. The calculate_sigpending function
will see those signals and set TIF_SIGPENDING causing the new task to
take the slow path to userspace to handle those signals. Making it
appear as if those signals were received immediately after the fork.
It is not possible to send real time signals to multiple processes and
exceptions don't go to multiple processes, which means that that are
no signals sent to multiple processes that require siginfo. This
means it is safe to not bother collecting siginfo on signals sent
during fork.
The sigaction of a child of fork is initially the same as the
sigaction of the parent process. So a signal the parent ignores the
child will also initially ignore. Therefore it is safe to ignore
signals sent to multiple processes and ignored by the forking process.
Signals sent to only a single process or only a single thread and delivered
during fork are treated as if they are received after the fork, and generally
not dealt with. They won't cause any problems.
V2: Added removal from the multiprocess list on failure.
V3: Use -ERESTARTNOINTR directly
V4: - Don't queue both SIGCONT and SIGSTOP
- Initialize signal_struct.multiprocess in init_task
- Move setting of shared_pending to before the new task
is visible to signals. This prevents signals from comming
in before shared_pending.signal is set to delayed.signal
and being lost.
V5: - rework list add and delete to account for idle threads
v6: - Use sigdelsetmask when removing stop signals
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200447
Reported-by: Wen Yang <wen.yang99@zte.com.cn> and
Reported-by: majiang <ma.jiang@zte.com.cn>
Fixes:
|
||
Jens Axboe
|
05b9ba4b55 |
Linux 4.18-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76 sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA 1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR 9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2 L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt vJJlibI= =42a5 -----END PGP SIGNATURE----- Merge tag 'v4.18-rc6' into for-4.19/block2 Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk> |
||
Eric W. Biederman
|
924de3b8c9 |
fork: Have new threads join on-going signal group stops
There are only two signals that are delivered to every member of a signal group: SIGSTOP and SIGKILL. Signal delivery requires every signal appear to be delivered either before or after a clone syscall. SIGKILL terminates the clone so does not need to be considered. Which leaves only SIGSTOP that needs to be considered when creating new threads. Today in the event of a group stop TIF_SIGPENDING will get set and the fork will restart ensuring the fork syscall participates in the group stop. A fork (especially of a process with a lot of memory) is one of the most expensive system so we really only want to restart a fork when necessary. It is easy so check to see if a SIGSTOP is ongoing and have the new thread join it immediate after the clone completes. Making it appear the clone completed happened just before the SIGSTOP. The calculate_sigpending function will see the bits set in jobctl and set TIF_SIGPENDING to ensure the new task takes the slow path to userspace. V2: The call to task_join_group_stop was moved before the new task is added to the thread group list. This should not matter as sighand->siglock is held over both the addition of the threads, the call to task_join_group_stop and do_signal_stop. But the change is trivial and it is one less thing to worry about when reading the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Josef Bacik
|
2c323017e3 |
blk-cgroup: clear the throttle queue on fork
We were hitting a panic in production where we put too many times on the request queue. This is because we'd get the throttle_queue of the parent if we fork()'ed while we needed to be throttled, but we didn't have a reference on it. Instead just clear these flags on fork so the child doesn't pay for the sins of its father. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Kirill A. Shutemov
|
027232da7c |
mm: introduce vma_init()
Not all VMAs allocated with vm_area_alloc(). Some of them allocated on stack or in data segment. The new helper can be use to initialize VMA properly regardless where it was allocated. Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Eric W. Biederman
|
7673bf553b |
fork: Unconditionally exit if a fatal signal is pending
In practice this does not change anything as testing for fatal_signal_pending and exiting for with an error code duplicates the work of the next clause which recalculates pending signals and then exits fork if any are pending. In both cases the pending signal will trigger the slow path when existing to userspace, and the fatal signal will cause do_exit to be called. The advantage of making this a separate test is that it makes it clear processing the fatal signal will terminate the fork, and it allows the rest of the signal logic to be updated without fear that this important case will be lost. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
4ca1d3ee46 |
fork: Move and describe why the code examines PIDNS_ADDING
Normally this would be something that would be handled by handling signals that are sent to a group of processes but in this case the forking process is not a member of the group being signaled. Thus special code is needed to prevent a race with pid namespaces exiting, and fork adding new processes within them. Move this test up before the signal restart just in case signals are also pending. Fatal conditions should take presedence over restarts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Linus Torvalds
|
490fc05386 |
mm: make vm_area_alloc() initialize core fields
Like vm_area_dup(), it initializes the anon_vma_chain head, and the basic mm pointer. The rest of the fields end up being different for different users, although the plan is to also initialize the 'vm_ops' field to a dummy entry. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
95faf6992d |
mm: make vm_area_dup() actually copy the old vma data
.. and re-initialize th eanon_vma_chain head. This removes some boiler-plate from the users, and also makes it clear why it didn't need use the 'zalloc()' version. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
3928d4f5ee |
mm: use helper functions for allocating and freeing vm_area structs
The vm_area_struct is one of the most fundamental memory management objects, but the management of it is entirely open-coded evertwhere, ranging from allocation and freeing (using kmem_cache_[z]alloc and kmem_cache_free) to initializing all the fields. We want to unify this in order to end up having some unified initialization of the vmas, and the first step to this is to at least have basic allocation functions. Right now those functions are literally just wrappers around the kmem_cache_*() calls. This is a purely mechanical conversion: # new vma: kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc() # copy old vma kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old) # free vma kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma) to the point where the old vma passed in to the vm_area_dup() function isn't even used yet (because I've left all the old manual initialization alone). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Eric W. Biederman
|
6883f81aac |
pid: Implement PIDTYPE_TGID
Everywhere except in the pid array we distinguish between a tasks pid and a tasks tgid (thread group id). Even in the enumeration we want that distinction sometimes so we have added __PIDTYPE_TGID. With leader_pid we almost have an implementation of PIDTYPE_TGID in struct signal_struct. Add PIDTYPE_TGID as a first class member of the pid_type enumeration and into the pids array. Then remove the __PIDTYPE_TGID special case and the leader_pid in signal_struct. The net size increase is just an extra pointer added to struct pid and an extra pair of pointers of an hlist_node added to task_struct. The effect on code maintenance is the removal of a number of special cases today and the potential to remove many more special cases as PIDTYPE_TGID gets used to it's fullest. The long term potential is allowing zombie thread group leaders to exit, which will remove a lot more special cases in the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
2c4704756c |
pids: Move the pgrp and session pid pointers from task_struct to signal_struct
To access these fields the code always has to go to group leader so going to signal struct is no loss and is actually a fundamental simplification. This saves a little bit of memory by only allocating the pid pointer array once instead of once for every thread, and even better this removes a few potential races caused by the fact that group_leader can be changed by de_thread, while signal_struct can not. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Rik van Riel
|
c1a2f7f0c0 |
mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids
The mm_struct always contains a cpumask bitmap, regardless of CONFIG_CPUMASK_OFFSTACK. That means the first step can be to simplify things, and simply have one bitmask at the end of the mm_struct for the mm_cpumask. This does necessitate moving everything else in mm_struct into an anonymous sub-structure, which can be randomized when struct randomization is enabled. The second step is to determine the correct size for the mm_struct slab object from the size of the mm_struct (excluding the CPU bitmap) and the size the cpumask. For init_mm we can simply allocate the maximum size this kernel is compiled for, since we only have one init_mm in the system, anyway. Pointer magic by Mike Galbraith, to evade -Wstringop-overflow getting confused by the dynamically sized array. Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: luto@kernel.org Link: http://lkml.kernel.org/r/20180716190337.26133-2-riel@surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Tetsuo Handa
|
655c79bb40 |
mm: check for SIGKILL inside dup_mmap() loop
As a theoretical problem, dup_mmap() of an mm_struct with 60000+ vmas can loop while potentially allocating memory, with mm->mmap_sem held for write by current thread. This is bad if current thread was selected as an OOM victim, for current thread will continue allocations using memory reserves while OOM reaper is unable to reclaim memory. As an actually observable problem, it is not difficult to make OOM reaper unable to reclaim memory if the OOM victim is blocked at i_mmap_lock_write() in this loop. Unfortunately, since nobody can explain whether it is safe to use killable wait there, let's check for SIGKILL before trying to allocate memory. Even without an OOM event, there is no point with continuing the loop from the beginning if current thread is killed. I tested with debug printk(). This patch should be safe because we already fail if security_vm_enough_memory_mm() or kmem_cache_alloc(GFP_KERNEL) fails and exit_mmap() handles it. ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting dup_mmap() due to SIGKILL ***** ***** Aborting exit_mmap() due to NULL mmap ***** [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/201804071938.CDE04681.SOFVQJFtMHOOLF@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Rik van Riel <riel@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
050e9baa9d |
Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables
The changes to automatically test for working stack protector compiler support in the Kconfig files removed the special STACKPROTECTOR_AUTO option that picked the strongest stack protector that the compiler supported. That was all a nice cleanup - it makes no sense to have the AUTO case now that the Kconfig phase can just determine the compiler support directly. HOWEVER. It also meant that doing "make oldconfig" would now _disable_ the strong stackprotector if you had AUTO enabled, because in a legacy config file, the sane stack protector configuration would look like CONFIG_HAVE_CC_STACKPROTECTOR=y # CONFIG_CC_STACKPROTECTOR_NONE is not set # CONFIG_CC_STACKPROTECTOR_REGULAR is not set # CONFIG_CC_STACKPROTECTOR_STRONG is not set CONFIG_CC_STACKPROTECTOR_AUTO=y and when you ran this through "make oldconfig" with the Kbuild changes, it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version used to be disabled (because it was really enabled by AUTO), and would disable it in the new config, resulting in: CONFIG_HAVE_CC_STACKPROTECTOR=y CONFIG_CC_HAS_STACKPROTECTOR_NONE=y CONFIG_CC_STACKPROTECTOR=y # CONFIG_CC_STACKPROTECTOR_STRONG is not set CONFIG_CC_HAS_SANE_STACKPROTECTOR=y That's dangerously subtle - people could suddenly find themselves with the weaker stack protector setup without even realizing. The solution here is to just rename not just the old RECULAR stack protector option, but also the strong one. This does that by just removing the CC_ prefix entirely for the user choices, because it really is not about the compiler support (the compiler support now instead automatially impacts _visibility_ of the options to users). This results in "make oldconfig" actually asking the user for their choice, so that we don't have any silent subtle security model changes. The end result would generally look like this: CONFIG_HAVE_CC_STACKPROTECTOR=y CONFIG_CC_HAS_STACKPROTECTOR_NONE=y CONFIG_STACKPROTECTOR=y CONFIG_STACKPROTECTOR_STRONG=y CONFIG_CC_HAS_SANE_STACKPROTECTOR=y where the "CC_" versions really are about internal compiler infrastructure, not the user selections. Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
d82991a868 |
Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull restartable sequence support from Thomas Gleixner: "The restartable sequences syscall (finally): After a lot of back and forth discussion and massive delays caused by the speculative distraction of maintainers, the core set of restartable sequences has finally reached a consensus. It comes with the basic non disputed core implementation along with support for arm, powerpc and x86 and a full set of selftests It was exposed to linux-next earlier this week, so it does not fully comply with the merge window requirements, but there is really no point to drag it out for yet another cycle" * 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq/selftests: Provide Makefile, scripts, gitignore rseq/selftests: Provide parametrized tests rseq/selftests: Provide basic percpu ops test rseq/selftests: Provide basic test rseq/selftests: Provide rseq library selftests/lib.mk: Introduce OVERRIDE_TARGETS powerpc: Wire up restartable sequences system call powerpc: Add syscall detection for restartable sequences powerpc: Add support for restartable sequences x86: Wire up restartable sequence system call x86: Add support for restartable sequences arm: Wire up restartable sequences system call arm: Add syscall detection for restartable sequences arm: Add restartable sequences support rseq: Introduce restartable sequences system call uapi/headers: Provide types_32_64.h |
||
Yang Shi
|
88aa7cc688 |
mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
mmap_sem is on the hot path of kernel, and it very contended, but it is abused too. It is used to protect arg_start|end and evn_start|end when reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make sense since those proc files just expect to read 4 values atomically and not related to VM, they could be set to arbitrary values by C/R. And, the mmap_sem contention may cause unexpected issue like below: INFO: task ps:14018 blocked for more than 120 seconds. Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ps D 0 14018 1 0x00000004 Call Trace: schedule+0x36/0x80 rwsem_down_read_failed+0xf0/0x150 call_rwsem_down_read_failed+0x18/0x30 down_read+0x20/0x40 proc_pid_cmdline_read+0xd9/0x4e0 __vfs_read+0x37/0x150 vfs_read+0x96/0x130 SyS_read+0x55/0xc0 entry_SYSCALL_64_fastpath+0x1a/0xc5 Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock for them to mitigate the abuse of mmap_sem. So, introduce a new spinlock in mm_struct to protect the concurrent access to arg_start|end, env_start|end and others, as well as replace write map_sem to read to protect the race condition between prctl and sys_brk which might break check_data_rlimit(), and makes prctl more friendly to other VM operations. This patch just eliminates the abuse of mmap_sem, but it can't resolve the above hung task warning completely since the later access_remote_vm() call needs acquire mmap_sem. The mmap_sem scalability issue will be solved in the future. [yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock] Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
8b5c6a3a49 |
audit/stable-4.18 PR 20180605
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlsXFUEUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQVeRaWujKfIoomg//eRNpc6x9kxTijN670AC2uD0CBTlZ 2z6mHuJaOhG8bTxjZxQfUBoo6/eZJ2YC1yq6ornGFNzw4sfKsR/j86ujJim2HAmo opUhziq3SILGEvjsxfPkREe/wb49jy0AA/WjZqciitB1ig8Hz7xzqi0lpNaEspFh QJFB6XXkojWGFGrRzruAVJnPS+pDWoTQR0qafs3JWKnpeinpOdZnl1hPsysAEHt5 Ag8o4qS/P9xJM0khi7T+jWECmTyT/mtWqEtFcZ0o+JLOgt/EMvNX6DO4ETDiYRD2 mVChga9x5r78bRgNy2U8IlEWWa76WpcQAEODvhzbijX4RxMAmjsmLE+e+udZSnMZ eCITl2f7ExxrL5SwNFC/5h7pAv0RJ+SOC19vcyeV4JDlQNNVjUy/aNKv5baV0aeg EmkeobneMWxqHx52aERz8RF1in5pT8gLOYoYnWfNpcDEmjLrwhuZLX2asIzUEqrS SoPJ8hxIDCxceHOWIIrz5Dqef7x28Dyi46w3QINC8bSy2RnR/H3q40DRegvXOGiS 9WcbbwbhnM4Kau413qKicGCvdqTVYdeyZqo7fVelSciD139Vk7pZotyom4MuU25p fIyGfXa8/8gkl7fZ+HNkZbba0XWNfAZt//zT095qsp3CkhVnoybwe6OwG1xRqErq W7OOQbS7vvN/KGo= =10u6 -----END PGP SIGNATURE----- Merge tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Another reasonable chunk of audit changes for v4.18, thirteen patches in total. The thirteen patches can mostly be broken down into one of four categories: general bug fixes, accessor functions for audit state stored in the task_struct, negative filter matches on executable names, and extending the (relatively) new seccomp logging knobs to the audit subsystem. The main driver for the accessor functions from Richard are the changes we're working on to associate audit events with containers, but I think they have some standalone value too so I figured it would be good to get them in now. The seccomp/audit patches from Tyler apply the seccomp logging improvements from a few releases ago to audit's seccomp logging; starting with this patchset the changes in /proc/sys/kernel/seccomp/actions_logged should apply to both the standard kernel logging and audit. As usual, everything passes the audit-testsuite and it happens to merge cleanly with your tree" [ Heh, except it had trivial merge conflicts with the SELinux tree that also came in from Paul - Linus ] * tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: Fix wrong task in comparison of session ID audit: use existing session info function audit: normalize loginuid read access audit: use new audit_context access funciton for seccomp_actions_logged audit: use inline function to set audit context audit: use inline function to get audit context audit: convert sessionid unset to a macro seccomp: Don't special case audited processes when logging seccomp: Audit attempts to modify the actions_logged sysctl seccomp: Configurable separator for the actions_logged string seccomp: Separate read and write code for actions_logged sysctl audit: allow not equal op for audit by executable audit: add syscall information to FEATURE_CHANGE records |
||
Mathieu Desnoyers
|
d7822b1e24 |
rseq: Introduce restartable sequences system call
Expose a new system call allowing each thread to register one userspace memory area to be used as an ABI between kernel and user-space for two purposes: user-space restartable sequences and quick access to read the current CPU number value from user-space. * Restartable sequences (per-cpu atomics) Restartables sequences allow user-space to perform update operations on per-cpu data without requiring heavy-weight atomic operations. The restartable critical sections (percpu atomics) work has been started by Paul Turner and Andrew Hunter. It lets the kernel handle restart of critical sections. [1] [2] The re-implementation proposed here brings a few simplifications to the ABI which facilitates porting to other architectures and speeds up the user-space fast path. Here are benchmarks of various rseq use-cases. Test hardware: arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading The following benchmarks were all performed on a single thread. * Per-CPU statistic counter increment getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 344.0 31.4 11.0 x86-64: 15.3 2.0 7.7 * LTTng-UST: write event 32-bit header, 32-bit payload into tracer per-cpu buffer getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 2502.0 2250.0 1.1 x86-64: 117.4 98.0 1.2 * liburcu percpu: lock-unlock pair, dereference, read/compare word getcpu+atomic (ns/op) rseq (ns/op) speedup arm32: 751.0 128.5 5.8 x86-64: 53.4 28.6 1.9 * jemalloc memory allocator adapted to use rseq Using rseq with per-cpu memory pools in jemalloc at Facebook (based on rseq 2016 implementation): The production workload response-time has 1-2% gain avg. latency, and the P99 overall latency drops by 2-3%. * Reading the current CPU number Speeding up reading the current CPU number on which the caller thread is running is done by keeping the current CPU number up do date within the cpu_id field of the memory area registered by the thread. This is done by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space, a notify-resume handler updates the current CPU value within the registered user-space memory area. User-space can then read the current CPU number directly from memory. Keeping the current cpu id in a memory area shared between kernel and user-space is an improvement over current mechanisms available to read the current CPU number, which has the following benefits over alternative approaches: - 35x speedup on ARM vs system call through glibc - 20x speedup on x86 compared to calling glibc, which calls vdso executing a "lsl" instruction, - 14x speedup on x86 compared to inlined "lsl" instruction, - Unlike vdso approaches, this cpu_id value can be read from an inline assembly, which makes it a useful building block for restartable sequences. - The approach of reading the cpu id through memory mapping shared between kernel and user-space is portable (e.g. ARM), which is not the case for the lsl-based x86 vdso. On x86, yet another possible approach would be to use the gs segment selector to point to user-space per-cpu data. This approach performs similarly to the cpu id cache, but it has two disadvantages: it is not portable, and it is incompatible with existing applications already using the gs segment selector for other purposes. Benchmarking various approaches for reading the current CPU number: ARMv7 Processor rev 4 (v7l) Machine model: Cubietruck - Baseline (empty loop): 8.4 ns - Read CPU from rseq cpu_id: 16.7 ns - Read CPU from rseq cpu_id (lazy register): 19.8 ns - glibc 2.19-0ubuntu6.6 getcpu: 301.8 ns - getcpu system call: 234.9 ns x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz: - Baseline (empty loop): 0.8 ns - Read CPU from rseq cpu_id: 0.8 ns - Read CPU from rseq cpu_id (lazy register): 0.8 ns - Read using gs segment selector: 0.8 ns - "lsl" inline assembly: 13.0 ns - glibc 2.19-0ubuntu6 getcpu: 16.6 ns - getcpu system call: 53.9 ns - Speed (benchmark taken on v8 of patchset) Running 10 runs of hackbench -l 100000 seems to indicate, contrary to expectations, that enabling CONFIG_RSEQ slightly accelerates the scheduler: Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1 kernel parameter), with a Linux v4.6 defconfig+localyesconfig, restartable sequences series applied. * CONFIG_RSEQ=n avg.: 41.37 s std.dev.: 0.36 s * CONFIG_RSEQ=y avg.: 40.46 s std.dev.: 0.33 s - Size On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is 567 bytes, and the data size increase of vmlinux is 5696 bytes. [1] https://lwn.net/Articles/650333/ [2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Watson <davejwatson@fb.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Chris Lameter <cl@linux.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Hunter <ahh@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Maurer <bmaurer@fb.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-api@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com |