-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl91ulMACgkQONu9yGCS
aT7ezhAArTOQxPGkhktgdGfCMYgjvIHdny8o4pNGumnxW6TG7FCiJHoZuj8OLkdx
2x5brOOvSGgcGTOwJXyUjL6opQzD5syTCuzbgEpGB2Tyd1x5q8vgqvI2XPxZeYHy
x+mUDgacT+4m7FNbFDhNMZoTS4KCiJ3IcTevjeQexDtIs6R38HhxNl0Ee67gkqxZ
p7c6L3kbUuR5T9EWGE1DPPLhOFGeOMk592qzkFsCGERsuswQOpXrxyw6zkik/0UG
6Losmo2i+OtQFeiDz0WYJZNO9ySI511j+7R2Ewch/nFuTp6yFzy9kJZnP0YWK/KE
U4BLmopgzCs9q+TQ/QNjxlCltl4eOrrjkFXF3Zz8o5ddbKwrugEsJUdUUDIpva71
qEUgSw7vguGKoCttBenCDwyYOcjIVJRBFSWTVDzkgw5pXrz3m7qePF1Kj+KzG0pN
8gTqosXPlYPzH1mh+2vRVntiCpZRMJYo18CX+ifqN20dHH3dsM4vA5NiWwjTJVY8
JddRXfujxBQ0jxs2jFKvPZNrgqeY3Mh51L0a5G+HbHCIb+4kgD+2jl+C/X38TKch
osTM1/qQriFVxtlH9TkTa8opYvrYBWO+G+XhNVc2tSpmd8T2EaKokMAVVvGiK3l9
ZPq06SytJyKDPsSLvk4BKxCUv5CY0VT18k6mCYd1fq4oxTR92A4=
=5bC5
-----END PGP SIGNATURE-----
Merge 4.19.149 into android-4.19-stable
Changes in 4.19.149
selinux: allow labeling before policy is loaded
media: mc-device.c: fix memleak in media_device_register_entity
dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling)
ath10k: fix array out-of-bounds access
ath10k: fix memory leak for tpc_stats_final
mm: fix double page fault on arm64 if PTE_AF is cleared
scsi: aacraid: fix illegal IO beyond last LBA
m68k: q40: Fix info-leak in rtc_ioctl
gma/gma500: fix a memory disclosure bug due to uninitialized bytes
ASoC: kirkwood: fix IRQ error handling
media: smiapp: Fix error handling at NVM reading
arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
x86/ioapic: Unbreak check_timer()
ALSA: usb-audio: Add delay quirk for H570e USB headsets
ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged
ALSA: hda/realtek: Enable front panel headset LED on Lenovo ThinkStation P520
lib/string.c: implement stpcpy
leds: mlxreg: Fix possible buffer overflow
PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out
scsi: fnic: fix use after free
scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce
net: silence data-races on sk_backlog.tail
clk/ti/adpll: allocate room for terminating null
drm/amdgpu/powerplay: fix AVFS handling with custom powerplay table
mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
mfd: mfd-core: Protect against NULL call-back function pointer
drm/amdgpu/powerplay/smu7: fix AVFS handling with custom powerplay table
tpm_crb: fix fTPM on AMD Zen+ CPUs
tracing: Adding NULL checks for trace_array descriptor pointer
bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
dmaengine: mediatek: hsdma_probe: fixed a memory leak when devm_request_irq fails
RDMA/qedr: Fix potential use after free
RDMA/i40iw: Fix potential use after free
fix dget_parent() fastpath race
xfs: fix attr leaf header freemap.size underflow
RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'
ubi: Fix producing anchor PEBs
mmc: core: Fix size overflow for mmc partitions
gfs2: clean up iopen glock mess in gfs2_create_inode
scsi: pm80xx: Cleanup command when a reset times out
debugfs: Fix !DEBUG_FS debugfs_create_automount
CIFS: Properly process SMB3 lease breaks
ASoC: max98090: remove msleep in PLL unlocked workaround
kernel/sys.c: avoid copying possible padding bytes in copy_to_user
KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
xfs: fix log reservation overflows when allocating large rt extents
neigh_stat_seq_next() should increase position index
rt_cpu_seq_next should increase position index
ipv6_route_seq_next should increase position index
seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
media: ti-vpe: cal: Restrict DMA to avoid memory corruption
sctp: move trace_sctp_probe_path into sctp_outq_sack
ACPI: EC: Reference count query handlers under lock
scsi: ufs: Make ufshcd_add_command_trace() easier to read
scsi: ufs: Fix a race condition in the tracing code
dmaengine: zynqmp_dma: fix burst length configuration
s390/cpum_sf: Use kzalloc and minor changes
powerpc/eeh: Only dump stack once if an MMIO loop is detected
Bluetooth: btrtl: Use kvmalloc for FW allocations
tracing: Set kernel_stack's caller size properly
ARM: 8948/1: Prevent OOB access in stacktrace
ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter
ceph: ensure we have a new cap before continuing in fill_inode
selftests/ftrace: fix glob selftest
tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility
Bluetooth: Fix refcount use-after-free issue
mm/swapfile.c: swap_next should increase position index
mm: pagewalk: fix termination condition in walk_pte_range()
Bluetooth: prefetch channel before killing sock
KVM: fix overflow of zero page refcount with ksm running
ALSA: hda: Clear RIRB status before reading WP
skbuff: fix a data race in skb_queue_len()
audit: CONFIG_CHANGE don't log internal bookkeeping as an event
selinux: sel_avc_get_stat_idx should increase position index
scsi: lpfc: Fix RQ buffer leakage when no IOCBs available
scsi: lpfc: Fix coverity errors in fmdi attribute handling
drm/omap: fix possible object reference leak
clk: stratix10: use do_div() for 64-bit calculation
crypto: chelsio - This fixes the kernel panic which occurs during a libkcapi test
mt76: clear skb pointers from rx aggregation reorder buffer during cleanup
ALSA: usb-audio: Don't create a mixer element with bogus volume range
perf test: Fix test trace+probe_vfs_getname.sh on s390
RDMA/rxe: Fix configuration of atomic queue pair attributes
KVM: x86: fix incorrect comparison in trace event
dmaengine: stm32-mdma: use vchan_terminate_vdesc() in .terminate_all
media: staging/imx: Missing assignment in imx_media_capture_device_register()
x86/pkeys: Add check for pkey "overflow"
bpf: Remove recursion prevention from rcu free callback
dmaengine: stm32-dma: use vchan_terminate_vdesc() in .terminate_all
dmaengine: tegra-apb: Prevent race conditions on channel's freeing
drm/amd/display: dal_ddc_i2c_payloads_create can fail causing panic
firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp
random: fix data races at timer_rand_state
bus: hisi_lpc: Fixup IO ports addresses to avoid use-after-free in host removal
media: go7007: Fix URB type for interrupt handling
Bluetooth: guard against controllers sending zero'd events
timekeeping: Prevent 32bit truncation in scale64_check_overflow()
ext4: fix a data race at inode->i_disksize
perf jevents: Fix leak of mapfile memory
mm: avoid data corruption on CoW fault into PFN-mapped VMA
drm/amdgpu: increase atombios cmd timeout
drm/amd/display: Stop if retimer is not available
ath10k: use kzalloc to read for ath10k_sdio_hif_diag_read
scsi: aacraid: Disabling TM path and only processing IOP reset
Bluetooth: L2CAP: handle l2cap config request during open state
media: tda10071: fix unsigned sign extension overflow
xfs: don't ever return a stale pointer from __xfs_dir3_free_read
xfs: mark dir corrupt when lookup-by-hash fails
ext4: mark block bitmap corrupted when found instead of BUGON
tpm: ibmvtpm: Wait for buffer to be set before proceeding
rtc: sa1100: fix possible race condition
rtc: ds1374: fix possible race condition
nfsd: Don't add locks to closed or closing open stateids
RDMA/cm: Remove a race freeing timewait_info
KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones
drm/msm: fix leaks if initialization fails
drm/msm/a5xx: Always set an OPP supported hardware value
tracing: Use address-of operator on section symbols
thermal: rcar_thermal: Handle probe error gracefully
perf parse-events: Fix 3 use after frees found with clang ASAN
serial: 8250_port: Don't service RX FIFO if throttled
serial: 8250_omap: Fix sleeping function called from invalid context during probe
serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout
perf cpumap: Fix snprintf overflow check
cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn
tools: gpio-hammer: Avoid potential overflow in main
nvme-multipath: do not reset on unknown status
nvme: Fix controller creation races with teardown flow
RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices
scsi: hpsa: correct race condition in offload enabled
SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()'
svcrdma: Fix leak of transport addresses
PCI: Use ioremap(), not phys_to_virt() for platform ROM
ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len
ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor
PCI: pciehp: Fix MSI interrupt race
NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
mm/kmemleak.c: use address-of operator on section symbols
mm/filemap.c: clear page error before actual read
mm/vmscan.c: fix data races using kswapd_classzone_idx
nvmet-rdma: fix double free of rdma queue
mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
scsi: qedi: Fix termination timeouts in session logout
serial: uartps: Wait for tx_empty in console setup
KVM: Remove CREATE_IRQCHIP/SET_PIT2 race
bdev: Reduce time holding bd_mutex in sync in blkdev_close()
drivers: char: tlclk.c: Avoid data race between init and interrupt handler
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
net: openvswitch: use u64 for meter bucket
scsi: aacraid: Fix error handling paths in aac_probe_one()
staging:r8188eu: avoid skb_clone for amsdu to msdu conversion
sparc64: vcc: Fix error return code in vcc_probe()
arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion
atm: fix a memory leak of vcc->user_back
perf mem2node: Avoid double free related to realloc
power: supply: max17040: Correct voltage reading
phy: samsung: s5pv210-usb2: Add delay after reset
Bluetooth: Handle Inquiry Cancel error after Inquiry Complete
USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe()
tipc: fix memory leak in service subscripting
tty: serial: samsung: Correct clock selection logic
ALSA: hda: Fix potential race in unsol event handler
powerpc/traps: Make unrecoverable NMIs die instead of panic
fuse: don't check refcount after stealing page
USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int
scsi: cxlflash: Fix error return code in cxlflash_probe()
arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
e1000: Do not perform reset in reset_task if we are already down
drm/nouveau/debugfs: fix runtime pm imbalance on error
drm/nouveau: fix runtime pm imbalance on error
drm/nouveau/dispnv50: fix runtime pm imbalance on error
printk: handle blank console arguments passed in.
usb: dwc3: Increase timeout for CmdAct cleared by device controller
btrfs: don't force read-only after error in drop snapshot
vfio/pci: fix memory leaks of eventfd ctx
perf evsel: Fix 2 memory leaks
perf trace: Fix the selection for architectures to generate the errno name tables
perf stat: Fix duration_time value for higher intervals
perf util: Fix memory leak of prefix_if_not_in
perf metricgroup: Free metric_events on error
perf kcore_copy: Fix module map when there are no modules loaded
ASoC: img-i2s-out: Fix runtime PM imbalance on error
wlcore: fix runtime pm imbalance in wl1271_tx_work
wlcore: fix runtime pm imbalance in wlcore_regdomain_config
mtd: rawnand: omap_elm: Fix runtime PM imbalance on error
PCI: tegra: Fix runtime PM imbalance on error
ceph: fix potential race in ceph_check_caps
mm/swap_state: fix a data race in swapin_nr_pages
rapidio: avoid data race between file operation callbacks and mport_cdev_add().
mtd: parser: cmdline: Support MTD names containing one or more colons
x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline
vfio/pci: Clear error and request eventfd ctx after releasing
cifs: Fix double add page to memcg when cifs_readpages
nvme: fix possible deadlock when I/O is blocked
scsi: libfc: Handling of extra kref
scsi: libfc: Skip additional kref updating work event
selftests/x86/syscall_nt: Clear weird flags after each test
vfio/pci: fix racy on error and request eventfd ctx
btrfs: qgroup: fix data leak caused by race between writeback and truncate
ubi: fastmap: Free unused fastmap anchor peb during detach
perf parse-events: Use strcmp() to compare the PMU name
net: openvswitch: use div_u64() for 64-by-32 divisions
nvme: explicitly update mpath disk capacity on revalidation
ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811
ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions
ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1
RISC-V: Take text_mutex in ftrace_init_nop()
s390/init: add missing __init annotations
lockdep: fix order in trace_hardirqs_off_caller()
drm/amdkfd: fix a memory leak issue
i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()
objtool: Fix noreturn detection for ignored functions
ieee802154: fix one possible memleak in ca8210_dev_com_init
ieee802154/adf7242: check status of adf7242_read_reg
clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
mwifiex: Increase AES key storage size to 256 bits
batman-adv: bla: fix type misuse for backbone_gw hash indexing
atm: eni: fix the missed pci_disable_device() for eni_init_one()
batman-adv: mcast/TT: fix wrongly dropped or rerouted packets
mac802154: tx: fix use-after-free
bpf: Fix clobbering of r2 in bpf_gen_ld_abs
drm/vc4/vc4_hdmi: fill ASoC card owner
net: qed: RDMA personality shouldn't fail VF load
drm/sun4i: sun8i-csc: Secondary CSC register correction
batman-adv: Add missing include for in_interrupt()
batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh
batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh
bpf: Fix a rcu warning for bpffs map pretty-print
ALSA: asihpi: fix iounmap in error handler
regmap: fix page selection for noinc reads
MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()
KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
KVM: SVM: Add a dedicated INVD intercept routine
tracing: fix double free
s390/dasd: Fix zero write for FBA devices
kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
mm, THP, swap: fix allocating cluster for swapfile by mistake
s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl
kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE
ata: define AC_ERR_OK
ata: make qc_prep return ata_completion_errors
ata: sata_mv, avoid trigerrable BUG_ON
KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch
Linux 4.19.149
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idfc1b35ec63b4b464aeb6e32709102bee0efc872
[ Upstream commit 5e1aada08cd19ea652b2d32a250501d09b02ff2e ]
Initialization is not guaranteed to zero padding bytes so use an
explicit memset instead to avoid leaking any kernel content in any
possible padding bytes.
Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(Upstream commit 3e91ec89f527b9870fe42dcbdb74fd389d123a95).
Require that arg{3,4,5} of the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl and
arg2 of the PR_GET_TAGGED_ADDR_CTRL prctl() are zero rather than ignored
for future extensions.
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
Change-Id: I8bb5c3eb4728440880c971d77904f7e45b571ddc
(Upstream commit 63f0c60379650d82250f22e4cf4137ef3dc4f43d).
It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve(). A Kconfig
option allows the overall disabling of the relaxed ABI.
The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Change-Id: I2d52c5589b05415faab315c116245f1058d64750
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Bug: 135692346
-----BEGIN PGP SIGNATURE-----
iQIyBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0EwEMACgkQONu9yGCS
aT4UpA/3UqmMAaHH4g20cic8YDoBkYXoQUyj/kbf1w6bXqCuLTHOFS/qXa6QtPK3
WfwNWChIob+EbfAMjreQZjT6pwBbxyCNUvsvpB0k8YhJ3v/sIZL/Wc+1ZDu+jBC9
xfqwT9+JGzgu8P4PUTr1BsGMhgiH9qoJAeq7RCuDcicMIJ4/aJVr4Cvrs+18PVUe
95T5vSn6G62QyOrUExmuuztvNM2P/kos6yJTkN80l3uPLMUjsnWCsMKu+9utd5ea
ew332Z/BQs+ff4oljH1uRwsM/Z7+AKXlXatXD0sHQ8CTEqh44SgUSU96vB/h9W8I
6a0t4M2atsdaGjMHmiiPA+gIgd1rW0lsHk6ob6qgfzuRBFGN9BTUfZgQwhOW7uXt
e4o5RrWELkbk/TlJzrG1dFjhfyeb7q3LHOOg8kOVU0KdPD44ekJW9qoI8tlMPI+5
mafCCS/oS6TaW20ZKmjjkIbfTndzdO3dy5EWLy3elCEyLF2gDZ6WCAL+SMngdApC
/dbuBigF/+RBaEU1e56DkcYUYXjt6UO84O3dYAx69s6EhHS5CP4yCySuYpdxyg/G
MWdFDhtbnFyMXqoK1ROS0hNnxpydkh+R1Ns0TeSYibJI2J2enMGIa0thu4aVLD3v
+GLqHV2PsPTRQmF5ChnvNV7O53i4j4WBQQjL+80PQulJwRFb/A==
=bIqz
-----END PGP SIGNATURE-----
Merge 4.19.51 into android-4.19
Changes in 4.19.51
rapidio: fix a NULL pointer dereference when create_workqueue() fails
fs/fat/file.c: issue flush after the writeback of FAT
sysctl: return -EINVAL if val violates minmax
ipc: prevent lockup on alloc_msg and free_msg
drm/pl111: Initialize clock spinlock early
ARM: prevent tracing IPI_CPU_BACKTRACE
mm/hmm: select mmu notifier when selecting HMM
hugetlbfs: on restore reserve error path retain subpool reservation
mem-hotplug: fix node spanned pages when we have a node with only ZONE_MOVABLE
mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
initramfs: free initrd memory if opening /initrd.image fails
mm/cma.c: fix the bitmap status to show failed allocation reason
mm: page_mkclean vs MADV_DONTNEED race
mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
mm/slab.c: fix an infinite loop in leaks_show()
kernel/sys.c: prctl: fix false positive in validate_prctl_map()
thermal: rcar_gen3_thermal: disable interrupt in .remove
drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER
mfd: tps65912-spi: Add missing of table registration
mfd: intel-lpss: Set the device in reset state when init
drm/nouveau/disp/dp: respect sink limits when selecting failsafe link configuration
mfd: twl6040: Fix device init errors for ACCCTL register
perf/x86/intel: Allow PEBS multi-entry in watermark mode
drm/nouveau/kms/gf119-gp10x: push HeadSetControlOutputResource() mthd when encoders change
drm/bridge: adv7511: Fix low refresh rate selection
objtool: Don't use ignore flag for fake jumps
drm/nouveau/kms/gv100-: fix spurious window immediate interlocks
bpf: fix undefined behavior in narrow load handling
EDAC/mpc85xx: Prevent building as a module
pwm: meson: Use the spin-lock only to protect register modifications
mailbox: stm32-ipcc: check invalid irq
ntp: Allow TAI-UTC offset to be set to zero
f2fs: fix to avoid panic in do_recover_data()
f2fs: fix to avoid panic in f2fs_inplace_write_data()
f2fs: fix to avoid panic in f2fs_remove_inode_page()
f2fs: fix to do sanity check on free nid
f2fs: fix to clear dirty inode in error path of f2fs_iget()
f2fs: fix to avoid panic in dec_valid_block_count()
f2fs: fix to use inline space only if inline_xattr is enable
f2fs: fix to do sanity check on valid block count of segment
f2fs: fix to do checksum even if inode page is uptodate
percpu: remove spurious lock dependency between percpu and sched
configfs: fix possible use-after-free in configfs_register_group
uml: fix a boot splat wrt use of cpu_all_mask
PCI: dwc: Free MSI in dw_pcie_host_init() error path
PCI: dwc: Free MSI IRQ page in dw_pcie_free_msi()
ovl: do not generate duplicate fsnotify events for "fake" path
mmc: mmci: Prevent polling for busy detection in IRQ context
netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast
netfilter: nf_conntrack_h323: restore boundary check correctness
mips: Make sure dt memory regions are valid
netfilter: nf_tables: fix base chain stat rcu_dereference usage
watchdog: imx2_wdt: Fix set_timeout for big timeout values
watchdog: fix compile time error of pretimeout governors
blk-mq: move cancel of requeue_work into blk_mq_release
iommu/vt-d: Set intel_iommu_gfx_mapped correctly
misc: pci_endpoint_test: Fix test_reg_bar to be updated in pci_endpoint_test
PCI: designware-ep: Use aligned ATU window for raising MSI interrupts
nvme-pci: unquiesce admin queue on shutdown
nvme-pci: shutdown on timeout during deletion
netfilter: nf_flow_table: check ttl value in flow offload data path
netfilter: nf_flow_table: fix netdev refcnt leak
ALSA: hda - Register irq handler after the chip initialization
nvmem: core: fix read buffer in place
nvmem: sunxi_sid: Support SID on A83T and H5
fuse: retrieve: cap requested size to negotiated max_write
nfsd: allow fh_want_write to be called twice
nfsd: avoid uninitialized variable warning
vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"
iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel
switchtec: Fix unintended mask of MRPC event
net: thunderbolt: Unregister ThunderboltIP protocol handler when suspending
x86/PCI: Fix PCI IRQ routing table memory leak
i40e: Queues are reserved despite "Invalid argument" error
platform/chrome: cros_ec_proto: check for NULL transfer function
PCI: keystone: Prevent ARM32 specific code to be compiled for ARM64
soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
soc: rockchip: Set the proper PWM for rk3288
ARM: dts: imx51: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx50: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx53: Specify IMX5_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA
ARM: dts: imx6sll: Specify IMX6SLL_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx7d: Specify IMX7D_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx6ul: Specify IMX6UL_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA
ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
PCI: rpadlpar: Fix leaked device_node references in add/remove paths
drm/amd/display: Use plane->color_space for dpp if specified
ARM: OMAP2+: pm33xx-core: Do not Turn OFF CEFUSE as PPA may be using it
platform/x86: intel_pmc_ipc: adding error handling
power: supply: max14656: fix potential use-before-alloc
net: hns3: return 0 and print warning when hit duplicate MAC
PCI: rcar: Fix a potential NULL pointer dereference
PCI: rcar: Fix 64bit MSI message address handling
scsi: qla2xxx: Reset the FCF_ASYNC_{SENT|ACTIVE} flags
video: hgafb: fix potential NULL pointer dereference
video: imsttfb: fix potential NULL pointer dereferences
block, bfq: increase idling for weight-raised queues
PCI: xilinx: Check for __get_free_pages() failure
gpio: gpio-omap: add check for off wake capable gpios
ice: Add missing case in print_link_msg for printing flow control
dmaengine: idma64: Use actual device for DMA transfers
pwm: tiehrpwm: Update shadow register for disabling PWMs
ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa
pwm: Fix deadlock warning when removing PWM device
ARM: exynos: Fix undefined instruction during Exynos5422 resume
usb: typec: fusb302: Check vconn is off when we start toggling
soc: renesas: Identify R-Car M3-W ES1.3
gpio: vf610: Do not share irq_chip
percpu: do not search past bitmap when allocating an area
Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections"
Revert "drm/nouveau: add kconfig option to turn off nouveau legacy contexts. (v3)"
ovl: check the capability before cred overridden
ovl: support stacked SEEK_HOLE/SEEK_DATA
drm/vc4: fix fb references in async update
ALSA: seq: Cover unsubscribe_port() in list_mutex
Linux 4.19.51
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit a9e73998f9d705c94a8dca9687633adc0f24a19a ]
While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long). These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.
As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL. From the image dump of a program:
| "mm_start_code": "0x400000",
| "mm_end_code": "0x8f5fb4",
| "mm_start_data": "0xf1bfb0",
| "mm_end_data": "0xf1bfb0",
Thus we need to change validate_prctl_map from strictly less to less or
equal operator use.
Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
Fixes: f606b77f1a ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.
Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Bug: 120441514
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Fix get_user_pages_remote() call to align with upstream commit
5b56d49fc3 ("mm: add locked parameter to get_user_pages_remote()")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Pull namespace fixes from Eric Biederman:
"This is a set of four fairly obvious bug fixes:
- a switch from d_find_alias to d_find_any_alias because the xattr
code perversely takes a dentry
- two mutex vs copy_to_user fixes from Jann Horn
- a fix to use a sanitized size not the size userspace passed in from
Christian Brauner"
* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
getxattr: use correct xattr length
sys: don't hold uts_sem while accessing userspace memory
userns: move user access out of the mutex
cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
Holding uts_sem as a writer while accessing userspace memory allows a
namespace admin to stall all processes that attempt to take uts_sem.
Instead, move data through stack buffers and don't access userspace memory
while uts_sem is held.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
get_monotonic_boottime() is deprecated because it uses the old 'timespec'
structure. This replaces one of the last callers with a call to
ktime_get_boottime.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: y2038@lists.linaro.org
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: https://lkml.kernel.org/r/20180618150114.849216-1-arnd@arndb.de
mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too. It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.
And, the mmap_sem contention may cause unexpected issue like below:
INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
ps D 0 14018 1 0x00000004
Call Trace:
schedule+0x36/0x80
rwsem_down_read_failed+0xf0/0x150
call_rwsem_down_read_failed+0x18/0x30
down_read+0x20/0x40
proc_pid_cmdline_read+0xd9/0x4e0
__vfs_read+0x37/0x150
vfs_read+0x96/0x130
SyS_read+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xc5
Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.
So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.
This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem. The mmap_sem
scalability issue will be solved in the future.
[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
Fix this by sanitizing *resource* before using it to index
current->signal->rlim
Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than
current.
This is needed both for /proc/$pid/status queries and for seccomp (since
thread-syncing can trigger seccomp in non-current threads).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.
PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:
Bit Define Description
0 PR_SPEC_PRCTL Mitigation can be controlled per task by
PR_SET_SPECULATION_CTRL
1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
disabled
2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
enabled
If all bits are 0 the CPU is not affected by the speculation misfeature.
If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.
PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
The common return values are:
EINVAL prctl is not implemented by the architecture or the unused prctl()
arguments are not 0
ENODEV arg2 is selecting a not supported speculation misfeature
PR_SET_SPECULATION_CTRL has these additional return values:
ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO prctl control of the selected speculation misfeature is disabled
The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.
Based on an initial patch from Tim Chen and mostly rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Using this helper allows us to avoid the in-kernel call to the
sys_setsid() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_setsid().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using these helpers allows us to avoid the in-kernel calls to these
syscalls: sys_setregid(), sys_setgid(), sys_setreuid(), sys_setuid(),
sys_setresuid(), sys_setresgid(), sys_setfsuid(), and sys_setfsgid().
The ksys_ prefix denotes that these function are meant as a drop-in
replacement for the syscall. In particular, they use the same calling
convention.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Using the do_getpgid() helper removes an in-kernel call to the
sys_getpgid() syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The patch remains without practical effect since both macros carry
identical values. Still, it might become a problem in the future if
(for whatever reason) the default overflow uid and gid differ. The
DEFAULT_FS_OVERFLOWGID macro was previously unused.
Signed-off-by: Wolffhardt Schwabe <wolffhardt.schwabe@fau.de>
Signed-off-by: Anatoliy Cherepantsev <anatoliy.cherepantsev@fau.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Plenty of acronym soup here:
- Initial support for the Scalable Vector Extension (SVE)
- Improved handling for SError interrupts (required to handle RAS events)
- Enable GCC support for 128-bit integer types
- Remove kernel text addresses from backtraces and register dumps
- Use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- Perf PMU driver for the Statistical Profiling Extension (SPE)
- Perf PMU driver for Hisilicon's system PMUs
- Misc cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJaCcLqAAoJELescNyEwWM0JREH/2FbmD/khGzEtP8LW+o9D8iV
TBM02uWQxS1bbO1pV2vb+512YQO+iWfeQwJH9Jv2FZcrMvFv7uGRnYgAnJuXNGrl
W+LL6OhN22A24LSawC437RU3Xe7GqrtONIY/yLeJBPablfcDGzPK1eHRA0pUzcyX
VlyDruSHWX44VGBPV6JRd3x0vxpV8syeKOjbRvopRfn3Nwkbd76V3YSfEgwoTG5W
ET1sOnXLmHHdeifn/l1Am5FX1FYstpcd7usUTJ4Oto8y7e09tw3bGJCD0aMJ3vow
v1pCUWohEw7fHqoPc9rTrc1QEnkdML4vjJvMPUzwyTfPrN+7uEuMIEeJierW+qE=
=0qrg
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The big highlight is support for the Scalable Vector Extension (SVE)
which required extensive ABI work to ensure we don't break existing
applications by blowing away their signal stack with the rather large
new vector context (<= 2 kbit per vector register). There's further
work to be done optimising things like exception return, but the ABI
is solid now.
Much of the line count comes from some new PMU drivers we have, but
they're pretty self-contained and I suspect we'll have more of them in
future.
Plenty of acronym soup here:
- initial support for the Scalable Vector Extension (SVE)
- improved handling for SError interrupts (required to handle RAS
events)
- enable GCC support for 128-bit integer types
- remove kernel text addresses from backtraces and register dumps
- use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- perf PMU driver for the Statistical Profiling Extension (SPE)
- perf PMU driver for Hisilicon's system PMUs
- misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
arm64: Make ARMV8_DEPRECATED depend on SYSCTL
arm64: Implement __lshrti3 library function
arm64: support __int128 on gcc 5+
arm64/sve: Add documentation
arm64/sve: Detect SVE and activate runtime support
arm64/sve: KVM: Hide SVE from CPU features exposed to guests
arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
arm64/sve: KVM: Prevent guests from using SVE
arm64/sve: Add sysctl to set the default vector length for new processes
arm64/sve: Add prctl controls for userspace vector length management
arm64/sve: ptrace and ELF coredump support
arm64/sve: Preserve SVE registers around EFI runtime service calls
arm64/sve: Preserve SVE registers around kernel-mode NEON use
arm64/sve: Probe SVE capabilities and usable vector lengths
arm64: cpufeature: Move sys_caps_initialised declarations
arm64/sve: Backend logic for setting the vector length
arm64/sve: Signal handling support
arm64/sve: Support vector length resetting for new processes
arm64/sve: Core task context handling
arm64/sve: Low-level CPU setup
...
This patch adds two arm64-specific prctls, to permit userspace to
control its vector length:
* PR_SVE_SET_VL: set the thread's SVE vector length and vector
length inheritance mode.
* PR_SVE_GET_VL: get the same information.
Although these prctls resemble instruction set features in the SVE
architecture, they provide additional control: the vector length
inheritance mode is Linux-specific and nothing to do with the
architecture, and the architecture does not permit EL0 to set its
own vector length directly. Both can be used in portable tools
without requiring the use of SVE instructions.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: Fixed up prctl constants to avoid clash with PDEATHSIG]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During checkpointing and restore of userspace tasks
we bumped into the situation, that it's not possible
to restore the tasks, which user namespace does not
have uid 0 or gid 0 mapped.
People create user namespace mappings like they want,
and there is no a limitation on obligatory uid and gid
"must be mapped". So, if there is no uid 0 or gid 0
in the mapping, it's impossible to restore mm->exe_file
of the processes belonging to this user namespace.
Also, there is no a workaround. It's impossible
to create a temporary uid/gid mapping, because
only one write to /proc/[pid]/uid_map and gid_map
is allowed during a namespace lifetime.
If there is an entry, then no more mapings can't be
written. If there isn't an entry, we can't write
there too, otherwise user task won't be able
to do that in the future.
The patch changes the check, and looks for CAP_SYS_ADMIN
instead of zero uid and gid. This allows to restore
a task independently of its user namespace mappings.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Serge Hallyn <serge@hallyn.com>
CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Michal Hocko <mhocko@suse.com>
CC: Andrei Vagin <avagin@openvz.org>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
CC: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-and-tested-by: Meelis Roos <mroos@linux.ee>
Fixes: commit d9e968cb9f "getrlimit()/setrlimit(): move compat to native"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PR_SET_THP_DISABLE has a rather subtle semantic. It doesn't affect any
existing mapping because it only updated mm->def_flags which is a
template for new mappings.
The mappings created after prctl(PR_SET_THP_DISABLE) have VM_NOHUGEPAGE
flag set. This can be quite surprising for all those applications which
do not do prctl(); fork() & exec() and want to control their own THP
behavior.
Another usecase when the immediate semantic of the prctl might be useful
is a combination of pre- and post-copy migration of containers with
CRIU. In this case CRIU populates a part of a memory region with data
that was saved during the pre-copy stage. Afterwards, the region is
registered with userfaultfd and CRIU expects to get page faults for the
parts of the region that were not yet populated. However, khugepaged
collapses the pages and the expected page faults do not occur.
In more general case, the prctl(PR_SET_THP_DISABLE) could be used as a
temporary mechanism for enabling/disabling THP process wide.
Implementation wise, a new MMF_DISABLE_THP flag is added. This flag is
tested when decision whether to use huge pages is taken either during
page fault of at the time of THP collapse.
It should be noted, that the new implementation makes PR_SET_THP_DISABLE
master override to any per-VMA setting, which was not the case
previously.
Fixes: a0715cc226 ("mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE")
Link: http://lkml.kernel.org/r/1496415802-30944-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull misc compat stuff updates from Al Viro:
"This part is basically untangling various compat stuff. Compat
syscalls moved to their native counterparts, getting rid of quite a
bit of double-copying and/or set_fs() uses. A lot of field-by-field
copyin/copyout killed off.
- kernel/compat.c is much closer to containing just the
copyin/copyout of compat structs. Not all compat syscalls are gone
from it yet, but it's getting there.
- ipc/compat_mq.c killed off completely.
- block/compat_ioctl.c cleaned up; floppy compat ioctls moved to
drivers/block/floppy.c where they belong. Yes, there are several
drivers that implement some of the same ioctls. Some are m68k and
one is 32bit-only pmac. drivers/block/floppy.c is the only one in
that bunch that can be built on biarch"
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mqueue: move compat syscalls to native ones
usbdevfs: get rid of field-by-field copyin
compat_hdio_ioctl: get rid of set_fs()
take floppy compat ioctls to sodding floppy.c
ipmi: get rid of field-by-field __get_user()
ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one
rt_sigtimedwait(): move compat to native
select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()
put_compat_rusage(): switch to copy_to_user()
sigpending(): move compat to native
getrlimit()/setrlimit(): move compat to native
times(2): move compat to native
compat_{get,put}_bitmap(): use unsafe_{get,put}_user()
fb_get_fscreeninfo(): don't bother with do_fb_ioctl()
do_sigaltstack(): lift copying to/from userland into callers
take compat_sys_old_getrlimit() to native syscall
trim __ARCH_WANT_SYS_OLD_GETRLIMIT
New helpers: kernel_waitid() and kernel_wait4(). sys_waitid(),
sys_wait4() and their compat variants switched to those. Copying
struct rusage to userland is left to syscall itself. For
compat_sys_wait4() that eliminates the use of set_fs() completely.
For compat_sys_waitid() it's still needed (for siginfo handling);
that will change shortly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull namespace updates from Eric Biederman:
"This is a set of small fixes that were mostly stumbled over during
more significant development. This proc fix and the fix to
posix-timers are the most significant of the lot.
There is a lot of good development going on but unfortunately it
didn't quite make the merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Fix unbalanced hard link numbers
signal: Make kill_proc_info static
rlimit: Properly call security_task_setrlimit
signal: Remove unused definition of sig_user_definied
ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET
ipc: Remove unused declaration of recompute_msgmni
posix-timers: Correct sanity check in posix_cpu_nsleep
sysctl: Remove dead register_sysctl_root
Modify do_prlimit to call security_task_setrlimit passing the task
whose rlimit we are changing not the tsk->group_leader.
In general this should not matter as the lsms implementing
security_task_setrlimit apparmor and selinux both examine the
task->cred to see what should be allowed on the destination task.
That task->cred is shared between tasks created with CLONE_THREAD
unless thread keyrings are in play, in which case both apparmor and
selinux create duplicate security contexts.
So the only time when it will matter which thread is passed to
security_task_setrlimit is if one of the threads of a process performs
an operation that changes only it's credentials. At which point if a
thread has done that we don't want to hide that information from the
lsms.
So fix the call of security_task_setrlimit. With the removal
of tsk->group_leader this makes the code slightly faster,
more comprehensible and maintainable.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When SELinux was first added to the kernel, a process could only get
and set its own resource limits via getrlimit(2) and setrlimit(2), so no
MAC checks were required for those operations, and thus no security hooks
were defined for them. Later, SELinux introduced a hook for setlimit(2)
with a check if the hard limit was being changed in order to be able to
rely on the hard limit value as a safe reset point upon context
transitions.
Later on, when prlimit(2) was added to the kernel with the ability to get
or set resource limits (hard or soft) of another process, LSM/SELinux was
not updated other than to pass the target process to the setrlimit hook.
This resulted in incomplete control over both getting and setting the
resource limits of another process.
Add a new security_task_prlimit() hook to the check_prlimit_permission()
function to provide complete mediation. The hook is only called when
acting on another task, and only if the existing DAC/capability checks
would allow access. Pass flags down to the hook to indicate whether the
prlimit(2) call will read, write, or both read and write the resource
limits of the target process.
The existing security_task_setrlimit() hook is left alone; it continues
to serve a purpose in supporting the ability to make decisions based on
the old and/or new resource limit values when setting limits. This
is consistent with the DAC/capability logic, where
check_prlimit_permission() performs generic DAC/capability checks for
acting on another task, while do_prlimit() performs a capability check
based on a comparison of the old and new resource limits. Fix the
inline documentation for the hook to match the code.
Implement the new hook for SELinux. For setting resource limits, we
reuse the existing setrlimit permission. Note that this does overload
the setrlimit permission to mean the ability to set the resource limit
(soft or hard) of another process or the ability to change one's own
hard limit. For getting resource limits, a new getrlimit permission
is defined. This was not originally defined since getrlimit(2) could
only be used to obtain a process' own limits.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Introduce a trivial, mostly empty <linux/sched/cputime.h> header
to prepare for the moving of cputime functionality out of sched.h.
Update all code that relies on these facilities.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/stat.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/coredump.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/coredump.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
The APIs that are going to be moved first are:
mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/autogroup.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/autogroup.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which
will have to be picked up from a couple of .c files.
Create a trivial placeholder <linux/sched/topology.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull namespace updates from Eric Biederman:
"There is a lot here. A lot of these changes result in subtle user
visible differences in kernel behavior. I don't expect anything will
care but I will revert/fix things immediately if any regressions show
up.
From Seth Forshee there is a continuation of the work to make the vfs
ready for unpriviled mounts. We had thought the previous changes
prevented the creation of files outside of s_user_ns of a filesystem,
but it turns we missed the O_CREAT path. Ooops.
Pavel Tikhomirov and Oleg Nesterov worked together to fix a long
standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only
children that are forked after the prctl are considered and not
children forked before the prctl. The only known user of this prctl
systemd forks all children after the prctl. So no userspace
regressions will occur. Holding earlier forked children to the same
rules as later forked children creates a semantic that is sane enough
to allow checkpoing of processes that use this feature.
There is a long delayed change by Nikolay Borisov to limit inotify
instances inside a user namespace.
Michael Kerrisk extends the API for files used to maniuplate
namespaces with two new trivial ioctls to allow discovery of the
hierachy and properties of namespaces.
Konstantin Khlebnikov with the help of Al Viro adds code that when a
network namespace exits purges it's sysctl entries from the dcache. As
in some circumstances this could use a lot of memory.
Vivek Goyal fixed a bug with stacked filesystems where the permissions
on the wrong inode were being checked.
I continue previous work on ptracing across exec. Allowing a file to
be setuid across exec while being ptraced if the tracer has enough
credentials in the user namespace, and if the process has CAP_SETUID
in it's own namespace. Proc files for setuid or otherwise undumpable
executables are now owned by the root in the user namespace of their
mm. Allowing debugging of setuid applications in containers to work
better.
A bug I introduced with permission checking and automount is now
fixed. The big change is to mark the mounts that the kernel initiates
as a result of an automount. This allows the permission checks in sget
to be safely suppressed for this kind of mount. As the permission
check happened when the original filesystem was mounted.
Finally a special case in the mount namespace is removed preventing
unbounded chains in the mount hash table, and making the semantics
simpler which benefits CRIU.
The vfs fix along with related work in ima and evm I believe makes us
ready to finish developing and merge fully unprivileged mounts of the
fuse filesystem. The cleanups of the mount namespace makes discussing
how to fix the worst case complexity of umount. The stacked filesystem
fixes pave the way for adding multiple mappings for the filesystem
uids so that efficient and safer containers can be implemented"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc/sysctl: Don't grab i_lock under sysctl_lock.
vfs: Use upper filesystem inode in bprm_fill_uid()
proc/sysctl: prune stale dentries during unregistering
mnt: Tuck mounts under others instead of creating shadow/side mounts.
prctl: propagate has_child_subreaper flag to every descendant
introduce the walk_process_tree() helper
nsfs: Add an ioctl() to return owner UID of a userns
fs: Better permission checking for submounts
exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction
vfs: open() with O_CREAT should not create inodes with unknown ids
nsfs: Add an ioctl() to return the namespace type
proc: Better ownership of files for non-dumpable tasks in user namespaces
exec: Remove LSM_UNSAFE_PTRACE_CAP
exec: Test the ptracer's saved cred to see if the tracee can gain caps
exec: Don't reset euid and egid when the tracee has CAP_SETUID
inotify: Convert to using per-namespace limits
If process forks some children when it has is_child_subreaper
flag enabled they will inherit has_child_subreaper flag - first
group, when is_child_subreaper is disabled forked children will
not inherit it - second group. So child-subreaper does not reparent
all his descendants when their parents die. Having these two
differently behaving groups can lead to confusion. Also it is
a problem for CRIU, as when we restore process tree we need to
somehow determine which descendants belong to which group and
much harder - to put them exactly to these group.
To simplify these we can add a propagation of has_child_subreaper
flag on PR_SET_CHILD_SUBREAPER, walking all descendants of child-
subreaper to setup has_child_subreaper flag.
In common cases when process like systemd first sets itself to
be a child-subreaper and only after that forks its services, we will
have zero-length list of descendants to walk. Testing with binary
subtree of 2^15 processes prctl took < 0.007 sec and has shown close
to linear dependency(~0.2 * n * usec) on lower numbers of processes.
Moreover, I doubt someone intentionaly pre-forks the children whitch
should reparent to init before becoming subreaper, because some our
ancestor migh have had is_child_subreaper flag while forking our
sub-tree and our childs will all inherit has_child_subreaper flag,
and we have no way to influence it. And only way to check if we have
no has_child_subreaper flag is to create some childs, kill them and
see where they will reparent to.
Using walk_process_tree helper to walk subtree, thanks to Oleg! Timing
seems to be the same.
Optimize:
a) When descendant already has has_child_subreaper flag all his subtree
has it too already.
* for a) to be true need to move has_child_subreaper inheritance under
the same tasklist_lock with adding task to its ->real_parent->children
as without it process can inherit zero has_child_subreaper, then we
set 1 to it's parent flag, check that parent has no more children, and
only after child with wrong flag is added to the tree.
* Also make these inheritance more clear by using real_parent instead of
current, as on clone(CLONE_PARENT) if current has is_child_subreaper
and real_parent has no is_child_subreaper or has_child_subreaper, child
will have has_child_subreaper flag set without actually having a
subreaper in it's ancestors.
b) When some descendant is child_reaper, it's subtree is in different
pidns from us(original child-subreaper) and processes from other pidns
will never reparent to us.
So we can skip their(a,b) subtree from walk.
v2: switch to walk_process_tree() general helper, move
has_child_subreaper inheritance
v3: remove csr_descendant leftover, change current to real_parent
in has_child_subreaper inheritance
v4: small commit message fix
Fixes: ebec18a6d3 ("prctl: add PR_{SET,GET}_CHILD_SUBREAPER to allow simple process supervision")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Now that most cputime readers use the transition API which return the
task cputime in old style cputime_t, we can safely store the cputime in
nsecs. This will eventually make cputime statistics less opaque and more
granular. Back and forth convertions between cputime_t and nsecs in order
to deal with cputime_t random granularity won't be needed anymore.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-8-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge updates from Andrew Morton:
- various misc bits
- most of MM (quite a lot of MM material is awaiting the merge of
linux-next dependencies)
- kasan
- printk updates
- procfs updates
- MAINTAINERS
- /lib updates
- checkpatch updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits)
init: reduce rootwait polling interval time to 5ms
binfmt_elf: use vmalloc() for allocation of vma_filesz
checkpatch: don't emit unified-diff error for rename-only patches
checkpatch: don't check c99 types like uint8_t under tools
checkpatch: avoid multiple line dereferences
checkpatch: don't check .pl files, improve absolute path commit log test
scripts/checkpatch.pl: fix spelling
checkpatch: don't try to get maintained status when --no-tree is given
lib/ida: document locking requirements a bit better
lib/rbtree.c: fix typo in comment of ____rb_erase_color
lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
MAINTAINERS: add drm and drm/i915 irc channels
MAINTAINERS: add "C:" for URI for chat where developers hang out
MAINTAINERS: add drm and drm/i915 bug filing info
MAINTAINERS: add "B:" for URI where to file bugs
get_maintainer: look for arbitrary letter prefixes in sections
printk: add Kconfig option to set default console loglevel
printk/sound: handle more message headers
printk/btrfs: handle more message headers
printk/kdb: handle more message headers
...
This limitation came with the reason to remove "another way for
malicious code to obscure a compromised program and masquerade as a
benign process" by allowing "security-concious program can use this
prctl once during its early initialization to ensure the prctl cannot
later be abused for this purpose":
http://marc.info/?l=linux-kernel&m=133160684517468&w=2
This explanation doesn't look sufficient. The only thing "exe" link is
indicating is the file, used to execve, which is basically nothing and
not reliable immediately after process has returned from execve system
call.
Moreover, to use this feture, all the mappings to previous exe file have
to be unmapped and all the new exe file permissions must be satisfied.
Which means, that changing exe link is very similar to calling execve on
the binary.
The need to remove this limitations comes from migration of NFS mount
point, which is not accessible during restore and replaced by other file
system. Because of this exe link has to be changed twice.
[akpm@linux-foundation.org: fix up comment]
Link: http://lkml.kernel.org/r/20160927153755.9337.69650.stgit@localhost.localdomain
Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some embedded systems have no use for them. This removes about
25KB from the kernel binary size when configured out.
Corresponding syscalls are routed to a stub logging the attempt to
use those syscalls which should be enough of a clue if they were
disabled without proper consideration. They are: timer_create,
timer_gettime: timer_getoverrun, timer_settime, timer_delete,
clock_adjtime, setitimer, getitimer, alarm.
The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-kbuild@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Michal Marek <mmarek@suse.com>
Cc: Edward Cree <ecree@solarflare.com>
Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
PR_SET_THP_DISABLE requires mmap_sem for write. If the waiting task
gets killed by the oom killer it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolving. Wait for the lock in the killable mode and return with EINTR
if the task got killed while waiting.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).
The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems. It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.
The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.
With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:
$ time sleep 1
real 0m10.747s
user 0m0.001s
sys 0m0.005s
The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s. Let me
know if it makes sense to break that up more or not.
Other than that things are fairly straightforward.
This patch (of 2):
The timer_slack_ns value in the task struct is currently a unsigned
long. This means that on 32bit applications, the maximum slack is just
over 4 seconds. However, on 64bit machines, its much much larger (~500
years).
This disparity could make application development a little (as well as
the default_slack) to a u64. This means both 32bit and 64bit systems
have the same effective internal slack range.
Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.
This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An unprivileged user can trigger an oops on a kernel with
CONFIG_CHECKPOINT_RESTORE.
proc_pid_cmdline_read takes mmap_sem for reading and obtains args + env
start/end values. These get sanity checked as follows:
BUG_ON(arg_start > arg_end);
BUG_ON(env_start > env_end);
These can be changed by prctl_set_mm. Turns out also takes the semaphore for
reading, effectively rendering it useless. This results in:
kernel BUG at fs/proc/base.c:240!
invalid opcode: 0000 [#1] SMP
Modules linked in: virtio_net
CPU: 0 PID: 925 Comm: a.out Not tainted 4.4.0-rc8-next-20160105dupa+ #71
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880077a68000 ti: ffff8800784d0000 task.ti: ffff8800784d0000
RIP: proc_pid_cmdline_read+0x520/0x530
RSP: 0018:ffff8800784d3db8 EFLAGS: 00010206
RAX: ffff880077c5b6b0 RBX: ffff8800784d3f18 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 00007f78e8857000 RDI: 0000000000000246
RBP: ffff8800784d3e40 R08: 0000000000000008 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000050
R13: 00007f78e8857800 R14: ffff88006fcef000 R15: ffff880077c5b600
FS: 00007f78e884a740(0000) GS:ffff88007b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f78e8361770 CR3: 00000000790a5000 CR4: 00000000000006f0
Call Trace:
__vfs_read+0x37/0x100
vfs_read+0x82/0x130
SyS_read+0x58/0xd0
entry_SYSCALL_64_fastpath+0x12/0x76
Code: 4c 8b 7d a8 eb e9 48 8b 9d 78 ff ff ff 4c 8b 7d 90 48 8b 03 48 39 45 a8 0f 87 f0 fe ff ff e9 d1 fe ff ff 4c 8b 7d 90 eb c6 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RIP proc_pid_cmdline_read+0x520/0x530
---[ end trace 97882617ae9c6818 ]---
Turns out there are instances where the code just reads aformentioned
values without locking whatsoever - namely environ_read and get_cmdline.
Interestingly these functions look quite resilient against bogus values,
but I don't believe this should be relied upon.
The first patch gets rid of the oops bug by grabbing mmap_sem for
writing.
The second patch is optional and puts locking around aformentioned
consumers for safety. Consumers of other fields don't seem to benefit
from similar treatment and are left untouched.
This patch (of 2):
The code was taking the semaphore for reading, which does not protect
against readers nor concurrent modifications.
The problem could cause a sanity checks to fail in procfs's cmdline
reader, resulting in an OOPS.
Note that some functions perform an unlocked read of various mm fields,
but they seem to be fine despite possible modificaton.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anshuman Khandual <anshuman.linux@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>