Commit graph

505 commits

Author SHA1 Message Date
Greg Kroah-Hartman
633920f372 Revert "ANDROID: vfs: Add permission2 for filesystems with per mount permissions"
This reverts commit e81cea2a6f as it is
longer needed because sdcardfs is gone.

Bug: 157700134
Cc: Daniel Rosenberg <drosen@google.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Yongqin Liu <yongqin.liu@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic7f303d4353c9f03e8e7a5aad07d0a5aa5289412
2020-06-27 15:17:42 +02:00
Greg Kroah-Hartman
91d4544b24 This is the 4.19.124 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl7Ey84ACgkQONu9yGCS
 aT4enA/+JsigMJOLeEtEZ4Gf97S0HnxOIqvuz7759s07vTzwPV1BfQm2eafcS8Cl
 8//BO73tDe+m5shH0mFCeFsy0p1qC4+ewIyLPnjulxls1BCZ86xK44/WD6N0DgX9
 Fi0HACcObuNZD7814yIyrWaI9QHZO+OwJlmjCXBiZGC4gZwAnGcgY2+ffYf/hRv2
 wgEyJF2Td0rORCOM3qp8Ipdt1S8inm2yZodGC5htSPajfBLPe8narmkOXxcN+tuB
 BvOwdTJoplmhNwpimWacytL+jQJYKHS/izPX0JYkFDfQ/bgOYXz3CWwa2DMOVsGd
 CQOHp4rK/Rl/caAANe3nD87jstRbaRKp7HZELCJ+KZrHpGfefAZs6g5j+LNC7KQt
 6YloSnTQsnRC6nqu+b2ieI5KoZAfwWoyHrQf7obJi6PJF4Ge4XUbaLEDH9TuxZTN
 tZX5ZOGZ8/i32VgYqBA4mDAbV+n5TyEYl722XxXzgim73VUDl67F7JqtDxMMb4Ic
 KW98luDDXgoq+kM2FqWgXtjxoP4TpjRREjwCpNDEa03ydKW+dwM21D7IoQNtXUgT
 uE6aFPVuhRt5MAhOdSHtkSsbOjiJZjKuPKvYyUFvAQT5JMaYZg9pabnH89E6URQ9
 x7M2JOvR/GMOmPRykQoewqV0027K37TYxBfRAzLbNFv8Iol/a7I=
 =pHmd
 -----END PGP SIGNATURE-----

Merge 4.19.124 into android-4.19-stable

Changes in 4.19.124
	net: dsa: Do not make user port errors fatal
	shmem: fix possible deadlocks on shmlock_user_lock
	net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'
	net: moxa: Fix a potential double 'free_irq()'
	drop_monitor: work around gcc-10 stringop-overflow warning
	virtio-blk: handle block_device_operations callbacks after hot unplug
	scsi: sg: add sg_remove_request in sg_write
	mmc: sdhci-acpi: Add SDHCI_QUIRK2_BROKEN_64_BIT_DMA for AMDI0040
	net: fix a potential recursive NETDEV_FEAT_CHANGE
	netlabel: cope with NULL catmap
	net: phy: fix aneg restart in phy_ethtool_set_eee
	pppoe: only process PADT targeted at local interfaces
	Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"
	tcp: fix error recovery in tcp_zerocopy_receive()
	virtio_net: fix lockdep warning on 32 bit
	hinic: fix a bug of ndo_stop
	net: dsa: loop: Add module soft dependency
	net: ipv4: really enforce backoff for redirects
	netprio_cgroup: Fix unlimited memory leak of v2 cgroups
	net: tcp: fix rx timestamp behavior for tcp_recvmsg
	tcp: fix SO_RCVLOWAT hangs with fat skbs
	riscv: fix vdso build with lld
	dmaengine: pch_dma.c: Avoid data race between probe and irq handler
	dmaengine: mmp_tdma: Reset channel error on release
	cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
	ALSA: hda/hdmi: fix race in monitor detection during probe
	drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
	ipc/util.c: sysvipc_find_ipc() incorrectly updates position index
	ALSA: hda/realtek - Fix S3 pop noise on Dell Wyse
	gfs2: Another gfs2_walk_metadata fix
	pinctrl: baytrail: Enable pin configuration setting for GPIO chip
	pinctrl: cherryview: Add missing spinlock usage in chv_gpio_irq_handler
	i40iw: Fix error handling in i40iw_manage_arp_cache()
	mmc: core: Check request type before completing the request
	mmc: block: Fix request completion in the CQE timeout path
	NFS: Fix fscache super_cookie index_key from changing after umount
	nfs: fscache: use timespec64 in inode auxdata
	NFSv4: Fix fscache cookie aux_data to ensure change_attr is included
	netfilter: conntrack: avoid gcc-10 zero-length-bounds warning
	arm64: fix the flush_icache_range arguments in machine_kexec
	netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
	IB/mlx4: Test return value of calls to ib_get_cached_pkey
	hwmon: (da9052) Synchronize access with mfd
	pnp: Use list_for_each_entry() instead of open coding
	gcc-10 warnings: fix low-hanging fruit
	kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
	Stop the ad-hoc games with -Wno-maybe-initialized
	gcc-10: disable 'zero-length-bounds' warning for now
	gcc-10: disable 'array-bounds' warning for now
	gcc-10: disable 'stringop-overflow' warning for now
	gcc-10: disable 'restrict' warning for now
	gcc-10: avoid shadowing standard library 'free()' in crypto
	ALSA: hda/realtek - Limit int mic boost for Thinkpad T530
	ALSA: rawmidi: Fix racy buffer resize under concurrent accesses
	ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset
	usb: core: hub: limit HUB_QUIRK_DISABLE_AUTOSUSPEND to USB5534B
	usb: host: xhci-plat: keep runtime active when removing host
	USB: gadget: fix illegal array access in binding with UDC
	usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list
	ARM: dts: dra7: Fix bus_dma_limit for PCIe
	ARM: dts: imx27-phytec-phycard-s-rdk: Fix the I2C1 pinctrl entries
	cifs: fix leaked reference on requeued write
	x86: Fix early boot crash on gcc-10, third try
	x86/unwind/orc: Fix error handling in __unwind_start()
	exec: Move would_dump into flush_old_exec
	clk: rockchip: fix incorrect configuration of rk3228 aclk_gpu* clocks
	dwc3: Remove check for HWO flag in dwc3_gadget_ep_reclaim_trb_sg()
	usb: gadget: net2272: Fix a memory leak in an error handling path in 'net2272_plat_probe()'
	usb: gadget: audio: Fix a missing error return value in audio_bind()
	usb: gadget: legacy: fix error return code in gncm_bind()
	usb: gadget: legacy: fix error return code in cdc_bind()
	Revert "ALSA: hda/realtek: Fix pop noise on ALC225"
	clk: Unlink clock if failed to prepare or enable
	arm64: dts: rockchip: Replace RK805 PMIC node name with "pmic" on rk3328 boards
	arm64: dts: rockchip: Rename dwc3 device nodes on rk3399 to make dtc happy
	ARM: dts: r8a73a4: Add missing CMT1 interrupts
	arm64: dts: renesas: r8a77980: Fix IPMMU VIP[01] nodes
	ARM: dts: r8a7740: Add missing extal2 to CPG node
	KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
	Makefile: disallow data races on gcc-10 as well
	Linux 4.19.124

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3d253f677cc08337e64d316005a0ec0c33717940
2020-05-20 11:37:46 +02:00
Eric W. Biederman
bfdb18282b exec: Move would_dump into flush_old_exec
commit f87d1c9559164294040e58f5e3b74a162bf7c6e8 upstream.

I goofed when I added mm->user_ns support to would_dump.  I missed the
fact that in the case of binfmt_loader, binfmt_em86, binfmt_misc, and
binfmt_script bprm->file is reassigned.  Which made the move of
would_dump from setup_new_exec to __do_execve_file before exec_binprm
incorrect as it can result in would_dump running on the script instead
of the interpreter of the script.

The net result is that the code stopped making unreadable interpreters
undumpable.  Which allows them to be ptraced and written to disk
without special permissions.  Oops.

The move was necessary because the call in set_new_exec was after
bprm->mm was no longer valid.

To correct this mistake move the misplaced would_dump from
__do_execve_file into flos_old_exec, before exec_mmap is called.

I tested and confirmed that without this fix I can attach with gdb to
a script with an unreadable interpreter, and with this fix I can not.

Cc: stable@vger.kernel.org
Fixes: f84df2a6f2 ("exec: Ensure mm->user_ns contains the execed files")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20 08:18:50 +02:00
Greg Kroah-Hartman
95bff4cdab This is the 4.19.116 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6ZbYYACgkQONu9yGCS
 aT76ohAAn4lIjSuMRCILy/lq0DXVWDy7q6YdfyzNBITxc86tVfnfjMeQxUBviE/1
 OzShWgMRXeqrb0xJTJ5Rv6mt5Kf9a3DpPWt2jwo1iqWkl4AihDtDV7Z2Bh+QdnSX
 +lQ1xGPqDi4MMgoYlpMtlFc3wq/pJV0i8Q7amXC/KbsDkt5dlDrQYeEZHe2P7pR9
 ZljKLHEdGRE3uGqXmEM8qb6aLjQudnHmH/9uChP4UX6b+ZADDCc05DMhEkhEoCZT
 jdxiqVZvRdiiXTc1r6ckGv0xae77s0IAAZMQAd+24zFK94QByi6d9Cw0y6qyyDi7
 1rfHIWSjvetY3+4DCQDOu/k2/pLt/Vqh9zuvtaf8Tu8cKM9rxow0Hl9FlL3fZpBN
 btpqeCY6twFxApHoAp9ZDK6otaVEOtbg1MCsmpUbVxWIF9IR8cPqMGyYK3lR2Ao1
 HgdKEFkYOycAOu51ujuHsDLx/9k2ZqeSPyh0yrdVpFUVvMV/YqoYP9X3jzGRVllL
 hgYfFcywgrVgxK4c02/6cPiJNbFskTpLllDPVVXGIjO+9R4vTRUgJ74CNrqL25aT
 ioSFWJA00UvXObnbCDdA+otYYWAmYOJX7HVvEieb0oDqPYHZHa1UW6+1WlYSAQLm
 WAsHiejOv6PwzRmCDI6RyuZKQjjX6bppAWFq0/RLPO0uEqjXlxc=
 =Iq3k
 -----END PGP SIGNATURE-----

Merge 4.19.116 into android-4.19

Changes in 4.19.116
	ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage
	bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads
	net: vxge: fix wrong __VA_ARGS__ usage
	hinic: fix a bug of waitting for IO stopped
	hinic: fix wrong para of wait_for_completion_timeout
	cxgb4/ptp: pass the sign of offset delta in FW CMD
	qlcnic: Fix bad kzalloc null test
	i2c: st: fix missing struct parameter description
	cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL
	media: venus: hfi_parser: Ignore HEVC encoding for V1
	firmware: arm_sdei: fix double-lock on hibernate with shared events
	null_blk: Fix the null_add_dev() error path
	null_blk: Handle null_add_dev() failures properly
	null_blk: fix spurious IO errors after failed past-wp access
	xhci: bail out early if driver can't accress host in resume
	x86: Don't let pgprot_modify() change the page encryption bit
	block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices
	irqchip/versatile-fpga: Handle chained IRQs properly
	sched: Avoid scale real weight down to zero
	selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault
	PCI/switchtec: Fix init_completion race condition with poll_wait()
	media: i2c: video-i2c: fix build errors due to 'imply hwmon'
	libata: Remove extra scsi_host_put() in ata_scsi_add_hosts()
	pstore/platform: fix potential mem leak if pstore_init_fs failed
	gfs2: Don't demote a glock until its revokes are written
	x86/boot: Use unsigned comparison for addresses
	efi/x86: Ignore the memory attributes table on i386
	genirq/irqdomain: Check pointer in irq_domain_alloc_irqs_hierarchy()
	block: Fix use-after-free issue accessing struct io_cq
	media: i2c: ov5695: Fix power on and off sequences
	usb: dwc3: core: add support for disabling SS instances in park mode
	irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency
	md: check arrays is suspended in mddev_detach before call quiesce operations
	firmware: fix a double abort case with fw_load_sysfs_fallback
	locking/lockdep: Avoid recursion in lockdep_count_{for,back}ward_deps()
	block, bfq: fix use-after-free in bfq_idle_slice_timer_body
	btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
	btrfs: remove a BUG_ON() from merge_reloc_roots()
	btrfs: track reloc roots based on their commit root bytenr
	IB/mlx5: Replace tunnel mpls capability bits for tunnel_offloads
	uapi: rename ext2_swab() to swab() and share globally in swab.h
	slub: improve bit diffusion for freelist ptr obfuscation
	ASoC: fix regwmask
	ASoC: dapm: connect virtual mux with default value
	ASoC: dpcm: allow start or stop during pause for backend
	ASoC: topology: use name_prefix for new kcontrol
	usb: gadget: f_fs: Fix use after free issue as part of queue failure
	usb: gadget: composite: Inform controller driver of self-powered
	ALSA: usb-audio: Add mixer workaround for TRX40 and co
	ALSA: hda: Add driver blacklist
	ALSA: hda: Fix potential access overflow in beep helper
	ALSA: ice1724: Fix invalid access for enumerated ctl items
	ALSA: pcm: oss: Fix regression by buffer overflow fix
	ALSA: doc: Document PC Beep Hidden Register on Realtek ALC256
	ALSA: hda/realtek - Set principled PC Beep configuration for ALC256
	ALSA: hda/realtek - Remove now-unnecessary XPS 13 headphone noise fixups
	ALSA: hda/realtek - Add quirk for MSI GL63
	media: ti-vpe: cal: fix disable_irqs to only the intended target
	acpi/x86: ignore unspecified bit positions in the ACPI global lock field
	thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=n
	nvme-fc: Revert "add module to ops template to allow module references"
	nvme: Treat discovery subsystems as unique subsystems
	PCI: pciehp: Fix indefinite wait on sysfs requests
	PCI/ASPM: Clear the correct bits when enabling L1 substates
	PCI: Add boot interrupt quirk mechanism for Xeon chipsets
	PCI: endpoint: Fix for concurrent memory allocation in OB address region
	tpm: Don't make log failures fatal
	tpm: tpm1_bios_measurements_next should increase position index
	tpm: tpm2_bios_measurements_next should increase position index
	KEYS: reaching the keys quotas correctly
	irqchip/versatile-fpga: Apply clear-mask earlier
	pstore: pstore_ftrace_seq_next should increase position index
	MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3
	MIPS: OCTEON: irq: Fix potential NULL pointer dereference
	ath9k: Handle txpower changes even when TPC is disabled
	signal: Extend exec_id to 64bits
	x86/entry/32: Add missing ASM_CLAC to general_protection entry
	KVM: nVMX: Properly handle userspace interrupt window request
	KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
	KVM: s390: vsie: Fix delivery of addressing exceptions
	KVM: x86: Allocate new rmap and large page tracking when moving memslot
	KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support
	KVM: x86: Gracefully handle __vmalloc() failure during VM allocation
	KVM: VMX: fix crash cleanup when KVM wasn't used
	CIFS: Fix bug which the return value by asynchronous read is error
	mtd: spinand: Stop using spinand->oobbuf for buffering bad block markers
	mtd: spinand: Do not erase the block before writing a bad block marker
	Btrfs: fix crash during unmount due to race with delayed inode workers
	btrfs: set update the uuid generation as soon as possible
	btrfs: drop block from cache on error in relocation
	btrfs: fix missing file extent item for hole after ranged fsync
	btrfs: fix missing semaphore unlock in btrfs_sync_file
	crypto: mxs-dcp - fix scatterlist linearization for hash
	erofs: correct the remaining shrink objects
	powerpc/pseries: Drop pointless static qualifier in vpa_debugfs_init()
	x86/speculation: Remove redundant arch_smt_update() invocation
	tools: gpio: Fix out-of-tree build regression
	mm: Use fixed constant in page_frag_alloc instead of size + 1
	net: qualcomm: rmnet: Allow configuration updates to existing devices
	arm64: dts: allwinner: h6: Fix PMU compatible
	dm writecache: add cond_resched to avoid CPU hangs
	dm verity fec: fix memory leak in verity_fec_dtr
	scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point
	arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
	selftests: vm: drop dependencies on page flags from mlock2 tests
	rtc: omap: Use define directive for PIN_CONFIG_ACTIVE_HIGH
	drm/etnaviv: rework perfmon query infrastructure
	powerpc/pseries: Avoid NULL pointer dereference when drmem is unavailable
	NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
	ext4: fix a data race at inode->i_blocks
	fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
	ocfs2: no need try to truncate file beyond i_size
	perf tools: Support Python 3.8+ in Makefile
	s390/diag: fix display of diagnose call statistics
	Input: i8042 - add Acer Aspire 5738z to nomux list
	clk: ingenic/jz4770: Exit with error if CGU init failed
	kmod: make request_module() return an error when autoloading is disabled
	cpufreq: powernv: Fix use-after-free
	hfsplus: fix crash and filesystem corruption when deleting files
	libata: Return correct status in sata_pmp_eh_recover_pm() when ATA_DFLAG_DETACH is set
	ipmi: fix hung processes in __get_guid()
	xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()
	powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle
	powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
	powerpc/hash64/devmap: Use H_PAGE_THP_HUGE when setting up huge devmap PTE entries
	powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
	powerpc/kprobes: Ignore traps that happened in real mode
	scsi: mpt3sas: Fix kernel panic observed on soft HBA unplug
	powerpc: Add attributes for setjmp/longjmp
	powerpc: Make setjmp/longjmp signature standard
	btrfs: use nofs allocations for running delayed items
	dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()
	crypto: caam - update xts sector size for large input length
	crypto: ccree - improve error handling
	crypto: ccree - zero out internal struct before use
	crypto: ccree - don't mangle the request assoclen
	crypto: ccree - dec auth tag size from cryptlen map
	crypto: ccree - only try to map auth tag if needed
	Revert "drm/dp_mst: Remove VCPI while disabling topology mgr"
	drm/dp_mst: Fix clearing payload state on topology disable
	drm: Remove PageReserved manipulation from drm_pci_alloc
	ftrace/kprobe: Show the maxactive number on kprobe_events
	powerpc/fsl_booke: Avoid creating duplicate tlb1 entry
	misc: echo: Remove unnecessary parentheses and simplify check for zero
	etnaviv: perfmon: fix total and idle HI cyleces readout
	mfd: dln2: Fix sanity checking for endpoints
	efi/x86: Fix the deletion of variables in mixed mode
	Linux 4.19.116

Change-Id: If09fbb53fcb11ea01eaaa7fee7ed21ed6234f352
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2020-04-18 13:33:51 +02:00
Eric W. Biederman
a2a1be2de7 signal: Extend exec_id to 64bits
commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream.

Replace the 32bit exec_id with a 64bit exec_id to make it impossible
to wrap the exec_id counter.  With care an attacker can cause exec_id
wrap and send arbitrary signals to a newly exec'd parent.  This
bypasses the signal sending checks if the parent changes their
credentials during exec.

The severity of this problem can been seen that in my limited testing
of a 32bit exec_id it can take as little as 19s to exec 65536 times.
Which means that it can take as little as 14 days to wrap a 32bit
exec_id.  Adam Zabrocki has succeeded wrapping the self_exe_id in 7
days.  Even my slower timing is in the uptime of a typical server.
Which means self_exec_id is simply a speed bump today, and if exec
gets noticably faster self_exec_id won't even be a speed bump.

Extending self_exec_id to 64bits introduces a problem on 32bit
architectures where reading self_exec_id is no longer atomic and can
take two read instructions.  Which means that is is possible to hit
a window where the read value of exec_id does not match the written
value.  So with very lucky timing after this change this still
remains expoiltable.

I have updated the update of exec_id on exec to use WRITE_ONCE
and the read of exec_id in do_notify_parent to use READ_ONCE
to make it clear that there is no locking between these two
locations.

Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl
Fixes: 2.3.23pre2
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17 10:48:47 +02:00
Greg Kroah-Hartman
844ecc4634 This is the 4.19.64 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl1GibIACgkQONu9yGCS
 aT7z2hAAmv8AsH9IG43m7t6zLroJVswr/9594xk7yPBQgcY3/PW2aTFBCFbsdOL4
 yXcj2PSwRiq9K6qAJULrvOvncR9fIILHqzWzyXnoaZ30lR/FxaaFmuHZX/5Ix1tB
 e5EEE/EA49UAEjEDaMLq8g2IvibsReDxmSpnXyBJWoyRAdFIElVnMJ2+zvP/wRhF
 NKzQj/bj/qecCbis2lUCaVWJFZ6+P/52UbD8lvIwqR3nk2TKsGDcLU6eY3yg4KrB
 rEHl5T8KIPrkX3KNIEB8EcFREene+rdpZLLVe4fYwf+gOqfiFXSzZZvweauMkplq
 ehlVHkykvQvlsVM2tjBD379z3C4aasZDuMVNMCbAy2FlruLeBQ7gEn77mCJB9VH5
 /n/mlc2yizdoowtARCLWOUMfASpdSbqu2SQ7A/3kwG7l6GrpzKSIU2nQgm+41sUZ
 QJVtZ3IYsPoYjnU4B3JZzgJnf3M9jcRz/3JegviqhSEbF1gaScJX0cqN8C1idN/v
 ZAGCJK9S20/EEEsp5jn+bq2grUehvmD4TVDfot4P+5yRYyBIhMFpbM2RpjydOpwy
 +x8D1Q34LYPFgZfQ0vF62vcSBhMBiJ/7j41rUeo44K+Lg00F3yCOyL6FxK6S8h6j
 wsD0xLbllMrhV5KRYFizb3QbCHoHYiROIJk76uLvB+Tqq2Jg9VQ=
 =qIi2
 -----END PGP SIGNATURE-----

Merge 4.19.64 into android-4.19

Changes in 4.19.64
	hv_sock: Add support for delayed close
	vsock: correct removal of socket from the list
	NFS: Fix dentry revalidation on NFSv4 lookup
	NFS: Refactor nfs_lookup_revalidate()
	NFSv4: Fix lookup revalidate of regular files
	usb: dwc2: Disable all EP's on disconnect
	usb: dwc2: Fix disable all EP's on disconnect
	arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
	binder: fix possible UAF when freeing buffer
	ISDN: hfcsusb: checking idx of ep configuration
	media: au0828: fix null dereference in error path
	ath10k: Change the warning message string
	media: cpia2_usb: first wake up, then free in disconnect
	media: pvrusb2: use a different format for warnings
	NFS: Cleanup if nfs_match_client is interrupted
	media: radio-raremono: change devm_k*alloc to k*alloc
	iommu/vt-d: Don't queue_iova() if there is no flush queue
	iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
	Bluetooth: hci_uart: check for missing tty operations
	vhost: introduce vhost_exceeds_weight()
	vhost_net: fix possible infinite loop
	vhost: vsock: add weight support
	vhost: scsi: add weight support
	sched/fair: Don't free p->numa_faults with concurrent readers
	sched/fair: Use RCU accessors consistently for ->numa_group
	/proc/<pid>/cmdline: remove all the special cases
	/proc/<pid>/cmdline: add back the setproctitle() special case
	drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
	Fix allyesconfig output.
	ceph: hold i_ceph_lock when removing caps for freeing inode
	block, scsi: Change the preempt-only flag into a counter
	scsi: core: Avoid that a kernel warning appears during system resume
	ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
	Linux 4.19.64

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3e9055b677bd8ad9d5070307fae0bc765d444e9d
2019-08-04 09:37:11 +02:00
Jann Horn
48046e092a sched/fair: Don't free p->numa_faults with concurrent readers
commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream.

When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:30:56 +02:00
Greg Kroah-Hartman
34e9e65731 This is the 4.19.28 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlyEq/IACgkQONu9yGCS
 aT77AhAAgqbxHsKsgqh7JV477xEgLqrLKYw+/6Bx+l79fUZoaR3PRwM62UEzikPQ
 KbvutNuOHOWuJA0Xyj5gqV4SJaqBOkoGNnHFOBi/qjtFUsOlBFrlGpow+0fsP/ay
 Bmo0LhoTvBub4ap7bJt4pwel/elWYVtOkA1Qgv3OCiDkTorYuPTbIUyuAVOJJbRn
 sZ1eKi00CQPrN65Rxgci0g0p/m7JWpvW2zqmDNZJuZZeEmSLdrrZGwt5ExiI6oKz
 CqX/VBGChEesMTEOLsSfRyg6NZW3j4rOUaCzkxDq/Tsh9XqNabhk1jod2p3t7Nmu
 n5Js3ujfuyCKf5tD49Z8xy5A++nYyJLa5jbFnURr2H/ZJPla0CHuc4RoFf/wFYr4
 xQAeA3XXiZZB02n6oZlweUSp7hVnCgLJ4Ev2ctAyPUpyf4ncl+vffzj40bozFvAC
 adJE1UyEJp0xUuRVdx0I+HyueHcWmRIAgs0iz9B5S2KNc4rYDqE+/t+ddrNqjvSF
 +C33nMn+7A+ngmQlwWjOBaZhhQn3qWrWU0ACERbIG/DUD2voBYu3oIfQzKsXTt0V
 erSSDq0KSy73PRKN4Tzf2GnDUAlNITUuyFgWITOg8p29HuXJO00p4MQ7fOMYacyB
 WdwXFB7islUtUoA8nEedZgF7IF1WRh6Iz2HJ5uMTl6pCMSV+8Dk=
 =OvW8
 -----END PGP SIGNATURE-----

Merge 4.19.28 into android-4.19

Changes in 4.19.28
	cpufreq: Use struct kobj_attribute instead of struct global_attr
	staging: erofs: fix mis-acted TAIL merging behavior
	USB: serial: option: add Telit ME910 ECM composition
	USB: serial: cp210x: add ID for Ingenico 3070
	USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
	staging: erofs: fix illegal address access under memory pressure
	staging: erofs: compressed_pages should not be accessed again after freed
	staging: comedi: ni_660x: fix missing break in switch statement
	staging: wilc1000: fix to set correct value for 'vif_num'
	staging: android: ion: fix sys heap pool's gfp_flags
	staging: android: ashmem: Don't call fallocate() with ashmem_mutex held.
	staging: android: ashmem: Avoid range_alloc() allocation with ashmem_mutex held.
	ip6mr: Do not call __IP6_INC_STATS() from preemptible context
	net: dsa: mv88e6xxx: handle unknown duplex modes gracefully in mv88e6xxx_port_set_duplex
	net: dsa: mv8e6xxx: fix number of internal PHYs for 88E6x90 family
	net: sched: put back q.qlen into a single location
	net-sysfs: Fix mem leak in netdev_register_kobject
	qmi_wwan: Add support for Quectel EG12/EM12
	sctp: call iov_iter_revert() after sending ABORT
	sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
	team: Free BPF filter when unregistering netdev
	tipc: fix RDM/DGRAM connect() regression
	bnxt_en: Drop oversize TX packets to prevent errors.
	geneve: correctly handle ipv6.disable module parameter
	hv_netvsc: Fix IP header checksum for coalesced packets
	ipv4: Add ICMPv6 support when parse route ipproto
	lan743x: Fix TX Stall Issue
	net: dsa: mv88e6xxx: Fix statistics on mv88e6161
	net: dsa: mv88e6xxx: Fix u64 statistics
	netlabel: fix out-of-bounds memory accesses
	net: netem: fix skb length BUG_ON in __skb_to_sgvec
	net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
	net: phy: Micrel KSZ8061: link failure after cable connect
	net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
	net: sit: fix memory leak in sit_init_net()
	net: socket: set sock->sk to NULL after calling proto_ops::release()
	tipc: fix race condition causing hung sendto
	tun: fix blocking read
	xen-netback: don't populate the hash cache on XenBus disconnect
	xen-netback: fix occasional leak of grant ref mappings under memory pressure
	tun: remove unnecessary memory barrier
	net: Add __icmp_send helper.
	net: avoid use IPCB in cipso_v4_error
	ipv4: Return error for RTA_VIA attribute
	ipv6: Return error for RTA_VIA attribute
	mpls: Return error for RTA_GATEWAY attribute
	ipv4: Pass original device to ip_rcv_finish_core
	net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X
	net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
	net/sched: act_ipt: fix refcount leak when replace fails
	net/sched: act_skbedit: fix refcount leak when replace fails
	net: sched: act_tunnel_key: fix NULL pointer dereference during init
	x86/CPU/AMD: Set the CPB bit unconditionally on F17h
	x86/boot/compressed/64: Do not read legacy ROM on EFI system
	tracing: Fix event filters and triggers to handle negative numbers
	usb: xhci: Fix for Enabling USB ROLE SWITCH QUIRK on INTEL_SUNRISEPOINT_LP_XHCI
	applicom: Fix potential Spectre v1 vulnerabilities
	MIPS: irq: Allocate accurate order pages for irq stack
	aio: Fix locking in aio_poll()
	xtensa: fix get_wchan
	gnss: sirf: fix premature wakeup interrupt enable
	USB: serial: cp210x: fix GPIO in autosuspend
	selftests: firmware: fix verify_reqs() return value
	Bluetooth: btrtl: Restore old logic to assume firmware is already loaded
	Bluetooth: Fix locking in bt_accept_enqueue() for BH context
	exec: Fix mem leak in kernel_read_file
	scsi: core: reset host byte in DID_NEXUS_FAILURE case
	bpf: fix sanitation rewrite in case of non-pointers
	Linux 4.19.28

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-10 07:23:21 +01:00
YueHaibing
b60d90b2d3 exec: Fix mem leak in kernel_read_file
commit f612acfae86af7ecad754ae6a46019be9da05b8e upstream.

syzkaller report this:
BUG: memory leak
unreferenced object 0xffffc9000488d000 (size 9195520):
  comm "syz-executor.0", pid 2752, jiffies 4294787496 (age 18.757s)
  hex dump (first 32 bytes):
    ff ff ff ff ff ff ff ff a8 00 00 00 01 00 00 00  ................
    02 00 00 00 00 00 00 00 80 a1 7a c1 ff ff ff ff  ..........z.....
  backtrace:
    [<000000000863775c>] __vmalloc_node mm/vmalloc.c:1795 [inline]
    [<000000000863775c>] __vmalloc_node_flags mm/vmalloc.c:1809 [inline]
    [<000000000863775c>] vmalloc+0x8c/0xb0 mm/vmalloc.c:1831
    [<000000003f668111>] kernel_read_file+0x58f/0x7d0 fs/exec.c:924
    [<000000002385813f>] kernel_read_file_from_fd+0x49/0x80 fs/exec.c:993
    [<0000000011953ff1>] __do_sys_finit_module+0x13b/0x2a0 kernel/module.c:3895
    [<000000006f58491f>] do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    [<00000000ee78baf4>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<00000000241f889b>] 0xffffffffffffffff

It should goto 'out_free' lable to free allocated buf while kernel_read
fails.

Fixes: 39d637af5a ("vfs: forbid write access when reading a file into memory")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thibaut Sautereau <thibaut@sautereau.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-10 07:17:21 +01:00
Daniel Rosenberg
e81cea2a6f ANDROID: vfs: Add permission2 for filesystems with per mount permissions
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.

Bug: 35848445
Bug: 120446149
Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[AmitP: Minor refactoring of original patch to align with
        changes from the following upstream commit
        4bfd054ae1 ("fs: fold __inode_permission() into inode_permission()").
        Also introduce vfs_mkobj2(), because do_create()
        moved from using vfs_create() to vfs_mkobj()
        eecec19d9e ("mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata")
        do_create() is dropped/cleaned-up upstream so a
        minor refactoring there as well.
        066cc813e9 ("do_mq_open(): move all work prior to dentry_open() into a helper")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[astrachan: Folded the following changes into this patch:
            f46c9d62dd81 ("ANDROID: fs: Export vfs_rmdir2")
            9992eb8b9a1e ("ANDROID: xattr: Pass EOPNOTSUPP to permission2")]
Signed-off-by: Alistair Strachan <astrachan@google.com>
2018-12-05 09:48:14 -08:00
Linus Torvalds
0214f46b3a Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull core signal handling updates from Eric Biederman:
 "It was observed that a periodic timer in combination with a
  sufficiently expensive fork could prevent fork from every completing.
  This contains the changes to remove the need for that restart.

  This set of changes is split into several parts:

   - The first part makes PIDTYPE_TGID a proper pid type instead
     something only for very special cases. The part starts using
     PIDTYPE_TGID enough so that in __send_signal where signals are
     actually delivered we know if the signal is being sent to a a group
     of processes or just a single process.

   - With that prep work out of the way the logic in fork is modified so
     that fork logically makes signals received while it is running
     appear to be received after the fork completes"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
  signal: Don't send signals to tasks that don't exist
  signal: Don't restart fork when signals come in.
  fork: Have new threads join on-going signal group stops
  fork: Skip setting TIF_SIGPENDING in ptrace_init_task
  signal: Add calculate_sigpending()
  fork: Unconditionally exit if a fatal signal is pending
  fork: Move and describe why the code examines PIDNS_ADDING
  signal: Push pid type down into complete_signal.
  signal: Push pid type down into __send_signal
  signal: Push pid type down into send_signal
  signal: Pass pid type into do_send_sig_info
  signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
  signal: Pass pid type into group_send_sig_info
  signal: Pass pid and pid type into send_sigqueue
  posix-timers: Noralize good_sigevent
  signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
  pid: Implement PIDTYPE_TGID
  pids: Move the pgrp and session pid pointers from task_struct to signal_struct
  kvm: Don't open code task_pid in kvm_vcpu_ioctl
  pids: Compute task_tgid using signal->leader_pid
  ...
2018-08-21 13:47:29 -07:00
Kirill A. Shutemov
bfd40eaff5 mm: fix vma_is_anonymous() false-positives
vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous
VMA.  This is unreliable as ->mmap may not set ->vm_ops.

False-positive vma_is_anonymous() may lead to crashes:

	next ffff8801ce5e7040 prev ffff8801d20eca50 mm ffff88019c1e13c0
	prot 27 anon_vma ffff88019680cdd8 vm_ops 0000000000000000
	pgoff 0 file ffff8801b2ec2d00 private_data 0000000000000000
	flags: 0xff(read|write|exec|shared|mayread|maywrite|mayexec|mayshare)
	------------[ cut here ]------------
	kernel BUG at mm/memory.c:1422!
	invalid opcode: 0000 [#1] SMP KASAN
	CPU: 0 PID: 18486 Comm: syz-executor3 Not tainted 4.18.0-rc3+ #136
	Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
	01/01/2011
	RIP: 0010:zap_pmd_range mm/memory.c:1421 [inline]
	RIP: 0010:zap_pud_range mm/memory.c:1466 [inline]
	RIP: 0010:zap_p4d_range mm/memory.c:1487 [inline]
	RIP: 0010:unmap_page_range+0x1c18/0x2220 mm/memory.c:1508
	Call Trace:
	 unmap_single_vma+0x1a0/0x310 mm/memory.c:1553
	 zap_page_range_single+0x3cc/0x580 mm/memory.c:1644
	 unmap_mapping_range_vma mm/memory.c:2792 [inline]
	 unmap_mapping_range_tree mm/memory.c:2813 [inline]
	 unmap_mapping_pages+0x3a7/0x5b0 mm/memory.c:2845
	 unmap_mapping_range+0x48/0x60 mm/memory.c:2880
	 truncate_pagecache+0x54/0x90 mm/truncate.c:800
	 truncate_setsize+0x70/0xb0 mm/truncate.c:826
	 simple_setattr+0xe9/0x110 fs/libfs.c:409
	 notify_change+0xf13/0x10f0 fs/attr.c:335
	 do_truncate+0x1ac/0x2b0 fs/open.c:63
	 do_sys_ftruncate+0x492/0x560 fs/open.c:205
	 __do_sys_ftruncate fs/open.c:215 [inline]
	 __se_sys_ftruncate fs/open.c:213 [inline]
	 __x64_sys_ftruncate+0x59/0x80 fs/open.c:213
	 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
	 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reproducer:

	#include <stdio.h>
	#include <stddef.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <unistd.h>
	#include <fcntl.h>

	#define KCOV_INIT_TRACE			_IOR('c', 1, unsigned long)
	#define KCOV_ENABLE			_IO('c', 100)
	#define KCOV_DISABLE			_IO('c', 101)
	#define COVER_SIZE			(1024<<10)

	#define KCOV_TRACE_PC  0
	#define KCOV_TRACE_CMP 1

	int main(int argc, char **argv)
	{
		int fd;
		unsigned long *cover;

		system("mount -t debugfs none /sys/kernel/debug");
		fd = open("/sys/kernel/debug/kcov", O_RDWR);
		ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
				PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		munmap(cover, COVER_SIZE * sizeof(unsigned long));
		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
				PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
		memset(cover, 0, COVER_SIZE * sizeof(unsigned long));
		ftruncate(fd, 3UL << 20);
		return 0;
	}

This can be fixed by assigning anonymous VMAs own vm_ops and not relying
on it being NULL.

If ->mmap() failed to set ->vm_ops, mmap_region() will set it to
dummy_vm_ops.  This way we will have non-NULL ->vm_ops for all VMAs.

Link: http://lkml.kernel.org/r/20180724121139.62570-4-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: syzbot+3f84280d52be9b7083cc@syzkaller.appspotmail.com
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-26 19:38:03 -07:00
Linus Torvalds
490fc05386 mm: make vm_area_alloc() initialize core fields
Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.

The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 15:24:03 -07:00
Linus Torvalds
3928d4f5ee mm: use helper functions for allocating and freeing vm_area structs
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.

We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.

Right now those functions are literally just wrappers around the
kmem_cache_*() calls.  This is a purely mechanical conversion:

    # new vma:
    kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()

    # copy old vma
    kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)

    # free vma
    kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)

to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-21 13:48:51 -07:00
Eric W. Biederman
6883f81aac pid: Implement PIDTYPE_TGID
Everywhere except in the pid array we distinguish between a tasks pid and
a tasks tgid (thread group id).  Even in the enumeration we want that
distinction sometimes so we have added __PIDTYPE_TGID.  With leader_pid
we almost have an implementation of PIDTYPE_TGID in struct signal_struct.

Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
into the pids array.  Then remove the __PIDTYPE_TGID special case and the
leader_pid in signal_struct.

The net size increase is just an extra pointer added to struct pid and
an extra pair of pointers of an hlist_node added to task_struct.

The effect on code maintenance is the removal of a number of special
cases today and the potential to remove many more special cases as
PIDTYPE_TGID gets used to it's fullest.  The long term potential
is allowing zombie thread group leaders to exit, which will remove
a lot more special cases in the code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-07-21 10:43:12 -05:00
Linus Torvalds
d82991a868 Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull restartable sequence support from Thomas Gleixner:
 "The restartable sequences syscall (finally):

  After a lot of back and forth discussion and massive delays caused by
  the speculative distraction of maintainers, the core set of
  restartable sequences has finally reached a consensus.

  It comes with the basic non disputed core implementation along with
  support for arm, powerpc and x86 and a full set of selftests

  It was exposed to linux-next earlier this week, so it does not fully
  comply with the merge window requirements, but there is really no
  point to drag it out for yet another cycle"

* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq/selftests: Provide Makefile, scripts, gitignore
  rseq/selftests: Provide parametrized tests
  rseq/selftests: Provide basic percpu ops test
  rseq/selftests: Provide basic test
  rseq/selftests: Provide rseq library
  selftests/lib.mk: Introduce OVERRIDE_TARGETS
  powerpc: Wire up restartable sequences system call
  powerpc: Add syscall detection for restartable sequences
  powerpc: Add support for restartable sequences
  x86: Wire up restartable sequence system call
  x86: Add support for restartable sequences
  arm: Wire up restartable sequences system call
  arm: Add syscall detection for restartable sequences
  arm: Add restartable sequences support
  rseq: Introduce restartable sequences system call
  uapi/headers: Provide types_32_64.h
2018-06-10 10:17:09 -07:00
Mathieu Desnoyers
d7822b1e24 rseq: Introduce restartable sequences system call
Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.

* Restartable sequences (per-cpu atomics)

Restartables sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.

The restartable critical sections (percpu atomics) work has been started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitates porting to other
architectures and speeds up the user-space fast path.

Here are benchmarks of various rseq use-cases.

Test hardware:

arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

The following benchmarks were all performed on a single thread.

* Per-CPU statistic counter increment

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                344.0                 31.4          11.0
x86-64:                15.3                  2.0           7.7

* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
             per-cpu buffer

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:               2502.0                 2250.0         1.1
x86-64:               117.4                   98.0         1.2

* liburcu percpu: lock-unlock pair, dereference, read/compare word

                getcpu+atomic (ns/op)    rseq (ns/op)    speedup
arm32:                751.0                 128.5          5.8
x86-64:                53.4                  28.6          1.9

* jemalloc memory allocator adapted to use rseq

Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):

The production workload response-time has 1-2% gain avg. latency, and
the P99 overall latency drops by 2-3%.

* Reading the current CPU number

Speeding up reading the current CPU number on which the caller thread is
running is done by keeping the current CPU number up do date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.

Keeping the current cpu id in a memory area shared between kernel and
user-space is an improvement over current mechanisms available to read
the current CPU number, which has the following benefits over
alternative approaches:

- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
  executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
  assembly, which makes it a useful building block for restartable
  sequences.
- The approach of reading the cpu id through memory mapping shared
  between kernel and user-space is portable (e.g. ARM), which is not the
  case for the lsl-based x86 vdso.

On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.

Benchmarking various approaches for reading the current CPU number:

ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop):                                    8.4 ns
- Read CPU from rseq cpu_id:                               16.7 ns
- Read CPU from rseq cpu_id (lazy register):               19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu:                           301.8 ns
- getcpu system call:                                     234.9 ns

x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop):                                    0.8 ns
- Read CPU from rseq cpu_id:                                0.8 ns
- Read CPU from rseq cpu_id (lazy register):                0.8 ns
- Read using gs segment selector:                           0.8 ns
- "lsl" inline assembly:                                   13.0 ns
- glibc 2.19-0ubuntu6 getcpu:                              16.6 ns
- getcpu system call:                                      53.9 ns

- Speed (benchmark taken on v8 of patchset)

Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:

Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.

* CONFIG_RSEQ=n

avg.:      41.37 s
std.dev.:   0.36 s

* CONFIG_RSEQ=y

avg.:      40.46 s
std.dev.:   0.33 s

- Size

On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.

[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
2018-06-06 11:58:31 +02:00
Alexei Starovoitov
449325b52b umh: introduce fork_usermode_blob() helper
Introduce helper:
int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
struct umh_info {
       struct file *pipe_to_umh;
       struct file *pipe_from_umh;
       pid_t pid;
};

that GPLed kernel modules (signed or unsigned) can use it to execute part
of its own data as swappable user mode process.

The kernel will do:
- allocate a unique file in tmpfs
- populate that file with [data, data + len] bytes
- user-mode-helper code will do_execve that file and, before the process
  starts, the kernel will create two unix pipes for bidirectional
  communication between kernel module and umh
- close tmpfs file, effectively deleting it
- the fork_usermode_blob will return zero on success and populate
  'struct umh_info' with two unix pipes and the pid of the user process

As the first step in the development of the bpfilter project
the fork_usermode_blob() helper is introduced to allow user mode code
to be invoked from a kernel module. The idea is that user mode code plus
normal kernel module code are built as part of the kernel build
and installed as traditional kernel module into distro specified location,
such that from a distribution point of view, there is
no difference between regular kernel modules and kernel modules + umh code.
Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
by a kernel module doesn't make it any special from kernel and user space
tooling point of view.

Such approach enables kernel to delegate functionality traditionally done
by the kernel modules into the user space processes (either root or !root) and
reduces security attack surface of the new code. The buggy umh code would crash
the user process, but not the kernel. Another advantage is that umh code
of the kernel module can be debugged and tested out of user space
(e.g. opening the possibility to run clang sanitizers, fuzzers or
user space test suites on the umh code).
In case of the bpfilter project such architecture allows complex control plane
to be done in the user space while bpf based data plane stays in the kernel.

Since umh can crash, can be oom-ed by the kernel, killed by the admin,
the kernel module that uses them (like bpfilter) needs to manage life
time of umh on its own via two unix pipes and the pid of umh.

The exit code of such kernel module should kill the umh it started,
so that rmmod of the kernel module will cleanup the corresponding umh.
Just like if the kernel module does kmalloc() it should kfree() it
in the exit code.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 13:23:39 -04:00
Kees Cook
c31dbb146d exec: pin stack limit during exec
Since the stack rlimit is used in multiple places during exec and it can
be changed via other threads (via setrlimit()) or processes (via
prlimit()), the assumption that the value doesn't change cannot be made.
This leads to races with mm layout selection and argument size
calculations.  This changes the exec path to use the rlimit stored in
bprm instead of in current.  Before starting the thread, the bprm stack
rlimit is stored back to current.

Link: http://lkml.kernel.org/r/1518638796-20819-4-git-send-email-keescook@chromium.org
Fixes: 64701dee41 ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Kees Cook
b838383133 exec: introduce finalize_exec() before start_thread()
Provide a final callback into fs/exec.c before start_thread() takes
over, to handle any last-minute changes, like the coming restoration of
the stack limit.

Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Greg KH <greg@kroah.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Kees Cook
8f2af155b5 exec: pass stack rlimit into mm layout functions
Patch series "exec: Pin stack limit during exec".

Attempts to solve problems with the stack limit changing during exec
continue to be frustrated[1][2].  In addition to the specific issues
around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
other places during exec where the stack limit is used and is assumed to
be unchanging.  Given the many places it gets used and the fact that it
can be manipulated/raced via setrlimit() and prlimit(), I think the only
way to handle this is to move away from the "current" view of the stack
limit and instead attach it to the bprm, and plumb this down into the
functions that need to know the stack limits.  This series implements
the approach.

[1] 04e35f4495 ("exec: avoid RLIMIT_STACK races with prlimit()")
[2] 779f4e1c6c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
[3] to security@kernel.org, "Subject: existing rlimit races?"

This patch (of 3):

Since it is possible that the stack rlimit can change externally during
exec (either via another thread calling setrlimit() or another process
calling prlimit()), provide a way to pass the rlimit down into the
per-architecture mm layout functions so that the rlimit can stay in the
bprm structure instead of sitting in the signal structure until exec is
finalized.

Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Kees Cook
7bd698b3c0 exec: Set file unwritable before LSM check
The LSM check should happen after the file has been confirmed to be
unchanging. Without this, we could have a race between the Time of Check
(the call to security_kernel_read_file() which could read the file and
make access policy decisions) and the Time of Use (starting with
kernel_read_file()'s reading of the file contents). In theory, file
contents could change between the two.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2018-03-19 15:49:32 +11:00
Kees Cook
e816c201ae exec: Weaken dumpability for secureexec
This is a logical revert of commit e37fdb785a ("exec: Use secureexec
for setting dumpability")

This weakens dumpability back to checking only for uid/gid changes in
current (which is useless), but userspace depends on dumpability not
being tied to secureexec.

  https://bugzilla.redhat.com/show_bug.cgi?id=1528633

Reported-by: Tom Horsley <horsley1953@gmail.com>
Fixes: e37fdb785a ("exec: Use secureexec for setting dumpability")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-03 10:13:36 -08:00
Kees Cook
779f4e1c6c Revert "exec: avoid RLIMIT_STACK races with prlimit()"
This reverts commit 04e35f4495.

SELinux runs with secureexec for all non-"noatsecure" domain transitions,
which means lots of processes end up hitting the stack hard-limit change
that was introduced in order to fix a race with prlimit(). That race fix
will need to be redesigned.

Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Tomáš Trnka <trnka@scm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-17 14:26:25 -08:00
Arnd Bergmann
3756f6401c exec: avoid gcc-8 warning for get_task_comm
gcc-8 warns about using strncpy() with the source size as the limit:

  fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]

This is indeed slightly suspicious, as it protects us from source
arguments without NUL-termination, but does not guarantee that the
destination is terminated.

This keeps the strncpy() to ensure we have properly padded target
buffer, but ensures that we use the correct length, by passing the
actual length of the destination buffer as well as adding a build-time
check to ensure it is exactly TASK_COMM_LEN.

There are only 23 callsites which I all reviewed to ensure this is
currently the case.  We could get away with doing only the check or
passing the right length, but it doesn't hurt to do both.

Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-14 16:00:48 -08:00
Kees Cook
04e35f4495 exec: avoid RLIMIT_STACK races with prlimit()
While the defense-in-depth RLIMIT_STACK limit on setuid processes was
protected against races from other threads calling setrlimit(), I missed
protecting it against races from external processes calling prlimit().
This adds locking around the change and makes sure that rlim_max is set
too.

Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee41 ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Brad Spengler <spender@grsecurity.net>
Acked-by: Serge Hallyn <serge@hallyn.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Mark Rutland
6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Mathieu Desnoyers
a961e40917 membarrier: Provide register expedited private command
This introduces a "register private expedited" membarrier command which
allows eventual removal of important memory barrier constraints on the
scheduler fast-paths. It changes how the "private expedited" membarrier
command (new to 4.14) is used from user-space.

This new command allows processes to register their intent to use the
private expedited command.  This affects how the expedited private
command introduced in 4.14-rc is meant to be used, and should be merged
before 4.14 final.

Processes are now required to register before using
MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.

This fixes a problem that arose when designing requested extensions to
sys_membarrier() to allow JITs to efficiently flush old code from
instruction caches.  Several potential algorithms are much less painful
if the user register intent to use this functionality early on, for
example, before the process spawns the second thread.  Registering at
this time removes the need to interrupt each and every thread in that
process at the first expedited sys_membarrier() system call.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-19 22:13:40 -04:00
Oleg Nesterov
c2315c187f exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array
Patch series "exec: binfmt_misc: fix use-after-free, kill
iname[BINPRM_BUF_SIZE]".

It looks like this code was always wrong, then commit 948b701a60
("binfmt_misc: add persistent opened binary handler for containers")
added more problems.

This patch (of 6):

load_script() can simply use i_name instead, it points into bprm->buf[]
and nobody can change this memory until we call prepare_binprm().

The only complication is that we need to also change the signature of
bprm_change_interp() but this change looks good too.

While at it, do whitespace/style cleanups.

NOTE: the real motivation for this change is that people want to
increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
this looks more complicated because afaics it is very buggy.

Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Travis Gummels <tgummels@redhat.com>
Cc: Ben Woodard <woodard@redhat.com>
Cc: Jim Foraker <foraker1@llnl.gov>
Cc: <tdhooge@llnl.gov>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:25 -07:00
Mimi Zohar
711aab1dbb vfs: constify path argument to kernel_read_file_from_path
This patch constifies the path argument to kernel_read_file_from_path().

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-14 20:18:45 -07:00
Linus Torvalds
581bfce969 Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more set_fs removal from Al Viro:
 "Christoph's 'use kernel_read and friends rather than open-coding
  set_fs()' series"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: unexport vfs_readv and vfs_writev
  fs: unexport vfs_read and vfs_write
  fs: unexport __vfs_read/__vfs_write
  lustre: switch to kernel_write
  gadget/f_mass_storage: stop messing with the address limit
  mconsole: switch to kernel_read
  btrfs: switch write_buf to kernel_write
  net/9p: switch p9_fd_read to kernel_write
  mm/nommu: switch do_mmap_private to kernel_read
  serial2002: switch serial2002_tty_write to kernel_{read/write}
  fs: make the buf argument to __kernel_write a void pointer
  fs: fix kernel_write prototype
  fs: fix kernel_read prototype
  fs: move kernel_read to fs/read_write.c
  fs: move kernel_write to fs/read_write.c
  autofs4: switch autofs4_write to __kernel_write
  ashmem: switch to ->read_iter
2017-09-14 18:13:32 -07:00
Michal Hocko
0ee931c4e3 mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:16 -07:00
Christoph Hellwig
bdd1d2d3d2 fs: fix kernel_read prototype
Use proper ssize_t and size_t types for the return value and count
argument, move the offset last and make it an in/out argument like
all other read/write helpers, and make the buf argument a void pointer
to get rid of lots of casts in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-04 19:05:15 -04:00
Christoph Hellwig
c41fbad015 fs: move kernel_read to fs/read_write.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-04 19:05:15 -04:00
Kees Cook
fe8993b3a0 exec: Consolidate pdeath_signal clearing
Instead of an additional secureexec check for pdeath_signal, just move it
up into the initial secureexec test. Neither perf nor arch code touches
pdeath_signal, so the relocation shouldn't change anything.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:14 -07:00
Kees Cook
64701dee41 exec: Use sane stack rlimit under secureexec
For a secureexec, before memory layout selection has happened, reset the
stack rlimit to something sane to avoid the caller having control over
the resulting layouts.

$ ulimit -s
8192
$ ulimit -s unlimited
$ /bin/sh -c 'ulimit -s'
unlimited
$ sudo /bin/sh -c 'ulimit -s'
8192

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:14 -07:00
Kees Cook
473d89639d exec: Consolidate dumpability logic
Since it's already valid to set dumpability in the early part of
setup_new_exec(), we can consolidate the logic into a single place.
The BINPRM_FLAGS_ENFORCE_NONDUMP is set during would_dump() calls
before setup_new_exec(), so its test is safe to move as well.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
2017-08-01 12:03:13 -07:00
Kees Cook
a70423dfbc exec: Use secureexec for clearing pdeath_signal
Like dumpability, clearing pdeath_signal happens both in setup_new_exec()
and later in commit_creds(). The test in setup_new_exec() is different
from all other privilege comparisons, though: it is checking the new cred
(bprm) uid vs the old cred (current) euid. This appears to be a bug,
introduced by commit a6f76f23d2 ("CRED: Make execve() take advantage of
copy-on-write credentials"):

-       if (bprm->e_uid != current_euid() ||
-           bprm->e_gid != current_egid()) {
-               set_dumpable(current->mm, suid_dumpable);
+       if (bprm->cred->uid != current_euid() ||
+           bprm->cred->gid != current_egid()) {

It was bprm euid vs current euid (and egids), but the effective got
dropped. Nothing in the exec flow changes bprm->cred->uid (nor gid).
The call traces are:

	prepare_bprm_creds()
	    prepare_exec_creds()
	        prepare_creds()
	            memcpy(new_creds, old_creds, ...)
	            security_prepare_creds() (unimplemented by commoncap)
	...
	prepare_binprm()
	    bprm_fill_uid()
	        resets euid/egid to current euid/egid
	        sets euid/egid on bprm based on set*id file bits
	    security_bprm_set_creds()
		cap_bprm_set_creds()
		        handle all caps-based manipulations

so this test is effectively a test of current_uid() vs current_euid(),
which is wrong, just like the prior dumpability tests were wrong.

The commit log says "Clear pdeath_signal and set dumpable on
certain circumstances that may not be covered by commit_creds()." This
may be meaning the earlier old euid vs new euid (and egid) test that
got changed.

Luckily, as with dumpability, this is all masked by commit_creds()
which performs old/new euid and egid tests and clears pdeath_signal.

And again, like dumpability, we should include LSM secureexec logic for
pdeath_signal clearing. For example, Smack goes out of its way to clear
pdeath_signal when it finds a secureexec condition.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
2017-08-01 12:03:11 -07:00
Kees Cook
e37fdb785a exec: Use secureexec for setting dumpability
The examination of "current" to decide dumpability is wrong. This was a
check of and euid/uid (or egid/gid) mismatch in the existing process,
not the newly created one. This appears to stretch back into even the
"history.git" tree. Luckily, dumpability is later set in commit_creds().
In earlier kernel versions before creds existed, similar checks also
existed late in the exec flow, covering up the mistake as far back as I
could find.

Note that because the commit_creds() check examines differences of euid,
uid, egid, gid, and capabilities between the old and new creds, it would
look like the setup_new_exec() dumpability test could be entirely removed.
However, the secureexec test may cover a different set of tests (specific
to the LSMs) than what commit_creds() checks for. So, fix this test to
use secureexec (the removed euid tests are redundant to the commoncap
secureexec checks now).

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
2017-08-01 12:03:10 -07:00
Kees Cook
2af6228026 LSM: drop bprm_secureexec hook
This removes the bprm_secureexec hook since the logic has been folded into
the bprm_set_creds hook for all LSMs now.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:10 -07:00
Kees Cook
46d98eb4e1 commoncap: Refactor to remove bprm_secureexec hook
The commoncap implementation of the bprm_secureexec hook is the only LSM
that depends on the final call to its bprm_set_creds hook (since it may
be called for multiple files, it ignores bprm->called_set_creds). As a
result, it cannot safely _clear_ bprm->secureexec since other LSMs may
have set it.  Instead, remove the bprm_secureexec hook by introducing a
new flag to bprm specific to commoncap: cap_elevated. This is similar to
cap_effective, but that is used for a specific subset of elevated
privileges, and exists solely to track state from bprm_set_creds to
bprm_secureexec. As such, it will be removed in the next patch.

Here, set the new bprm->cap_elevated flag when setuid/setgid has happened
from bprm_fill_uid() or fscapabilities have been prepared. This temporarily
moves the bprm_secureexec hook to a static inline. The helper will be
removed in the next patch; this makes the step easier to review and bisect,
since this does not introduce any changes to inputs nor outputs to the
"elevated privileges" calculation.

The new flag is merged with the bprm->secureexec flag in setup_new_exec()
since this marks the end of any further prepare_binprm() calls.

Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:08 -07:00
Kees Cook
c425e189ff binfmt: Introduce secureexec flag
The bprm_secureexec hook can be moved earlier. Right now, it is called
during create_elf_tables(), via load_binary(), via search_binary_handler(),
via exec_binprm(). Nearly all (see exception below) state used by
bprm_secureexec is created during the bprm_set_creds hook, called from
prepare_binprm().

For all LSMs (except commoncaps described next), only the first execution
of bprm_set_creds takes any effect (they all check bprm->called_set_creds
which prepare_binprm() sets after the first call to the bprm_set_creds
hook).  However, all these LSMs also only do anything with bprm_secureexec
when they detected a secure state during their first run of bprm_set_creds.
Therefore, it is functionally identical to move the detection into
bprm_set_creds, since the results from secureexec here only need to be
based on the first call to the LSM's bprm_set_creds hook.

The single exception is that the commoncaps secureexec hook also examines
euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
via prepare_binprm(), which can be called multiple times (e.g.
binfmt_script, binfmt_misc), and may clear the euid/egid for the final
load (i.e. the script interpreter). However, while commoncaps specifically
ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time
prepare_binprm() may get called, it needs to base the secureexec decision
on the final call to bprm_set_creds. As a result, it will need special
handling.

To begin this refactoring, this adds the secureexec flag to the bprm
struct, and calls the secureexec hook during setup_new_exec(). This is
safe since all the cred work is finished (and past the point of no return).
This explicit call will be removed in later patches once the hook has been
removed.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
2017-08-01 12:03:05 -07:00
Kees Cook
a9208e42ba exec: Correct comments about "point of no return"
In commit 221af7f87b ("Split 'flush_old_exec' into two functions"),
the comment about the point of no return should have stayed in
flush_old_exec() since it refers to "bprm->mm = NULL;" line, but prior
changes in commits c89681ed7d ("remove steal_locks()"), and
fd8328be87 ("sanitize handling of shared descriptor tables in failing
execve()") made it look like it meant the current->sas_ss_sp line instead.

The comment was referring to the fact that once bprm->mm is NULL, all
failures from a binfmt load_binary hook (e.g. load_elf_binary), will
get SEGV raised against current. Move this comment and expand the
explanation a bit, putting it above the assignment this time, and add
details about the true nature of "point of no return" being the call
to flush_old_exec() itself.

This also removes an erroneous commet about when credentials are being
installed. That has its own dedicated function, install_exec_creds(),
which carries a similar (and correct) comment, so remove the bogus comment
where installation is not actually happening.

Cc: David Howells <dhowells@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:03:04 -07:00
Kees Cook
ddb4a1442d exec: Rename bprm->cred_prepared to called_set_creds
The cred_prepared bprm flag has a misleading name. It has nothing to do
with the bprm_prepare_cred hook, and actually tracks if bprm_set_creds has
been called. Rename this flag and improve its comment.

Cc: David Howells <dhowells@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
2017-08-01 12:02:48 -07:00
Kees Cook
da029c11e6 exec: Limit arg stack to at most 75% of _STK_LIM
To avoid pathological stack usage or the need to special-case setuid
execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-07 20:05:08 -07:00
Kees Cook
98da7d0885 fs/exec.c: account for argv/envp pointers
When limiting the argv/envp strings during exec to 1/4 of the stack limit,
the storage of the pointers to the strings was not included.  This means
that an exec with huge numbers of tiny strings could eat 1/4 of the stack
limit in strings and then additional space would be later used by the
pointers to the strings.

For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
single-byte strings would consume less than 2MB of stack, the max (8MB /
4) amount allowed, but the pointers to the strings would consume the
remaining additional stack space (1677721 * 4 == 6710884).

The result (1677721 + 6710884 == 8388605) would exhaust stack space
entirely.  Controlling this stack exhaustion could result in
pathological behavior in setuid binaries (CVE-2017-1000365).

[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea393 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Qualys Security Advisory <qsa@qualys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23 16:15:56 -07:00
Kyle Huey
e9ea1e7f53 x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
When enabled, the processor will fault on attempts to execute the CPUID
instruction with CPL>0. Exposing this feature to userspace will allow a
ptracer to trap and emulate the CPUID instruction.

When supported, this feature is controlled by toggling bit 0 of
MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
https://bugzilla.kernel.org/attachment.cgi?id=243991

Implement a new pair of arch_prctls, available on both x86-32 and x86-64.

ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
    is enabled (and thus the CPUID instruction is not available) or 1 if
    CPUID faulting is not enabled.

ARCH_SET_CPUID: Set the CPUID state to the second argument. If
    cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
    be deactivated. Returns ENODEV if CPUID faulting is not supported on
    this system.

The state of the CPUID faulting flag is propagated across forks, but reset
upon exec.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Robert O'Callahan <robert@ocallahan.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: user-mode-linux-user@lists.sourceforge.net
Cc: David Matlack <dmatlack@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20 16:10:34 +01:00
Ingo Molnar
299300258d sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:35 +01:00
Ingo Molnar
6a3827d750 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/numa_balancing.h>
We are going to split <linux/sched/numa_balancing.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/numa_balancing.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:30 +01:00
Ingo Molnar
3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00