It's a part of oom context just like allocation order and nodemask, so
let's move it to oom_control instead of passing it in the argument list.
Link: http://lkml.kernel.org/r/40e03fd7aaf1f55c75d787128d6d17c5a71226c2.1464358556.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 984d74a720 ("sysrq: rcu-ify __handle_sysrq") replaced
spin_lock_irqsave() calls with rcu_read_lock() calls in sysrq. Since
rcu_read_lock() does not disable preemption, faulthandler_disabled() in
__do_page_fault() in x86/fault.c returns false. When the code later calls
might_sleep() in the pagefault handler, we get the following warning:
BUG: sleeping function called from invalid context at ../arch/x86/mm/fault.c:1187
in_atomic(): 0, irqs_disabled(): 0, pid: 4706, name: bash
Preemption disabled at:[<ffffffff81484339>] printk+0x48/0x4a
To fix this, we release the RCU read lock before we crash.
Tested this patch on linux 3.18 by booting off one of our boards.
Fixes: 984d74a720 ("sysrq: rcu-ify __handle_sysrq")
Signed-off-by: Ani Sinha <ani@arista.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The Kconfig currently controlling compilation of this code is:
config.debug:config MAGIC_SYSRQ
bool "Magic SysRq key"
...meaning that it currently is not being built as a module by anyone.
Lets remove the traces of modularity we can so that when reading the
driver there is less doubt it is builtin-only.
Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.
We don't delete the module.h include since other parts of the file are
using content from there.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The force_kill member of struct oom_control isn't needed if an order of -1
is used instead. This is the same as order == -1 in struct
compact_control which requires full memory compaction.
This patch introduces no functional change.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are essential elements to an oom context that are passed around to
multiple functions.
Organize these elements into a new struct, struct oom_control, that
specifies the context for an oom condition.
This patch introduces no functional change.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
speed module address lookup. He found some abusers of the module lock
doing that too.
A little bit of parameter work here too; including Dan Streetman's breaking
up the big param mutex so writing a parameter can load another module (yeah,
really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and
!CONFIG_SYSFS, so those fixes were appended too.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
tLdh/a9GiCitqS0bT7GE
=tWPQ
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.
A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
Pull MIPS updates from Ralf Baechle:
- Improvements to the tlb_dump code
- KVM fixes
- Add support for appended DTB
- Minor improvements to the R12000 support
- Minor improvements to the R12000 support
- Various platform improvments for BCM47xx
- The usual pile of minor cleanups
- A number of BPF fixes and improvments
- Some improvments to the support for R3000 and DECstations
- Some improvments to the ATH79 platform support
- A major patchset for the JZ4740 SOC adding support for the CI20 platform
- Add support for the Pistachio SOC
- Minor BMIPS/BCM63xx platform support improvments.
- Avoid "SYNC 0" as memory barrier when unlocking spinlocks
- Add support for the XWR-1750 board.
- Paul's __cpuinit/__cpuinitdata cleanups.
- New Malta CPU board support large memory so enable ZONE_DMA32.
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (131 commits)
MIPS: spinlock: Adjust arch_spin_lock back-off time
MIPS: asmmacro: Ensure 64-bit FP registers are used with MSA
MIPS: BCM47xx: Simplify handling SPROM revisions
MIPS: Cobalt Don't use module_init in non-modular MTD registration.
MIPS: BCM47xx: Move NVRAM driver to the drivers/firmware/
MIPS: use for_each_sg()
MIPS: BCM47xx: Don't select BCMA_HOST_PCI
MIPS: BCM47xx: Add helper variable for storing NVRAM length
MIPS: IRQ/IP27: Move IRQ allocation API to platform code.
MIPS: Replace smp_mb with release barrier function in unlocks.
MIPS: i8259: DT support
MIPS: Malta: Basic DT plumbing
MIPS: include errno.h for ENODEV in mips-cm.h
MIPS: Define GCR_GIC_STATUS register fields
MIPS: BPF: Introduce BPF ASM helpers
MIPS: BPF: Use BPF register names to describe the ABI
MIPS: BPF: Move register definition to the BPF header
MIPS: net: BPF: Replace RSIZE with SZREG
MIPS: BPF: Free up some callee-saved registers
MIPS: Xtalk: Update xwidget.h with known Xtalk device numbers
...
Merge first patchbomb from Andrew Morton:
- a few misc things
- ocfs2 udpates
- kernel/watchdog.c feature work (took ages to get right)
- most of MM. A few tricky bits are held up and probably won't make 4.2.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits)
mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
mm, thp: respect MPOL_PREFERRED policy with non-local node
tmpfs: truncate prealloc blocks past i_size
mm/memory hotplug: print the last vmemmap region at the end of hot add memory
mm/mmap.c: optimization of do_mmap_pgoff function
mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
mm: kmemleak: fix delete_object_*() race when called on the same memory block
mm: kmemleak: allow safe memory scanning during kmemleak disabling
memcg: convert mem_cgroup->under_oom from atomic_t to int
memcg: remove unused mem_cgroup->oom_wakeups
frontswap: allow multiple backends
x86, mirror: x86 enabling - find mirrored memory ranges
mm/memblock: allocate boot time data structures from mirrored memory
mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
mm: do not ignore mapping_gfp_mask in page cache allocation paths
mm/cma.c: fix typos in comments
mm/oom_kill.c: print points as unsigned int
mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
...
Pull input subsystem updates from Dmitry Torokhov:
"Thanks to Samuel Thibault input device (keyboard) LEDs are no longer
hardwired within the input core but use LED subsystem and so allow use
of different triggers; Hans de Goede did a large update for the ALPS
touchpad driver; we have new TI drv2665 haptics driver and DA9063
OnKey driver, and host of other drivers got various fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (55 commits)
Input: pixcir_i2c_ts - fix receive error
MAINTAINERS: remove non existent input mt git tree
Input: improve usage of gpiod API
tty/vt/keyboard: define LED triggers for VT keyboard lock states
tty/vt/keyboard: define LED triggers for VT LED states
Input: export LEDs as class devices in sysfs
Input: cyttsp4 - use swap() in cyttsp4_get_touch()
Input: goodix - do not explicitly set evbits in input device
Input: goodix - export id and version read from device
Input: goodix - fix variable length array warning
Input: goodix - fix alignment issues
Input: add OnKey driver for DA9063 MFD part
Input: elan_i2c - add product IDs FW names
Input: elan_i2c - add support for multi IC type and iap format
Input: focaltech - report finger width to userspace
tty: remove platform_sysrq_reset_seq
Input: synaptics_i2c - use proper boolean values
Input: psmouse - use true instead of 1 for boolean values
Input: cyapa - fix a few typos in comments
Input: stmpe-ts - enforce device tree only mode
...
The zonelist locking and the oom_sem are two overlapping locks that are
used to serialize global OOM killing against different things.
The historical zonelist locking serializes OOM kills from allocations with
overlapping zonelists against each other to prevent killing more tasks
than necessary in the same memory domain. Only when neither tasklists nor
zonelists from two concurrent OOM kills overlap (tasks in separate memcgs
bound to separate nodes) are OOM kills allowed to execute in parallel.
The younger oom_sem is a read-write lock to serialize OOM killing against
the PM code trying to disable the OOM killer altogether.
However, the OOM killer is a fairly cold error path, there is really no
reason to optimize for highly performant and concurrent OOM kills. And
the oom_sem is just flat-out redundant.
Replace both locking schemes with a single global mutex serializing OOM
kills regardless of context.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a MIPS specific SysRq operation to dump the TLB entries on all CPUs,
using the 'x' trigger key.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10072/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The platform_sysrq_reset_seq code was intended as a way for an embedded
platform to provide its own sysrq sequence at compile time. After over two
years, nobody has started using it in an upstream kernel, and the platforms
that were interested in it have moved on to devicetree, which can be used
to configure the sequence without requiring kernel changes. The method is
also incompatible with the way that most architectures build support for
multiple platforms into a single kernel.
Now the code is producing warnings when built with gcc-5.1:
drivers/tty/sysrq.c: In function 'sysrq_init':
drivers/tty/sysrq.c:959:33: warning: array subscript is above array bounds [-Warray-bounds]
key = platform_sysrq_reset_seq[i];
We could fix this, but it seems unlikely that it will ever be used, so
let's just remove the code instead. We still have the option to pass the
sequence either in DT, using the kernel command line, or using the
/sys/module/sysrq/parameters/reset_seq file.
Fixes: 154b7a489a ("Input: sysrq - allow specifying alternate reset sequence")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Most code already uses consts for the struct kernel_param_ops,
sweep the kernel for the last offending stragglers. Other than
include/linux/moduleparam.h and kernel/params.c all other changes
were generated with the following Coccinelle SmPL patch. Merge
conflicts between trees can be handled with Coccinelle.
In the future git could get Coccinelle merge support to deal with
patch --> fail --> grammar --> Coccinelle --> new patch conflicts
automatically for us on patches where the grammar is available and
the patch is of high confidence. Consider this a feature request.
Test compiled on x86_64 against:
* allnoconfig
* allmodconfig
* allyesconfig
@ const_found @
identifier ops;
@@
const struct kernel_param_ops ops = {
};
@ const_not_found depends on !const_found @
identifier ops;
@@
-struct kernel_param_ops ops = {
+const struct kernel_param_ops ops = {
};
Generated-by: Coccinelle SmPL
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: cocci@systeme.lip6.fr
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Workqueues are used extensively throughout the kernel but sometimes
it's difficult to debug stalls involving work items because visibility
into its inner workings is fairly limited. Although sysrq-t task dump
annotates each active worker task with the information on the work
item being executed, it is challenging to find out which work items
are pending or delayed on which queues and how pools are being
managed.
This patch implements show_workqueue_state() which dumps all busy
workqueues and pools and is called from the sysrq-t handler. At the
end of sysrq-t dump, something like the following is printed.
Showing busy workqueues and worker pools:
...
workqueue filler_wq: flags=0x0
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
in-flight: 491:filler_workfn, 507:filler_workfn
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
in-flight: 501:filler_workfn
pending: filler_workfn
...
workqueue test_wq: flags=0x8
pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
delayed: test_workfn1 BAR(492), test_workfn2
...
pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62
The above shows that test_wq is executing test_workfn() on pid 510
which is the rescuer and also that there are two tasks 69 and 500
waiting for the work item to finish in flush_work(). As test_wq has
max_active of 1, there are two work items for test_workfn1() and
test_workfn2() which are delayed till the current work item is
finished. In addition, pid 492 is flushing test_workfn1().
The work item for test_workfn() is being executed on pwq of pool 2
which is the normal priority per-cpu pool for CPU 1. The pool has
three workers, two of which are executing filler_workfn() for
filler_wq and the last one is assuming the manager role trying to
create more workers.
This extra workqueue state dump will hopefully help chasing down hangs
involving workqueues.
v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.
v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
printk()'s replaced with pr_info()'s, and cpumask printing now
uses cpulist_pr_cont().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@redhat.com>
Commit 5695be142e ("OOM, PM: OOM killed task shouldn't escape PM
suspend") has left a race window when OOM killer manages to
note_oom_kill after freeze_processes checks the counter. The race
window is quite small and really unlikely and partial solution deemed
sufficient at the time of submission.
Tejun wasn't happy about this partial solution though and insisted on a
full solution. That requires the full OOM and freezer's task freezing
exclusion, though. This is done by this patch which introduces oom_sem
RW lock and turns oom_killer_disable() into a full OOM barrier.
oom_killer_disabled check is moved from the allocation path to the OOM
level and we take oom_sem for reading for both the check and the whole
OOM invocation.
oom_killer_disable() takes oom_sem for writing so it waits for all
currently running OOM killer invocations. Then it disable all the further
OOMs by setting oom_killer_disabled and checks for any oom victims.
Victims are counted via mark_tsk_oom_victim resp. unmark_oom_victim. The
last victim wakes up all waiters enqueued by oom_killer_disable().
Therefore this function acts as the full OOM barrier.
The page fault path is covered now as well although it was assumed to be
safe before. As per Tejun, "We used to have freezing points deep in file
system code which may be reacheable from page fault." so it would be
better and more robust to not rely on freezing points here. Same applies
to the memcg OOM killer.
out_of_memory tells the caller whether the OOM was allowed to trigger and
the callers are supposed to handle the situation. The page allocation
path simply fails the allocation same as before. The page fault path will
retry the fault (more on that later) and Sysrq OOM trigger will simply
complain to the log.
Normally there wouldn't be any unfrozen user tasks after
try_to_freeze_tasks so the function will not block. But if there was an
OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
finish yet then we have to wait for it. This should complete in a finite
time, though, because
- the victim cannot loop in the page fault handler (it would die
on the way out from the exception)
- it cannot loop in the page allocator because all the further
allocation would fail and __GFP_NOFAIL allocations are not
acceptable at this stage
- it shouldn't be blocked on any locks held by frozen tasks
(try_to_freeze expects lockless context) and kernel threads and
work queues are not frozen yet
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While touching this area let's convert printk to pr_*. This also makes
the printing of continuation lines done properly.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With memoryless node support being worked on, it's possible that for
optimizations that a node may not have a non-NULL zonelist. When
CONFIG_NUMA is enabled and node 0 is memoryless, this means the zonelist
for first_online_node may become NULL.
The oom killer requires a zonelist that includes all memory zones for
the sysrq trigger and pagefault out of memory handler.
Ensure that a non-NULL zonelist is always passed to the oom killer.
[akpm@linux-foundation.org: fix non-numa build]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some sysrq handlers can run for a long time, because they dump a lot of
data onto a serial console. Having RCU stall warnings pop up in the
middle of them only makes the problem worse.
This patch temporarily disables RCU stall warnings while a sysrq request
is handled.
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Madper Xie <cxie@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Echoing values into /proc/sysrq-trigger seems to be a popular way to get
information out of the kernel. However, dumping information about
thousands of processes, or hundreds of CPUs to serial console can result
in IRQs being blocked for minutes, resulting in various kinds of cascade
failures.
The most common failure is due to interrupts being blocked for a very
long time. This can lead to things like failed IO requests, and other
things the system cannot easily recover from.
This problem is easily fixable by making __handle_sysrq use RCU instead
of spin_lock_irqsave.
This leaves the warning that RCU grace periods have not elapsed for a
long time, but the system will come back from that automatically.
It also leaves sysrq-from-irq-context when the sysrq keys are pressed,
but that is probably desired since people want that to work in
situations where the system is already hosed.
The callers of register_sysrq_key and unregister_sysrq_key appear to be
capable of sleeping.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Madper Xie <cxie@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... instead of naked numbers.
Stuff in sysrq.c used to set it to 8 which is supposed to mean above
default level so set it to DEBUG instead as we're terminating/killing all
tasks and we want to be verbose there.
Also, correct the check in x86_64_start_kernel which should be >= as
we're clearly issuing the string there for all debug levels, not only
the magical 10.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE)
into a Kconfig variable.
Original version by Bastian Blank <waldi@debian.org>.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull input updates from Dmitry Torokhov:
"A new driver for slidebar on Ideapad laptops and a bunch of assorted
driver fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (32 commits)
Input: add SYN_MAX and SYN_CNT constants
Input: max11801_ts - convert to devm
Input: egalax-ts - fix typo and improve text
Input: MAINTAINERS - change maintainer for cyttsp driver
Input: cyttsp4 - kill 'defined but not used' compiler warnings
Input: add driver for slidebar on Lenovo IdeaPad laptops
Input: omap-keypad - set up irq type from DT
Input: omap-keypad - enable wakeup capability for keypad.
Input: omap-keypad - clear interrupts on open
Input: omap-keypad - convert to threaded IRQ
Input: omap-keypad - use bitfiled instead of hardcoded values
Input: cyttsp4 - remove useless NULL test from cyttsp4_watchdog_timer()
Input: wacom - fix error return code in wacom_probe()
Input: as5011 - fix error return code in as5011_probe()
Input: keyboard, serio - simplify use of devm_ioremap_resource
Input: tegra-kbc - simplify use of devm_ioremap_resource
Input: htcpen - fix incorrect placement of __initdata
Input: qt1070 - add power management ops
Input: wistron_btns - add MODULE_DEVICE_TABLE
Input: wistron_btns - mark the Medion MD96500 keymap as tested
...
Adding a simple device tree binding for the specification of key
sequences. Definition of the keys found in the sequence are located in
'include/uapi/linux/input.h'.
For the sysrq driver, holding the sequence of keys down for a specific
amount of time will reset the system.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull input updates from Dmitry Torokhov:
"First round of updates for the input subsystem.
You will get a new touchsreen driver for Cypress 4th generation
devices, a driver for a special controller implementing PS/2 protocol
in OLPC devices, and a driver for power key for SiRFprimaII PWRC.
HID and bcm5497 now support for the 2013 MacBook Air.
EVIOCGKEY and the rest of evdev ioctls now flush events of matching
type from the client's event queue so that clients can be sure any
events received after issuing EVIOCG* ioctl are new events.
And a host of cleanups and improvements in other drivers"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (87 commits)
Input: cyttsp4 - kfree xfer_buf on error path in probe()
Input: tps6507x-ts - select INPUT_POLLDEV
Input: bcm5974 - add support for the 2013 MacBook Air
HID: apple: Add support for the 2013 Macbook Air
Input: cyttsp4 - leak on error path in probe()
Input: cyttsp4 - silence NULL dereference warning
Input: cyttsp4 - silence shift wrap warning
Input: tps6507x-ts - convert to polled input device infrastructure
ARM: davinci: da850-evm: remove vref from touchscreen platform data
Input: cyttsp4 - SPI driver for Cypress TMA4XX touchscreen devices
Input: cyttsp4 - I2C driver for Cypress TMA4XX touchscreen devices
Input: cyttsp4 - add core driver for Cypress TMA4XX touchscreen devices
Input: cyttsp - I2C driver split into two modules
Input: add OLPC AP-SP driver
Input: nspire-keypad - remove redundant dev_err call in nspire_keypad_probe()
Input: tps6507x-ts - remove vref from platform data
Input: tps6507x-ts - use bool for booleans
Input: tps6507x-ts - remove bogus unreachable code
Input: samsung-keypad - let device core setup the default pin configuration
Input: wacom_i2c - implement hovering capability
...
Attempt to reboot the system gracefully when a key combo is detected.
If the reste combination is pressed the 2nd time we assume that graceful
reboot failed and perform emergency reboot. This fucntionality is useful
when UI is stuck but the system is otherwise working fine.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull input updates from Dmitry Torokhov:
"Assorted fixes and cleanups to the existing drivers plus a new driver
for IMS Passenger Control Unit device they use for ther in-flight
entertainment system."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (44 commits)
Input: trackpoint - Optimize trackpoint init to use power-on reset
Input: apbps2 - convert to devm_ioremap_resource()
Input: ALPS - use %ph to print buffers
ARM - shmobile: Armadillo800EVA: Move st1232 reset pin handling
Input: st1232 - add reset pin handling
Input: st1232 - convert to devm_* infrastructure
Input: MT - handle semi-mt devices in core
Input: adxl34x - use spi_get_drvdata()
Input: ad7877 - use spi_get_drvdata() and spi_set_drvdata()
Input: ads7846 - use spi_get_drvdata() and spi_set_drvdata()
Input: ims-pcu - fix a memory leak on error
Input: sysrq - supplement reset sequence with timeout functionality
Input: tegra-kbc - support for defining row/columns based on SoC
Input: imx_keypad - switch to using managed resources
Input: arc_ps2 - add support for device tree
Input: mma8450 - fix signed 12bits to 32bits conversion
Input: eeti_ts - remove redundant null check
Input: edt-ft5x06 - remove redundant null check before kfree
Input: ad714x - add CONFIG_PM_SLEEP to suspend/resume functions
Input: adxl34x - add CONFIG_PM_SLEEP to suspend/resume functions
...
Some devices have too few buttons, which it makes it hard to have
a reset combo that won't trigger automatically. As such a
timeout functionality that requires the combination to be held for
a given amount of time before triggering is introduced.
If a key combo is recognized and held for a 'timeout' amount of time,
the system triggers a reset. If the timeout value is omitted the
driver simply ignores the functionality.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Currently help message of /proc/sysrq-trigger highlight its
upper-case characters, like below:
SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
memory-full-oom-kill(F) kill-all-tasks(I) ...
this would confuse user trigger sysrq by upper-case character, which is
inconsistent with the real lower-case character registed key.
This inconsistent help message will also lead more confused when
26 upper-case letters put into use in future.
This patch fix it.
Thanks the comments from Andrew and Randy.
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When taking an address of an extern array, gcc quite naturally should be
able to say "an address of an object can never be NULL" and just
optimize away the test entirely.
However, the new alternate sysrq reset code (commit 154b7a489a:
"Input: sysrq - allow specifying alternate reset sequence") did exactly
that, and declared platform_sysrq_reset_seq[] as a weak array, and
expecting that testing the address of the array would show whether it
actually got linked against something or not.
And that doesn't work with all gcc versions. Clearly it works with
*some* versions of gcc, and maybe it's even supposed to work, but it
really is a very fragile concept.
So instead of testing the address of the weak variable, just create a
weak instance of that array that is empty. If some platform then has a
real platform_sysrq_reset_seq[] that overrides our weak one, the linker
will switch to that one, and it all works without any run-time
conditionals at all.
Reported-by: Dave Airlie <airlied@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull input updates from Dmitry Torokhov:
"Two new touchpad drivers - Cypress APA I2C Trackpad and Cypress PS/2
touchpad and a big update to ALPS driver from Kevin Cernekee that adds
support for "Rushmore" touchpads and paves way for adding support for
"Dolphin" touchpads.
There is also a new input driver for Goldfish emulator and also
Android keyreset driver was folded into SysRq code.
A few more drivers were updated with device tree bindings and others
got some small cleanups and fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (55 commits)
Input: cyttsp-spi - remove duplicate MODULE_ALIAS()
Input: tsc2005 - add MODULE_ALIAS
Input: tegra-kbc - require CONFIG_OF, remove platform data
Input: synaptics - initialize pointer emulation usage
Input: MT - do not apply filtering on emulated events
Input: bma150 - make some defines public and fix some comments
Input: bma150 - fix checking pm_runtime_get_sync() return value
Input: ALPS - enable trackstick on Rushmore touchpads
Input: ALPS - add support for "Rushmore" touchpads
Input: ALPS - make the V3 packet field decoder "pluggable"
Input: ALPS - move pixel and bitmap info into alps_data struct
Input: ALPS - fix command mode check
Input: ALPS - rework detection of Pinnacle AGx touchpads
Input: ALPS - move {addr,nibble}_command settings into alps_set_defaults()
Input: ALPS - use function pointers for different protocol handlers
Input: ALPS - rework detection sequence
Input: ALPS - introduce helper function for repeated commands
Input: ALPS - move alps_get_model() down below hw_init code
Input: ALPS - copy "model" info into alps_data struct
Input: ALPS - document the alps.h data structures
...
Move rt scheduler definitions out of include/linux/sched.h into
new file include/linux/sched/rt.h
Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds keyreset functionality to the sysrq driver. It allows
certain button/key combinations to be used in order to trigger emergency
reboots.
Redefining the '__weak platform_sysrq_reset_seq' variable is required
to trigger the feature. Alternatively keys can be passed to the driver
via a module parameter.
This functionality comes from the keyreset driver submitted by
Arve Hjønnevåg in the Android kernel.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
With hotpluggable and memoryless nodes, it's possible that node 0 will
not be online, so use the first online node's zonelist rather than
hardcoding node 0 to pass a zonelist with all zones to the oom killer.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change send_sig_all() to use do_send_sig_info(SEND_SIG_FORCED) instead
of force_sig(SIGKILL). With the recent changes we do not need force_ to
kill the CLONE_NEWPID tasks.
And this is more correct. force_sig() can race with the exiting thread,
while do_send_sig_info(group => true) kill the whole process.
Some more notes from Oleg Nesterov:
> Just one note. This change makes no difference for sysrq_handle_kill().
> But it obviously changes the behaviour sysrq_handle_term(). I think
> this is fine, if you want to really kill the task which blocks/ignores
> SIGTERM you can use sysrq_handle_kill().
>
> Even ignoring the reasons why force_sig() is simply wrong here,
> force_sig(SIGTERM) looks strange. The task won't be killed if it has
> a handler, but SIG_IGN can't help. However if it has the handler
> but blocks SIGTERM temporary (this is very common) it will be killed.
Also,
> force_sig() can't kill the process if the main thread has already
> exited. IOW, it is trivial to create the process which can't be
> killed by sysrq.
So, this patch fixes the issue.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The oom killer chooses not to kill a thread if:
- an eligible thread has already been oom killed and has yet to exit,
and
- an eligible thread is exiting but has yet to free all its memory and
is not the thread attempting to currently allocate memory.
SysRq+F manually invokes the global oom killer to kill a memory-hogging
task. This is normally done as a last resort to free memory when no
progress is being made or to test the oom killer itself.
For both uses, we always want to kill a thread and never defer. This
patch causes SysRq+F to always kill an eligible thread and can be used to
force a kill even if another oom killed thread has failed to exit.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keyboard struct lifetime is easy, but the locking is not and is completely
ignored by the existing code. Tackle this one head on
- Make the kbd_table private so we can run down all direct users
- Hoick the relevant ioctl handlers into the keyboard layer
- Lock them with the keyboard lock so they don't change mid keypress
- Add helpers for things like console stop/start so we isolate the poking
around properly
- Tweak the braille console so it still builds
There are a couple of FIXME locking cases left for ioctls that are so hideous
they should be addressed in a later patch. After this patch the kbd_table is
private and all the keyboard jiggery pokery is in one place.
This update fixes speakup and also a memory leak in the original.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's a real possibility of killing kernel threads that might
have issued use_mm(), so kthread's mm might become non-NULL.
This patch fixes the issue by checking for PF_KTHREAD (just as
get_task_mm()).
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysrq should grab the tasklist lock, otherwise calling force_sig() is
not safe, as it might race with exiting task, which ->sighand might be
set to NULL already.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
kill_bdev as well, so brd doesn't have to open code it. Reduce
buffer_head.h requirement accordingly.
Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving. The small comment replacing it says enough.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit ddd588b5dd ("oom: suppress nodes that are not allowed from
meminfo on oom kill") moved lib/show_mem.o out of lib/lib.a, which
resulted in build warnings on all architectures that implement their own
versions of show_mem():
lib/lib.a(show_mem.o): In function `show_mem':
show_mem.c:(.text+0x1f4): multiple definition of `show_mem'
arch/sparc/mm/built-in.o:(.text+0xd70): first defined here
The fix is to remove __show_mem() and add its argument to show_mem() in
all implementations to prevent this breakage.
Architectures that implement their own show_mem() actually don't do
anything with the argument yet, but they could be made to filter nodes
that aren't allowed in the current context in the future just like the
generic implementation.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: James Bottomley <James.Bottomley@hansenpartnership.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: wacom - pass touch resolution to clients through input_absinfo
Input: wacom - add 2 Bamboo Pen and touch models
Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent
Input: sparse-keymap - fix KEY_VSW handling in sparse_keymap_setup
Input: tegra-kbc - add tegra keyboard driver
Input: gpio_keys - switch to using request_any_context_irq
Input: serio - allow registered drivers to get status flag
Input: ct82710c - return proper error code for ct82c710_open
Input: bu21013_ts - added regulator support
Input: bu21013_ts - remove duplicate resolution parameters
Input: tnetv107x-ts - don't treat NULL clk as an error
Input: tnetv107x-keypad - don't treat NULL clk as an error
Fix up trivial conflicts in drivers/input/keyboard/Makefile due to
additions of tc3589x/Tegra drivers
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: fix typo in keycode validation supporting large scancodes
Input: aiptek - tighten up permissions on sysfs attributes
Input: sysrq - pass along lone Alt + SysRq
The tty code should be in its own subdirectory and not in the char
driver with all of the cruft that is currently there.
Based on work done by Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>