Commit graph

18093 commits

Author SHA1 Message Date
Thomas Gleixner
5cee964597 time/timers: Move all time(r) related files into kernel/time
Except for Kconfig.HZ. That needs a separate treatment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-23 11:22:35 +02:00
Thomas Gleixner
f037c1171d fork: Use ktime_get_ts()
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140611234607.427408044@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
2014-06-12 16:18:45 +02:00
Thomas Gleixner
a9821c741c kdb: Use ktime_get_ts()
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20140611234607.261629142@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-12 16:18:45 +02:00
Thomas Gleixner
4e8c5847d1 tsacct: Use ktime_get_ts()
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611234606.840900621@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-12 16:18:45 +02:00
Thomas Gleixner
b5d7682533 delayacct: Use ktime_get_ts()
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts(). Remove the
silly wrapper while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611234606.931409215@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-12 16:18:45 +02:00
Thomas Gleixner
22001821d9 acct: Use ktime_get_ts()
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611234606.764810535@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-12 16:18:44 +02:00
Rik van Riel
1662867a9b numa,sched: fix load_to_imbalanced logic inversion
This function is supposed to return true if the new load imbalance is
worse than the old one.  It didn't.  I can only hope brown paper bags
are in style.

Now things converge much better on both the 4 node and 8 node systems.

I am not sure why this did not seem to impact specjbb performance on the
4 node system, which is the system I have full-time access to.

This bug was introduced recently, with commit e63da03639 ("sched/numa:
Allow task switch if load imbalance improves")

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-08 14:35:05 -07:00
Linus Torvalds
3f17ea6dea Merge branch 'next' (accumulated 3.16 merge window patches) into master
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.

* accumulated work in next: (6809 commits)
  ufs: sb mutex merge + mutex_destroy
  powerpc: update comments for generic idle conversion
  cris: update comments for generic idle conversion
  idle: remove cpu_idle() forward declarations
  nbd: zero from and len fields in NBD_CMD_DISCONNECT.
  mm: convert some level-less printks to pr_*
  MAINTAINERS: adi-buildroot-devel is moderated
  MAINTAINERS: add linux-api for review of API/ABI changes
  mm/kmemleak-test.c: use pr_fmt for logging
  fs/dlm/debug_fs.c: replace seq_printf by seq_puts
  fs/dlm/lockspace.c: convert simple_str to kstr
  fs/dlm/config.c: convert simple_str to kstr
  mm: mark remap_file_pages() syscall as deprecated
  mm: memcontrol: remove unnecessary memcg argument from soft limit functions
  mm: memcontrol: clean up memcg zoneinfo lookup
  mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
  mm/mempool.c: update the kmemleak stack trace for mempool allocations
  lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
  mm: introduce kmemleak_update_trace()
  mm/kmemleak.c: use %u to print ->checksum
  ...
2014-06-08 11:31:16 -07:00
Joe Perches
6f8fd1d77e sysctl: convert use of typedef ctl_table to struct ctl_table
This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:16 -07:00
Fabian Frederick
119ce5c8b9 kernel/seccomp.c: kernel-doc warning fix
+ fix small typo

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:15 -07:00
Paul McQuade
46c0a8ca3e ipc, kernel: clear whitespace
trailing whitespace

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Paul McQuade
7153e40273 ipc, kernel: use Linux headers
Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Use #include <linux/types.h> instead of <asm/types.h>

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Fabian Frederick
f3da64d1ea kernel/profile.c: use static const char instead of static char
schedstr, sleepstr and kvmstr are only used in strcmp & strlen

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
aba871f1e9 kernel/profile.c: convert printk to pr_foo()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
68a9a435e4 kernel/user_namespace.c: kernel-doc/checkpatch fixes
-uid->gid
-split some function declarations
-if/then/else warning

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Kees Cook
f4aacea2f5 sysctl: allow for strict write position handling
When writing to a sysctl string, each write, regardless of VFS position,
begins writing the string from the start.  This means the contents of
the last write to the sysctl controls the string contents instead of the
first:

  open("/proc/sys/kernel/modprobe", O_WRONLY)   = 1
  write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
  write(1, "/bin/true", 9)                = 9
  close(1)                                = 0

  $ cat /proc/sys/kernel/modprobe
  /bin/true

Expected behaviour would be to have the sysctl be "AAAA..." capped at
maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
contents of the second write.  Similarly, multiple short writes would
not append to the sysctl.

The old behavior is unlike regular POSIX files enough that doing audits
of software that interact with sysctls can end up in unexpected or
dangerous situations.  For example, "as long as the input starts with a
trusted path" turns out to be an insufficient filter, as what must also
happen is for the input to be entirely contained in a single write
syscall -- not a common consideration, especially for high level tools.

This provides kernel.sysctl_writes_strict as a way to make this behavior
act in a less surprising manner for strings, and disallows non-zero file
position when writing numeric sysctls (similar to what is already done
when reading from non-zero file positions).  For now, the default (0) is
to warn about non-zero file position use, but retain the legacy
behavior.  Setting this to -1 disables the warning, and setting this to
1 enables the file position respecting behavior.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: move misplaced hunk, per Randy]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Kees Cook
2ca9bb456a sysctl: refactor sysctl string writing logic
Consolidate buffer length checking with new-line/end-of-line checking.
Additionally, instead of reading user memory twice, just do the
assignment during the loop.

This change doesn't affect the potential races here.  It was already
possible to read a sysctl that was in the middle of a write.  In both
cases, the string will always be NULL terminated.  The pre-existing race
remains a problem to be solved.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Kees Cook
f88083005a sysctl: clean up char buffer arguments
When writing to a sysctl string, each write, regardless of VFS position,
began writing the string from the start.  This meant the contents of the
last write to the sysctl controlled the string contents instead of the
first.

This misbehavior was featured in an exploit against Chrome OS.  While
it's not in itself a vulnerability, it's a weirdness that isn't on the
mind of most auditors: "This filter looks correct, the first line
written would not be meaningful to sysctl" doesn't apply here, since the
size of the write and the contents of the final write are what matter
when writing to sysctls.

This adds the sysctl kernel.sysctl_writes_strict to control the write
behavior.  The default (0) reports when VFS position is non-0 on a
write, but retains legacy behavior, -1 disables the warning, and 1
enables the position-respecting behavior.

The long-term plan here is to wait for userspace to be fixed in response
to the new warning and to then switch the default kernel behavior to the
new position-respecting behavior.

This patch (of 4):

The char buffer arguments are needlessly cast in weird places.  Clean it
up so things are easier to read.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Fabian Frederick
e1bebcf41e kernel/kexec.c: convert printk to pr_foo()
+ some pr_warning -> pr_warn and checkpatch warning fixes

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Masami Hiramatsu
f06e5153f4 kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers
Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg.  This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).

Usage: add "crash_kexec_post_notifiers" to kernel boot option.

Note that this actually increases risks of the failure of kdump.  This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Satoru MORIYA <satoru.moriya.br@hitachi.com>
Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Srivatsa S. Bhat
a219ccf463 smp: print more useful debug info upon receiving IPI on an offline CPU
There is a longstanding problem related to CPU hotplug which causes IPIs
to be delivered to offline CPUs, and the smp-call-function IPI handler
code prints out a warning whenever this is detected.  Every once in a
while this (usually harmless) warning gets reported on LKML, but so far
it has not been completely fixed.  Usually the solution involves finding
out the IPI sender and fixing it by adding appropriate synchronization
with CPU hotplug.

However, while going through one such internal bug reports, I found that
there is a significant bug in the receiver side itself (more
specifically, in stop-machine) that can lead to this problem even when
the sender code is perfectly fine.  This patchset fixes that
synchronization problem in the CPU hotplug stop-machine code.

Patch 1 adds some additional debug code to the smp-call-function
framework, to help debug such issues easily.

Patch 2 modifies the stop-machine code to ensure that any IPIs that were
sent while the target CPU was online, would be noticed and handled by
that CPU without fail before it goes offline.  Thus, this avoids
scenarios where IPIs are received on offline CPUs (as long as the sender
uses proper hotplug synchronization).

In fact, I debugged the problem by using Patch 1, and found that the
payload of the IPI was always the block layer's trigger_softirq()
function.  But I was not able to find anything wrong with the block
layer code.  That's when I started looking at the stop-machine code and
realized that there is a race-window which makes the IPI _receiver_ the
culprit, not the sender.  Patch 2 fixes that race and hence this should
put an end to most of the hard-to-debug IPI-to-offline-CPU issues.

This patch (of 2):

Today the smp-call-function code just prints a warning if we get an IPI
on an offline CPU.  This info is sufficient to let us know that
something went wrong, but often it is very hard to debug exactly who
sent the IPI and why, from this info alone.

In most cases, we get the warning about the IPI to an offline CPU,
immediately after the CPU going offline comes out of the stop-machine
phase and reenables interrupts.  Since all online CPUs participate in
stop-machine, the information regarding the sender of the IPI is already
lost by the time we exit the stop-machine loop.  So even if we dump the
stack on each CPU at this point, we won't find anything useful since all
of them will show the stack-trace of the stopper thread.  So we need a
better way to figure out who sent the IPI and why.

To achieve this, when we detect an IPI targeted to an offline CPU, loop
through the call-single-data linked list and print out the payload
(i.e., the name of the function which was supposed to be executed by the
target CPU).  This would give us an insight as to who might have sent
the IPI and help us debug this further.

[akpm@linux-foundation.org: correctly suppress warning output on second and later occurrences]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Oleg Nesterov
76e0a6f40b signals: change wait_for_helper() to use kernel_sigaction()
Now that we have kernel_sigaction() we can change wait_for_helper() to
use it and cleans up the code a bit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Oleg Nesterov
b4e74264eb signals: introduce kernel_sigaction()
Now that allow_signal() is really trivial we can unify it with
disallow_signal().  Add the new helper, kernel_sigaction(), and
reimplement allow_signal/disallow_signal as a trivial wrappers.

This saves one EXPORT_SYMBOL() and the new helper can have more users.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Oleg Nesterov
580d34e42a signals: disallow_signal() should flush the potentially pending signal
disallow_signal() simply sets SIG_IGN, this is not enough and
recalc_sigpending() is simply pointless because in can never change the
state of TIF_SIGPENDING.

If we ignore a signal, we also need to do flush_sigqueue_mask() for the
case when this signal is pending, this way recalc_sigpending() can
actually clear TIF_SIGPENDING and we do not "leak" the allocated
siginfo's.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:12 -07:00
Oleg Nesterov
ec5955b8fd signals: kill the obsolete sigdelset() and recalc_sigpending() in allow_signal()
allow_signal() does sigdelset(current->blocked) due to historic reason,
previously it could be called by a daemonize()'ed kthread, and
daemonize() played with current->blocked.

Now that daemonize() has gone away we can remove sigdelset() and
recalc_sigpending().  If a user really wants to unblock a signal, it
must use sigprocmask() or set_current_block() explicitely.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Oleg Nesterov
0341729b4b signals: mv {dis,}allow_signal() from sched.h/exit.c to signal.[ch]
Move the declaration/definition of allow_signal/disallow_signal to
signal.h/signal.c.  The new place is more logical and allows to use the
static helpers in signal.c (see the next changes).

While at it, make them return void and remove the valid_signal() check.
Nobody checks the returned value, and in-kernel users must not pass the
wrong signal number.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Oleg Nesterov
afe2b0386a signals: cleanup the usage of t/current in do_sigaction()
The usage of "task_struct *t" and "current" in do_sigaction() looks really
annoying and chaotic.  Initially "t" is used as a cached value of current
but not consistently, then it is reused as a loop variable and we have to
use "current" again.

Clean up this mess and also convert the code to use for_each_thread().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Oleg Nesterov
c09c144139 signals: rename rm_from_queue_full() to flush_sigqueue_mask()
"rm_from_queue_full" looks ugly and misleading, especially now that
rm_from_queue() has gone away.  Rename it to flush_sigqueue_mask(), this
matches flush_sigqueue() we already have.

Also remove the obsolete comment which explains the difference with
rm_from_queue() we already killed.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Oleg Nesterov
9490592f27 signals: kill rm_from_queue(), change prepare_signal() to use for_each_thread()
rm_from_queue() doesn't make sense.  The only caller, prepare_signal(),
can use rm_from_queue_full() with the same effect.

While at it, change prepare_signal() to use for_each_thread() instead of
do/while_each_thread.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Oleg Nesterov
6114041aa7 signals: s/siginitset/sigemptyset/ in do_sigtimedwait()
Cosmetic, but siginitset(0) looks a bit strange, sigemptyset() is what
do_sigtimedwait() needs.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Oleg Nesterov
650226bd95 ptrace: task_clear_jobctl_trapping()->wake_up_bit() needs mb()
__wake_up_bit() checks waitqueue_active() and thus the caller needs mb()
as wake_up_bit() documents, fix task_clear_jobctl_trapping().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Matthew Dempsky
4e52365f27 ptrace: fix fork event messages across pid namespaces
When tracing a process in another pid namespace, it's important for fork
event messages to contain the child's pid as seen from the tracer's pid
namespace, not the parent's.  Otherwise, the tracer won't be able to
correlate the fork event with later SIGTRAP signals it receives from the
child.

We still risk a race condition if a ptracer from a different pid
namespace attaches after we compute the pid_t value.  However, sending a
bogus fork event message in this unlikely scenario is still a vast
improvement over the status quo where we always send bogus fork event
messages to debuggers in a different pid namespace than the forking
process.

Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Julien Tinnes <jln@chromium.org>
Cc: Roland McGrath <mcgrathr@chromium.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Linus Torvalds
d54d14bfb4 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Four misc fixes: each was deemed serious enough to warrant v3.15
  inclusion"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
  sched/dl: Fix race in dl_task_timer()
  sched: Fix sched_policy < 0 comparison
  sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled
2014-06-06 09:53:32 -07:00
Linus Torvalds
eb3d3ec567 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm into next
Pull ARM updates from Russell King:

 - Major clean-up of the L2 cache support code.  The existing mess was
   becoming rather unmaintainable through all the additions that others
   have done over time.  This turns it into a much nicer structure, and
   implements a few performance improvements as well.

 - Clean up some of the CP15 control register tweaks for alignment
   support, moving some code and data into alignment.c

 - DMA properties for ARM, from Santosh and reviewed by DT people.  This
   adds DT properties to specify bus translations we can't discover
   automatically, and to indicate whether devices are coherent.

 - Hibernation support for ARM

 - Make ftrace work with read-only text in modules

 - add suspend support for PJ4B CPUs

 - rework interrupt masking for undefined instruction handling, which
   allows us to enable interrupts earlier in the handling of these
   exceptions.

 - support for big endian page tables

 - fix stacktrace support to exclude stacktrace functions from the
   trace, and add save_stack_trace_regs() implementation so that kprobes
   can record stack traces.

 - Add support for the Cortex-A17 CPU.

 - Remove last vestiges of ARM710 support.

 - Removal of ARM "meminfo" structure, finally converting us solely to
   memblock to handle the early memory initialisation.

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (142 commits)
  ARM: ensure C page table setup code follows assembly code (part II)
  ARM: ensure C page table setup code follows assembly code
  ARM: consolidate last remaining open-coded alignment trap enable
  ARM: remove global cr_no_alignment
  ARM: remove CPU_CP15 conditional from alignment.c
  ARM: remove unused adjust_cr() function
  ARM: move "noalign" command line option to alignment.c
  ARM: provide common method to clear bits in CPU control register
  ARM: 8025/1: Get rid of meminfo
  ARM: 8060/1: mm: allow sub-architectures to override PCI I/O memory type
  ARM: 8066/1: correction for ARM patch 8031/2
  ARM: 8049/1: ftrace/add save_stack_trace_regs() implementation
  ARM: 8065/1: remove last use of CONFIG_CPU_ARM710
  ARM: 8062/1: Modify ldrt fixup handler to re-execute the userspace instruction
  ARM: 8047/1: rwsem: use asm-generic rwsem implementation
  ARM: l2c: trial at enabling some Cortex-A9 optimisations
  ARM: l2c: add warnings for stuff modifying aux_ctrl register values
  ARM: l2c: print a warning with L2C-310 caches if the cache size is modified
  ARM: l2c: remove old .set_debug method
  ARM: l2c: kill L2X0_AUX_CTRL_MASK before anyone else makes use of this
  ...
2014-06-05 15:57:04 -07:00
Linus Torvalds
1c5aefb5b1 Merge branch 'futex-fixes' (futex fixes from Thomas Gleixner)
Merge futex fixes from Thomas Gleixner:
 "So with more awake and less futex wreckaged brain, I went through my
  list of points again and came up with the following 4 patches.

  1) Prevent pi requeueing on the same futex

     I kept Kees check for uaddr1 == uaddr2 as a early check for private
     futexes and added a key comparison to both futex_requeue and
     futex_wait_requeue_pi.

     Sebastian, sorry for the confusion yesterday night.  I really
     misunderstood your question.

     You are right the check is pointless for shared futexes where the
     same physical address is mapped to two different virtual addresses.

  2) Sanity check atomic acquisiton in futex_lock_pi_atomic

     That's basically what Darren suggested.

     I just simplified it to use futex_top_waiter() to find kernel
     internal state.  If state is found return -EINVAL and do not bother
     to fix up the user space variable.  It's corrupted already.

  3) Ensure state consistency in futex_unlock_pi

     The code is silly versus the owner died bit.  There is no point to
     preserve it on unlock when the user space thread owns the futex.

     What's worse is that it does not update the user space value when
     the owner died bit is set.  So the kernel itself creates observable
     inconsistency.

     Another "optimization" is to retry an atomic unlock.  That's
     pointless as in a sane environment user space would not call into
     that code if it could have unlocked it atomically.  So we always
     check whether there is kernel state around and only if there is
     none, we do the unlock by setting the user space value to 0.

  4) Sanitize lookup_pi_state

     lookup_pi_state is ambigous about TID == 0 in the user space value.

     This can be a valid state even if there is kernel state on this
     uaddr, but we miss a few corner case checks.

     I tried to come up with a smaller solution hacking the checks into
     the current cruft, but it turned out to be ugly as hell and I got
     more confused than I was before.  So I rewrote the sanity checks
     along the state documentation with awful lots of commentry"

* emailed patches from Thomas Gleixner <tglx@linutronix.de>:
  futex: Make lookup_pi_state more robust
  futex: Always cleanup owner tid in unlock_pi
  futex: Validate atomic acquisition in futex_lock_pi_atomic()
  futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
2014-06-05 12:31:32 -07:00
Thomas Gleixner
54a217887a futex: Make lookup_pi_state more robust
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicit:

       Waiter | pi_state | pi->owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | >0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
13fbca4c6e futex: Always cleanup owner tid in unlock_pi
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistant state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
b3eaa9fc5c futex: Validate atomic acquisition in futex_lock_pi_atomic()
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
e9c243a5a6 futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Linus Torvalds
a0abcf2e8f Merge branch 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull x86 cdso updates from Peter Anvin:
 "Vdso cleanups and improvements largely from Andy Lutomirski.  This
  makes the vdso a lot less ''special''"

* 'x86/vdso' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso, build: Make LE access macros clearer, host-safe
  x86/vdso, build: Fix cross-compilation from big-endian architectures
  x86/vdso, build: When vdso2c fails, unlink the output
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, mm: Replace arch_vma_name with vm_ops->name for vsyscalls
  x86, mm: Improve _install_special_mapping and fix x86 vdso naming
  mm, fs: Add vm_ops->name as an alternative to arch_vma_name
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
  x86, vdso: Remove vestiges of VDSO_PRELINK and some outdated comments
  x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSO
  x86, vdso: Move the 32-bit vdso special pages after the text
  x86, vdso: Reimplement vdso.so preparation in build-time C
  x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.c
  x86, vdso: Clean up 32-bit vs 64-bit vdso params
  x86, mm: Ensure correct alignment of the fixmap
2014-06-05 08:05:29 -07:00
Russell King
1fb333489f Merge branches 'alignment', 'fixes', 'l2c' (early part) and 'misc' into for-next 2014-06-05 12:35:52 +01:00
Roman Gushchin
09dc4ab039 sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
tg_set_cfs_bandwidth() sets cfs_b->timer_active to 0 to
force the period timer restart. It's not safe, because
can lead to deadlock, described in commit 927b54fccb:
"__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock."

Three CPUs must be involved:

  CPU0               CPU1                         CPU2
  take rq->lock      period timer fired
  ...                take cfs_b lock
  ...                ...                          tg_set_cfs_bandwidth()
  throttle_cfs_rq()  release cfs_b lock           take cfs_b lock
  ...                distribute_cfs_runtime()     timer_active = 0
  take cfs_b->lock   wait for rq->lock            ...
  __start_cfs_bandwidth()
  {wait for timer callback
   break if timer_active == 1}

So, CPU0 and CPU1 are deadlocked.

Instead of resetting cfs_b->timer_active, tg_set_cfs_bandwidth can
wait for period timer callbacks (ignoring cfs_b->timer_active) and
restart the timer explicitly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/87wqdi9g8e.wl\%klamm@yandex-team.ru
Cc: pjt@google.com
Cc: chris.j.arges@canonical.com
Cc: gregkh@linuxfoundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:34 +02:00
Kirill Tkhai
0f397f2c90 sched/dl: Fix race in dl_task_timer()
Throttled task is still on rq, and it may be moved to other cpu
if user is playing with sched_setaffinity(). Therefore, unlocked
task_rq() access makes the race.

Juri Lelli reports he got this race when dl_bandwidth_enabled()
was not set.

Other thing, pointed by Peter Zijlstra:

   "Now I suppose the problem can still actually happen when
    you change the root domain and trigger a effective affinity
    change that way".

To fix that we do the same as made in __task_rq_lock(). We do not
use __task_rq_lock() itself, because it has a useful lockdep check,
which is not correct in case of dl_task_timer(). We do not need
pi_lock locked here. This case is an exception (PeterZ):

   "The only reason we don't strictly need ->pi_lock now is because
    we're guaranteed to have p->state == TASK_RUNNING here and are
    thus free of ttwu races".

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:12 +02:00
Richard Weinberger
b14ed2c273 sched: Fix sched_policy < 0 comparison
attr.sched_policy is u32, therefore a comparison against < 0 is never true.
Fix this by casting sched_policy to int.

This issue was reported by coverity CID 1219934.

Fixes: dbdb22754f ("sched: Disallow sched_attr::sched_policy < 0")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1401741514-7045-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:43 +02:00
Steven Rostedt
e9dd685ce8 sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled
As Peter Zijlstra told me, we have the following path:

do_exit()
  exit_itimers()
    itimer_delete()
      spin_lock_irqsave(&timer->it_lock, &flags);
      timer_delete_hook(timer);
        kc->timer_del(timer) := posix_cpu_timer_del()
          put_task_struct()
            __put_task_struct()
              task_numa_free()
                spin_lock(&grp->lock);

Which means that task_numa_free() can be called with interrupts
disabled, which means that we should not be using spin_lock_irq() but
spin_lock_irqsave() instead. Otherwise we are enabling interrupts while
holding an interrupt unsafe lock!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner<tglx@linutronix.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140527182541.GH11096@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:41 +02:00
Linus Torvalds
00170fdd08 Merge branch 'akpm' (patchbomb from Andrew) into next
Merge misc updates from Andrew Morton:

 - a few fixes for 3.16.  Cc'ed to stable so they'll get there somehow.

 - various misc fixes and cleanups

 - most of the ocfs2 queue.  Review is slow...

 - most of MM.  The MM queue is pretty huge this time, but not much in
   the way of feature work.

 - some tweaks under kernel/

 - printk maintenance work

 - updates to lib/

 - checkpatch updates

 - tweaks to init/

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (276 commits)
  fs/autofs4/dev-ioctl.c: add __init to autofs_dev_ioctl_init
  fs/ncpfs/getopt.c: replace simple_strtoul by kstrtoul
  init/main.c: remove an ifdef
  kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
  init/main.c: add initcall_blacklist kernel parameter
  init/main.c: don't use pr_debug()
  fs/binfmt_flat.c: make old_reloc() static
  fs/binfmt_elf.c: fix bool assignements
  fs/efs: convert printk(KERN_DEBUG to pr_debug
  fs/efs: add pr_fmt / use __func__
  fs/efs: convert printk to pr_foo()
  scripts/checkpatch.pl: device_initcall is not the only __initcall substitute
  checkpatch: check stable email address
  checkpatch: warn on unnecessary void function return statements
  checkpatch: prefer kstrto<foo> to sscanf(buf, "%<lhuidx>", &bar);
  checkpatch: add warning for kmalloc/kzalloc with multiply
  checkpatch: warn on #defines ending in semicolon
  checkpatch: make --strict a default for files in drivers/net and net/
  checkpatch: always warn on missing blank line after variable declaration block
  checkpatch: fix wildcard DT compatible string checking
  ...
2014-06-04 16:55:13 -07:00
Fabian Frederick
6516a46619 kernel/compat.c: use sizeof() instead of sizeof
Fix 4 checkpatch warnings
WARNING: sizeof *tv should be sizeof(*tv)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:19 -07:00
Borislav Petkov
a8fe19ebfb kernel/printk: use symbolic defines for console loglevels
... instead of naked numbers.

Stuff in sysrq.c used to set it to 8 which is supposed to mean above
default level so set it to DEBUG instead as we're terminating/killing all
tasks and we want to be verbose there.

Also, correct the check in x86_64_start_kernel which should be >= as
we're clearly issuing the string there for all debug levels, not only
the magical 10.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00
Will Deacon
84b5ec8a9d printk: report dropping of messages from logbuf
If the log ring buffer becomes full, we silently overwrite old messages
with new data.  console_unlock will detect this case and fast-forward the
console_* pointers to skip over the corrupted data, but nothing will be
reported to the user.

This patch hijacks the first valid log message after detecting that we
dropped messages and prefixes it with a note detailing how many messages
were dropped.  For long (~1000 char) messages, this will result in some
truncation of the real message, but given that we're dropping things
anyway, that doesn't seem to be the end of the world.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00
John Stultz
6d9bcb621b timekeeping: use printk_deferred when holding timekeeping seqlock
Jiri Bohac pointed out that there are rare but potential deadlock
possibilities when calling printk while holding the timekeeping
seqlock.

This is due to printk() triggering console sem wakeup, which can
cause scheduling code to trigger hrtimers which may try to read
the time.

Specifically, as Jiri pointed out, that path is:
  printk
    vprintk_emit
      console_unlock
        up(&console_sem)
          __up
	    wake_up_process
	      try_to_wake_up
	        ttwu_do_activate
		  ttwu_activate
		    activate_task
		      enqueue_task
		        enqueue_task_fair
			  hrtick_update
			    hrtick_start_fair
			      hrtick_start_fair
			        get_time
				  ktime_get
				    --> endless loop on
				    read_seqcount_retry(&timekeeper_seq, ...)

This patch tries to avoid this issue by using printk_deferred (previously
named printk_sched) which should defer printing via a irq_work_queue.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reported-by: Jiri Bohac <jbohac@suse.cz>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00