Commit graph

18093 commits

Author SHA1 Message Date
John Stultz
c224815dac printk: Add printk_deferred_once
Two of the three prink_deferred uses are really printk_once style
uses, so add a printk_deferred_once macro to simplify those call
sites.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00
John Stultz
aac74dc495 printk: rename printk_sched to printk_deferred
After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00
John Stultz
8195460626 printk: disable preemption for printk_sched
An earlier change in -mm (printk: remove separate printk_sched
buffers...), removed the printk_sched irqsave/restore lines since it was
safe for current users.  Since we may be expanding usage of
printk_sched(), disable preepmtion for this function to make it more
generally safe to call.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00
Steven Rostedt
458df9fd48 printk: remove separate printk_sched buffers and use printk buf instead
To prevent deadlocks with doing a printk inside the scheduler,
printk_sched() was created.  The issue is that printk has a console_sem
that it can grab and release.  The release does a wake up if there's a
task pending on the sem, and this wake up grabs the rq locks that is
held in the scheduler.  This leads to a possible deadlock if the wake up
uses the same rq as the one with the rq lock held already.

What printk_sched() does is to save the printk write in a per cpu buffer
and sets the PRINTK_PENDING_SCHED flag.  On a timer tick, if this flag is
set, the printk() is done against the buffer.

There's a couple of issues with this approach.

1) If two printk_sched()s are called before the tick, the second one
   will overwrite the first one.

2) The temporary buffer is 512 bytes and is per cpu.  This is a quite a
   bit of space wasted for something that is seldom used.

In order to remove this, the printk_sched() can use the printk buffer
instead, and delay the console_trylock()/console_unlock() to the queued
work.

Because printk_sched() would then be taking the logbuf_lock, the
logbuf_lock must not be held while doing anything that may call into the
scheduler functions, which includes wake ups.  Unfortunately, printk()
also has a console_sem that it uses, and on release, the up(&console_sem)
may do a wake up of any pending waiters.  This must be avoided while
holding the logbuf_lock.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00
Jan Kara
939f04bec1 printk: enable interrupts before calling console_trylock_for_printk()
We need interrupts disabled when calling console_trylock_for_printk()
only so that cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()).  However if we are rescheduled, we are guaranteed to
run on an online cpu so we can easily just get the cpu id in
can_use_console().

We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH
it can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:17 -07:00
Jan Kara
bd8d7cf5b8 printk: fix lockdep instrumentation of console_sem
Printk calls mutex_acquire() / mutex_release() by hand to instrument
lockdep about console_sem.  However in some corner cases the
instrumentation is missing.  Fix the problem by creating helper functions
for locking / unlocking console_sem which take care of lockdep
instrumentation as well.

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Fabio Estevam <festevam@gmail.com>
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Jan Kara
608873cacb printk: release lockbuf_lock before calling console_trylock_for_printk()
There's no reason to hold lockbuf_lock when entering
console_trylock_for_printk().

The first thing this function does is to call down_trylock(console_sem)
and if that fails it immediately unlocks lockbuf_lock.  So lockbuf_lock
isn't needed for that branch.  When down_trylock() succeeds, the rest of
console_trylock() is OK without lockbuf_lock (it is called without it
from other places), and the only remaining thing in
console_trylock_for_printk() is can_use_console() call.  For that call
console_sem is enough (it iterates all consoles and checks CON_ANYTIME
flag).

So we drop logbuf_lock before entering console_trylock_for_printk() which
simplifies the code.

[akpm@linux-foundation.org: fix have_callable_console() comment]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Jan Kara
ca1d432ad8 printk: remove outdated comment
Comment about interesting interlocking between lockbuf_lock and
console_sem is outdated.

It was added in 2002 by commit a880f45a48be during conversion of
console_lock to console_sem + lockbuf_lock.

At that time release_console_sem() (today's equivalent is
console_unlock()) was indeed using lockbuf_lock to avoid races between
trylock on console_sem in printk() and unlock of console_sem.  However
these days the interlocking is gone and the races are avoided by
rechecking logbuf state after releasing console_sem.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Petr Mladek
034633ccb2 printk: return really stored message length
I wonder if anyone uses printk return value but it is there and should be
counted correctly.

This patch modifies log_store() to return the number of really stored
bytes from the 'text' part.  Also it handles the return value in
vprintk_emit().

Note that log_store() is used also in cont_flush() but we could ignore the
return value there.  The function works with characters that were already
counted earlier.  In addition, the store could newer fail here because the
length of the printed text is limited by the "cont" buffer and "dict" is
NULL.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Petr Mladek
55bd53a4eb printk: shrink too long messages
We might want to print at least part of too long messages and add some
warning for debugging purpose.

The question is how long the shrunken message should be.  If we use the
whole buffer, it might get rotated too soon.  Let's try to use only 1/4 of
the buffer for now.

Also shrink the whole dictionary.  We do not want to parse it or break it
in the middle of some pair of values.  It would not cause any real harm
but still.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Petr Mladek
85c8704302 printk: split message size computation
We will want to recompute the message size when shrinking too long
messages.  Let's put the code into separate function.

The side effect of setting "pad_len" is not nice but it is worth removing
the code duplication.  Note that I will probably have one more usage for
this function when handling messages safe way in NMI context.

This patch does not change the existing behavior.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Petr Mladek
f40e4b9f70 printk: ignore too long messages
There was no check for too long messages.  The check for free space always
passed when first_seq and next_seq were equal.  Enough free space was not
guaranteed, though.

log_store() might be called to store messages up to 64kB + 64kB + 16B.
This is sum of maximal text_len, dict_len values, and the size of the
structure printk_log.

On the other hand, the minimal size for the main log buffer currently is
4kB and it is enforced only by Kconfig.

The good news is that the usage looks safe right now.  log_store() is
called only from vprintk_emit() and cont_flush().  Here the "text" part is
always passed via a static buffer and the length is limited to
LOG_LINE_MAX which is 1024.  The "dict" part is NULL in most cases.  The
only exceptions is when vprintk_emit() is called from printk_emit() and
dev_vprintk_emit().  But printk_emit() is currently used only in
devkmsg_writev() and here "dict" is NULL as well.  In dev_vprintk_emit(),
"dict" is limited by the static buffer "hdr" of the size 128 bytes.  It
meas that the current maximal printed text is 1024B + 128B + 16B and it
always fit the log buffer.

But it is only matter of time when someone calls printk_emit() with unsafe
parameters, especially the "dict" one.

This patch adds a check for the free space when the buffer is empty.  It
reuses the already existing log_has_space() function but it has to add an
extra parameter.  It defines whether the buffer is empty.  Note that the
same values of "first_idx" and "next_idx" might also mean that the buffer
is full.

If the buffer is empty, we must respect the current position of the
indexes.  We cannot reset them to the beginning of the buffer.  Otherwise,
the functions reading the buffer would get crazy.

The question is what to do when the message is too long.  This patch uses
the easiest solution and just ignores the problematic message.  Let's do
something better in a followup patch.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Petr Mladek
0a581694ab printk: split code for making free space in the log buffer
The check for free space in the log buffer always passes when "first_seq"
and "next_seq" are equal.  In theory, it might cause writing outside of
the log buffer.

Fortunately, the current usage looks safe because the used "text" and
"dict" buffers are quite limited.  See the second patch for more details.

Anyway, it is better to be on the safe side and add a check.  An easy
solution is done in the 2nd patch and it is improved in the 4th patch.

5th patch fixes the computation of the printed message length.

1st and 3rd patches just do some code refactoring to make the other
patches easier.

This patch (of 5):

There will be needed some fixes in the check for free space.  They will be
easier if the code is moved outside of the quite long log_store()
function.

This patch does not change the existing behavior.

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Kirill A. Shutemov
b300a4ea66 kernel/user.c: drop unused field 'files' from user_struct
Nobody seems uses it for a long time. Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:16 -07:00
Fabian Frederick
b51dbec68c kernel/hung_task.c: convert simple_strtoul to kstrtouint
sysctl_hung_task_panic has been changed to unsigned int.  use kstrtouint
instead of obsolete simple_strtoul

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
95583e4ab5 kernel/utsname_sysctl.c: replace obsolete __initcall by device_initcall
Also fixes checkpatch warnings on proc_dostring function parameters

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
616feab753 kernel/reboot.c: convert simple_strtoul to kstrtoint
Replace obsolete function.
kstrtoint is used as reboot_cpu is an integer.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
6c5a53c670 kernel/res_counter.c: replace simple_strtoull by kstrtoull
[akpm@linux-foundation.org: don't overwrite kstrtoull()'s errno]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
cac92ba74f kernel/tracepoint.c: kernel-doc fixes
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
cf25004069 kernel/stop_machine.c: kernel-doc warning fix
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
eaa1809b90 kernel/latencytop.c: convert seq_printf to seq_puts
This patch also fixes one function declaration over 80 characters.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
b9e5db6d2b kernel/exec_domain.c: code clean-up
Fix checkpatch warnings about EXPORT_SYMBOL and return()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
a6c8c6902c kernel/capability.c: code clean-up
- EXPORT_SYMBOL

- typo: unexpectidly->unexpectedly

- function prototype over 80 characters

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Fabian Frederick
462b29b856 kernel/backtracetest.c: replace no level printk by pr_info()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:14 -07:00
Fabian Frederick
84117da5b7 kernel/cpu.c: convert printk to pr_foo()
no level printk converted to pr_warn (if err)
no level printk converted to pr_info (disabling non-boot cpus)
Other printk converted to respective level.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:14 -07:00
Fabian Frederick
f6187769da sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL
sys_sgetmask and sys_ssetmask are obsolete system calls no longer
supported in libc.

This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert
mode configuration.That option is enabled by default for those
architectures.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:14 -07:00
Mel Gorman
664eeddeef mm: page_alloc: use jump labels to avoid checking number_of_cpusets
If cpusets are not in use then we still check a global variable on every
page allocation.  Use jump labels to avoid the overhead.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:08 -07:00
Oleg Nesterov
39af1765f1 memcg: optimize the "Search everything else" loop in mm_update_next_owner()
for_each_process_thread() is sub-optimal. All threads share the same
->mm, we can swicth to the next process once we found a thread with
->mm != NULL and ->mm != mm.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Peter Chiang <pchiang@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:03 -07:00
Oleg Nesterov
f87fb599ae memcg: mm_update_next_owner() should skip kthreads
"Search through everything else" in mm_update_next_owner() can hit a
kthread which adopted this "mm" via use_mm(), it should not be used as
mm->owner.  Add the PF_KTHREAD check.

While at it, change this code to use for_each_process_thread() instead
of deprecated do_each_thread/while_each_thread.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Peter Chiang <pchiang@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:03 -07:00
Oleg Nesterov
f98bafa06a memcg: kill CONFIG_MM_OWNER
CONFIG_MM_OWNER makes no sense.  It is not user-selectable, it is only
selected by CONFIG_MEMCG automatically.  So we can kill this option in
init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:01 -07:00
Vladimir Davydov
52383431b3 mm: get rid of __GFP_KMEMCG
Currently to allocate a page that should be charged to kmemcg (e.g.
threadinfo), we pass __GFP_KMEMCG flag to the page allocator.  The page
allocated is then to be freed by free_memcg_kmem_pages.  Apart from
looking asymmetrical, this also requires intrusion to the general
allocation path.  So let's introduce separate functions that will
alloc/free pages charged to kmemcg.

The new functions are called alloc_kmem_pages and free_kmem_pages.  They
should be used when the caller actually would like to use kmalloc, but
has to fall back to the page allocator for the allocation is large.
They only differ from alloc_pages and free_pages in that besides
allocating or freeing pages they also charge them to the kmem resource
counter of the current memory cgroup.

[sfr@canb.auug.org.au: export kmalloc_order() to modules]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:56 -07:00
Tetsuo Handa
8fe6929cfd kthread: fix return value of kthread_create() upon SIGKILL.
Commit 786235eeba ("kthread: make kthread_create() killable") meant
for allowing kthread_create() to abort as soon as killed by the
OOM-killer.  But returning -ENOMEM is wrong if killed by SIGKILL from
userspace.  Change kthread_create() to return -EINTR upon SIGKILL.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:51 -07:00
Linus Torvalds
d09cc3659d Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core irq updates from Thomas Gleixner:
 "The irq department delivers:

   - Another tree wide update to get rid of the horrible create_irq
     interface along with its even more horrible variants.  That also
     gets rid of the last leftovers of the initial sparse irq hackery.
     arch/driver specific changes have been either acked or ignored.

   - A fix for the spurious interrupt detection logic with threaded
     interrupts.

   - A new ARM SoC interrupt controller

   - The usual pile of fixes and improvements all over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  Documentation: brcmstb-l2: Add Broadcom STB Level-2 interrupt controller binding
  irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller
  genirq: Improve documentation to match current implementation
  ARM: iop13xx: fix msi support with sparse IRQ
  genirq: Provide !SMP stub for irq_set_affinity_notifier()
  irqchip: armada-370-xp: Move the devicetree binding documentation
  irqchip: gic: Use mask field in GICC_IAR
  genirq: Remove dynamic_irq mess
  ia64: Use irq_init_desc
  genirq: Replace dynamic_irq_init/cleanup
  genirq: Remove irq_reserve_irq[s]
  genirq: Replace reserve_irqs in core code
  s390: Avoid call to irq_reserve_irqs()
  s390: Remove pointless arch_show_interrupts()
  s390: pci: Check return value of alloc_irq_desc() proper
  sh: intc: Remove pointless irq_reserve_irqs() invocation
  x86, irq: Remove pointless irq_reserve_irqs() call
  genirq: Make create/destroy_irq() ia64 private
  tile: Use SPARSE_IRQ
  tile: pci: Use irq_alloc/free_hwirq()
  ...
2014-06-04 15:59:13 -07:00
Linus Torvalds
82e627eb5e Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull timer core updates from Thomas Gleixner:
 "This time you get nothing really exciting:
   - A huge update to the sh* clocksource drivers
   - Support for two more ARM SoCs
   - Removal of the deprecated setup_sched_clock() API
   - The usual pile of fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  clocksource: Add Freescale FlexTimer Module (FTM) timer support
  ARM: dts: vf610: Add Freescale FlexTimer Module timer node.
  clocksource: ftm: Add FlexTimer Module (FTM) Timer devicetree Documentation
  clocksource: sh_tmu: Remove unnecessary OOM messages
  clocksource: sh_mtu2: Remove unnecessary OOM messages
  clocksource: sh_cmt: Remove unnecessary OOM messages
  clocksource: em_sti: Remove unnecessary OOM messages
  clocksource: dw_apb_timer_of: Do not trace read_sched_clock
  clocksource: Fix clocksource_mmio_readX_down
  clocksource: Fix type confusion for clocksource_mmio_readX_Y
  clocksource: sh_tmu: Fix channel IRQ retrieval in legacy case
  clocksource: qcom: Implement read_current_timer for udelay
  ntp: Make is_error_status() use its argument
  ntp: Convert simple_strtol to kstrtol
  timer_stats/doc: Fix /proc/timer_stats documentation
  sched_clock: Remove deprecated setup_sched_clock() API
  ARM: sun6i: a31: Add support for the High Speed Timers
  clocksource: sun5i: Add support for reset controller
  clocksource: efm32: use $vendor,$device scheme for compatible string
  KConfig: Vexpress: build the ARM_GLOBAL_TIMER with vexpress platform
  ...
2014-06-04 15:57:20 -07:00
Linus Torvalds
4dc4226f99 ACPI and power management updates for 3.16-rc1
- ACPICA update to upstream version 20140424.  That includes a
    number of fixes and improvements related to things like GPE
    handling, table loading, headers, memory mapping and unmapping,
    DSDT/SSDT overriding, and the Unload() operator.  The acpidump
    utility from upstream ACPICA is included too.  From Bob Moore,
    Lv Zheng, David Box, David Binderman, and Colin Ian King.
 
  - Fixes and cleanups related to ACPI video and backlight interfaces
    from Hans de Goede.  That includes blacklist entries for some new
    machines and using native backlight by default.
 
  - ACPI device enumeration changes to create platform devices
    rather than PNP devices for ACPI device objects with _HID by
    default.  PNP devices will still be created for the ACPI device
    object with device IDs corresponding to real PNP devices, so
    that change should not break things left and right, and we're
    expecting to see more and more ACPI-enumerated platform devices
    in the future.  From Zhang Rui and Rafael J Wysocki.
 
  - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing
    it to handle system suspend/resume on Asus T100 correctly.
    From Heikki Krogerus and Rafael J Wysocki.
 
  - PM core update introducing a mechanism to allow runtime-suspended
    devices to stay suspended over system suspend/resume transitions
    if certain additional conditions related to coordination within
    device hierarchy are met.  Related PM documentation update and
    ACPI PM domain support for the new feature.  From Rafael J Wysocki.
 
  - Fixes and improvements related to the "freeze" sleep state. They
    affect several places including cpuidle, PM core, ACPI core, and
    the ACPI battery driver.  From Rafael J Wysocki and Zhang Rui.
 
  - Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
    Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.
 
  - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
    Aggregator Device) drivers from Baoquan He, Manuel Schölling,
    Tony Camuso, and Toshi Kani.
 
  - System suspend/resume optimization in the ACPI battery driver from
    Lan Tianyu.
 
  - OPP (Operating Performance Points) subsystem updates from
    Chander Kashyap, Mark Brown, and Nishanth Menon.
 
  - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
    Stratos Karafotis, and Viresh Kumar.
 
  - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
    s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
    Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
    Viresh Kumar.
 
  - intel_pstate driver fixes and cleanups from Dirk Brandewie,
    Doug Smythies, and Stratos Karafotis.
 
  - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.
 
  - Fix for the cpuidle menu governor from Chander Kashyap.
 
  - New ARM clps711x cpuidle driver from Alexander Shiyan.
 
  - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
    Fabian Frederick, Pali Rohár, and Sebastian Capella.
 
  - Intel RAPL (Running Average Power Limit) driver updates from
    Jacob Pan.
 
  - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.
 
  - devfreq core updates from Chanwoo Choi and Paul Bolle.
 
  - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
    Bartlomiej Zolnierkiewicz.
 
  - turbostat tool fix from Jean Delvare.
 
  - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
    and Thomas Renninger.
 
  - New ACPI ec_access.c tool for poking at the EC in a safe way
    from Thomas Renninger.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTjl16AAoJEILEb/54YlRxeKgP/RRQSV7lFtf582Dw/5M/iWOg
 qYeNtuYFLArEmJ7SpxHdKsU1ZRm3CahAS1j7grvQMQasUxTzoavMcSBNZefeaoNK
 d01LVNqcyKCZs3+izRezk5N1IY+AjdrOcqCdIk8rfgFnc6kOttYUrVcIzKuIKAvJ
 MsJ5s/uqP8G69FsAA3Ttdtr0HKiQhN4skSt424wntQRDeJNZPBs74mPKBGh8bxlO
 Zr/VCDibKQ2Z8jS7x+TzwZrOxgE1/9x0Cub6GAdTvAfS8A+utPwSkneUyopNqpQ+
 tJ5rz5R+HpmPMerizBuU+5s+tvjDPtH4/OZvOPSpYraQSFLOwx3hAm+a5k7fOGmc
 XWjXnXWT0i0V3iQkwrspTNjX1RgywbsHbmXrcWn192HResvMQ9zk2gH2ch6m8JhN
 yTV5V51dOZicpPuaTCvIkJpsV33p6vRz+EdPBiXoEdua5KKtOg8EnQ470dNaMR92
 3ZtWmIvSgGlyPyHlSHLfGXbPUwTYvDNV3aheIoXp9E6WY3WJN9J3WXm4EHKBNVaI
 H83kwuk1s92cgqh22H5Pcb0CmDcrbkUdP6hhsPS/aL80/EJMljRP2AYW1Y+l1LAf
 pzMLmekHFqQEDjFQltwGvFV/EjFeMHnqOgQONx9ygMaayCGGTYSDx3FbRDesf8t9
 qhoFcTPSxoo0XjrGrR6b
 =tpdF
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into next

Pull ACPI and power management updates from Rafael Wysocki:
 "ACPICA is the leader this time (63 commits), followed by cpufreq (28
  commits), devfreq (15 commits), system suspend/hibernation (12
  commits), ACPI video and ACPI device enumeration (10 commits each).

  We have no major new features this time, but there are a few
  significant changes of how things work.  The most visible one will
  probably be that we are now going to create platform devices rather
  than PNP devices by default for ACPI device objects with _HID.  That
  was long overdue and will be really necessary to be able to use the
  same drivers for the same hardware blocks on ACPI and DT-based systems
  going forward.  We're not expecting fallout from this one (as usual),
  but it's something to watch nevertheless.

  The second change having a chance to be visible is that ACPI video
  will now default to using native backlight rather than the ACPI
  backlight interface which should generally help systems with broken
  Win8 BIOSes.  We're hoping that all problems with the native backlight
  handling that we had previously have been addressed and we are in a
  good enough shape to flip the default, but this change should be easy
  enough to revert if need be.

  In addition to that, the system suspend core has a new mechanism to
  allow runtime-suspended devices to stay suspended throughout system
  suspend/resume transitions if some extra conditions are met
  (generally, they are related to coordination within device hierarchy).
  However, enabling this feature requires cooperation from the bus type
  layer and for now it has only been implemented for the ACPI PM domain
  (used by ACPI-enumerated platform devices mostly today).

  Also, the acpidump utility that was previously shipped as a separate
  tool will now be provided by the upstream ACPICA along with the rest
  of ACPICA code, which will allow it to be more up to date and better
  supported, and we have one new cpuidle driver (ARM clps711x).

  The rest is improvements related to certain specific use cases,
  cleanups and fixes all over the place.

  Specifics:

   - ACPICA update to upstream version 20140424.  That includes a number
     of fixes and improvements related to things like GPE handling,
     table loading, headers, memory mapping and unmapping, DSDT/SSDT
     overriding, and the Unload() operator.  The acpidump utility from
     upstream ACPICA is included too.  From Bob Moore, Lv Zheng, David
     Box, David Binderman, and Colin Ian King.

   - Fixes and cleanups related to ACPI video and backlight interfaces
     from Hans de Goede.  That includes blacklist entries for some new
     machines and using native backlight by default.

   - ACPI device enumeration changes to create platform devices rather
     than PNP devices for ACPI device objects with _HID by default.  PNP
     devices will still be created for the ACPI device object with
     device IDs corresponding to real PNP devices, so that change should
     not break things left and right, and we're expecting to see more
     and more ACPI-enumerated platform devices in the future.  From
     Zhang Rui and Rafael J Wysocki.

   - Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it
     to handle system suspend/resume on Asus T100 correctly.  From
     Heikki Krogerus and Rafael J Wysocki.

   - PM core update introducing a mechanism to allow runtime-suspended
     devices to stay suspended over system suspend/resume transitions if
     certain additional conditions related to coordination within device
     hierarchy are met.  Related PM documentation update and ACPI PM
     domain support for the new feature.  From Rafael J Wysocki.

   - Fixes and improvements related to the "freeze" sleep state.  They
     affect several places including cpuidle, PM core, ACPI core, and
     the ACPI battery driver.  From Rafael J Wysocki and Zhang Rui.

   - Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
     Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.

   - Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
     Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony
     Camuso, and Toshi Kani.

   - System suspend/resume optimization in the ACPI battery driver from
     Lan Tianyu.

   - OPP (Operating Performance Points) subsystem updates from Chander
     Kashyap, Mark Brown, and Nishanth Menon.

   - cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
     Stratos Karafotis, and Viresh Kumar.

   - Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
     s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
     Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
     Viresh Kumar.

   - intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug
     Smythies, and Stratos Karafotis.

   - Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.

   - Fix for the cpuidle menu governor from Chander Kashyap.

   - New ARM clps711x cpuidle driver from Alexander Shiyan.

   - Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
     Fabian Frederick, Pali Rohár, and Sebastian Capella.

   - Intel RAPL (Running Average Power Limit) driver updates from Jacob
     Pan.

   - PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.

   - devfreq core updates from Chanwoo Choi and Paul Bolle.

   - devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
     Bartlomiej Zolnierkiewicz.

   - turbostat tool fix from Jean Delvare.

   - cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
     and Thomas Renninger.

   - New ACPI ec_access.c tool for poking at the EC in a safe way from
     Thomas Renninger"

* tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (187 commits)
  ACPICA: Namespace: Remove _PRP method support.
  intel_pstate: Improve initial busy calculation
  intel_pstate: add sample time scaling
  intel_pstate: Correct rounding in busy calculation
  intel_pstate: Remove C0 tracking
  PM / hibernate: fixed typo in comment
  ACPI: Fix x86 regression related to early mapping size limitation
  ACPICA: Tables: Add mechanism to control early table checksum verification.
  ACPI / scan: use platform bus type by default for _HID enumeration
  ACPI / scan: always register ACPI LPSS scan handler
  ACPI / scan: always register memory hotplug scan handler
  ACPI / scan: always register container scan handler
  ACPI / scan: Change the meaning of missing .attach() in scan handlers
  ACPI / scan: introduce platform_id device PNP type flag
  ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
  ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
  ACPI / PNP: use device ID list for PNPACPI device enumeration
  ACPI / scan: .match() callback for ACPI scan handlers
  ACPI / battery: wakeup the system only when necessary
  power_supply: allow power supply devices registered w/o wakeup source
  ...
2014-06-04 08:57:16 -07:00
Rafael J. Wysocki
cd0c5bd391 Merge branches 'pnp', 'powercap', 'pm-runtime' and 'pm-opp'
* pnp:
  MAINTAINERS: Remove Bjorn Helgaas as PNP maintainer
  PNP / resources: remove positive test on unsigned values

* powercap:
  powercap / RAPL: add new CPU IDs
  powercap / RAPL: further relax energy counter checks

* pm-runtime:
  PM / runtime: Update documentation to reflect the current code flow

* pm-opp:
  PM / OPP: discard duplicate OPPs
  PM / OPP: Make OPP invisible to users in Kconfig
  PM / OPP: fix incorrect OPP count handling in of_init_opp_table
2014-06-03 23:13:00 +02:00
Rafael J. Wysocki
a392f7d4af Merge branch 'acpi-pm'
* acpi-pm:
  ACPI / PM: Export rest of the subsys PM callbacks
  ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend
  ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state
  ACPI / PM: Export acpi_target_system_state() to modules
2014-06-03 23:11:42 +02:00
Rafael J. Wysocki
ee7f9d7c7c Merge branch 'pm-sleep'
* pm-sleep:
  PM / hibernate: fixed typo in comment
  PM / sleep: unregister wakeup source when disabling device wakeup
  PM / sleep: Introduce command line argument for sleep state enumeration
  PM / sleep: Use valid_state() for platform-dependent sleep states only
  PM / sleep: Add state field to pm_states[] entries
  PM / sleep: Update device PM documentation to cover direct_complete
  PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily
  PM / hibernate: Fix memory corruption in resumedelay_setup()
  PM / hibernate: convert simple_strtoul to kstrtoul
  PM / hibernate: Documentation: Fix script for unswapping
  PM / hibernate: no kernel_power_off when pm_power_off NULL
  PM / hibernate: use unsigned local variables in swsusp_show_speed()
2014-06-03 23:10:23 +02:00
Rafael J. Wysocki
97b80e685f Merge branch 'pm-cpuidle'
* pm-cpuidle:
  PM / suspend: Always use deepest C-state in the "freeze" sleep state
  cpuidle / menu: move repeated correction factor check to init
  cpuidle / menu: Return (-1) if there are no suitable states
  cpuidle: Combine cpuidle_enabled() with cpuidle_select()
  ARM: clps711x: Add cpuidle driver
2014-06-03 23:10:15 +02:00
Linus Torvalds
c84a1e32ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull scheduler updates from Ingo Molnar:
 "The main scheduling related changes in this cycle were:

   - various sched/numa updates, for better performance

   - tree wide cleanup of open coded nice levels

   - nohz fix related to rq->nr_running use

   - cpuidle changes and continued consolidation to improve the
     kernel/sched/idle.c high level idle scheduling logic.  As part of
     this effort I pulled cpuidle driver changes from Rafael as well.

   - standardized idle polling amongst architectures

   - continued work on preparing better power/energy aware scheduling

   - sched/rt updates

   - misc fixlets and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  sched/numa: Decay ->wakee_flips instead of zeroing
  sched/numa: Update migrate_improves/degrades_locality()
  sched/numa: Allow task switch if load imbalance improves
  sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
  sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
  sched: Initialize rq->age_stamp on processor start
  sched, nohz: Change rq->nr_running to always use wrappers
  sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
  sched: Use clamp() and clamp_val() to make sys_nice() more readable
  sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
  sched/numa: Fix initialization of sched_domain_topology for NUMA
  sched: Call select_idle_sibling() when not affine_sd
  sched: Simplify return logic in sched_read_attr()
  sched: Simplify return logic in sched_copy_attr()
  sched: Fix exec_start/task_hot on migrated tasks
  arm64: Remove TIF_POLLING_NRFLAG
  metag: Remove TIF_POLLING_NRFLAG
  sched/idle: Make cpuidle_idle_call() void
  sched/idle: Reflow cpuidle_idle_call()
  sched/idle: Delay clearing the polling bit
  ...
2014-06-03 14:00:15 -07:00
Linus Torvalds
3d521f9151 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull perf updates from Ingo Molnar:
 "The tooling changes maintained by Jiri Olsa until Arnaldo is on
  vacation:

  User visible changes:
   - Add -F option for specifying output fields (Namhyung Kim)
   - Propagate exit status of a command line workload for record command
     (Namhyung Kim)
   - Use tid for finding thread (Namhyung Kim)
   - Clarify the output of perf sched map plus small sched command
     fixes (Dongsheng Yang)
   - Wire up perf_regs and unwind support for ARM64 (Jean Pihet)
   - Factor hists statistics counts processing which in turn also fixes
     several bugs in TUI report command (Namhyung Kim)
   - Add --percentage option to control absolute/relative percentage
     output (Namhyung Kim)
   - Add --list-cmds to 'kmem', 'mem', 'lock' and 'sched', for use by
     completion scripts (Ramkumar Ramachandra)

  Development/infrastructure changes and fixes:
   - Android related fixes for pager and map dso resolving (Michael
     Lentine)
   - Add libdw DWARF post unwind support for ARM (Jean Pihet)
   - Consolidate types.h for ARM and ARM64 (Jean Pihet)
   - Fix possible null pointer dereference in session.c (Masanari Iida)
   - Cleanup, remove unused variables in map_switch_event() (Dongsheng
     Yang)
   - Remove nr_state_machine_bugs in perf latency (Dongsheng Yang)
   - Remove usage of trace_sched_wakeup(.success) (Peter Zijlstra)
   - Cleanups for perf.h header (Jiri Olsa)
   - Consolidate types.h and export.h within tools (Borislav Petkov)
   - Move u64_swap union to its single user's header, evsel.h (Borislav
     Petkov)
   - Fix for s390 to properly parse tracepoints plus test code
     (Alexander Yarygin)
   - Handle EINTR error for readn/writen (Namhyung Kim)
   - Add a test case for hists filtering (Namhyung Kim)
   - Share map_groups among threads of the same group (Arnaldo Carvalho
     de Melo, Jiri Olsa)
   - Making some code (cpu node map and report parse callchain callback)
     global to be usable by upcomming changes (Don Zickus)
   - Fix pmu object compilation error (Jiri Olsa)

  Kernel side changes:
   - intrusive uprobes fixes from Oleg Nesterov.  Since the interface is
     admin-only, and the bug only affects user-space ("any probed
     jmp/call can kill the application"), we queued these fixes via the
     development tree, as a special exception.
   - more fuzzer motivated race fixes and related refactoring and
     robustization.
   - allow PMU drivers to be built as modules.  (No actual module yet,
     because the x86 Intel uncore module wasn't ready in time for this)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  perf tools: Add automatic remapping of Android libraries
  perf tools: Add cat as fallback pager
  perf tests: Add a testcase for histogram output sorting
  perf tests: Factor out print_hists_*()
  perf tools: Introduce reset_output_field()
  perf tools: Get rid of obsolete hist_entry__sort_list
  perf hists: Reset width of output fields with header length
  perf tools: Skip elided sort entries
  perf top: Add --fields option to specify output fields
  perf report/tui: Fix a bug when --fields/sort is given
  perf tools: Add ->sort() member to struct sort_entry
  perf report: Add -F option to specify output fields
  perf tools: Call perf_hpp__init() before setting up GUI browsers
  perf tools: Consolidate management of default sort orders
  perf tools: Allow hpp fields to be sort keys
  perf ui: Get rid of callback from __hpp__fmt()
  perf tools: Consolidate output field handling to hpp format routines
  perf tools: Use hpp formats to sort final output
  perf tools: Support event grouping in hpp ->sort()
  perf tools: Use hpp formats to sort hist entries
  ...
2014-06-03 13:18:00 -07:00
Linus Torvalds
776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
Linus Torvalds
59a3d4c363 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull RCU changes from Ingo Molnar:
 "The main RCU changes in this cycle were:

   - RCU torture-test changes.

   - variable-name renaming cleanup.

   - update RCU documentation.

   - miscellaneous fixes.

   - patch to suppress RCU stall warnings while sysrq requests are being
     processed"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  rcu: Provide API to suppress stall warnings while sysrc runs
  rcu: Variable name changed in tree_plugin.h and used in tree.c
  torture: Remove unused definition
  torture: Remove __init from torture_init_begin/end
  torture: Check for multiple concurrent torture tests
  locktorture: Remove reference to nonexistent Kconfig parameter
  rcutorture: Run rcu_torture_writer at normal priority
  rcutorture: Note diffs from git commits
  rcutorture: Add missing destroy_timer_on_stack()
  rcutorture: Explicitly test synchronous grace-period primitives
  rcutorture:  Add tests for get_state_synchronize_rcu()
  rcutorture: Test RCU-sched primitives in TREE_PREEMPT_RCU kernels
  torture: Use elapsed time to detect hangs
  rcutorture: Check for rcu_torture_fqs creation errors
  torture: Better summary diagnostics for build failures
  torture: Notice if an all-zero cpumask is passed inside a critical section
  rcutorture: Make rcu_torture_reader() use cond_resched()
  sched,rcu: Make cond_resched() report RCU quiescent states
  percpu: Fix raw_cpu_inc_return()
  rcutorture: Export RCU grace-period kthread wait state to rcutorture
  ...
2014-06-03 12:35:05 -07:00
Linus Torvalds
49eb7b0750 TTY/Serial driver patches for 3.16-rc1
Here is the big tty / serial driver pull request for 3.16-rc1.
 
 A variety of different serial driver fixes and updates and additions,
 nothing huge, and no real major core tty changes at all.
 
 All have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlONXgoACgkQMUfUDdst+ymdSwCgwL0xmWjFYr/UbJ4LslOZ29Q4
 BFQAoKyYe9LsfEyodBPabxJjKUtj1htz
 =ZGSN
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty into next

Pull tty/serial driver updates from Greg KH:
 "Here is the big tty / serial driver pull request for 3.16-rc1.

  A variety of different serial driver fixes and updates and additions,
  nothing huge, and no real major core tty changes at all.

  All have been in linux-next for a while"

* tag 'tty-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (84 commits)
  Revert "serial: imx: remove the DMA wait queue"
  serial: kgdb_nmi: Improve console integration with KDB I/O
  serial: kgdb_nmi: Switch from tasklets to real timers
  serial: kgdb_nmi: Use container_of() to locate private data
  serial: cpm_uart: No LF conversion in put_poll_char()
  serial: sirf: Fix compilation failure
  console: Remove superfluous readonly check
  console: Use explicit pointer type for vc_uni_pagedir* fields
  vgacon: Fix & cleanup refcounting
  ARM: tty: Move HVC DCC assembly to arch/arm
  tty/hvc/hvc_console: Fix wakeup of HVC thread on hvc_kick()
  drivers/tty/n_hdlc.c: replace kmalloc/memset by kzalloc
  vt: emulate 8- and 24-bit colour codes.
  printk/of_serial: fix serial console cessation part way through boot.
  serial: 8250_dma: check the result of TX buffer mapping
  serial: uart: add hw flow control support configuration
  tty/serial: at91: add interrupts for modem control lines
  tty/serial: at91: use mctrl_gpio helpers
  tty/serial: Add GPIOLIB helpers for controlling modem lines
  ARM: at91: gpio: implement get_direction
  ...
2014-06-03 09:01:02 -07:00
Jianyu Zhan
c9482a5bdc kernfs: move the last knowledge of sysfs out from kernfs
There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However this should be kernfs user specific,
so this patch moves it out. Kerrnfs user should specify their
magic number while mouting.

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-03 08:11:18 -07:00
Linus Torvalds
5da77761e6 Driver core / kernfs pull request for 3.16-rc1
Here is the "big" pull request for 3.16-rc1.
 Not a lot of changes here, some kernfs work, a revert of a very old
 driver core change that ended up cauing some memory leaks on driver
 probe error paths, and other minor things.
 
 As was pointed out earlier today, one commit here,
 26fc9cd200 (kernfs: move the last
 knowledge of sysfs out from kernfs) is also needed in your 3.15-final
 branch as well.  If you could cherry-pick it there, it would be most
 appreciated by Andy Lutomirski to prevent a regression there.
 
 All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlONV9YACgkQMUfUDdst+yn0sQCfWWYg1oVXyu6f0uJjYbVBFkpD
 UHgAoJxxfwTZJq/xYrnk6+RqUowIsUlh
 =ojAS
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into next

Pull driver core / kernfs changes from Greg KH:
 "Here is the "big" pull request for 3.16-rc1.

  Not a lot of changes here, some kernfs work, a revert of a very old
  driver core change that ended up cauing some memory leaks on driver
  probe error paths, and other minor things.

  As was pointed out earlier today, one commit here, 26fc9cd200
  ("kernfs: move the last knowledge of sysfs out from kernfs") is also
  needed in your 3.15-final branch as well.  If you could cherry-pick it
  there, it would be most appreciated by Andy Lutomirski to prevent a
  regression there.

  All of these have been in linux-next for a while"

* tag 'driver-core-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  crypto/nx/nx-842: dev_set_drvdata can no longer fail
  kernfs: move the last knowledge of sysfs out from kernfs
  sysfs: fix attribute_group bin file path on removal
  sysfs.h: don't return a void-valued expression in sysfs_remove_file
  init.h: Update initcall_sync variants to fix build errors
  driver core: Inline dev_set/get_drvdata
  driver core: dev_get_drvdata: Don't check for NULL dev
  driver core: dev_set_drvdata returns void
  driver core: dev_set_drvdata can no longer fail
  driver core: Move driver_data back to struct device
  lib/devres.c: fix checkpatch warnings
  lib/devres.c: use dev in devm_request_and_ioremap
  kobject: Make support for uevent_helper optional.
  kernfs: make kernfs_notify() trigger inotify events too
  kernfs: implement kernfs_root->supers list
2014-06-03 08:07:41 -07:00
Linus Torvalds
425553209b PCI changes for the v3.16 merge window:
Enumeration
     - Notify driver before and after device reset (Keith Busch)
     - Use reset notification in NVMe (Keith Busch)
 
   NUMA
     - Warn if we have to guess host bridge node information (Myron Stowe)
     - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee Suthikulpanit)
     - Clean up and mark early_root_info_init() as deprecated (Suravee Suthikulpanit)
 
   Driver binding
     - Add "driver_override" for force specific binding (Alex Williamson)
     - Fail "new_id" addition for devices we already know about (Bandan Das)
 
   Resource management
     - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
     - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
     - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
     - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
     - Don't convert BAR address to resource if dma_addr_t is too small (Bjorn Helgaas)
     - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
     - Don't print anything while decoding is disabled (Bjorn Helgaas)
     - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
     - Add resource allocation comments (Bjorn Helgaas)
     - Restrict 64-bit prefetchable bridge windows to 64-bit resources (Yinghai Lu)
     - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)
 
   PCI device hotplug
     - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
     - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
     - Fix rphahp endianess issues (Laurent Dufour)
     - Acknowledge spurious "cmd completed" event (Rajat Jain)
     - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
     - Fix cpqphp possible NULL dereference (Rickard Strandqvist)
 
   MSI
     - Replace pci_enable_msi_block() by pci_enable_msi_exact() (Alexander Gordeev)
     - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
     - Simplify populate_msi_sysfs() (Jan Beulich)
 
   Virtualization
     - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
     - Mark RTL8110SC INTx masking as broken (Alex Williamson)
 
   Generic host bridge driver
     - Add generic PCI host controller driver (Will Deacon)
 
   Freescale i.MX6
     - Use new clock names (Lucas Stach)
     - Drop old IRQ mapping (Lucas Stach)
     - Remove optional (and unused) IRQs (Lucas Stach)
     - Add support for MSI (Lucas Stach)
     - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Renesas R-Car
     - Add gen2 device tree support (Ben Dooks)
     - Use new OF interrupt mapping when possible (Lucas Stach)
     - Add PCIe driver (Phil Edworthy)
     - Add PCIe MSI support (Phil Edworthy)
     - Add PCIe device tree bindings (Phil Edworthy)
 
   Samsung Exynos
     - Remove unnecessary OOM messages (Jingoo Han)
     - Fix add_pcie_port() section mismatch warning (Sachin Kamat)
 
   Synopsys DesignWare
     - Make MSI ISR shared IRQ aware (Lucas Stach)
 
   Miscellaneous
     - Check for broken config space aliasing (Alex Williamson)
     - Update email address (Ben Hutchings)
     - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
     - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
     - Remove unnecessary __ref annotations (Bjorn Helgaas)
     - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns (Bjorn Helgaas)
     - Fix use of uninitialized MPS value (Bjorn Helgaas)
     - Tidy x86/gart messages (Bjorn Helgaas)
     - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
     - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
     - Remove unused serial device IDs (Jean Delvare)
     - Use designated initialization in PCI_VDEVICE (Mark Rustad)
     - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
     - Configure MPS on ARM (Murali Karicheri)
     - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
     - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
     - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
     - Remove pcibios_add_platform_entries() (Sebastian Ott)
     - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
     - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
     - Add and use new pci_is_bridge() interface (Yijing Wang)
     - Make pci_bus_add_device() void (Yijing Wang)
 
   DMA API
     - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
     - Fix typos in docs (Emilio López)
     - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
     - Change dma_declare_coherent_memory() CPU address to phys_addr_t (Bjorn Helgaas)
     - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory() (Bjorn Helgaas)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjMaQAAoJEFmIoMA60/r8XncQAKX7cD6btXCZnrcYo7inseyp
 3rwOlrsNkWyHqSj/RqqzE1NY6L1h5G2uliI6xg1SKenuHPDcosm5d8FYO0ORKiUs
 xrqBkmZJHXN7fck//tJwsTXiYh5u42RO8QWbvZVr5UqXe40LyaMHMh9Y7VarrU/o
 sM2ADzFKagv1qMQ13nmYxqT+Zl+CqpimyLP+ep6Nfqxi6ils+KJ6b9SKYqrpqE6t
 Mcq2K5ShqU5SaYub1JIXLcQ9XylID+t1M9+cwixcs7a87HbJiktfkGqQvQJoUIuK
 Q5U+abcIGk4vfOnDCctSnoRyrcbTAZ/vqfo0vpX22TokESjwrD8hFOX5HPOFtD+4
 wIDbYurW/8oBhLRaJ0uTPzSH8bXjXTynAwxHZgIrEur5908eECKQ/WiFCxyrovvv
 r4ThAN0FaobllEr0XOFESOzDNSt/ME00WWI7+puAJ/KJkFEtcXt9othLmLmvLz8H
 2GWXrm/aOR0WUO7foGUxI3bXYlDN6NbSKpfuZsLAi2VAyJJ6L6yVSo/fT0X07e3z
 qRy9LOohuiwIKv/I4F2SEq2REfGGsnkrJBoeQi/oBZDcBy1Lsi7P9LWIERhLQEM+
 Hm+30lC/f326nI3hoyThj2k2xxZOQzCIvrt658xP4qd9Zfe1bvCH58FF8K62CoOd
 p8XAf7Sl6v6YUodUrT/t
 =km55
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
    - Notify driver before and after device reset (Keith Busch)
    - Use reset notification in NVMe (Keith Busch)

  NUMA
    - Warn if we have to guess host bridge node information (Myron Stowe)
    - Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
      Suthikulpanit)
    - Clean up and mark early_root_info_init() as deprecated (Suravee
      Suthikulpanit)

  Driver binding
    - Add "driver_override" for force specific binding (Alex Williamson)
    - Fail "new_id" addition for devices we already know about (Bandan
      Das)

  Resource management
    - Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
    - Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
    - Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
    - Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
    - Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
    - Don't convert BAR address to resource if dma_addr_t is too small
      (Bjorn Helgaas)
    - Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
    - Don't print anything while decoding is disabled (Bjorn Helgaas)
    - Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
    - Add resource allocation comments (Bjorn Helgaas)
    - Restrict 64-bit prefetchable bridge windows to 64-bit resources
      (Yinghai Lu)
    - Assign i82875p_edac PCI resources before adding device (Yinghai Lu)

  PCI device hotplug
    - Remove unnecessary "dev->bus" test (Bjorn Helgaas)
    - Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
    - Fix rphahp endianess issues (Laurent Dufour)
    - Acknowledge spurious "cmd completed" event (Rajat Jain)
    - Allow hotplug service drivers to operate in polling mode (Rajat Jain)
    - Fix cpqphp possible NULL dereference (Rickard Strandqvist)

  MSI
    - Replace pci_enable_msi_block() by pci_enable_msi_exact()
      (Alexander Gordeev)
    - Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
    - Simplify populate_msi_sysfs() (Jan Beulich)

  Virtualization
    - Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
    - Mark RTL8110SC INTx masking as broken (Alex Williamson)

  Generic host bridge driver
    - Add generic PCI host controller driver (Will Deacon)

  Freescale i.MX6
    - Use new clock names (Lucas Stach)
    - Drop old IRQ mapping (Lucas Stach)
    - Remove optional (and unused) IRQs (Lucas Stach)
    - Add support for MSI (Lucas Stach)
    - Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)

  Renesas R-Car
    - Add gen2 device tree support (Ben Dooks)
    - Use new OF interrupt mapping when possible (Lucas Stach)
    - Add PCIe driver (Phil Edworthy)
    - Add PCIe MSI support (Phil Edworthy)
    - Add PCIe device tree bindings (Phil Edworthy)

  Samsung Exynos
    - Remove unnecessary OOM messages (Jingoo Han)
    - Fix add_pcie_port() section mismatch warning (Sachin Kamat)

  Synopsys DesignWare
    - Make MSI ISR shared IRQ aware (Lucas Stach)

  Miscellaneous
    - Check for broken config space aliasing (Alex Williamson)
    - Update email address (Ben Hutchings)
    - Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
    - Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
    - Remove unnecessary __ref annotations (Bjorn Helgaas)
    - Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
      (Bjorn Helgaas)
    - Fix use of uninitialized MPS value (Bjorn Helgaas)
    - Tidy x86/gart messages (Bjorn Helgaas)
    - Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
    - Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
    - Remove unused serial device IDs (Jean Delvare)
    - Use designated initialization in PCI_VDEVICE (Mark Rustad)
    - Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
    - Configure MPS on ARM (Murali Karicheri)
    - Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
    - Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
    - Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
    - Remove pcibios_add_platform_entries() (Sebastian Ott)
    - Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
    - Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
    - Add and use new pci_is_bridge() interface (Yijing Wang)
    - Make pci_bus_add_device() void (Yijing Wang)

  DMA API
    - Clarify physical/bus address distinction in docs (Bjorn Helgaas)
    - Fix typos in docs (Emilio López)
    - Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
    - Change dma_declare_coherent_memory() CPU address to phys_addr_t
      (Bjorn Helgaas)
    - Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
      (Bjorn Helgaas)"

* tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
  MAINTAINERS: Add generic PCI host controller driver
  PCI: generic: Add generic PCI host controller driver
  PCI: imx6: Add support for MSI
  PCI: designware: Make MSI ISR shared IRQ aware
  PCI: imx6: Remove optional (and unused) IRQs
  PCI: imx6: Drop old IRQ mapping
  PCI: imx6: Use new clock names
  i82875p_edac: Assign PCI resources before adding device
  ARM/PCI: Call pcie_bus_configure_settings() to set MPS
  PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
  PCI: Make pci_bus_add_device() void
  PCI: exynos: Fix add_pcie_port() section mismatch warning
  PCI: Introduce new device binding path using pci_dev.driver_override
  PCI: rcar: Add gen2 device tree support
  PCI: cpqphp: Fix possible null pointer dereference
  PCI: rcar: Add R-Car PCIe device tree bindings
  PCI: rcar: Add MSI support for PCIe
  PCI: rcar: Add Renesas R-Car PCIe driver
  PCI: Fix return value from pci_user_{read,write}_config_*()
  PCI: exynos: Remove unnecessary OOM messages
  ...
2014-06-02 12:15:19 -07:00
Linus Torvalds
32439700fe Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Various fixlets, mostly related to the (root-only) SCHED_DEADLINE
  policy, but also a hotplug bug fix and a fix for a NR_CPUS related
  overallocation bug causing a suspend/resume regression"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix hotplug vs. set_cpus_allowed_ptr()
  sched/cpupri: Replace NR_CPUS arrays
  sched/deadline: Replace NR_CPUS arrays
  sched/deadline: Restrict user params max value to 2^63 ns
  sched/deadline: Change sched_getparam() behaviour vs SCHED_DEADLINE
  sched: Disallow sched_attr::sched_policy < 0
  sched: Make sched_setattr() correctly return -EFBIG
2014-06-01 18:26:59 -07:00
Niv Yehezkel
057b0a7518 PM / hibernate: fixed typo in comment
Fix a trivial comment typo (s/mam/map) in kernel/power/swap.c.

Signed-off-by: Niv Yehezkel <executerx@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-01 00:23:07 +02:00
Linus Torvalds
a4bf79eb6a Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core futex/rtmutex fixes from Thomas Gleixner:
 "Three fixlets for long standing issues in the futex/rtmutex code
  unearthed by Dave Jones syscall fuzzer:

   - Add missing early deadlock detection checks in the futex code
   - Prevent user space from attaching a futex to kernel threads
   - Make the deadlock detector of rtmutex work again

  Looks large, but is more comments than code change"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rtmutex: Fix deadlock detector for real
  futex: Prevent attaching to kernel threads
  futex: Add another early deadlock detection check
2014-05-31 09:47:55 -07:00
Stephen Chivers
7fa21dd8bd printk/of_serial: fix serial console cessation part way through boot.
Commit 5f5c9ae56c
"serial_core: Unregister console in uart_remove_one_port()"
fixed a crash where a serial port was removed but
not deregistered as a console.

There is a side effect of that commit for platforms having serial consoles
and of_serial configured (CONFIG_SERIAL_OF_PLATFORM). The serial console
is disabled midway through the boot process.

This cessation of the serial console affects PowerPC computers
such as the MVME5100 and SAM440EP.

The sequence is:

	bootconsole [udbg0] enabled
	....
	serial8250/16550 driver initialises and registers its UARTS,
	one of these is the serial console.
	console [ttyS0] enabled
	....
	of_serial probes "platform" devices, registering them as it goes.
	One of these is the serial console.
	console [ttyS0] disabled.

The disabling of the serial console is due to:

	a.  unregister_console in printk not clearing the
	    CONS_ENABLED bit in the console flags,
	    even though it has announced that the console is disabled; and

	b.  of_platform_serial_probe in of_serial not setting the port type
	    before it registers with serial8250_register_8250_port.

This patch ensures that the serial console is re-enabled when of_serial
registers a serial port that corresponds to the designated console.

Signed-off-by: Stephen Chivers <schivers@csc.com>
Tested-by: Stephen Chivers <schivers@csc.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [unregister_console]
Cc: stable <stable@vger.kernel.org> # 3.15

===
The above failure was identified in Linux-3.15-rc2.

Tested using MVME5100 and SAM440EP PowerPC computers with
kernels built from Linux-3.15-rc5 and tty-next.

The continued operation of the serial console is vital for computers
such as the MVME5100 as that Single Board Computer does not
have any grapical/display hardware.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-28 13:07:27 -07:00
Thomas Gleixner
397335f004 rtmutex: Fix deadlock detector for real
The current deadlock detection logic does not work reliably due to the
following early exit path:

	/*
	 * Drop out, when the task has no waiters. Note,
	 * top_waiter can be NULL, when we are in the deboosting
	 * mode!
	 */
	if (top_waiter && (!task_has_pi_waiters(task) ||
			   top_waiter != task_top_pi_waiter(task)))
		goto out_unlock_pi;

So this not only exits when the task has no waiters, it also exits
unconditionally when the current waiter is not the top priority waiter
of the task.

So in a nested locking scenario, it might abort the lock chain walk
and therefor miss a potential deadlock.

Simple fix: Continue the chain walk, when deadlock detection is
enabled.

We also avoid the whole enqueue, if we detect the deadlock right away
(A-A). It's an optimization, but also prevents that another waiter who
comes in after the detection and before the task has undone the damage
observes the situation and detects the deadlock and returns
-EDEADLOCK, which is wrong as the other task is not in a deadlock
situation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-28 17:28:13 +02:00
Linus Torvalds
9e3d633178 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull two powerpc fixes from Ben Herrenschmidt:
 "Here's a pair of powerpc fixes for 3.15 which are also going to
  stable.

  One's a fix for building with newer binutils (the problem currently
  only affects the BookE kernels but the affected macro might come back
  into use on BookS platforms at any time).  Unfortunately, the binutils
  maintainer did a backward incompatible change to a construct that we
  use so we have to add Makefile check.

  The other one is a fix for CPUs getting stuck in kexec when running
  single threaded.  Since we routinely use kexec on power (including in
  our newer bootloaders), I deemed that important enough"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
  powerpc: Fix 64 bit builds with binutils 2.24
2014-05-28 08:06:50 -07:00
Srivatsa S. Bhat
011e4b02f1 powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode
If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
(ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
get the following messages during boot:

[    0.089866] POWER8 performance monitor hardware support registered
[    0.089985] power8-pmu: PMAO restore workaround active.
[    5.095419] Processor 1 is stuck.
[   10.097933] Processor 2 is stuck.
[   15.100480] Processor 3 is stuck.
[   20.102982] Processor 4 is stuck.
[   25.105489] Processor 5 is stuck.
[   30.108005] Processor 6 is stuck.
[   35.110518] Processor 7 is stuck.
[   40.113369] Processor 9 is stuck.
[   45.115879] Processor 10 is stuck.
[   50.118389] Processor 11 is stuck.
[   55.120904] Processor 12 is stuck.
[   60.123425] Processor 13 is stuck.
[   65.125970] Processor 14 is stuck.
[   70.128495] Processor 15 is stuck.
[   75.131316] Processor 17 is stuck.

Note that only the sibling threads are stuck, while the primary threads (0, 8,
16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
that kexec tries to wakeup (bring online) the sibling threads of all the cores,
before performing kexec:

[ 9464.131231] Starting new kernel
[ 9464.148507] kexec: Waking offline cpu 1.
[ 9464.148552] kexec: Waking offline cpu 2.
[ 9464.148600] kexec: Waking offline cpu 3.
[ 9464.148636] kexec: Waking offline cpu 4.
[ 9464.148671] kexec: Waking offline cpu 5.
[ 9464.148708] kexec: Waking offline cpu 6.
[ 9464.148743] kexec: Waking offline cpu 7.
[ 9464.148779] kexec: Waking offline cpu 9.
[ 9464.148815] kexec: Waking offline cpu 10.
[ 9464.148851] kexec: Waking offline cpu 11.
[ 9464.148887] kexec: Waking offline cpu 12.
[ 9464.148922] kexec: Waking offline cpu 13.
[ 9464.148958] kexec: Waking offline cpu 14.
[ 9464.148994] kexec: Waking offline cpu 15.
[ 9464.149030] kexec: Waking offline cpu 17.

Instrumenting this piece of code revealed that the cpu_up() operation actually
fails with -EBUSY. Thus, only the primary threads of all the cores are online
during kexec, and hence this is a sure-shot receipe for disaster, as explained
in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec),
as well as in the comment above wake_offline_cpus().

It turns out that cpu_up() was returning -EBUSY because the variable
'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
by migrate_to_reboot_cpu() inside kernel_kexec().

Now, migrate_to_reboot_cpu() was originally written with the assumption that
any further code will not need to perform CPU hotplug, since we are anyway in
the reboot path. However, kexec is clearly not such a case, since we depend on
onlining CPUs, atleast on powerpc.

So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
kexec path, to fix this regression in kexec on powerpc.

Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
can catch such issues more easily in the future.

Fixes: c97102ba96 (kexec: migrate to reboot cpu)
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-05-28 13:24:26 +10:00
Jianyu Zhan
26fc9cd200 kernfs: move the last knowledge of sysfs out from kernfs
There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However this should be kernfs user specific,
so this patch moves it out. Kerrnfs user should specify their
magic number while mouting.

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-27 14:33:17 -07:00
Jiang Liu
a257954bb3 genirq: Improve documentation to match current implementation
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiri Kosina <trivial@kernel.org>
Link: http://lkml.kernel.org/r/1401178092-1228-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-27 10:16:44 +02:00
Rafael J. Wysocki
0399d4db3e PM / sleep: Introduce command line argument for sleep state enumeration
On some systems the platform doesn't support neither
PM_SUSPEND_MEM nor PM_SUSPEND_STANDBY, so PM_SUSPEND_FREEZE is the
only available system sleep state.  However, some user space frameworks
only use the "mem" and (sometimes) "standby" sleep state labels, so
the users of those systems need to modify user space in order to be
able to use system suspend at all and that is not always possible.

For this reason, add a new kernel command line argument,
relative_sleep_states, allowing the users of those systems to change
the way in which the kernel assigns labels to system sleep states.
Namely, for relative_sleep_states=1, the "mem", "standby" and "freeze"
labels will enumerate the available system sleem states from the
deepest to the shallowest, respectively, so that "mem" is always
present in /sys/power/state and the other state strings may or may
not be presend depending on what is supported by the platform.

Update system sleep states documentation to reflect this change.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-26 13:40:59 +02:00
Rafael J. Wysocki
43e8317b0b PM / sleep: Use valid_state() for platform-dependent sleep states only
Use the observation that, for platform-dependent sleep states
(PM_SUSPEND_STANDBY, PM_SUSPEND_MEM), a given state is either
always supported or always unsupported and store that information
in pm_states[] instead of calling valid_state() every time we
need to check it.

Also do not use valid_state() for PM_SUSPEND_FREEZE, which is always
valid, and move the pm_test_level validity check for PM_SUSPEND_FREEZE
directly into enter_state().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-26 13:40:53 +02:00
Rafael J. Wysocki
27ddcc6596 PM / sleep: Add state field to pm_states[] entries
To allow sleep states corresponding to the "mem", "standby" and
"freeze" lables to be different from the pm_states[] indexes of
those strings, introduce struct pm_sleep_state, consisting of
a string label and a state number, and turn pm_states[] into an
array of objects of that type.

This modification should not lead to any functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-26 13:40:47 +02:00
Victor Kamensky
72e6ae285a ARM: 8043/1: uprobes need icache flush after xol write
After instruction write into xol area, on ARM V7
architecture code need to flush dcache and icache to sync
them up for given set of addresses. Having just
'flush_dcache_page(page)' call is not enough - it is
possible to have stale instruction sitting in icache
for given xol area slot address.

Introduce arch_uprobe_ixol_copy weak function
that by default calls uprobes copy_to_page function and
than flush_dcache_page function and on ARM define new one
that handles xol slot copy in ARM specific way

flush_uprobe_xol_access function shares/reuses implementation
with/of flush_ptrace_access function and takes care of writing
instruction to user land address space on given variety of
different cache types on ARM CPUs. Because
flush_uprobe_xol_access does not have vma around
flush_ptrace_access was split into two parts. First that
retrieves set of condition from vma and common that receives
those conditions as flags.

Note ARM cache flush function need kernel address
through which instruction write happened, so instead
of using uprobes copy_to_page function changed
code to explicitly map page and do memcpy.

Note arch_uprobe_copy_ixol function, in similar way as
copy_to_user_page function, has preempt_disable/preempt_enable.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: David A. Long <dave.long@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-05-25 23:48:45 +01:00
Thomas Gleixner
309179fabd Merge branch 'fortglx/3.16/time' of git://git.linaro.org/people/john.stultz/linux into timers/core
* Remove the deprecated setup_sched_clock() API
* Minor NTP updates
2014-05-24 02:05:51 +02:00
Linus Torvalds
f02f79dbcb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "The biggest commit is an irqtime accounting loop latency fix, the rest
  are misc fixes all over the place: deadline scheduling, docs, numa,
  balancer and a bad to-idle latency fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/numa: Initialize newidle balance stats in sd_numa_init()
  sched: Fix updating rq->max_idle_balance_cost and rq->next_balance in idle_balance()
  sched: Skip double execution of pick_next_task_fair()
  sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check
  sched/deadline: Fix memory leak
  sched/deadline: Fix sched_yield() behavior
  sched: Sanitize irq accounting madness
  sched/docbook: Fix 'make htmldocs' warnings caused by missing description
2014-05-23 10:04:04 -07:00
Linus Torvalds
e6a32c3ad1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The biggest changes are fixes for races that kept triggering Trinity
  crashes, plus liblockdep build fixes and smaller misc fixes.

  The liblockdep bits in perf/urgent are a pull mistake - they should
  have been in locking/urgent - but by the time I noticed other commits
  were added and testing was done :-/ Sorry about that"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()
  perf: Prevent false warning in perf_swevent_add
  perf: Limit perf_event_attr::sample_period to 63 bits
  tools/liblockdep: Remove all build files when doing make clean
  tools/liblockdep: Build liblockdep from tools/Makefile
  perf/x86/intel: Fix Silvermont's event constraints
  perf: Fix perf_event_init_context()
  perf: Fix race in removing an event
2014-05-23 10:02:34 -07:00
Bjorn Helgaas
e4c7296643 resources: Clarify sanity check message
The resource map sanity check message is a bit confusing.  Change it to be
more readable:

  -resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
  +resource sanity check: requesting [mem 0xfed10000-0xfed15fff], which spans more than pnp 00:01 [mem 0xfed10000-0xfed13fff]

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-05-23 10:47:21 -06:00
Greg Kroah-Hartman
cbfef53360 Merge 3.15-rc6 into driver-core-next
We want the kernfs fixes in this branch as well for testing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-23 10:13:53 +09:00
Ingo Molnar
e14505a8d5 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

" 1.      Update RCU documentation.  These were posted to LKML at
          https://lkml.org/lkml/2014/4/28/634.

  2.      Miscellaneous fixes.  These were posted to LKML at
          https://lkml.org/lkml/2014/4/28/645.

  3.      Torture-test changes.  These were posted to LKML at
          https://lkml.org/lkml/2014/4/28/667.

  4.      Variable-name renaming cleanup, sent separately due to conflicts.
          This was posted to LKML at https://lkml.org/lkml/2014/5/13/854.

  5.      Patch to suppress RCU stall warnings while sysrq requests are
          being processed.  This patch is the RCU portions of the patch
          that Rik posted to LKML at https://lkml.org/lkml/2014/4/29/457.
          The reason for pushing this patch ahead instead of waiting until
          3.17 is that the NMI-based stack traces are messing up sysrq
          output, and in some cases also messing up the system as well."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:36:10 +02:00
Rik van Riel
096aa33863 sched/numa: Decay ->wakee_flips instead of zeroing
Affine wakeups have the potential to interfere with NUMA placement.
If a task wakes up too many other tasks, affine wakeups will get
disabled.

However, regardless of how many other tasks it wakes up, it gets
re-enabled once a second, potentially interfering with NUMA
placement of other tasks.

By decaying wakee_wakes in half instead of zeroing it, we can avoid
that problem for some workloads.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: chegu_vinod@hp.com
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20140516001332.67f91af2@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:41 +02:00
Rik van Riel
b1ad065e65 sched/numa: Update migrate_improves/degrades_locality()
Update the migrate_improves/degrades_locality() functions with
knowledge of pseudo-interleaving.

Do not consider moving tasks around within the set of group's active
nodes as improving or degrading locality. Instead, leave the load
balancer free to balance the load between a numa_group's active nodes.

Also, switch from the group/task_weight functions to the group/task_fault
functions. The "weight" functions involve a division, but both calls use
the same divisor, so there's no point in doing that from these functions.

On a 4 node (x10 core) system, performance of SPECjbb2005 seems
unaffected, though the number of migrations with 2 8-warehouse wide
instances seems to have almost halved, due to the scheduler running
each instance on a single node.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Link: http://lkml.kernel.org/r/20140515130306.61aae7db@cuia.bos.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:39 +02:00
Rik van Riel
e63da03639 sched/numa: Allow task switch if load imbalance improves
Currently the NUMA balancing code only allows moving tasks between NUMA
nodes when the load on both nodes is in balance. This breaks down when
the load was imbalanced to begin with.

Allow tasks to be moved between NUMA nodes if the imbalance is small,
or if the new imbalance is be smaller than the original one.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20140514132221.274b3463@annuminas.surriel.com
2014-05-22 11:16:38 +02:00
xiaofeng.yan
4027d08085 sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
Signed-off-by: xiaofeng.yan <xiaofeng.yan@huawei.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399605687-18094-1-git-send-email-xiaofeng.yan@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:37 +02:00
Dongsheng Yang
7aa2c016db sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/a568a1e3cc8e78648f41b5035fa5e381d36274da.1399532322.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:36 +02:00
Corey Minyard
a803f0261b sched: Initialize rq->age_stamp on processor start
If the sched_clock time starts at a large value, the kernel will spin
in sched_avg_update for a long time while rq->age_stamp catches up
with rq->clock.

The comment in kernel/sched/clock.c says that there is no strict promise
that it starts at zero.  So initialize rq->age_stamp when a cpu starts up
to avoid this.

I was seeing long delays on a simulator that didn't start the clock at
zero.  This might also be an issue on reboots on processors that don't
re-initialize the timer to zero on reset, and when using kexec.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399574859-11714-1-git-send-email-minyard@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:35 +02:00
Kirill Tkhai
7246544786 sched, nohz: Change rq->nr_running to always use wrappers
Sometimes ->nr_running may cross 2 but interrupt is not being
sent to rq's cpu. In this case we don't reenable the timer.
Looks like this may be the reason for rare unexpected effects,
if nohz is enabled.

Patch replaces all places of direct changing of nr_running
and makes add_nr_running() caring about crossing border.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140508225830.2469.97461.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:33 +02:00
Jason Low
52a08ef1f1 sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
Currently, in idle_balance(), we update rq->next_balance when we pull_tasks.
However, it is also important to update this in the !pulled_tasks case too.

When the CPU is "busy" (the CPU isn't idle), rq->next_balance gets computed
using sd->busy_factor (so we increase the balance interval when the CPU is
busy). However, when the CPU goes idle, rq->next_balance could still be set
to a large value that was computed with the sd->busy_factor.

Thus, we need to also update rq->next_balance in idle_balance() in the cases
where !pulled_tasks too, so that rq->next_balance gets updated without taking
the busy_factor into account when the CPU is about to go idle.

This patch makes rq->next_balance get updated independently of whether or
not we pulled_task. Also, we add logic to ensure that we always traverse
at least 1 of the sched domains to get a proper next_balance value for
updating rq->next_balance.

Additionally, since load_balance() modifies the sd->balance_interval, we
need to re-obtain the sched domain's interval after the call to
load_balance() in rebalance_domains() before we update rq->next_balance.

This patch adds and uses 2 new helper functions, update_next_balance() and
get_sd_balance_interval() to update next_balance and obtain the sched
domain's balance_interval.

Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: daniel.lezcano@linaro.org
Cc: alex.shi@linaro.org
Cc: efault@gmx.de
Cc: vincent.guittot@linaro.org
Cc: morten.rasmussen@arm.com
Cc: aswin@hp.com
Link: http://lkml.kernel.org/r/1399596562.2200.7.camel@j-VirtualBox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:32 +02:00
Dongsheng Yang
a9467fa3cd sched: Use clamp() and clamp_val() to make sys_nice() more readable
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399541715-19568-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:31 +02:00
Dietmar Eggemann
caffcdd8d2 sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
There is no need to zero struct sched_group member cpumask and struct
sched_group_power member power since both structures are already allocated
as zeroed memory in __sdt_alloc().

This patch has been tested with
BUG_ON(!cpumask_empty(sched_group_cpus(sg))); and BUG_ON(sg->sgp->power);
in build_sched_groups() on ARM TC2 and INTEL i5 M520 platform including
CPU hotplug scenarios.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1398865178-12577-1-git-send-email-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:30 +02:00
Vincent Guittot
c515db8cd3 sched/numa: Fix initialization of sched_domain_topology for NUMA
Jet Chen has reported a kernel panics when booting qemu-system-x86_64 with
kvm64 cpu. A panic occured while building the sched_domain.

In sched_init_numa, we create a new topology table in which both default
levels and numa levels are copied. The last row of the table must have a null
pointer in the mask field.

The current implementation doesn't add this last row in the computation of the
table size. So we add 1 row in the allocation size that will be used as the
last row of the table. The kzalloc will ensure that the mask field is NULL.

Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fengguang.wu@intel.com
Link: http://lkml.kernel.org/r/1399972261-25693-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:29 +02:00
Rik van Riel
8bf21433f3 sched: Call select_idle_sibling() when not affine_sd
On smaller systems, the top level sched domain will be an affine
domain, and select_idle_sibling is invoked for every SD_WAKE_AFFINE
wakeup. This seems to be working well.

On larger systems, with the node distance between far away NUMA nodes
being > RECLAIM_DISTANCE, select_idle_sibling is only called if the
waker and the wakee are on nodes less than RECLAIM_DISTANCE apart.

This patch leaves in place the policy of not pulling the task across
nodes on such systems, while fixing the issue that select_idle_sibling
is not called at all in certain circumstances.

The code will look for an idle CPU in the same CPU package as the
CPU where the task ran previously.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: morten.rasmussen@arm.com
Cc: george.mccollister@gmail.com
Cc: ktkhai@parallels.com
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Link: http://lkml.kernel.org/r/20140514114037.2d93266f@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:28 +02:00
Michael Kerrisk
2240067494 sched: Simplify return logic in sched_read_attr()
Gotos are chained pointlessly here, and the 'out' label
can be dispensed with.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEC29.9090503@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:27 +02:00
Michael Kerrisk
e78c7bca56 sched: Simplify return logic in sched_copy_attr()
The logic in this function is a little contorted, clean it up:

  * Rather than having chained gotos for the -EFBIG case, just
    return -EFBIG directly.

  * Now, the label 'out' is no longer needed, and 'ret' must be zero
    zero by the time we fall through to this point, so just return 0.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/536CEC24.9080201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:26 +02:00
Ben Segall
3944a9274e sched: Fix exec_start/task_hot on migrated tasks
task_hot checks exec_start on any runnable task, but if it has been
migrated since the it last ran, then exec_start is a clock_task from
another cpu. If the old cpu's clock_task was sufficiently far ahead of
this cpu's then the task will not be considered for another migration
until it has run. Instead reset exec_start whenever a task is migrated,
since it is presumably no longer hot anyway.

Signed-off-by: Ben Segall <bsegall@google.com>
[ Made it compile. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140515225920.7179.13924.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 11:16:25 +02:00
Ingo Molnar
6669dc8907 Merge branch 'sched/urgent' into sched/core to avoid conflicts with upcoming changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:55:03 +02:00
Ingo Molnar
ec6e7f4082 Merge branch 'pm-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into sched/core
Pull scheduling related CPU idle updates from Rafael J. Wysocki.

Conflicts:
	kernel/sched/idle.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:37:06 +02:00
Ingo Molnar
65c2ce7004 Linux 3.15-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTfR2zAAoJEHm+PkMAQRiG3noH/2s+KUge3qO2M+AmxttUo74B
 +npAMdbqYR3MdEiwxYZfsHcMu4Ye/IKLcrh4pydB5hI2mdjtGkH1bnmia0f1ve/c
 Z/a0256+W8gWp7mcUBqSNztqLPAWa7wKOqNdLjj5idr1BSj6u8im+fQ9FBh2woki
 1fyYAuq/60lq4CMOKJvkA95V1Ome/jO+8tS4PguOgsCETQxCVFGurZcBbG3Mx5Y3
 v+ioCqeRc6GvxPFR6YngnTZCrsLxSRT3tnO2Qy5zX7dxjIQkCEbvIckpBQv01Y3R
 wNUaX+2Jae207igxrEv8CjmCFnmZFuUI15aWWCy6fOS/j8bjuk6ThYJO8N4ZBM0=
 =2ShG
 -----END PGP SIGNATURE-----

Merge tag 'v3.15-rc6' into sched/core, to pick up the latest fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:28:56 +02:00
Lai Jiangshan
6acbfb9697 sched: Fix hotplug vs. set_cpus_allowed_ptr()
Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b55 ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b55 ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:31 +02:00
Peter Zijlstra
4dac0b6383 sched/cpupri: Replace NR_CPUS arrays
Tejun reported that his resume was failing due to order-3 allocations
from sched_domain building.

Replace the NR_CPUS arrays in there with a dynamically allocated
array.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-7cysnkw1gik45r864t1nkudh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:29 +02:00
Peter Zijlstra
944770ab54 sched/deadline: Replace NR_CPUS arrays
Tejun reported that his resume was failing due to order-3 allocations
from sched_domain building.

Replace the NR_CPUS arrays in there with a dynamically allocated
array.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-kat4gl1m5a6dwy6nzuqox45e@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:28 +02:00
Juri Lelli
b0827819b0 sched/deadline: Restrict user params max value to 2^63 ns
Michael Kerrisk noticed that creating SCHED_DEADLINE reservations
with certain parameters (e.g, a runtime of something near 2^64 ns)
can cause a system freeze for some amount of time.

The problem is that in the interface we have

 u64 sched_runtime;

while internally we need to have a signed runtime (to cope with
budget overruns)

 s64 runtime;

At the time we setup a new dl_entity we copy the first value in
the second. The cast turns out with negative values when
sched_runtime is too big, and this causes the scheduler to go crazy
right from the start.

Moreover, considering how we deal with deadlines wraparound

 (s64)(a - b) < 0

we also have to restrict acceptable values for sched_{deadline,period}.

This patch fixes the thing checking that user parameters are always
below 2^63 ns (still large enough for everyone).

It also rewrites other conditions that we check, since in
__checkparam_dl we don't have to deal with deadline wraparounds
and what we have now erroneously fails when the difference between
values is too big.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Dario Faggioli<raistlin@linux.it>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140513141131.20d944f81633ee937f256385@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:27 +02:00
Peter Zijlstra
ce5f7f8200 sched/deadline: Change sched_getparam() behaviour vs SCHED_DEADLINE
The way we read POSIX one should only call sched_getparam() when
sched_getscheduler() returns either SCHED_FIFO or SCHED_RR.

Given that we currently return sched_param::sched_priority=0 for all
others, extend the same behaviour to SCHED_DEADLINE.

Requested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: linux-man <linux-man@vger.kernel.org>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140512205034.GH13467@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:26 +02:00
Peter Zijlstra
dbdb22754f sched: Disallow sched_attr::sched_policy < 0
The scheduler uses policy=-1 to preserve the current policy state to
implement sys_sched_setparam(), this got exposed to userspace by
accident through sys_sched_setattr(), cure this.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140509085311.GJ30445@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:26 +02:00
Michael Kerrisk
143cf23df2 sched: Make sched_setattr() correctly return -EFBIG
The documented[1] behavior of sched_attr() in the proposed man page text is:

    sched_attr::size must be set to the size of the structure, as in
    sizeof(struct sched_attr), if the provided structure is smaller
    than the kernel structure, any additional fields are assumed
    '0'. If the provided structure is larger than the kernel structure,
    the kernel verifies all additional fields are '0' if not the
    syscall will fail with -E2BIG.

As currently implemented, sched_copy_attr() returns -EFBIG for
for this case, but the logic in sys_sched_setattr() converts that
error to -EFAULT. This patch fixes the behavior.

[1] http://thread.gmane.org/gmane.linux.kernel/1615615/focus=1697760

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/536CEC17.9070903@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-05-22 10:21:25 +02:00
H. Peter Anvin
94aca80897 Merge remote-tracking branch 'origin/x86/urgent' into x86/vdso
Resolved Conflicts:
	arch/x86/vdso/vdso32-setup.c

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 17:38:22 -07:00
H. Peter Anvin
03c1b4e8e5 Merge remote-tracking branch 'origin/x86/espfix' into x86/vdso
Merge x86/espfix into x86/vdso, due to changes in the vdso setup code
that otherwise cause conflicts.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-05-21 17:36:33 -07:00
Linus Torvalds
06eb4cc2e7 Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull more cgroup fixes from Tejun Heo:
 "Three more patches to fix cgroup_freezer breakage due to the recent
  cgroup internal locking changes - an operation cgroup_freezer was
  using now requires sleepable context and cgroup_freezer was invoking
  that while holding a spin lock.  cgroup_freezer was using an overly
  elaborate hierarchical locking scheme.

  While it's possible to convert the hierarchical spinlocks directly to
  mutexes, this patch simplifies the overall locking so that it uses a
  global mutex.  This has the added benefit of avoiding iterating
  potentially huge number of tasks under a spinlock.  While the patch is
  on the larger side in the devel cycle, the changes made are mostly
  straight-forward and the locking logic is a lot simpler afterwards"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix rcu_read_lock() leak in update_if_frozen()
  cgroup_freezer: replace freezer->lock with freezer_mutex
  cgroup: introduce task_css_is_root()
2014-05-21 18:36:40 +09:00
Rafael J. Wysocki
55cc33ceb7 Merge branch 'pm-sleep' into acpi-pm 2014-05-20 13:22:15 +02:00
Linus Torvalds
95d08585e0 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "A single bug fix for a long standing issue:

   - Updating the expiry value of a relative timer _after_ letting the
     idle logic select a target cpu for the timer based on its stale
     expiry value is outright stupid.  Thanks to Viresh for spotting the
     brainfart"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Set expiry time before switch_hrtimer_base()
2014-05-20 14:19:10 +09:00
Mark Brown
049d595a4d PM / OPP: Make OPP invisible to users in Kconfig
The OPP code is an in kernel library selected by its users, there is no
no architecture code required to implement it and enabling it without a
user just increases the kernel size. Since the users select rather than
depend on it just remove the ability to directly set the option from
Kconfig.

Signed-off-by: Mark Brown <broonie@linaro.org>
Acked-by: Nishanth Menon <nm@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-19 23:03:31 +02:00
Rik van Riel
61f38db3e3 rcu: Provide API to suppress stall warnings while sysrc runs
Some sysrq handlers can run for a long time, because they dump a lot
of data onto a serial console. Having RCU stall warnings pop up in
the middle of them only makes the problem worse.

This commit provides rcu_sysrq_start() and rcu_sysrq_end() APIs to
temporarily suppress RCU CPU stall warnings while a sysrq request is
handled.

Signed-off-by: Rik van Riel <riel@redhat.com>
[ paulmck: Fix TINY_RCU build error. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-19 10:52:04 -07:00
Borislav Petkov
12665b35b0 perf/events/core: Drop unused variable after cleanup
... in 3a497f4863 ("perf: Simplify perf_event_exit_task_context()")

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1399720259-28275-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-19 21:52:58 +09:00
Peter Zijlstra
b69cf53640 perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()
Alexander noticed that we use RCU iteration on rb->event_list but do
not use list_{add,del}_rcu() to add,remove entries to that list, nor
do we observe proper grace periods when re-using the entries.

Merge ring_buffer_detach() into ring_buffer_attach() such that
attaching to the NULL buffer is detaching.

Furthermore, ensure that between any 'detach' and 'attach' of the same
event we observe the required grace period, but only when strictly
required. In effect this means that only ioctl(.request =
PERF_EVENT_IOC_SET_OUTPUT) will wait for a grace period, while the
normal initial attach and final detach will not be delayed.

This patch should, I think, do the right thing under all
circumstances, the 'normal' cases all should never see the extra grace
period, but the two cases:

 1) PERF_EVENT_IOC_SET_OUTPUT on an event which already has a
    ring_buffer set, will now observe the required grace period between
    removing itself from the old and attaching itself to the new buffer.

    This case is 'simple' in that both buffers are present in
    perf_event_set_output() one could think an unconditional
    synchronize_rcu() would be sufficient; however...

 2) an event that has a buffer attached, the buffer is destroyed
    (munmap) and then the event is attached to a new/different buffer
    using PERF_EVENT_IOC_SET_OUTPUT.

    This case is more complex because the buffer destruction does:
      ring_buffer_attach(.rb = NULL)
    followed by the ioctl() doing:
      ring_buffer_attach(.rb = foo);

    and we still need to observe the grace period between these two
    calls due to us reusing the event->rb_entry list_head.

In order to make 2 happen we use Paul's latest cond_synchronize_rcu()
call.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140507123526.GD13658@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-05-19 21:44:56 +09:00