Commit graph

17311 commits

Author SHA1 Message Date
Frederic Weisbecker
dee08a72de cputime: Fix jiffies based cputime assumption on steal accounting
The steal guest time accounting code assumes that cputime_t is based on
jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t
is based on nsecs, steal_account_process_tick() passes the delta in
jiffies to account_steal_time() which then accounts it as if it's a
value in nsecs.

As a result, accounting 1 second of steal time (with HZ=100 that would
be 100 jiffies) is spuriously accounted as 100 nsecs.

As such /proc/stat may report 0 values of steal time even when two
guests have run concurrently for a few seconds on the same host and
same CPU.

In order to fix this, lets convert the nsecs based steal delta to
cputime instead of jiffies by using the right conversion API.

Given that the steal time is stored in cputime_t and this type can have
a smaller granularity than nsecs, we only account the rounded converted
value and leave the remaining nsecs for the next deltas.

Reported-by: Huiqingding <huding@redhat.com>
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-03-13 15:56:44 +01:00
Peter Zijlstra
6f008e72cd locking/mutex: Fix debug checks
OK, so commit:

  1d8fe7dc80 ("locking/mutexes: Unlock the mutex without the wait_lock")

generates this boot warning when CONFIG_DEBUG_MUTEXES=y:

  WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180() DEBUG_LOCKS_WARN_ON(lock->owner != current)

And that makes sense, because as soon as we release the lock a
new owner can come in...

One would think that !__mutex_slowpath_needs_to_unlock()
implementations suffer the same, but for DEBUG we fall back to
mutex-null.h which has an unconditional 1 for that.

The mutex debug code requires the mutex to be unlocked after
doing the debug checks, otherwise it can find inconsistent
state.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: jason.low2@hp.com
Link: http://lkml.kernel.org/r/20140312122442.GB27965@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12 13:49:47 +01:00
Peter Zijlstra
34c6bc2c91 locking/mutexes: Add extra reschedule point
Add in an extra reschedule in an attempt to avoid getting reschedule
the moment we've acquired the lock.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-zah5eyn9gu7qlgwh9r6n2anc@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:59 +01:00
Peter Zijlstra
fb0527bd5e locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
Since we want a task waiting for a mutex_lock() to go to sleep and
reschedule on need_resched() we must be able to abort the
mcs_spin_lock() around the adaptive spin.

Therefore implement a cancelable mcs lock.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: Jason Low <jason.low2@hp.com>
Link: http://lkml.kernel.org/n/tip-62hcl5wxydmjzd182zhvk89m@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:56 +01:00
Jason Low
1d8fe7dc80 locking/mutexes: Unlock the mutex without the wait_lock
When running workloads that have high contention in mutexes on an 8 socket
machine, mutex spinners would often spin for a long time with no lock owner.

The main reason why this is occuring is in __mutex_unlock_common_slowpath(),
if __mutex_slowpath_needs_to_unlock(), then the owner needs to acquire the
mutex->wait_lock before releasing the mutex (setting lock->count to 1). When
the wait_lock is contended, this delays the mutex from being released.
We should be able to release the mutex without holding the wait_lock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:54 +01:00
Jason Low
47667fa150 locking/mutexes: Modify the way optimistic spinners are queued
The mutex->spin_mlock was introduced in order to ensure that only 1 thread
spins for lock acquisition at a time to reduce cache line contention. When
lock->owner is NULL and the lock->count is still not 1, the spinner(s) will
continually release and obtain the lock->spin_mlock. This can generate
quite a bit of overhead/contention, and also might just delay the spinner
from getting the lock.

This patch modifies the way optimistic spinners are queued by queuing before
entering the optimistic spinning loop as oppose to acquiring before every
call to mutex_spin_on_owner(). So in situations where the spinner requires
a few extra spins before obtaining the lock, then there will only be 1 spinner
trying to get the lock and it will avoid the overhead from unnecessarily
unlocking and locking the spin_mlock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:53 +01:00
Jason Low
46af29e479 locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
The mutex_can_spin_on_owner() function should also return false if the
task needs to be rescheduled to avoid entering the MCS queue when it
needs to reschedule.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1390936396-3962-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:52 +01:00
Peter Zijlstra
c9122da1e2 locking: Move mcs_spinlock.h into kernel/locking/
The mcs_spinlock code is not meant (or suitable) as a generic locking
primitive, therefore take it away from the normal includes and place
it in kernel/locking/.

This way the locking primitives implemented there can use it as part
of their implementation but we do not risk it getting used
inapropriately.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-byirmpamgr7h25m5kyavwpzx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:52 +01:00
Heiko Carstens
03b8c7b623 futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
is no runtime check necessary, allow to skip the test within futex_init().

This allows to get rid of some code which would always give the same result,
and also allows the compiler to optimize a couple of if statements away.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-03 11:32:08 +01:00
Oleg Nesterov
34d0ed5ea7 lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
The __lockdep_no_validate check in mark_held_locks() adds the subtle
and (afaics) unnecessary difference between no-validate and check==0.
And this looks even more inconsistent because __lock_acquire() skips
mark_irqflags()->mark_lock() if !check.

Change mark_held_locks() to check hlock->check instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182013.GA26505@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:18:59 +01:00
Oleg Nesterov
1b5ff816ca lockdep: Don't create the wrong dependency on hlock->check == 0
Test-case:

	DEFINE_MUTEX(m1);
	DEFINE_MUTEX(m2);
	DEFINE_MUTEX(mx);

	void lockdep_should_complain(void)
	{
		lockdep_set_novalidate_class(&mx);

		// m1 -> mx -> m2
		mutex_lock(&m1);
		mutex_lock(&mx);
		mutex_lock(&m2);
		mutex_unlock(&m2);
		mutex_unlock(&mx);
		mutex_unlock(&m1);

		// m2 -> m1 ; should trigger the warning
		mutex_lock(&m2);
		mutex_lock(&m1);
		mutex_unlock(&m1);
		mutex_unlock(&m2);
	}

this doesn't trigger any warning, lockdep can't detect the trivial
deadlock.

This is because lock(&mx) correctly avoids m1 -> mx dependency, it
skips validate_chain() due to mx->check == 0. But lock(&m2) wrongly
adds mx -> m2 and thus m1 -> m2 is not created.

rcu_lock_acquire()->lock_acquire(check => 0) is fine due to read == 2,
so currently only __lockdep_no_validate__ can trigger this problem.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182010.GA26498@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:18:57 +01:00
Oleg Nesterov
fb9edbe984 lockdep: Make held_lock->check and "int check" argument bool
The "int check" argument of lock_acquire() and held_lock->check are
misleading. This is actually a boolean: 2 means "true", everything
else is "false".

And there is no need to pass 1 or 0 to lock_acquire() depending on
CONFIG_PROVE_LOCKING, __lock_acquire() checks prove_locking at the
start and clears "check" if !CONFIG_PROVE_LOCKING.

Note: probably we can simply kill this member/arg. The only explicit
user of check => 0 is rcu_lock_acquire(), perhaps we can change it to
use lock_acquire(trylock =>, read => 2). __lockdep_no_validate means
check => 0 implicitly, but we can change validate_chain() to check
hlock->instance->key instead. Not to mention it would be nice to get
rid of lockdep_set_novalidate_class().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182006.GA26495@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:18:54 +01:00
Ingo Molnar
65370bdf88 Merge branch 'linus' into core/locking
Refresh the topic.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-02 09:43:20 +01:00
Linus Torvalds
aafd9d6a46 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer/dynticks updates from Ingo Molnar:
 "This tree contains misc dynticks updates: a fix and three cleanups"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
  nohz_full: fix code style issue of tick_nohz_full_stop_tick
  nohz: Get timekeeping max deferment outside jiffies_lock
  tick: Rename tick_check_idle() to tick_irq_enter()
2014-01-31 09:02:51 -08:00
Linus Torvalds
595bf999e3 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A crash fix and documentation updates"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Make sched_class::get_rr_interval() optional
  sched/deadline: Add sched_dl documentation
  sched: Fix docbook parameter annotation error in wait.h
2014-01-31 09:00:44 -08:00
Linus Torvalds
ab5318788c Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debug changes from Ingo Molnar:
 "This contains mostly kernel debugging related updates:

   - make hung_task detection more configurable to distros
   - add final bits for x86 UV NMI debugging, with related KGDB changes
   - update the mailing-list of MAINTAINERS entries I'm involved with"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hung_task: Display every hung task warning
  sysctl: Add neg_one as a standard constraint
  x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
  x86/uv/nmi: Fix Sparse warnings
  kgdb/kdb: Fix no KDB config problem
  MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries
2014-01-31 08:59:46 -08:00
Roman Gushchin
73f945505b kernel/smp.c: remove cpumask_ipi
After commit 9a46ad6d6d ("smp: make smp_call_function_many() use logic
similar to smp_call_function_single()"), cfd->cpumask is accessed only
in smp_call_function_many().  So there is no more need to copy it into
cfd->cpumask_ipi before putting csd into the list.  The cpumask_ipi
field is obsolete and can be removed.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wang YanQing <udknight@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30 16:56:54 -08:00
Christoph Hellwig
6897fc22ea kernel: use lockless list for smp_call_function_single
Make smp_call_function_single and friends more efficient by using a
lockless list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30 16:56:54 -08:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds
bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Tim Chen
e72246748f locking/mutexes/mcs: Restructure the MCS lock defines and locking code into its own file
We will need the MCS lock code for doing optimistic spinning for rwsem
and queued rwlock.  Extracting the MCS code from mutex.c and put into
its own file allow us to reuse this code easily.

We also inline mcs_spin_lock and mcs_spin_unlock functions
for better efficiency.

Note that using the smp_load_acquire/smp_store_release pair used in
mcs_lock and mcs_unlock is not sufficient to form a full memory barrier
across cpus for many architectures (except x86).  For applications that
absolutely need a full barrier across multiple cpus with mcs_unlock and
mcs_lock pair, smp_mb__after_unlock_lock() should be used after mcs_lock.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1390347360.3138.63.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:13:27 +01:00
Waiman Long
aff7385b5a locking/mutexes/mcs: Correct barrier usage
This patch corrects the way memory barriers are used in the MCS lock
with smp_load_acquire and smp_store_release fucnctions.  The previous
barriers could leak critical sections if mcs lock is used by itself.
It is not a problem when mcs lock is embedded in mutex but will be an
issue when the mcs_lock is used elsewhere.

The patch removes the incorrect barriers and put in correct
barriers with the pair of functions smp_load_acquire and smp_store_release.

Suggested-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1390347353.3138.62.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:13:26 +01:00
Peter Zijlstra
a57beec5d4 sched: Make sched_class::get_rr_interval() optional
Not all classes implement (or can implement) a useful get_rr_interval()
function, default to a 0 time-slice for them.

This fixes a crash reported by Tommi Rantala.

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140127105413.GC11314@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:08:41 +01:00
Dario Faggioli
712e5e34ae sched/deadline: Add sched_dl documentation
Add in Documentation/scheduler/ some hints about the design
choices, the usage and the future possible developments of the
sched_dl scheduling class and of the SCHED_DEADLINE policy.

Reviewed-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
[ Re-wrote sections 2 and 3. ]
Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390821615-23247-1-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:08:40 +01:00
Joe Perches
ce85b4f2ea softirq: use const char * const for softirq_to_name, whitespace neatening
Reduce data size a little.
Reduce checkpatch noise.

$ size kernel/softirq.o*
   text	   data	    bss	    dec	    hex	filename
  11554	   6013	   4008	  21575	   5447	kernel/softirq.o.new
  11474	   6093	   4008	  21575	   5447	kernel/softirq.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Joe Perches
4032276415 softirq: convert printks to pr_<level>
Use a more current logging style.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Joe Perches
2e702b9f6c softirq: use ffs() in __do_softirq()
Possible speed improvement of __do_softirq() by using ffs() instead of
using a while loop with an & 1 test then single bit shift.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Chen Gang
a19428e5c3 kernel/kexec.c: use vscnprintf() instead of vsnprintf() in vmcoreinfo_append_str()
vsnprintf() may let 'r' larger than sizeof(buf), in this case, if 'r' is
also less than "vmcoreinfo_max_size - vmcoreinfo_size" (left size of
destination buffer), next memcpy() will read the unexpected addresses.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:40 -08:00
Linus Torvalds
ba635f8cd2 The first two patches fix the debugfs README file to reflect better
the new features added to 3.14.
 
 The third patch is a minor bugfix to the trace_puts() functions that
 will crash the system if a developer adds one before the tracing system
 is setup. It also affects trace_printk() if it has no arguments, as
 the code will convert it to a trace_puts() as well. Note, this bug
 will not affect unmodified kernels, as trace_printk() and trace_puts()
 should only be used by developers for testing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJS5oAkAAoJEKQekfcNnQGuOfkH/1ScwoRslERCCjhPEZu958bG
 VugNvliB/uDwq3qkmBcncMHwqkFBgJay9ieah+JFeK6x3G6mO/uf+UhN5cmLzVBh
 vA5dKR2zW6WTxKYSvXYapD7OzC7R+V6CLvpMl5WMZ1t3ESQhR6gUrpPdigecBWs9
 015rMVA2xXjNnHNDM1nem1PhnMba78A/N98lFErYGpvVBjkCpB8mSt8adds9bYp8
 a83P6tGUfXvsYO3sOqqBnOOqcfzCjjJbmr94v/F+SmLyuxuV0tVzBkIwXkKTNhEK
 bQZj9Pfe+1oIkldXxstn5jWaOvI6RlAGW4b4qXt30AgrGRSyEo5xpvfLfyNUif4=
 =N3kq
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "The first two patches fix the debugfs README file to reflect better
  the new features added to 3.14.

  The third patch is a minor bugfix to the trace_puts() functions that
  will crash the system if a developer adds one before the tracing
  system is setup.  It also affects trace_printk() if it has no
  arguments, as the code will convert it to a trace_puts() as well.

  Note, this bug will not affect unmodified kernels, as trace_printk()
  and trace_puts() should only be used by developers for testing"

* tag 'trace-fixes-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Check if tracing is enabled in trace_puts()
  tracing: Fix formatting of trace README file
  tracing/README: Add event file usage to tracing mini-HOWTO
2014-01-27 08:22:30 -08:00
Linus Torvalds
f6d13daadd Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A couple of regression fixes mostly hitting virtualized setups, but
  also some bare metal systems"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/x86/tsc: Initialize multiplier to 0
  sched/clock: Fixup early initialization
  sched/preempt/x86: Fix voluntary preempt for x86
  Revert "sched: Fix sleep time double accounting in enqueue entity"
2014-01-25 11:11:31 -08:00
Aaron Tomlin
270750dbc1 hung_task: Display every hung task warning
When khungtaskd detects hung tasks, it prints out
backtraces from a number of those tasks.

Limiting the number of backtraces being printed
out can result in the user not seeing the information
necessary to debug the issue. The hung_task_warnings
sysctl controls this feature.

This patch makes it possible for hung_task_warnings
to accept a special value to print an unlimited
number of backtraces when khungtaskd detects hung
tasks.

The special value is -1. To use this value it is
necessary to change types from ulong to int.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1390239253-24030-3-git-send-email-atomlin@redhat.com
[ Build warning fix. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 12:13:33 +01:00
Oleg Nesterov
a8d4b8345e introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill rcu_my_thread_group_empty()
rcu_dereference_check_fdtable() looks very wrong,

1. rcu_my_thread_group_empty() was added by 844b9a8707 "vfs: fix
   RCU-lockdep false positive due to /proc" but it doesn't really
   fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
   hit the same race with get_files_struct().

   And otoh rcu_my_thread_group_empty() can suppress the correct
   warning if the caller is the CLONE_FILES (without CLONE_THREAD)
   task.

2. files->count == 1 check is not really right too. Even if this
   files_struct is not shared it is not safe to access it lockless
   unless the caller is the owner.

   Otoh, this check is sub-optimal. files->count == 0 always means
   it is safe to use it lockless even if files != current->files,
   but put_files_struct() has to take rcu_read_lock(). See the next
   patch.

This patch removes the buggy checks and turns fcheck_files() into
__fcheck_files() which uses rcu_dereference_raw(), the "unshared"
callers, fget_light() and fget_raw_light(), can use it to avoid
the warning from RCU-lockdep.

fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
plus __fcheck_files().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 03:14:36 -05:00
Aaron Tomlin
2397efb1bb sysctl: Add neg_one as a standard constraint
Add neg_one to the list of standard constraints - will be used by the next patch.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1390239253-24030-2-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:59:53 +01:00
Mike Travis
fc8b13740b kgdb/kdb: Fix no KDB config problem
Some code added to the debug_core module had KDB dependencies
that it shouldn't have.  Move the KDB dependent REASON back to
the caller to remove the dependency in the debug core code.

Update the call from the UV NMI handler to conform to the new
interface.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20140114162551.318251993@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:55:09 +01:00
Ingo Molnar
a2b4c607c9 Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent
Pull dynticks cleanups from Frederic Weisbecker.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:27:26 +01:00
Linus Torvalds
09da8dfa98 ACPI and power management updates for 3.14-rc1
- ACPI core changes to make it create a struct acpi_device object for every
    device represented in the ACPI tables during all namespace scans regardless
    of the current status of that device.  In accordance with this, ACPI hotplug
    operations will not delete those objects, unless the underlying ACPI tables
    go away.
 
  - On top of the above, new sysfs attribute for ACPI device objects allowing
    user space to check device status by triggering the execution of _STA for
    its ACPI object.  From Srinivas Pandruvada.
 
  - ACPI core hotplug changes reducing code duplication, integrating the
    PCI root hotplug with the core and reworking container hotplug.
 
  - ACPI core simplifications making it use ACPI_COMPANION() in the code
    "glueing" ACPI device objects to "physical" devices.
 
  - ACPICA update to upstream version 20131218.  This adds support for the
    DBG2 and PCCT tables to ACPICA, fixes some bugs and improves debug
    facilities.  From Bob Moore, Lv Zheng and Betty Dall.
 
  - Init code change to carry out the early ACPI initialization earlier.
    That should allow us to use ACPI during the timekeeping initialization
    and possibly to simplify the EFI initialization too.  From Chun-Yi Lee.
 
  - Clenups of the inclusions of ACPI headers in many places all over from
    Lv Zheng and Rashika Kheria (work in progress).
 
  - New helper for ACPI _DSM execution and rework of the code in drivers
    that uses _DSM to execute it via the new helper.  From Jiang Liu.
 
  - New Win8 OSI blacklist entries from Takashi Iwai.
 
  - Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun Guo,
    Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava, Rashika Kheria,
    Tang Chen, Zhang Rui.
 
  - intel_pstate driver updates, including proper Baytrail support, from
    Dirk Brandewie and intel_pstate documentation from Ramkumar Ramachandra.
 
  - Generic CPU boost ("turbo") support for cpufreq from Lukasz Majewski.
 
  - powernow-k6 cpufreq driver fixes from Mikulas Patocka.
 
  - cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark Brown.
 
  - Assorted cpufreq drivers fixes and cleanups from Anson Huang, John Tobias,
    Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh Kumar.
 
  - cpuidle cleanups from Bartlomiej Zolnierkiewicz.
 
  - Support for hibernation APM events from Bin Shi.
 
  - Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC disabled
    during thaw transitions from Bjørn Mork.
 
  - PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf Hansson.
 
  - PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente Kurusa,
    Rashika Kheria.
 
  - New tool for profiling system suspend from Todd E Brandt and a cpupower
    tool cleanup from One Thousand Gnomes.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS3a1eAAoJEILEb/54YlRxnTgP/iGawvgjKWm6Qqp7WSIvd5gQ
 zZ6q75C6Pc/W2fq1+OzVGnpCF8WYFy+nFDAXOvUHjIXuoxSwFcuW5l4aMckgl/0a
 TXEWe9MJrCHHRfDApfFacCJ44U02bjJAD5vTyL/hKA+IHeinq4WCSojryYC+8jU0
 cBrUIV0aNH8r5JR2WJNAyv/U29rXsDUOu0I4qTqZ4YaZT6AignMjtLXn1e9AH1Pn
 DPZphTIo/HMnb+kgBOjt4snMk+ahVO9eCOxh/hH8ecnWExw9WynXoU5Nsna0tSZs
 ssyHC7BYexD3oYsG8D52cFUpp4FCsJ0nFQNa2kw0LY+0FBNay43LySisKYHZPXEs
 2WpESDv+/t7yhtnrvM+TtA7aBheKm2XMWGFSu/aERLE17jIidOkXKH5Y7ryYLNf/
 uyRKxNS0NcZWZ0G+/wuY02jQYNkfYz3k/nTr8BAUItRBjdporGIRNEnR9gPzgCUC
 uQhjXWMPulqubr8xbyefPWHTEzU2nvbXwTUWGjrBxSy8zkyy5arfqizUj+VG6afT
 NsboANoMHa9b+xdzigSFdA3nbVK6xBjtU6Ywntk9TIpODKF5NgfARx0H+oSH+Zrj
 32bMzgZtHw/lAbYsnQ9OnTY6AEWQYt6NMuVbTiLXrMHhM3nWwfg/XoN4nZqs6jPo
 IYvE6WhQZU6L6fptGHFC
 =dRf6
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "As far as the number of commits goes, the top spot belongs to ACPI
  this time with cpufreq in the second position and a handful of PM
  core, PNP and cpuidle updates.  They are fixes and cleanups mostly, as
  usual, with a couple of new features in the mix.

  The most visible change is probably that we will create struct
  acpi_device objects (visible in sysfs) for all devices represented in
  the ACPI tables regardless of their status and there will be a new
  sysfs attribute under those objects allowing user space to check that
  status via _STA.

  Consequently, ACPI device eject or generally hot-removal will not
  delete those objects, unless the table containing the corresponding
  namespace nodes is unloaded, which is extremely rare.  Also ACPI
  container hotplug will be handled quite a bit differently and cpufreq
  will support CPU boost ("turbo") generically and not only in the
  acpi-cpufreq driver.

  Specifics:

   - ACPI core changes to make it create a struct acpi_device object for
     every device represented in the ACPI tables during all namespace
     scans regardless of the current status of that device.  In
     accordance with this, ACPI hotplug operations will not delete those
     objects, unless the underlying ACPI tables go away.

   - On top of the above, new sysfs attribute for ACPI device objects
     allowing user space to check device status by triggering the
     execution of _STA for its ACPI object.  From Srinivas Pandruvada.

   - ACPI core hotplug changes reducing code duplication, integrating
     the PCI root hotplug with the core and reworking container hotplug.

   - ACPI core simplifications making it use ACPI_COMPANION() in the
     code "glueing" ACPI device objects to "physical" devices.

   - ACPICA update to upstream version 20131218.  This adds support for
     the DBG2 and PCCT tables to ACPICA, fixes some bugs and improves
     debug facilities.  From Bob Moore, Lv Zheng and Betty Dall.

   - Init code change to carry out the early ACPI initialization
     earlier.  That should allow us to use ACPI during the timekeeping
     initialization and possibly to simplify the EFI initialization too.
     From Chun-Yi Lee.

   - Clenups of the inclusions of ACPI headers in many places all over
     from Lv Zheng and Rashika Kheria (work in progress).

   - New helper for ACPI _DSM execution and rework of the code in
     drivers that uses _DSM to execute it via the new helper.  From
     Jiang Liu.

   - New Win8 OSI blacklist entries from Takashi Iwai.

   - Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun
     Guo, Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava,
     Rashika Kheria, Tang Chen, Zhang Rui.

   - intel_pstate driver updates, including proper Baytrail support,
     from Dirk Brandewie and intel_pstate documentation from Ramkumar
     Ramachandra.

   - Generic CPU boost ("turbo") support for cpufreq from Lukasz
     Majewski.

   - powernow-k6 cpufreq driver fixes from Mikulas Patocka.

   - cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark
     Brown.

   - Assorted cpufreq drivers fixes and cleanups from Anson Huang, John
     Tobias, Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh
     Kumar.

   - cpuidle cleanups from Bartlomiej Zolnierkiewicz.

   - Support for hibernation APM events from Bin Shi.

   - Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC
     disabled during thaw transitions from Bjørn Mork.

   - PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf
     Hansson.

   - PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente
     Kurusa, Rashika Kheria.

   - New tool for profiling system suspend from Todd E Brandt and a
     cpupower tool cleanup from One Thousand Gnomes"

* tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (153 commits)
  thermal: exynos: boost: Automatic enable/disable of BOOST feature (at Exynos4412)
  cpufreq: exynos4x12: Change L0 driver data to CPUFREQ_BOOST_FREQ
  Documentation: cpufreq / boost: Update BOOST documentation
  cpufreq: exynos: Extend Exynos cpufreq driver to support boost
  cpufreq / boost: Kconfig: Support for software-managed BOOST
  acpi-cpufreq: Adjust the code to use the common boost attribute
  cpufreq: Add boost frequency support in core
  intel_pstate: Add trace point to report internal state.
  cpufreq: introduce cpufreq_generic_get() routine
  ARM: SA1100: Create dummy clk_get_rate() to avoid build failures
  cpufreq: stats: create sysfs entries when cpufreq_stats is a module
  cpufreq: stats: free table and remove sysfs entry in a single routine
  cpufreq: stats: remove hotplug notifiers
  cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly
  cpufreq: speedstep: remove unused speedstep_get_state
  platform: introduce OF style 'modalias' support for platform bus
  PM / tools: new tool for suspend/resume performance optimization
  ACPI: fix module autoloading for ACPI enumerated devices
  ACPI: add module autoloading support for ACPI enumerated devices
  ACPI: fix create_modalias() return value handling
  ...
2014-01-24 15:51:02 -08:00
Linus Torvalds
3aacd625f2 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - various misc bits
 - the rest of MM
 - add generic fixmap.h, use it
 - backlight updates
 - dynamic_debug updates
 - printk() updates
 - checkpatch updates
 - binfmt_elf
 - ramfs
 - init/
 - autofs4
 - drivers/rtc
 - nilfs
 - hfsplus
 - Documentation/
 - coredump
 - procfs
 - fork
 - exec
 - kexec
 - kdump
 - partitions
 - rapidio
 - rbtree
 - userns
 - memstick
 - w1
 - decompressors

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (197 commits)
  lib/decompress_unlz4.c: always set an error return code on failures
  romfs: fix returm err while getting inode in fill_super
  drivers/w1/masters/w1-gpio.c: add strong pullup emulation
  drivers/memstick/host/rtsx_pci_ms.c: fix ms card data transfer bug
  userns: relax the posix_acl_valid() checks
  arch/sh/kernel/dwarf.c: use rbtree postorder iteration helper instead of solution using repeated rb_erase()
  fs-ext3-use-rbtree-postorder-iteration-helper-instead-of-opencoding-fix
  fs/ext3: use rbtree postorder iteration helper instead of opencoding
  fs/jffs2: use rbtree postorder iteration helper instead of opencoding
  fs/ext4: use rbtree postorder iteration helper instead of opencoding
  fs/ubifs: use rbtree postorder iteration helper instead of opencoding
  net/netfilter/ipset/ip_set_hash_netiface.c: use rbtree postorder iteration instead of opencoding
  rbtree/test: test rbtree_postorder_for_each_entry_safe()
  rbtree/test: move rb_node to the middle of the test struct
  rapidio: add modular rapidio core build into powerpc and mips branches
  partitions/efi: complete documentation of gpt kernel param purpose
  kdump: add /sys/kernel/vmcoreinfo ABI documentation
  kdump: fix exported size of vmcoreinfo note
  kexec: add sysctl to disable kexec_load
  fs/exec.c: call arch_pick_mmap_layout() only once
  ...
2014-01-23 19:11:50 -08:00
Linus Torvalds
13c789a6b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 3.14:

   - Improved crypto_memneq helper
   - Use cyprto_memneq in arch-specific crypto code
   - Replaced orphaned DCP driver with Freescale MXS DCP driver
   - Added AVX/AVX2 version of AESNI-GCM encode and decode
   - Added AMD Cryptographic Coprocessor (CCP) driver
   - Misc fixes"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (41 commits)
  crypto: aesni - fix build on x86 (32bit)
  crypto: mxs - Fix sparse non static symbol warning
  crypto: ccp - CCP device enabled/disabled changes
  crypto: ccp - Cleanup hash invocation calls
  crypto: ccp - Change data length declarations to u64
  crypto: ccp - Check for caller result area before using it
  crypto: ccp - Cleanup scatterlist usage
  crypto: ccp - Apply appropriate gfp_t type to memory allocations
  crypto: drivers - Sort drivers/crypto/Makefile
  ARM: mxs: dts: Enable DCP for MXS
  crypto: mxs - Add Freescale MXS DCP driver
  crypto: mxs - Remove the old DCP driver
  crypto: ahash - Fully restore ahash request before completing
  crypto: aesni - fix build on x86 (32bit)
  crypto: talitos - Remove redundant dev_set_drvdata
  crypto: ccp - Remove redundant dev_set_drvdata
  crypto: crypto4xx - Remove redundant dev_set_drvdata
  crypto: caam - simplify and harden key parsing
  crypto: omap-sham - Fix Polling mode for larger blocks
  crypto: tcrypt - Added speed tests for AEAD crypto alogrithms in tcrypt test suite
  ...
2014-01-23 18:11:00 -08:00
Linus Torvalds
6dd9158ae8 Merge git://git.infradead.org/users/eparis/audit
Pull audit update from Eric Paris:
 "Again we stayed pretty well contained inside the audit system.
  Venturing out was fixing a couple of function prototypes which were
  inconsistent (didn't hurt anything, but we used the same value as an
  int, uint, u32, and I think even a long in a couple of places).

  We also made a couple of minor changes to when a couple of LSMs called
  the audit system.  We hoped to add aarch64 audit support this go
  round, but it wasn't ready.

  I'm disappearing on vacation on Thursday.  I should have internet
  access, but it'll be spotty.  If anything goes wrong please be sure to
  cc rgb@redhat.com.  He'll make fixing things his top priority"

* git://git.infradead.org/users/eparis/audit: (50 commits)
  audit: whitespace fix in kernel-parameters.txt
  audit: fix location of __net_initdata for audit_net_ops
  audit: remove pr_info for every network namespace
  audit: Modify a set of system calls in audit class definitions
  audit: Convert int limit uses to u32
  audit: Use more current logging style
  audit: Use hex_byte_pack_upper
  audit: correct a type mismatch in audit_syscall_exit()
  audit: reorder AUDIT_TTY_SET arguments
  audit: rework AUDIT_TTY_SET to only grab spin_lock once
  audit: remove needless switch in AUDIT_SET
  audit: use define's for audit version
  audit: documentation of audit= kernel parameter
  audit: wait_for_auditd rework for readability
  audit: update MAINTAINERS
  audit: log task info on feature change
  audit: fix incorrect set of audit_sock
  audit: print error message when fail to create audit socket
  audit: fix dangling keywords in audit_log_set_loginuid() output
  audit: log on errors from filter user rules
  ...
2014-01-23 18:08:10 -08:00
Vivek Goyal
77019967f0 kdump: fix exported size of vmcoreinfo note
Right now we seem to be exporting the max data size contained inside
vmcoreinfo note.  But this does not include the size of meta data around
vmcore info data.  Like name of the note and starting and ending elf_note.

I think user space expects total size and that size is put in PT_NOTE elf
header.  Things seem to be fine so far because we are not using vmcoreinfo
note to the maximum capacity.  But as it starts filling up, to capacity,
at some point of time, problem will be visible.

I don't think user space will be broken with this change.  So there is no
need to introduce vmcoreinfo2.  This change is safe and backward
compatible.  More explanation on why this change is safe is below.

vmcoreinfo contains information about kernel which user space needs to
know to do things like filtering.  For example, various kernel config
options or information about size or offset of some data structures etc.
All this information is commmunicated to user space with an ELF note
present in ELF /proc/vmcore file.

Currently vmcoreinfo data size is 4096.  With some elf note meta data
around it, actual size is 4132 bytes.  But we are using barely 25% of that
size.  Rest is empty.  So even if we tell user space that size of ELf note
is 4096 and not 4132, nothing will be broken becase after around 1000
bytes, everything is zero anyway.

But once we start filling up the note to the capacity, and not report the
full size of note, bad things will start happening.  Either some data will
be lost or tools will be confused that they did not fine the zero note at
the end.

So I think this change is safe and should not break existing tools.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Cc: Dan Aloni <da-x@monatomic.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:03 -08:00
Kees Cook
7984754b99 kexec: add sysctl to disable kexec_load
For general-purpose (i.e.  distro) kernel builds it makes sense to build
with CONFIG_KEXEC to allow end users to choose what kind of things they
want to do with kexec.  However, in the face of trying to lock down a
system with such a kernel, there needs to be a way to disable kexec_load
(much like module loading can be disabled).  Without this, it is too easy
for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM
and modules_disabled are set.  With this change, it is still possible to
load an image for use later, then disable kexec_load so the image (or lack
of image) can't be altered.

The intention is for using this in environments where "perfect"
enforcement is hard.  Without a verified boot, along with verified
modules, and along with verified kexec, this is trying to give a system a
better chance to defend itself (or at least grow the window of
discoverability) against attack in the face of a privilege escalation.

In my mind, I consider several boot scenarios:

1) Verified boot of read-only verified root fs loading fd-based
   verification of kexec images.
2) Secure boot of writable root fs loading signed kexec images.
3) Regular boot loading kexec (e.g. kcrash) image early and locking it.
4) Regular boot with no control of kexec image at all.

1 and 2 don't exist yet, but will soon once the verified kexec series has
landed.  4 is the state of things now.  The gap between 2 and 4 is too
large, so this change creates scenario 3, a middle-ground above 4 when 2
and 1 are not possible for a system.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:03 -08:00
Oleg Nesterov
8d38f203b4 kernel/signal.c: change do_signal_stop/do_sigaction to use while_each_thread()
Change do_signal_stop() and do_sigaction() to avoid next_thread() and use
while_each_thread() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Oleg Nesterov
2e1f383582 kernel/sys.c: k_getrusage() can use while_each_thread()
Change k_getrusage() to use while_each_thread(), no changes in the
compiled code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Oleg Nesterov
98611e4e6a exec: kill task_struct->did_exec
We can kill either task->did_exec or PF_FORKNOEXEC, they are mutually
exclusive.  The patch kills ->did_exec because it has a single user.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Daeseok Youn
68ce670b6e kernel/fork.c: remove redundant NULL check in dup_mm()
current->mm doesn't need a NULL check in dup_mm().  Becasue dup_mm() is
used only in copy_mm() and current->mm is checked whether it is NULL or
not in copy_mm() before calling dup_mm().

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Daeseok Youn
5d59e18270 kernel/fork.c: fix coding style issues
Fix errors reported by checkpatch.pl.  One error is parentheses, the other
is a whitespace issue.

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
DaeSeok Youn
ff252c1fc5 kernel/fork.c: make dup_mm() static
dup_mm() is used only in kernel/fork.c

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:02 -08:00
Arun KS
1d3fa37034 printk: flush conflicting continuation line
An earlier newline was missing and current print is from different task.
In this scenario flush the continuation line and store this line
seperatly.

This patch fix the below scenario of timestamp interleaving,
   [   28.154370 ] read_word_reg : reg[0x 3], reg[0x 4]  data [0x 642]
   [   28.155428 ] uart disconnect
   [   31.947341 ] dvfs[cpufreq.c<275>]:plug-in cpu<1> done
   [   28.155445 ] UART detached : send switch state 201
   [   32.014112 ] read_reg : reg[0x 3] data[0x21]

[akpm@linux-foundation.org: simplify and condense the code]
Signed-off-by: Arun KS <getarunks@gmail.com>
Signed-off-by: Arun KS <arun.ks@broadcom.com>
Cc: Joe Perches <joe@perches.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:56 -08:00
Andi Kleen
54a43d5498 numa: add a sysctl for numa_balancing
Add a working sysctl to enable/disable automatic numa memory balancing
at runtime.

This allows us to track down performance problems with this feature and
is generally a good idea.

This was possible earlier through debugfs, but only with special
debugging options set.  Also fix the boot message.

[akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Steven Rostedt (Red Hat)
3132e107d6 tracing: Check if tracing is enabled in trace_puts()
If trace_puts() is used very early in boot up, it can crash the machine
if it is called before the ring buffer is allocated. If a trace_printk()
is used with no arguments, then it will be converted into a trace_puts()
and suffer the same fate.

Cc: stable@vger.kernel.org # 3.10+
Fixes: 09ae72348e "tracing: Add trace_puts() for even faster trace_printk() tracing"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-23 12:27:59 -05:00