speed up schedule(): share the 'now' parameter that deactivate_task()
was calculating internally.
( this also fixes the small accounting window between the deactivate
call and the pick_next_task() call. )
Signed-off-by: Ingo Molnar <mingo@elte.hu>
uninline rq_clock() to save 263 bytes of code:
text data bss dec hex filename
39561 3642 24 43227 a8db sched.o.before
39298 3642 24 42964 a7d4 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
here's another tiny cleanup. The generated code is not affected (gcc is
smart enough) but for people looking over the code it is just irritating
to have the extra conditional.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The move_tasks() function is currently multiplexed with two distinct
capabilities:
1. attempt to move a specified amount of weighted load from one run
queue to another; and
2. attempt to move a specified number of tasks from one run queue to
another.
The first of these capabilities is used in two places, load_balance()
and load_balance_idle(), and in both of these cases the return value of
move_tasks() is used purely to decide if tasks/load were moved and no
notice of the actual number of tasks moved is taken.
The second capability is used in exactly one place,
active_load_balance(), to attempt to move exactly one task and, as
before, the return value is only used as an indicator of success or failure.
This multiplexing of sched_task() was introduced, by me, as part of the
smpnice patches and was motivated by the fact that the alternative, one
function to move specified load and one to move a single task, would
have led to two functions of roughly the same complexity as the old
move_tasks() (or the new balance_tasks()). However, the new modular
design of the new CFS scheduler allows a simpler solution to be adopted
and this patch addresses that solution by:
1. adding a new function, move_one_task(), to be used by
active_load_balance(); and
2. making move_tasks() a single purpose function that tries to move a
specified weighted load and returns 1 for success and 0 for failure.
One of the consequences of these changes is that neither move_one_task()
or the new move_tasks() care how many tasks sched_class.load_balance()
moves and this enables its interface to be simplified by returning the
amount of load moved as its result and removing the load_moved pointer
from the argument list. This helps simplify the new move_tasks() and
slightly reduces the amount of work done in each of
sched_class.load_balance()'s implementations.
Further simplification, e.g. changes to balance_tasks(), are possible
but (slightly) complicated by the special needs of load_balance_fair()
so I've left them to a later patch (if this one gets accepted).
NB Since move_tasks() gets called with two run queue locks held even
small reductions in overhead are worthwhile.
[ mingo@elte.hu ]
this change also reduces code size nicely:
text data bss dec hex filename
39216 3618 24 42858 a76a sched.o.before
39173 3618 24 42815 a73f sched.o.after
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Williams suggested to flip the order of update_cpu_load(rq) with
the ->task_tick() call. This is a NOP for the current scheduler (the
two functions are independent of each other), ->task_tick() might
create some state for update_cpu_load() in the future (or in PlugSched).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move the rest of the debugging/instrumentation code to under
CONFIG_SCHEDSTATS too. This reduces code size and speeds code up:
text data bss dec hex filename
33044 4122 28 37194 914a sched.o.before
32708 4122 28 36858 8ffa sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. The only place that RTPRIO_TO_LOAD_WEIGHT() is used is in the call to
move_tasks() in the function active_load_balance() and its purpose here
is just to make sure that the load to be moved is big enough to ensure
that exactly one task is moved (if there's one available). This can be
accomplished by using ULONG_MAX instead and this allows
RTPRIO_TO_LOAD_WEIGHT() to be deleted.
2. This, in turn, allows PRIO_TO_LOAD_WEIGHT() to be deleted.
3. This allows load_weight() to be deleted which allows
TIME_SLICE_NICE_ZERO to be deleted along with the comment above it.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix kernel-doc warnings in sched.c:
Warning(linux-2623-rc1g4//kernel/sched.c:1685): No description found for parameter 'notifier'
Warning(linux-2623-rc1g4//kernel/sched.c:1696): No description found for parameter 'notifier'
Warning(linux-2623-rc1g4//kernel/sched.c:1750): No description found for parameter 'prev'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
debugging feature: make the sched-domains tree runtime-tweakable.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ mingo@elte.hu: made it depend on CONFIG_SCHED_DEBUG & small updates ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
it is enough to disable interrupts to get the precise rq-clock
of the local CPU.
this also solves an NMI watchdog regression: the NMI watchdog
calls touch_softlockup_watchdog(), which might deadlock on
rq->lock if the NMI hits an rq-locked critical section.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds a general mechanism whereby a task can request the scheduler to
notify it whenever it is preempted or scheduled back in. This allows the
task to swap any special-purpose registers like the fpu or Intel's VT
registers.
Signed-off-by: Avi Kivity <avi@qumranet.com>
[ mingo@elte.hu: fixes, cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement the cpu_clock(cpu) interface for kernel-internal use:
high-speed (but slightly incorrect) per-cpu clock constructed from
sched_clock().
This API, unused at the moment, will be used in the future by blktrace,
by the softlockup-watchdog, by printk and by lockstat.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
nr_moved is not the correct check for triggering all pinned logic. Fix
the all pinned logic in the case of load_balance_newidle().
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the presence of SMT, newly idle balance was never happening for
multi-core and SMP domains (even when both the logical siblings are
idle).
If thread 0 is already idle and when thread 1 is about to go to idle,
newly idle load balance always think that one of the threads is not idle
and skips doing the newly idle load balance for multi-core and SMP
domains.
This is because of the idle_cpu() macro, which checks if the current
process on a cpu is an idle process. But this is not the case for the
thread doing the load_balance_newidle().
Fix this by using runqueue's nr_running field instead of idle_cpu(). And
also skip the logic of 'only one idle cpu in the group will be doing
load balancing' during newly idle case.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently most of the per cpu data, which is accessed by different cpus,
has a ____cacheline_aligned_in_smp attribute. Move all this data to the
new per cpu shared data section: .data.percpu.shared_aligned.
This will seperate the percpu data which is referenced frequently by other
cpus from the local only percpu data.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
improve the comments around the wmult array (which controls the weight
of niced tasks). Clarify that to achieve a 10% difference in CPU
utilization, a weight multiplier of 1.25 has to be used.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Roman Zippel noticed another inconsistency of the wmult table.
wmult[16] has a missing digit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix show_task()/show_tasks() output:
- there's no sibling info anymore
- the fields were not aligned properly with the description
- get rid of the lazy-TLB output: it's been quite some time since
we last had a bug there, and when we had a bug it wasnt helped a
bit by this debug output.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a typo in the values in prio_to_wmult[] for nice level 1. While
it did not cause bad CPU distribution, but caused more rescheduling
between nice-0 and nice-1 tasks than necessary.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
add credits for recent major scheduler contributions:
Con Kolivas, for pioneering the fair-scheduling approach
Peter Williams, for smpnice
Mike Galbraith, for interactivity tuning of CFS
Srivatsa Vaddagiri, for group scheduling enhancements
Signed-off-by: Ingo Molnar <mingo@elte.hu>
clean up the sleep_on() APIs:
- do not use fastcall
- replace fragile macro magic with proper inline functions
Signed-off-by: Ingo Molnar <mingo@elte.hu>
4 small style cleanups to sched.c: checkpatch.pl is now happy about
the totality of sched.c [ignoring false positives] - yay! ;-)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
track TSC-unstable events and propagate it to the scheduler code.
Also allow sched_clock() to be used when the TSC is unstable,
the rq_clock() wrapper creates a reliable clock out of it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
apply the CFS core code.
this change switches over the scheduler core to CFS's modular
design and makes use of kernel/sched_fair/rt/idletask.c to implement
Linux's scheduling policies.
thanks to Andrew Morton and Thomas Gleixner for lots of detailed review
feedback and for fixlets.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
remove the sleep-bonus interactivity code from the core scheduler.
scheduling policy is implemented in the policy modules, and CFS does
not need such type of heuristics.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the expired_starving() heuristics from the core scheduler.
CFS does not need it, and this did not really work well in practice
anyway, due to the rq->nr_running multiplier to STARVATION_LIMIT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
remove the sleep_type heuristics from the core scheduler - scheduling
policy is implemented in the scheduling-policy modules. (and CFS does
not use this type of sleep-type heuristics)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cleanup: move dequeue/enqueue_task() to a more logical place, to
not split up __normal_prio()/normal_prio().
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move resched_task()/resched_cpu() into the 'public interfaces'
section of sched.c, for use by kernel/sched_fair/rt/idletask.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add rq_clock()/__rq_clock(), a robust wrapper around sched_clock(),
used by CFS. It protects against common type of sched_clock() problems
(caused by hardware): time warps forwards and backwards.
Signed-off-by: Ingo Molnar <mingo@elte.hu>