kernel-fxtec-pro1x/kernel
Venkatesh Pallipadi 2f36825b17 sched: Next buddy hint on sleep and preempt path
When a task in a taskgroup sleeps, pick_next_task starts all the way back at
the root and picks the task/taskgroup with the min vruntime across all
runnable tasks.

But when there are many frequently sleeping tasks across different taskgroups,
it makes better sense to stay with same taskgroup for its slice period (or
until all tasks in the taskgroup sleeps) instead of switching cross taskgroup
on each sleep after a short runtime.

This helps specifically where taskgroups corresponds to a process with
multiple threads. The change reduces the number of CR3 switches in this case.

Example:

Two taskgroups with 2 threads each which are running for 2ms and
sleeping for 1ms. Looking at sched:sched_switch shows:

BEFORE: taskgroup_1 threads [5004, 5005], taskgroup_2 threads [5016, 5017]
      cpu-soaker-5004  [003]  3683.391089
      cpu-soaker-5016  [003]  3683.393106
      cpu-soaker-5005  [003]  3683.395119
      cpu-soaker-5017  [003]  3683.397130
      cpu-soaker-5004  [003]  3683.399143
      cpu-soaker-5016  [003]  3683.401155
      cpu-soaker-5005  [003]  3683.403168
      cpu-soaker-5017  [003]  3683.405170

AFTER: taskgroup_1 threads [21890, 21891], taskgroup_2 threads [21934, 21935]
      cpu-soaker-21890 [003]   865.895494
      cpu-soaker-21935 [003]   865.897506
      cpu-soaker-21934 [003]   865.899520
      cpu-soaker-21935 [003]   865.901532
      cpu-soaker-21934 [003]   865.903543
      cpu-soaker-21935 [003]   865.905546
      cpu-soaker-21891 [003]   865.907548
      cpu-soaker-21890 [003]   865.909560
      cpu-soaker-21891 [003]   865.911571
      cpu-soaker-21890 [003]   865.913582
      cpu-soaker-21891 [003]   865.915594
      cpu-soaker-21934 [003]   865.917606

Similar problem is there when there are multiple taskgroups and say a task A
preempts currently running task B of taskgroup_1. On schedule, pick_next_task
can pick an unrelated task on taskgroup_2. Here it would be better to give some
preference to task B on pick_next_task.

A simple (may be extreme case) benchmark I tried was tbench with 2 tbench
client processes with 2 threads each running on a single CPU. Avg throughput
across 5 50 sec runs was:

 BEFORE: 105.84 MB/sec
 AFTER:  112.42 MB/sec

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1302802253-25760-1-git-send-email-venki@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-19 10:08:38 +02:00
..
debug Fix common misspellings 2011-03-31 11:26:23 -03:00
gcov Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-03-20 18:14:55 -07:00
irq Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-04-07 12:12:58 -07:00
power fix XEN_SAVE_RESTORE Kconfig dependencies 2011-04-11 22:54:48 +02:00
time Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
trace Fix common misspellings 2011-03-31 11:26:23 -03:00
.gitignore
acct.c
async.c
audit.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
audit.h audit: make functions static 2010-10-30 01:42:19 -04:00
audit_tree.c Fix common misspellings 2011-03-31 11:26:23 -03:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c Fix common misspellings 2011-03-31 11:26:23 -03:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c userns: make has_capability* into real functions 2011-03-23 19:47:06 -07:00
cgroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
cgroup_freezer.c cgroup_freezer: update_freezer_state() does incorrect state transitions 2010-10-27 18:03:08 -07:00
compat.c posix-timers: Introduce a syscall for clock tuning. 2011-02-02 15:28:19 +01:00
configs.c
cpu.c Fix common misspellings 2011-03-31 11:26:23 -03:00
cpuset.c sched: Dynamic sched_domain::level 2011-04-11 14:09:32 +02:00
crash_dump.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
cred.c userns: security: make capabilities relative to the user namespace 2011-03-23 19:47:02 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Fix common misspellings 2011-03-31 11:26:23 -03:00
extable.c
fork.c Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block 2011-03-24 10:16:26 -07:00
freezer.c Freezer: Fix a race during freezing of TASK_STOPPED tasks 2010-12-24 15:02:40 +01:00
futex.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
hung_task.c
hw_breakpoint.c perf: Dynamic pmu types 2010-12-16 11:36:43 +01:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c jump label: Make arch_jump_label_text_poke_early() optional 2010-10-29 12:56:13 -04:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 2011-04-07 11:14:49 -07:00
kfifo.c
kmod.c
kprobes.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
ksysfs.c
kthread.c Fix common misspellings 2011-03-31 11:26:23 -03:00
latencytop.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
Makefile crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
module.c Fix common misspellings 2011-03-31 11:26:23 -03:00
mutex-debug.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c
ns_cgroup.c cgroup: notify ns_cgroup deprecated 2010-10-27 18:03:09 -07:00
nsproxy.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c move x86 specific oops=panic to generic code 2011-03-22 17:44:11 -07:00
params.c Fix common misspellings 2011-03-31 11:26:23 -03:00
perf_event.c perf: Fix task_struct reference leak 2011-03-31 13:02:56 +02:00
pid.c export pid symbols needed for kvm_vcpu_on_spin 2011-03-17 13:08:28 -03:00
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
pm_qos_params.c PM QoS: Make pm_qos settings readable 2011-03-15 00:43:18 +01:00
posix-cpu-timers.c Fix common misspellings 2011-03-31 11:26:23 -03:00
posix-timers.c Fix common misspellings 2011-03-31 11:26:23 -03:00
printk.c printk: allow setting DEFAULT_MESSAGE_LEVEL via Kconfig 2011-03-22 17:44:13 -07:00
profile.c
ptrace.c userns: allow ptrace from non-init user namespaces 2011-03-23 19:47:05 -07:00
range.c kernel/range.c: fix clean_sort_range() for the case of full array 2010-11-12 07:55:31 -08:00
rcupdate.c rcu: add comment saying why DEBUG_OBJECTS_RCU_HEAD depends on PREEMPT. 2011-03-04 08:05:41 -08:00
rcutiny.c rcu: avoid pointless blocked-task warnings 2011-01-14 04:58:08 -08:00
rcutiny_plugin.h rcu: call __rcu_read_unlock() in exit_rcu for tiny RCU 2011-03-04 08:05:08 -08:00
rcutorture.c rcutorture: Get rid of duplicate sched.h include 2011-03-04 08:05:17 -08:00
rcutree.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
rcutree.h rcu: limit rcu_node leaf-level fanout 2010-12-17 12:34:20 -08:00
rcutree_plugin.h rcu: increase synchronize_sched_expedited() batching 2010-12-17 12:34:08 -08:00
rcutree_trace.c rcu,cleanup: simplify the code when cpu is dying 2010-11-29 22:01:58 -08:00
relay.c Clean up relay_alloc_page_array() slightly by using vzalloc rather than vmalloc and memset 2010-11-05 08:21:34 -07:00
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c resources: add arch hook for preventing allocation in reserved areas 2010-12-17 10:01:09 -08:00
rtmutex-debug.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex.h
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rwsem.c
sched.c Merge branch 'sched/locking' into sched/core 2011-04-18 14:53:33 +02:00
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched, autogroup: Stop going ahead if autogroup is disabled 2011-02-23 11:33:59 +01:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Provide p->on_rq 2011-04-14 08:52:35 +02:00
sched_fair.c sched: Next buddy hint on sleep and preempt path 2011-04-19 10:08:38 +02:00
sched_features.h sched: Move the second half of ttwu() to the remote cpu 2011-04-14 08:52:41 +02:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_stats.h sched_stat: Update sched_info_queue/dequeue() code comments 2010-10-24 13:29:01 +02:00
sched_stoptask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
seccomp.c
semaphore.c
signal.c signal.c: fix erroneous syscall kernel-doc 2011-04-08 11:05:24 -07:00
smp.c smp: move smp setup functions to kernel/smp.c 2011-03-22 17:44:11 -07:00
softirq.c Fix common misspellings 2011-03-31 11:26:23 -03:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c
stop_machine.c kthread: use kthread_create_on_node() 2011-03-22 17:44:01 -07:00
sys.c userns: user namespaces: convert all capable checks in kernel/sys.c 2011-03-23 19:47:06 -07:00
sys_ni.c vfs: Add open by file handle support 2011-03-15 02:21:44 -04:00
sysctl.c sysctl: restrict write access to dmesg_restrict 2011-03-23 19:46:54 -07:00
sysctl_binary.c open-style analog of vfs_path_lookup() 2011-03-14 09:15:28 -04:00
sysctl_check.c sysctl_check: drop dead code 2011-03-23 19:46:51 -07:00
taskstats.c taskstats: use appropriate printk priority level 2011-03-23 19:47:14 -07:00
test_kprobes.c
time.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
timeconst.pl
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
tracepoint.c tracepoints: Fix section alignment using pointer array 2011-02-03 09:28:46 -05:00
tsacct.c taskstats: use real microsecond granularity for CPU times 2010-10-27 18:03:17 -07:00
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
utsname.c userns: allow sethostname in a container 2011-03-23 19:47:03 -07:00
utsname_sysctl.c
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events 2011-03-22 17:44:12 -07:00
workqueue.c Fix common misspellings 2011-03-31 11:26:23 -03:00
workqueue_sched.h