kernel-fxtec-pro1x


Srivatsa Vaddagiri 6b2d770026 sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups

The current load balancing scheme isn't good enough for precise group
fairness.

For example: on an 8-cpu system, I created 3 groups as under:

    a = 8 tasks (cpu.shares = 1024)
    b = 4 tasks (cpu.shares = 1024)
    c = 3 tasks (cpu.shares = 1024)

a, b and c are task groups that have equal weight. We would expect each
of the groups to receive 33.33% of cpu bandwidth under a fair scheduler.

This is what I get with the latest scheduler git tree:

--------------------------------------------------------------------------------
Col1  | Col2    | Col3  | Col4
------|---------|-------|-------------------------------------------------------
a     | 277.676 | 57.8% | 54.1%  54.1%  54.1%  54.2%  56.7%  62.2%  62.8%  64.5%
b     | 116.108 | 24.2% | 47.4%  48.1%  48.7%  49.3%
c     |  86.326 | 18.0% | 47.5%  47.9%  48.5%
--------------------------------------------------------------------------------

Explanation of o/p:

Col1 -> Group name
Col2 -> Cumulative execution time (in seconds) received by all tasks of that
        group in a 60sec window across 8 cpus
Col3 -> CPU bandwidth received by the group in the 60sec window, expressed in
        percentage. Col3 data is derived as:
            Col3 = 100 * Col2 / (NR_CPUS * 60)
Col4 -> CPU bandwidth received by each individual task of the group.
            Col4 = 100 * cpu_time_recd_by_task / 60

[I can share the test case that produces a similar o/p if reqd]

The deviation from desired group fairness is as below:

    a = +24.47%
    b = -9.13%
    c = -15.33%

which is quite high.
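The Col3 and deviation figures quoted above can be re-derived from Col2. The following is a minimal sketch, not part of the commit: the function names are hypothetical, and it assumes the deviations were computed from Col3 rounded to one decimal place against a fair share of 33.33%, which is what the published numbers suggest.

```python
NR_CPUS = 8          # cpus in the test system, per the commit message
WINDOW_SECS = 60     # measurement window, per the commit message
FAIR_SHARE = 33.33   # expected share for three equal-weight groups

# Col2: cumulative execution time (seconds) per group, before the patch.
exec_time = {"a": 277.676, "b": 116.108, "c": 86.326}


def bandwidth_pct(col2, nr_cpus=NR_CPUS, window=WINDOW_SECS):
    """Col3 = 100 * Col2 / (NR_CPUS * 60)."""
    return 100 * col2 / (nr_cpus * window)


def deviation(col2):
    """Deviation of the rounded Col3 value from the fair share (assumed method)."""
    return round(bandwidth_pct(col2), 1) - FAIR_SHARE


if __name__ == "__main__":
    for group, secs in exec_time.items():
        print(f"{group}: {bandwidth_pct(secs):.1f}%  deviation {deviation(secs):+.2f}%")
```

The same helpers can be pointed at the post-patch Col2 values to check the second set of deviations.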

After the patch below is applied, here are the results:

--------------------------------------------------------------------------------
Col1  | Col2    | Col3  | Col4
------|---------|-------|-------------------------------------------------------
a     | 163.112 | 34.0% | 33.2%  33.4%  33.5%  33.5%  33.7%  34.4%  34.8%  35.3%
b     | 156.220 | 32.5% | 63.3%  64.5%  66.1%  66.5%
c     | 160.653 | 33.5% | 85.8%  90.6%  91.4%
--------------------------------------------------------------------------------

Deviation from desired group fairness is as below:

    a = +0.67%
    b = -0.83%
    c = +0.17%

which is far better IMO. Most of the other runs have yielded a deviation within
+-2% at the most, which is good.

Why do we see bad (group) fairness with current scheduler?
==========================================================

Currently a cpu's weight is just the summation of individual task weights.
This can yield incorrect results. For ex: consider three groups as below
on a 2-cpu system:

    CPU0        CPU1
---------------------------
    A (10)      B(5)
                C(5)
---------------------------

Group A has 10 tasks, all on CPU0. Groups B and C have 5 tasks each, all
of which are on CPU1. Each task has the same weight (NICE_0_LOAD = 1024).

The current scheme would yield a cpu weight of 10240 (10*1024) for each cpu,
so the load balancer will think both CPUs are perfectly balanced and won't
move any tasks around. This, however, would yield this bandwidth:

    A = 50%
    B = 25%
    C = 25%

which is not the desired result.

What's changing in the patch?
=============================

    - How cpu weights are calculated when CONFIG_FAIR_GROUP_SCHED is
      defined (see below)
    - API Change
        - Two tunables introduced in sysfs (under SCHED_DEBUG) to
          control the frequency at which the load balance monitor
          thread runs.

The basic change made in this patch is how cpu weight (rq->load.weight) is
calculated. It's now calculated as the summation of group weights on a cpu,
rather than summation of task weights.
Weight exerted by a group on a cpu is dependent on the shares allocated to it
and also the number of tasks the group has on that cpu compared to the total
number of (runnable) tasks the group has in the system.

Let,
    W(K,i)  = Weight of group K on cpu i
    T(K,i)  = Task load present in group K's cfs_rq on cpu i
    T(K)    = Total task load of group K across various cpus
    S(K)    = Shares allocated to group K
    NRCPUS  = Number of online cpus in the scheduler domain to
              which group K is assigned.

Then,
    W(K,i) = S(K) * NRCPUS * T(K,i) / T(K)

A load balance monitor thread is created at bootup, which periodically
runs and adjusts each group's weight on each cpu. To limit its overhead, two
min/max tunables are introduced (under SCHED_DEBUG) to control the rate at
which it runs.

Fixes from: Peter Zijlstra <a.p.zijlstra@chello.nl>

- don't start the load_balance_monitor when there is only a single cpu.
- rename the kthread because it's currently longer than TASK_COMM_LEN

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

2008-01-25 21:08:00 +01:00
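To make the difference between the two weighting schemes concrete, here is a small sketch that applies both to the 2-cpu example from the commit message. It is an illustration in Python, not the kernel's implementation: the helper names are hypothetical, and it assumes groups A, B and C each have 1024 shares (the message does not state their shares) and that NRCPUS is 2.

```python
NICE_0_LOAD = 1024   # per-task weight used in the commit's example
NRCPUS = 2           # assumption: both cpus are in the group's scheduler domain

# Runnable tasks per group, indexed by cpu: A has 10 on CPU0, B and C have 5 on CPU1.
tasks = {"A": [10, 0], "B": [0, 5], "C": [0, 5]}
# Assumption: equal shares for every group (not stated in the commit message).
shares = {"A": 1024, "B": 1024, "C": 1024}


def cpu_weight_by_tasks(cpu):
    """Old scheme: cpu weight is the sum of the task weights on that cpu."""
    return sum(per_cpu[cpu] * NICE_0_LOAD for per_cpu in tasks.values())


def group_weight_on_cpu(group, cpu):
    """W(K,i) = S(K) * NRCPUS * T(K,i) / T(K)."""
    load = [n * NICE_0_LOAD for n in tasks[group]]
    total = sum(load)
    if total == 0:
        return 0.0   # a group with no runnable tasks exerts no weight
    return shares[group] * NRCPUS * load[cpu] / total


def cpu_weight_by_groups(cpu):
    """New scheme: cpu weight is the sum of the group weights on that cpu."""
    return sum(group_weight_on_cpu(g, cpu) for g in tasks)


if __name__ == "__main__":
    for cpu in range(NRCPUS):
        print(f"CPU{cpu}: task-sum {cpu_weight_by_tasks(cpu)}, "
              f"group-sum {cpu_weight_by_groups(cpu):.0f}")
```

Under these assumptions the task-sum weights are equal on both cpus, so nothing looks imbalanced, while the group-sum weights differ (CPU1 carries two groups' worth of weight), which gives the load balancer a reason to move tasks.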
..
irq	genirq: revert lazy irq disable for simple irqs	2007-12-18 18:05:58 +01:00
power	driver core: make /sys/power a kobject	2008-01-24 20:40:25 -08:00
time	Driver core: change sysdev classes to use dynamic kobject names	2008-01-24 20:40:40 -08:00
.gitignore
acct.c	acct: real_parent ppid	2008-01-07 14:55:37 -08:00
audit.c	[PATCH] audit: watching subtrees	2007-10-21 02:37:45 -04:00
audit.h	[PATCH] audit: watching subtrees	2007-10-21 02:37:45 -04:00
audit_tree.c	[PATCH] audit: watching subtrees	2007-10-21 02:37:45 -04:00
auditfilter.c	[PATCH] audit: watching subtrees	2007-10-21 02:37:45 -04:00
auditsc.c	auditsc: fix kernel-doc param warnings	2007-10-22 19:40:02 -07:00
capability.c	Uninline find_pid etc set of functions	2007-10-19 11:53:41 -07:00
cgroup.c	Improve cgroup printks	2007-11-14 18:45:37 -08:00
cgroup_debug.c	Task Control Groups: simple task cgroup debug info subsystem	2007-10-19 11:53:36 -07:00
compat.c	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt	2007-10-18 15:12:41 -07:00
configs.c
cpu.c	CPU HOTPLUG: avoid hotadd when proper possible_map isn't specified	2007-10-19 11:53:44 -07:00
cpuset.c	hotplug cpu: migrate a task within its cpuset	2007-10-19 11:53:44 -07:00
delayacct.c	Add scaled time to taskstats based process accounting	2007-10-18 14:37:28 -07:00
dma.c	whitespace fixes: DMA channel allocator	2007-10-18 14:37:24 -07:00
exec_domain.c	whitespace fixes: execution domains	2007-10-18 14:37:26 -07:00
exit.c	wait_task_stopped(): pass correct exit_code to wait_noreap_copyout()	2007-11-29 09:24:55 -08:00
extable.c
fork.c	fix clone(CLONE_NEWPID)	2007-12-05 09:21:18 -08:00
futex.c	futex: Prevent stale futex owner when interrupted/timeout	2008-01-08 16:21:39 -08:00
futex_compat.c	[FUTEX] Fix address computation in compat code.	2007-11-09 16:13:08 -08:00
hrtimer.c	hrtimer: fix section mismatch	2008-01-21 19:39:41 -08:00
itimer.c	whitespace fixes: interval timers	2007-10-18 14:37:26 -07:00
kallsyms.c	FRV: fix the extern declaration of kallsyms_num_syms	2007-11-29 09:24:54 -08:00
Kconfig.hz
Kconfig.instrumentation	Tiny clean-up of OPROFILE/KPROBES configuration	2007-12-06 09:41:12 -08:00
Kconfig.preempt	Move PREEMPT_NOTIFIERS into an always-included Kconfig	2007-10-17 08:42:55 -07:00
kexec.c	vmcoreinfo: add the array length of "free_list" for filtering free pages	2008-01-08 16:10:36 -08:00
kfifo.c	is_power_of_2: kernel/kfifo.c	2007-07-16 09:05:50 -07:00
kmod.c	Fix unbalanced helper_lock in kernel/kmod.c	2008-01-17 15:38:59 -08:00
kprobes.c	kprobes: support kretprobe blacklist	2007-10-16 09:43:10 -07:00
ksysfs.c	Kobject: convert remaining kobject_unregister() to kobject_put()	2008-01-24 20:40:40 -08:00
kthread.c	kthread: silence bogus section mismatch warning	2007-07-31 15:39:42 -07:00
latency.c
lockdep.c	lockdep: fix kernel crash on module unload	2008-01-24 08:01:09 -08:00
lockdep_internals.h
lockdep_proc.c	lockdep: Avoid /proc/lockdep & lock_stat infinite output	2007-10-11 22:11:11 +02:00
Makefile	revert "Task Control Groups: example CPU accounting subsystem"	2007-11-14 18:45:40 -08:00
marker.c	Linux Kernel Markers: fix marker mutex not taken upon module load	2007-11-14 18:45:40 -08:00
module.c	Kobject: convert remaining kobject_unregister() to kobject_put()	2008-01-24 20:40:40 -08:00
mutex-debug.c
mutex-debug.h
mutex.c	lockdep: fixup mutex annotations	2007-10-11 22:11:12 +02:00
mutex.h
notifier.c	Add kernel/notifier.c	2007-10-19 11:53:34 -07:00
ns_cgroup.c	cgroups: implement namespace tracking subsystem	2007-10-19 11:53:37 -07:00
nsproxy.c	pid namespaces: allow cloning of new namespace	2007-10-19 11:53:39 -07:00
panic.c	debug: add end-of-oops marker	2007-12-20 15:01:17 +01:00
params.c	Modules: remove unneeded release function	2008-01-24 20:40:39 -08:00
pid.c	pidns: Place under CONFIG_EXPERIMENTAL	2007-11-14 18:45:43 -08:00
posix-cpu-timers.c	Isolate some explicit usage of task->tgid	2007-10-19 11:53:40 -07:00
posix-timers.c	Isolate some explicit usage of task->tgid	2007-10-19 11:53:40 -07:00
printk.c	sched: remove printk_clock()	2008-01-25 21:07:59 +01:00
profile.c	sched: document profile=sleep requiring CONFIG_SCHEDSTATS	2007-10-24 18:23:50 +02:00
ptrace.c	ptrace: Call arch_ptrace_attach() when request=PTRACE_TRACEME	2008-01-25 08:31:39 +01:00
rcupdate.c	rcu: fix section mismatch	2008-01-22 09:17:48 -08:00
rcutorture.c	Make rcutorture RNG use temporal entropy	2007-10-17 08:42:53 -07:00
relay.c	whitespace fixes: relayfs	2007-10-18 14:37:24 -07:00
resource.c	Add IORESOUCE_BUSY flag for System RAM	2007-11-14 18:45:39 -08:00
rtmutex-debug.c	Use helpers to obtain task pid in printks	2007-10-19 11:53:43 -07:00
rtmutex-debug.h
rtmutex-tester.c	Driver core: change sysdev classes to use dynamic kobject names	2008-01-24 20:40:40 -08:00
rtmutex.c	Use helpers to obtain task pid in printks	2007-10-19 11:53:43 -07:00
rtmutex.h
rtmutex_common.h	FUTEX: Tidy up the code	2007-07-16 09:05:49 -07:00
rwsem.c	sched: mark rwsem functions as __sched for wchan/profiling	2007-12-18 15:21:13 +01:00
sched.c	sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups	2008-01-25 21:08:00 +01:00
sched_debug.c	sched: fix gcc warnings	2007-12-30 17:24:35 +01:00
sched_fair.c	sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups	2008-01-25 21:08:00 +01:00
sched_idletask.c	sched: isolate SMP balancing code a bit more	2007-10-24 18:23:51 +02:00
sched_rt.c	sched: group scheduling, change how cpu load is calculated	2008-01-25 21:08:00 +01:00
sched_stats.h	sched: clean up kernel/sched_stat.h	2007-11-28 15:52:56 +01:00
seccomp.c	make seccomp zerocost in schedule	2007-07-16 09:05:50 -07:00
signal.c	sigwait eats blocked default-ignore signals	2007-11-12 16:05:23 -08:00
softirq.c	[KERNEL]: Unexport raise_softirq_irqoff	2007-10-10 16:49:18 -07:00
softlockup.c	Use helpers to obtain task pid in printks	2007-10-19 11:53:43 -07:00
spinlock.c	lockstat: hook into spinlock_t, rwlock_t, rwsem and mutex	2007-07-19 10:04:49 -07:00
srcu.c
stacktrace.c
stop_machine.c	Fix stop_machine_run problem with naughty real time process	2007-07-16 09:05:41 -07:00
sys.c	x86: ignore the sys_getcpu() tcache parameter	2007-11-17 16:27:00 +01:00
sys_ni.c	[COMPAT]: Fix build on COMPAT platforms when CONFIG_NET is disabled.	2007-10-30 21:29:56 -07:00
sysctl.c	sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups	2008-01-25 21:08:00 +01:00
sysctl_check.c	sysctl: fix ax25 checks	2007-12-17 19:28:17 -08:00
taskstats.c	kernel/taskstats.c: fix bogus nlmsg_free()	2007-11-14 18:45:44 -08:00
time.c	whitespace fixes: time syscalls	2007-10-18 14:37:24 -07:00
timer.c	timer: fix section mismatch	2008-01-21 19:39:41 -08:00
tsacct.c	Add scaled time to taskstats based process accounting	2007-10-18 14:37:28 -07:00
uid16.c
user.c	Kobject: convert kernel/user.c to use kobject_init/add_ng()	2008-01-24 20:40:31 -08:00
user_namespace.c	Fix user namespace exiting OOPs	2007-09-19 11:24:18 -07:00
utsname.c	Fix UTS corruption during clone(CLONE_NEWUTS)	2007-09-19 11:24:17 -07:00
utsname_sysctl.c	Isolate the UTS namespace's domainname and hostname back	2007-11-29 09:24:53 -08:00
wait.c
workqueue.c	lockdep: fix workqueue creation API lockdep interaction	2008-01-16 09:51:58 +01:00
