The Gate of the AOSP #4: Gerrit, Memory & Performance
Process Scheduling in Linux
2013. 3. 29
Namhyung Kim
Outline
  1. Process scheduling
  2. SMP scheduling
  3. Group scheduling
- www.kandroid.org 2/ 41
Process scheduling
  - scheduler basics
  - O(1) scheduler
  - CFS
Terminology
  - UTS: Unix Time-sharing System
  - task: process or thread
  - runqueue (rq): per-cpu list of runnable tasks
  - latency: time delay between stimulus and response
  - throughput: amount of work done per unit time
Scheduler basics
  - scheduler
    - CPU resource manager
    - central part of the kernel
    - controls time slices
    - schedules tasks (processes)
  - decision maker
    - when? who? how long?
    - latency/throughput
    - fairness
Task types
  - interactive task
    - I/O bound (e.g. an editor)
    - sleeps most of the time
    - wants minimal latency
  - batch task
    - CPU bound (e.g. a compiler)
    - wants maximal throughput
Task state [figure not preserved]
Context switch
  - happens when a task is scheduled
    - voluntary (sleep)
    - non-voluntary (preempt)
  - save/restore task information
    - hw registers
    - memory address space
  - overhead
    - cache eviction
    - TLB flush
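The voluntary/non-voluntary split above can be observed from user space. The following is a minimal sketch, assuming a Unix-like system where Python's `resource.getrusage` reports the `ru_nvcsw` and `ru_nivcsw` counters; the helper names are illustrative and not part of the talk.

```python
import resource
import time

def context_switches():
    """Return (voluntary, involuntary) context-switch counts of this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw

def switches_during(fn):
    """Run fn() and return how many switches of each kind happened meanwhile."""
    before = context_switches()
    fn()
    after = context_switches()
    return after[0] - before[0], after[1] - before[1]

# Sleeping gives up the CPU, so it normally shows up as a voluntary switch.
voluntary, involuntary = switches_during(lambda: time.sleep(0.01))
print("voluntary:", voluntary, "involuntary:", involuntary)
```

The exact counts depend on the system and on what else is running, so no particular output should be expected.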
Preemption
  - schedule a (running) task (at any time?)
  - configurable at compile time
    - CONFIG_PREEMPT{,_NONE,_VOLUNTARY}
Kernel preemption points
  - PREEMPT_NONE: return to user (syscall, irq, exception)
  - PREEMPT_VOLUNTARY: might_sleep()
  - PREEMPT: return from irq (preempt_count == 0), preempt_enable()/spin_unlock()
Time slice
  - round-robin time sharing system (UTS)
  - a unit of time a task is allowed to run at a given time
  - can be affected by the timer frequency (HZ)
  - hard to optimize, since
    - it should be small for lower latency
    - it should be large for better throughput
POSIX scheduling policy
  - SCHED_FIFO, SCHED_RR, SCHED_OTHER
  - Linux-specific: SCHED_NORMAL, SCHED_BATCH, SCHED_IDLE
Linux scheduling class
  - RT class: SCHED_FIFO, SCHED_RR
  - FAIR class: SCHED_NORMAL, SCHED_BATCH, SCHED_IDLE
Scheduling priority
  - Root of Evil(tm)
  - but we need it anyway, for lower latency and higher throughput
Changing priority
  $ chrt -m
  SCHED_OTHER min/max priority: 0/0
  SCHED_FIFO min/max priority: 1/99
  SCHED_RR min/max priority: 1/99
  SCHED_BATCH min/max priority: 0/0
  SCHED_IDLE min/max priority: 0/0
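The same ranges that `chrt -m` prints can be queried programmatically. This is a small sketch, assuming a Linux system where Python's `os` module exposes the `SCHED_*` constants and `os.sched_get_priority_min`/`os.sched_get_priority_max`; policies that the platform does not expose are simply skipped.

```python
import os

POLICY_NAMES = ["SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE"]

def priority_ranges():
    """Map each available policy name to its (min, max) static priority."""
    ranges = {}
    if not hasattr(os, "sched_get_priority_min"):
        return ranges  # platform without the sched_* wrappers
    for name in POLICY_NAMES:
        policy = getattr(os, name, None)
        if policy is None:
            continue
        ranges[name] = (os.sched_get_priority_min(policy),
                        os.sched_get_priority_max(policy))
    return ranges

for name, (lo, hi) in priority_ranges().items():
    print(f"{name} min/max priority: {lo}/{hi}")
```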
Ideal CPU model
  - share cpu resources among all tasks
  - run tasks simultaneously
  - each task owns its share
  - currently impossible
O(1) Scheduler
  - used for RT tasks
  - used for normal tasks (prior to 2.6.23)
  - simple and fast algorithm
    - fixed number of bitmaps and arrays
    - statically assigned time slice
  - fairness issues on prioritized tasks
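The "bitmap and array" idea can be illustrated with a toy priority array: one FIFO queue per priority level plus a bitmap marking the non-empty queues, so picking the next task is a find-first-set followed by a dequeue. This is only a sketch; the class and method names are made up for illustration, and it assumes 140 priority levels with lower numbers meaning higher priority.

```python
from collections import deque

NUM_PRIOS = 140  # assumption: 0-99 for RT tasks, 100-139 for normal tasks

class PrioArray:
    """Toy priority array: a bitmap of non-empty levels plus one queue per level."""

    def __init__(self):
        self.bitmap = 0
        self.queues = [deque() for _ in range(NUM_PRIOS)]

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if self.bitmap == 0:
            return None
        # Lowest set bit = numerically lowest (i.e. highest) priority in use.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task

rq = PrioArray()
rq.enqueue("compiler", 120)
rq.enqueue("editor", 110)
rq.enqueue("rt-audio", 50)
print(rq.pick_next())  # rt-audio
```

Because the number of levels is fixed, the cost of picking a task does not grow with the number of runnable tasks, which is where the "O(1)" name comes from.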
Completely-Fair Scheduler
  - used for normal tasks
  - treats all tasks fairly
    - unless they have different priorities
  - vruntime deals with the priority
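How vruntime "deals with the priority" can be sketched with a toy simulation: the task with the smallest vruntime always runs next, and its vruntime advances more slowly the heavier its weight. The weights and the one-tick time unit below are hypothetical, and a binary heap stands in for the kernel's red-black tree.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight of a default-priority task

def simulate(weights, ticks):
    """Run the task with the smallest vruntime for one tick, `ticks` times.

    weights: {task name: load weight}; returns {task name: ticks received}.
    """
    runtime = {name: 0 for name in weights}
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    for _ in range(ticks):
        vruntime, name = heapq.heappop(queue)
        runtime[name] += 1
        # A heavier task accumulates vruntime more slowly, so it runs more often.
        vruntime += NICE_0_WEIGHT / weights[name]
        heapq.heappush(queue, (vruntime, name))
    return runtime

print(simulate({"A": 2048, "B": 1024}, 3000))  # {'A': 2000, 'B': 1000}
```

With equal weights the same loop degenerates into plain round-robin, which matches the "treats all tasks fairly unless they have different priorities" point.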
SMP scheduling
  - cpu load tracking
  - scheduler domain
  - load balancing
CPU affinity
  - set of cpus allowed to run a given task
  - all cpus are allowed by default
  - must be considered when scheduling
  - each task remembers the cpu it is running on
Setting cpu affinity
  # taskset 0xff <command>
  # taskset -c 0-7 <command>
  # taskset -p [mask] <pid>
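The two `taskset` notations above describe the same set of cpus. The sketch below converts a hex mask and a cpu list into Python sets to show the correspondence; the helper names are illustrative. On Linux, `os.sched_getaffinity(0)` returns the current process's allowed cpus, which is printed when available.

```python
import os

def mask_to_cpus(mask):
    """Convert an affinity bitmask (e.g. 0xff) to a set of cpu numbers."""
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}

def parse_cpu_list(text):
    """Convert a cpu list such as "0-7" or "0,2-3" to a set of cpu numbers."""
    cpus = set()
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus

print(mask_to_cpus(0xff) == parse_cpu_list("0-7"))  # True

if hasattr(os, "sched_getaffinity"):
    print("allowed cpus:", sorted(os.sched_getaffinity(0)))
```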
CPU load tracking
  - CPU load = number of tasks running
    - TASK_RUNNING + TASK_UNINTERRUPTIBLE
  - global load average
    - system load information (1, 5, 15 min)
  - cpu load average
    - for load balancing
Global load average
  $ uptime
  18:19:26 up 1 day, 7:51, 2 users, load average: 0.02, 0.03, 0.05
Moving average
  - calculates the average of a continuous data stream
    - smooths out short-term fluctuations
    - highlights longer-term trends
  - kernel uses an EMA (exponential moving average) for cpu load tracking
    - past terms decrease exponentially
    - this process is called decay
  - http://en.wikipedia.org/wiki/moving_average
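The decay described above can be written down in a few lines. This is a floating-point sketch of an exponential moving average in the style of the 1/5/15-minute load averages; it assumes one sample every 5 seconds, and the kernel itself uses fixed-point arithmetic rather than the `math.exp` call shown here.

```python
import math

SAMPLE_INTERVAL = 5.0  # assumption: seconds between load samples

def decay_factor(window_seconds):
    """Weight kept by the old average after one sample interval."""
    return math.exp(-SAMPLE_INTERVAL / window_seconds)

def update(load, active, window_seconds):
    """Blend the previous average with the current number of active tasks."""
    e = decay_factor(window_seconds)
    return load * e + active * (1.0 - e)

# One minute (12 samples) with 2 active tasks, starting from an idle system.
load = 0.0
for _ in range(12):
    load = update(load, 2, 60)
print(round(load, 2))  # 1.26
```

After one full window the average has covered about 63% of the distance to the new level, which is why a short burst barely moves the 15-minute figure.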
Scheduler domain
  - abstraction layer of hardware topology
  - a domain consists of groups
  - a cpu resides in multiple hierarchies
  - a domain is a group in a higher domain
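The nesting of domains and groups can be made concrete with a toy topology. The sketch below assumes a hypothetical 8-cpu machine with 2 hardware threads per core and 4 cpus per package; the level names and sizes are illustrative and not taken from the talk.

```python
# Hypothetical topology: (level name, number of cpus spanned by one domain).
LEVELS = [("SMT", 2), ("MC", 4), ("ALL", 8)]

def domains_for(cpu):
    """Return the chain of (level, span, groups) that a cpu belongs to."""
    chain = []
    child_size = 1  # at the lowest level every group is a single cpu
    for name, size in LEVELS:
        base = cpu - cpu % size
        span = list(range(base, base + size))
        # Each group is the span of a domain one level below.
        groups = [span[i:i + child_size] for i in range(0, size, child_size)]
        chain.append((name, span, groups))
        child_size = size
    return chain

for level, span, groups in domains_for(5):
    print(level, span, groups)
# SMT [4, 5] [[4], [5]]
# MC [4, 5, 6, 7] [[4, 5], [6, 7]]
# ALL [0, 1, 2, 3, 4, 5, 6, 7] [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Note how the MC span `[4, 5, 6, 7]` reappears as one group of the ALL domain, which is the "a domain is a group in a higher domain" relationship.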
Load balancing
  - migrate tasks according to each cpu's load
    - spread or pack
  - chances to balance
    - fork & exec
    - wake up
    - idle
    - periodic
Load balancing strategy
  - fork & exec: spread to the idlest cpu
  - wake up: keep the prev cpu, or migrate to the current cpu or its idle sibling
  - idle: migrate to the current cpu
  - periodic: migrate to the current cpu (or its siblings)
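The fork/exec and wake-up choices above can be sketched as two small selection functions. This is a deliberately simplified illustration, not the kernel's actual logic: load is a plain number per cpu, "idle" means load 0, and the function and parameter names are made up.

```python
def pick_idlest_cpu(loads, allowed):
    """fork/exec: choose the least loaded cpu among those the task may use."""
    return min(sorted(allowed), key=lambda cpu: loads[cpu])

def pick_wakeup_cpu(loads, prev_cpu, this_cpu, siblings, allowed):
    """wake up: prefer an idle prev cpu, then the waking cpu, then its siblings.

    siblings: {cpu: [cpus sharing a cache with it]} (hypothetical layout).
    Falls back to prev_cpu when no candidate is idle.
    """
    candidates = [prev_cpu, this_cpu] + list(siblings.get(this_cpu, []))
    for cpu in candidates:
        if cpu in allowed and loads[cpu] == 0:
            return cpu
    return prev_cpu

loads = {0: 2, 1: 0, 2: 1, 3: 3}
print(pick_idlest_cpu(loads, {0, 2, 3}))                      # 2
print(pick_wakeup_cpu(loads, 0, 2, {2: [1]}, {0, 1, 2, 3}))   # 1
```

Both functions filter by the allowed set first, reflecting that affinity must be respected before any load comparison.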
Scheduler domain fields
  - for fine-tuning the scheduler
    - SD_BALANCE_* flags
    - various indexes
    - balance interval
    - imbalance percent
(H)MP scheduling
  - started with the per-entity load tracking patchset
    - calculates each sched entity's load based on its runnable time
  - Linaro is trying to upstream the code for ARM big.LITTLE
  - power-aware scheduling/balancing
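"Load based on runnable time" can be illustrated with a decayed sum over past periods. The sketch below assumes periods of 1024 us and a decay factor y chosen so that y^32 = 0.5 (a contribution halves after about 32 periods); it uses floating point and returns a ratio, so it is an approximation of the idea rather than the patchset's fixed-point code.

```python
PERIOD_US = 1024          # assumption: length of one accounting period
Y = 0.5 ** (1 / 32)       # assumption: contribution halves every 32 periods

def runnable_ratio(history):
    """history: per-period runnable time in us (oldest first, 0..PERIOD_US).

    Returns the decayed runnable sum divided by its maximum possible value.
    """
    load = 0.0
    for runnable_us in history:
        load = load * Y + runnable_us   # older periods are decayed once more
    max_load = PERIOD_US / (1 - Y)      # limit for an always-runnable entity
    return load / max_load

print(round(runnable_ratio([1024] * 1000), 3))      # 1.0
print(round(runnable_ratio([1024, 0] * 500), 2))    # 0.49
```

An entity that is always runnable converges to 1.0, while one that is runnable every other period settles near one half.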
Group scheduling
  - control groups (cgroups)
  - task group (cpu cgroup)
  - group scheduling
Control groups
  - a way of grouping arbitrary tasks
  - each group can implement what it wants
    - cpu, memcg, block, device, perf, ...
  - usually used for resource management
Cgroup filesystem
  - maintains the group hierarchy in a pseudo fs
Using cgroupfs
  # mount -t cgroup -o cpu nodev /sys/fs/cgroup/cpu
  # cd /sys/fs/cgroup/cpu
  # ls
  cgroup.clone_children cgroup.event_control cgroup.procs cpu.shares
  notify_on_release release_agent tasks
  # mkdir aaa
  # echo $$ > aaa/tasks
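The `mkdir` and `echo $$ > aaa/tasks` steps above are ordinary file operations, so they can be mirrored from a program. The sketch below uses a hypothetical helper `join_cgroup` and, to stay runnable without root, demonstrates it on a temporary directory instead of a real cgroup mount; on an actual cgroup v1 hierarchy the `tasks` file is provided by the kernel.

```python
import os
import tempfile

def join_cgroup(root, name, pid):
    """Create group `name` under `root` and write `pid` to its tasks file."""
    group_dir = os.path.join(root, name)
    os.makedirs(group_dir, exist_ok=True)        # same effect as: mkdir aaa
    with open(os.path.join(group_dir, "tasks"), "w") as f:
        f.write(str(pid))                        # same effect as: echo $$ > aaa/tasks
    return group_dir

# Demonstration on a scratch directory; a real run would pass the cgroup
# mount point (e.g. /sys/fs/cgroup/cpu) and needs sufficient privileges.
with tempfile.TemporaryDirectory() as fake_root:
    path = join_cgroup(fake_root, "aaa", os.getpid())
    with open(os.path.join(path, "tasks")) as f:
        print("tasks:", f.read())
```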
Group scheduling (slide 30/41) [figure not preserved]
Scheduling entity
  - abstraction of a scheduling unit
  - traditionally the same as a task
  - with group scheduling, it can be a task group
Task group
  - abstraction of a group of tasks
  - viewed as a sched entity
  - consists of a sched entity and a runqueue
Task group + SMP
  - tasks in a group can be distributed across multiple CPUs
    - each cpu sees a portion of a task group
  - the group's share needs to be distributed as well
    - proportional to the no. of tasks on a cpu / in the group
  - a non-group entity (task) uses its share/load solely
Example of group scheduling (1)
  - please refer to page 30 (Group scheduling)
  - all tasks have the same load
  - professors group
    - shares: 1024
    - two running tasks: A and B
  - students group
    - shares: 512
    - no running task
  - system_tasks group
    - shares: 512
    - one running task: X
CPU load in example 1 [figure not preserved]
Example of group scheduling (2)
  - same conditions as example 1, but...
  - in the students group
    - Electronics students group
      - shares: 3072
      - three running tasks: f, g, h
    - Computers students group
      - shares: 3072
      - two running tasks: u, v
    - Other students group
      - shares: 1024
      - one running task: w
CPU load in example 2 [figure not preserved]
CPU load in example 2 (cont.) [figure not preserved]
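Since the figures for the two examples did not survive, the expected split can be recomputed from the share values. The sketch below is a simplified model and not the talk's own calculation: it assumes a single CPU, that sibling entities divide their parent's time in proportion to their weights, that every task has the default weight 1024, and that groups without runnable tasks are ignored.

```python
from fractions import Fraction

TASK_WEIGHT = 1024  # assumption: all tasks have the same (default) weight

def has_runnable(group):
    """True if the group or any of its subgroups contains a running task."""
    return bool(group.get("tasks")) or any(
        has_runnable(child) for _, child in group.get("groups", {}).values())

def cpu_fractions(group, budget=Fraction(1)):
    """Return {task: fraction of the CPU} for a group tree."""
    entities = [(TASK_WEIGHT, name, None) for name in group.get("tasks", [])]
    entities += [(shares, name, child)
                 for name, (shares, child) in group.get("groups", {}).items()
                 if has_runnable(child)]
    total = sum(weight for weight, _, _ in entities)
    result = {}
    for weight, name, child in entities:
        part = budget * Fraction(weight, total)
        if child is None:
            result[name] = part
        else:
            result.update(cpu_fractions(child, part))
    return result

def root(students):
    return {"groups": {"professors": (1024, {"tasks": ["A", "B"]}),
                       "students": (512, students),
                       "system_tasks": (512, {"tasks": ["X"]})}}

example1 = root({})
example2 = root({"groups": {"electronics": (3072, {"tasks": ["f", "g", "h"]}),
                            "computers": (3072, {"tasks": ["u", "v"]}),
                            "other": (1024, {"tasks": ["w"]})}})

print({t: str(f) for t, f in cpu_fractions(example1).items()})
# {'A': '1/3', 'B': '1/3', 'X': '1/3'}
print({t: str(f) for t, f in cpu_fractions(example2).items()})
```

Under these assumptions, example 2 gives A and B 1/4 each, X 1/4, f, g, h and w 1/28 each, and u and v 3/56 each. Distribution across several CPUs is not modeled here.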
Auto group
  - applies when a task sits in the (default) root group
    - not used if the task moves to another group
  - creates a new group for each new session
    - new terminal/login
Check autogroup enabled
  # cat /proc/sys/kernel/sched_autogroup_enabled
  1
Group scheduling in Android
  - root group
    - system tasks
  - foreground group
    - currently running (user-visible) app
    - shares 95% of cpu
  - background group
    - inactive apps
    - shares 5% of cpu
  - https://lkml.org/lkml/2013/1/15/964
Thanks! Q & A