EECS 750: Advanced Operating Systems. 01/28/2015. Heechul Yun
Recap: Completely Fair Scheduler (CFS). Each task maintains its virtual time $V_i = E_i \cdot \frac{1}{w_i}$, where $E_i$ is the task's executed time and $w_i$ is its weight. Pick the task with the smallest virtual time. Tasks are sorted according to their virtual times and managed by a red-black tree, O(log N). Guarantees fairness and bounded latency. No complex heuristics for interactivity.
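The virtual-time rule above can be sketched as a small simulation. This is a minimal sketch, not the kernel's implementation: Python's heapq stands in for the red-black tree (both give O(log N) updates), and the function name run_cfs, the task names, the weights, and the fixed slice length are all hypothetical.

```python
import heapq

def run_cfs(weights, slice_ms, steps):
    """Repeatedly run the task with the smallest virtual time.

    weights: dict mapping a task name to its weight (hypothetical values).
    Returns the executed time of each task after `steps` decisions.
    """
    executed = {name: 0.0 for name in weights}
    # Heap entries are (virtual_time, name). The heap is a stand-in for
    # the red-black tree; ties are broken by task name.
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    for _ in range(steps):
        _, name = heapq.heappop(heap)            # smallest virtual time
        executed[name] += slice_ms               # E_i grows by one slice
        vtime = executed[name] / weights[name]   # V_i = E_i / w_i
        heapq.heappush(heap, (vtime, name))
    return executed

shares = run_cfs({"A": 2, "B": 1}, slice_ms=1.0, steps=300)
```

With weights 2 and 1, task A ends up with twice the executed time of task B, which is the weighted fairness the slide describes.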
Recap: BVT. Key ideas: weighted fair sharing using virtual time (as in Linux CFS); prioritizing real-time tasks by borrowing future virtual time (warpback mechanism). Project idea: extending Linux CFS to include BVT's warpback mechanism. https://gist.github.com/leverich/5913713
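A minimal sketch of how warping changes the scheduling decision, assuming the BVT formulation in which a warped task is ordered by its effective virtual time (actual virtual time minus its warp value). The dictionary fields, task names, and numbers are hypothetical.

```python
def effective_vtime(task):
    """Effective virtual time: actual virtual time, minus the warp if enabled."""
    if task["warp_on"]:
        return task["avt"] - task["warp"]
    return task["avt"]

def pick_next(tasks):
    """Pick the task with the smallest effective virtual time."""
    return min(tasks, key=effective_vtime)["name"]

batch = {"name": "batch", "avt": 100, "warp": 0, "warp_on": False}
rt = {"name": "rt", "avt": 120, "warp": 50, "warp_on": True}

with_warp = pick_next([batch, rt])                             # rt: 120 - 50 = 70 < 100
without_warp = pick_next([batch, dict(rt, warp_on=False)])     # batch: 100 < 120
```

The real-time task is ahead in actual virtual time, yet it is picked first while its warp is enabled; without the warp the ordinary smallest-virtual-time rule applies.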
Today. Topic: multicore CPU scheduling. Linux CPU scheduling: SMP, load balancing. Paper: Efficient and scalable multiprocessor fair scheduling using distributed weighted round-robin, PPoPP '09.
Symmetric Multiprocessing (SMP). [Figure: one Linux instance running on CPU1 to CPU4 over a single shared memory.] All CPUs are equal. Memory is shared. A single OS; a task can run on any CPU.
Asymmetric Multiprocessing (AMP). [Figure: CPU1 to CPU4; Linux, Windows, RTOS A, and RTOS B; memory partitioned per OS.] All CPUs are NOT equal. Memory is NOT shared. Multiple OSes; a task cannot run on an arbitrary CPU.
Uniform Memory Access (UMA). [Figure: CPU1 to CPU4 attached to one memory.] All CPUs have the same memory access cost.
Non-Uniform Memory Access (NUMA). [Figure: CPU1 to CPU4 with two separate memories.] All CPUs can still access the entire memory space, but the access cost differs depending on the location accessed.
Global Scheduling. [Figure: all tasks feed a single OS run queue that serves CPU1 to CPU4.]
Partitioned Scheduling. [Figure: each of CPU1 to CPU4 has its own OS run queue with its own tasks.] Linux's basic design. Why?
Load balancing. Goal: don't let someone do nothing while others are busy working. Equally distribute loads to all CPUs.
Load Balancing in Linux. Move tasks to equalize the loads of all cores. When to check? On task fork, clone, exec, and wake-up; when a runqueue becomes empty; periodically. When to balance? When the imbalance is high enough.
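The "balance only when the imbalance is high enough" rule can be illustrated with a small sketch. This is not the kernel's algorithm: the helper names, the representation of a run queue as a list of task weights, and the 125 percent threshold are assumptions made for illustration.

```python
def load(rq):
    """Load of a run queue: the sum of its task weights."""
    return sum(rq)

def balance_once(runqueues, imbalance_pct=125):
    """Move one task from the busiest to the idlest run queue if warranted.

    Returns True if a task was migrated. imbalance_pct is a hypothetical
    threshold: balancing happens only if the busiest load exceeds the
    idlest load by more than this percentage.
    """
    busiest = max(runqueues, key=load)
    idlest = min(runqueues, key=load)
    if load(busiest) * 100 <= load(idlest) * imbalance_pct:
        return False                      # imbalance not high enough
    task = min(busiest)                   # lightest task on the busiest queue
    if load(busiest) - load(idlest) <= task:
        return False                      # moving it would not narrow the gap
    busiest.remove(task)
    idlest.append(task)
    return True

rqs = [[1, 1, 1, 1], []]                  # four equal tasks, one idle CPU
while balance_once(rqs):
    pass
```

Starting from four equal tasks on one CPU and an idle second CPU, the loop settles at two tasks per run queue. With three equal tasks on two CPUs, no move narrows the gap, so the sketch leaves the placement alone.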
Load Balancing in Linux: Example. [Figure: two OS run queues, one busy and one idle, on CPU1 and CPU2.]
Load Balancing in Linux: Example 2. [Figure: tasks T1, T2, and T3 on the run queues of CPU1 and CPU2, labeled busy and idle.] What will happen?
Load Balancing in Linux: Example 2. Not globally fair.
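A short worked example of why three tasks on two CPUs are not globally fair. The exact placement in the original figure is not recoverable, so the sketch assumes T1 and T2 share CPU1 while T3 has CPU2 to itself, with equal weights; the function name is hypothetical.

```python
from fractions import Fraction

def local_shares(placement):
    """CPU share of each task when every CPU is fair only among its own tasks."""
    shares = {}
    for tasks in placement:
        for task in tasks:
            shares[task] = Fraction(1, len(tasks))
    return shares

placement = [["T1", "T2"], ["T3"]]        # assumed placement, equal weights
shares = local_shares(placement)          # T1: 1/2, T2: 1/2, T3: 1

# Globally fair share: total CPU capacity divided by the number of tasks.
num_tasks = sum(len(tasks) for tasks in placement)
ideal = Fraction(len(placement), num_tasks)   # 2 CPUs / 3 tasks = 2/3
```

Each CPU is locally fair, but T3 receives a full CPU while T1 and T2 receive half each, whereas global fairness would give every task 2/3 of a CPU. Migrating one task only mirrors the imbalance.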
Load Balancing in Linux. Further complications: SMT (hyper-threading); shared cache vs. private cache; NUMA. Linux defines a "high enough" imbalance differently on a case-by-case basis.
Today's Paper. Efficient and scalable multiprocessor fair scheduling using distributed weighted round-robin, PPoPP '09. Guarantees global fairness.
Distributed Weighted Round Robin. Goal: guarantee task-level global fairness. Local fairness: fair among the tasks on a single CPU. Global fairness: fair among all tasks on all CPUs. CFS: no; DWRR: yes.
Distributed Weighted Round Robin: Algorithm. In each round, each task can execute for w * B time (w: weight, B: round time unit). Once a task consumes its round slice (w * B), it is pushed to the expired queue. Once all tasks on a core are in the expired queue, the core pulls tasks from other busy cores. If there are no busy cores in the current round, the core moves to the next round (the highest round).
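The steps above can be sketched as a tick-based simulation. This is a simplified illustration under stated assumptions, not the paper's implementation: the class and function names are hypothetical, each tick first lets empty cores refill and then runs one task per core, the head of an active queue is treated as running and is never pulled, and a core in the highest round may also pull an expired task from a core one round behind (such a task is already waiting for the highest round).

```python
class Cpu:
    def __init__(self, tasks):
        self.active = list(tasks)   # tasks with slice left in this round
        self.expired = []           # tasks that used up w * B this round
        self.round = 0

def refill(cpu, cpus, used):
    """Called when cpu.active is empty: pull a task or advance the round."""
    highest = max(c.round for c in cpus)
    if cpu.round == highest:
        for other in cpus:
            if other is cpu:
                continue
            # Busy core in the same round: pull a waiting (non-head) task.
            if other.round == highest and len(other.active) > 1:
                cpu.active.append(other.active.pop())
                return
            # Core one round behind: its expired tasks belong to `highest`.
            if other.round == highest - 1 and other.expired:
                task = other.expired.pop()
                used[task] = 0
                cpu.active.append(task)
                return
    # Nothing to pull: start the next round with the core's own expired tasks.
    cpu.active, cpu.expired = cpu.expired, []
    cpu.round += 1
    for task in cpu.active:
        used[task] = 0

def simulate_dwrr(placement, weights, B, ticks):
    """Return the total ticks each task executed."""
    cpus = [Cpu(tasks) for tasks in placement]
    used = {t: 0 for t in weights}    # ticks used in the task's current round
    total = {t: 0 for t in weights}
    for _ in range(ticks):
        for cpu in cpus:              # phase 1: empty cores look for work
            if not cpu.active:
                refill(cpu, cpus, used)
        for cpu in cpus:              # phase 2: each core runs its head task
            if not cpu.active:
                continue
            task = cpu.active[0]
            used[task] += 1
            total[task] += 1
            if used[task] >= weights[task] * B:
                cpu.expired.append(cpu.active.pop(0))
    return total

totals = simulate_dwrr([["T1", "T2"], ["T3"]],
                       {"T1": 1, "T2": 1, "T3": 1}, B=5, ticks=3000)
```

For three equal-weight tasks on two cores, each task ends up with about two thirds of one core's time over the run, instead of the 1/2, 1/2, 1 split that purely local fairness would give.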
Distributed Weighted Round Robin
Comparison
Migration Cost. Shared cache vs. private cache.
Summary. Multicore scheduling and load balancing in Linux. Linux doesn't guarantee global fairness. DWRR guarantees global fairness.
Discussion. Is migration cost really negligible? Does it really matter to guarantee CPU time fairness? Load balancing in heterogeneous multicore processors? Big cores + small cores, e.g., ARM's big.LITTLE.
Project Grouping
Appendix
Load Balancing in Linux. Slide source: Gap-Joo Na, Task Scheduling for Multicore Embedded Devices, ELC 2013.
Load Balancing in Linux. Scheduling domains. SMT domain: logical cores that share ALUs (hyperthreads). MC domain: physical cores that share cache. CPU domain: processors (packages) that share memory (UMA). NUMA domain.