Process Scheduling II

Similar documents
W4118 Operating Systems. Instructor: Junfeng Yang

Linux scheduler history. We will be talking about the O(1) scheduler

Linux Process Scheduling. sched.c. schedule() scheduler_tick() hooks. try_to_wake_up() ... CFS CPU 0 CPU 1 CPU 2 CPU 3

Linux O(1) CPU Scheduler. Amit Gud amit (dot) gud (at) veritas (dot) com

Linux Scheduler. Linux Scheduler

Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6

Scheduling policy. ULK3e 7.1. Operating Systems: Scheduling in Linux p. 1

CPU Scheduling Outline

Process Scheduling in Linux

ò Paper reading assigned for next Thursday ò Lab 2 due next Friday ò What is cooperative multitasking? ò What is preemptive multitasking?

Overview of the Linux Scheduler Framework

Process Scheduling in Linux

Scheduling 0 : Levels. High level scheduling: Medium level scheduling: Low level scheduling

Linux process scheduling

Chapter 5 Linux Load Balancing Mechanisms

Comp 204: Computer Systems and Their Implementation. Lecture 12: Scheduling Algorithms cont d

Task Scheduling for Multicore Embedded Devices

Operating Systems Concepts: Chapter 7: Scheduling Strategies

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum

CPU Scheduling. Basic Concepts. Basic Concepts (2) Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems

Announcements. Basic Concepts. Histogram of Typical CPU- Burst Times. Dispatcher. CPU Scheduler. Burst Cycle. Reading

Linux Process Scheduling Policy

Completely Fair Scheduler and its tuning 1

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:

CPU SCHEDULING (CONT D) NESTED SCHEDULING FUNCTIONS

EECS 750: Advanced Operating Systems. 01/28 /2015 Heechul Yun

Objectives. Chapter 5: CPU Scheduling. CPU Scheduler. Non-preemptive and preemptive. Dispatcher. Alternating Sequence of CPU And I/O Bursts

Objectives. Chapter 5: Process Scheduling. Chapter 5: Process Scheduling. 5.1 Basic Concepts. To introduce CPU scheduling

Linux Load Balancing

Konzepte von Betriebssystem-Komponenten. Linux Scheduler. Valderine Kom Kenmegne Proseminar KVBK Linux Scheduler Valderine Kom

Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform. Ed Spetka Mike Kohler

Process Scheduling CS 241. February 24, Copyright University of Illinois CS 241 Staff

Chapter 5: CPU Scheduling. Operating System Concepts 8 th Edition

Chapter 5 Process Scheduling

Real-Time Scheduling 1 / 39

Multilevel Load Balancing in NUMA Computers

ò Scheduling overview, key trade-offs, etc. ò O(1) scheduler older Linux scheduler ò Today: Completely Fair Scheduler (CFS) new hotness

Project No. 2: Process Scheduling in Linux Submission due: April 28, 2014, 11:59pm

An Implementation Of Multiprocessor Linux

2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput

Operating System: Scheduling

ICS Principles of Operating Systems

CPU Scheduling. CSC 256/456 - Operating Systems Fall TA: Mohammad Hedayati

Deciding which process to run. (Deciding which thread to run) Deciding how long the chosen process can run

Operating Systems. III. Scheduling.

CPU Scheduling. CPU Scheduling

W4118 Operating Systems. Instructor: Junfeng Yang

CPU Scheduling. Core Definitions

Job Scheduling Model

OPERATING SYSTEMS SCHEDULING

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

Process Scheduling in Linux

Main Points. Scheduling policy: what to do next, when there are multiple threads ready to run. Definitions. Uniprocessor policies

Scheduling Algorithms

CHAPTER 1 INTRODUCTION

Readings for this topic: Silberschatz/Galvin/Gagne Chapter 5

Mitigating Starvation of Linux CPU-bound Processes in the Presence of Network I/O

Road Map. Scheduling. Types of Scheduling. Scheduling. CPU Scheduling. Job Scheduling. Dickinson College Computer Science 354 Spring 2010.

Tasks Schedule Analysis in RTAI/Linux-GPL

Operating Systems Lecture #6: Process Management

Operating Systems, 6 th ed. Test Bank Chapter 7

CPU Scheduling 101. The CPU scheduler makes a sequence of moves that determines the interleaving of threads.

Effective Computing with SMP Linux

/ Operating Systems I. Process Scheduling. Warren R. Carithers Rob Duncan

Interrupts in Linux. COMS W4118 Prof. Kaustubh R. Joshi hfp://

Multi-core and Linux* Kernel

A Survey of Parallel Processing in Linux

Operatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings

Improvement of Scheduling Granularity for Deadline Scheduler

Processor Scheduling. Queues Recall OS maintains various queues

Real-time KVM from the ground up

Threads Scheduling on Linux Operating Systems

SYSTEM ecos Embedded Configurable Operating System

CPU Scheduling. Multitasking operating systems come in two flavours: cooperative multitasking and preemptive multitasking.

OS OBJECTIVE QUESTIONS

The Linux Kernel: Process Management. CS591 (Spring 2001)

Exercises : Real-time Scheduling analysis

On Linux Starvation of CPU-bound Processes in the Presence of Network I/O

An Adaptive Task-Core Ratio Load Balancing Strategy for Multi-core Processors

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems OBJECTIVES 7.1 DEFINITION. Chapter 7. Note:

CHAPTER 15: Operating Systems: An Overview

A complete guide to Linux process scheduling. Nikita Ishkov

Understanding the Linux CPU Scheduler

Understanding the Linux CPU Scheduler

Modular Real-Time Linux

Scheduling algorithms for Linux

Multiprogramming. IT 3123 Hardware and Software Concepts. Program Dispatching. Multiprogramming. Program Dispatching. Program Dispatching

Hadoop Fair Scheduler Design Document

Hard Real-Time Linux

Final Report. Cluster Scheduling. Submitted by: Priti Lohani

Transcription:

Process Scheduling II COMS W4118 Prof. Kaustubh R. Joshi krj@cs.columbia.edu hdp://www.cs.columbia.edu/~krj/os References: OperaWng Systems Concepts (9e), Linux Kernel Development, previous W4118s Copyright no2ce: care has been taken to use only those web images deemed by the instructor to be in the public domain. If you see a copyrighted image on any slide and are the copyright owner, please contact the instructor. It will be removed. 1

Outline Advanced scheduling issues MulWlevel queue scheduling MulWprocessor scheduling issues Linux/Android Scheduling Scheduler Architecture Scheduling algorithm O(1) RR scheduler CFS scheduler Other implementawon issues 2

MoWvaWon No one- size- fits- all scheduler Different workloads Different environment Building a general scheduler that works well for all is difficult! Real scheduling algorithms are ofen more complex than the simple scheduling algorithms we ve seen 3

Combining scheduling algorithms MulWlevel queue scheduling: ready queue is parwwoned into mulwple queues Each queue has its own scheduling algorithm Foreground processes: RR (e.g., shell, editor, GUI) Background processes: FCFS (e.g., backup, indexing) Must choose scheduling algorithm to schedule between queues. Possible algorithms RR between queues Fixed priority for each queue Timeslice for each queue (e.g., RR gets 80%, FCFS 20%) 4

Movement between queues No automawc movement between queues User can change process queue at any Wme 5

MulWlevel Feedback Queue Process automawcally moved between queues method used to determine when to upgrade a process method used to determine when to demote a process Used to implement Aging: move to higher priority queue Monopolizing resources: move to lower priority queue 6

Aging using MulWlevel Queues A new job enters queue Q 0 which is served RR. When it gains CPU, job receives 8 milliseconds. If it does not finish in 8 milliseconds, job is moved to queue Q 1. At Q 1 job is again served RR and receives 16 addiwonal milliseconds. If it swll does not complete, it is preempted and moved to low priority FCFC queue Q 2. 7

Outline Advanced scheduling issues MulWlevel queue scheduling MulWprocessor scheduling issues Linux/Android Scheduling Scheduling algorithm O(1) RR scheduler CFS scheduler Selng prioriwes and Wme slices Other implementawon issues 8

MulWprocessor scheduling issues Shared- memory MulWprocessor processes CPU0 CPU1 CPU2 CPU3 How to allocate processes to CPU? 9

Symmetric mulwprocessor Architecture Shared Memory $ $ $ $ CPU0 CPU1 CPU2 CPU3 Small number of CPUs Same access Wme to main memory Private cache Memory Memory mappings (TLB) 10

Global queue of processes One ready queue shared across all CPUs Advantages Good CPU uwlizawon Fair to all processes Disadvantages Not scalable (contenwon for global queue lock) Poor cache locality Linux 2.4 uses global queue CPU0 CPU1 CPU2 CPU3 11

Per- CPU queue of processes StaWc parwwon of processes to CPUs Advantages Easy to implement Scalable (no contenwon on ready queue) BeDer cache locality Disadvantages Load- imbalance (some CPUs have more processes) Unfair to processes and lower CPU uwlizawon CPU0 CPU1 CPU2 CPU3 12

Hybrid approach Use both global and per- CPU queues Balance jobs across queues CPU0 CPU1 CPU2 CPU3 Processor Affinity Add process to a CPU s queue if recently run on the CPU Cache state may swll present Linux 2.6 uses a very similar approach 13

SMP: gang scheduling MulWple processes need coordinawon Should be scheduled simultaneously CPU0 CPU1 CPU2 CPU3 Scheduler on each CPU does not act independently Coscheduling (gang scheduling): run a set of processes simultaneously Global context- switch across all CPUs 14

Outline Advanced scheduling issues MulWlevel queue scheduling MulWprocessor scheduling issues Linux/Android Scheduling Scheduler Architecture Scheduling algorithms O(1) RR scheduler CFS scheduler Other implementawon issues 15

Linux Scheduler Class Overview Linux has a hierarchical scheduler Sof Real- Wme scheduling policies SCHED_FIFO (FCFS) SCHED_RR (real Wme round robin) Always get priority over non real Wme tasks One of 100 priority levels (0..99) Normal scheduling policies SCHED_OTHER: standard processes SCHED_BATCH: batch style processes SCHED_IDLE: low priority tasks One of 40 priority levels (- 20..0..19) Real Time 0 Real Time 1 Real Time 2 Real Time 99 Normal 16

Linux Hierarchical Scheduler Code from kernel/sched.c: class = sched_class_highest; for ( ; ; ) { p = class- >pick_next_task(rq); if (p) return p; /* * Will never be NULL as the idle class always * returns a non- NULL p: */ class = class- >next; } 17

The runqueue All run queues available in array runqueues, one per CPU struct rq (kernel/sched.c) Contains per- class run queues (RT, CFS) and other per- class params E.g., CFS: a list of task_struct in struct list_head tasks E.g., RT: array of acwve prioriwes Data structure rt_rq, cfs_rq, struct sched_enwty (kernel/sched.c) Member of task_struct, one per scheduler class Maintains list head for class runqueue, other per- task params Current scheduler for task is specified by task_struct.sched_class Pointer to struct sched_class Contains funcwons pertaining to class (object- oriented code) 18

sched_class Structure static const struct sched_class fair_sched_class = {.next = &idle_sched_class,.enqueue_task.dequeue_task.yield_task.check_preempt_curr.pick_next_task.put_prev_task.select_task_rq.load_balance.move_one_task.set_curr_task.task_tick.task_new.prio_changed.switched_to = enqueue_task_fair, = dequeue_task_fair, = yield_task_fair, = check_preempt_wakeup, = pick_next_task_fair, = put_prev_task_fair, = select_task_rq_fair, = load_balance_fair, = move_one_task_fair, = set_curr_task_fair, = task_tick_fair, = task_new_fair, = prio_changed_fair, = switched_to_fair, } 19

MulWprocessor scheduling Per- CPU runqueue Possible for one processor to be idle while others have jobs waiwng in their run queues Periodically, rebalance runqueues MigraWon threads move processes from one runque to another The kernel always locks runqueues in the same order for deadlock prevenwon 20

Load balancing To keep all CPUs busy, load balancing pulls tasks from busy runqueues to idle runqueues. If schedule finds that a runqueue has no runnable tasks (other than the idle task), it calls load_balance load_balance also called via Wmer schedule_0ck calls rebalance_0ck Every Wck when system is idle Every 100 ms otherwise 21

Processor affinity Each process has a bitmask saying what CPUs it can run on By default, all CPUs Processes can change the mask Inherited by child processes (and threads), thus tending to keep them on the same CPU Rebalancing does not override affinity 22

Load balancing load_balance looks for the busiest runqueue (most runnable tasks) and takes a task that is (in order of preference): inacwve (likely to be cache cold) high priority load_balance skips tasks that are: likely to be cache warm currently running on a CPU not allowed to run on the current CPU (as indicated by the cpus_allowed bitmask in the task_struct) 23

Priority related fields in struct task_struct stawc_prio: stawc priority set by administrator/ users Default: 120 (even for realwme processes) Set use sys_nice() or sys_setpriority() Both call set_user_nice() prio: dynamic priority Index to prio_array rt_priority: real Wme priority prio = 99 rt_priority include/linux/sched.h 24

Adding a new Scheduler Class The Scheduler is modular and extensible New scheduler classes can be installed Each scheduler class has priority within hierarchical scheduling hierarchy PrioriWes defined in sched.h, e.g. #define SCHED_RR 2 Linked list of sched_class sched_class.next reflects priority Core funcwons: kernel/sched.c, include/linux/sched.h AddiWonal classes: kernel/sched_fair.c,sched_rt.c Process changes class via sched_setscheduler syscall Each class needs New runqueue structure in main struct runqueue New sched_class structure implemenwng scheduling funcwons New sched_enwty in the task_struct 25

Outline Advanced scheduling issues MulWlevel queue scheduling MulWprocessor scheduling issues Linux/Android Scheduling Scheduler Architecture Scheduling algorithms O(1) RR scheduler CFS scheduler Other implementawon issues 26

Real- Wme policies First- in, first- out: SCHED_FIFO StaWc priority Process is only preempted for a higher- priority process No Wme quanta; it runs unwl it blocks or yields voluntarily RR within same priority level Round- robin: SCHED_RR As above but with a Wme quanta Normal processes have SCHED_NORMAL scheduling policy 27

Old Linux O(1) scheduler Old Linux scheduler (unwl 2.6.22) for SCHED_NORMAL Round robin fixed Wme slice Boost interacwvity Fast response to user despite high load Inferring interacwve processes and dynamically increase their prioriwes Avoid starvawon Scale well with number of processes O(1) scheduling overhead Scale well with number of processors Load balance: no CPU should be idle if there is work CPU affinity: no random bouncing of processes 28

runqueue data structure Two arrays of priority queues acwve and expired Total 140 prioriwes [0, 140) Smaller integer = higher priority 29

Aging: the tradiwonal algorithm for(pp = proc; pp < proc+nproc; pp++) { if (pp- >prio!= MAX) pp- >prio++; if (pp- >prio > curproc- >prio) reschedule(); } Problem: O(N). Every process is examined on each schedule() call! This code is taken almost verbawm from 6 th EdiWon Unix, circa 1976. 30 30

Scheduling algorithm for normal processes 1. Find highest priority non- empty queue in rq- >acwve; if none, simulate aging by swapping acwve and expired 2. next = first process on that queue 3. Adjust next s priority 4. Context switch to next 5. When next used up its Wme slice, insert next to the right queue in the expired array and call schedule() again 31

Simulate aging Swapping acwve and expired gives low priority processes a chance to run Advantage: O(1) Processes are touched only when they start or stop running 32 32

Find highest priority non- empty queue Time complexity: O(1) Depends on the number of priority levels, not the number of processes ImplementaWon: a bitmap for fast look up 140 queues à 5 integers A few compares to find the first non- zero bit Hardware instrucwon to find the first 1- bit bsfl on Intel 33

AdjusWng priority Goal: dynamically increase priority of interacwve process How to determine interacwve? Sleep rawo Mostly sleeping: I/O bound Mostly running: CPU bound ImplementaWon: per process sleep_avg Before switching out a process, subtract from sleep_avg how many Wcks a task ran Before switching in a process, add to sleep_avg how many Wcks it was blocked up to MAX_SLEEP_AVG (10 ms) 34

CalculaWng Wme slices Stored in field Wme_slice in struct task_struct Higher priority processes also get bigger Wme- slice task_wmeslice() in sched.c If (stawc_priority < 120) Wme_slice = (140- stawc_priority) * 20 If (stawc_priority >= 120) Wme_slice = (140- stawc_priority) * 5 35

Example Wme slices Priority: Static Pri Niceness Quantum Highest 100-20 800 ms High 110-10 600 ms Normal 120 0 100 ms Low 130 10 50 ms Lowest 139 19 5 ms 36

Problems with O(1) RR Scheduler Not easy to diswnguish between CPU and I/O bound I/O bound typically need beder interacwvity CPU bound need sustained period of CPU at lower priority Finding right Wme slice isn t easy Too small: good for I/O, but high context switch overhead Too large: good for CPU bound jobs, but poor interacwvity PrioriWzaWon by increasing Wmeslice isn t perfect I/O bound processes want high priority, but small Wmeslice! CPU bound processes want low priority but large Wmeslice! Need complex aging to avoid starvawon Priority is relawve, but Wme slice is absolute Nice 0, 1: Wme slice 100 and 95 msec: 5% difference! Nice 19, 20: Wme slice 10 and 5: 100% difference! Time slice has to be mulwple of Wck, how to give priority to freshly woken up tasks even if their Wme slice has expired? Lots of heuriswcs to fix these problems Problem: heuriswcs can be adacked, several adacks existed 37

Outline Advanced scheduling issues MulWlevel queue scheduling MulWprocessor scheduling issues Linux/Android Scheduling Scheduler Architecture Scheduling algorithms O(1) RR scheduler CFS scheduler Other implementawon issues 38

Completely Fair Scheduler (CFS) Introduced in kernel 2.6.23 Models an ideal mulwtasking CPU Infinitesimally small Wmeslice n processes: each progresses uniformly at 1/n th rate 1 Process 3 Processes 1/3 rd Progress Problem: real CPU can t be split into infinitesimally small Wmeslice without excessive overhead 39

Completely Fair Scheduler Core ideas: dynamic Wme slice and order Don t use fixed Wme slice per task Instead, fixed Wme slice across all tasks Scheduling Latency Don t use round robin to pick next task Pick task which has received least CPU so far Equivalent to dynamic priority 40

Scheduling Latency Equivalent to Wme slice across all processes ApproximaWon of infinitesimally small Default value is 20 msec To set/get type: $ sysctl kernel.sched_latency_ns Each process gets equal proporwon of slice Timeslice(task) = latency/nr_tasks Lower bound on smallest slice: default 4 msec To set/get: $ sysctl kernel.sched_min_granularity_ns Too many tasks? sched_latency = nr_tasks*min_granularity Priority through proporwonal sharing Task gets share of CPU proporwonal to relawve priority Timeslice(task) = Timeslice(t) * prio(t) / Sum_all_t (prio(t )) Maximum wait Wme bounded by scheduling latency 41

Picking the Next Process Pick task with minimum runwme so far Tracked by vrunwme member variable Every Wme process runs for t ns, vrunwme +=t (weighed by process priority) How does this impact I/O vs CPU bound tasks Task A: needs 1 msec every 100 sec (I/O bound) Task B, C: 80 msec every 100 msec (CPU bound) Afer 10 Wmes that A, B, and C have been scheduled vrunwme(a) = 10, vrunwme(b, C) = 800 A gets priority, B and C get large Wme slices (10msec each) Problem: how to efficiently track min runwme? Scheduler needs to be efficient Finding min every Wme is an O(N) operawon 42

Finding Lowest RunWme Efficiently Need to update vrunwme and min_vrunwme When new task is added or removed On every Wmer Wck, context switch Balanced binary search tree Red- Black Trees Ordered by vrunwme as key O(lgN) inserwon, delewon, update, O(1): find min cfs_rq- >min_vrunwme vrunwme=100 vrunwme=300 VrunWme=400 vrunwme=30 VrunWme=150 vrunwme=410 Tasks move from lef of tree to the right min_vrunwme caches smallest value 43

Outline Advanced scheduling issues MulWlevel queue scheduling MulWprocessor scheduling issues Linux/Android Scheduling Scheduler Architecture Scheduling algorithms O(1) RR scheduler CFS scheduler Other implementawon issues 44

Bookkeeping on each Wmer interrupt scheduler_wck() Called on each Wck Wmer_interrupt è do_wmer_interrupt è do_wmer_interrupt_hook è update_process_wmes If realwme and SCHED_FIFO, do nothing SCHED_FIFO is non- preempwve If realwme and SCHED_RR and used up Wme slice, move to end of rq- >acwve[prio] If SCHED_NORMAL and used up Wme slice If not interacwve or starving expired queue, move to end of rq- >expired[prio] Otherwise, move to end of rq- >acwve[prio] Boost interacwve Else // SCHED_NORMAL, and not used up Wme slice Break large Wme slice into pieces TIMESLICE_GRANULARITY 45

OpWmizaWons If next is a kernel thread, borrow the MM mappings from prev User- level MMs are unused. Kernel- level MMs are the same for all kernel threads If prev == next Don t context switch 46