CPU Scheduling

Linux scheduler history
- We will be talking about the O(1) scheduler

SMP Support in 2.4 and 2.6 versions
[Figure: 2.4 kernel and 2.6 kernel, each shown with CPU1, CPU2, CPU3]

Linux Scheduling
- Three scheduling classes (policies):
  - SCHED_FIFO and SCHED_RR are the real-time classes
  - SCHED_OTHER is for all other (normal) tasks
- 140 priority levels:
  - 0-99: real-time priorities
  - 100-139: user task priorities

Pre-emptive, priority-based scheduling
- When a process with a higher real-time priority (rt_priority) wishes to run, all processes with lower real-time priority are pushed aside.
- SCHED_FIFO: a process runs until it relinquishes control or a process with higher real-time priority wishes to run.
- SCHED_RR: in addition, a process is interrupted when its time slice expires and other processes of the same real-time priority are waiting (round-robin among processes of equal priority).
- SCHED_OTHER is also round-robin, with a shorter time slice.

SCHED_OTHER: Normal tasks
- Each task is assigned a nice value between -20 and +19
- Static priority = 120 + nice
- Each task is assigned a time slice
- Tasks at the same priority are scheduled round-robin
- Ensures priority plus fairness

Basic Philosophies
- Priority is the primary scheduling mechanism
- Priority is dynamically adjusted at run time
  - Processes denied access to the CPU get their priority increased
  - Processes that have been running a long time get their priority decreased
- Try to distinguish interactive processes from non-interactive ones
  - Bonus or penalty reflects whether a process is I/O bound or compute bound
- Use large quanta for important processes
  - Modify quanta based on CPU use
- Associate processes with CPUs
- Do everything in O(1) time

The Runqueue
- 140 separate queues, one for each priority level
- Actually two sets of queues: active and expired
- Priorities 0-99 are for real-time processes
- Priorities 100-139 are for normal processes; the value is set via the nice()/setpriority() system calls

Linux 2.6 scheduler runqueue structure

Scheduler Runqueue
- A scheduler runqueue is a list of tasks that are runnable on a particular CPU.
- An rq structure maintains a linked list of those tasks.
- The runqueues are kept in an array, runqueues, indexed by CPU number.
- The rq keeps a reference to its idle task.
  - The idle task for a CPU is never on the scheduler runqueue for that CPU (it is always the last choice).
- Access to a runqueue is serialized by acquiring and releasing rq->lock.

Basic Scheduling Algorithm
1. Find the highest-priority queue with a runnable process
2. Find the first process on that queue
3. Calculate its quantum size
4. Let it run
5. When its time is up, put it on the expired list (recalculate its priority first)
6. Repeat

Process Descriptor Fields Related to the Scheduler
- thread_info->flags, thread_info->cpu
- state, prio, static_prio, rt_priority
- run_list, array
- sleep_avg, timestamp, last_ran, activated
- policy, cpus_allowed
- time_slice, first_time_slice

The Highest Priority Process
- A bitmap indicates which queues have processes that are ready to run
- Find the first bit that is set:
  - 140 queues fit in 5 integers (32 bits each)
  - Only a few compares are needed to find the first nonzero integer
  - A hardware instruction finds the first 1-bit (bsfl on Intel)
- The time depends on the number of priority levels, not on the number of processes
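The bitmap lookup can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper names (prio_set, prio_clear, first_runnable_prio) are hypothetical, the words are taken to be 32 bits wide, and __builtin_ctz is a GCC/Clang builtin standing in for the bsfl instruction; the kernel's own code differs.

```c
#include <stdint.h>

#define NUM_PRIO     140
#define BITMAP_WORDS 5          /* 5 x 32 bits = 160 bits, enough for 140 levels */

/* Mark priority level `prio` as having at least one runnable task. */
static void prio_set(uint32_t bitmap[BITMAP_WORDS], int prio)
{
    bitmap[prio / 32] |= (uint32_t)1 << (prio % 32);
}

/* Mark priority level `prio` as empty. */
static void prio_clear(uint32_t bitmap[BITMAP_WORDS], int prio)
{
    bitmap[prio / 32] &= ~((uint32_t)1 << (prio % 32));
}

/*
 * Return the lowest-numbered set bit, i.e. the highest-priority
 * non-empty queue, or NUM_PRIO if nothing is runnable.
 * At most five word compares plus one count-trailing-zeros.
 */
static int first_runnable_prio(const uint32_t bitmap[BITMAP_WORDS])
{
    for (int w = 0; w < BITMAP_WORDS; w++) {
        if (bitmap[w] != 0)
            return w * 32 + __builtin_ctz(bitmap[w]);
    }
    return NUM_PRIO;
}
```

The loop bound is the constant BITMAP_WORDS, so the cost stays the same whether one task or ten thousand tasks are runnable.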

Scheduling Components
- Static Priority
- Sleep Average
- Bonus
- Dynamic Priority
- Interactivity Status

Static Priority
- Each task has a static priority that is set based on the nice value specified for the task
  - static_prio in the task_struct
  - Value between 0 and 139 (between 100 and 139 for normal processes)
- Each task also has a dynamic priority that is set based on a number of factors
  - It tries to increase the priority of interactive jobs

Sleep Average
- Interactivity heuristic: sleep ratio
  - Mostly sleeping: I/O bound
  - Mostly running: CPU bound
- Sleep ratio approximation: sleep_avg in the task_struct
  - Range: 0 .. MAX_SLEEP_AVG
- When a process wakes up (is made runnable), recalc_task_prio adds the number of ticks it was sleeping (blocked), up to a maximum value (MAX_SLEEP_AVG)
- When a process is switched out, schedule subtracts the number of ticks the task actually ran (without blocking)
- sleep_avg is scaled to a bonus value
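The two sleep_avg updates can be sketched as a pair of saturating operations. This is a simplified model rather than the kernel's recalc_task_prio: the function names are hypothetical, time is counted in milliseconds instead of ticks, and the cap of 1000 ms is an assumption made for this example.

```c
#define MAX_SLEEP_AVG_MS 1000u   /* assumed cap for this sketch */

/* On wakeup: credit the time spent sleeping, saturating at the cap. */
static unsigned int credit_sleep(unsigned int sleep_avg, unsigned int slept_ms)
{
    if (sleep_avg >= MAX_SLEEP_AVG_MS ||
        slept_ms >= MAX_SLEEP_AVG_MS - sleep_avg)
        return MAX_SLEEP_AVG_MS;
    return sleep_avg + slept_ms;
}

/* On switch-out: charge the time actually run, never going below zero. */
static unsigned int charge_run(unsigned int sleep_avg, unsigned int ran_ms)
{
    return ran_ms >= sleep_avg ? 0u : sleep_avg - ran_ms;
}
```

A task that mostly sleeps drifts toward the cap, and a task that mostly runs drifts toward zero, which is what lets the value stand in for the sleep ratio.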

Average Sleep Time and Bonus Values

Average sleep time             Bonus
>= 0 ms but < 100 ms           0
>= 100 ms but < 200 ms         1
>= 200 ms but < 300 ms         2
>= 300 ms but < 400 ms         3
>= 400 ms but < 500 ms         4
>= 500 ms but < 600 ms         5
>= 600 ms but < 700 ms         6
>= 700 ms but < 800 ms         7
>= 800 ms but < 900 ms         8
>= 900 ms but < 1000 ms        9
1 second                       10
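The table amounts to one integer division with a cap. A minimal sketch, assuming the average sleep time is available in milliseconds; the function name is hypothetical.

```c
#define MAX_BONUS 10

/* Map an average sleep time in milliseconds to a bonus of 0..10. */
static int bonus_from_sleep_ms(unsigned int avg_sleep_ms)
{
    unsigned int bonus = avg_sleep_ms / 100;   /* one bonus point per 100 ms */
    return bonus > MAX_BONUS ? MAX_BONUS : (int)bonus;
}
```

For example, bonus_from_sleep_ms(250) is 2 and bonus_from_sleep_ms(1000) is 10, matching the corresponding rows of the table.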

Bonus and Dynamic Priority
- Dynamic priority (prio in the task_struct) is calculated from the static priority and the bonus:
  dynamic priority = max(100, min(static priority - bonus + 5, 139))
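A minimal C sketch of this clamp, taking the formula as max(100, min(static priority - bonus + 5, 139)). The function name is hypothetical.

```c
/* Dynamic priority of a normal task: static priority shifted by the
 * bonus (0..10, centred on 5) and clamped to the range 100..139. */
static int dynamic_prio(int static_prio, int bonus)
{
    int prio = static_prio - bonus + 5;

    if (prio > 139)
        prio = 139;
    if (prio < 100)
        prio = 100;
    return prio;
}
```

With the default static priority of 120, a bonus of 10 gives 115 (a five-level boost), a bonus of 5 leaves it at 120, and a bonus of 0 gives 125 (a five-level penalty).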

Calculating Time Slices
- time_slice in the task_struct
- Calculate the quantum (in milliseconds), where SP is the static priority:
  - If SP < 120: Quantum = (140 - SP) × 20
  - If SP >= 120: Quantum = (140 - SP) × 5
- Higher-priority processes get longer quanta
- Basic idea: important processes should run longer
- Other mechanisms are used for quick interactive response
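The quantum rule can be checked with a short C sketch. It takes the two cases as (140 - SP) × 20 and (140 - SP) × 5, with the result in milliseconds, and uses static priority = 120 + nice for normal tasks. Both function names are hypothetical.

```c
/* Static priority of a normal task from its nice value (-20..+19). */
static int static_prio_from_nice(int nice)
{
    return 120 + nice;
}

/* Base time quantum in milliseconds for a given static priority. */
static int base_quantum_ms(int static_prio)
{
    if (static_prio < 120)
        return (140 - static_prio) * 20;
    return (140 - static_prio) * 5;
}
```

Under these assumptions nice -20 gives 800 ms, nice 0 gives 100 ms, and nice +19 gives 5 ms.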

Nice Value vs. static priority and Quantum

Interactive Processes
- A process is considered interactive if
  bonus - 5 >= (static priority / 4) - 28
  - (static priority / 4) - 28 is the interactive delta
- Low-priority processes have a hard time becoming interactive:
  - A process with the highest static priority (100) becomes interactive when its average sleep time exceeds 200 ms
  - A process with the default static priority (120) becomes interactive when its average sleep time exceeds 700 ms
  - A process with the lowest priority (139) can never become interactive
- The higher the bonus a task is getting and the higher its static priority, the more likely it is to be considered interactive.
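The interactivity test is small enough to write out. This sketch takes the condition as bonus - 5 >= static priority / 4 - 28 and assumes integer division; the function names are hypothetical.

```c
/* Interactive delta for a static priority in 100..139. */
static int interactive_delta(int static_prio)
{
    return static_prio / 4 - 28;     /* integer division (assumed) */
}

/* Nonzero if a task with this static priority and bonus counts as interactive. */
static int is_interactive(int static_prio, int bonus)
{
    return bonus - 5 >= interactive_delta(static_prio);
}
```

The deltas for static priorities 100, 120, and 139 come out as -3, 2, and 6, so the required bonus is at least 2, at least 7, and at least 11 respectively. Since the bonus never exceeds 10, a priority-139 task cannot qualify.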

Using Quanta
- At every time tick (in scheduler_tick), decrement the quantum (time_slice) of the currently running process
- If the quantum reaches zero, the process is done for this round
- Check its interactive status:
  - If non-interactive, put it aside on the expired list
  - If interactive, put it at the end of the active list
- Exceptions: do not put it on the active list if
  - a higher-priority process is on the expired list, or
  - an expired task has been waiting longer than STARVATION_LIMIT
- If there is nothing else at that priority, it will run again immediately
- Of course, by running so much, its bonus will go down, and so will its priority and its interactive status
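The decision taken when a quantum runs out can be written as a small pure function. This is a sketch of the rules listed above, not the kernel's scheduler_tick: the struct, the enum, and the three flags are hypothetical stand-ins for state the real code derives from the runqueue.

```c
enum requeue_target { REQUEUE_ACTIVE, REQUEUE_EXPIRED };

/* Inputs to the decision, reduced to three flags for illustration. */
struct expiry_ctx {
    int interactive;          /* task is currently classified as interactive */
    int higher_prio_expired;  /* a higher-priority task sits on the expired list */
    int expired_starving;     /* an expired task has waited past STARVATION_LIMIT */
};

/* Where does a task go once its time slice has reached zero? */
static enum requeue_target on_quantum_expired(const struct expiry_ctx *c)
{
    if (!c->interactive)
        return REQUEUE_EXPIRED;
    if (c->higher_prio_expired || c->expired_starving)
        return REQUEUE_EXPIRED;   /* the two exceptions */
    return REQUEUE_ACTIVE;        /* end of the active list */
}
```

Only an interactive task with neither exception in force stays on the active list; every other case sends the task to the expired list.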

Avoiding Starvation
- The system only runs processes from the active queues, and puts them on the expired queues when they use up their quanta
- When a priority level of the active queue is empty, the scheduler looks for the next-highest priority queue
- After all of the active queues have been run, the active and expired queues are swapped
- There are pointers to the current arrays; at the end of a cycle, the pointers are switched

The Priority Arrays

    struct prio_array {
        unsigned int nr_active;        /* number of tasks in this array */
        unsigned long bitmap[5];       /* one bit per priority level */
        struct list_head queue[140];   /* one task list per priority level */
    };

    struct rq {
        spinlock_t lock;
        unsigned long nr_running;
        struct prio_array *active, *expired;
        struct prio_array arrays[2];
        struct task_struct *curr, *idle;
    };

Swapping Arrays

    struct prio_array *array = rq->active;
    if (array->nr_active == 0) {
        /* active array drained: swap the two pointers */
        rq->active = rq->expired;
        rq->expired = array;
    }
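A stand-alone model of the swap can be compiled outside the kernel by reducing each array to its task count. Everything suffixed _sketch, along with the three helper functions, is hypothetical and exists only for this illustration.

```c
/* Each priority array reduced to the one field the swap looks at. */
struct prio_array_sketch {
    unsigned int nr_active;
};

struct rq_sketch {
    struct prio_array_sketch *active, *expired;
    struct prio_array_sketch arrays[2];
};

/* Point active/expired at the two embedded arrays, both empty. */
static void rq_init(struct rq_sketch *rq)
{
    rq->arrays[0].nr_active = 0;
    rq->arrays[1].nr_active = 0;
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* Model one task using up its quantum: it moves from active to expired. */
static void expire_one(struct rq_sketch *rq)
{
    rq->active->nr_active--;
    rq->expired->nr_active++;
}

/* Swap the pointers once the active array is drained; returns 1 if swapped. */
static int swap_if_drained(struct rq_sketch *rq)
{
    struct prio_array_sketch *array = rq->active;

    if (array->nr_active != 0)
        return 0;
    rq->active = rq->expired;
    rq->expired = array;
    return 1;
}
```

The swap touches two pointers and no tasks, so its cost does not depend on how many tasks have expired.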

Why Two Arrays?
- Why is it done this way?
  - It avoids the need for traditional aging
- Why is aging bad?
  - It is O(n) at each clock tick

Linux is More Efficient
- Processes are touched only when they start or stop running
- That is when priorities, bonuses, quanta, and interactive status are recalculated
- There are no loops over all processes, or even over all runnable processes

Real-Time Scheduling
- Linux has soft real-time scheduling
  - No hard real-time guarantees
- All real-time processes have higher priority than any conventional process
- Processes with priorities in [0, 99] are real-time
  - The real-time priority is saved in rt_priority in the task_struct
  - The scheduling priority of a real-time task is 99 - rt_priority
- A process can be converted to real-time via the sched_setscheduler system call
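The two facts above, the priority mapping and the system call, can be sketched from user space. The helper names are hypothetical; sched_setscheduler, struct sched_param, and SCHED_FIFO are the standard declarations from <sched.h>. The call normally needs elevated privileges, so the sketch only reports the result instead of assuming success.

```c
#include <sched.h>

/* Scheduling priority of a real-time task, as given on the slide. */
static int rt_sched_prio(int rt_priority)
{
    return 99 - rt_priority;
}

/*
 * Ask the kernel to run the calling process under SCHED_FIFO with the
 * given real-time priority. Returns 0 on success, -1 on failure with
 * errno set (typically EPERM without sufficient privilege).
 */
static int become_fifo(int rt_priority)
{
    struct sched_param param = { .sched_priority = rt_priority };

    return sched_setscheduler(0, SCHED_FIFO, &param);   /* pid 0: the caller */
}
```

A larger rt_priority maps to a smaller scheduling priority number, and smaller numbers are served first, so rt_priority 99 lands in queue 0.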

Real-Time Policies
- First-in, first-out: SCHED_FIFO
  - Static priority
  - A process is only preempted by a higher-priority process
  - No time quanta; it runs until it blocks or yields voluntarily
  - Processes at the same priority level take turns as each one blocks or yields
- Round-robin: SCHED_RR
  - As above, but with a time quantum (800 ms)
- Normal processes have the SCHED_OTHER scheduling policy

Multiprocessor Scheduling
- Each processor has a separate run queue
- Each processor only selects processes from its own queue to run
- Yes, it is possible for one processor to be idle while others have jobs waiting in their run queues
- Periodically, the queues are rebalanced: if one processor's run queue is too long, some processes are moved from it to another processor's queue

Locking Runqueues
- To rebalance, the kernel sometimes needs to move processes from one runqueue to another
- This is actually done by special kernel threads
- Naturally, the runqueues must be locked before this happens
- The kernel always locks runqueues in order of increasing index
- Why? Deadlock prevention!
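Ordered locking can be illustrated with a toy spinlock built on C11 atomics. This is a user-space sketch, not kernel code: the struct and function names are hypothetical, and the order is taken from a CPU index stored in the struct, following the rule stated above.

```c
#include <stdatomic.h>

/* A runqueue reduced to its CPU index and a spinlock. */
struct rq_lock_sketch {
    int cpu;
    atomic_flag lock;
};

static void rq_lock(struct rq_lock_sketch *rq)
{
    while (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void rq_unlock(struct rq_lock_sketch *rq)
{
    atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/* Lock two runqueues, always taking the lower CPU index first. */
static void lock_two_runqueues(struct rq_lock_sketch *a, struct rq_lock_sketch *b)
{
    if (a == b) {
        rq_lock(a);
    } else if (a->cpu < b->cpu) {
        rq_lock(a);
        rq_lock(b);
    } else {
        rq_lock(b);
        rq_lock(a);
    }
}

static void unlock_two_runqueues(struct rq_lock_sketch *a, struct rq_lock_sketch *b)
{
    rq_unlock(a);
    if (a != b)
        rq_unlock(b);
}
```

Because every caller acquires the lower-indexed lock first, two CPUs balancing against each other can never each hold one lock while waiting for the other.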

Processor Affinity
- Each process has a bitmask saying which CPUs it can run on
- Normally, of course, all CPUs are listed
- Processes can change the mask
- The mask is inherited by child processes (and threads), thus tending to keep them on the same CPU
- Rebalancing does not override affinity

Load Balancing
- To keep all CPUs busy, load balancing pulls tasks from busy runqueues to idle runqueues.
- If schedule() finds that a runqueue has no runnable tasks (other than the idle task), it calls load_balance
- load_balance is also called via the timer
  - scheduler_tick calls rebalance_tick
  - Every tick when the system is idle
  - Every 100 ms otherwise

Load Balancing
- load_balance looks for the busiest runqueue (the one with the most runnable tasks) and takes a task that is, in order of preference:
  - inactive (likely to be cache cold)
  - high priority
- load_balance skips tasks that are:
  - likely to be cache warm (ran within the last cache_decay_ticks)
  - currently running on a CPU
  - not allowed to run on the current CPU (as indicated by the cpus_allowed bitmask in the task_struct)
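The three skip rules can be condensed into one predicate. This is a sketch under stated assumptions: the struct is a hypothetical three-field stand-in for the task_struct, and "cache warm" is read as "ran within the last cache_decay_ticks".

```c
/* The three facts the skip rules look at. */
struct task_sketch {
    unsigned long cpus_allowed;   /* bit i set: task may run on CPU i */
    int running;                  /* currently executing on some CPU */
    unsigned long last_ran;       /* tick at which the task last ran */
};

/* Nonzero if the balancer on `this_cpu` may pull task `t` at tick `now`. */
static int can_migrate(const struct task_sketch *t, int this_cpu,
                       unsigned long now, unsigned long cache_decay_ticks)
{
    if (t->running)
        return 0;                                  /* on a CPU right now */
    if (!(t->cpus_allowed & (1UL << this_cpu)))
        return 0;                                  /* affinity forbids this CPU */
    if (now - t->last_ran < cache_decay_ticks)
        return 0;                                  /* probably still cache warm */
    return 1;
}
```

The affinity check is what makes rebalancing respect the per-process CPU mask: a task is never pulled onto a CPU whose bit is clear in cpus_allowed.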

Linux 2.6 CFS Scheduler
- Was merged into the 2.6.23 release
- Uses a red-black tree structure instead of multilevel queues
- Tries to run the task with the "gravest need" for CPU time

Red-Black tree in CFS

Red-Black tree properties
- Self-balancing
- Insertion and deletion operations take O(log(n)) time
- With a proper implementation, its performance in practice is almost the same as that of O(1) algorithms

The switch_to Macro
- switch_to() performs a process switch from the prev process (descriptor) to the next process (descriptor).
- switch_to is invoked by schedule() and is one of the most hardware-dependent kernel routines.
- See kernel/sched.c and include/asm-*/system.h for more details.