Effective Computing with SMP Linux


Multi-processor systems were once a feature of high-end servers and mainframes, but today even desktops for personal use have multiple processors. Linux is a popular server OS and is increasingly being accepted as a mainstream desktop OS. But how good is Linux at multi-processing? The traditional Linux kernel was built for Uniprocessor (UP) systems and could not exploit the power offered by multiprocessor hardware. With the arrival of the Linux 2.6 kernel, this has changed drastically: the 2.6 kernel adds many features to support symmetric multi-processing. This paper discusses the relevance of an SMP kernel for today's computers, the changes in the Linux 2.6 kernel to support SMP, and the benefits of an SMP Linux system. The paper also explains how developers can take advantage of the SMP features of Linux to develop software that runs more efficiently.

About the Author

Brijai Sudarsan is an Associate Consultant at Tata Consultancy Services with about 13.5 years of experience in the IT industry. He holds a Bachelor's Degree in Electrical and Electronics Engineering and currently works on storage-related technologies for one of the TCS customers, a global leader in data storage platforms. During his career at TCS he has led multiple projects in the Linux/Unix domain and has extensive experience in developing distributed applications on Linux. He has also worked on operating systems development, mainly on HP NonStop Massively Parallel Processing (MPP) servers. His areas of interest include parallel processing, object-oriented design and development, and storage systems.

Table of Contents

1. Introduction
   Overview
   History
   What is Symmetric Multiprocessing or SMP?
2. Benefits of an SMP Linux System
   Performance and Scalability
   Cost: Is SMP Linux Cost Effective?
   Application Portability
3. Challenges in Building an SMP System
   Task Scheduling
   Synchronization and Parallelization
4. The Linux Solution
   Task Scheduling
   Scheduler Scalability
   Load Balancing
   CPU Affinity
   Synchronization and Parallelization
   Per-CPU Variables
   Atomic Variables
   Spin Locks
   Semaphores
   The Big Kernel Lock (BKL)
5. Application Development on an SMP Linux System
6. Summary

List of Abbreviations

CFS - Completely Fair Scheduler
CPU - Central Processing Unit (Processor)
DMA - Direct Memory Access
GHz - Gigahertz
IPC - Inter-Process Communication
JVM - Java Virtual Machine
MHz - Megahertz
MPP - Massively Parallel Processing
NPTL - Native POSIX Thread Library
NUMA - Non-Uniform Memory Access
OS - Operating System
PIC - Programmable Interrupt Controller
POSIX - Portable Operating System Interface for Unix
SMP - Symmetric Multiprocessing
UP - Uniprocessor

Introduction

There are two ways to improve the processing power of a computer: the first is to build a Uniprocessor (UP) system with a faster (and more expensive) processor, and the second is to build a system with multiple processors. The second approach is generally referred to as parallel processing. There are different approaches to building a parallel processing system, such as Symmetric Multiprocessing (SMP), Cluster Computing and Massively Parallel Processing (MPP).

Overview

This paper focuses on the most common approach to parallel processing today, Symmetric Multiprocessing (SMP), using Linux as a reference implementation. First, the paper presents the benefits of an SMP system over a UP system. After establishing the relevance of an SMP system, the paper delves into the implementation challenges of an SMP kernel as compared to a UP kernel and how the Linux 2.6 kernel addresses these challenges. The discussion is oriented more towards software design than hardware architecture; however, hardware features are briefly mentioned where appropriate. Finally, application development on an SMP Linux system is discussed, with specific focus on performance improvement by harnessing the parallel processing features of an SMP kernel. The paper concludes that an SMP Linux system is one of the best platforms available today for delivering low-cost, high-performance software solutions.

History

A bit of history before we dive deeper into the topic. Multi-processing was first commercially introduced by IBM in 1955 with the 704 system, designed by Gene Amdahl, who is also famous for Amdahl's law of parallel computing. The 704, however, was not an SMP system. The first SMP system was the Burroughs D825, which could support up to four CPUs. Later, in 1969, MIT, Bell Labs and General Electric developed the MULTICS (Multiplexed Information and Computing Service) system, which could support up to eight processors. MULTICS was a landmark in the development of multi-tasking systems and has had a significant influence on other operating systems such as UNIX. Other SMP systems, like the DEC KL10, became commercially available in the 70s and 80s. Still, parallel processing remained the domain of supercomputers, mainframes and high-end servers. In the 90s, desktop computers became popular but were fed with faster processors every few months, so the need for parallel processing was never felt.

Today, the situation has changed. Symmetric multiprocessing has come of age and is no longer limited to mainframes and supercomputers. Much of the credit goes to the popularity of desktop computers. As CPU speeds increased dramatically from a few MHz in the 90s to GHz today, chip manufacturers have found it increasingly difficult to design chips that operate at ever higher frequencies to meet the computing demands of today's desktops. As a result, they turned to multi-processing, and SMP has been the choice due to lower costs and operating system portability. Because of the symmetric design of an SMP system, a UP OS like Linux could be enhanced to support SMP without a complete re-write of the kernel code. OS vendors also continued to provide the same interfaces for application programmers (APIs, or system calls as they are popularly called) on SMP systems, so that applications developed for UP systems could run without any changes. Today, SMP is so popular that major desktop and server operating systems like Linux and Windows have SMP support built in.

The Linux 2.0 kernel supported SMP, but in a rather crude way, with a single large kernel lock. Major enhancements have been made since then, and the Linux 2.6 kernel has excellent support for SMP, with fine-grained synchronization mechanisms and a scalable scheduler. This is discussed in detail in later sections of this paper.

What is Symmetric Multiprocessing or SMP?

A symmetric multiprocessing system consists of two or more identical CPUs that share the system's main memory in a symmetric manner: every CPU has equal access to main memory, and data accessed by one CPU can be used by any other CPU in the system. An SMP system is built by tightly interconnecting the CPUs and main memory through a high-speed bus, as shown in Figure 1.0 below, so that they can communicate with each other at high speed. The components are usually built on a single board so that they communicate over short distances, and the bus is kept short to improve communication speed. Additionally, I/O devices that need Direct Memory Access (DMA) can also be connected to the bus in a symmetric manner.

[Figure 1.0: SMP system - four CPUs, each with its own cache, and shared main memory interconnected by a high-speed bus]

There are variants of the SMP system, such as SMP NUMA (Non-Uniform Memory Access) systems, which may also have local memory allocated to one processor or shared between a set of processors. Such architectures are not discussed in this paper.

Let us now see how an SMP system works from an Operating System (OS) perspective. An SMP OS is always a multi-tasking system, but that is not the critical difference between an SMP and a UP system; most UP systems are multi-tasking too, as described in the note below. The main difference is that an SMP system can run more than one task at any instant, as it has multiple CPUs at its disposal. A task may also get more time to run on a CPU, as the number of tasks per CPU is lower than on a UP system for a given system load.
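Whether Linux is running on one CPU or many is visible from user space. As a small illustration (a sketch added to this discussion, using the common glibc extension _SC_NPROCESSORS_ONLN), a program can query the number of online CPUs:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* _SC_NPROCESSORS_ONLN reports the CPUs currently online;
         * it is a widely supported extension rather than core POSIX. */
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

        if (ncpus < 1) {
            perror("sysconf");
            return 1;
        }
        printf("This system has %ld online CPU(s)\n", ncpus);
        return 0;
    }

On an SMP machine this prints a value greater than one; everything else about the programming interface stays the same, as discussed later under Application Portability.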

Figure 1.1 and Figure 1.2 provide a snapshot of a UP system and a 2 CPU SMP system respectively, each with 10 tasks, T0 to T9, to execute. In Figure 1.1, the 10 tasks are handled by a single CPU, whereas in Figure 1.2 the tasks are distributed across the 2 CPUs. Each box represents a time slice given to a process to run; let us assume each slice is 10 milliseconds long.

Multi-tasking on a Uniprocessor system: A uniprocessor system has to run multiple tasks on a single CPU. Since it would not be wise to execute these tasks sequentially - some tasks can take a long time to complete - the OS gives each task a specific amount of time (a time slice) to run, irrespective of how much time the task actually needs. The task is moved off the CPU if it cannot finish its job within the time given, and another task gets the chance to run. The swapped-out task waits until the OS allows it to run again. All the tasks are swapped in and out of the CPU until they complete. As the swapping happens very fast - usually a task occupies the CPU for only a few milliseconds - the OS gives the user the illusion that multiple tasks are running at any point of time.

[Figure 1.1: Task execution on a Uniprocessor system in a 100 millisecond timeframe - one CPU runs time slices of tasks T0 to T9 in sequence]

[Figure 1.2: Task execution on a 2 CPU SMP system in a 100 millisecond timeframe - the tasks are split between CPU 0 and CPU 1]

Clearly, the two systems do not schedule tasks in the same way. Note that a task is a generalization and could be a process, a thread or an interrupt routine. The following comparisons can be made:

1. The UP system runs all the tasks on a single CPU. The SMP system distributes tasks across the two CPUs: tasks T0 to T4 run on CPU 0 and tasks T5 to T9 run on CPU 1, with the exception of T1, which gets to run on both.

2. On the UP system, tasks take longer to complete, as they compete with more tasks for a time slice of the CPU and so get fewer time slices in a given period; some tasks (T3 and T5) do not get to run at all in the 100 millisecond timeframe. On the SMP system, tasks take less time to complete as they get more time slices; for instance, task T5 gets to run on CPU 1 as it is competing only with T6 to T9 and T1.

3. On the UP system, a task always runs on the same CPU. On the SMP system, a task tends to run on the same CPU, but there are exceptions, like task T1, which is re-scheduled on CPU 1 after initially running on CPU 0 for the first 70 milliseconds.

Comparison #3 is interesting, and there is a clear reason for tasks tending to continue on the same CPU: processor (CPU) affinity, discussed in the section Challenges in Building an SMP System.

Benefits of an SMP Linux System

An SMP system has multiple benefits that have made it a popular choice for desktops, mobile computers, and entry-level as well as mid-range servers. Let us look at the benefits an SMP Linux system has to offer.

Performance and Scalability

Moore's law predicted that the number of transistors on a processor would increase at an exponential rate, and there is considerable debate now on how long Moore's law will hold as a predictor of future processor performance. Another law is of more direct interest to a system's performance: Amdahl's law, which predicts a system's speedup based on the amount of parallelization that can be achieved:

    Speedup = 1 / ((1 - P) + P/N)

where N is the number of processors in the system, P is the proportion of the work that can be parallelized, and (1 - P) is the part that remains sequential.
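To make the formula concrete, the following small program (an illustration added for this discussion, not part of the original analysis) evaluates Amdahl's law for several parallelization levels:

    #include <stdio.h>

    /* Amdahl's law: speedup = 1 / ((1 - p) + p / n) */
    static double speedup(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        const double levels[] = { 0.10, 0.25, 0.50, 0.75, 0.90 };
        const int cpus[] = { 1, 2, 4, 8, 16 };
        const int nlev = sizeof(levels) / sizeof(levels[0]);
        const int ncpu = sizeof(cpus) / sizeof(cpus[0]);

        for (int i = 0; i < nlev; i++) {
            printf("P = %2.0f%%:", levels[i] * 100.0);
            for (int j = 0; j < ncpu; j++)
                printf("  N=%-2d %5.2fx", cpus[j], speedup(levels[i], cpus[j]));
            printf("\n");
        }
        return 0;
    }

Running it reproduces the pattern of Figure 2.0: at P = 10% the speedup barely reaches 1.1x even with 16 CPUs, while at P = 90% it passes 6x and keeps climbing as CPUs are added.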

The speedup can be calculated for different levels of parallelization, as shown in Figure 2.0 below, where P10 denotes 10% parallelization and P90 a system in which only 10% of the work is sequential. Clearly, the performance of an SMP system depends on the level of parallelization that can be achieved. For instance, the P10 system shows hardly any improvement even when the number of CPUs is increased from 1 to 16, whereas the P90 system scales very well as more CPUs are added.

[Figure 2.0: Amdahl's law - speedup versus number of CPUs for parallelization levels P10, P25, P50, P75 and P90]

Parallelization depends on multiple elements within the system:

1. Hardware Scalability - As shown in Figure 1.0, all CPUs in an SMP system share the global main memory and communicate through a common bus. Only one CPU is given access to the bus at a time, and the other CPUs have to wait until the bus is released. The bus must have enough bandwidth that CPUs are not kept waiting to fetch instructions and data from main memory. I/O devices also share the bus for DMA and to communicate with the CPUs. Hence, bus speed is of critical importance and is the main bottleneck. SMP systems have been found to scale well for 2 to 16 CPU configurations; larger systems that need a greater degree of hardware parallelization and scalability usually use other approaches, such as Massively Parallel Processing (MPP) or clusters.

2. Kernel Synchronization - Tasks (processes, threads or interrupt routines) frequently need to execute in kernel mode to access services such as I/O and inter-process communication, and to synchronize access to shared resources. Hence, kernel code is run quite frequently by the CPUs and needs a high level of parallelization. The kernel must implement synchronization mechanisms, as different CPUs could be executing kernel code that accesses the same data. Linux provides various synchronization mechanisms, such as spin locks and semaphores, to ensure that all concurrent threads of execution in the kernel are synchronized.

3. Kernel Pre-emption - Linux has evolved, with each release of the kernel supporting more fine-grained synchronization. Up to version 2.4, the Linux kernel was non-preemptive: if a task was executing in kernel mode, it continued to execute until it voluntarily exited kernel mode. Although this simplifies the kernel design a great deal, as very little synchronization is required within the kernel, it is bad news for an SMP system.

The Linux 2.6 kernel is preemptive. Except for critical sections, which are non-preemptive, the 2.6 kernel allows multiple CPUs to execute in kernel mode at the same time. The critical sections, which could be updating data shared across CPUs, are protected by various locking mechanisms, as discussed in the section The Linux Solution.

4. Task Scheduling - Tasks need to be scheduled by the kernel so that the CPUs are utilized with a high level of efficiency. The kernel scheduler needs to distribute tasks across CPUs so that the system load is balanced across all of them. The scheduler also needs to ensure that tasks are re-scheduled on the same CPU to preserve cache affinity: the CPU cache must be rebuilt every time a task migrates to a different CPU, and this is expensive. For a high-priority process this can be especially costly, as the task is re-scheduled more frequently. Finally, the scheduler itself should be efficient, so that it does not burn a lot of CPU cycles deciding which task to run; ideally, the performance of the scheduler should remain the same irrespective of the number of tasks it has to schedule. The Linux 2.6 kernel has a highly scalable scheduler that addresses these challenges, as we shall see in the section The Linux Solution.

5. Application Parallelism - Even if the hardware and the kernel have good SMP support, applications may still not perform significantly better on an SMP system than on a UP system. Building large, monolithic applications is not a good idea on an SMP system: the kernel views such an application as a single task and allocates it time slices like any other task, irrespective of its size. To take advantage of the processing power available, it is better to split a large application into multiple processes and threads. Tasks that can run in parallel within an application can be written to run as independent processes or threads.

Cost: Is SMP Linux Cost Effective?

This is a critical question: is an SMP system more cost effective than building a cluster of UP systems or an MPP system? There is definitely a cost benefit from a hardware perspective. As an SMP system has a symmetric architecture - every CPU has equal access to main memory and I/O devices - it is a logical extension of the UP system with more CPUs added in. Hence, an N processor SMP system is cheaper to build than interconnecting N UP systems. Furthermore, an SMP-enabled Linux OS is available at no additional cost, and Linux scales well as more processors are added. SMP Linux systems with 2 to 16 CPU configurations have shown performance benefits significant enough to justify the cost of the additional processors. Distributed applications are well suited to harness the power of an SMP Linux system and can provide the expected cost benefits by reducing the hardware required to run software at a required performance level. A good example is the Java Virtual Machine (JVM), which has a multi-threaded architecture. The JVM has shown significant performance improvements using the NPTL threads library on SMP Linux systems; a major reason for this is the high-performance Completely Fair Scheduler (CFS) in Linux 2.6, which enables distributed applications to create a large number of processes and threads to take advantage of the additional processing power.
Application Portability

Linux provides the same programming interface (APIs or system calls) on both SMP and UP systems. The complexity of the SMP system is managed by the kernel and is transparent to the application developer. This means that applications developed for UP systems can run on an SMP system without any changes. However, an application may need further performance tuning to take advantage of the parallel processing capabilities of an SMP system.

Challenges in Building an SMP System

An SMP system is a logical extension of a UP system; the main difference between the two is that the SMP system is capable of multi-processing. Many of the other features of a UP system can be re-used, with relevant changes, on an SMP system. Hence, writing an SMP kernel usually does not involve developing a new kernel from scratch. An SMP system differs from a UP system mainly in two aspects:

Task Scheduling - With more processors, SMP systems can run more tasks and so need an efficient task scheduler.

Synchronization and Parallelization - Multiple processors concurrently execute different parts of the kernel in kernel mode. The kernel needs appropriate synchronization mechanisms so that a high level of parallelism can be achieved without corrupting kernel data.

Let us look at these two challenges in more detail.

Task Scheduling

Scheduler scalability - The main reason for choosing an SMP system is that it can handle a larger processing load than a UP system. As more processors are added, the system will have more runnable tasks, and the kernel task scheduler has to scale to the higher system load. The scheduler should not take too many CPU cycles to find the next task to run; ideally, the efficiency of the scheduling algorithm should be independent of the number of tasks it has to schedule, the number of CPUs, or any other system parameter.

Load balancing - Unlike a UP scheduler, an SMP scheduler also needs to balance the load across all CPUs. Processes and threads need to be assigned to CPUs so that all CPUs are optimally utilized. In addition to processes and threads, interrupts need to be serviced in a timely manner.

Processor affinity - Tasks should be re-scheduled on the same CPU as far as possible. Every CPU has a local cache where it holds data that it expects processes to use in the near future; this improves performance, as the CPU does not need to fetch data from the relatively slower main memory. If a task is re-scheduled on a different CPU, the data cached for the task on the previous CPU cannot be used, and the new CPU needs to build up its own local cache for the migrated task. This is expensive and can have a major impact on system performance, particularly if the migrating task is a high-priority one and is migrated frequently. The scheduler needs to keep the load balanced across all CPUs while at the same time respecting processor affinity; these two goals can be contradictory, and the right trade-off must be made to maximize system performance.

Synchronization and Parallelization

An SMP system has true parallelism: on an N CPU system, N tasks could be running, one on each CPU, at the same time. Some of these tasks could be running in user mode and some in kernel mode. A user-mode task enters kernel mode when it makes a system call such as fork() or write(); interrupts and kernel threads always execute in kernel mode. Hence, at any point of time there could be many concurrent threads of execution within the kernel: a process may have invoked a system call, a CPU could be servicing an interrupt, and a kernel thread could be running.

A UP kernel also has concurrency issues if the kernel is preemptive: a task running in kernel mode could be swapped off the CPU, and another task that modifies the same data structures could start running. However, certain conditions are unique to an SMP system:

Concurrency due to parallelism - An SMP kernel is designed to have parallel threads of execution. A non-preemptive kernel allows a task to run in kernel mode as long as it wishes, and the task can do a planned switch out of the kernel after ensuring that all shared data structures have been updated correctly. This greatly simplifies kernel synchronization on a UP system. However, it will not work on an SMP system, for the simple reason that multiple CPUs could be executing kernel code at the same time, some of them changing the same data structures. Serializing the CPUs' access to the kernel via non-preemption would defeat the very purpose of having an SMP system.

Interrupts - UP systems often prevent pre-emption of a kernel-mode process or thread by disabling interrupts. Although this can hamper system response, it simplifies kernel synchronization. Again, this will not work on an SMP system. I/O interrupts are delivered to the CPUs by a Programmable Interrupt Controller (PIC). The PIC has its own logic for delivering interrupts and can deliver an interrupt to any of the available CPUs. For example, on the Intel x86 platform, interrupts are delivered by the PIC to the CPU that is running the lowest-priority process; if multiple CPUs are running at the same low priority, the PIC delivers interrupts in a round-robin fashion. To sum up, neither kernel non-preemption nor interrupt disabling can simplify kernel synchronization on an SMP system. The kernel needs proper synchronization mechanisms to keep kernel data structures in a consistent state.

Locking granularity - Another challenge before the kernel is how much synchronization to do. The kernel can opt for coarse-grained or fine-grained locking. Code that updates shared data is called a critical region and must be protected by locking. Coarse-grained locking locks large areas of code in which a major percentage of the code may be updating shared kernel data; the entire locked region may not be a critical region, but a major part of it could be. If the region is too large, system performance suffers, as other tasks must wait until the large lock is released. On the positive side, fewer locks are required, which can mean lower code complexity and possibly fewer deadlocks. On low-end SMP systems this may even be desirable, as the level of parallelism, and hence the contention between tasks, may be low; having more locks can actually lower performance due to the cost of managing them, since a lock is often acquired even when there is no contention. However, coarse-grained locking is bad for scalability, and on higher-end SMP systems it becomes a bottleneck. For better scalability, fine-grained locking can be used. Fine-grained locking involves more precise identification of critical regions. For example, a linked list can be protected by a single large lock for the entire list; but if a large number of tasks need to access the list, all but one will block, and access to the entire list becomes serialized. A fine-grained approach can instead provide a lock for each node of the list.
Tasks typically need to access different nodes of the list, so there is less contention between them. As the number of tasks increases, however, many tasks may contend for a single node, and an even finer-grained approach may be needed, perhaps a lock for each element within a node. While this may give very good response on a high-end 32 CPU system, performance on a 2 CPU system could be disastrous, as many locks are created even though few tasks hit the critical region at the same time.
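The difference between the two granularities is easy to see in code. The sketch below (a user-space illustration with POSIX threads, added for clarity; the kernel uses its own locking primitives) contrasts one lock for a whole list with one lock per node:

    #include <pthread.h>

    /* Coarse-grained: a single mutex serializes access to the whole
     * list, so any two tasks touching the list contend with each other. */
    struct coarse_list {
        pthread_mutex_t lock;        /* protects head and every node */
        struct node *head;
    };

    /* Fine-grained: each node carries its own mutex, so tasks working
     * on different nodes do not contend at all. */
    struct node {
        pthread_mutex_t lock;        /* protects only this node's data */
        int data;
        struct node *next;
    };

    void update_node(struct node *n, int value)
    {
        pthread_mutex_lock(&n->lock);    /* contention limited to one node */
        n->data = value;
        pthread_mutex_unlock(&n->lock);
    }

A real fine-grained list also needs a safe traversal scheme (for example, lock coupling between adjacent nodes), which is where much of the extra complexity and lock-management cost comes from.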

The kernel has to be designed with the right balance of coarse- and fine-grained locking. This is often a continuous evolution that incorporates feedback from the real world and from kernel performance test teams.

The Linux Solution

Task Scheduling

The 2.6 kernel scheduler has been re-written to provide better performance on SMP systems; in fact, it has been re-written twice since the first release of the 2.6 kernel. The initial release sported an O(1) algorithm, per-processor run queues and SMP affinity. Later releases, starting with 2.6.23, have a new scheduling algorithm called the Completely Fair Scheduler (CFS). Let us take a closer look at the 2.6 scheduler to see how it meets the challenges posed by an SMP system.

Scheduler Scalability

A scheduler should remain scalable as the number of tasks and processors increases. Let us see how Linux handles these two parameters.

Task Scalability - Linux 2.4 had an O(n) scheduler: it selected the next task to run by searching the list of runnable tasks. The performance of the scheduler therefore depended on the number of tasks, and the time taken to select the next task grew linearly with that number.

The Big O notation: The O notation is often used in computer science to indicate an algorithm's efficiency or time complexity. For example, an algorithm that searches an unsorted array of n elements has order O(n), as its cost scales linearly with n; the larger n is, the slower the search. Similarly, an algorithm that searches an n*n matrix has time complexity O(n^2). An algorithm that fetches an element of a contiguous array of n elements has complexity O(1), as it can fetch the 10th or the 100th element in constant time.

The initial releases of the Linux 2.6 kernel provided an O(1) scheduler, which performs with the same efficiency irrespective of system parameters such as the number of runnable tasks or the number of CPUs: it finds the next task to run in constant time whether 10 tasks or 500 tasks are waiting. This was a major achievement, as it provides significant benefits on high-end systems that may be running thousands of tasks. Even on smaller machines, distributed applications like the Java Virtual Machine (JVM) can spawn hundreds of threads, and each thread is scheduled independently, like a process.

The scheduler achieves this constant-time behaviour by using task priority queues instead of a list of runnable tasks. Each task is given a priority that can range from 1 (highest) to N, where N is the number of priority levels in the system; hence there are N priority queues. A priority queue is empty if no runnable task has that priority. An N-bit bitmap tracks this, with one bit per priority: if a queue has tasks, its bit is set to 1; if the queue is empty, the bit is 0.
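To illustrate why this lookup is constant time, here is a simplified user-space sketch of the priority-bitmap idea (written for this discussion; it is not the kernel's actual code, and the 140 levels merely mirror the O(1) scheduler's priority range):

    #include <stdio.h>

    #define NUM_PRIO 140
    #define BITS_PER_LONG (8 * sizeof(unsigned long))

    /* One bit per priority level: bit set => that queue has runnable tasks. */
    static unsigned long bitmap[(NUM_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG];

    /* Find the highest (numerically lowest) priority with a runnable task.
     * The loop is bounded by the fixed bitmap size, not by how many tasks
     * are runnable, so the cost does not grow with system load. */
    static int highest_prio(void)
    {
        for (unsigned i = 0; i < sizeof(bitmap) / sizeof(bitmap[0]); i++)
            if (bitmap[i])
                return (int)(i * BITS_PER_LONG) +
                       __builtin_ctzl(bitmap[i]);  /* GCC/Clang builtin */
        return -1;  /* no runnable tasks */
    }

    int main(void)
    {
        bitmap[1] = 1UL << 5;  /* pretend priority 69 is runnable (64-bit longs) */
        printf("next priority to run: %d\n", highest_prio());
        return 0;
    }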

The scheduler finds the first set bit in the bitmap, which corresponds to the highest-priority queue with runnable tasks, and selects the first task in that queue to run. Since finding the first set bit in a fixed-size bitmap is a constant-time operation, independent of the number of runnable tasks, the scheduler has O(1) efficiency. We will not discuss this further, as the algorithm has been replaced with completely new logic in later 2.6 releases.

Starting with release 2.6.23, Linux uses a new scheduling algorithm known as the Completely Fair Scheduler (CFS). The CFS has an efficiency of O(log n), where n is the number of runnable tasks. Thus, the CFS is not as efficient as the O(1) scheduler in a strict algorithmic sense, but it has other advantages, such as better system responsiveness for interactive and real-time tasks. Test results for CFS have been positive, and the difference in performance between the O(1) and CFS schedulers is marginal even at very high loads.

The CFS algorithm is based on the idea that a truly multi-tasking CPU would run all tasks simultaneously, giving each a fair share of its processing power. On real hardware, however, only one task can run on a CPU at a time, so CFS instead uses the concept of virtual time. Every task is assigned a virtual runtime in nanoseconds, tracked through a per-task variable; let us call this variable vruntime. A task with a low vruntime has not yet got its fair share of the CPU, whereas a task with a high value may have got more than its share. The aim of the CFS algorithm is to run tasks so that every task gets its fair share and the system stays balanced. CFS always picks the task with the lowest vruntime. For this it uses a time-ordered red-black tree instead of priority queues: the tree is a per-CPU binary search tree, sorted with vruntime as the key, and every runnable task on a CPU has a node in the tree. As a task runs, the scheduler increases its vruntime; at some point its vruntime is no longer the lowest in the tree, another task with a lower value gets selected, and the current task is switched out. The algorithm proceeds in this manner to ensure that every task gets its proportional share of CPU time.

Processor Scalability - On an SMP system, the scheduler can be invoked concurrently on any of the processors. If the scheduler's data structures were shared, they would become a source of contention between CPUs. The scheduler's main data structure, containing the list of runnable tasks, is called a run-queue. The run-queue is a container structure that stores data such as the list of runnable tasks, the load on the CPU, and a count of tasks. For the scheduler to scale well with more CPUs, it is important that access to this data structure is parallelized.

[Figure 4.0: Global run-queue in Linux 2.4 - all CPUs contend for a single task-queue lock]

The Linux 2.4 kernel had a single run queue shared by all processors, as shown in Figure 4.0. A processor would acquire a lock on the run-queue while scheduling the next task to run, blocking all other CPUs attempting to schedule at the same time. Since tasks run on a CPU for a very small time slice, the scheduler is invoked frequently, and as the number of CPUs goes up this contention can seriously hurt the scheduler's efficiency.

Linux 2.6 introduces a run-queue per processor, as shown in Figures 4.1a and 4.1b below.

[Figure 4.1a: Linux 2.6 per-CPU run-queue with the O(1) scheduler - each CPU has its own set of N priority FIFOs, an N-bit bitmap and a lock]

While the initial releases of Linux 2.6 kept N task priority queues (one per priority level) and a bitmap within each run-queue, the newer CFS scheduler stores the red-black tree and its associated elements within the run-queue. The run-queue also has additional fields, such as CPU load indicators and SMP load-balancing data. Each run-queue is protected by a spin lock. Tasks that are runnable on a particular CPU are stored in that CPU's own run-queue data structure and cannot be touched by another processor. Such a design is often called a multi-queue scheduler, and it significantly reduces contention between CPUs concurrently running the scheduler. This approach also solves the problem of CPU affinity, as described later in this section under CPU Affinity.

[Figure 4.1b: Linux 2.6 per-CPU run-queue with the CFS scheduler - tasks Ta to Tn are distributed across two per-CPU red-black trees, each protected by its own lock]
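The heart of the CFS idea fits in a few lines. The toy program below (written for this discussion; the real implementation lives in the kernel's scheduler code and uses an actual red-black tree) keeps a virtual runtime per task and always runs the task with the smallest one:

    #include <stdio.h>

    struct task {
        const char *name;
        unsigned long long vruntime;  /* virtual runtime, in nanoseconds */
    };

    /* Stand-in for the red-black tree lookup: CFS keeps tasks sorted by
     * vruntime, so the leftmost node is the next task; here we just scan. */
    static struct task *pick_next(struct task *rq, int n)
    {
        struct task *next = &rq[0];
        for (int i = 1; i < n; i++)
            if (rq[i].vruntime < next->vruntime)
                next = &rq[i];
        return next;
    }

    int main(void)
    {
        struct task rq[] = { { "A", 3000 }, { "B", 1000 }, { "C", 2000 } };

        for (int tick = 0; tick < 4; tick++) {
            struct task *t = pick_next(rq, 3);
            printf("tick %d: run %s (vruntime %llu)\n",
                   tick, t->name, t->vruntime);
            t->vruntime += 1500;  /* charge the task for the time it ran */
        }
        return 0;
    }

The run order comes out as B, C, B, A: the task that has had the least CPU time always goes next. In the kernel the charge is also weighted by the task's nice value, which is how priorities are expressed under CFS.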

Load Balancing

On an SMP system the scheduler needs to ensure that tasks are evenly distributed across all CPUs, apart from ensuring that all processes get a fair share of CPU time. The scheduler cannot predict how long a task will run when it is created and scheduled for the first time; that depends on the program flow. As more processes are created over time, the load on the CPUs is bound to become unbalanced: some tasks are short-lived, others long-running; some are CPU-intensive while others are more I/O-bound.

Linux 2.6 uses a sophisticated load-balancing algorithm that supports different multi-processing architectures, such as SMP, Hyper-Threading and NUMA. All CPUs are divided into scheduling domains. Scheduling domains are hierarchical and reflect the CPU topology of the system. Each domain is further split into groups, each group representing a subset of the domain's CPUs. Load balancing is always done between the groups of a scheduling domain: a process is migrated only if the workload of one group in a domain is much lower than the workload of another group in the same domain. Scheduling domains are not as relevant to plain SMP systems as they are to NUMA or Hyper-Threaded topologies, since an SMP system is symmetric: all CPUs are identical and have equal access to memory and I/O. An SMP system could have just one domain, with each group in the domain corresponding to a physical CPU.

Let us now look at load balancing from an SMP perspective. The kernel periodically examines the run-queues of all CPUs to see if the load is balanced; the run-queue of a processor tracks the load on that CPU. At periodic intervals the kernel re-calculates the load and determines whether balancing is required. How often the load balancer is invoked depends on the current state of the CPU: if the CPU is idle (its run-queue is empty), the load-balancing code is invoked quite frequently; if the CPU has an active run-queue, the balancer is called less often. Once invoked, the load balancer looks for the busiest CPU in the scheduling domain to determine the load imbalance. The busiest CPU is identified only if it is significantly busier than the other CPUs in the domain. The balancer then attempts to pull tasks from the busiest CPU to the local CPU on which it is running. A task migration happens only after taking the following into consideration (see the sketch after this list):

1. The process on the busiest CPU is not currently running.
2. The local CPU is idle.
3. Previous attempts to balance the system by migrating tasks from the busiest CPU have failed.
4. The process to be moved is not cache-hot. A process is likely to have a hot CPU cache if it ran recently on that CPU; moving a process with a hot cache is expensive, as the cache contents on the busiest CPU are lost and must be rebuilt on the local CPU.

If the task migration fails, the kernel looks for another idle CPU in the system and, if it finds one, re-attempts the migration from the busiest CPU to balance the system.
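The shape of that decision can be sketched in a few lines of C (illustrative pseudologic written for this discussion; the structure and function names are invented, and the real kernel applies thresholds and per-domain tuning on top of this basic shape):

    #include <stdbool.h>
    #include <stdio.h>

    struct cpu_rq {
        int load;        /* how busy this CPU's run-queue is */
        bool idle;       /* true if nothing is runnable here */
    };

    struct task_info {
        bool running;    /* currently executing on its CPU */
        bool cache_hot;  /* ran recently enough that its cache is warm */
    };

    /* Decide whether 'victim', queued on 'busiest', may be pulled to 'local'. */
    static bool can_migrate(const struct task_info *victim,
                            const struct cpu_rq *busiest,
                            const struct cpu_rq *local,
                            int failed_attempts)
    {
        if (busiest->load <= local->load)
            return false;              /* no imbalance worth fixing */
        if (victim->running)
            return false;              /* never pull a running task */
        if (local->idle || failed_attempts > 0)
            return true;               /* idle CPUs and repeated failures
                                          justify a more aggressive pull */
        return !victim->cache_hot;     /* otherwise leave cache-hot tasks alone */
    }

    int main(void)
    {
        struct cpu_rq busiest = { 8, false }, local = { 1, false };
        struct task_info t = { false, true };  /* not running, warm cache */

        printf("migrate? %s\n",
               can_migrate(&t, &busiest, &local, 0) ? "yes" : "no");
        return 0;
    }

With a warm cache and no failed attempts the answer is no: protecting cache affinity wins over perfect balance, exactly the trade-off described above.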

CPU Affinity

As mentioned earlier in this section, Linux 2.6 uses an individual run-queue for each CPU. As the run-queue contains only the list of tasks to be scheduled on that CPU, a task has a natural affinity for the last CPU on which it was scheduled and continues to be re-scheduled on the same CPU. The only scenario in which a task is migrated to the run-queue of another CPU is when the load-balancing algorithm runs and finds that the load needs to be rebalanced; an unlucky task may then get selected and migrated to a different CPU. While this may give better CPU load balancing, frequent task migration is expensive, as the CPU cache must be rebuilt for the migrated process. The load-balancing algorithm therefore tries to minimize task migration while balancing the load. One of the criteria it uses to decide whether a task can be migrated from a busy CPU to an idle one is whether the task has a hot cache, that is, whether it recently ran on the busy CPU; if so, the algorithm may not select the task for migration.

Synchronization and Parallelization

The Linux 2.6 kernel supports kernel pre-emption: a task running in kernel mode can be switched off the CPU at any time by the task scheduler, and another task can run. SMP together with kernel pre-emption increases the concurrency within the kernel. Linux provides several synchronization techniques to handle this challenge: spin locks, semaphores, per-CPU variables and atomic variables. While spin locks, semaphores and atomic variables ensure atomic operation of the critical regions they protect, per-CPU variables allow a greater degree of parallelization by avoiding the need to synchronize between CPUs entirely. The kernel also supports variants of these mechanisms, such as reader-writer spin locks and semaphores, sequence locks and completion variables; these are not discussed here, as the focus is on the fundamental techniques.

Per-CPU Variables

Per-CPU variables replicate a data structure so that every CPU has its own copy of the data and can access it without any contention from tasks running on other CPUs. Another advantage of a per-CPU variable is that a frequently used variable can stay resident in the local CPU cache and be accessed much faster. Per-CPU variables do not migrate along with tasks to other CPUs and so help reduce the cache invalidation caused by task migration (or task ping-pong, as it is popularly called in Linux circles). Although per-CPU variables do not need to be protected from tasks running on other CPUs, the kernel is preemptive, so protection may be needed from tasks on the same CPU. Also, if a task is pre-empted and later re-scheduled on another CPU, it may leave a per-CPU variable in an inconsistent state. For these reasons, it is desirable to disable kernel pre-emption while modifying per-CPU variables; the Linux kernel provides macros and functions to handle this.
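A minimal kernel-side sketch of a per-CPU variable (assuming the classic 2.6-era per-CPU API; the counter and its use are illustrative):

    #include <linux/percpu.h>

    /* One instance of this counter exists for every CPU in the system. */
    static DEFINE_PER_CPU(unsigned long, packet_count);

    static void count_packet(void)
    {
        /* get_cpu_var() disables pre-emption and returns this CPU's copy,
         * so the update cannot race with other CPUs and the task cannot
         * be migrated in the middle of it. */
        get_cpu_var(packet_count)++;
        put_cpu_var(packet_count);   /* re-enables pre-emption */
    }

No lock is taken anywhere: each CPU only ever touches its own copy, which is exactly why per-CPU data parallelizes so well.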

Atomic Variables

Often a critical region can be as small as incrementing a variable or adding a value to it. Within a CPU, only single instructions can be guaranteed to be atomic, and on an SMP system even some single but complex instructions may not be atomic, because all CPUs share the system bus that connects them to memory. A hardware memory arbiter regulates access to memory for all the CPUs and may hand the bus to another CPU before the current one has completed all the steps of an instruction. Code quite often needs atomic read-modify-write operations that increment a counter, assign a new value, or add two memory locations and store the result back to memory. For example, the following line of C code may be compiled into the sequence of instructions shown below it:

    /* Add j to i and assign to i. */
    i = i + j;

    copy i into a CPU register
    copy j into a CPU register
    add the two registers into a result register
    store the result register to memory location i

Clearly, the compiler has generated four instructions for the single line of C code. As the kernel supports pre-emption, the task executing this code may get pre-empted in the middle, and the memory locations could be updated by another task. To provide atomicity for such simple operations, Linux provides an atomic data type, atomic_t, along with macros and functions that perform simple operations such as incrementing the variable, adding a value to it, or adding and testing the result of the addition. Most hardware platforms provide atomic instructions that can increment/decrement or add/subtract a memory location; some platforms lack such instructions but allow the system bus to be locked while the CPU executes the operation. The Linux kernel implements operations on atomic_t using these machine facilities.
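A short sketch of the atomic_t interface (the functions are the kernel's real atomic operations; the usage counter itself is illustrative):

    #include <asm/atomic.h>

    static atomic_t open_count = ATOMIC_INIT(0);

    static void device_opened(void)
    {
        atomic_inc(&open_count);        /* atomic increment, no lock needed */
    }

    static int device_closed_last(void)
    {
        /* Atomically decrement and test for zero in a single step;
         * returns true when this was the last close. */
        return atomic_dec_and_test(&open_count);
    }

Because the increment, and the decrement-and-test, each map to a single atomic machine operation, no task on any CPU can observe or create a half-updated counter.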

Spin Locks

A spin lock is the most fundamental and widely used synchronization mechanism in an SMP kernel. Spin locks are unique to SMP Linux kernels and are not used on a UP Linux system. A spin lock, as the name indicates, is a light-weight spinning (looping) lock: a task attempting to acquire a spin lock either gets the lock immediately or loops until it does. A spin lock thus keeps the task running on the CPU, busy-waiting for the lock. Spin locks are usually implemented with platform-specific atomic test-and-set instructions that check the value of a memory location and set it in an atomic manner; the lock loops until the test-and-set succeeds. Figure 4.2 shows two tasks, A and B, trying to acquire a spin lock S: task A succeeds and executes the critical region while B spins on another CPU waiting for A to release the lock. Note that a spin lock is useful only on multiprocessor systems. On a UP system, if B were to spin, it would wait indefinitely: A would not get a chance to run (the spin lock could disable kernel pre-emption), so B would keep running and create a deadlock.

[Figure 4.2: Tasks A and B try to acquire spin lock S; B fails and spins, keeping its CPU busy, until A releases the lock]

A spin lock is used for fine-grained locking and protects short critical regions of code on SMP systems. As a waiting task spins, precious CPU cycles are spent doing nothing, so spin locks are suited only to locks held for short periods. Spin locks are also commonly used by interrupt handlers to lock a resource: interrupts cannot use sleeping mechanisms such as semaphores, because they are not schedulable tasks, and a sleeping lock may put its caller to sleep to be re-scheduled later. Clearly, spin locks are only useful for protecting regions of code that execute quickly; they cannot be used where the critical region may need to sleep waiting for an event, as the blocked tasks would spin away CPU cycles. For such critical regions Linux provides another locking mechanism: semaphores.

Semaphores

A semaphore is another fundamental locking mechanism and can be used to protect large critical regions, which may execute for longer and may sleep within the critical region while waiting for an event; using a spin lock in such scenarios is clearly inadvisable. Unlike a spin lock, a semaphore is a sleeping lock. If a task tries to acquire a semaphore that is currently locked, the task is added to a wait queue and put to sleep. Later, when the task holding the semaphore releases it, the first task in the wait queue is awakened; when scheduled to run, it acquires the semaphore and executes the critical region. As the waiting task sleeps, the kernel can schedule other work on that CPU. The task holding the semaphore can also itself be pre-empted, as a semaphore does not disable kernel pre-emption. Semaphores are suitable for coarse-grained locking, since they let the scheduler run other tasks. They are not suited to short critical regions, because they are not light-weight like spin locks: there is significant overhead in maintaining wait queues and in the context switches needed to put a task to sleep on a locked semaphore and wake it again when the lock is released.

The Big Kernel Lock (BKL)

The Big Kernel Lock (BKL) is a historic coarse-grained lock that was introduced in Linux 2.0 to support SMP. The BKL was used to lock large sections of code to provide synchronization within the kernel on an SMP system. The Linux 2.6 kernel still contains the BKL, mostly in legacy file-system code, but its use is restricted, as it increases serialization within the kernel and reduces performance as more processors are added.
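To make the choice between the two primitives concrete, here is a kernel-side sketch using the classic 2.6-era interfaces (the data being protected and the function names are illustrative):

    #include <linux/spinlock.h>
    #include <linux/errno.h>
    #include <asm/semaphore.h>

    static DEFINE_SPINLOCK(stats_lock);   /* protects the counter below */
    static unsigned long stats_counter;

    static struct semaphore config_sem;   /* set up with sema_init(&config_sem, 1) */

    /* Short, non-sleeping update: a spin lock is appropriate. */
    static void bump_stats(void)
    {
        spin_lock(&stats_lock);
        stats_counter++;                  /* critical region lasts a few cycles */
        spin_unlock(&stats_lock);
    }

    /* Long operation that may sleep (for example, waiting for I/O):
     * a semaphore lets the CPU run other tasks in the meantime. */
    static int update_config(void)
    {
        if (down_interruptible(&config_sem))
            return -EINTR;                /* interrupted while sleeping */
        /* ... long-running work that may block goes here ... */
        up(&config_sem);
        return 0;
    }

The rule of thumb follows directly from the discussion above: spin for nanoseconds, sleep for anything longer.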

Application Development on an SMP Linux System

An SMP system is a parallel processing system, and application design should mirror the system to take advantage of the parallel processing capabilities of SMP Linux. A large monolithic application will more often than not underperform on SMP Linux, because the application is serialized and cannot harness the parallelism of the system. Let us discuss this with an example.

An application needs to process N input feeds. Each feed has M records of information, each record can be processed independently of the others, and every feed can also be processed independently. Consider how a monolithic application handles this problem. As shown in Figure 5.0, the program opens a feed, processes each record in the feed sequentially, then opens the next feed, and continues until all feeds are processed. Assume each record takes time T to process. The time taken by the application to process N feeds is then T x M x N. If there are 10 feeds with 100 records per feed and each record takes 5 seconds to process, the time required is 5 x 100 x 10 = 5000 seconds. Can an SMP Linux system execute this application faster, all other run conditions being constant? No. Perhaps there can be a slight improvement if certain kernel parameters are tuned, but that is not the problem here.

[Figure 5.0: Monolithic application - a single loop opens each of the N feeds in turn and processes its M records sequentially]

The problem is that the application does not take advantage of the SMP capabilities of the Linux platform. It will be scheduled on one CPU and given time slices like any other process. If the load on that CPU is high, the process gets fewer time slices due to contention from other processes; its priority may also be lowered by the scheduler if it ran frequently early on. The application will probably never be moved to a less busy CPU, because of the CPU affinity that the scheduler honours while load balancing, and even if it were moved, no parallelism would result, as the entire application would simply migrate to the new CPU.

Linux provides processes and threads to distribute a program's execution across CPUs. A process or an NPTL thread is an independently schedulable entity and can run on any of the available CPUs. Hence, the right approach to application development on an SMP system is to develop distributed applications: independent tasks, or tasks with a high degree of independence, can be parallelized. In the real world, tasks are rarely completely independent, and there is bound to be some contention for shared resources; Linux provides user-space inter-process communication (IPC) and synchronization mechanisms such as semaphores, mutexes, condition variables and shared memory to coordinate the tasks of a distributed application.

Let us return to our application and see how the design can be improved. Each of the N feeds can be processed by its own feeder thread. Some of these N threads will execute on different CPUs at the same time. In the ideal case, if the system had N CPUs and all N threads ran on different CPUs, the application could complete in T x M time. Of course this is only the ideal, but real performance will still be much better than T x M x N. Threads that end up sharing a CPU are still independently schedulable entities and get their own time slices, so overall the application's performance is bound to improve. The parallelism can be pushed further by adding more threads: for example, each feed can be divided into S segments, with a thread per segment; or the application can use N processes, one per feed, with S threads per process, one per segment. Many decompositions are possible, and the parallelism of the SMP system gives the developer multiple options to improve application performance. A sketch of the thread-per-feed design follows.
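The thread-per-feed design might look like this (a minimal user-space sketch using POSIX threads; process_record() and the feed layout are illustrative placeholders; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    #define N_FEEDS   10
    #define M_RECORDS 100

    /* Placeholder for the real per-record work (time T in the text). */
    static void process_record(int feed, int record)
    {
        (void)feed;
        (void)record;   /* real parsing/transformation would go here */
    }

    /* Each feeder thread processes the M records of one feed independently. */
    static void *feeder(void *arg)
    {
        int feed = (int)(long)arg;
        for (int rec = 0; rec < M_RECORDS; rec++)
            process_record(feed, rec);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[N_FEEDS];

        /* The kernel schedules every thread independently, so the feeds
         * can be processed in parallel on whatever CPUs are available. */
        for (long i = 0; i < N_FEEDS; i++)
            pthread_create(&tid[i], NULL, feeder, (void *)i);
        for (int i = 0; i < N_FEEDS; i++)
            pthread_join(tid[i], NULL);

        printf("processed %d feeds\n", N_FEEDS);
        return 0;
    }

Splitting each feed into S segments would simply mean spawning N x S such threads (or N processes with S threads each), at the cost of more synchronization if the segments share any state.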

Summary

The Linux platform has been evolving at a rapid pace to support true symmetric multi-processing. From the crude SMP support in Linux 2.0, the platform has come a long way: the Linux 2.6 kernel boasts features such as a highly scalable task scheduler, kernel pre-emption and fine-grained locking mechanisms that provide better scalability and performance. For the application developer, Linux provides process and thread creation interfaces such as NPTL, along with IPC mechanisms, to harness the parallelism of the system. The Linux 2.6 kernel is one of the best choices for exploiting the SMP hardware platforms of today.

SMP computing is becoming common, with even personal desktops sporting multiple CPUs. High-end servers and mainframes can also benefit, as Linux is highly scalable and stable. Linux is also portable: much of it is written in C, so new SMP platforms can port Linux and avoid the costs involved in developing an OS from scratch. A fine example is the IBM System z mainframe, which can run Linux. Finally, Linux is open, free and evolving. A large and talented development community is devoted to the development and testing of the platform; innovation is encouraged, anyone is free to contribute to the future evolution of Linux, and there is no fear of development stopping dead in its tracks due to budgetary concerns.

To conclude, SMP Linux computing has a bright future ahead. It is already one of the leading SMP server operating systems and is a serious, free challenger for SMP desktop computing. The future of computing may well be oriented towards multi-processing as processor clock speeds hit an upper limit, and Linux could be the right platform to run these systems.


Linux scheduler history. We will be talking about the O(1) scheduler CPU Scheduling Linux scheduler history We will be talking about the O(1) scheduler SMP Support in 2.4 and 2.6 versions 2.4 Kernel 2.6 Kernel CPU1 CPU2 CPU3 CPU1 CPU2 CPU3 Linux Scheduling 3 scheduling

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Real-Time Systems Prof. Dr. Rajib Mall Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 26 Real - Time POSIX. (Contd.) Ok Good morning, so let us get

More information

Chapter 2: OS Overview

Chapter 2: OS Overview Chapter 2: OS Overview CmSc 335 Operating Systems 1. Operating system objectives and functions Operating systems control and support the usage of computer systems. a. usage users of a computer system:

More information

Chapter 18: Database System Architectures. Centralized Systems

Chapter 18: Database System Architectures. Centralized Systems Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals

High Performance Computing. Course Notes 2007-2008. HPC Fundamentals High Performance Computing Course Notes 2007-2008 2008 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Optimizing Shared Resource Contention in HPC Clusters

Optimizing Shared Resource Contention in HPC Clusters Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs

More information

Processes and Non-Preemptive Scheduling. Otto J. Anshus

Processes and Non-Preemptive Scheduling. Otto J. Anshus Processes and Non-Preemptive Scheduling Otto J. Anshus 1 Concurrency and Process Challenge: Physical reality is Concurrent Smart to do concurrent software instead of sequential? At least we want to have

More information

independent systems in constant communication what they are, why we care, how they work

independent systems in constant communication what they are, why we care, how they work Overview of Presentation Major Classes of Distributed Systems classes of distributed system loosely coupled systems loosely coupled, SMP, Single-system-image Clusters independent systems in constant communication

More information

Chapter 5 Process Scheduling

Chapter 5 Process Scheduling Chapter 5 Process Scheduling CPU Scheduling Objective: Basic Scheduling Concepts CPU Scheduling Algorithms Why Multiprogramming? Maximize CPU/Resources Utilization (Based on Some Criteria) CPU Scheduling

More information

Chapter 1: Operating System Models 1 2 Operating System Models 2.1 Introduction Over the past several years, a number of trends affecting operating system design are witnessed and foremost among them is

More information

Linux Scheduler. Linux Scheduler

Linux Scheduler. Linux Scheduler or or Affinity Basic Interactive es 1 / 40 Reality... or or Affinity Basic Interactive es The Linux scheduler tries to be very efficient To do that, it uses some complex data structures Some of what it

More information

Performance Comparison of RTOS

Performance Comparison of RTOS Performance Comparison of RTOS Shahmil Merchant, Kalpen Dedhia Dept Of Computer Science. Columbia University Abstract: Embedded systems are becoming an integral part of commercial products today. Mobile

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Operating Systems Concepts: Chapter 7: Scheduling Strategies

Operating Systems Concepts: Chapter 7: Scheduling Strategies Operating Systems Concepts: Chapter 7: Scheduling Strategies Olav Beckmann Huxley 449 http://www.doc.ic.ac.uk/~ob3 Acknowledgements: There are lots. See end of Chapter 1. Home Page for the course: http://www.doc.ic.ac.uk/~ob3/teaching/operatingsystemsconcepts/

More information

CPU SCHEDULING (CONT D) NESTED SCHEDULING FUNCTIONS

CPU SCHEDULING (CONT D) NESTED SCHEDULING FUNCTIONS CPU SCHEDULING CPU SCHEDULING (CONT D) Aims to assign processes to be executed by the CPU in a way that meets system objectives such as response time, throughput, and processor efficiency Broken down into

More information

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff Process Scheduling CS 241 February 24, 2012 Copyright University of Illinois CS 241 Staff 1 Announcements Mid-semester feedback survey (linked off web page) MP4 due Friday (not Tuesday) Midterm Next Tuesday,

More information

Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform. Ed Spetka Mike Kohler

Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform. Ed Spetka Mike Kohler Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform Ed Spetka Mike Kohler Outline Abstract Hardware Overview Completely Fair Scheduler Design Theory Breakdown of the

More information

Principles and characteristics of distributed systems and environments

Principles and characteristics of distributed systems and environments Principles and characteristics of distributed systems and environments Definition of a distributed system Distributed system is a collection of independent computers that appears to its users as a single

More information

Windows Server Performance Monitoring

Windows Server Performance Monitoring Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly

More information

CPU Scheduling. Core Definitions

CPU Scheduling. Core Definitions CPU Scheduling General rule keep the CPU busy; an idle CPU is a wasted CPU Major source of CPU idleness: I/O (or waiting for it) Many programs have a characteristic CPU I/O burst cycle alternating phases

More information

Tasks Schedule Analysis in RTAI/Linux-GPL

Tasks Schedule Analysis in RTAI/Linux-GPL Tasks Schedule Analysis in RTAI/Linux-GPL Claudio Aciti and Nelson Acosta INTIA - Depto de Computación y Sistemas - Facultad de Ciencias Exactas Universidad Nacional del Centro de la Provincia de Buenos

More information

Operating Systems. Virtual Memory

Operating Systems. Virtual Memory Operating Systems Virtual Memory Virtual Memory Topics. Memory Hierarchy. Why Virtual Memory. Virtual Memory Issues. Virtual Memory Solutions. Locality of Reference. Virtual Memory with Segmentation. Page

More information

Operating System: Scheduling

Operating System: Scheduling Process Management Operating System: Scheduling OS maintains a data structure for each process called Process Control Block (PCB) Information associated with each PCB: Process state: e.g. ready, or waiting

More information

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do

More information

Operating Systems, 6 th ed. Test Bank Chapter 7

Operating Systems, 6 th ed. Test Bank Chapter 7 True / False Questions: Chapter 7 Memory Management 1. T / F In a multiprogramming system, main memory is divided into multiple sections: one for the operating system (resident monitor, kernel) and one

More information

Comp 204: Computer Systems and Their Implementation. Lecture 12: Scheduling Algorithms cont d

Comp 204: Computer Systems and Their Implementation. Lecture 12: Scheduling Algorithms cont d Comp 204: Computer Systems and Their Implementation Lecture 12: Scheduling Algorithms cont d 1 Today Scheduling continued Multilevel queues Examples Thread scheduling 2 Question A starvation-free job-scheduling

More information

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest 1. Introduction Few years ago, parallel computers could

More information

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage sponsored by Dan Sullivan Chapter 1: Advantages of Hybrid Storage... 1 Overview of Flash Deployment in Hybrid Storage Systems...

More information

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum Scheduling Yücel Saygın These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum 1 Scheduling Introduction to Scheduling (1) Bursts of CPU usage alternate with periods

More information

Design and Implementation of the Heterogeneous Multikernel Operating System

Design and Implementation of the Heterogeneous Multikernel Operating System 223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,

More information

Parallel Scalable Algorithms- Performance Parameters

Parallel Scalable Algorithms- Performance Parameters www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for

More information

Readings for this topic: Silberschatz/Galvin/Gagne Chapter 5

Readings for this topic: Silberschatz/Galvin/Gagne Chapter 5 77 16 CPU Scheduling Readings for this topic: Silberschatz/Galvin/Gagne Chapter 5 Until now you have heard about processes and memory. From now on you ll hear about resources, the things operated upon

More information

Parallelism and Cloud Computing

Parallelism and Cloud Computing Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication

More information

Process Scheduling in Linux

Process Scheduling in Linux The Gate of the AOSP #4 : Gerrit, Memory & Performance Process Scheduling in Linux 2013. 3. 29 Namhyung Kim Outline 1 Process scheduling 2 SMP scheduling 3 Group scheduling - www.kandroid.org 2/ 41 Process

More information

The CPU Scheduler in VMware vsphere 5.1

The CPU Scheduler in VMware vsphere 5.1 VMware vsphere 5.1 Performance Study TECHNICAL WHITEPAPER Table of Contents Executive Summary... 4 Introduction... 4 Terminology... 4 CPU Scheduler Overview... 5 Design Goals... 5 What, When, and Where

More information

Operating Systems Lecture #6: Process Management

Operating Systems Lecture #6: Process Management Lecture #6: Process Written by based on the lecture series of Dr. Dayou Li and the book Understanding 4th ed. by I.M.Flynn and A.McIver McHoes (2006) Department of Computer Science and Technology,., 2013

More information

White Paper Perceived Performance Tuning a system for what really matters

White Paper Perceived Performance Tuning a system for what really matters TMurgent Technologies White Paper Perceived Performance Tuning a system for what really matters September 18, 2003 White Paper: Perceived Performance 1/7 TMurgent Technologies Introduction The purpose

More information

Deciding which process to run. (Deciding which thread to run) Deciding how long the chosen process can run

Deciding which process to run. (Deciding which thread to run) Deciding how long the chosen process can run SFWR ENG 3BB4 Software Design 3 Concurrent System Design 2 SFWR ENG 3BB4 Software Design 3 Concurrent System Design 11.8 10 CPU Scheduling Chapter 11 CPU Scheduling Policies Deciding which process to run

More information

Multi-core Programming System Overview

Multi-core Programming System Overview Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

COS 318: Operating Systems. Virtual Machine Monitors

COS 318: Operating Systems. Virtual Machine Monitors COS 318: Operating Systems Virtual Machine Monitors Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Introduction Have been around

More information

Symmetric Multiprocessing

Symmetric Multiprocessing Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called

More information

Overview of Presentation. (Greek to English dictionary) Different systems have different goals. What should CPU scheduling optimize?

Overview of Presentation. (Greek to English dictionary) Different systems have different goals. What should CPU scheduling optimize? Overview of Presentation (Greek to English dictionary) introduction to : elements, purpose, goals, metrics lambda request arrival rate (e.g. 200/second) non-preemptive first-come-first-served, shortest-job-next

More information

Infrastructure Matters: POWER8 vs. Xeon x86

Infrastructure Matters: POWER8 vs. Xeon x86 Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report

More information

OpenMosix Presented by Dr. Moshe Bar and MAASK [01]

OpenMosix Presented by Dr. Moshe Bar and MAASK [01] OpenMosix Presented by Dr. Moshe Bar and MAASK [01] openmosix is a kernel extension for single-system image clustering. openmosix [24] is a tool for a Unix-like kernel, such as Linux, consisting of adaptive

More information

Operating System Tutorial

Operating System Tutorial Operating System Tutorial OPERATING SYSTEM TUTORIAL Simply Easy Learning by tutorialspoint.com tutorialspoint.com i ABOUT THE TUTORIAL Operating System Tutorial An operating system (OS) is a collection

More information

Overview of the Linux Scheduler Framework

Overview of the Linux Scheduler Framework Overview of the Linux Scheduler Framework WORKSHOP ON REAL-TIME SCHEDULING IN THE LINUX KERNEL Pisa, June 27th, 2014 Marco Cesati University of Rome Tor Vergata Marco Cesati (Univ. of Rome Tor Vergata)

More information

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance. Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance

More information

Garbage Collection in the Java HotSpot Virtual Machine

Garbage Collection in the Java HotSpot Virtual Machine http://www.devx.com Printed from http://www.devx.com/java/article/21977/1954 Garbage Collection in the Java HotSpot Virtual Machine Gain a better understanding of how garbage collection in the Java HotSpot

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

Course Development of Programming for General-Purpose Multicore Processors

Course Development of Programming for General-Purpose Multicore Processors Course Development of Programming for General-Purpose Multicore Processors Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Richmond, VA 23284 wzhang4@vcu.edu

More information

CHAPTER 15: Operating Systems: An Overview

CHAPTER 15: Operating Systems: An Overview CHAPTER 15: Operating Systems: An Overview The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint

More information

Why Threads Are A Bad Idea (for most purposes)

Why Threads Are A Bad Idea (for most purposes) Why Threads Are A Bad Idea (for most purposes) John Ousterhout Sun Microsystems Laboratories john.ousterhout@eng.sun.com http://www.sunlabs.com/~ouster Introduction Threads: Grew up in OS world (processes).

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Operating Systems 4 th Class

Operating Systems 4 th Class Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information

Overlapping Data Transfer With Application Execution on Clusters

Overlapping Data Transfer With Application Execution on Clusters Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer

More information

Scheduling. Monday, November 22, 2004

Scheduling. Monday, November 22, 2004 Scheduling Page 1 Scheduling Monday, November 22, 2004 11:22 AM The scheduling problem (Chapter 9) Decide which processes are allowed to run when. Optimize throughput, response time, etc. Subject to constraints

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

Distributed Systems LEEC (2005/06 2º Sem.)

Distributed Systems LEEC (2005/06 2º Sem.) Distributed Systems LEEC (2005/06 2º Sem.) Introduction João Paulo Carvalho Universidade Técnica de Lisboa / Instituto Superior Técnico Outline Definition of a Distributed System Goals Connecting Users

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

Multicore Programming with LabVIEW Technical Resource Guide

Multicore Programming with LabVIEW Technical Resource Guide Multicore Programming with LabVIEW Technical Resource Guide 2 INTRODUCTORY TOPICS UNDERSTANDING PARALLEL HARDWARE: MULTIPROCESSORS, HYPERTHREADING, DUAL- CORE, MULTICORE AND FPGAS... 5 DIFFERENCES BETWEEN

More information

Hadoop Fair Scheduler Design Document

Hadoop Fair Scheduler Design Document Hadoop Fair Scheduler Design Document October 18, 2010 Contents 1 Introduction 2 2 Fair Scheduler Goals 2 3 Scheduler Features 2 3.1 Pools........................................ 2 3.2 Minimum Shares.................................

More information

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source

More information

W4118 Operating Systems. Instructor: Junfeng Yang

W4118 Operating Systems. Instructor: Junfeng Yang W4118 Operating Systems Instructor: Junfeng Yang Outline Advanced scheduling issues Multilevel queue scheduling Multiprocessor scheduling issues Real-time scheduling Scheduling in Linux Scheduling algorithm

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Main Points. Scheduling policy: what to do next, when there are multiple threads ready to run. Definitions. Uniprocessor policies

Main Points. Scheduling policy: what to do next, when there are multiple threads ready to run. Definitions. Uniprocessor policies Scheduling Main Points Scheduling policy: what to do next, when there are multiple threads ready to run Or multiple packets to send, or web requests to serve, or Definitions response time, throughput,

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

Outline: Operating Systems

Outline: Operating Systems Outline: Operating Systems What is an OS OS Functions Multitasking Virtual Memory File Systems Window systems PC Operating System Wars: Windows vs. Linux 1 Operating System provides a way to boot (start)

More information

Objectives. Chapter 5: Process Scheduling. Chapter 5: Process Scheduling. 5.1 Basic Concepts. To introduce CPU scheduling

Objectives. Chapter 5: Process Scheduling. Chapter 5: Process Scheduling. 5.1 Basic Concepts. To introduce CPU scheduling Objectives To introduce CPU scheduling To describe various CPU-scheduling algorithms Chapter 5: Process Scheduling To discuss evaluation criteria for selecting the CPUscheduling algorithm for a particular

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

Road Map. Scheduling. Types of Scheduling. Scheduling. CPU Scheduling. Job Scheduling. Dickinson College Computer Science 354 Spring 2010.

Road Map. Scheduling. Types of Scheduling. Scheduling. CPU Scheduling. Job Scheduling. Dickinson College Computer Science 354 Spring 2010. Road Map Scheduling Dickinson College Computer Science 354 Spring 2010 Past: What an OS is, why we have them, what they do. Base hardware and support for operating systems Process Management Threads Present:

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

OPERATING SYSTEMS SCHEDULING

OPERATING SYSTEMS SCHEDULING OPERATING SYSTEMS SCHEDULING Jerry Breecher 5: CPU- 1 CPU What Is In This Chapter? This chapter is about how to get a process attached to a processor. It centers around efficient algorithms that perform

More information

Run-Time Scheduling Support for Hybrid CPU/FPGA SoCs

Run-Time Scheduling Support for Hybrid CPU/FPGA SoCs Run-Time Scheduling Support for Hybrid CPU/FPGA SoCs Jason Agron jagron@ittc.ku.edu Acknowledgements I would like to thank Dr. Andrews, Dr. Alexander, and Dr. Sass for assistance and advice in both research

More information

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction

More information