Programming and Scheduling Model for Supporting Heterogeneous Architectures in Linux
- Cameron Logan
- 8 years ago
1 Programming and Scheduling Model for Supporting Heterogeneous Architectures in Linux
Third Workshop on Computer Architecture and Operating System co-design, Paris
Tobias Beisel, Tobias Wiersema, Christian Plessl, and André Brinkmann
Paderborn Center for Parallel Computing (PC²), University of Paderborn
2 Motivation
How are heterogeneous systems currently used?
[Figure: the main thread of an application calls CPU_task_A(); GPU_task_B(); GPU_task_C(); FPGA_task_D(); CPU_task_E(); one after another. The operating system sees the resulting GPU tasks, hardware (FPGA) task, and CPU task, which run on the hardware units GPU1, GPU2, FPGA, and CPU.]
3 Motivation
How are heterogeneous systems currently used?
[Figure: the main thread of an application issues CPU_task_A(); GPU_task_B(); GPU_task_C(); FPGA_task_D(); CPU_task_E(); inside a parallel do { ... } block, so the GPU tasks, the hardware (FPGA) task, and the CPU task run concurrently on GPU1, GPU2, FPGA, and CPU.]
4 Motivation
How are heterogeneous systems currently used?
[Figure: two applications run side by side. Application A's main thread issues CPU_task_A(); GPU_task_B(); GPU_task_C(); FPGA_task_D(); CPU_task_E(); in a parallel do { ... } block, and Application B's main thread issues CPU_task_F(); GPU_task_G(); GPU_task_H(); FPGA_task_I(); CPU_task_J(); in the same way. The operating system must map the tasks of both applications onto the same GPU1, GPU2, FPGA, and CPU.]
5 Motivation: Resource management needed
[Figure: two applications as on slide 4, but several tasks now have implementations for more than one architecture. Application A issues CPUorGPU_task_A(); GPU_task_B(); GPU_task_C(); FPGAorGPU_task_D(); CPU_task_E(); and Application B issues CPU_task_F(); CPUorGPU_task_G(); GPUorFPGA_task_H(); FPGA_task_I(); CPU_task_J(); each in a parallel do { ... } block. The operating system must decide which of GPU1, GPU2, FPGA, and CPU runs each task.]
6 Motivation
Concurrent use of accelerators may be beneficial.
[Figure: three alternative schedules (a), (b), and (c) of tasks A, B, C, and D on the GPU and the CPU over time.]
A, B, C: tasks that can only run on the GPU. D: task that can run on the CPU or on the GPU, with a 5x speedup on the GPU.
All tasks have the same priority; scheduling overheads are neglected.
7 Challenges
Common properties of accelerators:
- no preemption, interrupt, or migration support
- full internal state not accessible
- explicit, non-uniform data communication needed
- dedicated APIs, ISAs, and binaries
Handling several accelerators requires custom solutions:
- the decision space is much broader than in traditional scheduling
- dynamic run-time decisions still need to be very fast
- acquiring the right decision parameters and choosing among them is important
Resource management in heterogeneous systems is difficult.
8 Agenda
- Basic Concepts and Ideas
- Scheduling and Programming Model
- Task Management
- Experimental Evaluation
- Related Work
- Conclusions and Future Work
9 General Approach
Scheduler objectives:
- no architectures excluded by design
- usable for future operating systems (OS)
- easy to use by applications
Approach:
- one-node heterogeneous systems
- scheduler component in kernel space
- OS adaptation instead of OS rewrite
- task management by cooperative multitasking
- delegate threads as scheduling entities
[Figure: an application's delegate thread is submitted to the OS scheduler to be scheduled; once hardware is allocated, the delegate thread copies the data and starts a hardware-specific thread on the accelerator, which returns the results.]
10-11 Cooperative Multitasking
- Tasks voluntarily offer to release the hardware at checkpoints
- Checkpoints allow migration between different ISAs
- This overcomes the lack of preemption and migration support
- It allows time-sharing
Goal: combine cooperative multitasking with checkpointing and time-sharing as a heterogeneous extension to the Completely Fair Scheduler (CFS).
[Figure: thread state diagram with the states Running, Ready, and Blocked and the transitions run (Ready to Running), wait (Running to Blocked), and signal (Blocked to Ready). On slide 10 the transition from Running to Ready is labeled "preempt"; on slide 11 it is replaced by "release".]
13 Scheduling Model
[Figure: the application's main thread spawns delegate threads. Each delegate thread submits its thread information to the scheduler component, where the scheduling policy enqueues it in a hardware unit queue for one of the accelerator architectures (GPU1, GPU2, FPGA) or the CPU. A dequeued delegate thread executes its GPU thread or hardware thread through the driver; delegate threads on the CPU are dequeued by the CF scheduler.]
14 Scheduler Extension
The scheduler API provides three system calls:
- cu_allocate(): blocking call that enqueues a task in the hardware queue with the highest affinity
- cu_rerequest(): request to keep using the allocated hardware at a checkpoint; the scheduler decides based on the scheduling policy
- cu_free(): frees the hardware resources and calls the load balancer if the hardware runs idle
A control API allows managing the accelerators in the system.
The CFS data structures were adapted for heterogeneous use.
15 Programming Model
- The application spawns threads
- The application uses system calls to acquire, keep, and release an accelerator
- The application provides:
  - thread meta information
  - checkpoint(s)
  - function pointers for each supported hardware type for:
    - accelerator initialization (app_init)
    - computation up to the next checkpoint (app_main)
    - a free function that releases the accelerator and writes a checkpoint (app_free)
[Figure: flowchart of a delegate thread: request a resource (cu_allocate(meta_info)), copy data and code (app_init()), start the computation (app_main()); at each checkpoint, decide whether the task is done and whether to keep the hardware (cu_rerequest()); copy back results or the checkpoint (app_free()), free the resources (cu_free()), and exit (app_exit()).]
16 Delegate Thread Implementation
[Figure 2 (excerpt from the paper). Typical workflow of a delegate thread. Visible labels include: Free Resources & Copy Results Back, cu_free(), Reduce Results, pthread_join(), Delete Worker, shutdown().]
Meta information and implementation multiplexer:
```cpp
// Meta information: data volume and per-architecture affinity of the worker
void Worker_example::workerMetaInfo(struct meta_info *mi) {
    mi->memory_to_copy = 0;               // in MB
    mi->type_affinity[cu_type_cuda] = 2;  // GPU (CUDA) preferred
    mi->type_affinity[cu_type_cpu] = 1;   // CPU also supported
}

// Implementation multiplexer: fills in the init/main/free functions for a hardware
// type (declared void* on the slide, but no value is returned)
void Worker_example::getImplementationFor(int type, functions *af) {
    switch (type) {
    case CU_TYPE_CPU:
        af->init = &Worker_example::cpu_init;
        af->main = &Worker_example::cpu_main;
        af->free = &Worker_example::cpu_free;
        af->initialized = true;
        break;
    case CU_TYPE_CUDA:
        // ... similar, with the CUDA implementations
        break;
    default:
        af->initialized = false;  // hardware type not supported by this worker
    }
}
```
Listing 1. Example implementation for mandatory worker functions.
17 Checkpointing
A checkpoint is a preferably small set of data structures that:
- unambiguously defines the state of execution
- is stored back to the host's main memory when a task is preempted
- is readable and translatable to all supported ISAs
Properties:
- currently user-defined
- size is application-dependent and known at definition time
- size influences the scheduling granularity
```cpp
typedef struct md5_resources {
    std::string hash_to_search;
    unsigned long long currentwordnumber;
    bool found_solution;
} md5_resources_t;
```
Listing 2. Example checkpoint for MD5 cracking.
19 Thread Scheduling Process and Parameters
Initial enqueueing:
- static affinity of a task towards an accelerator
- load balancing additionally considers the load (current queue length)
- queue size limitation for accelerators
Internal queue scheduling:
- Fairness: based on virtual runtime and defined consistently with CFS; Δ_fairness is the time a task needs to run to be treated fairly
- Priorities: inherited from the delegate thread; on accelerator queues, virtual runtime is adjusted by priority in the same way as in CFS
- Actual execution time slot: granularity = 2 * t_copy_checkpoint + Δ_fairness + base_granularity; the granularity depends on the checkpoint distance
20 Load Balancer
- Executed when any device runs idle
- Two-step concept for task migration:
  1. Search all run-queues for the most affine waiting tasks
  2. If step 1 fails, check the running tasks and leave a migration offer (flag)
- Balancing between CPUs is done by the original CFS
[Figure: a GPU queue with one running task and several queued tasks, each annotated with its GPU and CPU affinity. In the first pass a queued task is migrated; in the second pass a migration offer is left for a running task.]
22 Experimental Setup
Test applications:
- MD5 cracking
- prime factorization
- running several instances of both applications
Hardware setup:
- 2 x Intel quad-core Xeon E… (… MHz), hyperthreading enabled
- 12 GB DDR…
- NVIDIA Tesla 2075
Software setup:
- Ubuntu Linux … LTS, modified kernel
- NVIDIA CUDA 4.0
- GCC …
23 Impact of Time-sharing on GPU
Comparing different base_granularities with FCFS
[Figures: average turnaround time when running 25 MD5 and 50 prime factorization instances on the GPU; mean total runtime over 30 runs with 25 MD5 and 50 prime factorization threads on the GPU.]
- Average turnaround times decrease with shorter base_granularities (40%)
- Long-running tasks do not block short-running tasks
- Makespan overheads increase with shorter base_granularities (13%)
- Overheads can be hidden by heterogeneous use of the available hardware
24 Heterogeneity Hides Overheads
Makespan reduction of up to 22%
[Figure: average runtimes for different numbers of GPU-affine prime factorization instances, using only the GPU versus a combination of the GPU and the CPUs.]
The effect will increase significantly with:
- more applications providing comparable affinities to several architectures
- applications whose highest affinity is for different hardware
- the use of more architectures
26 Related Work
- StarPU (Augonnet et al., 2009): user-space runtime system for heterogeneous systems, including data management, scheduling policies, and profiling models; no preemption or migration support; extensive programming model
- ReconOS (Lübbers et al., 2010): OS extension to manage partially reconfigurable system resources; targets dynamically reconfigurable hardware
- Jimenez et al. (2009): predictive runtime code scheduling for heterogeneous architectures; prediction based on performance history and estimated waiting times
- Harmony (Diamos and Yalamanchili, 2008): execution model and runtime for heterogeneous many-core systems; programming model with metadata and runtime kernel division
28 Conclusions and Future Directions
- Cooperative multitasking with time-sharing is possible:
  - it reduces average turnaround times
  - it increases interactivity and overall performance
- A CFS extension was implemented for conceptual validation
- The programming model adds checkpoints and meta information to applications
Next steps:
- compare to a user-space scheduler
- extend scheduling policies, improve load balancing
- port further and more extensive example applications
- approach automatic checkpoint identification
The code is available as open source.
29 Thank you for your attention!
Tobias Beisel
This work was carried out in the context of the ENHANCE project.
More informationCompletely Fair Scheduler and its tuning 1
Completely Fair Scheduler and its tuning 1 Jacek Kobus and Rafał Szklarski 1 Introduction The introduction of a new, the so called completely fair scheduler (CFS) to the Linux kernel 2.6.23 (October 2007)
More informationVirtualization for Cloud Computing
Virtualization for Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF CLOUD COMPUTING On demand provision of computational resources
More informationLinux Scheduler. Linux Scheduler
or or Affinity Basic Interactive es 1 / 40 Reality... or or Affinity Basic Interactive es The Linux scheduler tries to be very efficient To do that, it uses some complex data structures Some of what it
More information1. Computer System Structure and Components
1 Computer System Structure and Components Computer System Layers Various Computer Programs OS System Calls (eg, fork, execv, write, etc) KERNEL/Behavior or CPU Device Drivers Device Controllers Devices
More informationGPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
More informationPerformance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009
Performance Study Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Introduction With more and more mission critical networking intensive workloads being virtualized
More informationDynamic Task-Scheduling and Resource Management for GPU Accelerators in Medical Imaging
This is the author s version of the work. The definitive work was published in Proceedings of the 25th International Conference on Architecture of Computing Systems (ARCS), Munich, Germany, February 28
More informationNVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X
NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X DU-05348-001_v6.5 August 2014 Installation and Verification on Mac OS X TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2. About
More informationParallel Image Processing with CUDA A case study with the Canny Edge Detection Filter
Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter Daniel Weingaertner Informatics Department Federal University of Paraná - Brazil Hochschule Regensburg 02.05.2011 Daniel
More informationLinux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform. Ed Spetka Mike Kohler
Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform Ed Spetka Mike Kohler Outline Abstract Hardware Overview Completely Fair Scheduler Design Theory Breakdown of the
More informationLoad-Balancing for a Real-Time System Based on Asymmetric Multi-Processing
LIFL Report # 2004-06 Load-Balancing for a Real-Time System Based on Asymmetric Multi-Processing Éric PIEL Eric.Piel@lifl.fr Philippe MARQUET Philippe.Marquet@lifl.fr Julien SOULA Julien.Soula@lifl.fr
More informationCPU SCHEDULING (CONT D) NESTED SCHEDULING FUNCTIONS
CPU SCHEDULING CPU SCHEDULING (CONT D) Aims to assign processes to be executed by the CPU in a way that meets system objectives such as response time, throughput, and processor efficiency Broken down into
More informationPOSIX. RTOSes Part I. POSIX Versions. POSIX Versions (2)
RTOSes Part I Christopher Kenna September 24, 2010 POSIX Portable Operating System for UnIX Application portability at source-code level POSIX Family formally known as IEEE 1003 Originally 17 separate
More information12. Introduction to Virtual Machines
12. Introduction to Virtual Machines 12. Introduction to Virtual Machines Modern Applications Challenges of Virtual Machine Monitors Historical Perspective Classification 332 / 352 12. Introduction to
More informationACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU
Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents
More informationIBM Software Group. Lotus Domino 6.5 Server Enablement
IBM Software Group Lotus Domino 6.5 Server Enablement Agenda Delivery Strategy Themes Domino 6.5 Server Domino 6.0 SmartUpgrade Questions IBM Lotus Notes/Domino Delivery Strategy 6.0.x MRs every 4 months
More informationGeoImaging Accelerator Pansharp Test Results
GeoImaging Accelerator Pansharp Test Results Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance
More informationPeter Ruissen Marju Jalloh
Peter Ruissen Marju Jalloh Agenda concepts >> To research the possibilities for High Availability (HA) failover mechanisms using the XEN virtualization technology and the requirements necessary for implementation
More informationEECS 750: Advanced Operating Systems. 01/28 /2015 Heechul Yun
EECS 750: Advanced Operating Systems 01/28 /2015 Heechul Yun 1 Recap: Completely Fair Scheduler(CFS) Each task maintains its virtual time V i = E i 1 w i, where E is executed time, w is a weight Pick the
More informationPerformance Characteristics of VMFS and RDM VMware ESX Server 3.0.1
Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System
More informationRoad Map. Scheduling. Types of Scheduling. Scheduling. CPU Scheduling. Job Scheduling. Dickinson College Computer Science 354 Spring 2010.
Road Map Scheduling Dickinson College Computer Science 354 Spring 2010 Past: What an OS is, why we have them, what they do. Base hardware and support for operating systems Process Management Threads Present:
More information