Multiprocessors/Multicores

Similar documents
Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Disco: Running Commodity Operating Systems on Scalable Multiprocessors

COS 318: Operating Systems. Virtual Machine Monitors

COS 318: Operating Systems. Virtual Machine Monitors

Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University

Multi-core Programming System Overview

Full and Para Virtualization

Virtualization Technology. Zhiming Shen

COM 444 Cloud Computing

Virtualization. Explain how today s virtualization movement is actually a reinvention

Principles and characteristics of distributed systems and environments

Design and Implementation of the Heterogeneous Multikernel Operating System

Virtualization. Dr. Yingwu Zhu

An Operating System for Multicore and Clouds

A Comparison of Distributed Systems: ChorusOS and Amoeba

Distributed Systems. REK s adaptation of Prof. Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 1

Chapter 14 Virtual Machines

Introduction to Virtual Machines

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

System Virtual Machines

Virtualization. Jukka K. Nurminen

Client/Server Computing Distributed Processing, Client/Server, and Clusters

Virtualization. Jia Rao Assistant Professor in CS

Virtual Machine Monitors: Disco and Xen. Robert Grimm New York University

CPS221 Lecture: Operating System Structure; Virtual Machines

Cellular Disco: Resource Management Using Virtual Clusters on Shared-Memory Multiprocessors

Memory Resource Management in VMware ESX Server

Virtualization. Pradipta De

Introduction to Virtual Machines

Basics of Virtualisation

Virtualization. Types of Interfaces

CS550. Distributed Operating Systems (Advanced Operating Systems) Instructor: Xian-He Sun

VMkit A lightweight hypervisor library for Barrelfish

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014)

Symmetric Multiprocessing

Distributed and Cloud Computing

Distributed Systems. Virtualization. Paul Krzyzanowski

How To Understand The Concept Of A Distributed System

CMSC 611: Advanced Computer Architecture

Windows Server Virtualization & The Windows Hypervisor

Virtual machines and operating systems

Virtual machine interface. Operating system. Physical machine interface

Chapter 5 Cloud Resource Virtualization

Review from last time. CS 537 Lecture 3 OS Structure. OS structure. What you should learn from this lecture

COS 318: Operating Systems

The Microsoft Windows Hypervisor High Level Architecture

The Xen of Virtualization

x86 ISA Modifications to support Virtual Machines

Basics in Energy Information (& Communication) Systems Virtualization / Virtual Machines

nanohub.org An Overview of Virtualization Techniques

Introduction to Cloud Computing

Lecture 23: Multiprocessors

OS Concepts and structure

A Unified View of Virtual Machines

Lecture 2 Cloud Computing & Virtualization. Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Outline. Outline. Why virtualization? Why not virtualize? Today s data center. Cloud computing. Virtual resource pool

Virtual Machines.

COMP5426 Parallel and Distributed Computing. Distributed Systems: Client/Server and Clusters

Intel s Virtualization Extensions (VT-x) So you want to build a hypervisor?

Hardware accelerated Virtualization in the ARM Cortex Processors

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Hypervisors. Introduction. Introduction. Introduction. Introduction. Introduction. Credits:

Enabling Technologies for Distributed and Cloud Computing

Cloud Computing CS

Operating System Organization. Purpose of an OS

B) Using Processor-Cache Affinity Information in Shared Memory Multiprocessor Scheduling

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

Xeon+FPGA Platform for the Data Center

Jukka Ylitalo Tik TKK, April 24, 2006

How do Users and Processes interact with the Operating System? Services for Processes. OS Structure with Services. Services for the OS Itself

Chapter 2 Parallel Computer Architecture

evm Virtualization Platform for Windows

Using the SimOS Machine Simulator to Study Complex Computer Systems

Knut Omang Ifi/Oracle 19 Oct, 2015

Anh Quach, Matthew Rajman, Bienvenido Rodriguez, Brian Rodriguez, Michael Roefs, Ahmed Shaikh

Virtual Machines. COMP 3361: Operating Systems I Winter


Next Generation Operating Systems

Cloud Computing. Up until now

CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies. Virtualization of Clusters and Data Centers

GUEST OPERATING SYSTEM BASED PERFORMANCE COMPARISON OF VMWARE AND XEN HYPERVISOR

Uses for Virtual Machines. Virtual Machines. There are several uses for virtual machines:

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG , Moscow

Your computer is already a distributed system. Why isn t your OS?

Enabling Technologies for Distributed Computing

Cloud Computing. Chapter 1 Introducing Cloud Computing

Virtualization for Cloud Computing

Client/Server and Distributed Computing

Virtual Machines. Virtualization

Distributed Systems LEEC (2005/06 2º Sem.)

Virtualization. P. A. Wilsey. The text highlighted in green in these slides contain external hyperlinks. 1 / 16

Operating System Components

Datacenter Operating Systems

VDI Optimization Real World Learnings. Russ Fellows, Evaluator Group

How To Write A Windows Operating System (Windows) (For Linux) (Windows 2) (Programming) (Operating System) (Permanent) (Powerbook) (Unix) (Amd64) (Win2) (X

Virtualization. Michael Tsai 2015/06/08

Proactive, Resource-Aware, Tunable Real-time Fault-tolerant Middleware

CS 695 Topics in Virtualization and Cloud Computing. More Introduction + Processor Virtualization

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Transcription:

September 26, 2013

Road Map Motivation and Background - Standford multiprocessor system Barrelfish - ETH Zurich & Microsoft s multicore system.

Multi-core V.S. Multi-Processor Approaches Multi-core V.S. Multi-Processor Multiple Cores/ Chip & Single PU Independent L1 cache and shared L2 cache. Single or Multiple Cores/Chip & Multiple PUs Independent L1 cache and Independent L2 cache. 1 Understanding Parallel Hardware: Multiprocessors, Hyperthreading, Dual-Core, Multicore and FPGAs 1

Multi-core V.S. Multi-Processor Approaches Flynns Classification of multiprocessor machines: {SI, MI } {SD, MD} = {SISD, SIMD, MISD, MIMD} 1. SISD = Single Instruction Single Data 2. SIMD = Single Instruction Multiple Data ( Array Processors or Data Parallel machines) 3. MISD does not exist. 4. MIMD = Multiple Instruction Multiple Data Control parallelism.

Multi-core V.S. Multi-Processor Approaches MIMD 2 5 8 mazsola.iit.uni-miskolc.hu/tempus/parallel/doc/kacsuk/chap18.ps.gz 2

Multi-core V.S. Multi-Processor Approaches MIMD-Shared memory Uniform memory access Access time to all regions of memory the same Non-uniform memory access Different processors access different regions of memory at different speeds

Multi-core V.S. Multi-Processor Approaches MIMD-Distributed memory

Multi-core V.S. Multi-Processor Approaches MIMD-Cache coherent NUMA

Multi-core V.S. Multi-Processor Approaches History 3 3 http://www.cs.unm.edu/ fastos/03workshop/krieger.pdf

Multi-core V.S. Multi-Processor Approaches V.S. MultiKernel (1997) Adding software layer between the hardware and VM. MultiKernel(2009) Message passing idea from distributed system

Motivation and goal Vitualization Conclusion and Discussion Edouard Bugnion VP, Cisco. Phd from Stanford. Co-Founder of VMware, key member of Sim OS, Co-Founder of Nuova Systems Scott Devine Principal Engineer, VMware. Phd from Stanford. Co-Founder of VMware, key member of Sim OS Mendel Rosenblum Associate Prof in Stanford. Phd from UC Berkley. Co-Founder of VMware, key member of Sim OS

Motivation Motivation and Background Motivation and goal Vitualization Conclusion and Discussion CCNUMA system Large shared memory multi-processor systems Stanford FLASH (1994) Low-latency, high-bandwidth interconnection Porting OS to these platforms is expensive, difficult and error-prone. : Instead of porting, partition these systems into VM and run essentially unmodified OS on the VMs.

Motivation and goal Vitualization Conclusion and Discussion Goals Use the machine with minimal effort Overcome traditional VM overheads

Motivation and goal Vitualization Conclusion and Discussion Back to the future: Virtual Machine Monitors Why VMM would work? Cost of development is less Less risk of introducing software bugs Flexibility to support wide variety of workloads NUMA memory management is hidden from guest OS. Keep existing application and keep isolation

Motivation and goal Vitualization Conclusion and Discussion Virtual Machine Monitor Virtualizes resources for coexistence of multiple VMs. Additional layer of software between the hardware and the OS

Motivation and Background Motivation and goal Vitualization Conclusion and Discussion Virtual CPU Virtual Memory system NUMA optimizations Dynamic page migration and replication Virtual Disks Copy-on-write Virtual Network Interface

Motivation and goal Vitualization Conclusion and Discussion Virtualization CPU Direct operation Good performance Scheduling, set CPU registers to those of VCPU and jump to VCPUs PC. What if attempt is made to modify TLB or access physical memory? Privileged instructions need to be trapped and simulated by VMM.

Virtual Memory Motivation and Background Motivation and goal Vitualization Conclusion and Discussion Two-level mapping VM: Virtual addresses to Physical address : Physical to Machine address via pmap Real TLB stores Virtual Machine mapping TLB flush when virtual CPU changes Second level TLB and memmap ccnuma dynamic page migration and page replication system.

Page Replication Motivation and Background Motivation and goal Vitualization Conclusion and Discussion

Virtual I/O Motivation and Background Motivation and goal Vitualization Conclusion and Discussion Virtual I/O Devices Special device drivers written rather than emulating the hardware Virtual DMA mapped from Physical to Machine addresses

Motivation and goal Vitualization Conclusion and Discussion Virtual Disk & Network Virtual Disk Persistent disks are not shared (Sharing done using NFS) Non-persistent disks are shared copy-on-write Virtual Network When sending data between nodes, intercepts DMA and remaps when possible

Motivation and goal Vitualization Conclusion and Discussion on IRIX To run IRIX on top of DISCO, some changes had to be made: Changed IRIX kernel code and data in a location where VMM could intercept all address translations. Device drivers rewritten. Synchronization routines to protected registers, rewritten to non-privileged load/store.

runtime overhead Motivation and goal Vitualization Conclusion and Discussion Pmake, page initialization Rest, second level TLB

Motivation and goal Vitualization Conclusion and Discussion Memory Benefit Due To Data Sharing V: pmake memory used if no sharing. M: pmake memory used with sharing.

Scalability Motivation and Background Motivation and goal Vitualization Conclusion and Discussion

Motivation and goal Vitualization Conclusion and Discussion Conclusion Virtual Machine Monitor OS independent Manages resources, optimizes sharing primitives

Motivation and goal Vitualization Conclusion and Discussion DISCO v.s. Exokernel The Exokernel multiplexes resources between user-level library operating systems. DISCO differs from Exokernel is that it virtualizes resources rather than multiplexing them. Therefore, can run commodity OS with minor modifications.

Motivation and goal Vitualization Conclusion and Discussion Discussion NUMA: Is it a good thing to move the complexity from hardware to the OS., they didn t compare against other special multiprocessor operating systems (Hurricane and Hive). Imagine the combination of this approach with the extensibility of the microkernel, do you think apply both in one system can improve the performance? Is the use of a simple trade-off between performance and scalability? paper says sharing can help with managing unnecessarily replicated data structures. What about homogeneous v.s. heterogenous workload? Could you run on top of?

Motivation Key points Barrelfish The :A new OS architecture for scalable multicore systems SOSP 2009 http://research.microsoft.com/en-us/ news/features/070711-barrelfish.aspx Many authors are from ETH Zurich Systems Group and now working in MSR Andrew Baumann: Microsoft Research Simon Peter: Postdoc in University of Washington

Motivations Motivation and Background Motivation Key points Barrelfish 1)Wide range of hardware. 2)Diverse Cores. 3) Interconnect, message passing like 4) $ messages < $shared

Future of the OS Motivation and Background Motivation Key points Barrelfish Many cores Sharing within the OS is becoming a problem Cache-coherence protocol limits scalability Core diversity Scaling existing OSes Increasingly difficult to scale conventional OSes Optimizations are specific to hardware platforms Non-uniformity Memory hierarchy NUMA

Motivation Key points Barrelfish Rethink in terms of distributed System Look at OS as a distributed system of functional units communicating by message passing The three design principles: making inter-core communication explicit making OS structure hardware neutral instead of shared, view state as replicated

Traditional OS vs. multikernel Motivation Key points Barrelfish Traditional OSes scale up by: Reducing lock granularity Partitioning state State partitioned/replicated by default rather then shared

Motivation Key points Barrelfish : Barrelfish Goals for Barrelfish Give comparable performance Demonstrates evidence of scalability Can be re-targeted to different hardware without refactoring Can exploit the message-passing abstraction Can exploit the modularity of the OS

Motivation Key points Barrelfish Implementation of Barrelfish: System Structure Factored the OS instance on each core into a privileged-mode CPU driver and a distinguished user mode monitor process

Implementation of Barrelfish Motivation Key points Barrelfish CPU drivers Enforces protection Serially handles traps and exceptions Shares no state with other cores Monitors Collectively coordinate system-wide state Encapsulate much of the mechanism and policy Mediates local operations on global state Replicated data structures are kept globally consistent

Implementation of Barrelfish Motivation Key points Barrelfish Process structure Represented by a collection of dispatcher objects Scheduled by local CPU driver Inter-core communication Via messages Uses variant of user level RPC

Implementation of Barrelfish Motivation Key points Barrelfish Memory management Explicitly via system calls by user level code Cleanly decentralize resource allocation for scalability Support shared address space System knowledge base Maintains knowledge of the underlying hardware Facilitates optimization

Motivation Key points Barrelfish TLB shootdown Send a message to every core with a mapping, wait for all to be acknowledged Linux/Windows: Kernel sends IPIs and spins on acknowledgement Barrelfish: User request to local monitor and single-phase commit to remote monitors

TLB shootdown Motivation and Background Motivation Key points Barrelfish

Conclusion Strong Points Scales well with core count Adapt to evolving hardware Optimizing messaging Lightweight Lack of evaluation on Complex app workloads Higher level OS services Scalability on variety of HW

Discussion Do you think we are ready to make OS structure HW-neutral and it is practical now? How do you think it will affect performance? Is the concept of no inter-core shared data structures too idealistic? Could Barrelfish be expanded over a network? Able to efficiently manage multiple hardware systems over a LAN/WAN? Could there be benefits of sharing a replica of the state between a group of closely-coupled cores?

Thank You Questions?

Reference Deniz 2009 CS6410 slides Ashik R.2011 CS6410 slides Slides from Seokje at.el https://wiki.engr.illinois. edu/download/attachments/227741408/multicore1. pdf?version=1&modificationdate=1347974981000