Tilera's Many-core Processor
- Karin Robbins
- 7 years ago
Transcription
1 Tilera's Many-core Processor: A scalable architecture on a single chip. J. Whitesell & S. Ladavich. Tuesday, May 14th
3-13 Outline (one item added per slide):
- History of Tilera
- Pros and Cons of Building a Manycore Architecture
- The Tilera Approach
- Tilera's Tile Architecture
- iMesh Network Topology
- Applications: Server, Media, Cloud
- Performance Analysis and Benchmarking
15 Multi-processor made of single chips. MIT's Dr. Anant Agarwal leads the way for Tiled Manycore.
16 Multi-processor made of single chips. Node mesh-based cache-coherent processor. 2002: MIT's RAW architecture.
17 Multi-processor made of single chips. Node mesh-based cache-coherent processor. MIT's RAW architecture. DARPA pays the bill! Gives 10s of millions supporting RAW.
18 Tilera has solved the multi-processor scalability problem! does not exist! Multi-processor made of single chips. DARPA pays the bill! Gives 10s of millions supporting RAW. Node mesh-based cache-coherent processor. 2004: Tilera's stealth launch.
19 Tilera has solved the multi-processor scalability problem! does not exist! Multi-processor made of single chips. DARPA pays the bill! Gives 10s of millions supporting RAW. Node mesh-based cache-coherent processor. 2004: Tilera's stealth launch. Tilera's corporate launch.
20 Multi-processor made of single chips. DARPA pays the bill! Gives 10s of millions supporting RAW. Node mesh-based cache-coherent processor. Tilera's stealth launch. Tilera's corporate launch. Latest line: Gx series is released.
21 Traditional Architectures aren't Scalable. Most Multi-Core Chips Stop Around 8 Cores. Bus Interconnect: Creates a Bottleneck for MM Access; Consumes Chip-Area & Power.
22 On-Chip Memory Limits Software Support. Efficient API Development is Challenging. Parallel Languages and Programmers are Needed.
23 On-Chip Communication is Fast! Reduced Overheads; Finer Grain Size. On-Chip Network Footprint is Small! Natural Tiled Connections; 2-D Mesh Suits 2-D Substrate.
24 Basic Look Inside a Tile. Create a Basic Modular Unit, Homogeneous Across Chip, Known as a Tile. Full-Featured Processor Core: Processor Engine, Cache Engine, Switch Engine. Capable of Running an OS.
25 Detailed Look Inside a Tile. Processor Engine: 64-bit VLIW Architecture; 3 Execution Pipelines (ALU, Flow Control, LD/ST). Cache Engine: Dynamic Distributed Cache; Shared L2 Caches (L3). Switch Engine: Direct Neighbor Connections; I/O Connections on Periphery.
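The "Direct Neighbor Connections" of the switch engine and the 2-D mesh layout can be pictured with a short sketch. This is an illustration only: the `(x, y)` tile coordinates, the 8 x 8 grid size, and the X-then-Y path walked by `xy_route` are assumptions made for the example and are not stated on the slides.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (x, y) grid coordinate; a hypothetical naming scheme


def neighbors(tile: Tile, width: int, height: int) -> List[Tile]:
    """Return the tiles directly wired to `tile` in a width x height mesh."""
    x, y = tile
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    # Tiles on the edge of the chip simply have fewer neighbors.
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """List the tiles visited when moving along X first, then along Y.

    The X-then-Y order is an assumption chosen for this sketch.
    """
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src: Tile, dst: Tile) -> int:
    """Number of tile-to-tile links crossed: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


if __name__ == "__main__":
    print(neighbors((0, 0), 8, 8))    # a corner tile has two neighbors
    print(xy_route((0, 0), (2, 1)))   # path across the assumed 8 x 8 grid
```

Because every hop only crosses a link to an adjacent tile, the wiring stays short and regular no matter how many tiles are added, which is the scalability argument the slides make for the mesh.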
26-27 Networks are easy! Communication is cheap!
28 Leverage Multiple Independent Networks
29-30 1) How many networks are needed? 2) What functionalities do the networks have?
31-45 How are the message types and communications defined? (One diagram, built up step by step across these slides.)
Message Types:
1) Implicit
2) Message Passing
3) Streaming Data: a) Small stream, b) Large stream, c) Large/Continuous
4) System Level & IO
Dedicated Networks:
1) MDN 2) TDN 3) UDN 4) STN 5) IDN
Build order:
- Slide 32: Implicit Message Passing and Explicit Message Passing appear.
- Slide 33: Message type 1) Implicit is listed. Networks listed: 1) MDN, 2) TDN.
- Slide 34: Implicit messages go through a tile-to-tile shared address space with non-uniform / distributed cache access (NUCA), or through a shared address space in off-chip / main memory with uniform memory access (UMA).
- Slide 35: Explicit Message Passing divides into Streaming Data and Messages.
- Slide 36: Message type 2) Message Passing is listed. Network 3) UDN is added.
- Slide 37: Large Buffers and Small Buffers appear.
- Slides 38-39: Message type 3) Streaming Data is listed with a) Small stream (3a) and b) Large stream (3b).
- Slides 40-41: Special Case: High Performance Streaming, listed as c) Large/Continuous (3c). Network 4) STN is added.
- Slides 42-43: Special Case: IO Messages and System Traffic, listed as 4) System Level & IO. Network 5) IDN is added.
- Slide 44: 5 Independent Hardware Networks: Memory Dynamic Network, Tile Dynamic Network, User Dynamic Network, Static Network, I/O Dynamic Network.
- Slide 45: Which minimize overheads for all desired forms of communication.
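The relationship between message types and the five networks can be written down as a small lookup table. The pairing below is an assumption read off the order in which the slides introduce each message type and each network (for example, no new network appears when the small and large stream cases are added, so they are mapped to UDN here); the slides do not state the mapping explicitly, and the dictionary keys are hypothetical names.

```python
from typing import Dict, Tuple

# Network abbreviations and full names as they appear on the slides.
NETWORK_NAMES: Dict[str, str] = {
    "MDN": "Memory Dynamic Network",
    "TDN": "Tile Dynamic Network",
    "UDN": "User Dynamic Network",
    "STN": "Static Network",
    "IDN": "I/O Dynamic Network",
}

# Assumed pairing, inferred from the build order of the slides.
TRAFFIC_TO_NETWORKS: Dict[str, Tuple[str, ...]] = {
    "implicit": ("MDN", "TDN"),
    "message_passing": ("UDN",),
    "small_stream": ("UDN",),
    "large_stream": ("UDN",),
    "continuous_stream": ("STN",),
    "system_and_io": ("IDN",),
}


def networks_for(traffic: str) -> Tuple[str, ...]:
    """Return the full names of the networks assumed to carry `traffic`."""
    try:
        codes = TRAFFIC_TO_NETWORKS[traffic]
    except KeyError:
        raise ValueError(f"unknown traffic class: {traffic!r}") from None
    return tuple(NETWORK_NAMES[code] for code in codes)


if __name__ == "__main__":
    for traffic in TRAFFIC_TO_NETWORKS:
        print(traffic, "->", ", ".join(networks_for(traffic)))
```

Keeping each class of traffic on its own physical network is what lets the design avoid one kind of message queuing behind another.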
46 Parallel Processing in Embedded Domain. Network: Lossless Packet Capture; Intrusion Detection & Prevention. Multimedia: Video Conferencing; IP Surveillance. Cloud: In-Memory Caching; Server Load Balancing.
47 Numerous Evaluations. Single-Core Performance: CoreMark Score. Parallelized Performance: Information Fusion; Gaussian Elimination; MemCached. Comparisons of SMPs & Many-Core.
48 CoreMark Score. Evaluates Single-Core Performance: 4 Algorithms, 1 Final Score. Single-Core Single Thread CoreMark Comparison. Tilera's Processors Feature: VLIW Architecture; 3 Pipelines; 64-bit Instr. Words; All or None Exec.
49 Embedded Wireless Sensor Networks. Cluster Heads Receive from 10 Sensors. Head Node Performs Reduction: Moving Average Filter.
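The cluster-head workload described above can be sketched as follows. The slide gives only the outline (10 sensors, a moving average filter, a reduction at the head node), so the window size, the choice to filter each sensor stream separately, and the use of a plain mean as the reduction are all assumptions made for this illustration, as are the names `MovingAverage` and `fuse`.

```python
from collections import deque
from typing import Deque, List, Sequence


class MovingAverage:
    """Running mean over the most recent `window` samples."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._samples: Deque[float] = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add one sample and return the current filtered value."""
        self._samples.append(value)  # oldest sample drops out automatically
        return sum(self._samples) / len(self._samples)


def fuse(streams: Sequence[Sequence[float]], window: int = 2) -> float:
    """Filter each sensor stream, then reduce the filtered values to one.

    `streams` holds one non-empty list of readings per sensor. The
    reduction used here (mean of the latest filtered values) is an
    assumption, not something the slide specifies.
    """
    latest: List[float] = []
    for readings in streams:
        smoother = MovingAverage(window)
        filtered = 0.0
        for reading in readings:
            filtered = smoother.update(reading)
        latest.append(filtered)
    return sum(latest) / len(latest)


if __name__ == "__main__":
    # Ten hypothetical sensors, three readings each.
    sensors = [[s, s + 2, s + 4] for s in range(10)]
    print(fuse(sensors, window=2))
```

Each sensor stream can be filtered independently, so on a many-core chip the ten filters could run on separate tiles with only the final reduction needing communication.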
50 Results Vary Based on Application: Integer-Based Arithmetic; Floating-Point Intensive. Charts: Information Fusion Application; Gaussian Elimination Application.
51 Why? Tiles Lack a Dedicated Floating-point Unit! Charts: Information Fusion Application; Gaussian Elimination Application.
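To see why a missing floating-point unit matters for Gaussian elimination, it helps to look at the inner loop of the algorithm. The following is a generic textbook version with partial pivoting, written for illustration; it is not the benchmark code behind the charts, and the function name `gaussian_solve` is hypothetical.

```python
from typing import List


def gaussian_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]

    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the column below the pivot. Every step here is a
        # floating-point divide, multiply, or subtract.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


if __name__ == "__main__":
    # 2x + y = 3 and x + 3y = 5
    print(gaussian_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

The elimination loop performs on the order of n^3 floating-point operations for an n x n system, so a core that has to emulate them in software pays that cost on nearly every instruction of the kernel.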
52 Distributed Memory Caching System. Creates a Virtual Memory Pool. Used for Key-Value Stores. Designed to Alleviate Database Load. Currently Implemented by Social Media Giants: Facebook, Twitter, and Zynga.
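The idea of pooling the memory of several nodes into one key-value store can be illustrated with a toy, in-process model. This is a simplified sketch and not Memcached's actual protocol or client behavior: the class name `ShardedCache`, the node names, and the hash-modulo placement rule are assumptions chosen to keep the example short.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ShardedCache:
    """Toy key-value store that spreads keys over several in-memory nodes."""

    def __init__(self, node_names: List[str]) -> None:
        if not node_names:
            raise ValueError("at least one node is required")
        self._names = list(node_names)
        # Each "node" is modelled as a plain dictionary.
        self._nodes: Dict[str, Dict[str, Any]] = {n: {} for n in self._names}

    def _node_for(self, key: str) -> str:
        """Pick a node from a stable hash of the key (assumed placement rule)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def set(self, key: str, value: Any) -> None:
        self._nodes[self._node_for(key)][key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._nodes[self._node_for(key)].get(key, default)

    def size(self) -> int:
        """Total number of entries held across all nodes."""
        return sum(len(store) for store in self._nodes.values())


if __name__ == "__main__":
    cache = ShardedCache(["node-a", "node-b", "node-c"])
    cache.set("user:42", {"name": "example"})
    print(cache.get("user:42"))       # served from memory, no database hit
    print(cache.get("user:missing"))  # a miss falls through to the default
```

A lookup that hits the cache never reaches the database, which is how such a layer relieves database load; on a miss the application would query the database and store the result with `set`.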
53 For a Fixed Memory Footprint, Tilera Uses 3.35x Less Power: Better Performance per Watt.
54 The Tile Architecture Exhibits Superior Scalability: Modular Design; Low Cost of On-Chip Communication. Exploiting a Variety of Task Grain Sizes: ILP and TLP. High Performance per Watt: Relatively Low Clock Speeds; Idle Mode for Unused Tiles; Reducing Costs of Web Datacenters.
56 References:
Waingold, E.; Taylor, M.; Srikrishna, D.; Sarkar, V.; Lee, W.; Lee, V.; Kim, J.; Frank, M.; Finch, P.; Barua, R.; Babb, J.; Amarasinghe, S.; Agarwal, A., "Baring it all to software: Raw machines," Computer, vol. 30, no. 9, pp. 86-93, Sep 1997.
CURRENTLY NOT NEEDED
Tilera Corporation, Tile Processor User Architecture Manual, UG101, Nov [Rev. 2.4].
Wentzlaff, D.; Griffin, P.; Hoffmann, H.; Liewei Bao; Edwards, B.; Ramey, C.; Mattina, M.; Chyi-Chang Miao; Brown, J.F.; Agarwal, A., "On-Chip Interconnection Architecture of the Tile Processor," Micro, IEEE, vol. 27, no. 5, pp. 15-31, Sept.-Oct.
Munir, A.; Gordon-Ross, A.; Ranka, S., "Parallelized benchmark-driven performance evaluation of SMPs and tiled multi-core architectures for embedded systems," Performance Computing and Communications Conference (IPCCC), 2012 IEEE 31st International, pp. 416-423, 1-3 Dec.
Berezecki, M.; Frachtenberg, E.; Paleczny, M.; Steele, K., "Many-core key-value store," Green Computing Conference and Workshops (IGCC), 2011 International, pp. 1-8, July 2011.
R. Schooler, The TILE-Gx Processor: Enabling HPC through Massive-Scale Manycore, IEEE High Performance EMbedded Computing Conference Proceedings, Presentation Slides.
Links to Other Images (Presentation Only): Tilera Silicon; AMD Phenom Silicon; Scalability Graph; Tilera Products and Theme; Single Tile Detail.
More informationEnabling Technologies for Distributed Computing
Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies
More informationPerformance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09
Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,
More informationResource Utilization of Middleware Components in Embedded Systems
Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system
More informationDesign and Implementation of the Heterogeneous Multikernel Operating System
223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,
More informationCentralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures
Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do
More informationEffective Utilization of Multicore Processor for Unified Threat Management Functions
Journal of Computer Science 8 (1): 68-75, 2012 ISSN 1549-3636 2012 Science Publications Effective Utilization of Multicore Processor for Unified Threat Management Functions Sudhakar Gummadi and Radhakrishnan
More informationTolerating SEU Faults in the Raw Architecture
Tolerating SEU Faults in the Raw Architecture Karandeep Singh #, Adnan Agbaria +*, Dong-In Kang #, and Matthew French # # USC Information Sciences Institute, Arlington VA, USA Email: {karan, dkang, mfrench}
More informationImproving the performance of data servers on multicore architectures. Fabien Gaud
Improving the performance of data servers on multicore architectures Fabien Gaud Grenoble University Advisors: Jean-Bernard Stefani, Renaud Lachaize and Vivien Quéma Sardes (INRIA/LIG) December 2, 2010
More informationMemory Architecture and Management in a NoC Platform
Architecture and Management in a NoC Platform Axel Jantsch Xiaowen Chen Zhonghai Lu Chaochao Feng Abdul Nameed Yuang Zhang Ahmed Hemani DATE 2011 Overview Motivation State of the Art Data Management Engine
More informationmembase.org: The Simple, Fast, Elastic NoSQL Database NorthScale Matt Ingenthron OSCON 2010
membase.org: The Simple, Fast, Elastic NoSQL Database NorthScale Matt Ingenthron OSCON 2010 Membase is an Open Source distributed, key-value database management system optimized for storing data behind
More information~ Greetings from WSU CAPPLab ~
~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)
More informationDDR3 memory technology
DDR3 memory technology Technology brief, 3 rd edition Introduction... 2 DDR3 architecture... 2 Types of DDR3 DIMMs... 2 Unbuffered and Registered DIMMs... 2 Load Reduced DIMMs... 3 LRDIMMs and rank multiplication...
More informationAMD Opteron Quad-Core
AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced
More informationDesigning and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp
Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of
More informationA Survey of Cloud Computing Guanfeng Octides
A Survey of Cloud Computing Guanfeng Nov 7, 2010 Abstract The principal service provided by cloud computing is that underlying infrastructure, which often consists of compute resources like storage, processors,
More informationGPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
More informationDesign Patterns for Packet Processing Applications on Multi-core Intel Architecture Processors
White Paper Cristian F. Dumitrescu Software Engineer Intel Corporation Design Patterns for Packet Processing Applications on Multi-core Intel Architecture Processors December 2008 321058 Executive Summary
More informationOpenSoC Fabric: On-Chip Network Generator
OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation
More informationDistributed and Cloud Computing
Distributed and Cloud Computing K. Hwang, G. Fox and J. Dongarra Chapter 1: Enabling Technologies and Distributed System Models Copyright 2012, Elsevier Inc. All rights reserved. 1 1-1 Data Deluge Enabling
More informationInterconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, Josef.Pelikan@mff.cuni.cz Abstract 1 Interconnect quality
More information