Tilera's Many-core Processor

A scalable architecture on a single chip. J. Whitesell & S. Ladavich, Tuesday, May 14th

History of Tilera
Pros and Cons of Building a Manycore Architecture
The Tilera Approach
Tilera's Tile Architecture
iMesh Network Topology
Applications: Server, Media, Cloud
Performance Analysis and Benchmarking

Multi-processor made of single chips: MIT's Dr. Anant Agarwal leads the way for tiled manycore.
…-node mesh-based cache-coherent processor
2002: MIT's RAW architecture
DARPA pays the bill! It gives 10s of millions supporting RAW.
2004: Tilera's stealth launch
"Tilera has solved the multi-processor scalability problem!" … "does not exist!"
Tilera's corporate launch
Latest line: the Gx series is released

Traditional Architectures Aren't Scalable: most multi-core chips stop at around 8 cores. The bus interconnect creates a bottleneck for main-memory access and consumes chip area and power.

On-chip memory is limited. Software support: efficient API development is challenging, and parallel languages and programmers are needed.

On-chip communication is fast! Reduced overheads allow a finer grain size. The on-chip network footprint is small! Tiles connect naturally, since a 2-D mesh suits a 2-D substrate.
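The point that a 2-D mesh suits a 2-D substrate can be made concrete with a small routing sketch. The Python below assumes tiles are addressed by (x, y) coordinates and that packets only move between direct neighbors, using dimension-order (X first, then Y) routing, a common choice for meshes; it is not a model of Tilera's actual router, and xy_route, hop_count, and mesh_links are hypothetical names. It also counts neighbor links, which grow with the number of tiles instead of all traffic sharing one bus.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a tile in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the tiles visited by dimension-order routing: X first, then Y."""
    x, y = src
    path = [(x, y)]
    # Move along the X dimension until the column matches the destination.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then move along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src: Coord, dst: Coord) -> int:
    """Each hop goes to a direct neighbor, so hops equal the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mesh_links(n: int) -> int:
    """Neighbor-to-neighbor links in an n x n mesh: n - 1 per row plus n - 1 per column."""
    return 2 * n * (n - 1)
```

For example, xy_route((0, 0), (2, 3)) walks (1, 0), (2, 0), then up the column to (2, 3) in 5 hops, and an 8 x 8 mesh has 112 short links that can carry traffic in parallel.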

Basic Look Inside a Tile: create a basic modular unit, homogeneous across the chip, known as a tile. Each tile is a full-featured processor core, built from a processor engine, a cache engine, and a switch engine, and is capable of running an OS.

Detailed Look Inside a Tile: the processor engine is a 64-bit VLIW architecture with 3 execution pipelines (ALU, flow control, LD/ST). The cache engine implements a dynamic distributed cache, in which the shared L2 caches act as an L3. The switch engine provides direct neighbor connections, with I/O connections on the periphery.
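As a rough illustration of how shared L2 caches can act together as a distributed L3, the Python sketch below gives every cache line a "home" tile by interleaving line addresses across tiles. The 64-byte line size, the 64-tile 8 x 8 grid, and the functions home_tile and tile_coord are assumptions chosen for illustration; this is not Tilera's actual home-placement hashing.

```python
LINE_SIZE = 64    # assumed cache-line size in bytes
NUM_TILES = 64    # assumed number of tiles
MESH_WIDTH = 8    # assumed mesh width (8 x 8 grid)


def home_tile(addr: int, line_size: int = LINE_SIZE, num_tiles: int = NUM_TILES) -> int:
    """Return the tile whose L2 slice acts as home for the line holding addr.

    Simple line interleaving: consecutive cache lines map to consecutive tiles,
    so the aggregate L2 capacity behaves like one large shared cache.
    """
    line_index = addr // line_size
    return line_index % num_tiles


def tile_coord(tile: int, width: int = MESH_WIDTH) -> tuple:
    """Convert a tile number to its (x, y) position, numbering tiles row by row."""
    return (tile % width, tile // width)
```

With these parameters, address 130 falls in line 2 and is homed on tile 2, while address 4096 wraps back to tile 0. A miss in a tile's own L2 would then be forwarded over the mesh to the home tile instead of going straight to off-chip memory.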

Networks are easy! Communication is cheap!

Leverage multiple independent networks.

1) How many networks are needed?
2) What functionalities do the networks have?

How are the message types and communications defined?
Message types:
1) Implicit message passing (MDN, TDN): a tile-to-tile shared address space with non-uniform, distributed cache access (NUCA), and a shared address space in off-chip main memory with uniform memory access (UMA)
2) Explicit message passing (UDN)
3) Streaming data:
   a) Small stream (small buffers)
   b) Large stream (large buffers)
   c) Large/continuous stream, the special case of high-performance streaming (STN)
4) System level and I/O, the special case of I/O messages and system traffic (IDN)
Dedicated networks: 5 independent hardware networks, which minimize overheads for all desired forms of communication:
1) MDN: Memory Dynamic Network
2) TDN: Tile Dynamic Network
3) UDN: User Dynamic Network
4) STN: Static Network
5) IDN: I/O Dynamic Network
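The pairing of message types with dedicated networks can be written down as a small lookup table. This Python sketch assumes the pairing implied by the order in which the slides introduce each network (implicit communication with MDN and TDN, message passing with UDN, continuous streaming with STN, system and I/O traffic with IDN). MessageType, NETWORKS, and networks_for are hypothetical names, and small and large streams are left out because the slides do not tie them to a single network.

```python
from enum import Enum
from typing import Dict, Tuple


class MessageType(Enum):
    IMPLICIT = "implicit (shared address space) communication"
    MESSAGE_PASSING = "explicit message passing"
    CONTINUOUS_STREAM = "large/continuous high-performance stream"
    SYSTEM_IO = "system-level and I/O traffic"


# Dedicated hardware network(s) assumed to carry each message type.
NETWORKS: Dict[MessageType, Tuple[str, ...]] = {
    MessageType.IMPLICIT: ("MDN", "TDN"),
    MessageType.MESSAGE_PASSING: ("UDN",),
    MessageType.CONTINUOUS_STREAM: ("STN",),
    MessageType.SYSTEM_IO: ("IDN",),
}


def networks_for(kind: MessageType) -> Tuple[str, ...]:
    """Return the network(s) that would carry a message of the given type."""
    return NETWORKS[kind]
```

Keeping each class of traffic on its own physical network means, for example, that a burst of user messages on the UDN cannot block cache-miss traffic on the MDN.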

Parallel Processing in the Embedded Domain: Network (lossless packet capture; intrusion detection and prevention), Multimedia (video conferencing; IP surveillance), Cloud (in-memory caching; server load balancing).

Numerous Evaluations: single-core performance (CoreMark score); parallelized performance (information fusion, Gaussian elimination, Memcached); comparisons of SMPs and many-core.

CoreMark Score: CoreMark evaluates single-core performance, combining 4 algorithms into 1 final score (single-core, single-thread CoreMark comparison). Tilera's processors feature a VLIW architecture: 3 pipelines, 64-bit instruction words, and all-or-none execution.
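All-or-none VLIW execution means the operations in one instruction word issue together, so the compiler has to group independent operations into bundles. The sketch below is a toy greedy bundler assuming one slot per pipeline class (ALU, flow control, LD/ST, as listed for the processor engine). The Op type, the pipeline labels, and pack_bundles are hypothetical and do not model Tilera's real instruction encoding or scheduling rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    name: str
    pipe: str                    # assumed pipeline class: "ALU", "BRANCH", or "MEM"
    dest: str = ""               # register written ("" if none)
    srcs: Tuple[str, ...] = ()   # registers read


def pack_bundles(ops: List[Op]) -> List[List[Op]]:
    """Greedily pack ops, in program order, into bundles of at most one op per pipeline.

    Because every op in a bundle issues together, a new bundle is started when the
    pipeline slot is already taken, or when the op reads or overwrites a register
    written earlier in the same bundle.
    """
    bundles: List[List[Op]] = []
    current: List[Op] = []
    for op in ops:
        used_pipes = {o.pipe for o in current}
        written = {o.dest for o in current if o.dest}
        conflict = (
            op.pipe in used_pipes
            or any(s in written for s in op.srcs)
            or (op.dest != "" and op.dest in written)
        )
        if conflict:
            bundles.append(current)
            current = []
        current.append(op)
    if current:
        bundles.append(current)
    return bundles
```

For a load, an independent add, a dependent add, and a branch, the bundler produces two bundles: the load with the independent add, then the dependent add with the branch.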

Embedded wireless sensor networks: cluster heads receive data from 10 sensors, and the head node performs a reduction using a moving average filter.
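A minimal sketch of the cluster-head reduction, assuming each round delivers one integer reading from each of the 10 sensors, the head averages them, and then smooths the per-round means with a moving average over a window of 4 rounds. The window length, ClusterHead, and fuse are hypothetical, not the benchmark's actual code; integer arithmetic is used to match the integer-based character of this application.

```python
from collections import deque
from typing import Deque, List

NUM_SENSORS = 10  # each cluster head receives from 10 sensors
WINDOW = 4        # assumed moving-average window, in rounds (hypothetical)


class ClusterHead:
    """Fuses one reading per sensor per round into a single smoothed value."""

    def __init__(self, window: int = WINDOW) -> None:
        # deque(maxlen=...) drops the oldest round automatically.
        self.history: Deque[int] = deque(maxlen=window)

    def fuse(self, readings: List[int]) -> int:
        if len(readings) != NUM_SENSORS:
            raise ValueError(f"expected {NUM_SENSORS} readings, got {len(readings)}")
        # Reduction step: integer mean across all sensors for this round.
        round_mean = sum(readings) // NUM_SENSORS
        self.history.append(round_mean)
        # Moving-average filter over the most recent rounds (integer arithmetic).
        return sum(self.history) // len(self.history)
```

Feeding rounds whose readings are all 10, 20, 30, 40, then 50 yields 10, 15, 20, 25, and 35; on the fifth round the oldest mean (10) has left the 4-round window.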

Results vary based on the application: the information fusion application relies on integer-based arithmetic, while the Gaussian elimination application is floating-point intensive.

Why? Tiles lack a dedicated floating-point unit! [Figures: information fusion application; Gaussian elimination application]
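To see why Gaussian elimination stresses floating point, here is a plain textbook version with partial pivoting: nearly all of the work is floating-point division and multiply-subtract in the elimination loop. This is a generic sketch, not the benchmark's implementation; on a tile without a dedicated floating-point unit, each of these operations would be expected to cost more than its integer counterpart.

```python
from typing import List


def gaussian_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: swap up the row with the largest |entry| in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate entries below the pivot: the floating-point-heavy inner loop.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x
```

For the system 2x + y - z = 8, -3x - y + 2z = -11, -2x + y + 2z = -3, the function returns approximately [2.0, 3.0, -1.0].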

Memcached is a distributed memory caching system that creates a virtual memory pool used for key-value stores. It is designed to alleviate database load and is currently used by social media giants Facebook, Twitter, and Zynga.
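A key-value pool spread across machines is commonly built by hashing each key to one node, with a cache-aside lookup that only touches the database on a miss. The Python sketch below assumes simple modulo hashing with zlib.crc32 (real Memcached client deployments often use consistent hashing instead); KeyValuePool, cached_lookup, and the node names are hypothetical.

```python
import zlib
from typing import Callable, Dict, List, Optional


class KeyValuePool:
    """Toy pool: each node holds a dict, and keys are assigned to nodes by hashing."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = nodes
        self.stores: Dict[str, Dict[str, bytes]] = {n: {} for n in nodes}

    def node_for(self, key: str) -> str:
        # Deterministic hash so every client maps a key to the same node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def set(self, key: str, value: bytes) -> None:
        self.stores[self.node_for(key)][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.stores[self.node_for(key)].get(key)


def cached_lookup(pool: KeyValuePool, key: str, load_from_db: Callable[[str], bytes]) -> bytes:
    """Cache-aside read: serve from the pool, and query the database only on a miss."""
    value = pool.get(key)
    if value is None:
        value = load_from_db(key)  # cache miss: fall back to the database
        pool.set(key, value)
    return value
```

Calling cached_lookup twice for the same key hits the database only once; the second read is served from whichever node the key hashes to, which is how such a pool takes load off the database.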

For a fixed memory footprint, Tilera achieves 3.35x less power and better performance per watt.
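Performance per watt is throughput divided by power. Assuming, purely for illustration, that both systems deliver the same throughput T (the slide does not state this), a 3.35x reduction in power translates directly into 3.35x the performance per watt relative to the baseline power P_base:

```latex
\text{perf/W} = \frac{T}{P},
\qquad
\frac{T \,/\, \left(P_{\text{base}} / 3.35\right)}{T \,/\, P_{\text{base}}} = 3.35
```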

The tile architecture exhibits superior scalability through its modular design and the low cost of on-chip communication, exploiting a variety of task grain sizes (ILP and TLP). It offers high performance per watt through relatively low clock speeds and an idle mode for unused tiles, reducing the costs of web datacenters.


More information

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC

More information

Driving force. What future software needs. Potential research topics

Driving force. What future software needs. Potential research topics Improving Software Robustness and Efficiency Driving force Processor core clock speed reach practical limit ~4GHz (power issue) Percentage of sustainable # of active transistors decrease; Increase in #

More information

Mixed-Criticality: Integration of Different Models of Computation. University of Siegen, Roman Obermaisser

Mixed-Criticality: Integration of Different Models of Computation. University of Siegen, Roman Obermaisser Workshop on "Challenges in Mixed Criticality, Real-time, and Reliability in Networked Complex Embedded Systems" Mixed-Criticality: Integration of Different Models of Computation University of Siegen, Roman

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Networking Goes Open-Source. Michael Zimmerman VP Marketing, Tilera mzimmerman@tilera.com

Networking Goes Open-Source. Michael Zimmerman VP Marketing, Tilera mzimmerman@tilera.com Networking Goes Open-Source Michael Zimmerman VP Marketing, Tilera mzimmerman@tilera.com Open Server Summit, October 23, 2013 Networking Goes Open-Source ? Networking Goes Open-Source Are they connected

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

FLIX: Fast Relief for Performance-Hungry Embedded Applications

FLIX: Fast Relief for Performance-Hungry Embedded Applications FLIX: Fast Relief for Performance-Hungry Embedded Applications Tensilica Inc. February 25 25 Tensilica, Inc. 25 Tensilica, Inc. ii Contents FLIX: Fast Relief for Performance-Hungry Embedded Applications...

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS

UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS UNIT 2 CLASSIFICATION OF PARALLEL COMPUTERS Structure Page Nos. 2.0 Introduction 27 2.1 Objectives 27 2.2 Types of Classification 28 2.3 Flynn s Classification 28 2.3.1 Instruction Cycle 2.3.2 Instruction

More information

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune Introduction to RISC Processor ni logic Pvt. Ltd., Pune AGENDA What is RISC & its History What is meant by RISC Architecture of MIPS-R4000 Processor Difference Between RISC and CISC Pros and Cons of RISC

More information

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB Executive Summary Oracle Berkeley DB is used in a wide variety of carrier-grade mobile infrastructure systems. Berkeley DB provides

More information

A Lab Course on Computer Architecture

A Lab Course on Computer Architecture A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,

More information

Scalability and Classifications

Scalability and Classifications Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

More information

Low Power AMD Athlon 64 and AMD Opteron Processors

Low Power AMD Athlon 64 and AMD Opteron Processors Low Power AMD Athlon 64 and AMD Opteron Processors Hot Chips 2004 Presenter: Marius Evers Block Diagram of AMD Athlon 64 and AMD Opteron Based on AMD s 8 th generation architecture AMD Athlon 64 and AMD

More information

CS244 Lecture 5 Architecture and Principles

CS244 Lecture 5 Architecture and Principles CS244 Lecture 5 Architecture and Principles Network Virtualiza/on in Mul/- tenant Datacenters, NSDI 2014. Guido Appenzeller Background Why is SDN Happening? CLOSED & PROPRIETARY NETWORKING EQUIPMENT Vertically

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

More information

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09 Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,

More information

Resource Utilization of Middleware Components in Embedded Systems

Resource Utilization of Middleware Components in Embedded Systems Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system

More information

Design and Implementation of the Heterogeneous Multikernel Operating System

Design and Implementation of the Heterogeneous Multikernel Operating System 223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,

More information

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures Chapter 18: Database System Architectures Centralized Systems! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types! Run on a single computer system and do

More information

Effective Utilization of Multicore Processor for Unified Threat Management Functions

Effective Utilization of Multicore Processor for Unified Threat Management Functions Journal of Computer Science 8 (1): 68-75, 2012 ISSN 1549-3636 2012 Science Publications Effective Utilization of Multicore Processor for Unified Threat Management Functions Sudhakar Gummadi and Radhakrishnan

More information

Tolerating SEU Faults in the Raw Architecture

Tolerating SEU Faults in the Raw Architecture Tolerating SEU Faults in the Raw Architecture Karandeep Singh #, Adnan Agbaria +*, Dong-In Kang #, and Matthew French # # USC Information Sciences Institute, Arlington VA, USA Email: {karan, dkang, mfrench}

More information

Improving the performance of data servers on multicore architectures. Fabien Gaud

Improving the performance of data servers on multicore architectures. Fabien Gaud Improving the performance of data servers on multicore architectures Fabien Gaud Grenoble University Advisors: Jean-Bernard Stefani, Renaud Lachaize and Vivien Quéma Sardes (INRIA/LIG) December 2, 2010

More information

Memory Architecture and Management in a NoC Platform

Memory Architecture and Management in a NoC Platform Architecture and Management in a NoC Platform Axel Jantsch Xiaowen Chen Zhonghai Lu Chaochao Feng Abdul Nameed Yuang Zhang Ahmed Hemani DATE 2011 Overview Motivation State of the Art Data Management Engine

More information

membase.org: The Simple, Fast, Elastic NoSQL Database NorthScale Matt Ingenthron OSCON 2010

membase.org: The Simple, Fast, Elastic NoSQL Database NorthScale Matt Ingenthron OSCON 2010 membase.org: The Simple, Fast, Elastic NoSQL Database NorthScale Matt Ingenthron OSCON 2010 Membase is an Open Source distributed, key-value database management system optimized for storing data behind

More information

~ Greetings from WSU CAPPLab ~

~ Greetings from WSU CAPPLab ~ ~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)

More information

DDR3 memory technology

DDR3 memory technology DDR3 memory technology Technology brief, 3 rd edition Introduction... 2 DDR3 architecture... 2 Types of DDR3 DIMMs... 2 Unbuffered and Registered DIMMs... 2 Load Reduced DIMMs... 3 LRDIMMs and rank multiplication...

More information

AMD Opteron Quad-Core

AMD Opteron Quad-Core AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced

More information

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of

More information

A Survey of Cloud Computing Guanfeng Octides

A Survey of Cloud Computing Guanfeng Octides A Survey of Cloud Computing Guanfeng Nov 7, 2010 Abstract The principal service provided by cloud computing is that underlying infrastructure, which often consists of compute resources like storage, processors,

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Design Patterns for Packet Processing Applications on Multi-core Intel Architecture Processors

Design Patterns for Packet Processing Applications on Multi-core Intel Architecture Processors White Paper Cristian F. Dumitrescu Software Engineer Intel Corporation Design Patterns for Packet Processing Applications on Multi-core Intel Architecture Processors December 2008 321058 Executive Summary

More information

OpenSoC Fabric: On-Chip Network Generator

OpenSoC Fabric: On-Chip Network Generator OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation

More information

Distributed and Cloud Computing

Distributed and Cloud Computing Distributed and Cloud Computing K. Hwang, G. Fox and J. Dongarra Chapter 1: Enabling Technologies and Distributed System Models Copyright 2012, Elsevier Inc. All rights reserved. 1 1-1 Data Deluge Enabling

More information

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003

Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, Josef.Pelikan@mff.cuni.cz Abstract 1 Interconnect quality

More information