Memory Architecture and Management in a NoC Platform

Size: px
Start display at page:

Download "Memory Architecture and Management in a NoC Platform"

Transcription

1 Architecture and Management in a NoC Platform Axel Jantsch Xiaowen Chen Zhonghai Lu Chaochao Feng Abdul Nameed Yuang Zhang Ahmed Hemani DATE 2011

2 Overview Motivation State of the Art Data Management Engine Architecture Virtual Address Space Cache Coherence Consistency Synchronization Message Passing based Communication Experiments DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 2 / 27

3 The Wall Wulf and McKee predicted in 1995 the impact into the memory wall: t avg = p t c +(1 p) t m t avg...average access latency for one data word p...probability of a cache hit t c...access time to the cache t m...access time to main memory DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 3 / 27

4 The Wall Wulf and McKee predicted in 1995 the impact into the memory wall: t avg = p t c +(1 p) t m t avg...average access latency for one data word p...probability of a cache hit t c...access time to the cache t m...access time to main memory Today we navigate at the edge of this wall. DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 3 / 27

5 The Wall Wulf and McKee predicted in 1995 the impact into the memory wall: t avg = p t c +(1 p) t m t avg...average access latency for one data word p...probability of a cache hit t c...access time to the cache t m...access time to main memory Today we navigate at the edge of this wall. For example the Tilera 64 core chip: Raw processor performance: 443 Gops DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 3 / 27

6 The Wall Wulf and McKee predicted in 1995 the impact into the memory wall: t avg = p t c +(1 p) t m t avg...average access latency for one data word p...probability of a cache hit t c...access time to the cache t m...access time to main memory Today we navigate at the edge of this wall. For example the Tilera 64 core chip: Raw processor performance: 443 Gops Required memory bandwidth with two cache levels and 95% hit rate: 7.2Gb/s Available memory bandwidth: 50 Gb/s DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 3 / 27

7 The Wall Wulf and McKee predicted in 1995 the impact into the memory wall: t avg = p t c +(1 p) t m t avg...average access latency for one data word p...probability of a cache hit t c...access time to the cache t m...access time to main memory Today we navigate at the edge of this wall. For example the Tilera 64 core chip: Raw processor performance: 443 Gops Required memory bandwidth with two cache levels and 95% hit rate: 7.2Gb/s Available memory bandwidth: 50 Gb/s Required memory access latency: cycles Available average access time: cycles DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 3 / 27

8 The Access Bottleneck I/O Bottleneck Processing DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 4 / 27

9 Bandwidth in 3D ICs Embedded Processor DRAM Die Processing Die (NoC) DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 5 / 27

10 Bandwidth in 3D ICs Processor DRAM Die Processing Die (NoC) Embedded 8x8 MPSoC Resource size: 2x2 mm2 Switch size: 100x100 um2 TSV pitch: 5 um TSVs/switch: 256 Gb/s memory bandwidth per core 16 Tb/s aggregate memory bandwidth (200 Gb/s for Tilera) < 10 ns delay DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 5 / 27

11 Bandwidth in 3D ICs Processor DRAM Die Processing Die (NoC) Embedded 8x8 MPSoC Resource size: 2x2 mm2 Switch size: 100x100 um2 TSV pitch: 5 um TSVs/switch: 256 Gb/s memory bandwidth per core 16 Tb/s aggregate memory bandwidth (200 Gb/s for Tilera) < 10 ns delay 80 x bandwidth 1/5 latency 1/10 power No area overhead DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 5 / 27

12 Access Protocols Protocol Characteristics Frequency buswidth Voltage max bandiwtdh Efficiency DDR MHz 32 bit 1.5 V GB/s 4.17 GB/s/W LPDDR2 533 MHz 32 bit 1.2 V GB/s 6.25 GB/s/W Wide I/O 200 MHz 512 bit 1.2 V 12.8 GB/s 25.0 GB/s/W Cost for 1 TB/s Protocol No of ports No of pads Power DDR W LPDDR W Wide I/O W DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 6 / 27

13 Package Architecture Roadmap 3D with TSVs POP with Stacked MCP Stacked MCP Off-Package From: Denis Dutois and Ahmed Jerraya. 3D Integration Opportunities for Interconnect in Mobile Computing Architectures. In: Future Fab International 34 (July 2010) DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 7 / 27

14 Architecture State of the Art Custom memory architectures in many SoCs and embedded systems with no shared memory space No HW support for shared memory space, cache coherence and consistency (e.g. Inte s 48 core SCC) Uniform shared memory space with snooping based cache coherence for small multi-core systems (ARM s Cortex A9) Non-Uniform Cache Architecture (NUCA) in regular many-core SoCs; introduced in 2002: C. Kim, D. Burger, and S. Keckler. An Adaptive, Non-uniform Cache Structure for Wire-Delay Dominated On-Chip Caches. In: Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems No general, flexible, and scalable solution for the many-core era DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 8 / 27

15 Platform Overview DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 9 / 27

16 Data Management Engine Architecture DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 10 / 27

17 Virtual Address Space Translation Page Address Offset Virtual Address Address Translation Table Node Address Page Address Physical Address Offset DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 11 / 27

18 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

19 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size 64 nodes DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

20 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size 64 nodes MB/node DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

21 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size 64 nodes MB/node GB total memory DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

22 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size 64 nodes MB/node GB total memory Page size 1KB DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

23 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size 64 nodes MB/node GB total memory Page size 1KB Translation table: 1 M entries DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

24 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size 64 nodes MB/node GB total memory Page size 1KB Translation table: 1 M entries entry: 6+14=20 bit 3 Byte DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

25 Virtual Address Space Translation S T = (log 2 N +log 2 M N log 2 P S ) (M T /P S ) S T... Size of translation table per node N... Number of nodes M N... per node P S... Page size M T... Total memory size 64 nodes MB/node GB total memory Page size 1KB Translation table: 1 M entries entry: 6+14=20 bit 3 Byte 3MB/node (18.75%) for translation table DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 12 / 27

26 DME Virtual Address Translation Page Address Offset Virtual Address NO > BADDR Page Address Offset YES Address Translation Table Node Address Page Address Offset Physical Address DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 13 / 27

27 Supported Partitions local private physical Supported local private virtual - local shared physical - local shared virtual Supported remote private physical - remote private virtual - remote shared physical - remote shared virtual Supported DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 14 / 27

28 Address Space Management Node 1 Address Space Node 2 Address Space Node 3 Address Space Node 4 Address Space 0X X BADDR 0X X Private, Local local Private, Local local remote BADDR 0X0F00000 local Shared, Local or local Private, Local remote Shared, Local or remote 0XFB00000 BADDR 0XFFF0000 local remote Shared, Local or remote remote 0XFFFC000 local 0XFFFFFFFF 0XFFFFFFFF 0XFFFFFFFF BADDR 0XFFFFFFFF DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 15 / 27

29 Multi-core Speedup DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 16 / 27

30 Experiment: Central vs. Distributed Shared DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 17 / 27

31 Central vs. Distributed Shared : Matrix Multiplication DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 18 / 27

32 Central vs. Distributed Shared : FFT DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 19 / 27

33 Summary After computation (multi-core) and communication (NoC), memory access must be parallalized DME parallizes memory handling DME supports Central/distributed memory Private/shared Physical/virtual address space DME features Synchronization support Cache coherence protocols consistency support Message passing communication Dynamic memory allocator Future Work Improve cache coherency Improved high level support for programming models Adapt DME to integrate heterogeneous SoC subsystems DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 20 / 27

34 Data Management Engine Implementation Optimized for area Optimized for speed Frequency 444 MHz (2.25 ns) 455 MHz (2.2 ns) Area (Logic) 44k NAND gates 51k NAND gates Area (Control Store) 300k NAND gates power consumption [mw] Mini A 6.9 Mini B 7.0 NICU 2.3 CICU 5.2 Synchronizer 0.2 DME total 21.6 DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 21 / 27

35 Directory Based Cache Coherence Processor Node Processor Node Processor Node Processor Node Cache Cache Cache Cache Processor Node Processor Node Processor Node Processor Node Cache Cache Cache Cache Processor Node Processor Node Processor Node Controller Cache Cache Cache Directory DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 22 / 27

36 Directory Structure Node 1 Node 2 Node 3 Directory Node N block DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 23 / 27

37 Directory Size 1 entry / memory block Node 1 Node 2 Node 3 Directory Node N block S D = S T /S B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 24 / 27

38 Directory Size 1 entry / memory block Node 1 Node 2 Node 3 Directory Node N block S D = S T /S B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size 64 nodes DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 24 / 27

39 Directory Size 1 entry / memory block Node 1 Node 2 Node 3 Directory Node N block S D = S T /S B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size 64 nodes GB total memory DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 24 / 27

40 Directory Size 1 entry / memory block Node 1 Node 2 Node 3 Directory Node N block S D = S T /S B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size 64 nodes GB total memory Block size 64B DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 24 / 27

41 Directory Size 1 entry / memory block Node 1 Node 2 Node 3 Directory Node N block S D = S T /S B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size 64 nodes GB total memory Block size 64B Cache size 8MB DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 24 / 27

42 Directory Size 1 entry / memory block Node 1 Node 2 Node 3 Directory Node N block S D = S T /S B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size 64 nodes GB total memory Block size 64B Cache size 8MB No of blocks/cache: 128K DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 24 / 27

43 Directory Size 1 entry / memory block Node 1 Node 2 Node 3 Directory Node N block S D = S T /S B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size 64 nodes GB total memory Block size 64B Cache size 8MB No of blocks/cache: 128K S D = 520MB 13% of total memory 102% of total cache size DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 24 / 27

44 Directory Size 1 entry / cache block Node 1 Node 2 Node 3 Directory Node N Cache block Cache 1 Cache 2 Cache 3 S D = N C B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 25 / 27

45 Directory Size 1 entry / cache block Node 1 Node 2 Node 3 Directory Node N Cache block Cache 1 Cache 2 Cache 3 S D = N C B (N +1) S D... Size of directory S T... Total memory N... No of nodes C B... blocks/cache S B... Block size S C... Cache size 64 nodes GB total memory Block size 64B Cache size 8MB No of blocks/cache: 128K S D = 65MB 1.5% of total memory 12.6% of total cache size DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 25 / 27

46 Consistency Experiment Initially, int x, y,z = 0; (0,0) CS NODE (0,1) (0,2) // Non-critical section1 x = data1; Reg1 = x; // Lock Acquire Acquire (L); (1,0) (1,1) SYNC NODE (1,2) // Critical Section y = data2; Reg2 = y; // Lock Release Release(L) (2,0) (2,1) (2,2) (a) // Non-critical Section2 z = data3; Reg3 = z; (b) DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 26 / 27

47 Consistency Execution Time 10 x Average Code Latency (Release consistency) Max Code Latency (Release consistency) Average Code Latency (Weak consistency) Max Code Latency (Weak consistency) Average Code Latency (Sequential consistency) Max Code Latency (Sequential consistency) Cycles x1 1x2 2x2 2x4 4x4 4x8 8x8 Network Size DATE 2011 Tutorial MPSoC Hardware/Software Architectural and Design Challenges/Solutions, Axel Jantsch, KTH 27 / 27

Power Reduction Techniques in the SoC Clock Network. Clock Power

Power Reduction Techniques in the SoC Clock Network. Clock Power Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

Linux flash file systems JFFS2 vs UBIFS

Linux flash file systems JFFS2 vs UBIFS Linux flash file systems JFFS2 vs UBIFS Chris Simmonds 2net Limited Embedded Systems Conference UK. 2009 Copyright 2009, 2net Limited Overview Many embedded systems use raw flash chips JFFS2 has been the

More information

Multi-core Systems What can we buy today?

Multi-core Systems What can we buy today? Multi-core Systems What can we buy today? Ian Watson & Mikel Lujan Advanced Processor Technologies Group COMP60012 Future Multi-core Computing 1 A Bit of History AMD Opteron introduced in 2003 Hypertransport

More information

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads A Scalable VISC Processor Platform for Modern Client and Cloud Workloads Mohammad Abdallah Founder, President and CTO Soft Machines Linley Processor Conference October 7, 2015 Agenda Soft Machines Background

More information

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Putting it all together: Intel Nehalem http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Intel Nehalem Review entire term by looking at most recent microprocessor from Intel Nehalem is code

More information

Optimizing Configuration and Application Mapping for MPSoC Architectures

Optimizing Configuration and Application Mapping for MPSoC Architectures Optimizing Configuration and Application Mapping for MPSoC Architectures École Polytechnique de Montréal, Canada Email : Sebastien.Le-Beux@polymtl.ca 1 Multi-Processor Systems on Chip (MPSoC) Design Trends

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

DDR subsystem: Enhancing System Reliability and Yield

DDR subsystem: Enhancing System Reliability and Yield DDR subsystem: Enhancing System Reliability and Yield Agenda Evolution of DDR SDRAM standards What is the variation problem? How DRAM standards tackle system variability What problems have been adequately

More information

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29 RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f. one large disk) Parallelism improves performance Plus extra disk(s) for redundant data storage Provides fault tolerant

More information

AMD Opteron Quad-Core

AMD Opteron Quad-Core AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced

More information

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi University of Utah & HP Labs 1 Large Caches Cache hierarchies

More information

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC

More information

ARM Cortex A9. Alyssa Colyette Xiao Ling Zhuang

ARM Cortex A9. Alyssa Colyette Xiao Ling Zhuang ARM Cortex A9 Alyssa Colyette Xiao Ling Zhuang Outline Introduction ARMv7-A ISA Cortex-A9 Microarchitecture o Single and Multicore Processor Advanced Multicore Technologies Integrating System on Chips

More information

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook)

COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) COMP 422, Lecture 3: Physical Organization & Communication Costs in Parallel Machines (Sections 2.4 & 2.5 of textbook) Vivek Sarkar Department of Computer Science Rice University vsarkar@rice.edu COMP

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

enabling Ultra-High Bandwidth Scalable SSDs with HLnand

enabling Ultra-High Bandwidth Scalable SSDs with HLnand www.hlnand.com enabling Ultra-High Bandwidth Scalable SSDs with HLnand May 2013 2 Enabling Ultra-High Bandwidth Scalable SSDs with HLNAND INTRODUCTION Solid State Drives (SSDs) are available in a wide

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information

Introduction to System-on-Chip

Introduction to System-on-Chip Introduction to System-on-Chip COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

The Effect and Technique of System Coherence in ARM Multicore Technology

The Effect and Technique of System Coherence in ARM Multicore Technology The Effect and Technique of System Coherence in ARM Multicore Technology John Goodacre Senior Program Manager ARM Division Cambridge, UK Cortex -A9 Microarchitecture (single core variant) Coresight / JTAG

More information

Run-time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips

Run-time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips 3rd International Symposium on Parallel Architectures, Algorithms and Programming Run-time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips Xiaowen Chen,, Zhonghai Lu, Axel

More information

Main Memory Background

Main Memory Background ECE 554 Computer Architecture Lecture 5 Main Memory Spring 2013 Sudeep Pasricha Department of Electrical and Computer Engineering Colorado State University Pasricha; portions: Kubiatowicz, Patterson, Mutlu,

More information

Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs. Jim Ready September 18, 2012

Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs. Jim Ready September 18, 2012 Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs Jim Ready September 18, 2012 How HW guys view the world SW Software HW How SW guys view the world SW HW Reality The SoC Software

More information

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING

COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.

More information

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

More information

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Introduction Big Data Analytics needs: Low latency data access Fast computing Power efficiency Latest

More information

Price/performance Modern Memory Hierarchy

Price/performance Modern Memory Hierarchy Lecture 21: Storage Administration Take QUIZ 15 over P&H 6.1-4, 6.8-9 before 11:59pm today Project: Cache Simulator, Due April 29, 2010 NEW OFFICE HOUR TIME: Tuesday 1-2, McKinley Last Time Exam discussion

More information

Chapter 2 Heterogeneous Multicore Architecture

Chapter 2 Heterogeneous Multicore Architecture Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is

More information

Module 2: "Parallel Computer Architecture: Today and Tomorrow" Lecture 4: "Shared Memory Multiprocessors" The Lecture Contains: Technology trends

Module 2: Parallel Computer Architecture: Today and Tomorrow Lecture 4: Shared Memory Multiprocessors The Lecture Contains: Technology trends The Lecture Contains: Technology trends Architectural trends Exploiting TLP: NOW Supercomputers Exploiting TLP: Shared memory Shared memory MPs Bus-based MPs Scaling: DSMs On-chip TLP Economics Summary

More information

NAND Flash & Storage Media

NAND Flash & Storage Media ENABLING MULTIMEDIA NAND Flash & Storage Media March 31, 2004 NAND Flash Presentation NAND Flash Presentation Version 1.6 www.st.com/nand NAND Flash Memories Technology Roadmap F70 1b/c F12 1b/c 1 bit/cell

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based

More information

The Orca Chip... Heart of IBM s RISC System/6000 Value Servers

The Orca Chip... Heart of IBM s RISC System/6000 Value Servers The Orca Chip... Heart of IBM s RISC System/6000 Value Servers Ravi Arimilli IBM RISC System/6000 Division 1 Agenda. Server Background. Cache Heirarchy Performance Study. RS/6000 Value Server System Structure.

More information

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano Intel Itanium Quad-Core Architecture for the Enterprise Lambert Schaelicke Eric DeLano Agenda Introduction Intel Itanium Roadmap Intel Itanium Processor 9300 Series Overview Key Features Pipeline Overview

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Lecture 23: Multiprocessors

Lecture 23: Multiprocessors Lecture 23: Multiprocessors Today s topics: RAID Multiprocessor taxonomy Snooping-based cache coherence protocol 1 RAID 0 and RAID 1 RAID 0 has no additional redundancy (misnomer) it uses an array of disks

More information

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID The vast majority of the world s servers count on LSI SAS & MegaRAID Trust us, build the LSI credibility in storage, SAS, RAID Server installed base = 36M LSI SAS inside 60% of servers 21 million LSI SAS

More information

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1 Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip

More information

COMP/ELEC 525 Advanced Microprocessor Architecture. Goals

COMP/ELEC 525 Advanced Microprocessor Architecture. Goals COMP/ELEC 525 Advanced Microprocessor Architecture Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 9, 2007 Goals 4 Introduction to research topics in processor design 4 Solid understanding

More information

High Performance or Cycle Accuracy?

High Performance or Cycle Accuracy? CHIP DESIGN High Performance or Cycle Accuracy? You can have both! Bill Neifert, Carbon Design Systems Rob Kaye, ARM ATC-100 AGENDA Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing

More information

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC Alan Gara Intel Fellow Exascale Chief Architect Legal Disclaimer Today s presentations contain forward-looking

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

DRAM Parameters. DRAM parameters. Large capacity: e.g., 1-4Gb Arranged as square + Minimizes wire length + Maximizes refresh efficiency.

DRAM Parameters. DRAM parameters. Large capacity: e.g., 1-4Gb Arranged as square + Minimizes wire length + Maximizes refresh efficiency. DRAM Parameters address DRAM parameters Large capacity: e.g., 1-4Gb Arranged as square + Minimizes wire length + Maximizes efficiency DRAM bit array Narrow data interface: 1 16 bit Cheap packages few bus

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Scaling from Datacenter to Client

Scaling from Datacenter to Client Scaling from Datacenter to Client KeunSoo Jo Sr. Manager Memory Product Planning Samsung Semiconductor Audio-Visual Sponsor Outline SSD Market Overview & Trends - Enterprise What brought us to NVMe Technology

More information

Industry First X86-based Single Board Computer JaguarBoard Released

Industry First X86-based Single Board Computer JaguarBoard Released Industry First X86-based Single Board Computer JaguarBoard Released HongKong, China (May 12th, 2015) Jaguar Electronic HK Co., Ltd officially launched the first X86-based single board computer called JaguarBoard.

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER ANDREAS-LAZAROS GEORGIADIS, SOTIRIOS XYDIS, DIMITRIOS SOUDRIS MICROPROCESSOR AND MICROSYSTEMS LABORATORY ELECTRICAL AND

More information

Tyche: An efficient Ethernet-based protocol for converged networked storage

Tyche: An efficient Ethernet-based protocol for converged networked storage Tyche: An efficient Ethernet-based protocol for converged networked storage Pilar González-Férez and Angelos Bilas 30 th International Conference on Massive Storage Systems and Technology MSST 2014 June

More information

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle M7: Next Generation SPARC Hotchips 26 August 12, 2014 Stephen Phillips Senior Director, SPARC Architecture Oracle Copyright 2014 Oracle and/ nd/or its affiliates. All rights reserved. Safe Harbor Statement

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems

More information

ECLIPSE Performance Benchmarks and Profiling. January 2009

ECLIPSE Performance Benchmarks and Profiling. January 2009 ECLIPSE Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox, Schlumberger HPC Advisory Council Cluster

More information

Architectural Overview of the Apple A4 APL0398 ARM Application Processor found on the ipad

Architectural Overview of the Apple A4 APL0398 ARM Application Processor found on the ipad Architectural Overview of the Apple A4 APL0398 ARM Application Processor found on the ipad THIS COPY SUPPLIED EXCLUSIVELY FOR THE USE OF: ARM Technology Conference April 2010 IMPORTANT NOTICE REGARDING

More information

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance. Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance

More information

TrustZone, DSP and SIMD Extensions

TrustZone, DSP and SIMD Extensions ARCHITECTURE FOR MULTIMEDIA SYSTEMS ARM Cortex-A Series with Jazelle, TrustZone, DSP and SIMD Extensions Professor: Cristina Silvano P t d b Presented by: Vu Duc Xuan Quang 736324 Contents Cortex-A series

More information

3D Integration for Smartphones

3D Integration for Smartphones 3D Integration for Smartphones European 3D TSV Summit, Jan 13, Grenoble georg.kimmich@stericsson.com Outline Introduction Smartphone Requirements WideIO 3D Smart Partitioning Conclusion 2 Smartphones Environment

More information

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule All Programmable Logic Hans-Joachim Gelke Institute of Embedded Systems Institute of Embedded Systems 31 Assistants 10 Professors 7 Technical Employees 2 Secretaries www.ines.zhaw.ch Research: Education:

More information

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters COSC 6374 Parallel Computation Parallel I/O (I) I/O basics Spring 2008 Concept of a clusters Processor 1 local disks Compute node message passing network administrative network Memory Processor 2 Network

More information

HETEROGENEOUS SYSTEM COHERENCE FOR INTEGRATED CPU-GPU SYSTEMS

HETEROGENEOUS SYSTEM COHERENCE FOR INTEGRATED CPU-GPU SYSTEMS HETEROGENEOUS SYSTEM COHERENCE FOR INTEGRATED CPU-GPU SYSTEMS JASON POWER*, ARKAPRAVA BASU*, JUNLI GU, SOORAJ PUTHOOR, BRADFORD M BECKMANN, MARK D HILL*, STEVEN K REINHARDT, DAVID A WOOD* *University of

More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Chapter 6. 6.1 Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig. 6.1. I/O devices can be characterized by. I/O bus connections

Chapter 6. 6.1 Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig. 6.1. I/O devices can be characterized by. I/O bus connections Chapter 6 Storage and Other I/O Topics 6.1 Introduction I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections

More information

A PRAM and NAND Flash Hybrid Architecture for High-Performance Embedded Storage Subsystems

A PRAM and NAND Flash Hybrid Architecture for High-Performance Embedded Storage Subsystems 1 A PRAM and NAND Flash Hybrid Architecture for High-Performance Embedded Storage Subsystems Chul Lee Software Laboratory Samsung Advanced Institute of Technology Samsung Electronics Outline 2 Background

More information

Resource Efficient Computing for Warehouse-scale Datacenters

Resource Efficient Computing for Warehouse-scale Datacenters Resource Efficient Computing for Warehouse-scale Datacenters Christos Kozyrakis Stanford University http://csl.stanford.edu/~christos DATE Conference March 21 st 2013 Computing is the Innovation Catalyst

More information

Introduction to AMBA 4 ACE and big.little Processing Technology

Introduction to AMBA 4 ACE and big.little Processing Technology Introduction to AMBA 4 and big.little Processing Technology Ashley Stevens Senior FAE, Fabric and Systems June 6th 2011 Updated July 29th 2013 Page 1 of 15 Why AMBA 4? The continual requirement for more

More information

Cray DVS: Data Virtualization Service

Cray DVS: Data Virtualization Service Cray : Data Virtualization Service Stephen Sugiyama and David Wallace, Cray Inc. ABSTRACT: Cray, the Cray Data Virtualization Service, is a new capability being added to the XT software environment with

More information

Building Blocks for PRU Development

Building Blocks for PRU Development Building Blocks for PRU Development Module 1 PRU Hardware Overview This session covers a hardware overview of the PRU-ICSS Subsystem. Author: Texas Instruments, Sitara ARM Processors Oct 2014 2 ARM SoC

More information

Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing)

Web Email DNS Peer-to-peer systems (file sharing, CDNs, cycle sharing) 1 1 Distributed Systems What are distributed systems? How would you characterize them? Components of the system are located at networked computers Cooperate to provide some service No shared memory Communication

More information

International Summer School on Embedded Systems

International Summer School on Embedded Systems International Summer School on Embedded Systems Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Shenzhen, July 30 -- August 3, 2012 Sponsored by Chinese Academy of Sciences and

More information

Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support

Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support Ranjit Noronha and Dhabaleswar K. Panda Network Based Computing Laboratory (NBCL) The Ohio State University Outline

More information

Intel Xeon Processor E5-2600

Intel Xeon Processor E5-2600 Intel Xeon Processor E5-2600 Best combination of performance, power efficiency, and cost. Platform Microarchitecture Processor Socket Chipset Intel Xeon E5 Series Processors and the Intel C600 Chipset

More information

Embedded Parallel Computing

Embedded Parallel Computing Embedded Parallel Computing Lecture 5 - The anatomy of a modern multiprocessor, the multicore processors Tomas Nordström Course webpage:: Course responsible and examiner: Tomas

More information

Server Forum 2014. Copyright 2014 [JS Choi Samsung]

Server Forum 2014. Copyright 2014 [JS Choi Samsung] Next Big Thing : DDR4 3DS Server Forum 2014 Copyright 2014 [JS Choi Samsung] Agenda Memory Requirement How to Address What s 3DS It s IN PRODUCTION DRAM Market & Application 47% of DRAM for Server and

More information

Low-Overhead Hard Real-time Aware Interconnect Network Router

Low-Overhead Hard Real-time Aware Interconnect Network Router Low-Overhead Hard Real-time Aware Interconnect Network Router Michel A. Kinsy! Department of Computer and Information Science University of Oregon Srinivas Devadas! Department of Electrical Engineering

More information

Advantages of e-mmc 4.4 based Embedded Memory Architectures

Advantages of e-mmc 4.4 based Embedded Memory Architectures Embedded NAND Solutions from 2GB to 128GB provide configurable MLC/SLC storage in single memory module with an integrated controller By Scott Beekman, senior business development manager Toshiba America

More information

A New Chapter for System Designs Using NAND Flash Memory

A New Chapter for System Designs Using NAND Flash Memory A New Chapter for System Designs Using Memory Jim Cooke Senior Technical Marketing Manager Micron Technology, Inc December 27, 2010 Trends and Complexities trends have been on the rise since was first

More information

From Bus and Crossbar to Network-On-Chip. Arteris S.A.

From Bus and Crossbar to Network-On-Chip. Arteris S.A. From Bus and Crossbar to Network-On-Chip Arteris S.A. Copyright 2009 Arteris S.A. All rights reserved. Contact information Corporate Headquarters Arteris, Inc. 1741 Technology Drive, Suite 250 San Jose,

More information

Introducing the Singlechip Cloud Computer

Introducing the Singlechip Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

Introduction to Dataflow Computing

Introduction to Dataflow Computing Introduction to Dataflow Computing Maxeler Dataflow Computing Workshop STFC Hartree Centre, June 2013 Programmable Spectrum Control-flow processors Dataflow processor GK110 Single-Core CPU Multi-Core Several-Cores

More information

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Copyright 2013, Oracle and/or its affiliates. All rights reserved. 1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information

More information

Active Storage For Large-Scale Data Mining and Multimedia

Active Storage For Large-Scale Data Mining and Multimedia Active Storage For Large-Scale Data Mining and Multimedia Erik Riedel University www.cs.cmu.edu/~riedel CALD Seminar Center for Automated Learning and Discovery April 3, 1998 Outline Network-Attached Disks

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

Lecture 11: Memory Hierarchy Design. CPU-Memory Performance Gap

Lecture 11: Memory Hierarchy Design. CPU-Memory Performance Gap Lecture 11: Memory Hierarchy Design Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 CPU-Memory Performance Gap 2 The Memory Bottleneck Typical CPU clock

More information

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing

More information

Performance Characteristics of Large SMP Machines

Performance Characteristics of Large SMP Machines Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark

More information

Mini System 101 Our Price: $669

Mini System 101 Our Price: $669 Mini System 101 Our Price: $669 Mini System 102 Our Price: $610 Processor Features 667MHz front side bus, 512KB L2 cache and 1.33GHz processor speed. with 1024 x 600 resolutions delivers intense detail

More information

Caching Mechanisms for Mobile and IOT Devices

Caching Mechanisms for Mobile and IOT Devices Caching Mechanisms for Mobile and IOT Devices Masafumi Takahashi Toshiba Corporation JEDEC Mobile & IOT Technology Forum Copyright 2016 Toshiba Corporation Background. Unified concept. Outline The first

More information

A Generic Network Interface Architecture for a Networked Processor Array (NePA)

A Generic Network Interface Architecture for a Networked Processor Array (NePA) A Generic Network Interface Architecture for a Networked Processor Array (NePA) Seung Eun Lee, Jun Ho Bahn, Yoon Seok Yang, and Nader Bagherzadeh EECS @ University of California, Irvine Outline Introduction

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09 Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors NoCArc 09 Jesús Camacho Villanueva, José Flich, José Duato Universidad Politécnica de Valencia December 12,

More information

Configuring CoreNet Platform Cache (CPC) as SRAM For Use by Linux Applications

Configuring CoreNet Platform Cache (CPC) as SRAM For Use by Linux Applications Freescale Semiconductor Document Number:AN4749 Application Note Rev 0, 07/2013 Configuring CoreNet Platform Cache (CPC) as SRAM For Use by Linux Applications 1 Introduction This document provides, with

More information

Power Analysis of Link Level and End-to-end Protection in Networks on Chip

Power Analysis of Link Level and End-to-end Protection in Networks on Chip Power Analysis of Link Level and End-to-end Protection in Networks on Chip Axel Jantsch, Robert Lauter, Arseni Vitkowski Royal Institute of Technology, tockholm May 2005 ICA 2005 1 ICA 2005 2 Overview

More information

NAND Flash Architecture and Specification Trends

NAND Flash Architecture and Specification Trends NAND Flash Architecture and Specification Trends Michael Abraham (mabraham@micron.com) NAND Solutions Group Architect Micron Technology, Inc. August 2012 1 Topics NAND Flash Architecture Trends The Cloud

More information

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

More information

OpenSoC Fabric: On-Chip Network Generator

OpenSoC Fabric: On-Chip Network Generator OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation

More information

Enabling Technologies for Distributed and Cloud Computing

Enabling Technologies for Distributed and Cloud Computing Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading

More information

Vorlesung Rechnerarchitektur 2 Seite 178 DASH

Vorlesung Rechnerarchitektur 2 Seite 178 DASH Vorlesung Rechnerarchitektur 2 Seite 178 Architecture for Shared () The -architecture is a cache coherent, NUMA multiprocessor system, developed at CSL-Stanford by John Hennessy, Daniel Lenoski, Monica

More information

Hardware/Software Partitioning of Operating Systems The δ Hardware/Software RTOS Generation Framework for SoC

Hardware/Software Partitioning of Operating Systems The δ Hardware/Software RTOS Generation Framework for SoC Hardware/Software Partitioning of Operating Systems The δ Hardware/Software RTOS Generation Framework for SoC Vincent J. Mooney III http://codesign.ece.gatech.edu Assistant Professor, School of Electrical

More information