Computing for Data-Intensive Applications: Beyond CMOS and Beyond Von-Neumann




Said Hamdioui, Computer Engineering, Delft University of Technology, The Netherlands. Workshop on Memristive Systems for Space Applications, European Space Agency, ESTEC, 30 April 2015, Noordwijk, Netherlands.

Slide 2: The big picture
[Diagram: applications demand ever more storage and computing efficiency from computers, while the technology side brings high cost, reduced reliability, saturated clock frequencies and higher power. What is the solution?]

Slide 3: Contents
Technology: the good, the bad and the challenging. Computing: the good, the bad and the challenging. Future computers: possible scenarios. Toward a new computing paradigm: the CIM architecture (technology, potential and open questions). Conclusion.

Slide 4: Technology. The good
Density doubles: linear dimensions shrink by 30% per generation, so W x L goes from 1 x 1 to 0.7 x 0.7 = 0.5, which reduces IC cost.
Performance increases by 43%: gate capacitance scales as C = (0.7 x 0.7)/0.7 = 0.7, gate delay drops by 30%, and frequency = 1/0.7 = 1.43; frequency roughly doubled every generation (until 2004).
Power: with constant voltage scaling (Vdd = 1), Power = C x V^2 x f = 1; with constant field scaling (Vdd = 0.7), Power = C x V^2 x f = 0.5. [source: Mark Horowitz, Stanford Uni; Groeseneken, ESSDERC '10]
Scaling has been successful; it is also a financial investment model.
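
To make the slide's arithmetic concrete, here is a minimal sketch in plain Python (numbers taken directly from the slide) that reproduces the density, frequency and power factors for one 0.7x shrink under constant-voltage and constant-field scaling.

```python
# One technology generation: linear dimensions shrink by 30% (scale factor s = 0.7).
s = 0.7

area = s * s                  # relative area per transistor -> 0.49, i.e. ~2x density
cap = (s * s) / s             # gate capacitance ~ area / thickness -> 0.7
freq = 1 / s                  # gate delay drops by 30%, so frequency -> ~1.43

# Dynamic power P = C * V^2 * f, relative to the previous generation.
p_const_voltage = cap * 1.0**2 * freq   # Vdd kept at 1     -> ~1.0
p_const_field = cap * s**2 * freq       # Vdd scaled to 0.7 -> ~0.49

print(f"density x{1/area:.2f}, frequency x{freq:.2f}")
print(f"power (constant voltage) x{p_const_voltage:.2f}")
print(f"power (constant field)   x{p_const_field:.2f}")
```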

Slide 5: Technology. The bad
[Bathtub-curve figure: failure rate (FIT) over time, with burn-in/infant mortality (~1 to 20 weeks), normal lifetime (~3 to 15 years) and wearout; is burn-in running out of steam? Ref: NASA; H-Y. Kang, 2005]
Technology scaling brings increasing transient errors and increasing wearout failures, i.e. a higher failure rate and a reduced lifetime. Components are becoming unreliable.

Slide 6: Technology. The challenging (long term)
High static leakage (volatile technology), unreliable components, and solutions so expensive that they are economically not affordable. Complex manufacturing means low yield and high cost; scalability is limited and comes at additional cost. The end of CMOS scaling brings little or no economic benefit, hence the need for new device technologies. [Source: ITRS + SEMATECH]

Slide 7: Computing. The good
Architectures evolved from early computers (a single core with DRAM), to one core with an embedded cache, to multi/many-core designs (Core 1 ... Core n) with per-core L1 caches and a shared L2 cache in front of DRAM. Processor performance grew ~55%/year (2x every 1.5 years) while DRAM performance grew only ~7%/year (2x every 10 years), so the CPU-memory performance gap grows by ~50%/year (see the sketch below). [source: Mark Horowitz, Stanford Uni; Ref: Intel]

Slide 8: Computing. The bad
High energy consumption, dominated by communication and memory: 70 to 90% for data-intensive applications (chip-level energy trends; source: S. Borkar, "Exascale Computing - a fact or a fiction?", IPDPS 2013).
Limited data bandwidth: the communication bottleneck of the stored-program principle.
Complex programmability: memory coherency and programmability overhead (scheduling policies, priorities, mapping, ...).
Reduced/saturated performance: enhancements rely on expensive on-chip memory (~70% of chip area) and require loads and stores, the killers of overall performance.
[HPC system-level power breakdown, 2012 vs 2018; source: R. Nair, Active Memory Cube, 2nd Workshop on Near-Data Processing, 2014]
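
As a back-of-the-envelope check on the gap growth quoted on slide 7, here is a minimal Python sketch using the slide's growth rates; the resulting per-year ratio (~45%) is close to the ~50%/year the slide quotes.

```python
# CPU vs DRAM performance growth rates from the slide:
# processor ~55%/year, DRAM ~7%/year.
cpu_growth, dram_growth = 1.55, 1.07

print(f"gap growth per year: ~{(cpu_growth / dram_growth - 1) * 100:.0f}%")
for years in (1, 5, 10):
    gap = (cpu_growth / dram_growth) ** years
    print(f"after {years:2d} years the CPU-memory gap has grown ~{gap:.1f}x")
```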

Slide 9: Computing. The bad
[Figure; Ref: Jorn W. Janneck, "Computing in the age of parallelism: Challenges and opportunities", Multicore Day, Sept 2013]

Slide 10: Computing. The bad
Intel desktop processors: speed-up is no longer the result of a faster clock. Instead, we get more parallel devices... but parallelism is hard to scale. [Ref: Jorn W. Janneck, "Computing in the age of parallelism: Challenges and opportunities", Multicore Day, Sept 2013]

Slide 11: Computing. The challenging
Extremely data-intensive applications (Big Data problems) in economics, business, science, social media, healthcare, etc. require data storage and analysis. For example, reading 1 PB at 1000 MB/s (the assumed front-side-bus transfer rate) takes about 12.5 days. The rate at which information grows exceeds Moore's law, and data sizes have already surpassed the capabilities of today's computing architectures; all of them suffer from the communication and memory bottleneck between core and memory. => Need for new architectures.

Slide 12: Future computers: possible scenarios
Near-memory computing: bring computation closer to the data, moving from a computer-centric to a data-centric model. This is an old concept, but interest has been renewed by technology trends, big data and new technologies (3D integration). HP's "The Machine": discard the conventional computer model, flatten complex data hierarchies, bring processing closer to the data; electrons for computing, photons for communication and ions for storage. [Source: HP]
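
The 12.5-day figure on slide 11 is simple arithmetic; the short sketch below reproduces it, assuming binary units (1 PB = 2^50 bytes, 1 MB = 2^20 bytes), which is my assumption rather than something stated on the slide (decimal units give ~11.6 days).

```python
# Reading 1 PB over a 1000 MB/s front-side bus.
# Binary units (assumption): 1 PB = 2**50 bytes, 1 MB = 2**20 bytes.
PB = 2 ** 50               # bytes to read
rate = 1000 * 2 ** 20      # bytes per second

seconds = PB / rate
print(f"{seconds / 86400:.1f} days")   # ~12.4 days, matching the slide's ~12.5 days
```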

Slide 13: Toward a new computing paradigm
How to address the memory wall, the communication bottleneck and big data? [Diagram: today's hierarchy (Core 1 ... Core n with L1 caches, a shared L2 cache and DRAM) versus a single "core + data memory" block with external memory.]
Integrate storage and computation in the same physical location: this significantly reduces the communication/memory bottleneck. Use non-volatile technology (practically zero leakage) and support massive parallelism through a crossbar architecture. [Source: S. Hamdioui et al., DATE 2015]

Slide 14: Toward a new computing paradigm
CIM architecture: Computing in Memory. A core + data memory block is connected to external memory through communication & control blocks. The crossbar topology places a dense, non-volatile, two-terminal device at each junction (top electrode, junction with switching material, bottom electrode). There is no, or only a limited, memory wall, since data is loaded only the first time. Scalability comes at low cost: a significant reduction in area thanks to nanometric device dimensions (below 10 nm). [Source: S. Hamdioui et al., DATE 2015]
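
To illustrate why a resistive crossbar supports massive parallelism, here is a small sketch (my illustration, not taken from the slides or the DATE 2015 paper): with a conductance stored at every junction and voltages applied to the rows, each column current is the sum of the row contributions, so the whole array produces many multiply-accumulate results in a single step by Ohm's and Kirchhoff's laws.

```python
# Illustrative crossbar model: G[i][j] is the conductance (siemens) stored at the
# junction of row i and column j. Applying row voltages V[i] makes every column
# carry I[j] = sum_i V[i] * G[i][j], i.e. all column dot products in parallel.
def crossbar_column_currents(V, G):
    rows, cols = len(G), len(G[0])
    return [sum(V[i] * G[i][j] for i in range(rows)) for j in range(cols)]

V = [0.2, 0.0, 0.2]                     # row voltages (volts)
G = [[1e-6, 5e-6],                      # stored conductances (siemens)
     [2e-6, 1e-6],
     [4e-6, 3e-6]]
print(crossbar_column_currents(V, G))   # column currents (amperes)
```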

Slide 15: Toward a new computing paradigm: CIM technology
Requirements for the switching device:
Dual functionality: realize both memory and logic functions.
Low energy consumption and low/zero leakage: non-volatility, to reduce the overall power consumption.
Scalability / nanometric dimensions: realize extreme density at low price and reduce area.
CMOS compatibility: enable manufacturing at low cost.
A two-terminal device with a metal/insulator/metal structure: realize the crossbar architecture.
Good endurance and good reliability.
Solution: emerging resistive switching devices = memristors* [* L. Chua, "Resistance Switching Memories Are Memristors", Memristor Networks, pp. 21-51, Springer, 2014]

Slide 16: CIM technology: Memristor
Invention: in 1971 Leon Chua proposed a resistor with memory, with a hysteretic current-voltage characteristic. Demonstration: in 2008 HP manufactured a memristor and analysed its properties (integration density, leakage, ...). Potential applications: memory, logic, etc.
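
To give the "resistor with memory" a concrete form, here is a minimal sketch of the linear ion-drift memristor model associated with HP's 2008 device; the parameter values are illustrative assumptions, not figures from the slides. The resistance depends on a state variable x that drifts with the current, so the device remembers the charge that has flowed through it.

```python
import math

# Linear ion-drift memristor model (sketch). Parameter values are illustrative.
R_ON, R_OFF = 100.0, 16e3     # low/high resistance states (ohms)
D, MU_V = 10e-9, 1e-14        # device thickness (m), ion mobility (m^2/(V*s))

x = 0.1                       # normalized doped-region width w/D, in [0, 1]
dt, f, amp = 1e-4, 1.0, 1.0   # time step (s), drive frequency (Hz), amplitude (V)

for step in range(20000):     # two cycles of a 1 Hz sine drive
    v = amp * math.sin(2 * math.pi * f * step * dt)
    m = R_ON * x + R_OFF * (1 - x)        # memristance depends on the state x
    i = v / m
    x += MU_V * R_ON / D**2 * i * dt      # the state drifts with the current
    x = min(max(x, 0.0), 1.0)             # keep the state physical
    if step % 2500 == 0:
        print(f"t={step*dt:4.2f} s  v={v:+.2f} V  i={i*1e3:+8.3f} mA  M={m:8.0f} ohm")
```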

Slide 17: CIM potential
Example workloads: healthcare (DNA sequencing, where we assume 200 GB of DNA data to be compared against a healthy reference of 3 GB for 50% coverage**) and mathematics (10^6 parallel additions). [** E. A. Worthey, Current Protocols in Human Genetics, 2001]
Assumptions: the conventional architecture is a FinFET 22 nm multi-core implementation with a scalable number of clusters, each containing 32 ALUs (e.g. comparators); 64 clusters, each sharing an 8 KB L1 cache. The CIM architecture is a memristor 5 nm crossbar implementation whose crossbar size equals the total cache size of the CMOS computer. [Source: S. Hamdioui et al., DATE 2015]

Slide 18: CIM potential
Metrics: energy-delay per operation; computing efficiency (number of operations per unit of energy); performance area (number of operations per unit of area).
Results for 10^6 additions (conventional vs CIM) [Source: S. Hamdioui et al., DATE 2015]:

  Metric                    Conventional   CIM          Improvement
  Energy-delay/operation    1.5043e-18     9.2570e-21   > x100
  Computing efficiency      6.5226e+09     3.9063e+12   > x100
  Performance area          5.1118e+09     4.9164e+12   > x100

Key drivers: reduced memory bottleneck, non-volatile technology, massive parallelism.
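
The "> x100" claims follow directly from the numbers in the results table; this short sketch recomputes the improvement factors (values copied from the table above).

```python
# Conventional vs CIM results for 10**6 additions, as listed on the slide.
results = {
    "energy-delay/operation": (1.5043e-18, 9.2570e-21),  # lower is better
    "computing efficiency":   (6.5226e+09, 3.9063e+12),  # higher is better
    "performance area":       (5.1118e+09, 4.9164e+12),  # higher is better
}

for metric, (conventional, cim) in results.items():
    factor = max(conventional, cim) / min(conventional, cim)
    print(f"{metric:24s} improvement ~{factor:,.0f}x")   # all well above 100x
```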

Slide 19: CIM Partners
[Slide listing the project partners; no text transcribed.]

Slide 20: Conclusion
CMOS technology: constant voltage scaling, complex faults and unreliable components, higher leakage (a volatile technology), and increasing cost through lower yield, limited scalability and higher business pressure. => This limits the applicability and the benefits of further scaling.
Short term: not only new tools and IC flows are required, but also innovations in DFX/BISX, both for manufacturing and for online/in-field testing, monitoring, characterization and recovery.
Long term: alternative device technologies, e.g. the memristor.

Slide 21: Conclusion
Von Neumann-based computers suffer from the memory and communication bottleneck, the complex programmability of multi-cores, and higher power consumption, while big data and data-intensive applications keep growing. => They are unable to solve today's, let alone future, big data problems.
Short term: specialization with application-specific accelerators (reduced programmability), and near-memory computing with accelerators around the memories (a data-centric model).
Long term: alternative architectures beyond Von Neumann, e.g. the CIM architecture.

Closing slide: Computing for Data-Intensive Applications: Beyond CMOS and Beyond Von-Neumann. Said Hamdioui, Computer Engineering, Delft University of Technology, The Netherlands. Workshop on Memristive Systems for Space Applications, European Space Agency, ESTEC, 30 April 2015, Noordwijk, Netherlands.