Technology-Driven Architecture Innovations: Opportunities and Challenges

Similar documents
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

FPGA-based MapReduce Framework for Machine Learning

Non-Volatile Memory. Non-Volatile Memory & its use in Enterprise Applications. Contents

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Parallel Programming Survey

Flash & DRAM Si Scaling Challenges, Emerging Non-Volatile Memory Technology Enablement - Implications to Enterprise Storage and Server Compute systems

Embedded STT-MRAM for Mobile Applications:

Non-Volatile Memory and Its Use in Enterprise Applications

2014 EMERGING NON- VOLATILE MEMORY & STORAGE TECHNOLOGIES AND MANUFACTURING REPORT

How To Build A Cloud Computer

~ Greetings from WSU CAPPLab ~

Cloud Data Center Acceleration 2015

Chapter 1 Computer System Overview

OpenSoC Fabric: On-Chip Network Generator

Computing for Data-Intensive Applications:

Secured Embedded Many-Core Accelerator for Big Data Processing

Storage Class Memory and the data center of the future

A+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

New Dimensions in Configurable Computing at runtime simultaneously allows Big Data and fine Grain HPC

{The Non-Volatile Memory Technology Database (NVMDB)}, UCSD-CSE Techreport CS

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Scaling from Datacenter to Client

A Close Look at PCI Express SSDs. Shirish Jamthe Director of System Engineering Virident Systems, Inc. August 2011

Naveen Muralimanohar Rajeev Balasubramonian Norman P Jouppi

Computer Systems Structure Main Memory Organization

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Nanotechnologies for the Integrated Circuits

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

The Evolving NAND Flash Business Model for SSD. Steffen Hellmold VP BD, SandForce

Implications of Storage Class Memories (SCM) on Software Architectures

Implementation of Buffer Cache Simulator for Hybrid Main Memory and Flash Memory Storages

CISC, RISC, and DSP Microprocessors

快 閃 記 憶 體 的 產 業 應 用 與 製 程

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

CS252 Graduate Computer Architecture Spring 2014 Lecture 1: Introduc<on

C. Mohan, IBM Almaden Research Center, San Jose, CA

Xeon+FPGA Platform for the Data Center

Introduction to System-on-Chip

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Kalray MPPA Massively Parallel Processing Array

Flash and Storage Class Memories. Technology Overview & Systems Impact. Los Alamos/HECFSIO Conference August 6, 2008

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED

Cray Gemini Interconnect. Technical University of Munich Parallel Programming Class of SS14 Denys Sobchyshak

Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Technology, Market and Cost Trends. 12. April 2015 Bernd Panzer-Steindel, CERN IT CTO 1

Discovering Computers Living in a Digital World

High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

Research Statement. Hung-Wei Tseng

Trends in High-Performance Computing for Power Grid Applications

PPGCC. Non-Volatile Memory: Emerging Technologies And Their Impacts on Memory Systems. Taciano Perez, César A. F. De Rose. Technical Report Nº 060

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

CSCA0102 IT & Business Applications. Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global

Towards Rack-scale Computing Challenges and Opportunities

A PRAM and NAND Flash Hybrid Architecture for High-Performance Embedded Storage Subsystems

Data Centric Systems (DCS)

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

How To Build An Ark Processor With An Nvidia Gpu And An African Processor

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Next Generation GPU Architecture Code-named Fermi

Purchase of High Performance Computing (HPC) Central Compute Resources by Northwestern Researchers

High Performance Computing in the Multi-core Area

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Central Processing Unit

Masters Project Proposal

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca

Networking Virtualization Using FPGAs

How To Make A High Throughput Computing Data Center

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER

SeaMicro SM Server

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

OBJECTIVE ANALYSIS WHITE PAPER MATCH FLASH. TO THE PROCESSOR Why Multithreading Requires Parallelized Flash ATCHING

Hardware accelerated Virtualization in the ARM Cortex Processors

The Evolving Role of Flash in Memory Subsystems. Greg Komoto Intel Corporation Flash Memory Group

The Central Processing Unit:

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

Embedded Systems: map to FPGA, GPU, CPU?

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

EEM870 Embedded System and Experiment Lecture 1: SoC Design Overview

Adaptive Bandwidth Management for Performance- Temperature Trade-offs in Heterogeneous HMC+DDRx Memory

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25

Hardware Acceleration for CST MICROWAVE STUDIO

3D innovations: From design to reliable systems

1 / 25. CS 137: File Systems. Persistent Solid-State Storage

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

A Study of Application Performance with Non-Volatile Main Memory

SERVER CLUSTERING TECHNOLOGY & CONCEPT

Transcription:

Scalable and Energy-Efficient Architecture Lab (SEAL) Technology-Driven Architecture Innovations: Opportunities and Challenges Past, Present, and Future Yuan Xie University of California, Santa Barbara Architecture 2030 Workshop.1 June 18, 2016

Technology and Architecture Interaction Technology or Architecture: Contribution Contribution to computer performance growth roughly equally between technology and architecture, with architecture credited with ~80 improvement since 1985* *Danowitz, et al., CPU DB: Recording Microprocessor History, CACM 04/2012 Technology and Architecture:Evolving Interaction New technologies affect decision making by architects Development in architecture impacts the viability of technologies IEEE Computer, 09/1991 Architecture 2030 Workshop.2 June 18, 2016

Technology and Architecture Interaction (1991) Two technology trends: Transistor scaling Increasing memory density Two architecture trends: Processor Architecture: Pipelining/ILP Memory Architecture: Caching Architecture 2030 Workshop.3 June 18, 2016

The Next 15 Years in Computer Architecture Research? The best way to predict the future is to study the past - Robert Kiyosaki After Hennessy&Jouppi s Summary in 1991, what was the trend since then? We studied the topics of each ISCA papers from 1992-2016 IEEE Computer, 09/1991 Architecture 2030 Workshop.4 June 18, 2016

Topics in Components Memory architecture gains more importance since 2005 Interconnect architecture since 2002 (NoC) GPU architecture since 2008 Accelerator architecture since 2008 30 25 30 25 20 25 20 20 15 15 15 10 10 10 5 Processor+M+Interconnect+GPU+Accelerator Processor+Memory+Interconnect+GPU +Interconnect Processor Memory Processor Processor Accelerators Memory Interconnect Memory Interconnect GPU GPU Interconnect 0 1992 1992 1996 1996 2000 2000 2004 2004 2008 2008 2012 2012 2016 Architecture 2030 Workshop.5 June 18, 2016

Topics on Optimization Goals: 45 40 35 30 25 20 15 10 5 0 ISCA 2000: Power/Energy Reliability Performance 1992 1996 2000 2004 2008 2012 2016 Wattch (Princeton) and SimplePower (PennState) (2000) Transient fault detection via simultaneous multithreading (2000) Power/Reliability became major topics for architecture research since 2000 Architecture 2030 Workshop.6 June 18, 2016

CMOS Technology May Not Scale Anymore 1000 180 130 90 Technology Node (nm) 100 10 65 45 32 22 14nm slips by 2 quarters 14 10nm slips by 5-6 quarters 10 7nm by end 2020? 7 1 Street Dates for Intel s Lead Generation Products Courtesy David Brooks @ Harvard Architecture 2030 Workshop.7 June 18, 2016

Emerging Technologies Emerging Technologies other than traditional CMOS scaling may provide new opportunities for new architecture innovations 3D die-stacking Non-volatile memory Nanophotonics Quantum Architecture 2030 Workshop.8 June 18, 2016

Technology-Driven Architecture Technology and Architecture:Evolving Interaction New technologies affect decision making by architects Development in architecture impacts the viability of technologies A Case Study on 3D Die-Stacking Architecture Microprocessor 3D IC 2002/11/11 Architecture 2030 Workshop.9 June 18, 2016

Architecture 2030 Workshop.10 June 18, 2016

Intel 3D Pentium 4 (ICCD 2004) Source: Intel Top Bottom Source: B.Black (Intel) Architecture 2030 Workshop.11 June 18, 2016

ISCA 2006 Architecture 2030 Workshop.12 June 18, 2016

Intel s 3D +NOC Prototyping (2007) Courtesy: T. Karnik (Intel) Architecture 2030 Workshop.13 June 18, 2016

Gabe Loh, Yuan Xie. 3D Stacked Microprocessor: Are We There Yet? IEEE Micro, Volume 30 Issue 3, pp. 60-64, May. 2010 14 Architecture 2030 Workshop.14 June 18, 2016

2D XPU+ 3D Memory = 2.5D Integration TSV-based 3D integration Microprocessor Microprocessor Silicon Interposer 3D Stacked Silicon Stacked Silicon with TSV-based 3D integration More and more transistors can be integrated into a single package About 100MB-1GB on-package would be available How to use these transistors efficiently? Multi-core, and many-core? Larger cache size or deeper cache hierarchy? On-package main memory? X. Dong et al. Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support (SC 2010) Architecture 2030 Workshop.15 June 18, 2016 15

In package 3D Memory with GPU Conventional GDDRs, off chip Wide bus routing on silicon interposer Top View Side View Optimizing GPU Energy Efficiency with 3D Die-stacking Graphics Memory and Reconfigurable Memory Interface. Jishen Zhao, Yuan Xie, Gabe Loh, ISLPED 2012. Architecture 2030 Workshop.16 June 18, 2016

Die-Stacking is Happening AMD Announcement on June 16, 2015 The Fiji GPU Packaging is 50x50mm The interposer size is 26x32mm The GPU is about 20x24mm There are four 1GB HBM stacks for a total of 4GB of memory 17 Architecture 2030 Workshop.17 June 18, 2016

Nvidia Pascal (3/2016) Architecture 2030 Workshop.18 June 18, 2016

Architecture 2030 Workshop.19 June 18, 2016

Technology-Driven Architecture Innovation New technologies affect decision making by architects Development in architecture impacts the viability of technologies Technology development multi-core coarse-granularity up (ISCA06,MICRO06) 3D->2.5D (SC10) 2.5D GPU (ISLPED12) 2004 2007 2011 2015 1990-2004 2006 2010 2012 Single-core multi-core Fine-granularity up coarse-granularity up HMC 2.5D GPU AMD Fury X Architecture 2030 Workshop.20 June 18, 2016

Emerging Non-volatile Memories Magnetic RAM (MRAM) EverSpin (130nm, up to 16Mb) Spin-Torque-Transfer RAM (STTRAM) Grandis (54nm, acquired by Samsung) Phase-Change RAM (PCRAM) Samsung (20nm, diode, up to 8Gb) Resistive RAM (ReRAM) Micron (16Gb, 27nm, ISSCC14) Intel 3D Xpoints (2016) Architecture 2030 Workshop.21 June 18, 2016 21

Architecture Opportunities with NVM On-chip memory (SRAM, MRAM) Off-chip memory (, PRAM, ReRAM) Solid State Disk (Flash Memory, PRAM, ReRAM) Secondary Storage (HDD) Opportunities: Leveraging NVM as LLC/Memory/Storage (HPCA 09-10, ISCA11,12,16) Performance and energy improvement Leveraging Nonvolatility for instant power-on/power-off (HPCA15) Leveraging Nonvolatility for persistency Support (MICRO13, MICRO14) Challenges: Wear-out, write-overhead, asymmetric read/write Architecture 2030 Workshop.22 June 18, 2016

Emerging Application Domains Emerging application domains Mobile/embedded Data center AI/ML Application 12 10 8 6 4 2 Emerging Applications Neural Network Datacenter Mobile and Embedded 0 2002 2004 2006 2008 2010 2012 2014 2016 Architecture 2030 Workshop.23 June 18, 2016

The (Re)Rising of AI Applications Supercomputers Data Centers Smartphones Embedded Devices Business analytics Ad prediction Audio recognition Robotics Drug design Automatic translation Image analysis Source: Yuji Chen (ICT) Consumer electronics Architecture 2030 Workshop.24 June 18, 2016

Emerging Application + Emerging Technology When Emerging Application meets Emerging Technology -> Emerging Architecture Welcome to Session 1A-3 (11:40am 12:00pm)! PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory Ping Chi *, Shuangchn Li *, Tao Zhang, Cong Xu, Jishen Zhao δ, Yu Wang #, Yongpan Liu #, Yuan Xie * Architecture 2030 Workshop.25 June 18, 2016

Overall Statistics for ISCA 1992-2016 70 60 50 40 30 20 10 Architecture-driven Application-driven ISCA Ideas Technology-driven Total 0 1992 1996 2000 2004 2008 2012 2016 Architecture-driven innovations are still the dominant themes in ISCA Since 2000, there were increasing interests in Technology-driven architectural innovations: 3D, NVM, optical, Quantum etc. Application-driven architectural innovations: Datacenter, mobile, NN etc. Architecture 2030 Workshop.26 June 18, 2016

Data Center AI Application-Driven Innovations Mobile/Embedded Computer Architecture Innovations Technology scaling Technology-Driven Innovations Emerging Technologies 2 VT,intra Soft Errors NBTI Aging Power/Thermal 2 VT,inter PVT Variations 3D Integration Emerging NVM Architecture 2030 Workshop.27 June 18, 2016

Neuromophic Neural Network Large Graph Processing VR/AR Application Driven Heterogeneous Computing GPU ASIC FPGA Emerging Technology 3D Stacking TESLA P100 Cambricon 寒武纪 Intel HARP HP labs, 2012 STT RAM/ReRAM PCM/Memristor Architecture 2030 Workshop.28 June 18, 2016