Tilera (RAW) Processor. Jason Mars and Robbie Hott

Similar documents
A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

Introduction to Cloud Computing

Accelerating the Data Plane With the TILE-Mx Manycore Processor

Chapter 1 Computer System Overview

Networking Goes Open-Source. Michael Zimmerman VP Marketing, Tilera

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011

Performance Evaluation of 2D-Mesh, Ring, and Crossbar Interconnects for Chip Multi- Processors. NoCArc 09

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

How To Build A Cloud Computer

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

FLIX: Fast Relief for Performance-Hungry Embedded Applications

VP/GM, Data Center Processing Group. Copyright 2014 Cavium Inc.

SOC architecture and design

Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan

CISC, RISC, and DSP Microprocessors

Industry First X86-based Single Board Computer JaguarBoard Released

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Infrastructure Matters: POWER8 vs. Xeon x86

Architectures and Platforms

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Data and Control Plane Interconnect solutions for SDN & NFV Networks Raghu Kondapalli August 2014

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

- Nishad Nerurkar. - Aniket Mhatre

Intel Solid- State Drive Data Center P3700 Series NVMe Hybrid Storage Performance

Design Cycle for Microprocessors

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

White Paper. Innovate Telecom Services with NFV and SDN

Marvell DragonFly Virtual Storage Accelerator Performance Benchmarks


Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

The Orca Chip... Heart of IBM s RISC System/6000 Value Servers

7a. System-on-chip design and prototyping platforms

Comparative Performance Review of SHA-3 Candidates

Secure Cloud Storage and Computing Using Reconfigurable Hardware

Product Brief. R7A-200 Processor Card. Rev 1.0

Networking Virtualization Using FPGAs

Operating System Support for Multiprocessor Systems-on-Chip

Cut Network Security Cost in Half Using the Intel EP80579 Integrated Processor for entry-to mid-level VPN

Central Processing Unit (CPU)

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Enabling Technologies for Distributed Computing

Intel Xeon Processor E5-2600

Multi-core architectures. Jernej Barbic , Spring 2007 May 3, 2007

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

Intel Xeon +FPGA Platform for the Data Center

Memory Architecture and Management in a NoC Platform

Switch Fabric Implementation Using Shared Memory

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

Scalable Cyber-Security for Terabit Cloud Computing IEEE High Performance Extreme Computing. Jordi Ros-Giralt / giralt@reservoir.

Computer Architecture TDTS10

SPARC64 VIIIfx: CPU for the K computer

LS DYNA Performance Benchmarks and Profiling. January 2009

High-Performance Modular Multiplication on the Cell Processor

Enabling Technologies for Distributed and Cloud Computing

SERVER CLUSTERING TECHNOLOGY & CONCEPT

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Intel RAID RS25 Series Performance

ARM Processors and the Internet of Things. Joseph Yiu Senior Embedded Technology Specialist, ARM

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Azul Compute Appliances

White Paper Utilizing Leveling Techniques in DDR3 SDRAM Memory Interfaces

Microwatt to Megawatt - Transforming Edge to Data Centre Insights

Chapter 4 System Unit Components. Discovering Computers Your Interactive Guide to the Digital World

AMD PhenomII. Architecture for Multimedia System Prof. Cristina Silvano. Group Member: Nazanin Vahabi Kosar Tayebani

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

Advanced Core Operating System (ACOS): Experience the Performance

Pivot3 Reference Architecture for VMware View Version 1.03

Fusion iomemory iodrive PCIe Application Accelerator Performance Testing

Benchmarking Large Scale Cloud Computing in Asia Pacific

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX620 S6

International Journal of Computer & Organization Trends Volume20 Number1 May 2015

OpenSoC Fabric: On-Chip Network Generator

Multicore Processors A Necessity By Bryan Schauer

FPGAs for Trusted Cloud Computing

COMPUTING. SharpStreamer Platform. 1U Video Transcode Acceleration Appliance

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

System Requirements Table of contents

High-Density Network Flow Monitoring

Datasheet. Enterprise Gateway Router with Gigabit Ethernet. Models: USG, USG-PRO-4. Advanced Security, Monitoring, and Management

OC By Arsene Fansi T. POLIMI

Kalray MPPA Massively Parallel Processing Array

Transforming your IT Infrastructure for Improved ROI. October 2013

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server

Standardization with ARM on COM Qseven. Zeljko Loncaric, Marketing engineer congatec


High Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/ CAE Associates

CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER

The Engine for Digital Transformation in the Data Center

Evaluating the Intel EP80579 Integrated Processor

Mainframe. Large Computing Systems. Supercomputer Systems. Mainframe

Core Syllabus. Version 2.6 C OPERATE KNOWLEDGE AREA: OPERATION AND SUPPORT OF INFORMATION SYSTEMS. June 2006

Intel Itanium Architecture

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

Transcription:

Tilera (RAW) Processor Jason Mars and Robbie Hott

2 http://nextbigfuture.com/2010/06/quanta-and-tilera-launch-s2q-cloud.html

RAW Beginnings http://groups.csail.mit.edu/cag/raw/gallery/raw_chip_photos/ 3

RAW Philosophy Some networks are mem mapped (registers), others use explicit messages Routers are programmable 4 http://groups.csail.mit.edu/cag/raw/documents/ieee-micro-2002.pdf

16 Tile RAW 3 cycles between tiles 5 http://groups.csail.mit.edu/cag/raw/documents/ieee-micro-2002.pdf

Software ASICs Place and route software circuits (video) 5x faster than 700mhz PIII 6 http://groups.csail.mit.edu/cag/raw/documents/ieee-micro-2002.pdf

Compiling for RAW: Equivalence Class Unification Based on pointer analysis 7 http://groups.csail.mit.edu/cag/raw/documents/ieee-micro-2002.pdf

Compiling for RAW: Modulo Unrolling Basic Parallelization 8 http://groups.csail.mit.edu/cag/raw/documents/ieee-micro-2002.pdf

Comparing RAW to PIII 9 http://groups.csail.mit.edu/cag/raw/documents/ieee-micro-2002.pdf

Win Some / Lose Some 10 http://groups.csail.mit.edu/cag/raw/documents/ieee-micro-2002.pdf

RAW to Tilera 11

Tilera TILE-Gx Series Low-power multi-core RISC architecture 16, 36, 64, and 100 core models Up to 1.5 GHz clock frequency On-chip tiled mesh network Each tile could operate as an individual processor Multiple tiles can be used to run SMP Linux TILE-Gx Tiles include: 64-bit VLIW cores, 3-wide pipeline 64-entry register file DSP and SIMD extensions 12

TILE-Gx100 Tile 13 http://www.tilera.com/sites/default/files/productbriefs/pb025_tile-gx_processor_a_v3.pdf

TILE-Gx100 Chip Overview 14 http://www.tilera.com/sites/default/files/productbriefs/pb025_tile-gx_processor_a_v3.pdf

TILE-Gx100 Memory Hierarchy Scalable caching system: 32MB total on-chip cache Each tile has L1 and L2 cache 32 KB L1 Instruction Cache 32 KB L1 Data Cache 256 KB L2 Cache Access through on-chip network patent pending DDC TM (Dynamic Distributed Cache) technology provides a fully coherent shared cache system across an arbitrarily-sized array of tiles No large centralized cache TileDirect: coherent I/O directly into the tile caches 15

Tilera Extra Features On-chip Connections 4 DDR3 Memory controllers PCIe, USB, and Network interfaces MiCA (Multistream imesh Crypto Accelerator) for encryption, hashing, public key ops, at 40Gbps / 50,000 RSA ops/sec On-chip network eliminates on-chip bus interconnect Information must flow between processor cores or between cores and the memory / I/O controllers imesh network provides each tile with more than 1Tbps of interconnect bandwith 16

Power Consumption Processor Number of Cores Frequency Avg Power Consumption TILEPro36 36 500MHz 9-13W TILE64/Pro64 64 700MHz/866MHz 15-23W (all cores) TILE-Gx36 36 1.25GHz/1.5GHz 10-55W TILE-Gx64 64 1.25GHz/1.5GHz 10-55W TILE-Gx100 100 1.25GHz/1.5GHz 10-55W Intel Core i7-920 4 2.66GHz 130W (max TDP) 17 http://www.tilera.com/ http://ark.intel.com/product.aspx?id=37147

Package Size Processor Number of Cores Package Size TILEPro36 36 40mm x 40mm TILE64/Pro64 64 40mm x 40mm TILE-Gx36 36 35mm x 35mm TILE-Gx64 64 45mm x 45mm TILE-Gx100 100 45mm x 45mm Intel Core i7-920 4 42.5mm x 45mm 18 http://www.tilera.com/ http://ark.intel.com/product.aspx?id=37147

Dark Silicon? With so many cores, something has to be off? TILEPro64 draws up to 23W with ALL CORES running Individual idle cores can be turned off Q: How best to configure 100 cores? 19

Intended Market General purpose processor market TILE-Gx can run multiple OSes and applications simultaneously Four main categories Networking Machines (monitoring, firewall, vpn) Wireless Multimedia Production (streaming, conferencing) Cloud computing (servers) Server Market Around 10,000 cores in an 8kW rack Quanta S2Q Server: 8 TILEPro64 chips (512 cores) at 400W 20

Further Questions Interesting research areas into how to best control 100 cores on chip How best to organize cache data in the distributed cache model TILE-Gx100 vs Intel Core/Xeon/Atom benchmarks How might the cores have changed since RAW? What would it take to displace Intel? Which markets? 21

Further Reading Tilera Homepage: www.tilera.com RAW Publications: groups.csail.mit.edu/cag/raw/ documents M. Taylor et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, IEEE Micro, March 2002. J. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff, Energy Characterization of a Tiled Architecture Processor with On-Chip Networks, in 2003 ISLPED, 2003, pp. 424 427. 22

Questions?

24 http://nextbigfuture.com/2010/06/quanta-and-tilera-launch-s2q-cloud.html