A Scalable VISC Processor Platform for Modern Client and Cloud Workloads




Mohammad Abdallah, Founder, President and CTO, Soft Machines
Linley Processor Conference, October 7, 2015

Agenda
- Soft Machines Background
- Soft Machines VISC Architecture
- Roadmap
- Shasta VISC Processor
- Mojave VISC SoC
- Summary

Soft Machines Introduced
- Soft Machines VISC Architecture (Oct '14)
  - 2-3x IPC speedup for up to 4x Perf/Watt, portable to all CPU ISAs
  - Working 28nm VISC CPU and SoC prototype
- Developing VISC Architecture Processors and SoCs
  - Customized to Guest ISA & I/F, Processor configuration, SoC features
  - CPU/SoC licensing, Co-development and technology licensing
- Today we will preview Shasta and Mojave
  - Shasta VISC Processor delivers server-class performance at mobile power
  - Mojave VISC SoC platform scalable from smart mobile to servers
  - To be announced in 2016

Soft Machines VISC Architecture

VISC Architecture
[Figure: layered block diagram. Software side: Guest Sequential Code (Single Thread), OS & Hypervisor, Guest ISA, VISC SW Layer. Hardware side: Global Front End, HW Threadlets, multiple cores each with an L1 D$, and a shared L2$ & Memory.]
VISC Architecture dynamically scales resources and is ISA independent

VISC Virtual Cores Dynamically Load Balance ST & MT Apps
[Figure: two panels, "Single SW Thread" and "Dual SW Threads", showing heavy and light apps mapped onto cores and HW threads/threadlets, each core with its own L1 D$.]
- VISC dynamically allocates resources across virtual cores based on individual application needs
- Performance/watt balanced for both single & multi-thread applications

VISC Virtual Cores Scale Power Linearly
[Figure: Power Ratio (y-axis, 0.5 to 8.5) versus Performance Ratio (x-axis, 0.8 to 2.2). Two curves: "Perf-power DVFS", labeled P ∝ V² × F, and "Perf-power virtual cores", labeled P ∝ No. of virtual core resources.]
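The contrast on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only. It assumes that under DVFS the supply voltage rises roughly in proportion to frequency and that performance tracks frequency, so P ∝ V² × F becomes approximately P ∝ (performance ratio)³. The linear model simply takes power as proportional to the core resources in use. Both function names are hypothetical and are not Soft Machines tooling.

```python
def dvfs_power_ratio(perf_ratio: float) -> float:
    """Relative power when performance is raised by DVFS alone.

    Assumption (not stated on the slide): voltage scales linearly with
    frequency and performance scales linearly with frequency, so
    P ~ V^2 * F reduces to P ~ perf_ratio ** 3.
    """
    return perf_ratio ** 3


def linear_power_ratio(perf_ratio: float) -> float:
    """Relative power when performance is raised by adding core
    resources, modeled as power proportional to resources used."""
    return perf_ratio


if __name__ == "__main__":
    for perf in (1.0, 1.5, 2.0):
        print(f"perf x{perf:.1f}: "
              f"DVFS power x{dvfs_power_ratio(perf):.2f}, "
              f"linear power x{linear_power_ratio(perf):.2f}")
```

Under these assumptions, doubling performance costs about 8x the power with DVFS but only 2x with linear scaling, which is consistent with the span of the plot's power axis.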

VISC Architecture Platforms
[Figure: a Guest ISA running on VISC Architecture Processors across target platforms. Labels in the order they appear: Cloud, Networking (3-4x IPC speed up); Mobile / Desktop, Smart Phones (2-3x IPC speed up); IoT Gateways / Embedded (3-4x IPC speed up).]

Roadmap

VISC™ Processor & SoC Roadmap (timeline: 2015, 2016, 2017, 2018)
VISC Processors
- VISC Proof-of-Concept: 1VC/2C, 32 bit, 28nm
- Shasta (Mid '16)*: 1-2VC/2C or SMP 2-4VC/4C, 64 bit, 2GHz, 16nm
- Shasta+: 1-4VC, 10nm
- Tahoe: 1-8VC, 10nm
VISC SoCs
- SoC Ref Design: 28nm
- Mojave (Mid '16)**: Shasta SMP 2-4VC/4ML2, customizable I/O features, 16nm
- Tabernas: Shasta+ SMP, 10nm
- Ordos: Tahoe SMP, 10nm
*RTL available
**SoC tape-out

Shasta VISC Processor

Shasta VISC Processor
- Single and Dual Virtual Core configuration
  - Two physical cores act as 1 or 2 Virtual Cores
  - Virtual Cores dynamically load balance to service threads
- 64-bit ISA
  - Supports larger memory space addressing and more registers
- Support for Multiple Guest ISAs
  - Also runs native VISC Apps
- 2GHz Frequency (16FF+)
  - Up from ~500MHz prototype
- SMP configuration on top of Virtual Cores
  - Proprietary coherency protocol
- 1 MB L2$ per physical core
- System interface unit
  - Generic high speed 256-bit read/write bus adaptable to customer specification (AMBA, OCP, CoreConnect, etc.)
[Figure: Shasta VISC Dual Core Processor. Global Front End, HW Threads (HW threadlets), two cores each with an L1 D$, L2$ & Memory, System Interface Unit.]

Shasta VISC Processor Microarchitecture
[Figure: two symmetric pipelines, one per physical core. Each has an L1 I$ (32KB) with branch prediction (BP), Fetch, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, and a core EXE block with register file (RF), followed by an LSQ, an L1 D$ (32KB) and an L2$. Additional labels: TH0, RH.]

Shasta VISC Processor Pipeline
[Figure: the same two-pipeline microarchitecture diagram (L1 I$ 32KB with BP, Fetch, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, core EXE with RF, LSQ, L1 D$ 32KB, L2$), annotated with stage counts. Annotations in the order they appear: 3 Stages, 3 Stages, 6 Stages+1, 1 Stage, 1-2/4 Cycles, and 4 Stages.]

Shasta VISC Processor SMP
[Figure: two VISC Dual Core Processors (0 and 1) in an SMP arrangement. Each has a Global Front End, HW Threads (HW threadlets), cores with L1 D$, L2$ & Memory, and a System Interface Unit (SIU). The two are linked through Coherency Support.]

Single Thread OOO Ways Perf/Watt
[Figure: Power (y-axis, with Mobile and Server regions) versus SPEC 2006 Score, geomean of int & fp (x-axis). Plotted designs: OOO 2-Wide, OOO 3-Wide, OOO 5-Wide, OOO 8-Wide, OOO Dual Core 4-Wide (2+2), OOO Dual Core 6-Wide (3+3), OOO Dual Core 10-Wide (5+5), OOO Dual Core 16-Wide (8+8).]
* All cores scaled to 16nm. Geomean of 32-bit SPEC2006 int and fp components with GCC 4.6/4.7 or equivalent.

Shasta Delivers Server Performance at Mobile Power
[Figure: Power (y-axis, with Mobile and Server regions) versus SPEC 2006 Score, geomean of int & fp (x-axis). Plotted designs: OOO 2-Wide, OOO 3-Wide, OOO 5-Wide, OOO 8-Wide, OOO Dual Core 4-Wide (2+2), OOO Dual Core 6-Wide (3+3), OOO Dual Core 10-Wide (5+5), OOO Dual Core 16-Wide (8+8), and the Shasta VISC Processor (1VC/2C).]
* All cores scaled to 16nm. Geomean of 32-bit SPEC2006 int and fp components with GCC 4.6/4.7 or equivalent.
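The performance axis of these charts is a geometric mean of the SPEC2006 integer and floating-point components. As a reminder of what that aggregation does, here is a minimal sketch. The function name and the input scores are placeholders chosen for illustration, not measurements from the presentation.

```python
import math


def spec_geomean(int_score: float, fp_score: float) -> float:
    """Geometric mean of an integer score and a floating-point score."""
    if int_score <= 0 or fp_score <= 0:
        raise ValueError("scores must be positive")
    return math.sqrt(int_score * fp_score)


# Placeholder inputs, not real SPEC results.
combined = spec_geomean(16.0, 25.0)
print(combined)
```

A geometric mean rewards balanced gains: scaling either component by some factor moves the combined score by the square root of that factor, so one strong component cannot dominate the result.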

Mojave VISC SoC

VISC SoC Platform
- Scalable SoC Architecture
  - Ease of adding / deleting devices in SoC
  - Robust design methodology allows specification to tape-out in < 9 months
- High Performance Low Power System
  - Focus on Memory / Interconnect performance
  - >200 GB/s coherent fabric, 40 GB/s dual channel DDR4, 200 GB/s L3
  - High bandwidth Network and Storage connectivity
  - High performance Multimedia & Graphics
- Industry Standard APIs and IP Blocks
  - OpenGL, OpenGL ES, OpenMAX, OpenCL, AHCI SATA and XHCI USB
- Soft Machines Enhanced SoC Subsystems
  - Plug-n-play HW/SW architecture for simplified system S/W development
  - Security & Virtualization
  - Dedicated management subsystem

Mojave VISC SoC
- Quad VISC CPU
  - 2x Shasta Processor
- Fast System Memory
  - 1-4 Ch. LP/DDR4 2400-3200
  - 1-8 MB 4-way interleaved system cache (WB/PF/DMA)
- Network / Storage
  - 1-2 1G E-net w/ TCP partial offload/SR-IOV
  - Dual Storage SATA 6G
  - Dual Flash UFS
  - PCIe 3.0, 8 Lanes
- Virtualization / Security
  - System MMU & GIC
  - Secure Zones: Secured Peripherals, Memory and Message Signaled Interrupts
- Multimedia / Graphics
  - 400G-1TFLOPS, 800M-2B Tri/Sec
  - OpenCL 2.0, OpenGL ES 3.2
  - HEVC Video Enc/Dec
  - DTS Audio DSP
- Enterprise / Management
  - Trusted Platform, HW AES/DES/HMAC/SHA, Remote Management, Fine grain DVFS
- Display / Imaging
  - Triple 4K display outputs
  - Dual 20MP ISP inputs
  - HD Audio codec
[Figure: SoC block diagram. Blocks: DRAM, System Cache, Quad VISC Shasta, OCI, Storage, Network, System MMU/GIC, Secure Zones, Mgmt CPU, GPU, ISP, Video Enc/Dec, Audio, PCIe, USB2/3, Display.]
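The memory figures quoted for the platform can be sanity-checked with the usual peak-bandwidth formula: transfers per second, times bytes per transfer, times channel count. The sketch below assumes a 64-bit (8-byte) data bus per channel. That width is standard for DDR4 DIMM channels but is an assumption here, because the slides do not state channel width and LPDDR4 channels are narrower. The function name is hypothetical.

```python
def peak_dram_bandwidth_gbs(mega_transfers_per_s: int,
                            channels: int,
                            bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    bytes_per_transfer defaults to 8 (a 64-bit channel), which is an
    assumption and not a figure taken from the slides.
    """
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Dual-channel DDR4-2400 and quad-channel DDR4-3200.
print(peak_dram_bandwidth_gbs(2400, channels=2))
print(peak_dram_bandwidth_gbs(3200, channels=4))
```

Under these assumptions, dual-channel DDR4-2400 comes to 38.4 GB/s, in line with the roughly 40 GB/s quoted for dual channel DDR4 on the platform slide. Real sustained bandwidth would be lower than this theoretical peak.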

Summary
- VISC Architecture provides up to 4x Perf/Watt
  - Dynamic Virtual Cores and Threadlets provide 2-3x IPC speedup
  - Portable to all CPU ISAs
  - Applicable to a broad range of markets
- First VISC products to be announced in 2016
  - Shasta VISC Processor delivers server-class performance at mobile power
  - Mojave VISC SoC platform scalable from smart mobile to servers
- Contact Soft Machines for more information: Smi-info@softmachines.com