A Scalable VISC Processor Platform for Modern Client and Cloud Workloads
Mohammad Abdallah, Founder, President and CTO, Soft Machines
Linley Processor Conference, October 7, 2015
Agenda
- Soft Machines Background
- Soft Machines VISC Architecture
- Roadmap
- Shasta VISC Processor
- Mojave VISC SoC
- Summary
Soft Machines
- Introduced the Soft Machines VISC architecture in October 2014
  - 2-3x IPC speedup for up to 4x perf/watt, portable to all CPU ISAs
  - Working 28nm VISC CPU and SoC prototype
- Developing VISC architecture processors and SoCs
  - Customized to guest ISA & interfaces, processor configuration, and SoC features
  - CPU/SoC licensing, co-development, and technology licensing
- Today we will preview Shasta and Mojave
  - Shasta VISC processor delivers server-class performance at mobile power
  - Mojave VISC SoC platform scales from smart mobile to servers
  - To be announced in 2016
Soft Machines VISC Architecture
VISC Architecture
[Figure: software/hardware stack. Guest sequential code (single thread) and the OS & hypervisor run on the guest ISA; the VISC SW layer sits between the guest ISA and the VISC cores (HW). In hardware, a global front end feeds HW threadlets to several cores, each with its own L1 D$, sharing the L2$ & memory.]
VISC Architecture dynamically scales resources and is ISA independent
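The slide does not say how the global front end carves a single thread into threadlets. As a rough illustration only, the sketch below assumes threadlets are register-dependence chains and deals them round-robin to cores; the names `Instr`, `form_threadlets`, and `assign_to_cores` are hypothetical, and both policies are stand-ins, not Soft Machines' algorithm.

```python
# Hypothetical sketch, not Soft Machines' algorithm: carve a sequential
# instruction stream into threadlets (register-dependence chains) and
# spread them round-robin across virtual-core resources.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instr:
    dst: str                 # destination register
    srcs: Tuple[str, ...]    # source registers


def form_threadlets(stream: List[Instr]) -> List[List[Instr]]:
    """Append each instruction to the threadlet that last wrote its first
    available source; independent instructions start a new threadlet.
    A second source written by another threadlet becomes a
    cross-threadlet dependency that hardware would have to forward."""
    last_writer = {}         # register -> index of threadlet that last wrote it
    threadlets: List[List[Instr]] = []
    for ins in stream:
        owners = [last_writer[r] for r in ins.srcs if r in last_writer]
        if owners:
            tid = owners[0]
        else:
            tid = len(threadlets)
            threadlets.append([])
        threadlets[tid].append(ins)
        last_writer[ins.dst] = tid
    return threadlets


def assign_to_cores(threadlets, n_cores):
    """Placeholder policy: deal threadlets to cores round-robin."""
    cores = [[] for _ in range(n_cores)]
    for i, t in enumerate(threadlets):
        cores[i % n_cores].append(t)
    return cores


if __name__ == "__main__":
    stream = [
        Instr("r1", ()), Instr("r2", ()),
        Instr("r3", ("r1",)), Instr("r4", ("r2",)),
        Instr("r5", ("r3", "r4")),
    ]
    for n, t in enumerate(form_threadlets(stream)):
        print(n, [i.dst for i in t])
```

With this input the stream splits into two chains, `r1 -> r3 -> r5` and `r2 -> r4`, which can run on separate cores until `r5` needs `r4`'s result.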
VISC Cores Dynamically Load Balance ST & MT Apps
[Figure: two cases. Single SW thread: one heavy app's HW threads/threadlets spread across all cores. Dual SW threads: a heavy app and a light app share the cores between them. Each core has its own L1 D$.]
- VISC dynamically allocates resources across virtual cores based on individual application needs
- Performance/watt balanced for both single- and multi-thread applications
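The deck states that resources follow each application's needs but not the policy. A minimal sketch, assuming a pool of interchangeable execution "slots" and a per-thread demand estimate (both hypothetical), is a proportional share with a one-slot floor and largest-remainder rounding. This is a swapped-in textbook policy, not the VISC hardware's.

```python
# Hypothetical proportional-share policy (not the deck's): divide a pool of
# execution "slots" among software threads by demand, one-slot floor each.
from typing import List


def allocate(slots: int, demands: List[float]) -> List[int]:
    n = len(demands)
    if n == 0:
        return []
    if slots < n:
        raise ValueError("need at least one slot per thread")
    total = sum(demands)
    if total <= 0:
        demands, total = [1.0] * n, float(n)   # no demand signal: split evenly
    spare = slots - n                           # slots left after the floor
    shares = [spare * d / total for d in demands]
    alloc = [1 + int(s) for s in shares]
    # Largest-remainder rounding: leftover slots go to the biggest fractions
    leftover = slots - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    print(allocate(4, [1.0]))         # single heavy thread gets every slot
    print(allocate(4, [3.0, 1.0]))    # heavy app and light app share the pool
```

With one thread, `allocate(4, [1.0])` gives all four slots to it; with a heavy and a light thread, `allocate(4, [3.0, 1.0])` gives `[3, 1]`, mirroring the two cases in the figure.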
VISC Cores Scale Power Linearly
[Plot: power ratio (y-axis) vs. performance ratio (x-axis) for two curves, "Perf-power DVFS" and "Perf-power V cores", with Core 1 and Core 2 operating points marked.]
- DVFS: $P \propto V^2 F$
- VISC virtual cores: $P \propto$ number of virtual core resources
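To see why the two curves diverge, the sketch below evaluates both power laws under two simplifying assumptions that are not stated on the slide: with DVFS, performance tracks frequency and voltage scales roughly linearly with frequency (so $P \propto F^3$); with virtual cores, performance scales perfectly with the resources switched on. Both are first-order idealizations.

```python
# Compare the two power laws on the slide, under stated assumptions:
#   DVFS:          P ∝ V^2 * F, with perf ∝ F and V ∝ F, so P ∝ perf^3
#   virtual cores: P ∝ active core resources, with perf ∝ resources (ideal)


def dvfs_power_ratio(perf_ratio: float) -> float:
    freq = perf_ratio          # assumption: performance tracks frequency
    volt = perf_ratio          # assumption: voltage tracks frequency
    return volt ** 2 * freq


def vcore_power_ratio(perf_ratio: float) -> float:
    resources = perf_ratio     # assumption: perfect scaling with resources
    return resources


if __name__ == "__main__":
    for perf in (1.0, 1.5, 2.0):
        print(perf, dvfs_power_ratio(perf), vcore_power_ratio(perf))
```

Under these assumptions, doubling performance costs 8x power with DVFS but only 2x when adding virtual core resources. Real voltage scaling flattens near the threshold voltage, and real multi-core scaling is below ideal, so both numbers are bounds rather than predictions.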
VISC Architecture Platforms
[Figure: VISC architecture processors running a guest ISA across market segments: cloud, networking, mobile/desktop, smart phones, and IoT gateways/embedded, with quoted IPC speedups of 2-3x and 3-4x depending on the segment.]
Roadmap
VISC™ Processor & SoC Roadmap, 2015-2018 (VC = virtual cores, C = physical cores; entries in chronological order)
VISC Processors:
- VISC Proof-of-Concept: 1VC/2C, 32-bit, 28nm
- Shasta (mid-2016)*: 1-2VC/2C or SMP 2-4VC/4C, 64-bit, 2GHz, 16nm
- Shasta+: 1-4VC, 10nm
- Tahoe: 1-8VC, 10nm
VISC SoCs:
- SoC Reference Design: 28nm
- Mojave (mid-2016)**: Shasta SMP 2-4VC/4ML2, customizable I/O features, 16nm
- Tabernas: Shasta+ SMP, 10nm
- Ordos: Tahoe SMP, 10nm
* RTL available
** SoC tape-out
Shasta VISC Processor
Shasta VISC Processor
- Single and dual core configuration
  - Two physical cores act as 1 or 2 cores
  - Cores dynamically load balance to service threads
- 64-bit ISA
  - Supports a larger addressable memory space and more registers
- Support for multiple guest ISAs
  - Also runs native VISC apps
- 2GHz frequency (16FF+)
  - Up from the ~500MHz prototype
- SMP configuration on top of cores
  - Proprietary coherency protocol
- 1 MB L2$ per physical core
- System interface unit
  - Generic high-speed 256-bit read/write bus adaptable to customer specification (AMBA, OCP, CoreConnect, etc.)
[Figure: Shasta VISC dual-core processor. A global front end feeds HW threads (HW threadlets) to two cores, each with its own L1 D$, backed by the L2$ & memory and a system interface unit.]
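For a sense of scale, the sketch below turns the slide's bus width and L2 size into back-of-envelope numbers. The deck does not give the system bus clock, so the 2 GHz value used here is an assumption (borrowed from the core frequency), as is one 256-bit transfer per clock.

```python
# Back-of-envelope figures for the Shasta dual-core configuration.
# The bus clock is NOT given in the deck; 2 GHz below is an assumption.
BUS_WIDTH_BITS = 256      # system interface unit read/write bus width (slide)
L2_PER_CORE_MB = 1        # L2$ per physical core (slide)
PHYSICAL_CORES = 2        # dual-core configuration (slide)


def peak_bus_gbs(bus_clock_ghz: float, width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak GB/s (10^9 bytes) assuming one full-width transfer per clock."""
    return width_bits / 8 * bus_clock_ghz


def total_l2_mb(cores: int = PHYSICAL_CORES) -> int:
    """Aggregate L2$ capacity across physical cores."""
    return L2_PER_CORE_MB * cores


if __name__ == "__main__":
    print(peak_bus_gbs(2.0))   # 32 bytes per clock at an assumed 2 GHz
    print(total_l2_mb())
```

At an assumed 2 GHz bus clock, a 256-bit bus moves 32 bytes per cycle, or 64 GB/s peak; the dual-core configuration carries 2 MB of L2$ in total.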
Shasta VISC Processor Microarchitecture
[Figure: two mirrored pipelines, one per physical core. Each runs 32KB L1 I$ → Fetch → Instruction Assembly → Threadlet Formation → Threadlet Allocation & Scheduling → Core EXE (labels RH, RF, with branch predictors, BP), plus an LSQ, 32KB L1 D$ and L2$. Thread TH0 is labeled on the diagram.]
Shasta VISC Processor Pipeline
[Figure: the dual-pipeline diagram annotated, left to right, with 3 stages, 3 stages, 6 stages + 1, 1 stage, and 1-2/4 cycles (apparently Fetch, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling, and EXE, respectively), plus a separate 4-stage annotation.]
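The extracted slide does not pair each stage count with a block, so the sketch below assumes the left-to-right reading (Fetch, Instruction Assembly, Threadlet Formation, Threadlet Allocation & Scheduling) and sums them into a front-end depth. The dictionary keys are hypothetical labels, and the result is only the front-end depth under that mapping, not a measured branch-mispredict penalty.

```python
# Assumed reading of the slide's stage annotations, left to right
# (hypothetical: the extracted slide does not pair counts with blocks).
FRONT_END_STAGES = {
    "fetch": 3,
    "instruction_assembly": 3,
    "threadlet_formation": 6 + 1,
    "threadlet_allocation_and_scheduling": 1,
}


def front_end_depth(stages=FRONT_END_STAGES) -> int:
    """Stages an instruction passes through before execution."""
    return sum(stages.values())


def front_end_latency_ns(clock_ghz: float, stages=FRONT_END_STAGES) -> float:
    """One stage per clock: depth / frequency (GHz -> ns)."""
    return front_end_depth(stages) / clock_ghz


if __name__ == "__main__":
    print(front_end_depth())            # 14 under this mapping
    print(front_end_latency_ns(2.0))    # at the 2 GHz target frequency
```

Under this reading an instruction spends 14 cycles, or 7 ns at 2 GHz, in the front end before reaching EXE.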
Shasta VISC Processor SMP
[Figure: two VISC dual-core processors (0 and 1), each with a global front end, HW threads (HW threadlets), two cores with their own L1 D$, an L2$ & memory, and an SIU, joined by coherency support between the two L2$ & memory blocks.]
Single-Thread Perf/Watt vs. OOO Width
[Chart: power (y-axis, with mobile and server regions marked) vs. SPEC 2006 score (geomean of int & fp) for out-of-order (OOO) cores: 2-, 3-, 5- and 8-wide single cores and 4-wide (2+2), 6-wide (3+3), 10-wide (5+5) and 16-wide (8+8) dual cores.]
* All cores scaled to 16nm. Geomean of 32-bit SPEC2006 int and fp components with GCC 4.6/4.7 or equivalent.
Shasta Delivers Server Performance at Mobile Power
[Chart: the same power vs. SPEC 2006 score comparison of 2- to 16-wide OOO configurations, with the Shasta VISC Processor (1VC/2C) added.]
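The charts score each core by a geometric mean of SPEC2006 int and fp components. Whether the inputs are the two suite scores or the individual benchmark ratios, the operation is the same. A minimal sketch follows; the scores in the usage line are hypothetical, not figures from the deck.

```python
import math
from typing import Sequence


def geomean(values: Sequence[float]) -> float:
    """Geometric mean computed in log space to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical int and fp suite scores, not figures from the deck
    print(round(geomean([16.0, 25.0]), 6))   # sqrt(16 * 25)
```

The geometric mean is used rather than the arithmetic mean so that a single outlier component cannot dominate the combined score, and so that ratios relative to a reference machine combine consistently.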
Mojave VISC SoC
VISC SoC Platform
- Scalable SoC architecture
  - Easy to add or remove devices in the SoC
  - Robust design methodology allows specification to tape-out in < 9 months
- High-performance, low-power system
  - Focus on memory/interconnect performance: >200 GB/s coherent fabric, 40 GB/s dual-channel DDR4, 200 GB/s L3
  - High-bandwidth network and storage connectivity
  - High-performance multimedia & graphics
- Industry-standard APIs and IP blocks
  - OpenGL, OpenGL ES, OpenMAX, OpenCL, AHCI SATA and xHCI USB
- Soft Machines enhanced SoC subsystems
  - Plug-n-play HW/SW architecture for simplified system S/W development
  - Security & virtualization
  - Dedicated management subsystem
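The 40 GB/s dual-channel DDR4 figure can be sanity-checked with the usual peak-bandwidth formula: channels × transfer rate × bytes per transfer. The sketch assumes standard 64-bit (8-byte) channels with ECC bits ignored; the deck does not state which speed grade the 40 GB/s figure assumes.

```python
# Peak DRAM bandwidth = channels x transfer rate x bytes per transfer.
# Assumes a standard 64-bit (8-byte) DDR4 channel; ECC bits are ignored.


def ddr_peak_gbs(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s for `channels` channels at `mt_per_s` MT/s."""
    return channels * mt_per_s * bytes_per_transfer / 1000   # MB/s -> GB/s


if __name__ == "__main__":
    print(ddr_peak_gbs(2, 2400))                          # dual-channel DDR4-2400
    print(ddr_peak_gbs(1, 2400), ddr_peak_gbs(4, 3200))   # 1-4 ch, 2400-3200
```

Dual-channel DDR4-2400 peaks at 38.4 GB/s, in line with the quoted ~40 GB/s. The 1-4 channel, 2400-3200 MT/s range listed for Mojave spans 19.2 to 102.4 GB/s peak.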
Mojave VISC SoC
- Quad VISC CPU: 2x Shasta processor
- Fast system memory: 1-4 ch. LP/DDR4 2400-3200; 1-8 MB 4-way interleaved system cache (WB/PF/DMA)
- Network/storage: 1-2 1G Ethernet with TCP partial offload/SR-IOV; dual SATA 6G storage; dual UFS flash; PCIe 3.0, 8 lanes
- Virtualization/security: System MMU & GIC; secure zones (secured peripherals, memory and message-signaled interrupts)
- Multimedia/graphics: GPU with 400 GFLOPS-1 TFLOPS and 800M-2B tri/sec, OpenCL 2.0, OpenGL ES 3.2; HEVC video enc/dec; DTS audio DSP
- Enterprise/management: trusted platform, HW AES/DES/HMAC/SHA, remote management, fine-grain DVFS
- Display/imaging: triple 4K display outputs; dual 20MP ISP inputs; HD Audio codec
[Figure: Mojave block diagram showing the quad VISC Shasta CPU, system cache and DRAM on the OCI, with storage, network, System MMU/GIC, secure zones, management CPU, GPU, ISP, video enc/dec, audio, PCIe, USB2/3 and display blocks.]
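The slide lists a "4-way interleaved" system cache without describing the mapping. Reading it as four banks with low-order interleaving on cache-line addresses, a common scheme, consecutive lines rotate across banks so streaming traffic spreads evenly. The 64-byte line size and the function name `bank_of` are assumptions for illustration only.

```python
# Hypothetical bank mapping for a 4-way interleaved system cache.
LINE_BYTES = 64   # assumed cache-line size (not stated in the deck)
N_BANKS = 4       # reading "4-way interleaved" as four banks


def bank_of(addr: int, line_bytes: int = LINE_BYTES, n_banks: int = N_BANKS) -> int:
    """Low-order line-address interleaving: consecutive lines rotate banks."""
    return (addr // line_bytes) % n_banks


if __name__ == "__main__":
    # Five consecutive 64-byte lines starting at address 0
    print([bank_of(a) for a in range(0, 5 * LINE_BYTES, LINE_BYTES)])
```

Every byte of a line maps to the same bank, and a sequential stream touches banks 0, 1, 2, 3, 0, ... so four requests to consecutive lines can be serviced in parallel. Real designs often hash higher address bits into the bank index to avoid pathological strides.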
Summary
- VISC architecture provides up to 4x perf/watt
  - Dynamic cores and threadlets provide 2-3x IPC speedup
  - Portable to all CPU ISAs
  - Applicable to a broad range of markets
- First VISC products to be announced in 2016
  - Shasta VISC processor delivers server-class performance at mobile power
  - Mojave VISC SoC platform scales from smart mobile to servers
- Contact Soft Machines for more information: Smi-info@softmachines.com