Scaling Mobile Compute to the Data Center. John Goodacre
|
|
- Melvin Lionel Nichols
- 8 years ago
- Views:
Transcription
1 Scaling Mobile Compute to the Data Center John Goodacre Director Technology and Systems, ARM Ltd. Cambridge Professor Computer Architectures, APT. Manchester
2 EuroServer Project EUROSERVER is a European commission FP7 part-funded project which is combining the technology trends of nanotechnology 3D integration, low-power SoC processor integration and the impossible requirements from next generation compute to investigate and build a solution for scalable, cost effective and flexible ARM-based micro-server system architecture suitable across multiple markets. 2
3 Why: The Perfect Storm? The ability to inexpensively extract heat from chips of any size maxed out. The ability to lower voltages with decreasing feature size slowed dramatically. The design complexity of single core microprocessors hits point of diminishing returns where more transistors could add little to the core s per cycle performance. We are approaching a limit on economically viable off-chip interconnect with the available technologies because of electrical issues. The cost of increasing wire-based signalling rates begins to grow considerably, especially in power and complexity of the interface circuits. The economics of chip production stops the growth in size of die. The electrical and power issues associated with driving off of a chip at high rates through inexpensive commodity packaging (such as found on commodity DIMMs) to a microprocessor chip more than a few inches away reached a point where further improvements become fairly difficult and/or expensive. 3
4 Trends in Data Centre In summary, it s a compute/thermal density issue
5 Scalability Beyond SMP Systems with Non-Unified (coherent) Get a few more CPUs within the coherent processor domain Challenges of non-unified access latencies and bandwidth Expensive in terms of power consumption Communicating systems through high-speed IO Opportunities for integration of IO directly onto processor bus High-latency through driver and IO abstractions System architecture defines software architecture Extend processor bus system wide (global addressing) Application virtual- inter-processor communication system-wide Caches need to be invisible to software, but coherence protocols will not scale
6 Scalability Concept: ARM Compute Unit Coherent view into hierarchy of this compute unit Compute Unit Local Coherent Interconnect Path to local addressable by this unit Path to address locations not accessible by this unit The Compute Unit unifies a number of localized compute resources Potentially both general purpose and other local heterogeneous accelerators Provides coherent and symmetric access across local resources Enabling a SMP capable operating system and resource sharing Each Compute Unit is registered at a partition within a system s global address space (GAS), including units with heterogeneous capability Any unit can access any remote location in the GAS (including cached) DMA can transfer between (virtually address cached) partitions
7 ARM: Example #1 Heterogeneous Compute Unit AMBA Extensions Interface (Master) Access to remote partitions Cortex-A15 Cluster CPU 0 L2 Cache CPU 2 CPU 1 CPU 3 Cortex-A7 Cluster CPU 0 L2 Cache CPU 2 CPU 1 CPU 3 Mali T600 series GPU Shader Core 0 Shader Core 1 L2 Cache SMMU Shader Core 2 Shader Core 3 Debug & Trace NIC 400 CoreSight Resources Mgt System Power JTAG & Trace PMIC/ APB Bus Cache Coherent Interconnect (CCI-400) AMBA Extensions Interface (Master) Unified coherent virtual TrustZone enabled DMC Compute Subsystem LPDDR2/DDR3 Controller DDR PHY or DDR Memory Local partition OpenCL TBA - H.S.A. Shared pointers Shared data Shared dispatch AMBA Extensions Interface (Slave) On-Chip Memories (RAM, ROM) Requests from remote partitions NIC 400 Base Peripheral
8 Example #2: 32-way ARM Cortex-A50 Series Up to 24 ports of AMBA 4 AXI4/ACE-Lite interfaces that support: Any ARM compatible CPU or GPU ARM compatible hardware accelerators ARM compatible high speed mastering I/O DSP or other compatible 3 rd party accelerator
9 EUROSERVER Approach: Next Generation Compute Market Competition is good one size doesn t fit all Make it flexible Critical mass is good a problem shared Keep it compatible Innovation is expensive and time consuming. Exploiting it even more so Collaborate appropriately Technology Towards the end of the road Utilize and extend the advancements in nanotechnology integration Increasingly high costs, NRE, Recurring Make the best economically accessible SoC, 3DIC, FDSOI/FINFET Use Embedded processors now feature capable for new markets The Common Requirement More for the same, more for less, specialized, specific, accepted by the critical mass software community leverage common Compute Units 9
10 Headline Objectives EUROSERVER will design and prototype technology, architecture, and systems software for the next generation of "Micro-Servers" to be used in building data centers. We will progress on the current state of the art in the following domains: Reduced Energy consumption by: (i) using ARM (64-bit) cores, which are the world-leading low-power processors, (ii) drastically reducing the core-to- distance by using novel silicon interposer packaging technology, and (iii) improving on the "energy proportionality Reduced Cost to build and operate each microserver, owing to (i) improved manufacturing yield when multiple smaller chiplets are placed on a larger silicon interposer(2.5d), and (ii) reduced physical volume of the packaged interposer module and (iii) and energy efficient semiconductor process (FDSOI) Better Software efficiency: Next Generation system SW that (i) manages the resources in a server that consists of multiple coherence-islands, and (ii) isolates and protects the multiple workloads from each other when they use the shared server resources of I/O, storage,, and interconnects. 10
11 Physical Distance: #1 challenge to address? What is a pico Joule? Joule is the energy of a given number of Watts of power for a Second duration For example 1 Joule is 1 Watt for 1 Second Consuming 20pJ at exascale level means delivering 20MW for 1 Second Examples of the issue: DDR mJ to deliver 1GB/s 1 bit of data across PCIe over 100pJ 1 bit of data 3-5mm along today s conductors ~1pJ/bit 1 DP flop ~20pJ Optical becomes a cheaper conductor only on distances more than 15-20cm
12 EUROSERVER Vision Low power Low cost SW efficiency Software interaction Classical blade architecture Active Interposer Integration EUROSERVER server on a chip Euroserver redesigns the entreprise server Lower cost through Improved Chiplet Yielding and Flexible System Integration Energy efficiency : low-power 64 bit processor, Everything local, Software Mutualization and sharing of I/O and common resources 12
13 EUROSERVER Applications Application Area Data Centres & Cloud Telecom infrastructures High-end Embedded Typical workloads Web-server hosting (LAMP/WAMP) Distributed databases (HADOOP) OLAP, OLTP workloads Traditional, relational databases (MySQL) Network communications Vehicle on-board computer Automatic vehicle location tracking Advanced security and surveillance 13
14 Focus of Developements 1. Next Generation Compute System Architecture Scale-out and flexible heterogeneous compute Define the Unimem model of a virtual-mappable shared GAS 2. Nanoscale, silicon-on-silicon 3D integration and IP Chiplet concept in 3DIC integration and test Heterogeneous silicon interface bridging compressing controller 3. Software Architecture and Frameworks Utilize the Unimem hierarchy Resource sharing and system wide reallocation 4. Applicability of Euroserver solution a) Device PCB Realization: Development System b) Embedded Mircoservers: Wireless Basestation c) Scale-out Servers: Cloud Services 14
15 Focus Area 1 / 4 System and Memory Architecture System Architecture Unimem Unified Memory Multiple independent nodes Heterogeneous IO and compute within and between nodes All share access to a common GAS Provide consistent/coherent access to all of its local space Topology Agnostic Address based routing across GAS Hierarchical address ownership Globally aware virtualization layer Sharing common IO/Peripherals System-wide power management Multiple coherent Islands Heterogeneous Compute 40-48bits Coherent Local Memory Essential IO and Peripherals Access to shared GAS Additional 40-48bits PA Shared IO and Peripheral Memory Model Any node can locally VA map any GAS location Single node can cache VA map of shared GAS 15
16 Thermal/ Cooling Peripherals Storage Power /Ctrl/ Clock Fabric & I/O Connectors Connector s Focus Area 2/5: Nanoscale Integration Actual configurations to be confirmed Compute chiplets Fabric & I/O chiplet Interposer -Structure of chiplets - IP to bridge silicon -Technology to realize Memory chiplets Thermal Solution Euroserver device Memory / storage Power/Clock Ctrl Fabric & I/O Micro-server board view -How to reuse across different markets - Market specific requirements Microserver board 1 Microserver board 2 Microserver board N-1 Microserver board N 16
17 Modeling Device Cost Impact Consider a traditional 300mm2 2D-SoC Currently yielding means $ per device In < 20nm, that cost nearly doubles! Assume a 60:40 split between logic best in < 20nm and that in >= 20nm + = Rest of SoC in Key IP/processors Replicated chiplets low cost > 20nm in <=20nm And 3D stack increases yield yet again. Both die are smaller so yield higher and hence ~20x lower cost, even when adding today s 3D cost 17
18 Example Packaged Devices On schedule for end 2015 Proposal for end 2016
19 Focus Area 3/5: Software Architecture Systems software level - Resource agregation System architecture level Chiplet level 19
20 Intralink (L1) Intralink (L1) Intralink (L1) Intralink (L1) System Architecture Chiplet: One or more CPU cores Level-0 Interconnect Single coherence island Compute Node 0 DMA Node: One or more chiplets Level-1 interconnect Shared IO (Ethernet) and Storage Compute Node 1 DMA Compute Node 2 DMA μserver: (EuroServer) 1 or more Nodes Scale-out server using Local-IO or HPC via Level-3 interconnect EuroServer System Compute Node. DMA ARM 64b ARM 64b ARM 64b ARM 64b ARM 64b ARM 64b ARM 64b ARM 64b HPC System: Multiple Nodes sharing Level-3 interconnect (topology agnostic) Node-SSD Intralink (L1) Local-IO FPGA/OpenC L Intralink (L1) Intralink (L1) Node-SSD Local-IO Node-SSD Local-IO (Optional) Global System Intralink (L3) Node-SSD Intralink (L1) Local-IO Each Coherence Island has its own local independent global (coherent) address space (GAS L ) Coherence Islands communicating through multi-level Interconnect Sharing via page mapping a common remote global address space (GAS R ) Either Remote DMA or direct Remote Load/Store from application virtual page mapping
21 EUROSERVER: Unimem Memory Model Today s server have simple DRAM or DEVICE types Sequentially consistent cached dram is very expensive.. Even more expensive to scale beyond a single processor socket Key observations of Euroserver No need for sequential consistency in DC workloads Applications tend to partition datasets and its access Best to place the processor (and its cache) near dataset of an application task Unimem extends today s model and enables: Maintains a consistent and coherent access from Node to local DRAM Adds access to any system-wide resource by any workload Allow local processors to cache local on remote accesses Allow dynamically change ownership of regions Seamlessly supported in today s communication and shared API Can be implemented efficiently using ARM + SoC design principles does not require modifications to software applications
22 Demonstrating low latency high bandwidth resource sharing Access anything anywhere Resources do not need to be duplicated per node Uses numa-like cost tables System level MMU enables multiple independent PA plus a shared virtually mappable PA Various OS and library services already ported TCP/UDP over direct access and DMA RAM stealing to create increase local pool mmap, and other shmem schemes Multi host shared NIC
23 Access to Petabytes of DRAM from in a single application Use the DRAM physically connected to different coherent islands in same and remote devices Allows a RAM demanding application access to capacities higher than can be supported by a single device Memory consistency rules allow peer (same package) to be have similar latencies to local
24 Scalable shared model support Allows multiple independent coherent islands to share global addresses Virtual mapped DMA copies Absolute shared address pointers Memory based sycronizations Can be managed via Numa OS No inter-island coherence protocol No coherence directory Direct coherent r/w between islands Pipelined for high bandwidth Native processor addressing for low latency communication
25 Low Latency Communications Either direct load/store for single transaction communication, or virtually mapped DMA for block transfers Aliased memories for broadcasts Native processor addresses used for nonabstracted communication Consistency rules allow data movement directly between LLC of nodes
26 Architecture enables duplicating only the resources the app needs Scaling using current commodity architecture Then connect via some interface card via PCIe Scaling with EuroServer 26 Shared GAS Constructed using chiplets then direct virtually map remote and devices
27 Initial software model compliant discrete prototype Enables OS and library porting. Targeting KVM with NUMA guest Development of share IO resources and FPGA images Forms basis of subsequent silicon based compute node 32x Cortex-A53 (4x by 8xSMP) per packaged part Enables subsequent projects with silicon speed prototype
28 Focus Area 4/4 part-a: Demonstration of a Micro-server Tentative specs Form factor: standard ComExpress or Custom Size: ~ 8 x 12 cm Components: 1 Euroserver Device (compute node) Single board-to-carrier connector Local storage, local Network Fabric Interconnect Board management and control logic and sensors Local test, debug and peripherals to consider The Micro-server board will populate (in different number) both the Embedded Server and the Enterprise Server Thermal Solution 28 28
29 Key Focus 4/4 part-b: Demo in an Embedded Server Tentative specs Target form factor: Eurotech Mounted Mobile Computer Size: 13 x 25 x 8 cm Prototype targeted for rugged sealed enclosure and extended temperature range Modular system design Shared design for micro-server board and carrier board with Enterprise Server version Up to 2 micro-server slots Support for optional general purpose card (using 1 slot) for specific function or peripheral (Reference only) (Reference only) 9 to 36 Vdc in; 30W max; Passive conduction cooling Support for Backup battery pack to investigate 29
30 Key focus 4/4 part-c: Demo as an Enterprise Server Tentative Specs: Compatible form factor with standard server rack design (Reference only) Minimum density design 1 node pitch (16 nodes per rack width) It enables 64 micro-server cards (16 x 4) in a standard cabinet and 12 kw compute power in a 42U rack Carrier board design shared with Embedded Server version Population options: any combination of micro-server node cards and general purpose cards (each card uses one slot) Support for multiple thermal and cooling options 30
31 Summary ARM Processor technology has the capabilities for high performance compute Best to think more than swap the ISA Its likely the business around competition that will decide on how ARM is used in the data centre EUROSERVER is a next small step in taking a longer term view at how the ARM technology could enable a new silicon ecosystem around ARM in the data center 31
Hardware accelerated Virtualization in the ARM Cortex Processors
Hardware accelerated Virtualization in the ARM Cortex Processors John Goodacre Director, Program Management ARM Processor Division ARM Ltd. Cambridge UK 2nd November 2010 Sponsored by: & & New Capabilities
More information7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
More informationCopyright 2013, Oracle and/or its affiliates. All rights reserved.
1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information
More informationOpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC
OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,
More informationXeon+FPGA Platform for the Data Center
Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system
More informationHigh Performance or Cycle Accuracy?
CHIP DESIGN High Performance or Cycle Accuracy? You can have both! Bill Neifert, Carbon Design Systems Rob Kaye, ARM ATC-100 AGENDA Modelling 101 & Programmer s View (PV) Models Cycle Accurate Models Bringing
More informationCloud Data Center Acceleration 2015
Cloud Data Center Acceleration 2015 Agenda! Computer & Storage Trends! Server and Storage System - Memory and Homogenous Architecture - Direct Attachment! Memory Trends! Acceleration Introduction! FPGA
More informationARM Webinar series. ARM Based SoC. Abey Thomas
ARM Webinar series ARM Based SoC Verification Abey Thomas Agenda About ARM and ARM IP ARM based SoC Verification challenges Verification planning and strategy IP Connectivity verification Performance verification
More informationAll Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule
All Programmable Logic Hans-Joachim Gelke Institute of Embedded Systems Institute of Embedded Systems 31 Assistants 10 Professors 7 Technical Employees 2 Secretaries www.ines.zhaw.ch Research: Education:
More informationIntroduction to AMBA 4 ACE and big.little Processing Technology
Introduction to AMBA 4 and big.little Processing Technology Ashley Stevens Senior FAE, Fabric and Systems June 6th 2011 Updated July 29th 2013 Page 1 of 15 Why AMBA 4? The continual requirement for more
More informationApplied Micro development platform. ZT Systems (ST based) HP Redstone platform. Mitac Dell Copper platform. ARM in Servers
ZT Systems (ST based) Applied Micro development platform HP Redstone platform Mitac Dell Copper platform ARM in Servers 1 Server Ecosystem Momentum 2009: Internal ARM trials hosting part of website on
More informationA Scalable VISC Processor Platform for Modern Client and Cloud Workloads
A Scalable VISC Processor Platform for Modern Client and Cloud Workloads Mohammad Abdallah Founder, President and CTO Soft Machines Linley Processor Conference October 7, 2015 Agenda Soft Machines Background
More informationDigitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de jens_onno.krah@fh-koeln.de NIOS II 1 1 What is Nios II? Altera s Second Generation
More informationARM Processors and the Internet of Things. Joseph Yiu Senior Embedded Technology Specialist, ARM
ARM Processors and the Internet of Things Joseph Yiu Senior Embedded Technology Specialist, ARM 1 Internet of Things is a very Diverse Market Human interface Location aware MEMS sensors Smart homes Security,
More informationSupport a New Class of Applications with Cisco UCS M-Series Modular Servers
Solution Brief December 2014 Highlights Support a New Class of Applications Cisco UCS M-Series Modular Servers are designed to support cloud-scale workloads In which a distributed application must run
More informationEmerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting
Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Introduction Big Data Analytics needs: Low latency data access Fast computing Power efficiency Latest
More informationSupercomputing Clusters with RapidIO Interconnect Fabric
Supercomputing Clusters with RapidIO Interconnect Fabric Devashish Paul, Director Strategic Marketing, Systems Solutions devashish.paul@idt.com Ethernet Summit 2015 April 14-16, 2015 Santa Clara, CA Integrated
More informationHUAWEI Tecal E6000 Blade Server
HUAWEI Tecal E6000 Blade Server Professional Trusted Future-oriented HUAWEI TECHNOLOGIES CO., LTD. The HUAWEI Tecal E6000 is a new-generation server platform that guarantees comprehensive and powerful
More informationFPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab
FPGA Accelerator Virtualization in an OpenPOWER cloud Fei Chen, Yonghua Lin IBM China Research Lab Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search
More information760 Veterans Circle, Warminster, PA 18974 215-956-1200. Technical Proposal. Submitted by: ACT/Technico 760 Veterans Circle Warminster, PA 18974.
760 Veterans Circle, Warminster, PA 18974 215-956-1200 Technical Proposal Submitted by: ACT/Technico 760 Veterans Circle Warminster, PA 18974 for Conduction Cooled NAS Revision 4/3/07 CC/RAIDStor: Conduction
More informationLSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID
The vast majority of the world s servers count on LSI SAS & MegaRAID Trust us, build the LSI credibility in storage, SAS, RAID Server installed base = 36M LSI SAS inside 60% of servers 21 million LSI SAS
More informationA-CLASS The rack-level supercomputer platform with hot-water cooling
A-CLASS The rack-level supercomputer platform with hot-water cooling INTRODUCTORY PRESENTATION JUNE 2014 Rev 1 ENG COMPUTE PRODUCT SEGMENTATION 3 rd party board T-MINI P (PRODUCTION): Minicluster/WS systems
More informationData Center and Cloud Computing Market Landscape and Challenges
Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution
More informationWhat is a System on a Chip?
What is a System on a Chip? Integration of a complete system, that until recently consisted of multiple ICs, onto a single IC. CPU PCI DSP SRAM ROM MPEG SoC DRAM System Chips Why? Characteristics: Complex
More informationThe Future of Computing Cisco Unified Computing System. Markus Kunstmann Channels Systems Engineer
The Future of Computing Cisco Unified Computing System Markus Kunstmann Channels Systems Engineer 2009 Cisco Systems, Inc. All rights reserved. Data Centers Are under Increasing Pressure Collaboration
More informationPower Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis
White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This document
More informationSeeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
More informationSAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
More informationWhite Paper Solarflare High-Performance Computing (HPC) Applications
Solarflare High-Performance Computing (HPC) Applications 10G Ethernet: Now Ready for Low-Latency HPC Applications Solarflare extends the benefits of its low-latency, high-bandwidth 10GbE server adapters
More informationOpenSoC Fabric: On-Chip Network Generator
OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation
More informationData Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
More informationICRI-CI Retreat Architecture track
ICRI-CI Retreat Architecture track Uri Weiser June 5 th 2015 - Funnel: Memory Traffic Reduction for Big Data & Machine Learning (Uri) - Accelerators for Big Data & Machine Learning (Ran) - Machine Learning
More informationArchitekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik
Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik Contents Überblick: Aufbau moderner FPGA Einblick: Eigenschaften
More informationPCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation
PCI Express Impact on Storage Architectures and Future Data Centers Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies
More informationHow To Design A Single Chip System Bus (Amba) For A Single Threaded Microprocessor (Mma) (I386) (Mmb) (Microprocessor) (Ai) (Bower) (Dmi) (Dual
Architetture di bus per System-On On-Chip Massimo Bocchi Corso di Architettura dei Sistemi Integrati A.A. 2002/2003 System-on on-chip motivations 400 300 200 100 0 19971999 2001 2003 2005 2007 2009 Transistors
More informationbig.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices
big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices Brian Jeff November, 2013 Abstract ARM big.little processing
More informationARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG
ARM Processors for Computer-On-Modules Christian Eder Marketing Manager congatec AG COM Positioning Proprietary Modules Qseven COM Express Proprietary Modules Small Module Powerful Module No standard feature
More informationQsys and IP Core Integration
Qsys and IP Core Integration Prof. David Lariviere Columbia University Spring 2014 Overview What are IP Cores? Altera Design Tools for using and integrating IP Cores Overview of various IP Core Interconnect
More informationEDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation
PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies
More informationWhite Paper. S2C Inc. 1735 Technology Drive, Suite 620 San Jose, CA 95110, USA Tel: +1 408 213 8818 Fax: +1 408 213 8821 www.s2cinc.com.
White Paper FPGA Prototyping of System-on-Chip Designs The Need for a Complete Prototyping Platform for Any Design Size, Any Design Stage with Enterprise-Wide Access, Anytime, Anywhere S2C Inc. 1735 Technology
More informationOverview: X5 Generation Database Machines
Overview: X5 Generation Database Machines Spend Less by Doing More Spend Less by Paying Less Rob Kolb Exadata X5-2 Exadata X4-8 SuperCluster T5-8 SuperCluster M6-32 Big Memory Machine Oracle Exadata Database
More informationCoreSight SoC enabling efficient design of custom debug and trace subsystems for complex SoCs
CoreSight SoC enabling efficient design of custom debug and trace subsystems for complex SoCs Key steps to create a debug and trace solution for an ARM SoC Mayank Sharma, Technical Marketing Engineer,
More informationHow To Build A Cloud Computer
Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology
More informationArchitectures and Platforms
Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation
More informationHETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK
HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance
More informationPhotonic Networks for Data Centres and High Performance Computing
Photonic Networks for Data Centres and High Performance Computing Philip Watts Department of Electronic Engineering, UCL Yury Audzevich, Nick Barrow-Williams, Robert Mullins, Simon Moore, Andrew Moore
More informationNetwork connectivity controllers
Network connectivity controllers High performance connectivity solutions Factory Automation The hostile environment of many factories can have a significant impact on the life expectancy of PCs, and industrially
More informationMainframe. Large Computing Systems. Supercomputer Systems. Mainframe
1 Large Computing Systems Server Farm Networked cluster of interchangeable file/application servers Provides load balancing for availability and reliability Blade Server Server farm in a single cabinet
More informationServer Forum 2014. Copyright 2014 Micron Technology, Inc
DDR4 NVDIMM Standardization: Now and Future Server Forum 2014 Copyright 2014 Micron Technology, Inc NVDIMM Definition One of several Hybrid DIMM versions RDIMM/LRDIMM-like DRAM module with storage memory
More informationMPSoC Designs: Driving Memory and Storage Management IP to Critical Importance
MPSoC Designs: Driving Storage Management IP to Critical Importance Design IP has become an essential part of SoC realization it is a powerful resource multiplier that allows SoC design teams to focus
More informationOptimizing Data Center Networks for Cloud Computing
PRAMAK 1 Optimizing Data Center Networks for Cloud Computing Data Center networks have evolved over time as the nature of computing changed. They evolved to handle the computing models based on main-frames,
More informationA+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware
A+ Guide to Managing and Maintaining Your PC, 7e Chapter 1 Introducing Hardware Objectives Learn that a computer requires both hardware and software to work Learn about the many different hardware components
More informationIntel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family
Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL
More informationIntel Xeon +FPGA Platform for the Data Center
Intel Xeon +FPGA Platform for the Data Center FPL 15 Workshop on Reconfigurable Computing for the Masses PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA
More informationMemory Channel Storage ( M C S ) Demystified. Jerome McFarland
ory nel Storage ( M C S ) Demystified Jerome McFarland Principal Product Marketer AGENDA + INTRO AND ARCHITECTURE + PRODUCT DETAILS + APPLICATIONS THE COMPUTE-STORAGE DISCONNECT + Compute And Data Have
More informationImplementing a unified Datacenter architecture for service providers
Implementing a unified Datacenter architecture for service providers IT, commercial & network Tomas Fredberg, Ericsson HDS 8000 Chief Architect Content Introduction Ericsson HDS 8000 Hardware components
More informationAccelerating the Data Plane With the TILE-Mx Manycore Processor
Accelerating the Data Plane With the TILE-Mx Manycore Processor Bob Doud Director of Marketing EZchip Linley Data Center Conference February 25 26, 2015 1 Announcing the World s First 100-Core A 64-Bit
More informationFibre Channel over Ethernet in the Data Center: An Introduction
Fibre Channel over Ethernet in the Data Center: An Introduction Introduction Fibre Channel over Ethernet (FCoE) is a newly proposed standard that is being developed by INCITS T11. The FCoE protocol specification
More informationPower Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure
White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This
More informationFrom Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller
White Paper From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller The focus of this paper is on the emergence of the converged network interface controller
More informationSOFTWARE-DEFINED: MAKING CLOUDS MORE EFFICIENT. Julian Chesterfield, Director of Emerging Technologies
SOFTWARE-DEFINED: MAKING CLOUDS MORE EFFICIENT Julian Chesterfield, Director of Emerging Technologies DEFINING SOFTWARE DEFINED! FLEXIBILITY IN SOFTWARE Leveraging commodity CPU cycles to provide traditional
More informationOutline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip
Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC
More informationGoing Linux on Massive Multicore
Embedded Linux Conference Europe 2013 Going Linux on Massive Multicore Marta Rybczyńska 24th October, 2013 Agenda Architecture Linux Port Core Peripherals Debugging Summary and Future Plans 2 Agenda Architecture
More informationAccelerating I/O- Intensive Applications in IT Infrastructure with Innodisk FlexiArray Flash Appliance. Alex Ho, Product Manager Innodisk Corporation
Accelerating I/O- Intensive Applications in IT Infrastructure with Innodisk FlexiArray Flash Appliance Alex Ho, Product Manager Innodisk Corporation Outline Innodisk Introduction Industry Trend & Challenge
More informationVorlesung Rechnerarchitektur 2 Seite 178 DASH
Vorlesung Rechnerarchitektur 2 Seite 178 Architecture for Shared () The -architecture is a cache coherent, NUMA multiprocessor system, developed at CSL-Stanford by John Hennessy, Daniel Lenoski, Monica
More informationNetwork Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics. Qin Yin Fall Semester 2013
Network Virtualization and Data Center Networks 263-3825-00 Data Center Virtualization - Basics Qin Yin Fall Semester 2013 1 Walmart s Data Center 2 Amadeus Data Center 3 Google s Data Center 4 Data Center
More informationGadgetGatewayIa Configurable LON to IP Router and/or Remote Packet Monitor. ANSI 709.1 (LonTalk ) and ANSI 852 (IP) standards based.
Configurable LON to IP Router and/or Remote Packet Monitor. ANSI.1 (LonTalk ) and ANSI 852 (IP) standards based. Creator of GadgetTek Products 2966 Fort Hill Road Eagle Mountain, Utah 84043-4108 USA (voice)
More informationHP Moonshot: An Accelerator for Hyperscale Workloads
HP Moonshot: An Accelerator for Hyperscale Workloads Sponsored by HP, see HP Moonshot for more information www.hp.com/go/moonshot Executive Summary Hyperscale data center customers have specialized workloads,
More informationPCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)
PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters from One Stop Systems (OSS) PCIe Over Cable PCIe provides greater performance 8 7 6 5 GBytes/s 4
More informationpræsentation oktober 2011
Johnny Olesen System X presale præsentation oktober 2011 2010 IBM Corporation 2 Hvem er jeg Dagens agenda Server overview System Director 3 4 Portfolio-wide Innovation with IBM System x and BladeCenter
More informationA Survey on ARM Cortex A Processors. Wei Wang Tanima Dey
A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:
More informationManaging Data Center Power and Cooling
White PAPER Managing Data Center Power and Cooling Introduction: Crisis in Power and Cooling As server microprocessors become more powerful in accordance with Moore s Law, they also consume more power
More informationData Sheet FUJITSU Server PRIMERGY CX400 M1 Multi-Node Server Enclosure
Data Sheet FUJITSU Server PRIMERGY CX400 M1 Multi-Node Server Enclosure Data Sheet FUJITSU Server PRIMERGY CX400 M1 Multi-Node Server Enclosure Scale-Out Smart for HPC, Cloud and Hyper-Converged Computing
More informationHigh speed pattern streaming system based on AXIe s PCIe connectivity and synchronization mechanism
High speed pattern streaming system based on AXIe s connectivity and synchronization mechanism By Hank Lin, Product Manager of ADLINK Technology, Inc. E-Beam (Electron Beam) lithography is a next-generation
More informationOpenSPARC T1 Processor
OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware
More informationManagement of VMware ESXi. on HP ProLiant Servers
Management of VMware ESXi on W H I T E P A P E R Table of Contents Introduction................................................................ 3 HP Systems Insight Manager.................................................
More informationNetvisor Software Defined Fabric Architecture
Netvisor Software Defined Fabric Architecture Netvisor Overview The Pluribus Networks network operating system, Netvisor, is designed to power a variety of network devices. The devices Netvisor powers
More informationHP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief
Technical white paper HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Scale-up your Microsoft SQL Server environment to new heights Table of contents Executive summary... 2 Introduction...
More informationSeaMicro SM10000-64 Server
SeaMicro SM10000-64 Server Building Datacenter Servers Using Cell Phone Chips Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Ashutosh Dhodapkar, Gary Lauterbach, Sean Lie, Dhiraj Mallick, Jim Bauman, Sundar
More informationAn Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing
An Alternative Storage Solution for MapReduce Eric Lomascolo Director, Solutions Marketing MapReduce Breaks the Problem Down Data Analysis Distributes processing work (Map) across compute nodes and accumulates
More informationEmbedded Systems: map to FPGA, GPU, CPU?
Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven jos@vectorfabrics.com Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware
More informationHow To Build An Ark Processor With An Nvidia Gpu And An African Processor
Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced
More informationPCI Express and Storage. Ron Emerick, Sun Microsystems
Ron Emerick, Sun Microsystems SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature
More informationVirtualised MikroTik
Virtualised MikroTik MikroTik in a Virtualised Hardware Environment Speaker: Tom Smyth CTO Wireless Connect Ltd. Event: MUM Krackow Feb 2008 http://wirelessconnect.eu/ Copyright 2008 1 Objectives Understand
More informationI/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology
I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology Reduce I/O cost and power by 40 50% Reduce I/O real estate needs in blade servers through consolidation Maintain
More informationMotherboard- based Servers versus ATCA- based Servers
Motherboard- based Servers versus ATCA- based Servers Summary: A comparison of costs, features and applicability for telecom application hosting After many years of struggling for market acceptance, it
More informationFrom Bus and Crossbar to Network-On-Chip. Arteris S.A.
From Bus and Crossbar to Network-On-Chip Arteris S.A. Copyright 2009 Arteris S.A. All rights reserved. Contact information Corporate Headquarters Arteris, Inc. 1741 Technology Drive, Suite 250 San Jose,
More informationZadara Storage Cloud A whitepaper. @ZadaraStorage
Zadara Storage Cloud A whitepaper @ZadaraStorage Zadara delivers two solutions to its customers: On- premises storage arrays Storage as a service from 31 locations globally (and counting) Some Zadara customers
More informationAdvancing Applications Performance With InfiniBand
Advancing Applications Performance With InfiniBand Pak Lui, Application Performance Manager September 12, 2013 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server and
More informationPedraforca: ARM + GPU prototype
www.bsc.es Pedraforca: ARM + GPU prototype Filippo Mantovani Workshop on exascale and PRACE prototypes Barcelona, 20 May 2014 Overview Goals: Test the performance, scalability, and energy efficiency of
More informationChapter 2 Heterogeneous Multicore Architecture
Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is
More informationMaking Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
More informationResource Efficient Computing for Warehouse-scale Datacenters
Resource Efficient Computing for Warehouse-scale Datacenters Christos Kozyrakis Stanford University http://csl.stanford.edu/~christos DATE Conference March 21 st 2013 Computing is the Innovation Catalyst
More informationCisco UCS B-Series M2 Blade Servers
Cisco UCS B-Series M2 Blade Servers Cisco Unified Computing System Overview The Cisco Unified Computing System is a next-generation data center platform that unites compute, network, storage access, and
More informationAXIe: AdvancedTCA Extensions for Instrumentation and Test
AXIe: AdvancedTCA Extensions for Instrumentation and Test November 2012 Copyright 2012 AXIe Consortium, Inc. * AdvancedTCA is a registered trademark of PICMG. AXIe is a registered trademark of the AXIe
More informationThe Internet of Things: Opportunities & Challenges
The Internet of Things: Opportunities & Challenges What is the IoT? Things, people and cloud services getting connected via the Internet to enable new use cases and business models Cloud Services How is
More informationSecure Containers. Jan 2015 www.imgtec.com. Imagination Technologies HGI Dec, 2014 p1
Secure Containers Jan 2015 www.imgtec.com Imagination Technologies HGI Dec, 2014 p1 What are we protecting? Sensitive assets belonging to the user and the service provider Network Monitor unauthorized
More informationGPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
More informationHigh-Speed SERDES Interfaces In High Value FPGAs
High-Speed SERDES Interfaces In High Value FPGAs February 2009 Lattice Semiconductor 5555 Northeast Moore Ct. Hillsboro, Oregon 97124 USA Telephone: (503) 268-8000 www.latticesemi.com 1 High-Speed SERDES
More informationSGI High Performance Computing
SGI High Performance Computing Accelerate time to discovery, innovation, and profitability 2014 SGI SGI Company Proprietary 1 Typical Use Cases for SGI HPC Products Large scale-out, distributed memory
More information