Seeking Opportunities for Hardware Acceleration in Big Data Analytics

1 Seeking Opportunities for Hardware Acceleration in Big Data Analytics. Paul Chow, High-Performance Reconfigurable Computing Group, Department of Electrical and Computer Engineering, University of Toronto

2 Who am I? I am interested in building computing machines/accelerators. Why accelerate? To go faster, to reduce power. How to accelerate? We use FPGAs. What are we doing? Some of our projects. What are the opportunities in Big Data?

3 Moore's Law

4 Until "I'm melting!!!!" - Wizard of Oz, 1939

5 More cores!!! Programming hard, and it's not always fast enough

6 Need for accelerators: GPUs, FPGAs

7 What about these FPGAs? First, a quick introduction.

8 FIELD-PROGRAMMABLE GATE ARRAYS

9 FIELD-PROGRAMMABLE GATE ARRAYS (image credits: H.Roesner, flickr / MadPhysicist)

10 Hardwired FPGA functions: SRAM, SerDes, two ARM A9 cores, DSP, 1GE MAC

11 Computing with FPGAs: Fully customized dataflow and buffering. Tightly coupled pipelining of computations. Very low energy/computation ratio.

12 Example: Smith-Waterman DNA sequencing (Dynamic Programming). 49x to 980x speedup (I/O dependent) on Xilinx V4-LX160 FPGA vs GHz AMD Opteron (Storaasli/Cray 2009)
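
To make the dynamic-programming structure of Smith-Waterman concrete, here is a minimal software sketch of the local-alignment score recurrence. It is not the FPGA design cited on the slide; the function name and the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of any local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                      # start a new local alignment
                h[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,      # gap in b
                h[i][j - 1] + gap,      # gap in a
            )
            best = max(best, h[i][j])
    return best


score = smith_waterman("TTACGTTT", "ACG")
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. That independence is what a hardware pipeline can exploit, whereas this Python loop visits the cells one by one.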

13 Molecular Dynamics: Simulate motion of molecules at atomic level. Highly compute-intensive. Understand protein folding. Computer-aided drug design.

14 Platform for MD. Initial breakdown of CPU time: short-range nonbonded, long-range electrostatic, bonds. 12 short-range nonbonded (NBE) FPGAs with 2-3 pipelines per NBE FPGA; each runs 15-30x CPU. 2 PME FPGAs with fast memory and fibre optic interconnects (PME 420x). Bonds on quad-core Xeon server (Bonds 1x). [Diagram: NBE, PME and MEM modules attached to the FSBs of a quad-socket Xeon system with system memory; MEM-PME link labelled 72.5 GB/s]
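
As a rough picture of the work a short-range nonbonded engine does, the sketch below computes pairwise forces with a distance cutoff. It assumes a plain Lennard-Jones potential in reduced units; the slides do not say which potential, cutoff, or units the engines use, so `lj_forces` and its parameters are hypothetical.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Return per-particle Lennard-Jones forces for a list of (x, y, z) positions.

    Assumed potential: U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6),
    ignored beyond the cutoff distance. All parameters are illustrative.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Displacement vector from particle j to particle i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if r2 == 0.0 or r2 > cutoff * cutoff:
                continue  # skip coincident particles and pairs beyond the cutoff
            sr6 = (sigma * sigma / r2) ** 3
            # Force magnitude divided by r: 24*eps*(2*(s/r)^12 - (s/r)^6) / r^2.
            scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]
                forces[j][k] -= scale * d[k]  # Newton's third law
    return forces


pair = lj_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

The inner pair computation is the same fixed sequence of arithmetic for every pair, which is why it maps naturally onto a hardware pipeline fed with a stream of particle pairs.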

15 Performance: Significant overlap between all force calculations. ms is equivalent to between 80 and 88 InfiniBand-connected cores at U of T's supercomputer, SciNet hyperthreaded cores. Can we do better? 140 with hardware bond engines: change engine from SW to HW, no architectural change.

16 More Recently: "A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services" (ISCA 2014, June 16, 2014). Demonstrates that FPGAs can work in a data centre. Accelerate the Bing ranking engine. Double performance for only 10% increase in power. To be deployed in Bing in 2015!
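
The "double performance for only 10% increase in power" figure can be turned into a performance-per-watt gain with one division. The helper below is a sketch of that arithmetic only; the function name is hypothetical and the inputs are the two numbers quoted on the slide.

```python
def perf_per_watt_gain(speedup, power_increase):
    """Relative performance-per-watt gain for a given speedup and fractional power increase."""
    return speedup / (1.0 + power_increase)


# 2x performance at 1.10x power, as quoted on the slide.
gain = perf_per_watt_gain(2.0, 0.10)
print(f"{gain:.2f}x")  # prints 1.82x
```

So the quoted result corresponds to roughly 1.8 times the work per unit of power.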

17 Traditional Programming of an FPGA

18 Design Flow: design conception; design entry (schematic capture, Verilog); synthesis; functional simulation (repeat until the design is correct); physical design; timing simulation (repeat until timing requirements are met); chip configuration. Figure: A typical CAD system. Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

19 Placement. Figure: Placement of the circuit. Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

20 Routing. Figure: Routing for the placement. Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

21 Timing. Table: A summary of static timing analysis results. Brown and Vranesic, Fundamentals of Digital Logic Design with Verilog, 2nd ed.

22 FPGA Programmability Drawbacks: Need to understand hardware design. Implementation (compile-equivalent) takes hours. Established Hardware Description Languages (Verilog HDL, VHDL) are very low-level. Downside of design flexibility: no established programming models.

23 Programmability is Improving. Higher-level HDLs: SystemC, SystemVerilog, Bluespec. High-Level Synthesis ("C-to-gates"): very active area in research and industry, but so far only useful if the programmer understands hardware. Target is not just an instruction set: designing the processor too! Higher level of abstraction allows easier design exploration. Today, possible to achieve better than hand design in some cases.

24 So, why are FPGAs still interesting? Even at 10% of a CPU/GPU clock rate: very high performance for the right applications. Building an application-specific computer. Custom memory architectures. Data stream processing especially fits: caches don't get in the way. Fine-grain parallelism. Bit manipulation. Pattern matching. Performance per watt: 25W per chip versus 150W per chip. Compute density: racks of servers reduced to less than one.

25 Often used closer to the data

26 SOME POSSIBLY RELEVANT PROJECTS

27 FPGAs as OpenStack Cloud Resources. [Diagram labels: Server with FPGA(s), VFRs and Agent; Server with VMs, Agent and Hypervisor; OpenStack Control & Management (C & M)] Now we can boot a network-connected FPGA accelerator on demand, in seconds! Framework for HLS: use HLS to create and then drop in accelerators.

28 Data Center Example. [Diagram labels: Outside World, Site Requests, Uploads, Web Server, Internal Network, Data analysis engine, VM Resource, VFR] Dynamically scalable according to demand. Same OpenStack command to boot/release either resource!

29 Accelerators under Hadoop: A Hadoop cluster with one x86 as master node and eight ZedBoards as slave nodes. FPGA: computation. ARM processor: communication and task tracing.

30 MapReduce Data Flow: HDFS -> Data0, Data1, Data2 -> Map, Map, Map -> Reduce -> HDFS Result

31 MapReduce Data Flow: HDFS -> Data0, Data1, Data2 -> Map, Map, Map -> Reduce -> HDFS Result

32 MapReduce Data Flow with FPGA: HDFS -> Data0, Data1, Data2 -> FPGA Map, FPGA Map, FPGA Map -> Reduce -> HDFS Result
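
The data flow on the three slides above can be sketched in a few lines: input splits go through a map function, the emitted key/value pairs are grouped by key, and a reduce function combines each group. This is an in-process illustration, not Hadoop's API; `map_reduce`, `count_words`, and `total` are hypothetical names. Because the map function is just a parameter, it marks the place where an accelerated map stage would be substituted.

```python
from collections import defaultdict


def map_reduce(splits, map_fn, reduce_fn):
    """Run map_fn over every split, group the pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for split in splits:                  # each split stands in for one HDFS block
        for key, value in map_fn(split):  # map stage (the part offloaded on slide 32)
            groups[key].append(value)     # shuffle: group values by key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words(split):
    """Example map function: emit (word, 1) for every word in the split."""
    for word in split.split():
        yield word.lower(), 1


def total(key, values):
    """Example reduce function: sum the counts for one word."""
    return sum(values)


splits = ["big data on FPGA", "FPGA map tasks", "big FPGA"]
result = map_reduce(splits, count_words, total)
```

Swapping `count_words` for another callable changes the map stage without touching the grouping or reduce logic, which mirrors how the slide replaces only the Map boxes.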

33 PGAS: Global Shared Memory. [Diagram labels: Host CPU (x86), Embedded CPU, FPGA Custom Hardware, DRAM, SRAM, Application, Soft API, Hard API, Network Drivers, Network] Easy data transfers between all system memories. Productive but efficient high-level programming model.
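
The idea of a partitioned global address space is that every memory in the system owns one slice of a single address range, and any node can read or write any address with the same call. The toy model below simulates that inside one process; the class, its method names, and the node names are hypothetical, and a real system would carry remote accesses over the network rather than index a local list.

```python
class GlobalMemory:
    """Toy partitioned global address space: each node owns one equal-sized segment."""

    def __init__(self, node_names, words_per_node):
        self.nodes = list(node_names)
        self.words_per_node = words_per_node
        self.segments = {name: [0] * words_per_node for name in self.nodes}

    def _locate(self, address):
        # Split a global address into (owning node, offset inside its segment).
        node_index, offset = divmod(address, self.words_per_node)
        if not 0 <= node_index < len(self.nodes):
            raise IndexError(f"address {address} is outside the global space")
        return self.nodes[node_index], offset

    def owner(self, address):
        return self._locate(address)[0]

    def put(self, address, value):
        node, offset = self._locate(address)
        self.segments[node][offset] = value  # would be a network write if remote

    def get(self, address):
        node, offset = self._locate(address)
        return self.segments[node][offset]   # would be a network read if remote


mem = GlobalMemory(["x86", "arm", "fpga"], words_per_node=4)
mem.put(9, 42)  # global address 9 falls in the third segment, offset 1
```

The caller never names the target memory: `put` and `get` take only a global address, which is the programming convenience the slide describes as easy data transfers between all system memories.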

34 HPC system: BEE4 + PC hosts. PCIe 2 x8, 3.2 GB/s; 2x 16GB DDR; Gb Eth

35 Low power, embedded system: Zynq. [Diagram labels: DRAM, FPGA, 1Gbit Ethernet, On-Chip Network, CH, Gc, BlockRAM, x8]

36 Big Data Memory Systems: To explore the use of FPGAs in the architecture of Big Data memory systems. (Tools to Tackle Big Data)

37 Are there better memory architectures? Replace middleware in-memory system with application-specific architecture.

38 We Need Applications! We are not users. We do not understand the applications. Do you have a potential application? We could collaborate.

39 Conclusions: FPGAs are handicapped by the tools, which are hard to use. Good: abstraction is being raised. Bad: iteration time is very long, can be hours. FPGAs can provide better performance for many interesting applications. FPGAs can provide better performance per watt. There should be good opportunities for FPGAs in Big Data: what are they?

40 Thank you for your attention! Questions? Paul Chow

CFD Implementation with In-Socket FPGA Accelerators

CFD Implementation with In-Socket FPGA Accelerators CFD Implementation with In-Socket FPGA Accelerators Ivan Gonzalez UAM Team at DOVRES FuSim-E Programme Symposium: CFD on Future Architectures C 2 A 2 S 2 E DLR Braunschweig 14 th -15 th October 2009 Outline

More information

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab FPGA Accelerator Virtualization in an OpenPOWER cloud Fei Chen, Yonghua Lin IBM China Research Lab Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search

More information

Extending the Power of FPGAs. Salil Raje, Xilinx

Extending the Power of FPGAs. Salil Raje, Xilinx Extending the Power of FPGAs Salil Raje, Xilinx Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development Agenda The Evolution of

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

FPGA-based MapReduce Framework for Machine Learning

FPGA-based MapReduce Framework for Machine Learning FPGA-based MapReduce Framework for Machine Learning Bo WANG 1, Yi SHAN 1, Jing YAN 2, Yu WANG 1, Ningyi XU 2, Huangzhong YANG 1 1 Department of Electronic Engineering Tsinghua University, Beijing, China

More information

Achieving 10Gbps Line-rate Key-value Stores with FPGAs

Achieving 10Gbps Line-rate Key-value Stores with FPGAs Achieving 10Gbps Line-rate Key-value Stores with FPGAs Michaela Blott, Kimon Karras, Ling Liu, Kees Vissers - Xilinx Research Jeremia Baer, Zsolt Istvan - ETH Zurich Introduction Common middleware application

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Enabling Technologies for Distributed and Cloud Computing

Enabling Technologies for Distributed and Cloud Computing Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading

More information

Zynq-7000 Extensible Processing Platform Press Backgrounder

Zynq-7000 Extensible Processing Platform Press Backgrounder Press Backgrounder March 1, 2011 Zynq-7000 Extensible Processing Platform Press Backgrounder The first question you may ask about the new Extensible Processing Platform is what exactly was the thinking

More information

Cloud Data Center Acceleration 2015

Cloud Data Center Acceleration 2015 Cloud Data Center Acceleration 2015 Agenda! Computer & Storage Trends! Server and Storage System - Memory and Homogenous Architecture - Direct Attachment! Memory Trends! Acceleration Introduction! FPGA

More information

Xeon+FPGA Platform for the Data Center

Xeon+FPGA Platform for the Data Center Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

Enabling Technologies for Distributed Computing

Enabling Technologies for Distributed Computing Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies

More information

Zynq-7000 All Programmable SoC A Paradigm Shift for SoC-based Systems

Zynq-7000 All Programmable SoC A Paradigm Shift for SoC-based Systems Zynq-7000 All Programmable SoC A Paradigm Shift for SoC-based Systems Mark van der Bolt - Xilinx Account Manager BeNeLux October 2013 Demands of Today's Technology ASIC FPGA ASSP Structured ASIC Which

More information

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK Steve Oberlin CTO, Accelerated Computing US to Build Two Flagship Supercomputers SUMMIT SIERRA Partnership for Science 100-300 PFLOPS Peak Performance

More information

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting Introduction Big Data Analytics needs: Low latency data access Fast computing Power efficiency Latest

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule All Programmable Logic Hans-Joachim Gelke Institute of Embedded Systems Institute of Embedded Systems 31 Assistants 10 Professors 7 Technical Employees 2 Secretaries www.ines.zhaw.ch Research: Education:

More information

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25

FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 December 2014 FPGAs in the news» Catapult» Accelerate BING» 2x search acceleration:» ½ the number of servers»

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and

More information

FPGA HPC The road beyond processors

FPGA HPC The road beyond processors Slide 1 of 21 FPGA HPC The road beyond processors Jeff Mason Research Labs mason@xilinx.com Outline FPGA High Performance Computing (FHPC) FPGAs What can be done today Why not more (Double Precision FP)

More information

Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim?

Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim? Moving Beyond CPUs in the Cloud: Will FPGAs Sink or Swim? Successful FPGA datacenter usage at scale will require differentiated capability, programming ease, and scalable implementation models Executive

More information

Infrastructure Matters: POWER8 vs. Xeon x86

Infrastructure Matters: POWER8 vs. Xeon x86 Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report

More information

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik Contents Überblick: Aufbau moderner FPGA Einblick: Eigenschaften

More information

Michael Kagan. michael@mellanox.com

Michael Kagan. michael@mellanox.com Virtualization in Data Center The Network Perspective Michael Kagan CTO, Mellanox Technologies michael@mellanox.com Outline Data Center Transition Servers S as a Service Network as a Service IO as a Service

More information

Introduction to FPGAs

Introduction to FPGAs Introduction to FPGAs Outline: 1. What s s an FPGA? logic element fabric, i.e. logic gates + memory + clock trigger handling. 2. What s s so good about FPGAs? FPGA applications and capabilities FPGAs for

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Evolution of Multi-Core Technology. Energy Issue. HyperThreading and Multicore. Cluster Computing Paul A. Farrell 8/31/2011

Evolution of Multi-Core Technology. Energy Issue. HyperThreading and Multicore. Cluster Computing Paul A. Farrell 8/31/2011 Energy Issue Evolution of Multi- Technology Single-threading: Only one task processes at one time. Singlethreading Multitasking and Multithreading: Two or more tasks execute at one time by using content

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu

More information

Data Center and Cloud Computing Market Landscape and Challenges

Data Center and Cloud Computing Market Landscape and Challenges Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Digital electronics & Embedded systems

Digital electronics & Embedded systems FYS3240 PC-based instrumentation and microcontrollers Digital electronics & Embedded systems Spring 2015 Lecture #10 Bekkeng, 29.1.2015 Embedded systems An embedded system is a special-purpose system designed

More information

Embedded Systems: map to FPGA, GPU, CPU?

Embedded Systems: map to FPGA, GPU, CPU? Embedded Systems: map to FPGA, GPU, CPU? Jos van Eijndhoven jos@vectorfabrics.com Bits&Chips Embedded systems Nov 7, 2013 # of transistors Moore s law versus Amdahl s law Computational Capacity Hardware

More information

Embedded vision with FPGA vs CUDA processing. Directions and platform proposal

Embedded vision with FPGA vs CUDA processing. Directions and platform proposal Reconfigurable and High Performance Computing Lab INAOE Puebla, Mexico Embedded vision with FPGA vs CUDA processing. Directions and platform proposal WASC 2014 20 June 2014 Dr. Miguel Arias Estrada ariasmo@inaoep.mx

More information

A Realtime 1080P30 H.264 Encoder System on a Zynq Device

A Realtime 1080P30 H.264 Encoder System on a Zynq Device A Realtime 1080P30 H.264 Encoder System on a Zynq Device Introduction The Zynq all programmable System On a Chip is a recently introduced device from Xilinx which incorporates two ARM A9 CPU cores, I/O

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001

Agenda. Michele Taliercio, Il circuito Integrato, Novembre 2001 Agenda Introduzione Il mercato Dal circuito integrato al System on a Chip (SoC) La progettazione di un SoC La tecnologia Una fabbrica di circuiti integrati 28 How to handle complexity G The engineering

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Model-based system-on-chip design on Altera and Xilinx platforms

Model-based system-on-chip design on Altera and Xilinx platforms CO-DEVELOPMENT MANUFACTURING INNOVATION & SUPPORT Model-based system-on-chip design on Altera and Xilinx platforms Ronald Grootelaar, System Architect RJA.Grootelaar@3t.nl Agenda 3T Company profile Technology

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and

Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and Eli Levi Eli Levi holds B.Sc.EE from the Technion.Working as field application engineer for Systematics, Specializing in HDL design with MATLAB and Simulink targeting ASIC/FGPA. Previously Worked as logic

More information

Next Generation Operating Systems

Next Generation Operating Systems Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the

More information

Field Programmable Gate Array

Field Programmable Gate Array Field Programmable Gate Array FPGA Page 1 Using ROM as Combinational Logic B A B C F 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 1 1 1 1 0 0 1 1 1 1 C A C A B C 8 x 1 ROM (LUT) F F Page 2 Mapping Larger

More information

Accelerate Cloud Computing with the Xilinx Zynq SoC

Accelerate Cloud Computing with the Xilinx Zynq SoC X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce

More information

Kalray MPPA Massively Parallel Processing Array

Kalray MPPA Massively Parallel Processing Array Kalray MPPA Massively Parallel Processing Array Next-Generation Accelerated Computing February 2015 2015 Kalray, Inc. All Rights Reserved February 2015 1 Accelerated Computing 2015 Kalray, Inc. All Rights

More information

Incorporating FPGAs in Test Applications. NI Technical Conference Long Island Nov 10, 2009 Terry Stratoudakis, P.E. ALE System Integration

Incorporating FPGAs in Test Applications. NI Technical Conference Long Island Nov 10, 2009 Terry Stratoudakis, P.E. ALE System Integration Incorporating FPGAs in Test Applications NI Technical Conference Long Island Nov 10, 2009 Terry Stratoudakis, P.E. ALE System Integration Agenda Introduction to FPGAs in test FPGAs in test applications

More information

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures 11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the

More information

How NVMe and 3D XPoint Will Create a New Datacenter Architecture

How NVMe and 3D XPoint Will Create a New Datacenter Architecture How NVMe and 3D XPoint Will Create a New Datacenter Architecture Emilio Billi CTO A3Cube Inc Santa Clara, CA 1 The Storage Paradigm Shift To understand the future of the storage-memory devices we need

More information

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011 Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB

More information

Processor to Usher in a New Era of Computing

Processor to Usher in a New Era of Computing Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced

More information

Introducing the Singlechip Cloud Computer

Introducing the Singlechip Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

FPGAs 1. CMPE691/491: Advanced FPGA Design

FPGAs 1. CMPE691/491: Advanced FPGA Design FPGAs 1 CMPE691/491: Advanced FPGA Design FPGAs Large array of configurable logic blocks (CLB) connected via programmable interconnects Features and Specifications of FPGAs Basic Programmable Devices Features

More information

The virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers.

The virtualization of SAP environments to accommodate standardization and easier management is gaining momentum in data centers. White Paper Virtualized SAP: Optimize Performance with Cisco Data Center Virtual Machine Fabric Extender and Red Hat Enterprise Linux and Kernel-Based Virtual Machine What You Will Learn The virtualization

More information

The search engine you can see. Connects people to information and services

The search engine you can see. Connects people to information and services The search engine you can see Connects people to information and services The search engine you cannot see Total data: ~1EB Processing data : ~100PB/day Total web pages: ~1000 Billion Web pages updated:

More information

Intel Xeon +FPGA Platform for the Data Center

Intel Xeon +FPGA Platform for the Data Center Intel Xeon +FPGA Platform for the Data Center FPL 15 Workshop on Reconfigurable Computing for the Masses PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA

More information

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID The vast majority of the world s servers count on LSI SAS & MegaRAID Trust us, build the LSI credibility in storage, SAS, RAID Server installed base = 36M LSI SAS inside 60% of servers 21 million LSI SAS

More information

BUILD VERSUS BUY. Understanding the Total Cost of Embedded Design. www.ni.com/buildvsbuy

BUILD VERSUS BUY. Understanding the Total Cost of Embedded Design. www.ni.com/buildvsbuy BUILD VERSUS BUY Understanding the Total Cost of Embedded Design Table of Contents I. Introduction II. The Build Approach: Custom Design a. Hardware Design b. Software Design c. Manufacturing d. System

More information

NUMA-like architecture for Microservers

NUMA-like architecture for Microservers Foundation for Research and Technology Hellas (FORTH) Institute of Computer Science (ICS) NUMA-like architecture for Microservers Iakovos Mavroidis (jacob@ics.forth.gr) FORTH-ICS, Greece MPSoC 14, July

More information

MAXware: acceleration in HPC. R. Dimond, M. J. Flynn, O. Mencer and O. Pell Maxeler Technologies contact:

MAXware: acceleration in HPC. R. Dimond, M. J. Flynn, O. Mencer and O. Pell Maxeler Technologies contact: MAXware: acceleration in HPC R. Dimond, M. J. Flynn, O. Mencer and O. Pell Maxeler Technologies contact: flynn@maxeler.com Maxeler Technologies MAXware: acceleration in HPC 2 / 26 HPC: the case for accelerators

More information

LS DYNA Performance Benchmarks and Profiling. January 2009

LS DYNA Performance Benchmarks and Profiling. January 2009 LS DYNA Performance Benchmarks and Profiling January 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center The

More information

MASSIVELY SCALED INFRASTRUCTURE FOR VERIZON CLOUD COMPUTE AND STORAGE

MASSIVELY SCALED INFRASTRUCTURE FOR VERIZON CLOUD COMPUTE AND STORAGE MASSIVELY SCALED INFRASTRUCTURE FOR VERIZON CLOUD COMPUTE AND STORAGE Challenge Create the world s highest performance enterprise class public cloud Provide granular, customized configurations defined

More information

COMPUTER HARDWARE

COMPUTER HARDWARE 0113611 COMPUTER HARDWARE DIGITAL DESIGN AND CAD TOOLS Dr. Fethullah Karabiber DIGITAL HARDWARE 2 Integrated circuit chips are manufactered on a silicon wafer. The wafer is cut to produce the individual

More information

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor

Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor Von der Hardware zur Software in FPGAs mit Embedded Prozessoren Alexander Hahn Senior Field Application Engineer Lattice Semiconductor AGENDA Overview Mico32 Embedded Processor Development Tool Chain HW/SW

More information

A Holistic Model of the Energy-Efficiency of Hypervisors

A Holistic Model of the Energy-Efficiency of Hypervisors A Holistic Model of the -Efficiency of Hypervisors in an HPC Environment Mateusz Guzek,Sebastien Varrette, Valentin Plugaru, Johnatan E. Pecero and Pascal Bouvry SnT & CSC, University of Luxembourg, Luxembourg

More information

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER

DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER DEPLOYING AND MONITORING HADOOP MAP-REDUCE ANALYTICS ON SINGLE-CHIP CLOUD COMPUTER ANDREAS-LAZAROS GEORGIADIS, SOTIRIOS XYDIS, DIMITRIOS SOUDRIS MICROPROCESSOR AND MICROSYSTEMS LABORATORY ELECTRICAL AND

More information

Spectra-Q Engine BACKGROUNDER

Spectra-Q Engine BACKGROUNDER BACKGROUNDER Spectra-Q Engine 2010 s 2000 s 1990 s >50K >500K >5M FPGAs and SoCs have taken huge leaps with next-generation capabilities. These include multi-million logic elements, complex interface protocols,

More information

Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U

Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U Hitachi Virtage Embedded Virtualization Hitachi BladeSymphony 10U Datasheet Brings the performance and reliability of mainframe virtualization to blade computing BladeSymphony is the first true enterprise-class

More information
