Secured Embedded Many-Core Accelerator for Big Data Processing

1 Secured Embedded Many-Core Accelerator for Big Data Processing
Amey Kulkarni, PhD Candidate
Advisor: Professor Tinoosh Mohsenin
Energy Efficient High Performance Computing (EEHPC) Lab, University of Maryland, Baltimore County

2 Agenda
PENC: Power Efficient Nano Clusters many-core and its implementation results
Cognitive-based hardware security for many-core architecture
Compressive Sensing (CS) OMP reconstruction algorithm: modifications and implementation on 65 nm CMOS technology, PENC many-core, and FPGA
CS-based framework for Big Data acceleration on hardware platforms
  Reduction in data transfers, and communication of secured encrypted data
  Implementation on three different platforms and evaluation in terms of hardware overhead
Integration of the CS-based framework with the Hadoop platform for Big Data acceleration

3 PENC: Power Efficient Nano Clusters by EEHPC
The PENC many-core acts as an accelerator that works with a host processor for data analytics and machine learning applications.
The architecture, simulator, and Verilog ASIC implementation are fully developed by EEHPC lab members.
Composed of 64 processing clusters: 192 low-power RISC cores.
Fully placed and routed processors and routers in 65 nm, 1 V CMOS, with a very small chip area of 5.5 mm² for 64 clusters.
Total power of the 8.7 W
NSF Grant#
ISCAS 12, ISQED 13, ISLPED 14, ISCAS 16, GLSVLSI 16, JETC 16

4 Cognitive Security Framework for PENC Many-Core
[Figure: FPGA platform test setup for the PENC many-core platform (64-). Block labels: Security Kernel & Interface; Attack Detection Module (ADM); Feature Sample; Feed-Back; Enable; routers (R, R1); CLK ADM; CLK MC; Inter- Trigger; Intra- Trigger; Trojan Insertion Module.]
The Attack Detection Module is implemented using an online machine learning technique to prevent unexpected attacks.
Assumptions: processing cores and memories are safe; the Trojan is inserted at the design phase and triggers malicious activity on the router internally at run time.
Detects three different Denial-of-Service attacks.
Hardware area overhead of only 0.26%; requires 3 cycles for Trojan detection and performs 2.4x faster than the state-of-the-art implementation.
DARPA Grant
JETC 16, ISQED 16, HOST 16

5 Compressive Sensing (CS): OMP Reconstruction Algorithm
We propose a platform-independent and reconfigurable Orthogonal Matching Pursuit (OMP) CS reconstruction algorithm (evaluated on PENC, FPGA, and GPU).
[Figures: OMP CS reconstruction algorithm; architecture of the OMP CS reconstruction algorithm; fixed-point hardware implementation; analysis of the OMP algorithm for a 1024x1024 image on the PENC many-core; analysis of the OMP algorithm on a Xilinx Virtex-7 FPGA.]
GLSVLSI 14, ISCAS 15
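
The slide does not spell out the algorithm, so the following is only a minimal textbook-style sketch of OMP, not the reconfigurable hardware architecture proposed here. It shows the usual three steps per iteration: identification (pick the dictionary column most correlated with the residual), a least-squares fit over the selected columns, and a residual update. The function names, the normal-equation solver, and the toy dictionary are all illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_least_squares(cols, y):
    """Least-squares fit of y by the given columns, via the normal equations
    and Gaussian elimination with partial pivoting (fine for tiny systems)."""
    k = len(cols)
    # Augmented system [C^T C | C^T y]
    aug = [[dot(cols[i], cols[j]) for j in range(k)] + [dot(cols[i], y)]
           for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(p + 1, k):
            factor = aug[r][p] / aug[p][p]
            for c in range(p, k + 1):
                aug[r][c] -= factor * aug[p][c]
    x = [0.0] * k
    for p in reversed(range(k)):
        tail = sum(aug[p][c] * x[c] for c in range(p + 1, k))
        x[p] = (aug[p][k] - tail) / aug[p][p]
    return x


def omp(columns, y, sparsity):
    """Recover a sparse coefficient vector; `columns` holds unit-norm
    dictionary columns, `sparsity` is the number of iterations."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        # Identification: strongest correlation with the current residual
        scores = [abs(dot(col, residual)) for col in columns]
        for j in support:
            scores[j] = -1.0  # never pick the same column twice
        support.append(max(range(len(columns)), key=scores.__getitem__))
        # Least squares over the columns selected so far
        chosen = [columns[j] for j in support]
        coeffs = solve_least_squares(chosen, y)
        # Residual update
        residual = [yi - sum(c * col[i] for c, col in zip(coeffs, chosen))
                    for i, yi in enumerate(y)]
    x = [0.0] * len(columns)
    for j, c in zip(support, coeffs):
        x[j] = c
    return x


# Toy example (hypothetical data): 4 unit-norm columns in R^3
toy_columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
toy_signal = [3.0, 0.0, -2.0]  # equals 3 * column 0 - 2 * column 2
toy_estimate = omp(toy_columns, toy_signal, sparsity=2)
```

The normal-equation solver keeps the sketch dependency-free; a practical implementation would typically use an incremental QR or Cholesky update instead.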

6 Compressive Sensing (CS): OMP Reconstruction Algorithm
We propose two modifications to the OMP CS reconstruction algorithm:
Gradient Descent OMP (GD-OMP) reduces the complexity of the least-squares kernel.
Hard Thresholding OMP (HT-OMP) reduces the complexity of the identification kernel.
[Figures: architecture of the GD-OMP algorithm; architecture of the HT-OMP algorithm; quality of OMP CS reconstruction.]
Table: ASIC implementation analysis on 65 nm CMOS, 1 V technology
Columns: Architecture | Signal Size | Max Freq (MHz) | Reconstruction Time (µs) | Area (mm²) | ADP (mm²-µs)
Rows: OMP (base) [Jerome et al.]; HT-OMP (This Work) (1.6x); GD-OMP (This Work) (1.9x)
TVLSI 16*
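
The slide gives no details of GD-OMP beyond its goal, so the sketch below only illustrates the general idea of replacing a direct least-squares solve with gradient-descent iterations, which need just multiply-accumulate operations and no matrix inversion. It should not be read as the authors' kernel; the function name, step size, and iteration count are assumptions chosen for a tiny example.

```python
def gradient_descent_least_squares(cols, y, step=0.5, iterations=200):
    """Approximately minimize 0.5 * ||y - C x||^2 by plain gradient descent.

    `cols` are the selected dictionary columns (the columns of C).
    The step size must be below 2 / (largest eigenvalue of C^T C);
    0.5 is an assumption that holds for the toy data below."""
    k, m = len(cols), len(y)
    x = [0.0] * k
    for _ in range(iterations):
        # residual r = y - C x
        residual = [y[i] - sum(x[j] * cols[j][i] for j in range(k))
                    for i in range(m)]
        # gradient of the objective is -C^T r
        grad = [-sum(cols[j][i] * residual[i] for i in range(m))
                for j in range(k)]
        x = [x[j] - step * grad[j] for j in range(k)]
    return x


# Toy example (hypothetical data): y = 2 * column 0 + 1 * column 1
gd_columns = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
gd_signal = [2.6, 0.8, 0.0]
gd_estimate = gradient_descent_least_squares(gd_columns, gd_signal)
```

Each iteration costs two passes over the selected columns, so the trade-off is between the iteration count and the cost of an exact solve.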

7 CS-based Framework Implementation on Different Platforms
Table: CS-based Framework for Big Data Acceleration
Columns: Platform | Image Size | Chip Area (mm²) | Power (mW) | Execution Time (ms)
Platforms: ARM CPU (28 nm, 0.9 V); Nvidia Jetson TK1 GPU (28 nm, 0.9 V); PENC Many-Core (65 nm, 1 V)
Surviving values, in platform order: 2MB ,120; 2MB ,225; 2MB ,019
The CS-based framework is fully implemented for the image reconstruction and face detection applications on the NVIDIA TK1 CPU+GPU platform and the PENC many-core.
Compared to the CPU and GPU implementations, PENC achieves 15x and 200x less energy consumption and 8x and 177x faster execution time, respectively.
[Figures: current analysis on ARM CPU; quality analysis of the CS-based framework; power measurement setup; current analysis on K1 GPU.]
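
Energy and speedup ratios like those quoted on this slide follow from measured power and execution time, since energy is power multiplied by time. The sketch below shows that arithmetic only; the numbers in it are placeholders invented for illustration and are not the measurements from the table, and the function names are hypothetical.

```python
def energy_mj(power_mw, time_ms):
    """Energy in millijoules: mW * ms gives microjoules, so divide by 1000."""
    return power_mw * time_ms / 1000.0


def improvement(baseline, accelerator):
    """How many times smaller the accelerator's figure is than the baseline's."""
    return baseline / accelerator


# Placeholder measurements (NOT taken from the slides)
baseline_power_mw, baseline_time_ms = 1000.0, 300.0
accel_power_mw, accel_time_ms = 50.0, 60.0

baseline_energy = energy_mj(baseline_power_mw, baseline_time_ms)  # 300 mJ
accel_energy = energy_mj(accel_power_mw, accel_time_ms)           # 3 mJ

energy_gain = improvement(baseline_energy, accel_energy)   # energy ratio
speedup = improvement(baseline_time_ms, accel_time_ms)     # execution-time ratio
```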

8 CS-based Framework for Big Data Acceleration using Secured PENC on Hadoop Platform
[Figures: streaming data reconstruction quality analysis; quality of EEG signal and image reconstruction.]
We propose compressive sensing (CS) along with the PENC accelerator to reduce data communication and storage in big data streaming by up to 70%.
The CS-based framework with PENC has been tested for machine learning and data analytics algorithms, e.g., health monitoring, convolutional neural networks, deep learning, and statistical analysis of sparse and dense matrices.
The framework has been implemented on a low-power Jetson GPU, an ARM CPU, and PENC.
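
To make the data-reduction claim concrete: in compressive sensing, a length-n signal is replaced by m < n linear measurements, so transmitting or storing the measurements instead of the signal saves a fraction 1 - m/n. The sketch below assumes a random ±1 sensing matrix and m/n = 0.3; the slides do not state which sensing matrix or ratio the framework uses, and all names are hypothetical.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Random +/-1 sensing matrix with m rows and n columns (an assumption)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]


def compress(phi, signal):
    """Compute the measurement vector y = phi * signal."""
    return [sum(p * s for p, s in zip(row, signal)) for row in phi]


def data_reduction(m, n):
    """Fraction of values saved by sending m measurements instead of n samples."""
    return 1.0 - m / n


n_samples, n_measurements = 100, 30
sparse_signal = [0.0] * n_samples
sparse_signal[7], sparse_signal[42] = 1.5, -2.0   # toy sparse signal

phi = make_sensing_matrix(n_measurements, n_samples)
measurements = compress(phi, sparse_signal)
saved = data_reduction(n_measurements, n_samples)   # about 0.7
```

The receiver would then recover the signal from `measurements` with a reconstruction algorithm such as OMP.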

9 Publications
1) Amey Kulkarni, Youngok Pino, Matthew French and Tinoosh Mohsenin, "Adaptive Realtime Trojan Detection Framework through Machine Learning", in Hardware Oriented Security and Trust (HOST), 2016 IEEE International Symposium on, 3-5 May 2016.
2) Amey Kulkarni, Ali Jafari, Chris Sagedy and Tinoosh Mohsenin, "Sketching-Based High-Performance Biomedical Big Data Processing Accelerator", 49th ISCAS 2016, Canada (Invited Talk), May 2016.
3) Amey Kulkarni, Youngok Pino and Tinoosh Mohsenin, "SVM-based Real-Time Hardware Trojan Detection for Many-Core Platform", in 17th International Symposium on Quality Electronic Design (ISQED), March 2016.
4) Amey Kulkarni, Youngok Pino, Matthew French and Tinoosh Mohsenin, "Real-Time Anomaly Detection Framework for Many-Core Router through Machine Learning Techniques", ACM Journal on Emerging Technologies in Computing Systems.
5) Amey Kulkarni, Ali Jafari, Colin Shea, and Tinoosh Mohsenin, "CS-based Secured Big Data Processing on FPGA", 24th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Washington DC, USA.
6) Amey Kulkarni, Tahmid Abtahi, Emily Smith and Tinoosh Mohsenin, "Low Energy Sketching Engines on Many-Core Platform for Big Data Acceleration", in Proceedings of the 26th Edition of the Great Lakes Symposium on VLSI, GLSVLSI'16, Boston, MA, USA.

10 Publications
7) Amey Kulkarni and Tinoosh Mohsenin, "Accelerating Compressive Sensing Reconstruction OMP Algorithm with CPU, GPU, FPGA and Domain Specific Many-Core", 48th ISCAS 2015, Portugal, May 2015.
8) Tawana Khawari, Amey Kulkarni, Abbas Rahimi, Tinoosh Mohsenin and Houman Homayoun, "Energy-Efficient Mapping of Biomedical Applications on Domain-Specific Accelerator under Process Variation", International Symposium on Low Power Electronics and Design, ISLPED14.
9) Amey Kulkarni, Houman Homayoun and Tinoosh Mohsenin, "A Parallel and Reconfigurable Architecture for Efficient OMP Compressive Sensing Reconstruction", 24th GLSVLSI 2014, Houston, Texas, USA, May 2014 (27.32% Acceptance Rate).
10) Amey Kulkarni, Colin Shea, Tahmid Abtahi and Tinoosh Mohsenin, "Low Overhead CS-based Heterogeneous Framework for Big Data Acceleration", ACM Transactions on Embedded Computing Systems, 2016 (Submitted).

http://www.ece.ucy.ac.cy/labs/easoc/people/kyrkou/index.html BSc in Computer Engineering, University of Cyprus

http://www.ece.ucy.ac.cy/labs/easoc/people/kyrkou/index.html BSc in Computer Engineering, University of Cyprus Christos Kyrkou, PhD KIOS Research Center for Intelligent Systems and Networks, Department of Electrical and Computer Engineering, University of Cyprus, Tel:(+357)99569478, email: ckyrkou@gmail.com Education

More information

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:

More information

Performance Oriented Management System for Reconfigurable Network Appliances

Performance Oriented Management System for Reconfigurable Network Appliances Performance Oriented Management System for Reconfigurable Network Appliances Hiroki Matsutani, Ryuji Wakikawa, Koshiro Mitsuya and Jun Murai Faculty of Environmental Information, Keio University Graduate

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

A Comparison of ARM Implementations. by Matthew Hoffman and Erwins T. Milord

A Comparison of ARM Implementations. by Matthew Hoffman and Erwins T. Milord A Comparison of ARM Implementations by Matthew Hoffman and Erwins T. Milord Historical Overview Advanced Risc Machines (formerly Acorn Risc Machines) Originally conceived by Acorn Computers for business

More information

Andrey Filippov, Ph.D Elphel, Inc.

Andrey Filippov, Ph.D Elphel, Inc. Free Hardware Implementation of Ogg Theora Video Encoder Andrey Filippov, Ph.D Elphel, Inc. Background Started as a system based on embedded Linux, Elphel cameras dramatically increased performance by

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Security Enhanced Linux on Embedded Systems: a Hardware-accelerated Implementation

Security Enhanced Linux on Embedded Systems: a Hardware-accelerated Implementation Security Enhanced Linux on Embedded Systems: a Hardware-accelerated Implementation Leandro Fiorin, Alberto Ferrante Konstantinos Padarnitsas, Francesco Regazzoni University of Lugano Lugano, Switzerland

More information

Automated Method to Generate Bitstream Intellectual Property Cores for Virtex FPGAs

Automated Method to Generate Bitstream Intellectual Property Cores for Virtex FPGAs Automated Method to Generate Bitstream Intellectual Property Cores for Virtex FPGAs Edson L. Horta 1 and John W. Lockwood 2 1 Department of Electronic Engineering, Laboratory of Integrated Systems, EPUSP

More information

SSketch: An Automated Framework for Streaming Sketch-based Analysis of Big Data on FPGA *

SSketch: An Automated Framework for Streaming Sketch-based Analysis of Big Data on FPGA * SSketch: An Automated Framework for Streaming Sketch-based Analysis of Big Data on FPGA * Bita Darvish Rouhani, Ebrahim Songhori, Azalia Mirhoseini, and Farinaz Koushanfar Department of ECE, Rice University

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

ReCoSoC'11 Montpellier, France. Implementation Scenario for Teaching Partial Reconfiguration of FPGA

ReCoSoC'11 Montpellier, France. Implementation Scenario for Teaching Partial Reconfiguration of FPGA ReCoSoC'11 Montpellier, France Implementation Scenario for Teaching Partial Reconfiguration of FPGA Pierre Leray, Amor Nafkha, Christophe Moy SUPELEC/IETR 22 June 2011 SUPELEC - Campus de Rennes - France

More information

PyMTL and Pydgin Tutorial. Python Frameworks for Highly Productive Computer Architecture Research

PyMTL and Pydgin Tutorial. Python Frameworks for Highly Productive Computer Architecture Research PyMTL and Pydgin Tutorial Python Frameworks for Highly Productive Computer Architecture Research Derek Lockhart, Berkin Ilbeyi, Christopher Batten Computer Systems Laboratory School of Electrical and Computer

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

Algorithm and Programming Considerations for Embedded Reconfigurable Computers

Algorithm and Programming Considerations for Embedded Reconfigurable Computers Algorithm and Programming Considerations for Embedded Reconfigurable Computers Russell Duren, Associate Professor Engineering And Computer Science Baylor University Waco, Texas Douglas Fouts, Professor

More information

Power Reduction Techniques in the SoC Clock Network. Clock Power

Power Reduction Techniques in the SoC Clock Network. Clock Power Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a

More information

Reconfigurable System-on-Chip Design

Reconfigurable System-on-Chip Design Reconfigurable System-on-Chip Design MITCHELL MYJAK Senior Research Engineer Pacific Northwest National Laboratory PNNL-SA-93202 31 January 2013 1 About Me Biography BSEE, University of Portland, 2002

More information

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT SPOT THE ODD ONE BEFORE IT IS OUT flexaware.net Streaming analytics: from data to action Do you need actionable insights from various data streams fast?

More information

H.264 AVC Encoder IP Core Datasheet V.4.2, 2015

H.264 AVC Encoder IP Core Datasheet V.4.2, 2015 SOC H.264 AVC Video/Audio Encoder IP Core Datasheet Standard version I-Frame Version Slim Version Low-Bit-rate Version (with B frame) Special version for Zynq-7020 1. Product Overview (Integration information

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

Energiatehokas laskenta Ubi-sovelluksissa

Energiatehokas laskenta Ubi-sovelluksissa Energiatehokas laskenta Ubi-sovelluksissa Jarmo Takala Tampereen teknillinen yliopisto Tietokonetekniikan laitos email: jarmo.takala@tut.fi Energy-Efficiency Comparison: VGA 30 frames/s, 512kbit/s Software

More information

OpenSoC Fabric: On-Chip Network Generator

OpenSoC Fabric: On-Chip Network Generator OpenSoC Fabric: On-Chip Network Generator Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf MODSIM 2014 Presentation

More information

Reconfig'09 Cancun, Mexico

Reconfig'09 Cancun, Mexico Reconfig'09 Cancun, Mexico New OPBHW Interface for Real-Time Partial Reconfiguration of FPGA Julien Delorme, Amor Nafkha, Pierre Leray, Christophe Moy SUPELEC/IETR 10 December 2009 SUPELEC - Campus de

More information

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED

Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory. The Nation s Premier Laboratory for Land Forces UNCLASSIFIED Dr. Raju Namburu Computational Sciences Campaign U.S. Army Research Laboratory 21 st Century Research Continuum Theory Theory embodied in computation Hypotheses tested through experiment SCIENTIFIC METHODS

More information

FPGA area allocation for parallel C applications

FPGA area allocation for parallel C applications 1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University

More information

FPGA Music Project. Matthew R. Guthaus. Department of Computer Engineering, University of California Santa Cruz http://vlsida.soe.ucsc.

FPGA Music Project. Matthew R. Guthaus. Department of Computer Engineering, University of California Santa Cruz http://vlsida.soe.ucsc. Department of Computer Engineering, University of California Santa Cruz http://vlsida.soe.ucsc.edu Biographic Info 2006 PhD, University of Michigan in Electrical Engineering 2003-2005 Statistical Physical

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

SEMICON Europa Stephane Cordova, Director of Embedded Business Unit, Kalray

SEMICON Europa Stephane Cordova, Director of Embedded Business Unit, Kalray SEMICON Europa 2016 Stephane Cordova, Director of Embedded Business Unit, Kalray The real-time extreme computing processor Page 2 A NEW MARKET REQUIRING NEW SOLUTIONS EXPLOSION OF REAL-TIME AND CONNECTED

More information

Potential Thesis Topics in Networking

Potential Thesis Topics in Networking Geoff Xie 1 Potential Thesis Topics in Networking Prof. Geoffrey Xie xie@cs.nps.navy.mil, SP 544C April 2002 http://www.saamnet.org 1 What my Research Projects Offer Total learning experience for you You

More information

A General Framework for Tracking Objects in a Multi-Camera Environment

A General Framework for Tracking Objects in a Multi-Camera Environment A General Framework for Tracking Objects in a Multi-Camera Environment Karlene Nguyen, Gavin Yeung, Soheil Ghiasi, Majid Sarrafzadeh {karlene, gavin, soheil, majid}@cs.ucla.edu Abstract We present a framework

More information

Next Generation Operating Systems

Next Generation Operating Systems Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015 The end of CPU scaling Future computing challenges Power efficiency Performance == parallelism Cisco Confidential 2 Paradox of the

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Achieving Performance Isolation with Lightweight Co-Kernels

Achieving Performance Isolation with Lightweight Co-Kernels Achieving Performance Isolation with Lightweight Co-Kernels Jiannan Ouyang, Brian Kocoloski, John Lange The Prognostic Lab @ University of Pittsburgh Kevin Pedretti Sandia National Laboratories HPDC 2015

More information

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT

ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT 216 ON SUITABILITY OF FPGA BASED EVOLVABLE HARDWARE SYSTEMS TO INTEGRATE RECONFIGURABLE CIRCUITS WITH HOST PROCESSING UNIT *P.Nirmalkumar, **J.Raja Paul Perinbam, @S.Ravi and #B.Rajan *Research Scholar,

More information

Silicon Valley University Doctor of Computer Engineering (DCE) Program Outline and Study Plan

Silicon Valley University Doctor of Computer Engineering (DCE) Program Outline and Study Plan Silicon Valley University Doctor of Computer Engineering (DCE) Program Outline and Study Plan DCE Program Outline DCE Curriculum: Minimum 108 semester credit hours of graduate study: (a). 96 credit hours

More information

Hardware Acceleration for CST MICROWAVE STUDIO

Hardware Acceleration for CST MICROWAVE STUDIO Hardware Acceleration for CST MICROWAVE STUDIO Chris Mason Product Manager Amy Dewis Channel Manager Agenda 1. Introduction 2. Why use Hardware Acceleration? 3. Hardware Acceleration Technologies 4. Current

More information

Data Center and Cloud Computing Market Landscape and Challenges

Data Center and Cloud Computing Market Landscape and Challenges Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution

More information

HARNESS project: Managing Heterogeneous Compute Resources for a Cloud Platform

HARNESS project: Managing Heterogeneous Compute Resources for a Cloud Platform HARNESS project: Managing Heterogeneous Compute Resources for a Cloud Platform J. G. F. Coutinho 1, O. Pell 2, E. O Neill 3, P. Sanders 2, J. McGlone 3, P. Grigoras 1, W. Luk 1, and C. Ragusa 2 1 Imperial

More information

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 1 MapReduce on GPUs Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu 2 MapReduce MAP Shuffle Reduce 3 Hadoop Open-source MapReduce framework from Apache, written in Java Used by Yahoo!, Facebook, Ebay,

More information

International Workshop on Field Programmable Logic and Applications, FPL '99

International Workshop on Field Programmable Logic and Applications, FPL '99 International Workshop on Field Programmable Logic and Applications, FPL '99 DRIVE: An Interpretive Simulation and Visualization Environment for Dynamically Reconægurable Systems? Kiran Bondalapati and

More information

Hybrid System Design: The Only Practical Way. September 2010

Hybrid System Design: The Only Practical Way. September 2010 Hybrid System Design: The Only Practical Way September 2010 Company Introduction Adapteva has developed a ground breaking signal processing technology with energy efficiency of 50 GFLOPS/Watt Veteran SOC

More information

Hardware-accelerated Text Analytics

Hardware-accelerated Text Analytics R. Polig, K. Atasu, C. Hagleitner IBM Research Zurich L. Chiticariu, F. Reiss, H. Zhu IBM Research Almaden P. Hofstee IBM Research Austin Outline Introduction & background SystemT text analytics software

More information

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University

More information

FPGA-based ASIC Design and Verification

FPGA-based ASIC Design and Verification Cisco Green Research Symposium 5 March 2008 FPGA-based ASIC Design and Verification Dejan Markovic Electrical Engineering Department University of California, Los Angeles The Issues I am Going to Address

More information

Hardware Trojans Detection Methods Julien FRANCQ

Hardware Trojans Detection Methods Julien FRANCQ DEFENDING WORLD SECURITY Hardware Trojans Detection Methods Julien FRANCQ 2013, December the 12th Outline c 2013 CASSIDIAN CYBERSECURITY - All rights reserved TRUDEVICE 2013, December the 12th Page 2 /

More information

3DES ECB Optimized for Massively Parallel CUDA GPU Architecture

3DES ECB Optimized for Massively Parallel CUDA GPU Architecture 3DES ECB Optimized for Massively Parallel CUDA GPU Architecture Lukasz Swierczewski Computer Science and Automation Institute College of Computer Science and Business Administration in Łomża Lomza, Poland

More information

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,

More information

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet Professor Jiann-Liang Chen Friday, September 23, 2011 Wireless Networks and Evolutional Communications Laboratory

More information

Xilinx FPGA Implementation of a Pixel Processor for Object Detection Applications

Xilinx FPGA Implementation of a Pixel Processor for Object Detection Applications Xilinx FPGA Implementation of a Pixel Processor for Object Detection Applications Peter Mc Curry, Fearghal Morgan, Liam Kilmartin Communications and Signal Processing Research Unit, Department of Electronic

More information

Parallel BP Neural Network on Single-chip Cloud Computer

Parallel BP Neural Network on Single-chip Cloud Computer Parallel BP Neural Network on Single-chip Cloud Computer Boyang Li Chen Liu Department of Electrical and Computer Engineering Clarkson University Potsdam, New York, USA {boyli, cliu}@clarkson.edu Abstract

More information

Intel Xeon +FPGA Platform for the Data Center

Intel Xeon +FPGA Platform for the Data Center Intel Xeon +FPGA Platform for the Data Center FPL 15 Workshop on Reconfigurable Computing for the Masses PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA

More information

Mentor Phillip Balister. Advisor Professor Miriam Leeser

Mentor Phillip Balister. Advisor Professor Miriam Leeser Mentor Phillip Balister Advisor Professor Miriam Leeser 1 Why FPGA Acceleration in GNU Radio? Faster performance for some algorithms Frees processor to perform other tasks Low latency, deterministic response

More information

Intel Labs at ISSCC 2012. Copyright Intel Corporation 2012

Intel Labs at ISSCC 2012. Copyright Intel Corporation 2012 Intel Labs at ISSCC 2012 Copyright Intel Corporation 2012 Intel Labs ISSCC 2012 Highlights 1. Efficient Computing Research: Making the most of every milliwatt to make computing greener and more scalable

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Extending the Power of FPGAs. Salil Raje, Xilinx

Extending the Power of FPGAs. Salil Raje, Xilinx Extending the Power of FPGAs Salil Raje, Xilinx Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development Agenda The Evolution of

More information

Innovating in the Kushagra Vaid GM, Azure Cloud Hardware Engineering Microsoft

Innovating in the Kushagra Vaid GM, Azure Cloud Hardware Engineering Microsoft Innovating in the Cloud @Hyperscale Kushagra Vaid GM, Azure Cloud Hardware Engineering Microsoft 1 Operating at scale Infrastructure capex $B s annual spend on Cloud infrastructure Servers account for

More information

Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment. Sanjay Kulhari, Jian Wen UC Riverside

Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment. Sanjay Kulhari, Jian Wen UC Riverside Benchmark Study on Distributed XML Filtering Using Hadoop Distribution Environment Sanjay Kulhari, Jian Wen UC Riverside Team Sanjay Kulhari M.S. student, CS U C Riverside Jian Wen Ph.D. student, CS U

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

26 April (Next Friday)

26 April (Next Friday) MAXIMUM ADDITIONAL SCORE: 2 points Description: 1. Selection of a research paper of interest from a given list 2. Study of the selected paper and the referenced material 3. Presentation of the paper in

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware

More information

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

More information

Architectures and Platforms

Architectures and Platforms Hardware/Software Codesign Arch&Platf. - 1 Architectures and Platforms 1. Architecture Selection: The Basic Trade-Offs 2. General Purpose vs. Application-Specific Processors 3. Processor Specialisation

More information

Operating System Support for Multiprocessor Systems-on-Chip

Operating System Support for Multiprocessor Systems-on-Chip Operating System Support for Multiprocessor Systems-on-Chip Dr. Gabriel marchesan almeida Agenda. Introduction. Adaptive System + Shop Architecture. Preliminary Results. Perspectives & Conclusions Dr.

More information

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule

All Programmable Logic. Hans-Joachim Gelke Institute of Embedded Systems. Zürcher Fachhochschule All Programmable Logic Hans-Joachim Gelke Institute of Embedded Systems Institute of Embedded Systems 31 Assistants 10 Professors 7 Technical Employees 2 Secretaries www.ines.zhaw.ch Research: Education:

More information

Fraunhofer Institute for Telecommunications

Fraunhofer Institute for Telecommunications Fraunhofer Institute for Telecommunications Heinrich-Hertz-Institut SCUBE-ICT Emerging Berlin opportunities under FP7-ICT Call 5 Minsk, 25.-26.06.2009 Einsteinufer 37 10587 Berlin Germany Phone: Fax: email:

More information

E246: Electronics & Instrumentation. Lecture: Microprocessors and DSPs

E246: Electronics & Instrumentation. Lecture: Microprocessors and DSPs E246: Electronics & Instrumentation Lecture: Microprocessors and DSPs Microprocessor It is an integrated circuit that is the fundamental building block of a digital computer, controlled by software programs

More information

A Computer Vision System on a Chip: a case study from the automotive domain

A Computer Vision System on a Chip: a case study from the automotive domain A Computer Vision System on a Chip: a case study from the automotive domain Gideon P. Stein Elchanan Rushinek Gaby Hayun Amnon Shashua Mobileye Vision Technologies Ltd. Hebrew University Jerusalem, Israel

More information

Implementation and Design of AES S-Box on FPGA

Implementation and Design of AES S-Box on FPGA International Journal of Research in Engineering and Science (IJRES) ISSN (Online): 232-9364, ISSN (Print): 232-9356 Volume 3 Issue ǁ Jan. 25 ǁ PP.9-4 Implementation and Design of AES S-Box on FPGA Chandrasekhar

More information

Verfahren zur Absicherung von Apps. Dr. Ullrich Martini IHK, 4-12-2014

Verfahren zur Absicherung von Apps. Dr. Ullrich Martini IHK, 4-12-2014 Verfahren zur Absicherung von Apps Dr. Ullrich Martini IHK, 4-12-2014 Agenda Introducing G&D Problem Statement Available Security Technologies Smartcard Embedded Secure Element Virtualization Trusted Execution

More information

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Outline Modeling, simulation and optimization of Multi-Processor SoCs (MPSoCs) Università of Verona Dipartimento di Informatica MPSoCs: Multi-Processor Systems on Chip A simulation platform for a MPSoC

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Contents: About the Initiative, Report Summary, Participants Info, Participants Expertise, Suggested Discussion Topics, Selected Responses

Sifting through the many-core design space

Robert Mullins Computer Laboratory, University of Cambridge Robert.Mullins@cl.cam.ac.uk www.cl.cam.ac.uk/~rdm34 17th August (2pm). CaRD group meeting School

Course Name Course S15 F15 S16 F16 S17 Introduction to Electrical and Computer 18100 X X X X X Engineering

3 Year ECE Course Rollout for 2015-2017 (updated April 2015) All parts of the rollout are subject to updates. For course descriptions, please refer to http://www.ece.cmu.edu/courses/course homepages.html

Processor to Usher in a New Era of Computing

Project Denver Processor to Usher in a New Era of Computing Bill Dally January 5, 2011 http://blogs.nvidia.com/2011/01/project-denver-processor-to-usher-in-new-era-of-computing/ Project Denver Announced

FPGA Accelerator Virtualization in an OpenPOWER cloud. Fei Chen, Yonghua Lin IBM China Research Lab

Trend of Acceleration Technology Acceleration in Cloud is Taking Off Used FPGA to accelerate Bing search

International Summer School on Embedded Systems

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Shenzhen, July 30 -- August 3, 2012 Sponsored by Chinese Academy of Sciences and

CONCENTRATIONS: HIGH-PERFORMANCE COMPUTING & BIOINFORMATICS CYBER-SECURITY & NETWORKING

MAJOR: COMPUTER SCIENCE DEGREE: MASTER OF SCIENCE (M.S.) The Department of Computer Science offers a Master of Science

FPGA Implementation of a Bartlett Direction of Arrival Algorithm for a 5.8GHz Circular Antenna Array

2010 IEEE Aerospace Conference Big Sky, MT, March 8, 2010 Session# 3.01 Phased Array Antennas Systems and Beamforming Technologies Pres #: 3.0101, Paper ID: 1080 Rm: Elbow 2, Time: 8:55am

FlexPath Network Processor

Rainer Ohlendorf Thomas Wild Andreas Herkersdorf Prof. Dr. Andreas Herkersdorf Arcisstraße 21 80290 München http://www.lis.ei.tum.de Agenda FlexPath Introduction Work Packages

A FPGA based Generic Architecture for Polynomial Matrix Multiplication in Image Processing

Prof. Dr. S. K. Shah 1, S. M. Phirke 2 Head of PG, Dept. of ETC, SKN College of Engineering, Pune, India 1 PG

Introduction to OpenACC Directives. Duncan Poole, NVIDIA Thomas Bradley, NVIDIA

GPUs Reaching Broader Set of Developers 1,000,000s 100,000s Early Adopters Research Universities Supercomputing Centers

Dr. Tom Kean Principal Consultant Algotronix Ltd. Phone: Fax: Web:

Resume Dr. Tom Kean Principal Consultant Algotronix Ltd. Phone: +44 131 556 9242 Fax: +44 131 556 9247 Email: tom@algotronix.com Web: www.algotronix.com Professional Experience The principal of Algotronix,

The implementation and performance/cost/power analysis of the network security accelerator on SoC applications

Ruei-Ting Gu grating@eslab.cse.nsysu.edu.tw Kuo-Huang Chung khchung@eslab.cse.nsysu.edu.tw

FACULTY OF POSTGRADUATE STUDIES Master of Science in Computer Engineering The Future University

Table of Contents: I. Introduction II. Philosophy of the Program III. Aims of the Program IV.

CFD Implementation with In-Socket FPGA Accelerators

Ivan Gonzalez UAM Team at DOVRES FuSim-E Programme Symposium: CFD on Future Architectures C²A²S²E DLR Braunschweig 14th-15th October 2009 Outline

Multimedia Data Processing Elements for Digital TV and Multimedia Services in Home Server Platform

Minte Chen IEEE Transactions on Consumer Electronics, Vol. 49, No. 1, February 2003

ELEC 5260/6260/6266 Embedded Computing Systems

Spring 2016 Victor P. Nelson Text: Computers as Components, 3rd Edition Prof. Marilyn Wolf (Georgia Tech) Course Topics Embedded system design & modeling

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.

Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.edu Outline Motivation why do we need GPUs? Past - how was GPU programming

FSMD and Gezel. Jan Madsen

Informatics and Mathematical Modeling Technical University of Denmark Richard Petersens Plads, Building 321 DK2800 Lyngby, Denmark jan@imm.dtu.dk Processors Pentium IV General-purpose

Energy-Saving Cloud Computing Platform Based On Micro-Embedded System

Wen-Hsu HSIEH *, San-Peng KAO **, Kuang-Hung TAN **, Jiann-Liang CHEN ** * Department of Computer and Communication, De Lin Institute

Embedded Systems. 9. Low Power Design

Lothar Thiele Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

Boosting Long Term Evolution (LTE) Application Performance with Intel System Studio

Case Study. Challenge: Deliver high performance code for time-critical tasks in LTE wireless communication applications.

Computer Graphics Hardware An Overview

Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

World-wide University Program

Preparing Today's Students for Tomorrow's Technology Joe Bungo Manager Americas/Europe R&D Division ARM Ltd ARM founded in November 1990 Advanced RISC Machines

Implementation of Image Processing Algorithms on the Graphics Processing Units

Natalia Papulovskaya, Kirill Breslavskiy, and Valentin Kashitsin Department of Information Technologies of the Ural Federal

LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING

LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING RasPi Kaveri Ratanpara 1, Priyan Shah 2 1 Student, M.E Biomedical Engineering, Government Engineering college, Sector-28, Gandhinagar (Gujarat)-382028,

High Secure Mobile Operating System Based on a New Mobile Internet Device Hardware Architecture

pp. 127-136 http://dx.doi.org/10.14257/ijfgcn.2015.8.1.14 Gengxin Sun and Sheng Bin International College

Hardware Acceleration for Just-In-Time Compilation on Heterogeneous Embedded Systems

A. Carbon, Y. Lhuillier, H.-P. Charles CEA LIST DACLE division Embedded Computing Embedded Software Laboratories France

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

Design and Implementation of an On-Chip timing based Permutation Network for Multiprocessor system on Chip

Ms Lavanya Thunuguntla 1, Saritha Sapa 2 1 Associate Professor, Department of ECE, HITAM, Telangana