Secured Embedded Many-Core Accelerator for Big Data Processing




Secured Embedded Many-Core Accelerator for Big Data Processing
Amey Kulkarni, PhD Candidate
Advisor: Professor Tinoosh Mohsenin
Energy Efficient High Performance Computing (EEHPC) Lab, University of Maryland, Baltimore County
http://www.csee.umbc.edu/~ameyk1/

Agenda
PENC: Power Efficient Nano Clusters Many-Core and its implementation results
Cognitive-based hardware security for the Many-Core architecture
Compressive Sensing (CS) OMP reconstruction algorithm
    Modifications and their implementation on 65 nm CMOS technology, PENC Many-Core, and FPGA
CS-based framework for Big Data acceleration on hardware platforms
    Reduction in data transfers, and communication of secured encrypted data
    Implementation on three different platforms and evaluation in terms of hardware overhead
Integration of the CS-based framework with the Hadoop platform for Big Data acceleration

PENC: Power Efficient Nano Clusters by EEHPC
The PENC many-core acts as an accelerator that works with a host processor for data analytics and machine learning applications
The architecture, simulator, and Verilog ASIC implementation were fully developed by EEHPC lab members
Composed of 64 processing clusters: 192 low-power RISC cores
Fully placed-and-routed processors and routers in 65 nm, 1 V CMOS, with a very small chip area of 5.5 mm² for 64 clusters
Total chip power at 1 GHz: 8.7 W
NSF Grant# 00010145
ISCAS '12, ISQED '13, ISLPED '14, ISCAS '16, GLSVLSI '16, JETC '16

Cognitive Security Framework for PENC Many-Core
[Figure: FPGA platform test setup for the PENC Many-Core platform (64-). Blocks shown: Security Kernel & Interface; four Attack Detection Modules, each with Feature Sample, Feed-Back, and Enable signals; nodes 1 to 16 with routers R1 and R2; Trojan Insertion Module with Inter- Trigger and Intra- Trigger; clocks CLK ADM and CLK MC]
The Attack Detection Module is implemented using an online machine learning technique to prevent unexpected attacks
Assumptions: processing cores and memories are safe; a Trojan inserted at the design phase triggers malicious activity on the router internally at run time
Detects three different Denial-of-Service attacks
Hardware area overhead of only 0.26%; requires 3 cycles for Trojan detection and performs 2.4x faster than the state-of-the-art implementation
DARPA Grant
JETC '16, ISQED '16, HOST '16
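The slide names an online machine learning technique driven by Feature Sample and Feed-Back signals, but does not give the model. As a rough illustration only, the sketch below uses a plain perceptron (a swapped-in, textbook online learner, not the authors' detector). The class name, the two router features, and all sample values are hypothetical.

```python
# Minimal online linear classifier (perceptron). Illustration only:
# this is NOT the detector described on the slide, and the features
# and sample values below are hypothetical.

class OnlineDetector:
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        """Return 1 for 'attack', 0 for 'normal'."""
        return 1 if self.score(x) > 0 else 0

    def feedback(self, x, label):
        """Online update: adjust weights only when the prediction was wrong."""
        error = label - self.predict(x)  # -1, 0, or +1
        if error:
            self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * error
        return error


def train_stream(detector, stream, passes=20):
    """Feed labelled samples one at a time; stop after a mistake-free pass."""
    for _ in range(passes):
        mistakes = sum(1 for x, y in stream if detector.feedback(x, y))
        if mistakes == 0:
            break
    return detector


# Hypothetical normalized router features: (packet rate, buffer occupancy).
SAMPLES = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.8, 0.9), 1)]

detector = train_stream(OnlineDetector(n_features=2), SAMPLES)
print([detector.predict(x) for x, _ in SAMPLES])
```

The point of the sketch is the sample, predict, feed-back loop: each feature sample is classified immediately, and the model changes only when feedback reports a mistake.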

Compressive Sensing (CS): OMP Reconstruction Algorithm
We propose a platform-independent and reconfigurable OMP CS reconstruction algorithm (evaluated on PENC, FPGA, and GPU)
[Figures: OMP CS reconstruction algorithm; architecture of the OMP CS reconstruction algorithm; fixed-point hardware implementation; analysis of the OMP algorithm for a 1024x1024 image on PENC Many-Core; analysis of the OMP algorithm on a Xilinx Virtex-7 FPGA]
GLSVLSI '14, ISCAS '15
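For readers unfamiliar with Orthogonal Matching Pursuit (OMP), the following is a minimal textbook sketch in plain Python. It is not the fixed-point hardware architecture from the slides; the function names and the small test matrix are assumptions chosen for illustration. Each iteration has an identification step (pick the column most correlated with the residual), a least-squares step on the selected columns, and a residual update.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the small dense system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def omp(phi, y, sparsity):
    """Estimate a `sparsity`-sparse x from y = phi @ x (phi is a list of rows)."""
    n = len(phi[0])
    cols = [[row[j] for row in phi] for j in range(n)]
    residual, support, coef = y[:], [], []
    for _ in range(sparsity):
        # Identification: unused column most correlated with the residual.
        j = max((k for k in range(n) if k not in support),
                key=lambda k: abs(dot(cols[k], residual)))
        support.append(j)
        # Least squares on the selected columns via the normal equations.
        gram = [[dot(cols[p], cols[q]) for q in support] for p in support]
        rhs = [dot(cols[p], y) for p in support]
        coef = solve(gram, rhs)
        # Residual update: remove the part of y explained so far.
        approx = [sum(c * cols[p][i] for c, p in zip(coef, support))
                  for i in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    x = [0.0] * n
    for p, c in zip(support, coef):
        x[p] = c
    return x


# Hypothetical 4x5 measurement matrix and a 2-sparse signal.
PHI = [[1, 0, 0, 0, 0.5],
       [0, 1, 0, 0, 0.5],
       [0, 0, 1, 0, 0.5],
       [0, 0, 0, 1, 0.5]]
x_true = [0.0, 3.0, 0.0, 0.0, 2.0]
y = [dot(row, x_true) for row in PHI]
print(omp(PHI, y, sparsity=2))
```

The normal-equations solve is used here only to keep the example dependency-free; a production implementation would normally use a numerically safer least-squares routine.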

Compressive Sensing (CS): OMP Reconstruction Algorithm
We propose two modifications to the OMP CS reconstruction algorithm:
Gradient Descent OMP (GD-OMP) reduces the complexity of the Least Square kernel
Hard Thresholding OMP (HT-OMP) reduces the complexity of the Identification kernel
[Figures: architecture of the GD-OMP algorithm; architecture of the HT-OMP algorithm; quality of OMP CS reconstruction]

ASIC implementation analysis on 65 nm CMOS, 1 V technology:

Architecture                  Signal Size   Max Freq (MHz)   Reconstruction Time (µs)   Area (mm²)   ADP (mm²·µs)
OMP (base) [Jerome et al.]    256           165              13.69                      0.69         9.44
HT-OMP (This Work)            256           317              9.32                       0.63         5.87 (1.6x)
GD-OMP (This Work)            256           317              12.52                      0.40         5.01 (1.9x)

TVLSI '16*
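The ADP column appears to be the area-delay product, that is, area multiplied by reconstruction time; the short sketch below recomputes it from the listed values under that assumption. The dictionary and function names are hypothetical.

```python
# (reconstruction time in µs, area in mm²), copied from the table above.
DESIGNS = {
    "OMP (base)": (13.69, 0.69),
    "HT-OMP": (9.32, 0.63),
    "GD-OMP": (12.52, 0.40),
}


def adp(time_us, area_mm2):
    """Area-delay product in mm²·µs (assumed definition: area x time)."""
    return time_us * area_mm2


def improvement(name, baseline="OMP (base)"):
    """How many times smaller the ADP is than the baseline's."""
    return adp(*DESIGNS[baseline]) / adp(*DESIGNS[name])


for name, (t, a) in DESIGNS.items():
    print(f"{name}: ADP = {adp(t, a):.3f} mm²·µs, "
          f"{improvement(name):.1f}x vs. base")
```

Under this definition the recomputed ratios round to the 1.6x and 1.9x figures listed for HT-OMP and GD-OMP.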

CS-based Framework Implementation on Different Platforms

Platform                                  Image Size   Chip Area (mm²)   Power (mW)   Execution Time (ms)
ARM CPU (28 nm, 0.9 V)                    2 MB         16                12.75        378,120
Nvidia Jetson TK1 GPU (28 nm, 0.9 V)      2 MB         37                9.52         169,225
PENC Many-Core (65 nm, 1 V)               2 MB         5.5               8.67         38,019

The CS-based framework is fully implemented for the image reconstruction and face detection application on the NVIDIA TK1 CPU+GPU platform and the PENC many-core
Compared to the CPU and GPU implementations, PENC achieves 15x and 200x less energy consumption and 8x and 177x faster execution time
[Figures: CS-based framework for Big Data acceleration; power measurement setup; current analysis on ARM CPU; current analysis on K1 GPU; quality analysis of the CS-based framework]

CS-based Framework for Big Data Acceleration using Secured PENC on Hadoop Platform
[Figures: streaming data; reconstruction quality analysis; quality of EEG signal and image reconstruction]
We propose compressive sensing (CS) along with the PENC accelerator to reduce data communication and storage in big data streaming by up to 70%
The CS-based framework with PENC has been tested for machine learning and data analytics algorithms, e.g. health monitoring, convolutional neural networks, deep learning, and statistical analysis of sparse and dense matrices
The framework has been implemented on a low-power Jetson GPU, an ARM CPU, and PENC
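To make the data-reduction idea concrete, here is a minimal sketch of the sensing side of compressive sensing: a length-n signal is projected onto m < n random measurements, so only m values need to be sent or stored. The random ±1 matrix, the sizes, and the function names are assumptions for illustration; m/n = 0.3 is chosen only to match the "up to 70%" figure.

```python
import random


def measurement_matrix(m, n, seed=0):
    """Random +/-1 sensing matrix (a common choice; assumed here)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute the measurements y = phi @ x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


def reduction(n, m):
    """Fraction of values saved by sending m measurements instead of n samples."""
    return 1.0 - m / n


n, m = 100, 30                      # hypothetical sizes, m/n = 0.3
signal = [0.0] * n                  # sparse signal: two non-zero entries
signal[7], signal[42] = 1.5, -2.0
phi = measurement_matrix(m, n)
y = compress(phi, signal)
print(f"{len(signal)} samples -> {len(y)} measurements "
      f"({reduction(n, m):.0%} fewer values)")
```

Recovering the signal from `y` then requires a reconstruction algorithm such as OMP on the receiving side, which works well only when the signal is sparse in some basis.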

Publications
1) Amey Kulkarni, Youngok Pino, Matthew French and Tinoosh Mohsenin, "Adaptive Realtime Trojan Detection Framework through Machine Learning", in Hardware Oriented Security and Trust (HOST), 2016 IEEE International Symposium on, 3-5 May 2016
2) Amey Kulkarni, Ali Jafari, Chris Sagedy and Tinoosh Mohsenin, "Sketching-Based High-Performance Biomedical Big Data Processing Accelerator", 49th ISCAS 2016, Canada, (Invited Talk), May 2016
3) Amey Kulkarni, Youngok Pino and Tinoosh Mohsenin, "SVM-based Real-Time Hardware Trojan Detection for Many-Core Platform", in 17th International Symposium on Quality Electronic Design (ISQED), March 2016
4) Amey Kulkarni, Youngok Pino, Matthew French and Tinoosh Mohsenin, "Real-Time Anomaly Detection Framework for Many-Core Router through Machine Learning Techniques", ACM Journal on Emerging Technologies in Computing Systems
5) Amey Kulkarni, Ali Jafari, Colin Shea, and Tinoosh Mohsenin, "CS-based Secured Big Data Processing on FPGA", 24th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2016, Washington DC, USA
6) Amey Kulkarni, Tahmid Abtahi, Emily Smith and Tinoosh Mohsenin, "Low Energy Sketching Engines on Many-Core Platform for Big Data Acceleration", in Proceedings of the 26th Edition of the Great Lakes Symposium on VLSI, GLSVLSI '16, Boston, MA, USA

Publications (continued)
7) Amey Kulkarni and Tinoosh Mohsenin, "Accelerating Compressive Sensing Reconstruction OMP Algorithm with CPU, GPU, FPGA and Domain Specific Many-Core", 48th ISCAS 2015, Portugal, May 2015
8) Tawana Khawari, Amey Kulkarni, Abbas Rahimi, Tinoosh Mohsenin and Houman Homayoun, "Energy-Efficient Mapping of Biomedical Applications on Domain-Specific Accelerator under Process Variation", International Symposium on Low Power Electronics and Design, ISLPED '14
9) Amey Kulkarni, Houman Homayoun and Tinoosh Mohsenin, "A Parallel and Reconfigurable Architecture for Efficient OMP Compressive Sensing Reconstruction", 24th GLSVLSI 2014, Houston, Texas, USA, May 2014 (27.32% Acceptance Rate)
10) Amey Kulkarni, Colin Shea, Tahmid Abtahi and Tinoosh Mohsenin, "Low Overhead CS-based Heterogeneous Framework for Big Data Acceleration", ACM Transactions on Embedded Computing Systems, 2016 (Submitted)
http://www.csee.umbc.edu/~ameyk1/publications.html