Secured Embedded Many-Core Accelerator for Big Data Processing

Amey Kulkarni, PhD Candidate
Advisor: Professor Tinoosh Mohsenin
Energy Efficient High Performance Computing (EEHPC) Lab
University of Maryland, Baltimore County
http://www.csee.umbc.edu/~ameyk1/

Agenda
PENC: Power Efficient Nano Clusters many-core and its implementation results
Cognitive-based hardware security for many-core architecture
Compressive Sensing (CS): OMP reconstruction algorithm modifications and their implementation on 65 nm CMOS technology, PENC many-core, and FPGA
CS-based framework for Big Data acceleration on hardware platforms:
    Reduction in data transfers, and communication of secured encrypted data
    Implementation on three different platforms and evaluation in terms of hardware overhead
Integration of the CS-based framework with the Hadoop platform for Big Data acceleration

PENC: Power Efficient Nano Clusters by EEHPC
The PENC many-core acts as an accelerator that works with a host processor for data analytics and machine learning applications.
The architecture, simulator, and Verilog ASIC implementation are fully developed by EEHPC lab members.
Composed of 64 processing clusters: 192 low-power RISC cores.
Processors and routers are fully placed and routed in 65 nm, 1 V CMOS, with a very small chip area of 5.5 mm² for 64 clusters.
Total power of the chip @ 1 GHz: 8.7 W
NSF Grant# 00010145
ISCAS 12, ISQED 13, ISLPED 14, ISCAS 16, GLSVLSI 16, JETC 16

Cognitive Security Framework for PENC Many-Core
[Figure: FPGA platform test setup for the PENC many-core platform (64-core). Blocks shown: Security Kernel & Interface, Attack Detection Modules (ADM) with Feature Sample and Feed-Back Enable signals, routers R1 and R2 serving cores 1-16, Trojan Insertion Module with inter and intra triggers, clocks CLK ADM and CLK MC.]
The Attack Detection Module is implemented using an online machine learning technique to prevent unexpected attacks.
Assumptions: processing cores and memories are safe; the Trojan is inserted at the design phase and triggers malicious activity on the router internally at run time.
Detects three different Denial-of-Service attacks.
Hardware area overhead of only 0.26%; requires 3 cycles for Trojan detection; performs 2.4x faster than the state-of-the-art implementation.
DARPA Grant
JETC 16, ISQED 16, HOST 16
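The slide does not say which online learning rule or which router features the Attack Detection Module uses. As a rough illustration only, the sketch below assumes a linear classifier trained one sample at a time with a hinge-loss (SVM-style) update. The class name, the learning rate, and the two normalized router features are all hypothetical.

```python
class OnlineLinearDetector:
    """Hypothetical online linear classifier: +1 = attack, -1 = normal."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features  # one weight per router feature
        self.b = 0.0                 # bias term
        self.lr = lr                 # learning rate (assumed value)

    def score(self, x):
        # Signed distance-like score of a feature sample.
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, label):
        # Hinge-loss step: adjust only when the sample is misclassified
        # or falls inside the margin (label * score < 1).
        if label * self.score(x) < 1.0:
            self.w = [wi + self.lr * label * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * label


if __name__ == "__main__":
    # Hypothetical normalized features, e.g. (buffer occupancy, drop rate).
    normal_sample = (0.1, 0.2)
    attack_sample = (0.9, 0.8)
    detector = OnlineLinearDetector(n_features=2)
    for _ in range(50):
        detector.update(normal_sample, -1)
        detector.update(attack_sample, +1)
    print(detector.predict(normal_sample), detector.predict(attack_sample))
```

Each update touches only the weights and the bias, which is one reason a rule of this kind is attractive for a low-overhead detector that learns at run time.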

Compressive Sensing (CS): OMP Reconstruction Algorithm
We propose a platform-independent and reconfigurable OMP CS reconstruction algorithm (evaluated on PENC, FPGA, and GPU).
[Figures: OMP CS reconstruction algorithm; architecture of the OMP CS reconstruction algorithm; fixed-point hardware implementation; analysis of the OMP algorithm for a 1024x1024 image on the PENC many-core; analysis of the OMP algorithm on a Xilinx Virtex-7 FPGA]
GLSVLSI 14, ISCAS 15
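For readers unfamiliar with Orthogonal Matching Pursuit, here is a minimal floating-point NumPy sketch of the textbook algorithm. It is not the fixed-point or reconfigurable hardware architecture from the slides. The function name `omp`, its parameters, and the demo dimensions are assumptions made for illustration.

```python
import numpy as np


def omp(A, y, k, tol=1e-10):
    """Estimate a k-sparse x with y = A @ x (textbook OMP, k >= 1)."""
    n = A.shape[1]
    residual = np.asarray(y, dtype=float).copy()
    support = []             # indices of the columns selected so far
    coeffs = np.zeros(0)
    for _ in range(k):
        # Identification: column most correlated with the residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))
        # Least squares restricted to the selected columns.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        # Residual update.
        residual = y - A[:, support] @ coeffs
        if np.linalg.norm(residual) <= tol:
            break
    x_hat = np.zeros(n)
    x_hat[support] = coeffs
    return x_hat, support


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 256)) / np.sqrt(64)  # assumed sizes
    x = np.zeros(256)
    x[[10, 50, 200]] = [1.0, -2.0, 0.5]
    x_hat, support = omp(A, A @ x, k=3)
    print(sorted(support), np.linalg.norm(x_hat - x))
```

The loop has three stages: identification, least squares, and residual update. The identification and least-squares stages are the two kernels that the GD-OMP and HT-OMP variants in this deck are reported to simplify.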

Compressive Sensing (CS): OMP Reconstruction Algorithm
We propose two modifications to the OMP CS reconstruction algorithm:
    Gradient Descent OMP (GD-OMP) reduces the complexity of the least-squares kernel.
    Hard Thresholding OMP (HT-OMP) reduces the complexity of the identification kernel.
[Figures: architecture of the GD-OMP algorithm; architecture of the HT-OMP algorithm; quality of OMP CS reconstruction]
ASIC implementation analysis on 65 nm CMOS, 1 V technology:
Architecture                 Signal Size  Max Freq (MHz)  Reconstruction Time (µs)  Area (mm²)  ADP (mm²-µs)
OMP (base) [Jerome et al.]   256          165             13.69                     0.69        9.44
HT-OMP (This Work)           256          317             9.32                      0.63        5.87 (1.6x)
GD-OMP (This Work)           256          317             12.52                     0.40        5.01 (1.9x)
TVLSI 16*
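The ADP column appears to be the area-delay product, that is, area multiplied by reconstruction time, with the factor in parentheses giving the improvement over the baseline. The short check below recomputes both from the area and time figures on the slide. The dictionary layout and function names are illustrative, and the results match the slide only to within rounding.

```python
# Area and reconstruction time as listed on the slide (65 nm, 1 V ASIC).
rows = {
    "OMP (base)": {"time_us": 13.69, "area_mm2": 0.69},
    "HT-OMP": {"time_us": 9.32, "area_mm2": 0.63},
    "GD-OMP": {"time_us": 12.52, "area_mm2": 0.40},
}


def adp(row):
    """Area-delay product in mm^2 * microseconds."""
    return row["area_mm2"] * row["time_us"]


def improvement(name, baseline="OMP (base)"):
    """How many times smaller the ADP is than the baseline ADP."""
    return adp(rows[baseline]) / adp(rows[name])


if __name__ == "__main__":
    for name, row in rows.items():
        print(f"{name}: ADP = {adp(row):.2f}, vs. base = {improvement(name):.1f}x")
```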

CS-based Framework Implementation on Different Platforms
[Figures: CS-based framework for Big Data acceleration; power measurement setup; current analysis on ARM CPU; current analysis on K1 GPU; quality analysis of CS-based framework]
Platform                            Image Size  Chip Area (mm²)  Power (mW)  Execution Time (ms)
ARM CPU (28 nm, 0.9 V)              2 MB        16               12.75       378,120
Nvidia Jetson TK1 GPU (28 nm, 0.9 V)  2 MB      37               9.52        169,225
PENC Many-Core (65 nm, 1 V)         2 MB        5.5              8.67        38,019
The CS-based framework is fully implemented for the image reconstruction and face detection applications on the NVIDIA TK1 CPU+GPU platform and the PENC many-core.
Compared to the CPU and GPU implementations, PENC achieves 15x and 200x less energy consumption and 8x and 177x faster execution time.

CS-based Framework for Big Data Acceleration using Secured PENC on Hadoop Platform
[Figures: streaming data; reconstruction quality analysis; quality of EEG signal and image reconstruction]
We propose compressive sensing (CS) along with the PENC accelerator to reduce data communication and storage in big data streaming by up to 70%.
The CS-based framework with PENC has been tested for machine learning and data analytics algorithms, e.g., health monitoring, convolutional neural networks, deep learning, and statistical analysis of sparse and dense matrices.
The framework has been implemented on a low-power Jetson GPU, an ARM CPU, and PENC.
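One way to read the 70% figure is as the effect of the sensing step: a block of n samples is replaced by m = 0.3 n linear measurements, so only 30% of the values are transmitted or stored. The sketch below illustrates that reading. The random ±1 measurement matrix, the function names, and the block size are assumptions and are not taken from the slides.

```python
import numpy as np


def compress(x, keep_fraction=0.3, seed=0):
    """Project x onto m = keep_fraction * n random +/-1 measurements."""
    n = x.size
    m = max(1, int(round(keep_fraction * n)))
    rng = np.random.default_rng(seed)
    # Random Bernoulli sensing matrix, scaled so columns have unit norm.
    phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    return phi @ x, phi


def reduction(n, m):
    """Fraction of values saved by sending m measurements instead of n."""
    return 1.0 - m / n


if __name__ == "__main__":
    block = np.sin(np.linspace(0.0, 20.0, 1000))  # stand-in signal block
    y, phi = compress(block)
    print(y.size, f"{reduction(block.size, y.size):.0%} fewer values")
```

The receiver needs the same measurement matrix (here reproducible from the seed) to run a reconstruction algorithm such as OMP on the received measurements.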

Publications
1) Amey Kulkarni, Youngok Pino, Matthew French, and Tinoosh Mohsenin, "Adaptive Realtime Trojan Detection Framework through Machine Learning," in IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 3-5 May 2016.
2) Amey Kulkarni, Ali Jafari, Chris Sagedy, and Tinoosh Mohsenin, "Sketching-Based High-Performance Biomedical Big Data Processing Accelerator," 49th ISCAS 2016, Canada, May 2016 (Invited Talk).
3) Amey Kulkarni, Youngok Pino, and Tinoosh Mohsenin, "SVM-based Real-Time Hardware Trojan Detection for Many-Core Platform," in 17th International Symposium on Quality Electronic Design (ISQED), March 2016.
4) Amey Kulkarni, Youngok Pino, Matthew French, and Tinoosh Mohsenin, "Real-Time Anomaly Detection Framework for Many-Core Router through Machine Learning Techniques," ACM Journal on Emerging Technologies in Computing Systems.
5) Amey Kulkarni, Ali Jafari, Colin Shea, and Tinoosh Mohsenin, "CS-based Secured Big Data Processing on FPGA," 24th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2016, Washington DC, USA.
6) Amey Kulkarni, Tahmid Abtahi, Emily Smith, and Tinoosh Mohsenin, "Low Energy Sketching Engines on Many-Core Platform for Big Data Acceleration," in Proceedings of the 26th Edition of the Great Lakes Symposium on VLSI, GLSVLSI'16, Boston, MA, USA.

Publications
7) Amey Kulkarni and Tinoosh Mohsenin, "Accelerating Compressive Sensing Reconstruction OMP Algorithm with CPU, GPU, FPGA and Domain Specific Many-Core," 48th ISCAS 2015, Portugal, May 2015.
8) Tawana Khawari, Amey Kulkarni, Abbas Rahimi, Tinoosh Mohsenin, and Houman Homayoun, "Energy-Efficient Mapping of Biomedical Applications on Domain-Specific Accelerator under Process Variation," International Symposium on Low Power Electronics and Design, ISLPED14.
9) Amey Kulkarni, Houman Homayoun, and Tinoosh Mohsenin, "A Parallel and Reconfigurable Architecture for Efficient OMP Compressive Sensing Reconstruction," 24th GLSVLSI 2014, Houston, Texas, USA, May 2014 (27.32% Acceptance Rate).
10) Amey Kulkarni, Colin Shea, Tahmid Abtahi, and Tinoosh Mohsenin, "Low Overhead CS-based Heterogeneous Framework for Big Data Acceleration," ACM Transactions on Embedded Computing Systems, 2016 (Submitted).
http://www.csee.umbc.edu/~ameyk1/publications.html