Deep Neural Networks in Embedded and Real Time Systems

Similar documents
MulticoreWare. Global Company, 250+ employees HQ = Sunnyvale, CA Other locations: US, China, India, Taiwan

Steven C.H. Hoi School of Information Systems Singapore Management University

Applications of Deep Learning to the GEOINT mission. June 2015

Fast R-CNN Object detection with Caffe

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

Lecture 6: Classification & Localization. boris. ginzburg@intel.com

Deep Learning GPU-Based Hardware Platform

Intel Xeon +FPGA Platform for the Data Center

Software: Systems and. Application Software. Software and Hardware. Types of Software. Software can represent 75% or more of the total cost of an IS.

Steelcape Product Overview and Functional Description

REAL-TIME STREAMING ANALYTICS DATA IN, ACTION OUT

Fundamentals of Information Systems, Seventh Edition

CFD Implementation with In-Socket FPGA Accelerators

A Computer Vision System on a Chip: a case study from the automotive domain

Business Intelligence and Decision Support Systems

A simple application of Artificial Neural Network to cloud classification

Learning to Process Natural Language in Big Data Environment

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15

InstaNet: Object Classification Applied to Instagram Image Streams

Fast R-CNN. Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th. Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:

Position Classification Flysheet for Computer Science Series, GS Table of Contents

Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection

How to handle Out-of-Memory issue

Emerging Geospatial Trends The Convergence of Technologies. Jim Steiner Vice President, Product Management

Unified Batch & Stream Processing Platform

Advanced analytics at your hands

Architectures for Big Data Analytics A database perspective

Pedestrian Detection with RCNN

Reference Architecture, Requirements, Gaps, Roles

Why Big Data in the Cloud?

Machina Research. Where is the value in IoT? IoT data and analytics may have an answer. Emil Berthelsen, Principal Analyst April 28, 2016

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

Performing a data mining tool evaluation

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN.

DPtech ADX Application Delivery Platform Series

Masters in Information Technology

7a. System-on-chip design and prototyping platforms

Frequently Asked Questions

Stream Processing on GPUs Using Distributed Multimedia Middleware

Understanding Megapixel Camera Technology for Network Video Surveillance Systems. Glenn Adair

LONG BEACH CITY COLLEGE MEMORANDUM

UF EDGE brings the classroom to you with online, worldwide course delivery!

The 4 Pillars of Technosoft s Big Data Practice

GPU-Based Deep Learning Inference:

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Enterprise Content Management. A White Paper. SoluSoft, Inc.

CS 1699: Intro to Computer Vision. Deep Learning. Prof. Adriana Kovashka University of Pittsburgh December 1, 2015

VisioWave TM. Intelligent Video Platform (IVP) Smart Scalable Open. Optimized for high-performance, mission-critical digital video surveillance

Capstone Overview Architecture for Big Data & Machine Learning. Debbie Marr ICRI-CI 2015 Retreat, May 5, 2015

Optimizing Converged Cisco Networks (ONT)

White Paper November Technical Comparison of Perspectium Replicator vs Traditional Enterprise Service Buses

NOVA COLLEGE-WIDE COURSE CONTENT SUMMARY ITE INTRODUCTION TO COMPUTER APPLICATIONS & CONCEPTS (3 CR.)

How Telecom Italia Empowers Customer Service from the IMS Cloud

Using Convolutional Neural Networks for Image Recognition

Machine Learning using MapReduce

SAP Predictive Analysis: Strategy, Value Proposition

Cloud Computing and the Future of Internet Services. Wei-Ying Ma Principal Researcher, Research Area Manager Microsoft Research Asia

How To Program With Adaptive Vision Studio

Embedded Software development Process and Tools: Lesson-4 Linking and Locating Software

Computer Graphics Hardware An Overview

Introduction to GPU Programming Languages

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Expedia

Client Overview. Engagement Situation. Key Requirements

Computer Layers. Hardware BOOT. Operating System. Applications

Consumption of OData Services of Open Items Analytics Dashboard using SAP Predictive Analysis

Spark: Cluster Computing with Working Sets

Implementing a Successful Digital First Strategy

The Rise of Industrial Big Data

Taking Inverse Graphics Seriously

Compacting ConvNets for end to end Learning

How To Handle Big Data With A Data Scientist

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

CSC384 Intro to Artificial Intelligence

Voice Communication Package v7.0 of front-end voice processing software technologies General description and technical specification

ProTrack: A Simple Provenance-tracking Filesystem

Fact sheet. Tractable. Automating visual recognition tasks with Artificial Intelligence

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

Analytics Strategy Information Architecture Data Management Analytics Value and Governance Realization

Data Deduplication and Corporate PC Backup

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

SAP HANA In-Memory Database Sizing Guideline

Building a Hybrid Data Warehouse Model

Professor, D.Sc. (Tech.) Eugene Kovshov MSTU «STANKIN», Moscow, Russia

APTA TransiTech Conference Communications: Vendor Perspective (TT) Phoenix, Arizona, Tuesday, VoIP Solution (101)

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

Return On Investment XpoLog Center

balesio Native Format Optimization Technology (NFO)

QUEST The Systems Integration, Process Flow Design and Visualization Solution

Business Intelligence and Big Data Analytics: Speeding the Cycle from Insights to Action Four Steps to More Profitable Customer Engagement

Optimizing AAA Games for Mobile Platforms

SIGNAL INTERPRETATION

Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect

Transcription:

Deep Neural Networks in Embedded and Real Time Systems

Deep Learning Neural Networks Deep Learning A family of neural network methods using high number of layers Focused on feature representations Convolutional Neural Networks (CNN) Most popular deep learning neural network method Benefits 1. Best recognition quality (vs. alternative recognition algorithms) 2. Re-trainable without code changes (implemented once and used many times) Driver Assistance (ADAS) Vision Analytics Artificial Intelligence (AI) Object Recognition Augmented Reality / Virtual Reality 2

Training is for everyone Free available training frameworks: Caffe The most popular, developed by the Berkeley Vision and Learning Center TensorFlow Open source software library for numerical computation using data flow graphs supported by Google Training is done offline Training and designing a network became an expertise There are many ways to design a network and train it Large variety in network structures (GoogLeNet, AlexNet, VGG, NIN) Many open source examples are available (Caffe Model Zoo) 3

RT classification requires knowledge There are many ways to design a CNN Optimizations are usually done to a specific network There is a strong requirement for flexible implementations Modifying the network Changing network characteristics Porting proprietary networks consumes time Special programing knowledge (intrinsic, assembly) Specific experience in the embedded platform (instructions, hardware capabilities) Optimized solutions are not flexible for changes Long Time To Market 4

Neural network embedded challenges Very high bandwidth consuming Between Layers of data transfer in and out the DDR AlexNet 12 MB between layers data transfer (16-bit precision single execution) Convolution and Fully Connected data weights from DDR AlexNet 243 MB weights in floating point precision Processing more than one ROI with the same network Required conversion from floating point to fixed point Training results are in floating point while low power and area platforms prefer fixed point calculations to reduce power and area The number of operations can reach more than a mega operations per layer especially in the convolution layer Internal memory size limitation on embedded platform 5

What can be done? (1) Reducing bandwidth In the convolution layer, each output is calculated by the same inputs Weights matrix are shared between output results in the same map (in order not to load the weights more than once) The input data can be reused to avoid useless transactions from DDR Convolution Polling Fully Connected m n 6

What can be done? (2) Maximum multiply accumulate utilization Differentiating between large inputs to small input maps and the amount from each type ll Large size maps - XX cc,ii,jj CC cc=0 HH ii=0 WW jj=0 = WW cc,mm,nn ll Many maps with small size (last layers) - XX cc,ii,jj ll Convolution ll 1 XX cc,ii+mm,jj+nn HH ii=0 WW jj=0 CC cc=0 = WW cc,mm,nn ll ll 1 XX cc,ii+mm,jj+nn n m 7

What can be done? (3) Overcome small internal memory size Trying to preserve the principle of All inputs must be in the internal memory by tile division Dividing all input maps to same tile size Convolution n m 8

What can be done? (3) Using compression algorithms and prior knowledge to reduce bandwidth to and from the external memory It is known there is a lot of redundancy in the network data Can be done offline Example AlexNet fully connected BW, before compression, can be reduced to 6 MBytes Using dedicated and smart instructions for large scale operations such as convolutions Example: CEVA vector processing by CEVA-XM4 DSP core Includes dedicated instructions which help with CNN acceleration 9

CEVA Deep Neural Network Library General acceleration library for deep neural network algorithms especially convolution neural network (CNN) Provides offline tool for network generator Converts from offline to real-time network representation Converts floating point trained network weights to fixed point weights with small degradation Optimizes the initial network structure to real time execution Real-time initialization and execution of any designed network Supports single layer acceleration such as: Convolution, pooling, softmax, normalization, activation and fully connected 10

Programmer Flow for CNN Acceleration (1) Offline training Network developer uses proprietary training process In-house framework Open source frameworks Training output : Network weights in floating point precision Final network defined structure Network structure and parameters Training data base and reference Final network structure Trained network weights Optional mean image Back error propagation 11

Programmer Flow for CNN Acceleration (2) CEVA Network Generator Converts weights and network structure definition to real-time execution on CEVA-XM4 Floating point to fixed point network weights Network optimizations Final network structure Trained network weights (floating point) Optional mean image Optimized CEVA network structure Fixed point rearranged network weights Fixed point mean image 12

Programmer Flow for Fast CNN Acceleration (1) Programmer interface : Network creation and initialization (done once) CDNN context creation Memory buffers description (output and input) Network creation Network initialization 13

Programmer Flow for Fast CNN Acceleration (2). Programmer interface : Network execution (streaming) Update network input memory buffer Execution Query for results Update OpenCV data structures 14

Programmer Flow for Fast CNN Acceleration (3). Programmer interface : Network release Release all open CDNN memory object created Release the CDNN Network Release CDNN 15

CNN Usage Flow with Caffe & CDNN 16

Thank You