DEEP LEARNING WITH GPUS


Transcription:

DEEP LEARNING WITH GPUS GEOINT 2015 Larry Brown Ph.D. June 2015

AGENDA 1. Introducing NVIDIA 2. What is Deep Learning? 3. GPUs and Deep Learning 4. cuDNN and DIGITS 5. Machine Learning & Data Analytics (and a video!)

INTRODUCING NVIDIA I know what you're thinking: GAMING, SPECIAL FX

NVIDIA - INVENTOR OF THE GPU NVIDIA invented the GPU in 1999, with over 1 billion shipped to date. Initially a dedicated graphics processor for PCs, the GPU's computational power and energy efficiency led to it quickly being adopted for professional high performance computing and enterprise data analytics. In 2007, NVIDIA launched the CUDA programming platform and opened up the general purpose parallel processing capabilities of the GPU.

NVIDIA PLATFORM Design and visualization, mobile and embedded, GPUs and SoCs, graphics cards, systems, IP, HPC and data center, gaming.

USA - TWO FLAGSHIP SUPERCOMPUTERS SUMMIT: 150-300 PFLOPS peak performance. SIERRA: >100 PFLOPS peak performance. Powered by the NVIDIA GPU: IBM POWER9 CPU + NVIDIA Volta GPU, NVLink high speed interconnect, >40 TFLOPS per node, >3,400 nodes. Arriving 2017.

BEYOND HPC TO BIG DATA ANALYTICS Attendees at the GPU Technology Conference, 2014.

NVIDIA GPU - ACCELERATING SCIENCE Researchers using Tesla GPUs are solving the world's great scientific and technical challenges. Using a supercomputer powered by 3,000 Tesla processors, University of Illinois scientists achieved a breakthrough in HIV research. Another research team from Baylor, Rice, MIT, Harvard and the Broad Institute used GPUs to map how the human genome folds within the nucleus of a cell. These and other advances in science have been highlighted in top journals and are regularly showcased at GTC, our annual developer conference.

10X GROWTH IN GPU COMPUTING From 2008 to 2015: CUDA downloads, 150,000 to 3 million; CUDA apps, 27 to 319; universities teaching CUDA, 60 to 800; academic papers, 4,000 to 60,000; Tesla GPUs, 6,000 to 450,000; supercomputing teraflops, 77 to 54,000.

GPU ACCELERATED COMPUTING 10X performance, 5X energy efficiency. CPU: optimized for serial tasks. GPU accelerator: optimized for parallel tasks.

What is Deep Learning?

ACCELERATING DEEP LEARNING CUDA for Deep Learning. Machine learning is in some sense a rebranding of AI. The focus is now on more specific, often perceptual tasks, and there are many successes. Today, some of the world's largest internet companies, as well as the foremost research institutions, are using GPUs for machine learning.

INDUSTRIAL USE CASES Machine learning is pervasive: social media, defense / intelligence, consumer electronics, medical, energy, media & entertainment.

TRADITIONAL ML - HAND TUNED FEATURES Images/video: vision features, then detection. Audio: audio features, then speaker ID. Text: text features, then text classification, machine translation, information retrieval, ... Slide courtesy of Andrew Ng, Stanford University.

WHAT IS DEEP LEARNING? Systems that learn to recognize objects that are important, without us telling the system explicitly what that object is ahead of time. Key components: task, features, model, learning algorithm.

THE PROMISE OF MACHINE LEARNING ML systems extract value from big data: 350 million images uploaded per day; 2.5 petabytes of customer data hourly; 100 hours of video uploaded every minute.

IMAGE CLASSIFICATION WITH DNNS Training: labeled examples of cars, buses, trucks, motorcycles. Inference: the trained network labels a new image ("truck").

IMAGE CLASSIFICATION WITH DNNS Training (cars, buses, trucks, motorcycles). A typical training run: pick a DNN design; input 100 million training images spanning 1,000 categories; one week of computation; test accuracy; if bad, modify the DNN, fix the training set, or update the training parameters.
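
The train / test-accuracy / iterate loop above can be sketched in miniature. This is not the presenter's code and no DNN framework is assumed; a single logistic-regression "neuron" on a toy 2-feature dataset stands in for a week-long DNN run, but the workflow shape (train, then measure accuracy) is the same.

```python
# Toy stand-in for the training loop described above: train a model on
# labeled data, then measure accuracy on a test pass. A real run uses
# millions of images and a framework like Caffe; this trains one
# logistic-regression "neuron" with plain gradient descent instead.
import math

def train(samples, labels, epochs=200, lr=0.5):
    """Gradient descent on a 2-input logistic model (illustrative only)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - y  # gradient of the cross-entropy loss w.r.t. the logit
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

def accuracy(samples, labels, w, b):
    """The 'test accuracy' step: fraction of samples classified correctly."""
    correct = 0
    for x, y in zip(samples, labels):
        p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        correct += int((p > 0.5) == bool(y))
    return correct / len(samples)

# Linearly separable toy "dataset": label is 1 when x0 + x1 > 1.
data = [(0.2, 0.1), (0.9, 0.8), (0.4, 0.3), (0.7, 0.9), (0.1, 0.4), (0.8, 0.6)]
labels = [0, 1, 0, 1, 0, 1]
w, b = train(data, labels)
print(accuracy(data, labels, w, b))
```

If the accuracy came back poor, the "if bad" branch of the slide applies: change the model, the data, or the training parameters and repeat.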

WHAT MAKES DEEP LEARNING DEEP? Today's largest networks: ~10 layers, 1B parameters, 10M images, ~30 exaflops, ~30 GPU days. The human brain has trillions of parameters, only about 1,000x more.
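
A quick back-of-the-envelope check on the slide's numbers: ~30 exaflops of total training work over ~30 GPU-days implies the sustained per-GPU throughput being assumed (the arithmetic is ours, not the presenter's).

```python
# Implied sustained throughput: total training FLOPs / total GPU-seconds.
total_flops = 30e18          # ~30 exaflops of training compute (from the slide)
gpu_days = 30                # ~30 GPU-days (from the slide)
seconds = gpu_days * 86400   # seconds in those GPU-days
sustained = total_flops / seconds
print(f"{sustained / 1e12:.1f} TFLOPS sustained per GPU")
```

That lands around 11-12 TFLOPS sustained, i.e. the kind of throughput only a GPU-class accelerator of that era could plausibly deliver.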

DEEP LEARNING ADVANTAGES Deep learning: you don't have to figure out the features ahead of time; the same neural net approach works for many different problems; it is fault tolerant; it scales well. Classical alternatives: support vector machines, linear classifiers, regression, decision trees, Bayesian methods, clustering, association rules.

CONVOLUTIONAL NEURAL NETWORKS Biologically inspired. A neuron is connected only to a small region of neurons in the layer below it, called its receptive field. A given layer can have many convolutional filters/kernels, and each filter has the same weights across the whole layer. Bottom layers are convolutional, top layers are fully connected. Generally trained via supervised learning. (Learning modes: supervised, unsupervised, reinforcement; the ideal system automatically switches modes.)
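
The receptive-field and shared-weights ideas above can be shown in a few lines. This is a pure-Python sketch of a single "valid" 2D convolution, not cuDNN or any framework; the edge-detector kernel and tiny image are illustrative choices.

```python
# One convolutional filter slides its receptive field across the input,
# reusing the SAME weights at every position (weight sharing).
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Receptive field: the kh x kw patch currently under the kernel.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge detector applied to an image with an edge down the middle:
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1],
          [-1, 1]]  # responds where left and right pixels differ
print(conv2d(image, kernel))  # strongest response at the edge column
```

A real CNN layer runs many such filters at once over many feature maps, which is exactly the massively parallel workload GPUs excel at.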

CONVOLUTIONAL NETWORKS BREAKTHROUGH Y. LeCun et al., 1989-1998: handwritten digit reading. A. Krizhevsky, G. Hinton et al., 2012: ImageNet classification winner.

CNNS DOMINATE IN PERCEPTUAL TASKS Slide credit: Yann LeCun, Facebook & NYU.

RECURRENT NEURAL NETWORK - RNN (best-known variant: the LSTM). Remembers prior state, so it is good for sequences: predict the next character given input text; translate a sentence between languages; generate a caption for an image.
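
The "remembers prior state" recurrence above can be sketched minimally. This is a toy scalar RNN cell, not an LSTM and not anyone's production code; the weights and input sequence are arbitrary illustrations.

```python
# An RNN keeps a hidden state and updates it once per sequence element;
# the state after each step depends on ALL inputs seen so far.
import math

def rnn_step(x, h, w_xh, w_hh, b):
    """One recurrence step: new_h = tanh(w_xh*x + w_hh*h + b), scalar state."""
    return math.tanh(w_xh * x + w_hh * h + b)

def run(sequence, w_xh=0.8, w_hh=0.5, b=0.0):
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states

# A single early "1" input leaves a decaying trace in every later state,
# which is the memory that sequence tasks (next-character prediction,
# translation, captioning) rely on.
states = run([1.0, 0.0, 0.0])
print(states)
```

In practice the state, inputs, and weights are vectors and matrices, and LSTM gating replaces the plain tanh update to preserve memory over long sequences.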

WHY IS DEEP LEARNING HOT NOW? Three driving factors: big data availability (350 million images uploaded per day, 2.5 petabytes of customer data hourly, 100 hours of video uploaded every minute); new ML techniques (deep neural networks); and compute density (GPUs). ML systems extract value from big data.

GPUs and Deep Learning

GPUs THE HOT PLATFORM FOR MACHINE LEARNING The ImageNet Image Recognition Challenge: 1.2M training images, 1,000 object categories (example labels: person, car, bird, helmet, frog, motorcycle, hammer, dog, flower pot, chair, power drill). GPU entries: 4 (2012), 60 (2013), 110 (2014). Classification error rates: 28% (2010), 26% (2011), 16% (2012), 12% (2013), 7% (2014).

GPUS MAKE DEEP LEARNING ACCESSIBLE "Deep learning with COTS HPC systems," A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro, ICML 2013: "Now You Can Build Google's $1M Artificial Brain on the Cheap." Google datacenter: 1,000 CPU servers, 2,000 CPUs, 16,000 cores, 600 kW, $5,000,000. Stanford AI Lab: 3 GPU-accelerated servers, 12 GPUs, 18,432 cores, 4 kW, $33,000.

IMAGENET CHALLENGE Accuracy over time: 72% (2010) and 74% (2011) with traditional CV; 84% (2012) with DNNs. Recent records: "Deep Image: Scaling up Image Recognition," Baidu: 5.98% error, Jan. 13, 2015. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," Microsoft: 4.94%, Feb. 6, 2015. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," Google: 4.82%, Feb. 11, 2015.

GOOGLE KEYNOTE AT GTC 2015

DATA AUGMENTATION Augmentation expands your dataset: mirror images, distorted / blurred images, rotations, color changes. Example from Baidu.
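
Two of the augmentations listed above can be sketched on a tiny grayscale "image" (a nested list). This is a pure-Python illustration, not Baidu's pipeline; real pipelines add blur, distortion, and color shifts as well.

```python
# Each transform yields a new labeled training example from an existing
# one, expanding the dataset without collecting new data.
def mirror(image):
    """Horizontal flip."""
    return [list(reversed(row)) for row in image]

def rotate90(image):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

image = [
    [1, 2],
    [3, 4],
]
print(mirror(image))    # left-right flipped
print(rotate90(image))  # rotated clockwise

# One original image plus its transforms = a 3x larger training set.
dataset = [image, mirror(image), rotate90(image)]
print(len(dataset))
```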

WHY ARE GPUs GOOD FOR DEEP LEARNING? Neural networks are inherently parallel, built on matrix operations, and hungry for FLOPS. GPUs deliver the same or better prediction accuracy, faster results, a smaller footprint, and lower power. (ImageNet Challenge accuracy: 72% in 2010, 74% in 2011, 84% in 2012, 88% in 2013, 93% in 2014.)

GPU ACCELERATION Training a deep, convolutional neural network.
Batch size 64 images: CPU 64 s, GPU 7.5 s, 8.5X speedup.
Batch size 128 images: CPU 124 s, GPU 14.5 s, 8.5X speedup.
Batch size 256 images: CPU 257 s, GPU 28.5 s, 9.0X speedup.
ILSVRC12 winning model (SuperVision): 7 layers, 5 convolutional layers + 2 fully-connected, with ReLU, pooling, drop-out, and response normalization. Implemented with Caffe. Hardware: dual 10-core Ivy Bridge CPUs vs 1 Tesla K40 GPU. CPU times used the Intel MKL BLAS library; GPU acceleration comes from the CUDA matrix libraries (cuBLAS).

DEEP LEARNING EXAMPLES Image classification, object detection, localization, action recognition, scene understanding. Speech recognition, speech translation, natural language processing. Pedestrian detection, traffic sign recognition. Breast cancer cell mitosis detection, volumetric brain image segmentation.

GPU-ACCELERATED DEEP LEARNING FRAMEWORKS
CAFFE - Domain: deep learning framework. cuDNN: R2. Multi-GPU: in progress. License: BSD-2. Interfaces: text-based definition files, Python, MATLAB.
TORCH - Domain: scientific computing framework. cuDNN: R2. Multi-GPU: partial. License: GPL. Interfaces: Python, Lua, MATLAB.
THEANO - Domain: math expression compiler. cuDNN: R2. License: BSD. Interface: Python.
CUDA-CONVNET2 - Domain: deep learning application. Multi-GPU: partial. License: Apache 2.0. Interface: C++.
KALDI - Domain: speech recognition toolkit. Multi-GPU: (nnet2). Multi-CPU: (nnet2). License: Apache 2.0. Interfaces: C++, shell scripts.
Embedded (TK1) supported. http://developer.nvidia.com/deeplearning

cuDNN and DIGITS

HOW GPU ACCELERATION WORKS The compute-intensive functions (~5% of the application code, ~80% of the run-time) are offloaded to the GPU; the rest of the sequential code runs on the CPU.
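
The 5%-of-code / 80%-of-runtime split above is why offloading hotspots pays off, and Amdahl's law quantifies it. The 10x acceleration factor below is an illustrative assumption, not a figure from the slide.

```python
# Amdahl's law: overall speedup when only a fraction of the run-time
# is accelerated, and the rest stays serial on the CPU.
def overall_speedup(accel_fraction, accel_factor):
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_factor)

# 80% of run-time accelerated 10x on the GPU:
print(round(overall_speedup(0.80, 10.0), 2))
```

Even with an infinitely fast GPU, the unaccelerated 20% caps the overall speedup at 5x, which is why profiling to find the true hotspots matters before porting.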

WHAT IS CUDNN? cuDNN is a library of primitives for deep learning. It sits alongside the three standard ways to accelerate applications: libraries (drop-in acceleration), OpenACC directives (easily accelerate applications), and programming languages (maximum flexibility).

ANALOGY TO HPC cuDNN is a library of primitives for deep learning. In HPC, applications (fluid dynamics, computational physics) call the standard BLAS interface, which is backed by various CPU BLAS implementations (Intel CPUs, IBM POWER) or by cuBLAS/NVBLAS (Tesla, TK1, Titan, TX1).

DEEP LEARNING WITH CUDNN cuDNN is a library of primitives for deep learning. The stack: applications call frameworks, frameworks call cuDNN, and cuDNN runs on GPUs (Tesla, TX1, Titan).

CUDNN ROUTINES Convolutions: 80-90% of the execution time. Pooling: spatial smoothing. Activation: pointwise non-linear function.
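
The two lighter-weight routine families named above are easy to sketch. These are pure-Python illustrations of what pooling and activation compute, not the cuDNN API, and the example feature maps are made up.

```python
# 2x2 max pooling: spatial smoothing / downsampling, keeping the
# strongest response in each 2x2 block.
def max_pool_2x2(image):
    out = []
    for i in range(0, len(image) - 1, 2):
        row = []
        for j in range(0, len(image[0]) - 1, 2):
            row.append(max(image[i][j], image[i][j + 1],
                           image[i + 1][j], image[i + 1][j + 1]))
        out.append(row)
    return out

# ReLU: a pointwise non-linear function, applied element by element.
def relu(image):
    return [[max(0, v) for v in row] for row in image]

feature_map = [
    [1, -3, 2, 0],
    [4, 0, -1, 5],
    [-2, 7, 3, 1],
    [0, 6, -4, 2],
]
print(max_pool_2x2(feature_map))
print(relu([[-1, 2], [3, -4]]))
```

Unlike the convolutions, both of these have straightforward implementations, which is exactly the point the next slide makes.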

CONVOLUTIONS THE MAIN WORKLOAD Very compute intensive, but with a large parameter space: (1) minibatch size, (2) input feature maps, (3) image height, (4) image width, (5) output feature maps, (6) kernel height, (7) kernel width, (8) top zero padding, (9) side zero padding, (10) vertical stride, (11) horizontal stride, plus layout and configuration variations. Other cuDNN routines have straightforward implementations.

CUDNN V2 - PERFORMANCE CPU: 16-core Haswell E5-2698 at 2.3 GHz, with 3.6 GHz Turbo. GPU: NVIDIA Titan X.

CUDNN EASY TO ENABLE For Caffe: install cuDNN on your system; download Caffe; in Caffe's Makefile.config, uncomment USE_CUDNN := 1; then install and use Caffe as usual. For Torch: install cuDNN on your system; install Torch as usual; install the cudnn.torch module; then use the cudnn module in Torch instead of the regular nn module. The cudnn module is API compatible with the standard nn module, so just replace nn with cudnn. CUDA 6.5 or newer required.

DIGITS Interactive Deep Learning GPU Training System. For data scientists & researchers: quickly design the best deep neural network (DNN) for your data; visually monitor DNN training quality in real time; manage training of many DNNs in parallel on multi-GPU systems. developer.nvidia.com/digits

DIGITS WORKFLOW From the main console: create your dataset (choose your database); configure your model (choose a default network, modify one, or create your own); start training.

Machine Learning and Data Analytics

TRADITIONAL MACHINE LEARNING For your many non-DL applications: an interactive environment for easily building and deploying ML systems. Holds records for performance on many common ML tasks, on single nodes or clusters. Uses Scala; feels like SciPy or MATLAB.

GPU ACCELERATION FOR GRAPH ANALYTICS Communications & social networks, cyber pattern recognition, shortest path finding. PageRank: 19x speedup for 1 GPU vs a 60-node cluster; 280x vs optimized Spark; 1440x vs Spark; 3420x vs Hadoop. Time per iteration (lower is better): 0.967 s on an Intel Xeon E5-2690 v2 vs 0.051 s on the GPU.

Thank you!