FAST DATA = BIG DATA + GPU
Carlo Nardone, Senior Solution Architect, EMEA Enterprise
FastData @ UNITO, March 21st, 2016
THE WORLD LEADER IN VISUAL COMPUTING
GAMING PRO ENTERPRISE VISUALIZATION DATA CENTER AUTO / EMBEDDED
From industrial design to advanced special effects, Quadro is the preeminent platform for professional artists. NVIDIA Quadro GPUs power 90% of the world's workstations and nearly every major design application uses its tools. For six years running, every film nominated for the Academy Award for Best Visual Effects was made using NVIDIA technology.
Virtual reality is a revolutionary visual computing experience that requires very powerful GPUs. It will transform next-gen gaming and ripple through many other industries. Our GameWorks VR software helps both headset and game developers create amazing VR experiences.
GameWorks VR features: Multi-Res Shading, VR SLI, Context Priority, Direct Mode, Front Buffer Rendering
NVIDIA GPUs are essential to the field of medical imaging. They power the GE Revolution CT scanner, which can produce high-quality imagery while reducing radiation dosage by up to 82% for patients of all ages.
Tomorrow's cars will have rich, virtual digital cockpits that require complete system and software integration. NVIDIA processors power the digital cockpits and infotainment systems of some of the world's most innovative cars, including models from Audi, BMW, Honda, Lamborghini, Tesla and VW. There are over 10 million cars with NVIDIA processors on the road today.
Visual computing and AI will make future cars safer and more delightful to drive. At the same time, Uber-like services with driverless shuttles will revolutionize transportation. With the horsepower of 150 MacBook Pros, NVIDIA DRIVE PX 2 can perform 24 trillion deep learning operations per second. The size of a lunchbox, it can fit neatly into a trunk. The end-to-end DRIVE PX platform is loaded with software, including DriveWorks, for developing applications across the entire self-driving pipeline; DIGITS, for training and visualizing deep neural networks; and DriveNet, our reference deep neural network.
We're also bringing AI and deep learning to a world of robots and drones. Jetson TX1, the first embedded computer designed to process deep neural networks, delivers a whopping 1 TeraFLOPS of performance in a credit-card sized module. Such power will enable drones that don't just fly by remote control, but navigate their way for search and rescue; compact security surveillance systems that don't just scan crowds, but identify suspicious activity; and robots that don't just perform tasks, but tailor them to individuals' habits.
GPU COMPUTING
"It's time to start planning for the end of Moore's Law, and it's worth pondering how it will end, not just when."
Robert Colwell, Director, Microsystems Technology Office, DARPA
HOW GPU ACCELERATION WORKS
Application code is split between CPU + GPU:
GPU (optimized for parallel, high-throughput tasks): compute-intensive functions, 5% of the code
CPU (optimized for sequential, low-latency tasks): rest of the sequential CPU code
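One rough way to reason about this CPU + GPU split is Amdahl's law. The sketch below is illustrative only. The slide says the compute-intensive functions are about 5% of the code, but it does not say what share of the run time they account for, so `hot_fraction` and `gpu_speedup` are assumed inputs rather than figures from this deck.

```python
def amdahl_speedup(hot_fraction: float, gpu_speedup: float) -> float:
    """Overall application speedup under Amdahl's law.

    hot_fraction: share of total run time spent in the offloaded functions
                  (an assumption; the slide only gives a share of the code).
    gpu_speedup:  how much faster those functions run on the GPU
                  (also an assumption).
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    # The CPU part is unchanged; only the offloaded part shrinks.
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / gpu_speedup)


if __name__ == "__main__":
    # Hypothetical run-time shares for the offloaded functions.
    for hot in (0.50, 0.90, 0.99):
        print(f"hot_fraction={hot:.2f}: {amdahl_speedup(hot, 20.0):.2f}x overall")
```

The takeaway is that the overall gain is bounded by the part left on the CPU, which is why offloading pays off only when the small set of offloaded functions dominates the run time.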
70% OF TOP HPC APPS ACCELERATED
Intersect360 survey of top apps (Intersect360, Nov 2015, "HPC Application Support for GPU Computing")
Top 10 HPC apps: 90% accelerated
Top 50 HPC apps: 70% accelerated
Top 25 apps in survey: GROMACS, SIMULIA Abaqus, NAMD, AMBER, ANSYS Mechanical, Exelis IDL, MSC NASTRAN, LAMMPS, NWChem, LS-DYNA, Schrodinger, Gaussian, GAMESS, ANSYS Fluent, WRF, VASP, OpenFOAM, CHARMM, Quantum Espresso, ANSYS CFX, Star-CD, CCSM, COMSOL, Star-CCM+, BLAST
Support levels shown per app: all popular functions accelerated; some popular functions accelerated; in development; not supported
3D STACKED MEMORY - HBM
3D chip-on-wafer integration: microbump array, TSV (Through Silicon Via), silicon interposer
Many X bandwidth, 2.5X capacity, 4X energy efficiency
[Chart: memory bandwidth by year, 2008-2016]
NVLINK
GPU to CPU via NVLink: Pascal, 4 NVLink at 20 GB/s each, CPU (NVLink enabled)
GPU to GPU via NVLink: Pascal to Pascal, 4 NVLink at 20 GB/s each, CPU (x86) attached through a PCIe switch
PCIe: control
HBM: 16-32 GB, 1 TB/s
DDR4 memory: 10s-100s GB, 50-75 GB/s
Whitepaper: http://www.nvidia.com/object/nvlink.html
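The bandwidth figures on this slide can be combined with some simple arithmetic. The sketch below uses only the numbers quoted above (4 NVLink at 20 GB/s each, HBM at 1 TB/s, DDR4 at 50-75 GB/s) and treats them as peak values. The helper names are hypothetical, and 1 TB is taken as 1000 GB for simplicity. Real transfers would be slower than these ideal times.

```python
NVLINK_LINKS = 4                   # from the slide
NVLINK_GB_PER_S_PER_LINK = 20.0    # from the slide
HBM_GB_PER_S = 1000.0              # "1Tbyte/s", assuming 1 TB = 1000 GB
DDR4_GB_PER_S = (50.0, 75.0)       # range quoted on the slide


def aggregate_nvlink_gb_per_s(links: int = NVLINK_LINKS,
                              per_link: float = NVLINK_GB_PER_S_PER_LINK) -> float:
    """Peak aggregate bandwidth if all links are used in parallel."""
    return links * per_link


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb at the given peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    nvlink = aggregate_nvlink_gb_per_s()
    print(f"Aggregate NVLink: {nvlink:.0f} GB/s")
    # Ideal time to fill the smaller HBM capacity quoted on the slide (16 GB).
    print(f"16 GB over NVLink: {transfer_seconds(16.0, nvlink):.2f} s")
    print(f"16 GB read from HBM: {transfer_seconds(16.0, HBM_GB_PER_S):.3f} s")
```

Under these assumptions the four links together give 80 GB/s, in the same range as the DDR4 bandwidth quoted on the slide and well below the on-package HBM bandwidth.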
DEEP LEARNING
THE BIG BANG IN MACHINE LEARNING
DNN + BIG DATA + GPU
"Google's AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs... And it depends on these chips more than the larger tech universe realizes."
THE PROMISE OF MACHINE LEARNING
Training data -> machine learning (GPUs for training)
Live data -> analysis (GPUs for classification) -> prediction?
THE AI RACE IS ON
Timeline, 2010-2015: Amazon ML; MS AzureML; CNTK; IBM Watson Jeopardy; Caffe; Torch; Theano; ML Beats Humans; Microsoft; Google; Google TensorFlow; Facebook Torch; IBM Watson; ImageNet; NVIDIA cuDNN; Toyota $1B AI Lab; Google Brain; Google Car 1M Miles
MACHINE LEARNING USING DNN
Today's largest networks: ~10 layers, 1B parameters, 10M images, ~30 Exaflops, ~30 GPU days
The human brain has trillions of parameters, 1,000x more.
Input -> Result
Image source: Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Ng, "Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks", ICML 2009 & Comm. ACM 2011.
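The training figures on this slide imply a sustained per-GPU throughput, which the sketch below works out. It rests on one assumption: that "~30 Exaflops" means roughly 30 x 10^18 floating-point operations in total for one training run, spread evenly over the ~30 GPU days. The function name is hypothetical, and the result is only as precise as the slide's rounded inputs.

```python
SECONDS_PER_DAY = 86_400


def implied_flop_per_second(total_flop: float, gpu_days: float) -> float:
    """Average operations per second per GPU, assuming the total work
    is spread evenly over the given number of GPU days."""
    if gpu_days <= 0:
        raise ValueError("gpu_days must be positive")
    return total_flop / (gpu_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    total_flop = 30e18   # "~30 Exaflops", read as total operations (assumption)
    gpu_days = 30.0      # "~30 GPU days"
    rate = implied_flop_per_second(total_flop, gpu_days)
    print(f"Implied sustained rate: {rate / 1e12:.1f} TFLOP/s per GPU")

    # Parameter gap quoted on the slide: 1B parameters times 1,000.
    print(f"1B parameters x 1,000 = {1e9 * 1000:.0e} parameters")
```

With these inputs the implied rate is on the order of 10 TFLOP/s per GPU, and 1B parameters scaled by 1,000 gives 10^12, consistent with the "trillions of parameters" on the slide.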
DEEP LEARNING: UNREASONABLY EFFECTIVE*
Image Classification, Object Detection, Localization, Action Recognition, Scene Understanding
Speech Recognition, Speech Translation, Natural Language Processing
Pedestrian Detection, Traffic Sign Recognition
Breast Cancer Cell Mitosis Detection, Volumetric Brain Image Segmentation
* Credited to Yann LeCun, Facebook AI Research & Center for Data Science, NYU
DEEP LEARNING EVERYWHERE
NVIDIA DRIVE PX, NVIDIA Tesla, NVIDIA Jetson, NVIDIA Titan X
BIG DATA TECHNOLOGIES
Role of ML & AI?
Source: Forrester Research, TechRadar: Big Data, Q1 2016
CASE HISTORIES
BIG DATA, VIDEO & IMAGE PROCESSING
Social Networks and Large DB Analytics, Heat Maps, Object Recognition
VIDEO AND GEOSPATIAL PROCESSING APPS
Video Enhancement and Analytics: real-time only with GPUs
Video Analytics: GPUs enable 12x speedup on more streams of data
Remote Sensing, Ray Tracing: 100x speedup over CPU
Geospatial Image Processing: 60x faster using GPUs
IMAGRY: LARGE SCALE IMAGE CLASSIFICATION
30,000 Categories trained on 100,000,000 Categories
Multi-GPU training with scaling up to 128 nodes
Mobile (Offline Mode) & Online Mode
20-50x speedup using GPU
Stratified Medical: Developing Better Drugs with Deep Learning
AI-based big-data platform for drug-targeting insights
Aggregating many industry data sources
Natural language understanding algorithms: 30M scientific articles, 100M patents
Integrated with various ontologies, drug compound dbs, etc.
Unified dynamic biomedical knowledge repository targeted for drug opportunities
Judgement Augmented Correlation System (JACS)
Deep neural networks that understand disease in the human body
AI networks working at multiple levels of representation for reasoning
DNN training speedup of 15x using cuDNN
Preliminary results improve on the state of the art by 15% in understanding interactions from text
Next-generation user interface to navigate human knowledge with AI assistance
THANK YOU! cnardone@nvidia.com +39 335 5828197