Deep Learning for Big Data


Deep Learning for Big Data. Yoshua Bengio, Département d'informatique et de recherche opérationnelle, U. Montréal. 30 May 2013, Journée de la recherche, École Polytechnique, Montréal.

Big Data & Data Science. Super-hot buzzword; data deluge. Two sides of the coin: 1. allowing computers to understand the data (perception); 2. allowing computers to take decisions (action). My research: the CERC in Data Science and Real-Time Decision-Making, i.e., the necessity to combine 1 and 2.

Big Data: a Growing Torrent. Business executives are faced with a relentless and exponential growth of data that can be collected by their enterprises: 30 billion pieces of content shared on Facebook every month; 5 billion mobile phones in use in 2010; 40% projected growth in global data generated per year vs. 5% growth in global IT spending. (Data: McKinsey. 1 exabyte = 1 billion gigabytes. Figure: The Economist.)

Big Data, Big Value. Making sense of this data could unleash substantial value across an array of industries: $300 billion potential annual value to US health care; $600 billion potential annual consumer surplus from using personal location data globally; €250 billion potential annual value to Europe's public sector; 60% potential increase in retailers' operating margins possible with big data. (Source: McKinsey.)

Big Data: in the Minds of Executives. There are many reasons to believe that since last year, turning data into a competitive advantage has become a top-of-mind C-level issue: the O'Reilly Strata Conference, a twice-yearly event started in 2011; the McKinsey white paper, 2011; The Economist special report, 2010; "The world of Big Data is on fire" (The Economist, Sept 2011); #bigdata on Twitter.

Data Science: automatically extracting knowledge from data. (From: Yann LeCun, Lecture 1 on Big Data and large-scale machine learning, 2013.)

Decision Science + Machine Learning: the topic of a successful CERC application. Why? The data deluge and real-time online learning: learned models are used to take decisions on the fly, and the data used to train depends on the decisions taken, so the learning can't be separated from the decisions as in traditional OR and ML setups. Examples: online advertising and recommendation systems; online video games; fraud detection, targeted marketing, etc.

Ultimate Goals for AI. AI needs knowledge; needs learning; needs generalizing where probability mass concentrates; needs to fight the curse of dimensionality; needs disentangling the underlying explanatory factors ("making sense of the data").

Easy Learning. [Figure: training examples (x, y) sampled from a true but unknown function; the learned function f(x) produces the prediction.]

Local Smoothness Prior: Locally Capture the Variations. [Figure: the learned prediction f(x) interpolates between training examples of the unknown true function; at a test point x' near a training example x, the learned value f(x') is obtained by interpolation, relying on x ≈ x' implying f(x) ≈ f(x').]
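To make the smoothness prior concrete, here is a minimal kernel-smoothing sketch (an illustration, not from the slides; the function and bandwidth are my assumptions): the prediction at a test point is a locally weighted average of nearby training targets.

```python
import numpy as np

def kernel_smoother(x_train, y_train, x_query, bandwidth=0.3):
    """Predict f(x_query) as a locally weighted average of nearby
    training targets -- the local smoothness prior in code."""
    # Gaussian weights: training points near x_query dominate.
    w = np.exp(-((x_train - x_query) ** 2) / (2 * bandwidth ** 2))
    return np.sum(w * y_train) / np.sum(w)

# Noisy samples of a smooth underlying function.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 2 * np.pi, 50)
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(50)

# A test point between training examples is handled by interpolation.
pred = kernel_smoother(x_train, y_train, np.pi / 2)
```

This works exactly when the smoothness assumption holds; the next slide shows why it breaks down in high dimension.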

What We Are Fighting Against: the Curse of Dimensionality. To generalize locally, one needs representative examples for all relevant variations!
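A back-of-the-envelope illustration of the curse (my numbers, not the slide's): if resolving one input dimension takes k representative examples, purely local generalization over d dimensions needs on the order of k^d of them.

```python
# Examples needed to cover d dimensions at per-dimension resolution k
# when generalization is purely local: exponential in d.
def examples_needed(k: int, d: int) -> int:
    return k ** d

low_dim = examples_needed(10, 2)     # 100 examples: manageable
high_dim = examples_needed(10, 100)  # 10**100: astronomically many
```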

Manifold Learning Prior: examples concentrate near a lower-dimensional manifold.
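The manifold prior can be checked numerically. A sketch (the synthetic data and the singular-value threshold are my assumptions): 10-dimensional points generated from a single latent coordinate have essentially one dominant singular value, revealing the low intrinsic dimension.

```python
import numpy as np

# 200 points in 10-D that actually lie near a 1-D manifold.
rng = np.random.default_rng(0)
t = rng.random(200)                       # 1-D latent coordinate
direction = rng.standard_normal(10)       # embedding direction in 10-D
X = np.outer(t, direction) + 0.01 * rng.standard_normal((200, 10))

# Singular values of the centered data: one dominant component,
# the rest is small observation noise.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
intrinsic_dim = int(np.sum(s > 10 * s[-1]))
```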

Putting Probability Mass Where Structure Is Plausible. The empirical distribution puts mass at the training examples; smoothness spreads that mass around. That is insufficient: we must guess structure and generalize accordingly.

Representation Learning. Good input features are essential for successful ML (feature engineering = 90% of effort in industrial ML). Handcrafting features vs. learning them: representation learning guesses the features / factors / causes, i.e., a good representation.

Deep Representation Learning. Deep learning algorithms attempt to learn multiple levels of representation of increasing complexity/abstraction. When the number of levels can be data-selected, this is Deep Learning. [Figure: a stack of representations h1, h2, h3 above the input x.]

A Modern Deep Architecture. Optional output layer: here, predicting a supervised target. Hidden layers: these learn more abstract representations as you head up. Input layer: this has the (roughly) raw sensory inputs.
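The architecture above can be sketched as a forward pass in NumPy (the layer sizes and random weights are illustrative assumptions, not from the talk): each hidden layer re-represents the layer below, and the final layer predicts the supervised target.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """Forward pass of a deep net: hidden layers compute increasingly
    abstract representations h1, h2, ...; the last layer is the
    (optional) output layer predicting a supervised target."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                   # hidden layers
    return h @ weights[-1] + biases[-1]       # output layer

# Tiny random network: 4 raw inputs -> 8 -> 8 -> 2 outputs.
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

y = forward(rng.standard_normal(4), weights, biases)
```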

Google Image Search: Different object types represented in the same space Google: S. Bengio, J. Weston & N. Usunier (IJCAI 2011, NIPS 2010, JMLR 2010, MLJ 2010)

How do humans generalize from very few examples? Brains may be born with generic priors; which ones? Humans transfer knowledge from previous learning: representations and explanatory factors, learned previously from unlabeled data plus labels for other tasks.

Learning Multiple Levels of Representation. Theoretical evidence for multiple levels of representation: an exponential gain for some families of functions. Biologically inspired learning: the brain has a deep architecture; the cortex seems to have a generic learning algorithm; humans first learn simpler concepts and then compose them into more complex ones.

Learning Multiple Levels of Representation (Lee, Largman, Pham & Ng, NIPS 2009; Lee, Grosse, Ranganath & Ng, ICML 2009). Successive model layers learn deeper intermediate representations. [Figure: Layer 1 features; Layer 2, where parts combine to form objects; Layer 3, high-level linguistic representations.] Prior: underlying factors & concepts are compactly expressed with multiple levels of abstraction.

Deep computer program. [Figure: main calls sub1, sub2, sub3, which call subsub1, subsub2, subsub3, which call subsubsub1, subsubsub2, subsubsub3: a hierarchy of reused subroutines.]

Shallow computer program. [Figure: subroutine1 includes subsub1 code, subsub2 code, and subsubsub1 code; subroutine2 includes subsub2 code, subsub3 code, and subsubsub3 code; main inlines the lower-level code instead of reusing it.]
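The deep-vs-shallow program analogy can be made literal (a toy illustration with made-up functions, not from the slides): the deep version defines each small piece once and composes it, while the shallow version duplicates the same logic inline.

```python
# "Deep" program: small pieces defined once and reused by higher levels.
def subsub1(x):
    return x + 1

def subsub2(x):
    return x * 2

def sub1(x):
    return subsub2(subsub1(x))    # reuses lower-level pieces

def sub2(x):
    return subsub1(subsub2(x))

def main_deep(x):
    return sub1(x) + sub2(x)

# "Shallow" program: the same computation with every piece inlined,
# duplicating logic instead of sharing it.
def main_shallow(x):
    return ((x + 1) * 2) + ((x * 2) + 1)
```

Both compute the same function, but the deep version expresses it compactly through reuse, which is the slide's point about multiple levels of abstraction.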

Major Breakthrough in 2006. The ability to train deep architectures by using layer-wise unsupervised learning, whereas previous purely supervised attempts had failed. Unsupervised feature learners: RBMs, auto-encoder variants, sparse coding variants. Empirical successes since then: 2 competitions, Google, Microsoft, IBM, Apple. [Figure/map: Hinton in Toronto, Bengio in Montréal, Le Cun in New York.]
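A minimal sketch of one of the unsupervised feature learners named above: a single-layer auto-encoder trained by gradient descent to reconstruct its (unlabeled) input. The architecture, tied weights, and hyperparameters here are illustrative assumptions, not the talk's recipe.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid, lr = 16, 4, 0.1
W = rng.standard_normal((n_in, n_hid)) * 0.1   # tied encoder/decoder weights
b, c = np.zeros(n_hid), np.zeros(n_in)         # encoder and decoder biases

x = rng.random((32, n_in))                     # a batch of unlabeled inputs

for _ in range(500):
    h = sigmoid(x @ W + b)                     # encode: learned features
    x_hat = h @ W.T + c                        # decode: reconstruction
    err = x_hat - x
    # Gradient descent on mean squared reconstruction error,
    # with gradient terms through both decoder and encoder.
    dh = (err @ W) * h * (1 - h)
    W -= lr * (x.T @ dh + err.T @ h) / len(x)
    b -= lr * dh.mean(axis=0)
    c -= lr * err.mean(axis=0)

loss = float(np.mean((x_hat - x) ** 2))        # should be small after training
```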

Deep Networks for Speech Recognition: results from Google, IBM, Microsoft (numbers taken from Geoff Hinton's June 22, 2012 Google talk; word error rates in %):

Task                     Hours of training data   Deep net + HMM   GMM+HMM (same data)   GMM+HMM (more data)
Switchboard              309                      16.1             23.6                  17.1 (2k hours)
English Broadcast News   50                       17.5             18.8
Bing voice search        24                       30.4             36.2
Google voice input       5,870                    12.3                                   16.0 (lots more)
YouTube                  1,400                    47.6             52.3

Deep Sparse Rectifier Neural Networks (Glorot, Bordes & Bengio, AISTATS 2011), following up on (Nair & Hinton 2010). Machine learning motivations: sparse representations, sparse gradients. Neuroscience motivations: the leaky integrate-and-fire model. Rectifier: f(x) = max(0, x). Outstanding results by Krizhevsky et al. 2012, beating the state-of-the-art on ImageNet 1000 (error rates, 1st choice / Top-5): 2nd best, 27% err (Top-5); previous SOTA, 45% err / 26% err; Krizhevsky et al., 37% err / 15% err.
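The rectifier from the slide is one line of code, and its key property, sparse representations, is easy to verify numerically (the zero-centered input distribution is my illustrative choice):

```python
import numpy as np

def rectifier(x):
    """The rectifier nonlinearity: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

# For zero-centered pre-activations, about half of the units
# output exactly 0 -- a sparse representation.
rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(10_000)
activations = rectifier(pre_activations)
sparsity = float(np.mean(activations == 0.0))  # fraction of exact zeros
```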

Learning Multiple Levels of Abstraction. The big payoff of deep learning is to allow learning higher levels of abstraction. Higher-level abstractions disentangle the factors of variation, which allows much easier generalization and transfer. More abstract representations → successful transfer (across domains and languages); 2 international competitions won.

Challenges Ahead. Big data + deep learning = underfitting, local minima, ill-conditioning, and the difficulty of using 2nd-order methods in the stochastic / online setting. The challenge of inference with non-unimodal, non-factorial posteriors (can we avoid this altogether?). Big data + deep learning + parallel computing: our current best training algorithms are highly sequential; big efforts at Google in this respect (Dean et al., ICML 2012, NIPS 2012). Much remains to be understood mathematically; (Alain & Bengio, ICLR 2013) is one of the few works scratching the tip of the iceberg.

LISA team: Thank you! Questions?