Learning to Process Natural Language in Big Data Environment



Transcription:

CCF ADL 2015, Nanchang, Oct 11, 2015. Learning to Process Natural Language in Big Data Environment. Hang Li, Noah's Ark Lab, Huawei Technologies

Part 1: Deep Learning - Present and Future

Talk Outline: Overview of Deep Learning; Recent Advances in Research of Deep Learning; Future of Deep Learning

General Picture of Machine Learning: deep learning is an important and promising sub-area of machine learning, which in turn is a sub-area of artificial intelligence and overlaps with data mining.

Brief History of Machine Learning. Perceptron (Rosenblatt, 1957): two-layer linear model. Neural Net (Rumelhart et al., 1986): three-layer non-linear model. SVM (Vapnik et al., 1995): kernel SVM = three-layer non-linear model. Deep Learning (Hinton et al., 2006): multi-layer non-linear model. Deep learning means learning of complicated non-linear models.

Deep Neural Network. Deep learning learns multi-layer non-linear models from data.

Four Questions to Answer. Why is deep learning powerful? Why is deep learning popular now? Is deep learning almighty? How similar/dissimilar are deep neural networks to the human brain?

Question: Why Is Deep Learning Powerful? Powerful representation learning ability: a deep network is a complicated non-linear function. Examples: the XOR classification problem cannot be modeled by a linear function; a simple non-linear classification problem can be modeled by a kernel SVM; a complicated non-linear classification problem can be modeled by a deep convolutional neural network.

Example: XNOR Network. The XNOR function can be represented by a three-layer network (example from Andrew Ng). With sigmoid units g, hidden layer: a1(2) = g(-30 + 20 x1 + 20 x2) (AND), a2(2) = g(10 - 20 x1 - 20 x2) (NOR); output: a1(3) = g(-10 + 20 a1(2) + 20 a2(2)) (OR).
x1 x2 | a1(2) a2(2) | XNOR
0  0  |  0     1    |  1
0  1  |  0     0    |  0
1  0  |  0     0    |  0
1  1  |  1     0    |  1
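The XNOR network above can be sketched directly in code. The weights (-30, 20, 20 for AND; 10, -20, -20 for NOR; -10, 20, 20 for OR) are the ones on the slide, following Andrew Ng's example:

```python
import math

def sigmoid(z):
    # Logistic activation; saturates to ~0 or ~1 for |z| >= 10.
    return 1.0 / (1.0 + math.exp(-z))

def xnor_net(x1, x2):
    # Hidden layer: a1 fires for AND(x1, x2), a2 for NOR(x1, x2).
    a1 = sigmoid(-30 + 20 * x1 + 20 * x2)
    a2 = sigmoid(10 - 20 * x1 - 20 * x2)
    # Output layer: OR of the two hidden units gives XNOR.
    return sigmoid(-10 + 20 * a1 + 20 * a2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor_net(x1, x2)))
```

Rounding the output reproduces the truth table above.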

Example: XOR/XNOR Classification Problem. A linear model cannot deal with the XOR/XNOR classification problem; a multi-layer neural network can.

Example: RBF Kernel SVM. An RBF kernel SVM is a three-layer network: f(x) = sum_i w_i exp(-||x - x_i||^2 / sigma^2) + b, with w_i = alpha_i y_i. The input is a vector, the hidden layer computes one RBF unit per support vector x_i, and the output is a scalar: a weighted sum of the N hidden units plus the bias b.

Example: Complicated Non-Linear Classification Problem. f(x) = sum_{i=1}^{N} alpha_i y_i exp(-||x - x_i||^2 / sigma^2) + b. An RBF SVM can effectively separate positive and negative examples (example from Andrew Ng).
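The RBF decision function can be sketched as follows. The four support vectors, unit coefficients alpha_i = 1, gamma = 1/sigma^2 = 1, and b = 0 are illustrative choices (not from the slide) picked so that the function separates the XOR pattern:

```python
import math

def rbf_svm(x, support_vectors, alphas, labels, gamma=1.0, b=0.0):
    # f(x) = sum_i alpha_i * y_i * exp(-gamma * ||x - x_i||^2) + b
    score = b
    for sv, a, y in zip(support_vectors, alphas, labels):
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, sv))
        score += a * y * math.exp(-gamma * sq_dist)
    return score

# XOR pattern: (0,1) and (1,0) positive, (0,0) and (1,1) negative.
svs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ys = [-1, +1, +1, -1]
alphas = [1.0] * 4
for p, y in zip(svs, ys):
    print(p, rbf_svm(p, svs, alphas, ys))
```

Each training point gets a score of the correct sign, so this hand-built three-layer network separates a pattern no linear model can.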

Example: AlexNet (a Convolutional Neural Network for Image Classification). 11 layers; 650,000 neurons; 60 million parameters. 5 convolution layers, 3 fully connected layers, 3 max-pooling layers (Krizhevsky et al., 2012).

Example: AlexNet for the ImageNet Challenge. ILSVRC 2012: classifying 1.2 million images into 1,000 classes. Top-5 error rate: 15.2% vs. 26.2% (second place).

Question: Why Is Deep Learning Powerful? Statistical Efficiency. Deep models have higher statistical efficiency, i.e., they need less data to train. Functions can be represented more compactly with deep networks than with shallow networks. Example: sum-product network.

Example: Deep Sum-Product Network. f(x) = ((x1 + x2)(x3 + x4))(x5 + x6): 4 layers; 11 neurons (6 inputs, 3 sum nodes, 2 product nodes). The deep network has fewer parameters and thus higher statistical efficiency.

Example: Shallow Sum-Product Network. The same function expanded into a flat sum of products: x1 x3 x5 + x1 x3 x6 + x1 x4 x5 + x1 x4 x6 + x2 x3 x5 + x2 x3 x6 + x2 x4 x5 + x2 x4 x6: 3 layers; 15 neurons (6 inputs, 8 product nodes, 1 sum node).
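The two slides can be checked numerically: both networks compute the same function, but the deep factored form uses 5 internal nodes against 9 for the flat expansion (a toy verification, not from the slides):

```python
from itertools import product

def deep_spn(x):
    # ((x1 + x2)(x3 + x4))(x5 + x6): 3 sum nodes, 2 product nodes.
    return (x[0] + x[1]) * (x[2] + x[3]) * (x[4] + x[5])

def shallow_spn(x):
    # Flat expansion: one product node per term plus one sum node.
    terms = product((x[0], x[1]), (x[2], x[3]), (x[4], x[5]))
    return sum(a * b * c for a, b, c in terms)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(deep_spn(x), shallow_spn(x))  # identical: (1+2)(3+4)(5+6) = 231
```

The gap widens with depth: factoring keeps node count linear where the flat expansion grows exponentially, which is the statistical-efficiency argument on the slide.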

Conclusion: Deep Learning Can Deal with Complicated Problems. Picture from Nvidia.com.

Why Is Deep Learning Popular Now? Computers have become more powerful; more data is available; number of parameters vs. size of training data; parametric vs. non-parametric problems; simple problems have already been solved.

Computers Are More Powerful. "Neural networks did not beat existing technologies 20-30 years ago. Now we know the reason: the machines at the time were not fast enough." (Geoffrey Hinton, interview "Google's AI and Deep Learning 'Godfather' Geoffrey Hinton", July 2015)

The More Parameters a Model Has, the More Training Data Is Needed. Sample complexity (Occam's Razor theorem, for binary classification): with precision epsilon, confidence 1 - delta, and hypothesis space H (model complexity |H|), m >= (1/epsilon)(ln|H| + ln(1/delta)) examples suffice. Empirical sample complexity: empirically, at least 100 times the number of parameters.
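The Occam bound above can be computed directly. The numbers below are illustrative, and `empirical_rule` encodes only the slide's 100x rule of thumb:

```python
import math

def occam_sample_complexity(epsilon, delta, hypothesis_space_size):
    # m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples suffice for a
    # consistent learner to reach error <= epsilon with prob. >= 1 - delta.
    return math.ceil((math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon)

def empirical_rule(n_params):
    # Slide's rule of thumb: at least 100x the number of parameters.
    return 100 * n_params

# e.g. |H| = 2^20 hypotheses, 5% error, 95% confidence:
print(occam_sample_complexity(0.05, 0.05, 2 ** 20))  # 338 examples
print(empirical_rule(60_000_000))                    # AlexNet-sized model
```

By the empirical rule a 60-million-parameter model wants billions of examples, which is why weight sharing (convolutions) and data augmentation matter so much in practice.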

Parametric vs. Non-Parametric. Parametric problem: the number of parameters does not increase further once the size of the training data exceeds a threshold. Non-parametric problem: the number of parameters continuously increases as the size of the training data increases.

Example: Gaussian Distribution. f(x) = (1 / sqrt(2 pi sigma^2)) exp(-(x - u)^2 / (2 sigma^2)). Estimates: u_hat = (1/N) sum_i x_i, sigma_hat^2 = (1/N) sum_i (x_i - u_hat)^2; precision: O(1/sqrt(N)). Parametric problems can be easily handled with big data.
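The O(1/sqrt(N)) precision claim can be sanity-checked by simulation (a sketch with synthetic data; the true mean of 0 and standard deviation of 1 are arbitrary choices):

```python
import random
import statistics

random.seed(0)

def estimate_mean(n, true_mean=0.0, true_sd=1.0):
    # Maximum-likelihood estimate of the Gaussian mean from n samples.
    samples = [random.gauss(true_mean, true_sd) for _ in range(n)]
    return statistics.fmean(samples)

# The error of the estimate shrinks roughly like 1/sqrt(N).
for n in (100, 10_000, 1_000_000):
    print(n, abs(estimate_mean(n)))
```

With a fixed number of parameters (here just two), more data only makes the estimate better; this is the parametric case on the slide.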

Example: Vocabulary Size in Text Data. A non-parametric problem: model complexity must keep increasing. [Figure: vocabulary size in the Xinhua News data grows toward 1.2 million as the data size increases.] Vocabulary size increases when data size increases (data from Xiao Chen).
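The non-parametric behavior can be illustrated with synthetic Zipf-distributed text (the 50,000-word type inventory and the 1/rank weights are assumptions for the sketch, not the Xinhua data): the observed vocabulary keeps growing as more tokens are read.

```python
import random

random.seed(42)

# Zipf-like word frequencies over a large type inventory.
TYPES = [f"w{i}" for i in range(50_000)]
WEIGHTS = [1.0 / (rank + 1) for rank in range(len(TYPES))]

def observed_vocab(n_tokens):
    # Sample a corpus of n_tokens and count distinct word types seen.
    corpus = random.choices(TYPES, weights=WEIGHTS, k=n_tokens)
    return len(set(corpus))

for n in (1_000, 10_000, 100_000):
    print(n, observed_vocab(n))
```

Unlike the Gaussian case, the "parameter count" (one entry per word type) never plateaus, so a fixed-size model cannot fit the data.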

Question: Is Deep Learning Almighty? It is suitable for complicated non-linear problems and big data. No-free-lunch theorem, interpretation: no algorithm can *always* work better than the other algorithms.

No Free Lunch Theorem (Wolpert & Macready, 1997). Let Acc(L) denote the generalization accuracy of learner L, i.e., its accuracy on test examples. Theorem: for any learners L1, L2, if there exists a learning problem such that Acc(L1) > Acc(L2), then there also exists a learning problem such that Acc(L2) > Acc(L1).

Conclusion: Use the Right Tools to Do the Right Things. Simple problems can be solved using simple models (linear or non-linear); deep learning is more suitable for complicated non-linear problems with large amounts of data.

Question: How Similar/Dissimilar Are Deep Neural Networks to the Human Brain? Brain and computer = automata (von Neumann). Artificial neural networks are machine learning models, inspired by human brains but motivated by problem solving. Similarities: elements (e.g., neurons, synapses); architecture (e.g., deep cascaded structure, local receptive fields); signals (e.g., digital and analogue signals). Dissimilarity: the mechanism of learning, e.g., back propagation.

Talk Outline: Overview of Deep Learning; Recent Advances in Research of Deep Learning; Future of Deep Learning

Recent Advances of Deep Learning Research. Restricted Boltzmann Machine (Hinton, 2006). AlexNet: deep convolutional network for image classification (Krizhevsky et al., 2012). Deep neural network for speech (Dahl et al., 2012). Unsupervised learning of high-level features using auto-encoders (Le et al., 2012) @Google. Atari player: deep reinforcement learning (Mnih et al., 2013) @Google DeepMind. Neural Turing Machine (Graves et al., 2014) @Google DeepMind. Memory Networks (Weston et al., 2014) @Facebook.

Unsupervised Learning of High-Level Features Using Auto-Encoders (Le et al., 2012). Nine layers: the same stage replicated three times. Each stage: filtering, pooling, and contrast normalization. Training: learning the weights for encoding and decoding. 10 million images; 1 billion parameters; 1,000 machines (16,000 cores); trained for 3 days. Can learn a feature for detecting cats.

Atari Player: Deep Reinforcement Learning (Mnih et al., 2013). Learning to control in a dynamic and uncertain environment. Example: Atari games; the deep reinforcement learning system beats human experts on three out of six games. Formalized as Q-learning. Input: raw pixels; action: game action; reward: game score. A convolutional neural network models the Q-function.
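The Q-learning formalization can be sketched in tabular form on a toy chain environment (the environment, learning rate, and exploration rate below are illustrative; the actual system replaces the table with a convolutional network over raw pixels):

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4          # chain 0-1-2-3-4, reward at the goal
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]; 0=left, 1=right

def step(state, action):
    # Move along the chain; reaching the goal yields reward 1.
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

for _ in range(2000):            # episodes
    s = 0
    for _ in range(20):          # step limit per episode
        a = random.randrange(2) if random.random() < EPSILON else Q[s].index(max(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        if s2 == GOAL:
            break
        s = s2

print([round(max(q), 2) for q in Q])  # state values grow toward the goal
```

After training, "move right" dominates in every state, i.e., the table has learned the reward-maximizing policy from score feedback alone.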

Neural Turing Machine (Graves et al., 2014). Controller = neural network (DNN or LSTM) with input, output, and memory. The memory supports read and write operations. All components are differentiable and thus trainable by gradient descent. Tasks such as copy and sort can be performed.

Memory Networks (Weston et al., 2014). Long-term memory + inference; the model is learned; can answer factoid questions (accuracy 40%+). Architecture: pre-process the question q, match it against the memories m_i, and generate the answer a from the matched output o. Example: "John is in the playground. Bob is in the office. John picked up the football. Bob went to the kitchen." Q: where is the football? A: playground.
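The match-then-generate flow can be illustrated with a toy two-hop keyword matcher. This is a hand-written illustration of the data flow only, not the learned model from the paper:

```python
def answer(memory, question):
    # Hop 1: match the question against the most recent memory
    # mentioning the queried object.
    obj = question.rstrip("?").split()[-1]            # e.g. "football"
    holder_fact = next(m for m in reversed(memory) if obj in m)
    holder = holder_fact.split()[0]                   # e.g. "John"
    # Hop 2: match again to find where that person was last located.
    location_fact = next(m for m in reversed(memory)
                         if m.startswith(holder)
                         and (" in the " in m or " to the " in m))
    # Generate: the answer is the location word.
    return location_fact.rstrip(".").split()[-1]

memory = ["John is in the playground.",
          "Bob is in the office.",
          "John picked up the football.",
          "Bob went to the kitchen."]
print(answer(memory, "where is the football?"))  # playground
```

The real model replaces these hand-written matching rules with learned embeddings and a learned scoring function, but the memory/match/generate pipeline is the same.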

Talk Outline: Overview of Deep Learning; Recent Advances in Research of Deep Learning; Future of Deep Learning

Future of Deep Learning: Beyond Complicated Pattern Recognition? More complicated tasks, e.g., multi-turn dialogue. More effective use of human knowledge, e.g., question answering. Combination with reasoning, e.g., question answering.

References
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Dahl, G. E., Yu, D., Deng, L., & Acero, A. (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 30-42.
Le, Q. V. (2013). Building high-level features using large scale unsupervised learning. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 8595-8598). IEEE.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing Machines. arXiv preprint arXiv:1410.5401.
Weston, J., Chopra, S., & Bordes, A. (2014). Memory networks. arXiv preprint arXiv:1410.3916.

Thank you! hangli.hl@huawei.com