Chapter 4: Artificial Neural Networks




Chapter 4: Artificial Neural Networks
CS 536: Machine Learning
Littman (Wu, TA)

Administration
icml-03: instructional Conference on Machine Learning
http://www.cs.rutgers.edu/~mlittman/courses/ml03/icml03/
Weka assignment
http://www.cs.rutgers.edu/~mlittman/courses/ml03/hw1.pdf

Artificial Neural Networks
[Read Ch. 4]
[Review exercises 4.1, 4.2, 4.5, 4.9, 4.11]
Threshold units
Gradient descent
Multilayer networks
Backpropagation
Hidden layer representations
Example: Face Recognition
Advanced topics

Connectionist Models
Consider humans:
Neuron switching time ~ 0.001 second
Number of neurons ~ 10^10
Connections per neuron ~ 10^4 to 10^5
Scene recognition time ~ 0.1 second
100 inference steps doesn't seem like enough → much parallel computation

Artificial Neural Networks
Properties of artificial neural nets (ANNs):
Many neuron-like threshold switching units
Many weighted interconnections among units
Highly parallel, distributed processing
Emphasis on tuning weights automatically

When to Consider ANNs
Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
Output is discrete or real-valued
Output is a vector of values
Possibly noisy data
Form of target function is unknown
Human readability of result is unimportant

ANNs: Example Uses
Speech phoneme recognition [Waibel]
Image classification [Kanade, Baluja, Rowley]
Financial prediction
Backgammon [Tesauro]
ALVINN drives on highways

Perceptron
o(x_1, ..., x_n) = 1 if w_0 + w_1 x_1 + ... + w_n x_n > 0, and -1 otherwise.
Or, more succinctly: o(x) = sgn(w · x), with x_0 = 1.

Perceptron Decision Surface
A single unit can represent some useful functions.
What weights represent g(x_1, x_2) = AND(x_1, x_2)?
But some functions are not representable, e.g., those that are not linearly separable.
Therefore, we'll want networks of these...

Perceptron Training Rule
w_i ← w_i + Δw_i, where Δw_i = η (t - o) x_i
and:
t = c(x) is the target value
o is the perceptron output
η is a small constant (e.g., 0.1) called the learning rate (or step size)

Perceptron Training Rule
Can prove it will converge if the training data is linearly separable and η is sufficiently small.
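The rule above is easy to implement directly. Below is a minimal sketch in Python/NumPy, assuming bipolar (±1) targets and a bias handled by a constant x_0 = 1 input; the name train_perceptron, the small random initialization, and the fixed epoch count are illustrative choices, not part of the lecture.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50, seed=0):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])       # prepend bias input x_0 = 1
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.05, 0.05, X.shape[1])            # small random initial weights
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) > 0 else -1           # o(x) = sgn(w . x)
            w += eta * (target - o) * x                 # no change when o == target
    return w

# AND(x1, x2) with +1/-1 encoding is linearly separable, so the rule converges.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
w = train_perceptron(X, t)
print([1 if np.dot(w, np.r_[1.0, x]) > 0 else -1 for x in X])   # [-1, -1, -1, 1]
```

Run on AND, which is linearly separable, the learned weights classify all four cases correctly; on a non-separable target such as XOR the loop never settles, which is the motivation for networks of these units.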

Gradient Descent
To understand, consider a simpler linear unit, where
o = w_0 + w_1 x_1 + ... + w_n x_n
Let's learn the w_i's that minimize the squared error
E[w] ≡ 1/2 Σ_{d in D} (t_d - o_d)^2
where D is the set of training examples.

Error Surface
(figure: for a linear unit the error surface is a paraboloid with a single global minimum)

Gradient Descent
Gradient: ∇E[w] = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]
Training rule: Δw = -η ∇E[w]
in other words: Δw_i = -η ∂E/∂w_i

Gradient of Error
∂E/∂w_i = ∂/∂w_i 1/2 Σ_d (t_d - o_d)^2
        = 1/2 Σ_d ∂/∂w_i (t_d - o_d)^2
        = 1/2 Σ_d 2 (t_d - o_d) ∂/∂w_i (t_d - o_d)
        = Σ_d (t_d - o_d) ∂/∂w_i (t_d - w · x_d)
        = Σ_d (t_d - o_d) (-x_{i,d})

Gradient Descent Code
GRADIENT-DESCENT(training_examples, η)
  Each training example is a pair <x, t>, where x is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).
  Initialize each w_i to some small random value
  Until the termination condition is met, Do
    Initialize each Δw_i to zero
    For each <x, t> in training_examples, Do
      Input the instance x to the unit and compute the output o
      For each linear unit weight w_i, Do
        Δw_i ← Δw_i + η (t - o) x_i
    For each linear unit weight w_i, Do
      w_i ← w_i + Δw_i

Summary
Perceptron training rule will succeed if
  training examples are linearly separable
  the learning rate η is sufficiently small
Linear unit training uses gradient descent
  guaranteed to converge to the hypothesis with minimum squared error
  given a sufficiently small learning rate η
  even when the training data contains noise
  even when the training data is not separable by H
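The pseudocode translates almost line for line into Python. The sketch below assumes each training example is a pair (x, t) with x including the constant x_0 = 1; the fixed iteration count stands in for the unspecified termination condition, and all names are illustrative.

```python
import numpy as np

def gradient_descent(examples, eta=0.05, iterations=1000, seed=0):
    """Batch gradient descent for a linear unit o = w . x."""
    n = len(examples[0][0])
    w = np.random.default_rng(seed).uniform(-0.05, 0.05, n)   # small random initial w_i
    for _ in range(iterations):                # stand-in for the termination condition
        dw = np.zeros(n)                       # initialize each Delta-w_i to zero
        for x, t in examples:
            o = np.dot(w, x)                   # compute the linear unit's output
            dw += eta * (t - o) * x            # Delta-w_i <- Delta-w_i + eta (t - o) x_i
        w += dw                                # w_i <- w_i + Delta-w_i
    return w

# Fit the noisy linear target t = 2 + 3*x1; x = (1, x1) so w_0 plays the role of the intercept.
rng = np.random.default_rng(1)
examples = [(np.array([1.0, x]), 2 + 3 * x + rng.normal(0, 0.1))
            for x in rng.uniform(-1, 1, 20)]
print(gradient_descent(examples))              # approximately [2, 3]
```

Because the accumulated Δw is applied only after a full pass over D, this is the batch mode referred to on the next slide.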

Stochastic Gradient Descent
Batch mode Gradient Descent:
Do until satisfied
  1. Compute the gradient ∇E_D[w]
  2. w ← w - η ∇E_D[w]
Incremental mode Gradient Descent:
Do until satisfied
  For each training example d in D
    1. Compute the gradient ∇E_d[w]
    2. w ← w - η ∇E_d[w]

More on Stochastic Gradient Descent
E_D[w] ≡ 1/2 Σ_{d in D} (t_d - o_d)^2
E_d[w] ≡ 1/2 (t_d - o_d)^2
Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if η is set small enough.
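For comparison, a sketch of the incremental (stochastic) mode under the same assumptions as the batch sketch above; the only change is that w is updated after every example rather than after a full pass.

```python
import numpy as np

def incremental_gradient_descent(examples, eta=0.05, iterations=1000, seed=0):
    """Incremental mode: one gradient step per example, using E_d[w] = 1/2 (t_d - o_d)^2."""
    n = len(examples[0][0])
    w = np.random.default_rng(seed).uniform(-0.05, 0.05, n)
    for _ in range(iterations):
        for x, t in examples:                 # one update per training example d
            o = np.dot(w, x)
            w += eta * (t - o) * x            # w <- w - eta * grad E_d[w], applied immediately
    return w
```

With η small, each per-example step is tiny and the incremental trajectory stays close to the batch one, which is the sense in which it approximates batch gradient descent arbitrarily closely.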

Multilayer Networks
Decision Boundaries
(figures)

Sigmoid Unit
σ(x) is the sigmoid (s-like) function: σ(x) = 1 / (1 + e^(-x))

Derivatives of Sigmoids
Nice property: dσ(x)/dx = σ(x) (1 - σ(x))
We can derive gradient descent rules to train
  one sigmoid unit
  multilayer networks of sigmoid units → Backpropagation
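The derivative identity can be checked numerically. A small sketch, assuming a central finite difference is an acceptable sanity test; nothing below comes from the slides beyond the two formulas.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-5, 5, 11)
analytic = sigmoid(xs) * (1 - sigmoid(xs))                # sigma(x) * (1 - sigma(x))
h = 1e-6
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)   # central finite difference
print(np.max(np.abs(analytic - numeric)))                 # ~1e-10: the identity holds
```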

Error Gradient for Sigmoid
∂E/∂w_i = ∂/∂w_i 1/2 Σ_d (t_d - o_d)^2
        = 1/2 Σ_d ∂/∂w_i (t_d - o_d)^2
        = 1/2 Σ_d 2 (t_d - o_d) ∂/∂w_i (t_d - o_d)
        = Σ_d (t_d - o_d) (-∂o_d/∂w_i)
        = -Σ_d (t_d - o_d) (∂o_d/∂net_d) (∂net_d/∂w_i)

But we know:
∂o_d/∂net_d = ∂σ(net_d)/∂net_d = o_d (1 - o_d)
∂net_d/∂w_i = ∂(w · x_d)/∂w_i = x_{i,d}
So:
∂E/∂w_i = -Σ_d (t_d - o_d) o_d (1 - o_d) x_{i,d}
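A sketch that turns this gradient into a batch training loop for a single sigmoid unit, reusing the (x, t) example format from the linear-unit code earlier; targets are assumed to lie in (0, 1), and the names and iteration count are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_sigmoid_unit(examples, eta=0.5, iterations=2000, seed=0):
    """Batch gradient descent for one sigmoid unit o = sigma(w . x)."""
    n = len(examples[0][0])
    w = np.random.default_rng(seed).uniform(-0.05, 0.05, n)
    for _ in range(iterations):
        dw = np.zeros(n)
        for x, t in examples:
            o = sigmoid(np.dot(w, x))
            dw += eta * (t - o) * o * (1 - o) * x   # -eta * dE/dw_i, from the derivation above
        w += dw
    return w
```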

Backpropagation Algorithm
Initialize all weights to small random numbers.
Until satisfied, Do
  For each training example, Do
    1. Input the training example to the network and compute the network outputs
    2. For each output unit k: δ_k = o_k (1 - o_k)(t_k - o_k)
    3. For each hidden unit h: δ_h = o_h (1 - o_h) Σ_{k in outputs} w_{h,k} δ_k
    4. Update each network weight: w_{i,j} ← w_{i,j} + Δw_{i,j}, where Δw_{i,j} = η δ_j x_{i,j}

More on Backpropagation
Gradient descent over the entire network weight vector
Easily generalized to arbitrary directed graphs
Will find a local, not necessarily global, error minimum
In practice, often works well (can run multiple times)
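The four numbered steps map directly onto code. Below is a compact NumPy sketch for one hidden layer of sigmoid units with per-example updates; the layer sizes, learning rate, epoch count, and the XOR demonstration are illustrative choices, not taken from the lecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_train(X, T, n_hidden=4, eta=0.5, epochs=10000, seed=0):
    """Backpropagation for a two-layer network of sigmoid units.

    X: (m, n_in) inputs, T: (m, n_out) targets in (0, 1).
    W1: (n_in + 1, n_hidden) and W2: (n_hidden + 1, n_out); the extra row holds bias weights.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.05, 0.05, (X.shape[1] + 1, n_hidden))
    W2 = rng.uniform(-0.05, 0.05, (n_hidden + 1, T.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, T):
            # 1. Forward pass: compute hidden and output activations.
            x1 = np.r_[1.0, x]                          # constant bias input x_0 = 1
            h = sigmoid(x1 @ W1)
            h1 = np.r_[1.0, h]
            o = sigmoid(h1 @ W2)
            # 2. For each output unit k: delta_k = o_k (1 - o_k)(t_k - o_k).
            delta_o = o * (1 - o) * (t - o)
            # 3. For each hidden unit h: delta_h = o_h (1 - o_h) * sum_k w_{h,k} delta_k.
            delta_h = h * (1 - h) * (W2[1:] @ delta_o)
            # 4. w_{i,j} <- w_{i,j} + eta * delta_j * x_{i,j}.
            W2 += eta * np.outer(h1, delta_o)
            W1 += eta * np.outer(x1, delta_h)
    return W1, W2

# XOR is not linearly separable, so a single perceptron cannot represent it,
# but this small sigmoid network can (some seeds may need more epochs).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = backprop_train(X, T)
for x in X:
    h = sigmoid(np.r_[1.0, x] @ W1)
    print(x, sigmoid(np.r_[1.0, h] @ W2))
```

Step 3 is the key line: each hidden unit's error δ_h is the sum of the output errors it feeds into, weighted by the connections w_{h,k}, which is how the error is "propagated back".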

More on Backpropagation (continued)
Often include weight momentum α:
Δw_{i,j}(n) = η δ_j x_{i,j} + α Δw_{i,j}(n-1)
Minimizes error over the training examples
  Will it generalize well to subsequent examples?
Training can take thousands of iterations → slow!
Using the network after training is very fast
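A minimal sketch of how the momentum term changes the weight update inside a loop like the backpropagation sketch above; dw_prev holds Δw(n-1) and alpha is the momentum constant α, both names being illustrative.

```python
def momentum_step(w, grad_term, dw_prev, eta=0.5, alpha=0.9):
    """Delta-w(n) = eta * grad_term + alpha * Delta-w(n-1).

    grad_term stands for the delta_j * x_{i,j} factor computed by backpropagation;
    returns the updated weight(s) and Delta-w(n) to carry into the next step.
    """
    dw = eta * grad_term + alpha * dw_prev
    return w + dw, dw
```

Carrying a fraction of the previous update keeps the search moving through flat regions of the error surface, which is one way to reduce the thousands of iterations mentioned above.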