EE459 Neural Networks: Backpropagation

Kasin Prakobwaitayakit, Department of Electrical Engineering, Chiangmai University


Background

Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. Algorithms such as BACKPROPAGATION use gradient descent to tune network parameters to best fit a training set of input-output pairs. ANN learning is robust to errors in the training data and has been successfully applied to problems such as face recognition/detection, speech recognition, and learning robot control strategies.

Autonomous Vehicle Steering: Characteristics of ANNs

ANN learning is well suited to problems with the following characteristics:
- Instances are represented by many attribute-value pairs.
- The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- The training examples may contain errors.
- Long training times are acceptable.
- Fast evaluation of the learned target function may be required.
- The ability of humans to understand the learned target function is not important.

Very Simple Example

A single unit receives two inputs, 0 and 1, through weights 0.4 and -0.1:

net input = 0.4 * 0 + (-0.1) * 1 = -0.1

Learning Problem to Be Solved

Suppose we have an input pattern (0 1) and a single target output pattern (1). We have a net input of -0.1, which gives an output pattern of (0). How could we adjust the weights so that this situation is remedied and the spontaneous output matches our target output pattern of (1)?
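In code, the example works out like this; a minimal Python sketch (the 0.0 threshold is implied by the slide's output of 0, and the +0.2 fix comes from the Answer slide that follows):

```python
# Minimal sketch of the two-input unit from the example above.
weights = [0.4, -0.1]
inputs = [0, 1]

net = sum(w * x for w, x in zip(weights, inputs))  # 0.4*0 + (-0.1)*1 = -0.1
output = 1 if net > 0.0 else 0                     # -0.1 -> output 0, but the target is 1

# The remedy discussed on the next slide: add 0.2 to all weights
# so that the net input exceeds 0.0.
weights = [w + 0.2 for w in weights]
net = sum(w * x for w, x in zip(weights, inputs))  # 0.6*0 + 0.1*1 = 0.1
output = 1 if net > 0.0 else 0                     # now the output matches the target
```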

Answer

Increase the weights so that the net input exceeds 0.0; e.g., add 0.2 to all weights. Observation: the weight from the input node with activation 0 does not have any effect on the net input, so we will leave it alone.

Perceptrons

One type of ANN system is based on a unit called a perceptron. The perceptron function can be written as

o(x_1, ..., x_n) = 1 if w_0 + w_1 x_1 + ... + w_n x_n > 0, and -1 otherwise.

The space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors.

Representational Power of Perceptrons

(Figure: decision surfaces; a linear decision surface, which a perceptron can represent, vs. a nonlinear decision surface, which it cannot.)

Programming Example of Decision Surface

The Perceptron Training Rule

One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Weights are modified at each step according to the perceptron training rule, which revises the weight w_i associated with input x_i according to

w_i <- w_i + Δw_i, where Δw_i = η (t - o) x_i

with t the target output, o the perceptron output, and η the learning rate (see the code sketch after this section).

Delta Rule

The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by

o(x) = w · x.

In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector) relative to the training examples:

E(w) = (1/2) Σ_{d ∈ D} (t_d - o_d)²

where D is the set of training examples, and t_d and o_d are the target and actual outputs for example d.
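Here is a minimal NumPy sketch of the perceptron training rule just described (the learning rate, the initialization range, and the AND-function training data are illustrative choices, not from the slides):

```python
import numpy as np

def perceptron_output(w, x):
    """Thresholded unit: +1 if w . x > 0, else -1."""
    return 1 if np.dot(w, x) > 0 else -1

def train_perceptron(X, t, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i,
    applied whenever an example is misclassified, repeated until
    every training example is classified correctly."""
    w = np.random.uniform(-0.05, 0.05, X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(X, t):
            o = perceptron_output(w, x)
            if o != target:
                w += eta * (target - o) * x
                mistakes += 1
        if mistakes == 0:          # all examples classified correctly
            break
    return w

# Illustrative data: the linearly separable AND function,
# with a constant bias input of 1 in the first column.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
w = train_perceptron(X, t)
```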

Visualizing the Hypothesis Space

(Figure: the error surface over weight space; gradient descent starts from an initial weight vector chosen at random and descends toward the minimum error.)

Derivation of the Gradient Descent Rule

The vector derivative

∇E(w) = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]

is called the gradient of E with respect to w. The gradient specifies the direction that produces the steepest increase in E. The negative of this vector therefore gives the direction of steepest decrease. The training rule for gradient descent is

w <- w + Δw, where Δw = -η ∇E(w).

The negative sign is present because we want to move the weight vector in the direction that decreases E. This training rule can also be written in its component form

w_i <- w_i + Δw_i, where Δw_i = -η ∂E/∂w_i,

which makes it clear that steepest descent is achieved by altering each component w_i in proportion to ∂E/∂w_i. The vector of derivatives that forms the gradient can be obtained by differentiating E:

∂E/∂w_i = Σ_{d ∈ D} (t_d - o_d)(-x_id).

The weight update rule for standard gradient descent can therefore be summarized as

Δw_i = η Σ_{d ∈ D} (t_d - o_d) x_id.

Stochastic Approximation to Gradient Descent

Whereas standard gradient descent computes the weight update after summing over all training examples, stochastic gradient descent approximates this by updating the weights incrementally after each individual example.

Summary of Perceptron

- The perceptron training rule is guaranteed to succeed if the training examples are linearly separable and a sufficiently small learning rate is used.
- The linear unit training rule uses gradient descent and is guaranteed to converge to the hypothesis with minimum squared error, given a sufficiently small learning rate, even when the training data contains noise.
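The summarized batch update translates directly into code. A minimal NumPy sketch (the learning rate, epoch count, and synthetic data are illustrative):

```python
import numpy as np

def gradient_descent(X, t, eta=0.05, epochs=500):
    """Batch gradient descent for a linear unit o = w . x, minimizing
    E(w) = 1/2 * sum_d (t_d - o_d)^2 via the update
    delta w_i = eta * sum_d (t_d - o_d) * x_id."""
    w = np.random.uniform(-0.05, 0.05, X.shape[1])
    for _ in range(epochs):
        o = X @ w                    # outputs for all training examples at once
        w += eta * (X.T @ (t - o))   # step in the direction of steepest decrease
    return w

# Illustrative data: noisy samples of the target function t = 2*x1 - 1*x2.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 2))
t = 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(0, 0.01, 50)
w = gradient_descent(X, t)           # converges near [2, -1]
```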

BACKPROPAGATION Algorithm: Error Function

The Backpropagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for those outputs. We begin by redefining E to sum the errors over all of the network output units:

E(w) = (1/2) Σ_{d ∈ D} Σ_{k ∈ outputs} (t_kd - o_kd)²

where outputs is the set of output units in the network, and t_kd and o_kd are the target and output values associated with the k-th output unit and training example d.

Architecture of Backpropagation

Backpropagation Learning Algorithm
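A minimal NumPy sketch of this redefined error function (storing targets and outputs as example-by-unit arrays is an assumption about data layout, not from the slides):

```python
import numpy as np

def network_error(targets, outputs):
    """E = 1/2 * sum over training examples d and output units k of
    (t_kd - o_kd)^2; rows index examples, columns index output units."""
    return 0.5 * np.sum((targets - outputs) ** 2)

# Example: two training examples, two output units.
t = np.array([[1.0, 0.0], [0.0, 1.0]])
o = np.array([[0.9, 0.2], [0.1, 0.7]])
err = network_error(t, o)   # 0.5 * (0.01 + 0.04 + 0.01 + 0.09) = 0.075
```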

Inputs to Neurons

Inputs arise from other neurons or from outside the network. Nodes whose inputs arise outside the network are called input nodes and simply copy values. An input may excite or inhibit the response of the neuron to which it is applied, depending upon the weight of the connection.

Weights

Weights represent synaptic efficacy and may be excitatory or inhibitory. Normally, positive weights are considered excitatory while negative weights are thought of as inhibitory. Learning is the process of modifying the weights in order to produce a network that performs some function.

Output

The response function is normally nonlinear. Samples include:

- Sigmoid: f(x) = 1 / (1 + e^(-λx))
- Piecewise linear: f(x) = x if x ≥ θ, and f(x) = 0 if x < θ

Backpropagation Preparation

- Training Set: a collection of input-output patterns that are used to train the network.
- Testing Set: a collection of input-output patterns that are used to assess network performance.
- Learning Rate η: a scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments.
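Both response functions in a minimal NumPy sketch (the default values of λ and θ are illustrative):

```python
import numpy as np

def sigmoid(x, lam=1.0):
    """Sigmoid response: f(x) = 1 / (1 + e^(-lambda * x))."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def piecewise_linear(x, theta=0.0):
    """Piecewise linear response: f(x) = x for x >= theta, else 0."""
    return np.where(x >= theta, x, 0.0)
```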

Network Error

Total Sum-Squared Error (TSSE):

TSSE = (1/2) Σ_patterns Σ_outputs (desired - actual)²

Root Mean-Squared Error (RMSE):

RMSE = sqrt( 2 · TSSE / (#patterns · #outputs) )

A Pseudo-Code Algorithm

Randomly choose the initial weights.
While error is too large:
  For each training pattern:
    Apply the inputs to the network.
    Calculate the output for every neuron, from the input layer through the hidden layer(s) to the output layer.
    Calculate the error at the outputs.
    Use the output error to compute error signals for pre-output layers.
    Use the error signals to compute weight adjustments.
    Apply the weight adjustments.
  Periodically evaluate the network performance.

(See the code sketch after this section.)

Face Detection Using Neural Networks

(Figure: a face database and a non-face database feed the training process, with target output 1 for the face database and 0 for the non-face database; in the testing process the trained neural network answers "face or non-face?".)

Using Backpropagation

Advantages:
- Relatively simple implementation
- Standard method that generally works well

Disadvantages:
- Slow and inefficient
- Can get stuck in local minima, resulting in suboptimal solutions

Local Minima

(Figure: an error surface with a local minimum and the global minimum.)

Alternatives to Gradient Descent: Simulated Annealing

Advantages:
- Can guarantee an optimal solution (the global minimum)

Disadvantages:
- May be slower than gradient descent
- Much more complicated implementation
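A minimal NumPy sketch of the pseudocode above, using sigmoid units, one hidden layer, and the TSSE/RMSE measures from the Network Error slide (the network size, learning rate, stopping tolerance, and XOR data are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, n_hidden=4, eta=0.5, max_epochs=20000, tol=0.05):
    """Follows the pseudocode: random initial weights, then repeat
    forward pass -> output error -> error signals for the pre-output
    layer -> weight adjustments, while the error is too large."""
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))        # input -> hidden
    W2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))   # hidden (+bias) -> output

    for epoch in range(max_epochs):
        tsse = 0.0
        for x, t in zip(X, T):
            h = sigmoid(x @ W1)                  # hidden activations
            hb = np.append(h, 1.0)               # constant bias unit
            o = sigmoid(hb @ W2)                 # network outputs
            tsse += 0.5 * np.sum((t - o) ** 2)   # accumulate TSSE
            delta_o = (t - o) * o * (1 - o)                # output error signal
            delta_h = h * (1 - h) * (W2[:-1] @ delta_o)    # hidden error signal
            W2 += eta * np.outer(hb, delta_o)    # apply weight adjustments
            W1 += eta * np.outer(x, delta_h)
        rmse = np.sqrt(2 * tsse / (X.shape[0] * T.shape[1]))
        if rmse < tol:                           # periodic performance check
            return W1, W2
    return W1, W2

# Illustrative problem: XOR, with a bias input of 1 appended to each pattern.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_backprop(X, T)
```

With settings like these the loop typically drives the RMSE below the tolerance on XOR within a few thousand epochs, though sigmoid networks can occasionally stall, which is exactly the local-minima failure mode discussed above.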

Alternatives to Gradient Descent: Genetic Algorithms / Evolutionary Strategies

Advantages:
- Faster than simulated annealing
- Less likely to get stuck in local minima

Disadvantages:
- Slower than gradient descent
- Memory intensive for large nets

Momentum

Momentum adds a percentage of the last movement to the current movement. It is useful for getting over small bumps in the error function, and it often finds a minimum in fewer steps:

Δw(t) = -η · δ · y + α · Δw(t-1)

where Δw is the change in weight, η is the learning rate, δ is the error term, y differs depending on which layer we are calculating, and α is the momentum parameter.

Adaptive Backpropagation Algorithm

This algorithm assigns each weight its own learning rate, determined by the sign of the gradient of the error function from the last iteration. If the signs are equal, the slope is more likely to be shallow, so the learning rate is increased; the signs are more likely to differ on a steep slope, so the learning rate is decreased. This speeds up the advancement on gradual slopes (see the sketch after this section).

Possible problem: since we minimize the error for each weight separately, the overall error may increase. Solution: calculate the total output error after each adaptation, and if it is greater than the previous error, reject that adaptation and calculate new learning rates.

SuperSAB (Super Self-Adapting Backpropagation)

SuperSAB combines the momentum and adaptive methods. It uses the adaptive method and momentum so long as the sign of the gradient does not change; this is an additive effect of both methods, resulting in a faster traversal of gradual slopes. When the sign of the gradient does change, the momentum cancels the drastic drop in learning rate. This allows the search to roll up the other side of the minimum, possibly escaping local minima.
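A minimal NumPy sketch of the momentum update and the adaptive per-weight learning rates described above (the update is written in terms of the gradient ∂E/∂w, and the grow/shrink factors are illustrative, since the slides give no specific values):

```python
import numpy as np

def momentum_step(grad, prev_delta, eta=0.1, alpha=0.9):
    """delta_w(t) = -eta * grad + alpha * delta_w(t-1): a fraction alpha
    of the last movement is added to the current one."""
    return -eta * grad + alpha * prev_delta

def adapt_rates(grad, prev_grad, rates, grow=1.2, shrink=0.5):
    """Per-weight learning rates: equal gradient signs suggest a shallow
    slope, so the rate is increased; differing signs suggest a steep
    slope, so the rate is decreased."""
    same_sign = np.sign(grad) == np.sign(prev_grad)
    return np.where(same_sign, rates * grow, rates * shrink)
```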

SuperSAB Performance

Experiments show that SuperSAB converges faster than gradient descent. Overall, the algorithm is less sensitive (and so is less likely to get caught in local minima).

Other Ways to Minimize Error

Varying the training data:
- Cycle through the input classes
- Randomly select from the input classes
- Add noise to the training data: randomly change the value of an input node (with low probability)
- Retrain with expected inputs after initial training (e.g., speech recognition)

Adding and removing neurons from layers:
- Adding neurons speeds up learning but may cause a loss in generalization
- Removing neurons has the opposite effect