An Introduction to Neural Networks




An Introduction to Neural Networks. Vincent Cheung, Kevin Cannons. Signal & Data Compression Laboratory, Electrical & Computer Engineering, University of Manitoba, Winnipeg, Manitoba, Canada. Advisor: Dr. W. Kinsner. May 27, 2002

Outline: Fundamentals, Training and Verification, Results and Discussion, Conclusion. Cheung/Cannons 1

What Are Artificial Neural Networks? Fundamentals. An extremely simplified model of the brain. Essentially a function approximator: it transforms inputs into outputs to the best of its ability. [Diagram: inputs enter the NN as a black box, which produces outputs] Cheung/Cannons 2

What Are Artificial Neural Networks? Composed of many neurons that co-operate to perform the desired function. Fundamentals Cheung/Cannons 3

Fundamentals. What Are They Used For? Classification: pattern recognition, feature extraction, image matching. Noise reduction: recognize patterns in the inputs and produce noiseless outputs. Prediction: extrapolation based on historical data. Cheung/Cannons 4

Why Use Neural Networks? Ability to learn: NNs figure out how to perform their function on their own; they determine their function based only upon sample inputs. Ability to generalize: i.e., they produce reasonable outputs for inputs they have not been taught how to deal with. Fundamentals Cheung/Cannons 5

How Do Neural Networks Work? The output of a neuron is a function of the weighted sum of the inputs plus a bias: Output = f(i1*w1 + i2*w2 + i3*w3 + bias). [Diagram: inputs i1, i2, i3 with weights w1, w2, w3 and a bias feeding a single neuron] The function of the entire neural network is simply the computation of the outputs of all the neurons, an entirely deterministic calculation. Cheung/Cannons 6
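A minimal Python sketch of the single-neuron computation above; the example input values, weights, and bias are illustrative (not from the slides), and the logistic activation anticipates the next slides:

```python
import math

def neuron_output(inputs, weights, bias, f=lambda x: 1.0 / (1.0 + math.exp(-x))):
    """Output = f(weighted sum of inputs + bias); f defaults to the logistic function."""
    net = sum(i * w for i, w in zip(inputs, weights)) + bias
    return f(net)

# Illustrative values only
print(neuron_output([1.0, 0.0, 1.0], [0.5, 0.2, 0.8], bias=-0.5))
```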

Activation Functions. Applied to the weighted sum of the inputs of a neuron to produce the output. The majority of NNs use sigmoid functions: smooth, continuous, and monotonically increasing (derivative is always positive); bounded range, but never reaches max or min. Consider ON to be slightly less than the max and OFF to be slightly greater than the min. Cheung/Cannons 7

Activation Functions. The most common sigmoid function used is the logistic function: f(x) = 1/(1 + e^(-x)). The calculation of derivatives is important for neural networks, and the logistic function has a very nice derivative: f'(x) = f(x)(1 - f(x)). Other sigmoid functions are also used: hyperbolic tangent, arctangent. The exact nature of the function has little effect on the abilities of the neural network. Cheung/Cannons 8
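The logistic function and its derivative, written out directly from the two formulas above:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_derivative(x):
    fx = logistic(x)
    return fx * (1.0 - fx)   # f'(x) = f(x)(1 - f(x))

print(logistic(0.0), logistic_derivative(0.0))   # 0.5 0.25
```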

Fundamentals Where Do The Weights Come From? The weights in a neural network are the most important factor in determining its function Training is the act of presenting the network with some sample data and modifying the weights to better approximate the desired function There are two main types of training Supervised Training Supplies the neural network with inputs and the desired outputs Response of the network to the inputs is measured The weights are modified to reduce the difference between the actual and desired outputs Cheung/Cannons 9

Fundamentals. Where Do The Weights Come From? Unsupervised Training: only supplies inputs. The neural network adjusts its own weights so that similar inputs cause similar outputs; the network identifies the patterns and differences in the inputs without any external assistance. Epoch: one iteration through the process of providing the network with an input and updating the network's weights. Typically many epochs are required to train the neural network. Cheung/Cannons 10

Perceptrons. First neural network with the ability to learn. Made up of only input neurons and output neurons. Input neurons typically have two states: ON and OFF. Output neurons use a simple threshold activation function. In basic form, can only solve linear problems, so applications are limited. [Diagram: input neurons connected to an output neuron through weights 0.5, 0.2, and 0.8] Cheung/Cannons 11

How Do Perceptrons Learn? Uses supervised training. If the output is not correct, the weights are adjusted according to the formula w_new = w_old + α(desired - output) × input, where α is the learning rate.
Worked example with inputs 1, 0, 1 and weights 0.5, 0.2, 0.8: 1(0.5) + 0(0.2) + 1(0.8) = 1.3. Assuming the output threshold is 1.2, then 1.3 > 1.2, so the perceptron outputs 1. Assume the output was supposed to be 0, so the weights are updated with α = 1: w1_new = 0.5 + 1(0 - 1)(1) = -0.5; w2_new = 0.2 + 1(0 - 1)(0) = 0.2; w3_new = 0.8 + 1(0 - 1)(1) = -0.2. Cheung/Cannons 12
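The same update rule and worked example as a small Python sketch; the function name is just for illustration:

```python
def perceptron_step(weights, inputs, desired, threshold, alpha=1.0):
    """One perceptron weight update: w_new = w_old + alpha * (desired - output) * input."""
    net = sum(w * x for w, x in zip(weights, inputs))
    output = 1 if net > threshold else 0
    return [w + alpha * (desired - output) * x for w, x in zip(weights, inputs)]

print(perceptron_step([0.5, 0.2, 0.8], [1, 0, 1], desired=0, threshold=1.2))
# [-0.5, 0.2, -0.2], matching the worked example above
```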

Multilayer Feedforward Networks Most common neural network An extension of the perceptron Multiple layers The addition of one or more hidden layers in between the input and output layers Activation function is not simply a threshold Usually a sigmoid function A general function approximator Not limited to linear problems Information flows in one direction The outputs of one layer act as inputs to the next layer Cheung/Cannons 13

XOR Example. Inputs: 0, 1.
H1: Net = 0(4.83) + 1(-4.83) - 2.82 = -7.65; Output = 1 / (1 + e^(7.65)) = 4.758 x 10^-4
H2: Net = 0(-4.63) + 1(4.6) - 2.74 = 1.86; Output = 1 / (1 + e^(-1.86)) = 0.8652
O: Net = 4.758 x 10^-4 (5.73) + 0.8652(5.83) - 2.86 = 2.187; Output = 1 / (1 + e^(-2.187)) = 0.8991 ≈ 1
Cheung/Cannons 14
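The forward pass above, reproduced in Python with the same weights and biases so the numbers can be checked:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2 = 0, 1
h1 = logistic(x1 * 4.83 + x2 * (-4.83) - 2.82)    # ~4.758e-4
h2 = logistic(x1 * (-4.63) + x2 * 4.6 - 2.74)     # ~0.8652
out = logistic(h1 * 5.73 + h2 * 5.83 - 2.86)      # ~0.8991, i.e. XOR(0, 1) is close to 1
print(h1, h2, out)
```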

Backpropagation Most common method of obtaining the many weights in the network A form of supervised training The basic backpropagation algorithm is based on minimizing the error of the network using the derivatives of the error function Simple Slow Prone to local minima issues Cheung/Cannons 15

Backpropagation. The most common measure of error is the mean square error: E = (target - output)^2.
Partial derivatives of the error with respect to the weights:
Output neurons: let δ_j = f'(net_j)(target_j - output_j); then ∂E/∂w_ji = -output_i δ_j, where j is an output neuron and i is a neuron in the last hidden layer.
Hidden neurons: let δ_j = f'(net_j) Σ_k(δ_k w_kj); then ∂E/∂w_ji = -output_i δ_j, where j is a hidden neuron, i is a neuron in the previous layer, and k is a neuron in the next layer.
Cheung/Cannons 16
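The same delta rules written as plain Python helpers; the parameter name f_prime_net is an assumed stand-in for the activation derivative evaluated at the neuron's net input:

```python
def output_delta(f_prime_net, target, output):
    """delta_j = f'(net_j) * (target_j - output_j) for an output neuron."""
    return f_prime_net * (target - output)

def hidden_delta(f_prime_net, next_deltas, next_weights):
    """delta_j = f'(net_j) * sum_k(delta_k * w_kj) for a hidden neuron."""
    return f_prime_net * sum(d * w for d, w in zip(next_deltas, next_weights))

def error_gradient(output_i, delta_j):
    """dE/dw_ji = -output_i * delta_j."""
    return -output_i * delta_j
```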

Fundamentals. Backpropagation. Calculation of the derivatives flows backwards through the network, hence the name, backpropagation. These derivatives point in the direction of the maximum increase of the error function. A small step (learning rate) in the opposite direction will result in the maximum decrease of the (local) error function: w_new = w_old - α ∂E/∂w_old, where α is the learning rate. Cheung/Cannons 17

Fundamentals. Backpropagation. The learning rate is important: too small and convergence is extremely slow; too large and the training may not converge. Momentum tends to aid convergence. It applies smoothed averaging to the change in weights: Δw_new = β Δw_old - α ∂E/∂w_old, and then w_new = w_old + Δw_new. It acts as a low-pass filter by reducing rapid fluctuations. β is the momentum coefficient. Cheung/Cannons 18
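A minimal sketch of one momentum-based weight update, following the two formulas above; the α and β defaults and the example numbers are illustrative:

```python
def momentum_update(w_old, dw_old, grad, alpha=0.5, beta=0.9):
    """grad is dE/dw at w_old; returns the new weight and the new (smoothed) step."""
    dw_new = beta * dw_old - alpha * grad   # delta_w_new = beta * delta_w_old - alpha * dE/dw
    return w_old + dw_new, dw_new

w, dw = 0.3, 0.0
w, dw = momentum_update(w, dw, grad=0.12)   # one gradient-descent step with momentum
print(w, dw)
```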

Local Minima Training is essentially minimizing the mean square error function Key problem is avoiding local minima Traditional techniques for avoiding local minima: Simulated annealing Perturb the weights in progressively smaller amounts Genetic algorithms Use the weights as chromosomes Apply natural selection, mating, and mutations to these chromosomes Cheung/Cannons 19

Counterpropagation (CP) Networks. Another multilayer feedforward network. Up to 100 times faster than backpropagation, but not as general as backpropagation. Made up of three layers: Input, Kohonen, Grossberg (Output). [Diagram: inputs feed the input layer, then the Kohonen layer, then the Grossberg layer, which produces the outputs] Cheung/Cannons 20

How Do They Work? Kohonen Layer: Neurons in the Kohonen layer sum all of the weighted inputs received The neuron with the largest sum outputs a 1 and the other neurons output 0 Grossberg Layer: Each Grossberg neuron merely outputs the weight of the connection between itself and the one active Kohonen neuron Cheung/Cannons 21

Why Two Different Types of Layers? More accurate representation of biological neural networks Each layer has its own distinct purpose: Kohonen layer separates inputs into separate classes Inputs in the same class will turn on the same Kohonen neuron Grossberg layer adjusts weights to obtain acceptable outputs for each class Fundamentals Cheung/Cannons 22

Training a CP Network. Training the Kohonen layer: uses unsupervised training; input vectors are often normalized. The one active Kohonen neuron updates its weights according to the formula w_new = w_old + α(input - w_old), where α is the learning rate. The weights of the connections are being modified to more closely match the values of the inputs. At the end of training, the weights will approximate the average value of the inputs in that class. Cheung/Cannons 23
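A sketch of one Kohonen-layer step as described on the last few slides (winner-take-all selection followed by the update formula above); the function and variable names, and the learning rate value, are illustrative:

```python
def kohonen_step(weight_rows, x, alpha=0.1):
    """weight_rows[k] holds the incoming weights of Kohonen neuron k; x is one input vector."""
    sums = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    winner = max(range(len(sums)), key=lambda k: sums[k])   # this neuron outputs 1, the rest 0
    # Only the winning neuron moves its weights toward the input:
    # w_new = w_old + alpha * (input - w_old)
    weight_rows[winner] = [w + alpha * (xi - w) for w, xi in zip(weight_rows[winner], x)]
    return winner

W = [[0.2, 0.8], [0.9, 0.1]]
print(kohonen_step(W, [1.0, 0.0]), W)   # neuron 1 wins and moves toward the input
```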

Training a CP Network Training the Grossberg layer Uses supervised training Weight update algorithm is similar to that used in backpropagation Fundamentals Cheung/Cannons 24

Hidden Layers and Neurons. For most problems, one layer is sufficient; two layers are required when the function is discontinuous. The number of neurons is very important: too few underfit the data (the NN can't learn the details); too many overfit the data (the NN learns the insignificant details). Start small and increase the number until satisfactory results are obtained. Cheung/Cannons 25

Overfitting. [Figure: training and test data compared for a well-fit model and an overfit model] Cheung/Cannons 26

How is the Training Set Chosen? Overfitting can also occur if a good training set is not chosen What constitutes a good training set? Samples must represent the general population Samples must contain members of each class Samples in each class must contain a wide range of variations or noise effect Fundamentals Cheung/Cannons 27

Size of the Training Set. The size of the training set is related to the number of hidden neurons. E.g., 10 inputs, 5 hidden neurons, 2 outputs: 11(5) + 6(2) = 67 weights (variables). If only 10 training samples are used to determine these weights, the network will end up being overfit; any solution found will be specific to the 10 training samples. Analogous to having 10 equations and 67 unknowns: you can come up with a specific solution, but you can't find the general solution with the given information. Cheung/Cannons 28
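The weight count in the example follows from giving every hidden and output neuron one weight per incoming connection plus a bias; a one-line check:

```python
def weight_count(n_inputs, n_hidden, n_outputs):
    # Each hidden neuron: n_inputs weights + 1 bias; each output neuron: n_hidden weights + 1 bias
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

print(weight_count(10, 5, 2))   # 67, as in the example above
```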

Training and Verification The set of all known samples is broken into two orthogonal (independent) sets: Training set A group of samples used to train the neural network Testing set A group of samples used to test the performance of the neural network Used to estimate the error rate Known Samples Training Set Testing Set Cheung/Cannons 29

Verification. Provides an unbiased test of the quality of the network. A common error is to test the neural network using the same samples that were used to train it: the network was optimized on these samples and will obviously perform well on them, and this doesn't give any indication as to how well the network will be able to classify inputs that weren't in the training set. Cheung/Cannons 30

Verification Various metrics can be used to grade the performance of the neural network based upon the results of the testing set Mean square error, SNR, etc. Resampling is an alternative method of estimating error rate of the neural network Basic idea is to iterate the training and testing procedures multiple times Two main techniques are used: Cross-Validation Bootstrapping Cheung/Cannons 31
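A sketch of the cross-validation idea mentioned above (repeatedly re-splitting the known samples into training and testing sets and averaging the error estimates); the fold count and helper names are assumptions:

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k disjoint folds; each fold serves once as the testing set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(100, k=5)
for test_fold in folds:
    train_idx = [j for fold in folds if fold is not test_fold for j in fold]
    # Train on train_idx, measure error on test_fold, then average the k error estimates
    print(len(train_idx), len(test_fold))   # 80 / 20 per split
```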

Results and Discussion. A simple toy problem was used to test the operation of a perceptron. Provided the perceptron with 5 pieces of information about a face: the individual's hair, eye, nose, mouth, and ear type. Each piece of information could take a value of +1 or -1: +1 indicates a girl feature, -1 indicates a guy feature. The individual was to be classified as a girl if the face had more girl features than guy features, and a boy otherwise. Cheung/Cannons 32

Results and Discussion. Constructed a perceptron with 5 inputs and 1 output. [Diagram: face feature input values feed the 5 input neurons, which connect to one output neuron whose value indicates boy or girl] Trained the perceptron with 24 out of the 32 possible inputs over 1000 epochs. The perceptron was able to classify the faces that were not in the training set. Cheung/Cannons 33
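A hypothetical re-creation of this experiment as a sketch; the learning rate, threshold, zero initialization, and random 24/8 split are assumptions, not details from the slides:

```python
import itertools, random

def face_perceptron_demo(epochs=1000, alpha=0.1, threshold=0.0, seed=0):
    faces = list(itertools.product([+1, -1], repeat=5))      # all 32 possible feature vectors
    label = {f: 1 if sum(f) > 0 else 0 for f in faces}       # 1 = girl (majority of +1 features)
    train = random.Random(seed).sample(faces, 24)            # 24 of the 32 inputs
    w = [0.0] * 5
    for _ in range(epochs):
        for x in train:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0
            w = [wi + alpha * (label[x] - out) * xi for wi, xi in zip(w, x)]
    held_out = [f for f in faces if f not in train]
    correct = sum((1 if sum(wi * xi for wi, xi in zip(w, f)) > threshold else 0) == label[f]
                  for f in held_out)
    return w, correct, len(held_out)

print(face_perceptron_demo())   # learned weights and held-out accuracy
```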

Results and Discussion. A number of toy problems were tested on multilayer feedforward NNs with a single hidden layer and backpropagation. Inverter: the NN was trained to simply output 0.1 when given a 1 and 0.9 when given a 0; a demonstration of the NN's ability to memorize. 1 input, 1 hidden neuron, 1 output. With a learning rate of 0.5 and no momentum, it took about 3,500 epochs for sufficient training. Including a momentum coefficient of 0.9 reduced the number of epochs required to about 250. Cheung/Cannons 34

Results and Discussion. Inverter (continued): increasing the learning rate decreased the training time without hampering convergence for this simple example. Increasing the epoch size (the number of samples per epoch) decreased the number of epochs required and seemed to aid convergence (reduced fluctuations). Increasing the number of hidden neurons decreased the number of epochs required; it allowed the NN to better memorize the training set, the goal of this toy problem, but it is not recommended for real problems, since the NN loses its ability to generalize. Cheung/Cannons 35

Results and Discussion. AND gate: 2 inputs, 2 hidden neurons, 1 output; about 2,500 epochs were required when using momentum. XOR gate: same as the AND gate. 3-to-8 decoder: 3 inputs, 3 hidden neurons, 8 outputs; about 5,000 epochs were required when using momentum. Cheung/Cannons 36

Results and Discussion. Absolute sine function approximator (|sin(x)|): a demonstration of the NN's ability to learn the desired function and to generalize. 1 input, 5 hidden neurons, 1 output. The NN was trained with samples between -π/2 and π/2, with the inputs rounded to one decimal place and the desired targets scaled to between 0.1 and 0.9. The test data contained samples in between the training samples (i.e., more than 1 decimal place), and the outputs were translated back to between 0 and 1. About 50,000 epochs were required with momentum. Not a smooth function at 0 (the derivative is only piecewise continuous). Cheung/Cannons 37
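One plausible way to build the training pairs described above; the 0.1 sample spacing and the exact scaling constants are assumptions:

```python
import math

def make_training_pairs():
    """Inputs on [-pi/2, pi/2] rounded to one decimal place; targets |sin(x)| scaled to [0.1, 0.9]."""
    xs = [round(0.1 * k, 1) for k in range(-15, 16)]          # -1.5, -1.4, ..., 1.5
    return [(x, 0.1 + 0.8 * abs(math.sin(x))) for x in xs]

def unscale(y):
    """Translate a network output back to the 0..1 range of |sin(x)|."""
    return (y - 0.1) / 0.8

print(make_training_pairs()[:3])
```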

Results and Discussion. Gaussian function approximator (e^(-x^2)): 1 input, 2 hidden neurons, 1 output. Similar to the absolute sine function approximator, except that the domain was changed to between -3 and 3. About 10,000 epochs were required with momentum. A smooth function. Cheung/Cannons 38

Results and Discussion. Primality tester: 7 inputs, 8 hidden neurons, 1 output. The input to the NN was a binary number, and the NN was trained to output 0.9 if the number was prime and 0.1 if the number was composite; a classification and memorization test. The inputs were restricted to between 0 and 100. About 50,000 epochs were required for the NN to memorize the classifications for the training set. No attempts at generalization were made due to the complexity of the pattern of prime numbers. There were some issues with local minima. Cheung/Cannons 39

Results and Discussion. Prime number generator: provide the network with a seed, and a prime number of the same order should be returned. 7 inputs, 4 hidden neurons, 7 outputs; both the inputs and outputs were binary numbers. The network was trained as an autoassociative network: prime numbers from 0 to 100 were presented to the network, and it was requested that the network echo the prime numbers. The intent was to have the network output the closest prime number when given a composite number. After one million epochs, the network was able to produce prime numbers for about 85-90% of the numbers between 0 and 100. Using Gray code instead of binary did not improve results. Perhaps a second hidden layer is needed, or some heuristics to reduce local minima issues. Cheung/Cannons 40

Conclusion The toy examples confirmed the basic operation of neural networks and also demonstrated their ability to learn the desired function and generalize when needed The ability of neural networks to learn and generalize in addition to their wide range of applicability makes them very powerful tools Cheung/Cannons 41

Questions and Comments Cheung/Cannons 42

Acknowledgements Natural Sciences and Engineering Research Council (NSERC) University of Manitoba Cheung/Cannons 43
