Feed-Forward Mapping Networks. KAIST Department of Bio and Brain Engineering, Jaeseung Jeong (정재승)


How much energy do we need for brain functions?

Information processing: Trade-off between energy consumption and wiring cost

Trade-off between energy consumption (wiring cost) and maximal information processing

Optical character recognition (OCR). OCR is the process of optically scanning an image and interpreting the resulting digital image so that the computer understands its meaning.

Two feed-forward processes: (1) the perception of a letter, the physical sensing of an image of a letter; (2) the process of attaching meaning to such an image.

The perception of a letter yields a sensory feature vector: the number of feature values defines the dimensionality of the feature space.

Mapping functions: the recognition process reduces to a vector function from feature vectors to outputs.

Perceptron The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier.
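As a minimal sketch of this view (the weights and input below are made up for illustration), the perceptron classifies a feature vector by thresholding a weighted sum:

```python
import numpy as np

def heaviside(v):
    """Step activation H(v): 1 if the induced field is positive, else 0."""
    return int(v > 0)

# Hypothetical weights; the last component acts as a bias on a constant input.
w = np.array([0.5, -0.3, 0.1])

x = np.array([1.0, 1.0, 1.0])   # feature vector with constant 1 appended
print(heaviside(w @ x))         # 1, since w^T x = 0.3 > 0
```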

Population node as perceptron: the linear perceptron.

Output manifold of a population node with two input channels.

Boolean algebra. Boolean algebra (or Boolean logic) is a logical calculus of truth values, developed by George Boole in the 1840s. The Boolean operations turn out to coincide with the set of all operations on the set {0, 1} that take only finitely many arguments; there are $2^{2^n}$ such operations when there are $n$ arguments.

Boolean algebra resembles the algebra of real numbers, but with the numeric operations of multiplication $xy$, addition $x + y$, and negation $-x$ replaced by the respective logical operations of conjunction $x \wedge y$, disjunction $x \vee y$, and negation $\neg x$. The Boolean operations are these and all other operations that can be built from them, such as $x \wedge (y \vee z)$.
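The $2^{2^n}$ count is easy to verify in code: an operation on $n$ arguments is fully specified by its truth table, so enumerating output columns enumerates the operations. A small sketch:

```python
from itertools import product

n = 2
inputs = list(product([0, 1], repeat=n))     # the 2^n input combinations
# A Boolean operation is fully specified by its output column (truth table),
# so enumerating all output columns enumerates all operations.
functions = list(product([0, 1], repeat=len(inputs)))
assert len(functions) == 2 ** (2 ** n)       # 16 for n = 2
print(len(functions))
```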

Boolean functions: the threshold node

Look-up table, graphical representation, and single threshold population nodes

The history of the perceptron. A feed-forward neural network with two or more layers (i.e., a multilayer perceptron) has far greater processing power than a perceptron with one layer (i.e., a single-layer perceptron). Single-layer perceptrons are only capable of learning linearly separable patterns; in 1969, the famous book Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for this class of network to learn an XOR function.

The history of the perceptron. Both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function. Three years later, Stephen Grossberg published a series of papers introducing networks capable of modelling differential, contrast-enhancing, and XOR functions.

The history of the perceptron. Nevertheless, the often-miscited Minsky/Papert text caused a significant decline in interest in and funding for neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s. The text was reprinted in 1987 as "Perceptrons - Expanded Edition", in which some errors in the original were shown and corrected.

The number of nonlinearly separable functions grows rapidly with the dimension of the feature space and soon outgrows the number of linearly separable functions.
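For two inputs this can be verified by brute force: 14 of the 16 Boolean functions of two arguments are linearly separable, with XOR and XNOR the only exceptions. A sketch, using a coarse weight grid that suffices at this small scale:

```python
from itertools import product

inputs = list(product([0, 1], repeat=2))

def separable(table):
    """True if some threshold unit H(w1*x1 + w2*x2 + b) realizes the table."""
    grid = [x / 2 for x in range(-6, 7)]        # weights and bias in [-3, 3]
    for w1, w2, b in product(grid, repeat=3):
        if all(int(w1 * x1 + w2 * x2 + b > 0) == t
               for (x1, x2), t in zip(inputs, table)):
            return True
    return False

tables = list(product([0, 1], repeat=4))        # all 16 two-input functions
print(sum(separable(t) for t in tables))        # 14: all but XOR and XNOR
```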

Learning: the delta rule. The weight matrix is changed by small amounts in an attempt to find a better answer. The delta rule is a gradient descent learning rule for updating the weights of the artificial neurons in a single-layer perceptron.

Supervised learning. Given examples $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots$, find a perceptron $\mathbb{R}^N \to \{0, 1\}$ such that $y_a = H(w^T x_a)$ for every example $a$.

Example: handwritten digits. Find a perceptron that detects the digit two.

Delta rule: $\Delta w = \eta\,(y - H(w^T x))\,x$. Learning from mistakes: the delta is the difference between the desired and actual output. Also called the perceptron learning rule.

Two types of mistakes. False positive ($y = 0$, $H(w^T x) = 1$): make $w$ less like $x$, $w \leftarrow w - \eta x$. False negative ($y = 1$, $H(w^T x) = 0$): make $w$ more like $x$, $w \leftarrow w + \eta x$. The update is always proportional to $x$.

Gradient update. Objective function: $e(w, x, y) = -(y - H(w^T x))\,w^T x$, with $\Delta w = -\eta\,\partial e/\partial w$. The delta rule is stochastic gradient descent on $E(w) = \sum_a e(w, x_a, y_a)$; $E = 0$ means no mistakes.
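A minimal sketch of the delta rule as stochastic updates on a hypothetical separable toy set (the data, learning rate, and epoch count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical separable toy set: label 1 when x1 + x2 > 1, with points
# near the boundary removed so that a positive margin exists.
X = rng.uniform(0, 1, size=(200, 2))
X = X[np.abs(X.sum(axis=1) - 1) > 0.2]
y = (X.sum(axis=1) > 1).astype(int)
X = np.hstack([X, np.ones((len(X), 1))])   # constant input carries the bias

w = np.zeros(3)
eta = 0.1                                   # learning rate
for epoch in range(50):
    for x_a, y_a in zip(X, y):
        y_hat = int(w @ x_a > 0)            # actual output H(w^T x)
        w += eta * (y_a - y_hat) * x_a      # delta rule update

acc = np.mean([int(w @ x_a > 0) == y_a for x_a, y_a in zip(X, y)])
print(acc)                                  # 1.0 once all mistakes vanish
```

On separable data like this, the perceptron convergence theorem guarantees that only finitely many mistakes are made; the nonseparable case is the subject of the next slide.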

If the examples are nonseparable, the delta rule does not converge. The objective function is not equal to the number of mistakes, so there is no reason to believe that the delta rule minimizes the number of mistakes.

Contrast with the Hebb rule. Hebb rule: $\Delta w = \eta\,y\,x$. Perceptron learning rule: $\Delta w = \eta\,(y - H(w^T x))\,x$. Assume that the teacher can drive the perceptron to produce the desired output. What are the objective functions?

Is the delta rule biological? The term driven by the actual output is anti-Hebbian, $\Delta w = -\eta\,H(w^T x)\,x$; the term driven by the desired output is Hebbian, $\Delta w = \eta\,y\,x$. Together they are contrastive.

Objective functions: for the Hebb rule, distance from the inputs; for the delta rule, error in reproducing the output.

Multi-layer Perceptrons

Multilayer perceptrons: architecture. An input layer, one or more hidden layers, and an output layer.

A solution for the XOR problem. In the $\pm 1$ encoding the truth table is:

x1    x2    x1 XOR x2
-1    -1        -1
-1    +1        +1
+1    -1        +1
+1    +1        -1

Here $\varphi(v) = +1$ if $v > 0$ and $\varphi(v) = -1$ if $v \le 0$ is the sign function; a two-layer network of such units realizes the mapping (the slide's diagram shows one built with weights $\pm 1$).
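One hand-wired realization in code, with sign units in the $\pm 1$ encoding above (these particular weights are just one valid choice):

```python
import numpy as np

def sign(v):
    """phi(v) = +1 if v > 0, else -1, as defined above."""
    return np.where(v > 0, 1, -1)

# Hand-chosen weights: h1 acts as OR, h2 acts as AND.
W1 = np.array([[1.0, 1.0],    # h1 = sign(x1 + x2 + 1.5)
               [1.0, 1.0]])   # h2 = sign(x1 + x2 - 1.5)
b1 = np.array([1.5, -1.5])
w2 = np.array([1.0, -1.0])    # output = sign(h1 - h2 - 0.5): OR and not AND
b2 = -0.5

for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    h = sign(W1 @ np.array(x, dtype=float) + b1)
    print(x, "->", int(sign(w2 @ h + b2)))   # -1, +1, +1, -1
```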

NEURON MODEL. Sigmoidal function: $\varphi(v_j) = \dfrac{1}{1 + e^{-a v_j}}$, where $v_j = \sum_{i=0}^{m} w_{ji} y_i$ is the induced local field of neuron $j$. Increasing $a$ steepens the curve toward a threshold function. This is the most common form of activation function, and it is differentiable.
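The differentiability is what back-propagation exploits; the derivative has the closed form $\varphi'(v) = a\,\varphi(v)\,(1 - \varphi(v))$. A small sketch checking this identity numerically:

```python
import numpy as np

def phi(v, a=1.0):
    """Sigmoid: phi(v) = 1 / (1 + exp(-a v))."""
    return 1.0 / (1.0 + np.exp(-a * v))

def phi_prime(v, a=1.0):
    """Closed-form derivative: phi'(v) = a * phi(v) * (1 - phi(v))."""
    p = phi(v, a)
    return a * p * (1.0 - p)

# Sanity check against a central-difference numerical derivative.
v = np.array([-2.0, 0.0, 2.0])
eps = 1e-6
numeric = (phi(v + eps) - phi(v - eps)) / (2 * eps)
print(np.allclose(phi_prime(v), numeric))  # True
```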

Learning algorithms: the back-propagation algorithm. Forward step: function signals flow forward through the network. Backward step: error signals propagate backward. The algorithm adjusts the weights of the NN in order to minimize the average squared error.

Average squared error.
Error signal of output neuron $j$ at presentation of the $n$-th training example: $e_j(n) = d_j(n) - y_j(n)$.
Total error energy at time $n$: $E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$, where $C$ is the set of neurons in the output layer.
Average squared error: $E_{AV} = \frac{1}{N} \sum_{n=1}^{N} E(n)$, where $N$ is the size of the training set. This is the measure of learning performance.
Goal: adjust the weights of the NN to minimize $E_{AV}$.
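In code these quantities are one line each (the desired and actual outputs below are hypothetical):

```python
import numpy as np

# Hypothetical desired (d) and actual (y) outputs:
# N = 4 examples, |C| = 2 output neurons.
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([[0.9, 0.2], [0.1, 0.7], [0.8, 0.9], [0.2, 0.1]])

e = d - y                            # error signals e_j(n)
E_n = 0.5 * (e ** 2).sum(axis=1)     # total error energy E(n), one per example
E_av = E_n.mean()                    # average squared error E_AV over the set
print(E_n, E_av)
```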

Notation: $e_j$, error at the output of neuron $j$; $y_j$, output of neuron $j$; $v_j = \sum_{i=0}^{m} w_{ji} y_i$, induced local field of neuron $j$.

Weight update rule. The update rule is based on the gradient descent method: take a step in the direction yielding the maximum decrease of $E$, i.e., in the direction opposite to the gradient: $\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}}$, where $w_{ji}$ is the weight associated with the link from neuron $i$ to neuron $j$.

Definition of the local gradient of neuron $j$: $\delta_j = -\frac{\partial E}{\partial v_j}$. We obtain $\delta_j = e_j\,\varphi'(v_j)$ because $\frac{\partial E}{\partial v_j} = \frac{\partial E}{\partial e_j} \frac{\partial e_j}{\partial y_j} \frac{\partial y_j}{\partial v_j} = e_j \cdot (-1) \cdot \varphi'(v_j)$.

Update rule. We obtain $\Delta w_{ji} = \eta\,\delta_j\,y_i$ because $\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial v_j} \frac{\partial v_j}{\partial w_{ji}}$ and $\frac{\partial v_j}{\partial w_{ji}} = y_i$.

Error $e_j$ of an output neuron. Single-layer perceptron: for output neuron $j$, $e_j = d_j - y_j$; then $\delta_j = (d_j - y_j)\,\varphi'(v_j)$.

Multi-layer Perceptron
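The same local-gradient machinery extends layer by layer, with the standard hidden-layer rule $\delta_j = \varphi'(v_j) \sum_k \delta_k w_{kj}$. A minimal sketch for one hidden layer of sigmoid units trained on XOR in a 0/1 encoding, implementing $\delta_j = e_j\,\varphi'(v_j)$ at the output and $\Delta w_{ji} = \eta\,\delta_j\,y_i$ throughout (the architecture and hyperparameters are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR in a 0/1 encoding (toy training set for the sketch).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
d = np.array([[0.], [1.], [1.], [0.]])

def phi(v):
    """Sigmoid activation, slope a = 1."""
    return 1.0 / (1.0 + np.exp(-v))

# One hidden layer of 4 sigmoid units; small random initial weights.
W1, b1 = rng.normal(0.0, 1.0, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0.0, 1.0, (4, 1)), np.zeros(1)
eta = 0.5  # learning rate

for epoch in range(5000):
    # Forward step: function signals.
    y1 = phi(X @ W1 + b1)                     # hidden-layer outputs
    y2 = phi(y1 @ W2 + b2)                    # output-layer outputs
    # Backward step: error signals.
    e = d - y2                                # e_j(n) = d_j(n) - y_j(n)
    delta2 = e * y2 * (1 - y2)                # delta_j = e_j * phi'(v_j)
    delta1 = (delta2 @ W2.T) * y1 * (1 - y1)  # hidden local gradients
    # Weight updates: Delta w_ji = eta * delta_j * y_i, summed over examples.
    W2 += eta * y1.T @ delta2
    b2 += eta * delta2.sum(axis=0)
    W1 += eta * X.T @ delta1
    b1 += eta * delta1.sum(axis=0)

print(phi(phi(X @ W1 + b1) @ W2 + b2).ravel())  # typically close to [0, 1, 1, 0]
```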