
Lecture 2: Single Layer Perceptrons

Kevin Swingler
kms@cs.stir.ac.uk

Recap: McCulloch-Pitts Neuron

This vastly simplified model of real neurons is also known as a Threshold Logic Unit:

[Diagram: inputs A1 ... An arrive with weights W1 ... Wn at a summing unit whose output is Y]

1. A set of synapses (i.e. connections) brings in activations from other neurons.
2. A processing unit sums the inputs, and then applies a non-linear activation function.
3. An output line transmits the result to other neurons.
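A minimal sketch of such a unit in Python may help fix the idea. The weights, threshold, and inputs below are made-up illustrative values, and the activation function is taken to be a hard threshold with 0/1 outputs:

```python
def mp_neuron(activations, weights, theta):
    """McCulloch-Pitts neuron: weighted sum of inputs, then a hard threshold."""
    total = sum(w * a for w, a in zip(weights, activations))
    return 1 if total >= theta else 0

# Illustrative values only: two inputs, unit weights, threshold 1.5
print(mp_neuron([1, 1], [1.0, 1.0], 1.5))  # fires: 2.0 >= 1.5 -> 1
print(mp_neuron([1, 0], [1.0, 1.0], 1.5))  # does not fire: 1.0 < 1.5 -> 0
```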

Networks of McCulloch-Pitts Neurons

One neuron can't do much on its own. Usually we will have many neurons, labelled by indices k, i, j, and activation flows between them via synapses with strengths w_ki, w_ij:

[Diagram: Neuron k sends its output Y_k across a synapse of strength w_ik to Neuron i, which sends Y_i on to Neuron j]

$$Y_i = \mathrm{sgn}\Big(\sum_{k=1}^{n} w_{ik} Y_k - \theta_i\Big)$$

The Perceptron

We can connect any number of McCulloch-Pitts neurons together in any way we like. An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one output layer of McCulloch-Pitts neurons is known as a Perceptron.

[Diagram: N input units fully connected to M output units, with weights w_ij and thresholds θ_1 ... θ_M]

$$Y_j = \mathrm{sgn}\Big(\sum_{i=1}^{N} w_{ij} Y_i - \theta_j\Big)$$
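The same computation, extended over a whole output layer, is only a few lines more. This is a sketch under the assumption, used in the logic-gate examples that follow, that sgn maps to 0/1 outputs:

```python
def perceptron_layer(inputs, weights, thetas):
    """One layer of McCulloch-Pitts units.

    weights[j][i] is the strength of the connection from input i to
    output unit j, and thetas[j] is unit j's threshold.
    """
    outputs = []
    for w_j, theta_j in zip(weights, thetas):
        total = sum(w * x for w, x in zip(w_j, inputs))
        outputs.append(1 if total >= theta_j else 0)
    return outputs

# Two inputs feeding two output units (illustrative weights and thresholds):
print(perceptron_layer([1, 0], [[1.0, 1.0], [2.0, -1.0]], [1.5, 0.5]))  # [0, 1]
```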

Implementing Logic Gates with MP Neurons

We can use McCulloch-Pitts neurons to implement the basic logic gates (e.g. AND, OR, NOT). It is well known from logic that we can construct any logical function from these three basic logic gates. All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs. We shall see explicitly how one can construct simple networks that perform NOT, AND, and OR.

Implementation of Logical NOT, AND, and OR

NOT:
in | out
0  | 1
1  | 0

AND:
in1 | in2 | out
0   | 0   | 0
0   | 1   | 0
1   | 0   | 0
1   | 1   | 1

OR:
in1 | in2 | out
0   | 0   | 0
0   | 1   | 1
1   | 0   | 1
1   | 1   | 1

[Diagram: three single-unit networks whose weights and thresholds are yet to be determined]

Problem: Train the network to calculate the appropriate weights and thresholds in order to classify correctly the different classes (i.e. form decision boundaries between classes).

Decision Surfaces

The decision surface is the surface at which the output of the unit is precisely equal to the threshold, i.e. where the weighted sum of the inputs equals θ.

In 1-D the surface is just a point:

$$Y_1 = \theta / w_1$$

In 2-D the surface is

$$w_1 Y_1 + w_2 Y_2 = \theta$$

which we can rewrite as

$$Y_2 = -\frac{w_1}{w_2} Y_1 + \frac{\theta}{w_2}$$

So, in 2-D the decision boundaries are always straight lines.

Decision Boundaries for AND and OR

We can now plot the decision boundaries of our logic gates:

AND: w1 = 1, w2 = 1, θ = 1.5
OR: w1 = 1, w2 = 1, θ = 0.5

[Plots: for AND, the boundary line separates (1, 1), with output 1, from (0, 0), (0, 1) and (1, 0), with output 0; for OR, the line separates (0, 0), with output 0, from (0, 1), (1, 0) and (1, 1), with output 1]
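With these weights, the single-unit model from earlier reproduces the truth tables exactly. The NOT weights used below (w = -1, θ = -0.5) are a standard choice, not taken from the slide:

```python
def unit(xs, ws, theta):
    """Single McCulloch-Pitts unit with 0/1 output."""
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        a = unit([x1, x2], [1, 1], 1.5)  # AND: weights from the slide
        o = unit([x1, x2], [1, 1], 0.5)  # OR: weights from the slide
        print(f"{x1} {x2}  AND={a}  OR={o}")

for x in (0, 1):  # NOT: w = -1, theta = -0.5 (a standard choice)
    print(f"NOT {x} = {unit([x], [-1], -0.5)}")
```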

Decision Boundary for XOR

The difficulty in dealing with XOR is rather obvious. We need two straight lines to separate the different outputs/decisions:

XOR:
in1 | in2 | out
0   | 0   | 0
0   | 1   | 1
1   | 0   | 1
1   | 1   | 0

[Plot: (0, 1) and (1, 0) give output 1, while (0, 0) and (1, 1) give output 0; no single straight line separates the two classes]

Solution: either change the transfer function so that it has more than one decision boundary, or use a more complex network that is able to generate more complex decision boundaries.
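The point can be made concrete with a brute-force search: no single threshold unit reproduces XOR. Scanning a finite grid of weights is of course only an illustration, not a proof; the real argument is the geometric one above:

```python
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def unit(x1, x2, w1, w2, theta):
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

grid = [i / 2 for i in range(-6, 7)]  # weights/thresholds from -3.0 to 3.0
solutions = [
    (w1, w2, t)
    for w1, w2, t in itertools.product(grid, repeat=3)
    if all(unit(x1, x2, w1, w2, t) == y for (x1, x2), y in XOR.items())
]
print("single-unit solutions for XOR:", solutions)  # prints: []
```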

ANN Architectures

Mathematically, ANNs can be represented as weighted directed graphs. The most common ANN architectures are:

Single-Layer Feed-Forward NNs: One input layer and one output layer of processing units. No feedback connections (e.g. a Perceptron).

Multi-Layer Feed-Forward NNs: One input layer, one output layer, and one or more hidden layers of processing units. No feedback connections (e.g. a Multi-Layer Perceptron).

Recurrent NNs: Any network with at least one feedback connection. It may, or may not, have hidden units.

Further interesting variations include: sparse connections, time-delayed connections, moving windows, ...

Examples of Network Architectures

[Diagrams: a Single-Layer Feed-Forward network, a Multi-Layer Feed-Forward network, and a Recurrent Network]

Types of Activation/Transfer Function

Threshold Function:

$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Piecewise-Linear Function:

$$f(x) = \begin{cases} 1 & \text{if } x \ge 0.5 \\ x + 0.5 & \text{if } -0.5 \le x \le 0.5 \\ 0 & \text{if } x \le -0.5 \end{cases}$$

Sigmoid Function:

$$f(x) = \frac{1}{1 + e^{-x}}$$
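A direct transcription of the three definitions into Python:

```python
import math

def threshold(x):
    """Hard threshold: 1 for x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0

def piecewise_linear(x):
    """Linear ramp on [-0.5, 0.5], clipped to 0 and 1 outside it."""
    if x >= 0.5:
        return 1.0
    if x <= -0.5:
        return 0.0
    return x + 0.5

def sigmoid(x):
    """Logistic sigmoid: a smooth, differentiable squashing function."""
    return 1.0 / (1.0 + math.exp(-x))

for f in (threshold, piecewise_linear, sigmoid):
    print(f.__name__, [round(f(x), 3) for x in (-2.0, -0.25, 0.0, 0.25, 2.0)])
```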

The Threshold as a Special Kind of Weight

The basic Perceptron equation can be simplified if we consider that the threshold is another connection weight:

$$\sum_{i=1}^{n} w_i x_i - \theta = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n - \theta$$

If we define w_0 = -θ and x_0 = 1, then

$$\sum_{i=1}^{n} w_i x_i - \theta = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_0 x_0 = \sum_{i=0}^{n} w_i x_i$$

The Perceptron equation then becomes

$$Y = \mathrm{sgn}\Big(\sum_{i=1}^{n} w_i x_i - \theta\Big) = \mathrm{sgn}\Big(\sum_{i=0}^{n} w_i x_i\Big)$$

So, we now only have to compute the weights.
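In code, the trick amounts to prepending a constant input x_0 = 1 and folding -θ into the weight vector as w_0. A quick sketch with arbitrary example values:

```python
def with_threshold(xs, ws, theta):
    """Original formulation: compare the weighted sum against theta."""
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= theta else 0

def with_bias_weight(xs, ws_aug):
    """Bias-trick formulation: w0 = -theta, constant input x0 = 1."""
    return 1 if sum(w * x for w, x in zip(ws_aug, [1] + list(xs))) >= 0 else 0

ws, theta = [2.0, -1.0], 0.5       # arbitrary example values
ws_aug = [-theta] + ws             # augmented weight vector [w0, w1, w2]
for xs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    assert with_threshold(xs, ws, theta) == with_bias_weight(xs, ws_aug)
print("both formulations agree on all four inputs")
```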

Example: A Classification Task

A typical neural network application is classification. Consider the simple example of classifying trucks given their masses and lengths:

Mass | Length | Class
10.0 | 6      | Lorry
20.0 | 5      | Lorry
5.0  | 4      | Van
2.0  | 5      | Van
2.0  | 5      | Van
3.0  | 6      | Lorry
10.0 | 7      | Lorry
15.0 | 8      | Lorry
5.0  | 9      | Lorry

How do we construct a neural network that can classify any Lorry and Van?

Cookbook Recipe for Building Neural Networks

Formulating neural network solutions for particular problems is a multi-stage process:

1. Understand and specify the problem in terms of inputs and required outputs.
2. Take the simplest form of network you think might be able to solve your problem.
3. Try to find the appropriate connection weights (including neuron thresholds) so that the network produces the right outputs for each input in its training data.
4. Make sure that the network works on its training data, and test its generalization by checking its performance on new testing data.
5. If the network doesn't perform well enough, go back to stage 3 and try harder.
6. If the network still doesn't perform well enough, go back to stage 2 and try harder.
7. If the network still doesn't perform well enough, go back to stage 1 and try harder.
8. Problem solved (or not).

Building a Neural Network (stages 1 & 2)

For our truck example, our inputs can be direct encodings of the masses and lengths. Generally we would have one output unit for each class, with activation 1 for 'yes' and 0 for 'no'. In our example, we still have one output unit, but the activation 1 corresponds to 'lorry' and 0 to 'van' (or vice versa). The simplest network we should try first is the single layer Perceptron. We can further simplify things by replacing the threshold by an extra weight, as we discussed before. This gives us:

$$\text{Class} = \mathrm{sgn}(w_0 + w_1 \cdot \text{Mass} + w_2 \cdot \text{Length})$$

[Diagram: a single output unit with inputs 1, Mass, and Length weighted by w_0, w_1, w_2]
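As a sketch, once the weights are known the classifier is a single line of arithmetic. The weight values here are hypothetical, chosen only to show the form of the computation, not trained values:

```python
def sgn(x):
    return 1 if x >= 0 else 0  # 1 = lorry, 0 = van, as on the slide

def classify(mass, length, w0, w1, w2):
    return sgn(w0 + w1 * mass + w2 * length)

# Hypothetical, untrained weights, purely to illustrate the computation:
print(classify(20.0, 5, w0=-10.0, w1=0.5, w2=1.0))  # -> 1 (lorry)
```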

Training the Neural Network (stage 3)

Whether our neural network is a simple Perceptron or a much more complicated multi-layer network, we need to develop a systematic procedure for determining appropriate connection weights.

The common procedure is to have the network learn the appropriate weights from a representative set of training data.

For classification, a simple Perceptron uses decision boundaries (lines or hyperplanes), which it shifts around until each training pattern is correctly classified. The process of shifting around in a systematic way is called learning.

The learning process can then be divided into a number of small steps.

Supervised Training

1. Generate a training pair or pattern:
   - an input x = [x_1 x_2 ... x_n]
   - a target output y_target (known/given)
2. Then, present the network with x and allow it to generate an output y.
3. Compare y with y_target to compute the error.
4. Adjust the weights, w, to reduce the error.
5. Repeat steps 2-4 multiple times.

Perceptron Learning Rule

1. Initialize the weights at random.
2. For each training pair/pattern (x, y_target):
   - Compute the output y.
   - Compute the error, δ = (y_target - y).
   - Use the error to update the weights as follows:
     Δw = w_new - w_old = η·δ·x, or w_new = w_old + η·δ·x,
     where η is called the learning rate or step size, and it determines how smoothly the learning process takes place.
3. Repeat step 2 until convergence (i.e. the error δ is zero for all patterns).

The Perceptron Learning Rule is then given by

$$w_{\text{new}} = w_{\text{old}} + \eta \, \delta \, x \quad \text{where} \quad \delta = (y_{\text{target}} - y)$$
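Putting the whole lecture together, here is a minimal sketch of the Perceptron Learning Rule applied to the truck data from the classification example, with the threshold handled as the extra weight w_0. The learning rate, the random initialization range, and the epoch cap are arbitrary choices:

```python
import random

# Truck data from the classification example: (mass, length, class), 1 = lorry, 0 = van
data = [
    (10.0, 6, 1), (20.0, 5, 1), (5.0, 4, 0),
    (2.0, 5, 0), (2.0, 5, 0), (3.0, 6, 1),
    (10.0, 7, 1), (15.0, 8, 1), (5.0, 9, 1),
]

def sgn(x):
    return 1 if x >= 0 else 0

random.seed(0)                                  # fixed seed for reproducibility
w = [random.uniform(-1, 1) for _ in range(3)]   # w[0] is the bias weight, i.e. -theta
eta = 0.1                                       # learning rate (arbitrary choice)

for epoch in range(1, 1001):
    errors = 0
    for mass, length, y_target in data:
        x = [1.0, mass, length]                        # x0 = 1 implements the bias trick
        y = sgn(sum(wi * xi for wi, xi in zip(w, x)))  # present x, generate output y
        delta = y_target - y                           # error
        if delta != 0:
            errors += 1
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]  # learning rule
    if errors == 0:                                    # converged: every pattern correct
        break

print("weights:", [round(wi, 2) for wi in w], "after", epoch, "epochs")
```

Because the truck data, as given in the table, are linearly separable, the loop reaches zero errors and stops, which is exactly the convergence condition in step 3 above.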