Lecture 2: Single Layer Perceptrons
Kevin Swingler
kms@cs.stir.ac.uk
Recap: McCulloch-Pitts Neuron

This vastly simplified model of real neurons is also known as a Threshold Logic Unit:

[Diagram: inputs arrive along synapses with weights w_1 … w_n, are summed and passed through an activation function in a processing unit, and the result Y is sent along an output line]

1. A set of synapses (i.e. connections) brings in activations from other neurons
2. A processing unit sums the inputs, and then applies a non-linear activation function
3. An output line transmits the result to other neurons
Networks of McCulloch-Pitts Neurons

One neuron can't do much on its own. Usually we will have many neurons labelled by indices k, i, and activation flows between them via synapses with strengths w_ki:

[Diagram: neuron k sends its activation Y_k across a synapse of strength w_ki to neuron i, which has threshold θ_i and output Y_i]

    Y_i = sgn( Σ_{k=1}^{n} Y_k w_ki − θ_i )
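To make this computation concrete, here is a minimal Python sketch of a single threshold unit; the function names and the example values are illustrative, not taken from the lecture, and "sgn" here denotes the 0/1 step used throughout these slides.

```python
# Minimal sketch of a McCulloch-Pitts / threshold logic unit.
# Function names and example values are illustrative only.

def sgn(x):
    """Step activation used in the lecture: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def mp_neuron(inputs, weights, theta):
    """Output Y = sgn( sum_k Y_k * w_k - theta )."""
    total = sum(y * w for y, w in zip(inputs, weights))
    return sgn(total - theta)

# Example: two inputs with equal weights and a threshold of 1.5
print(mp_neuron([1, 1], [1.0, 1.0], 1.5))  # -> 1
print(mp_neuron([1, 0], [1.0, 1.0], 1.5))  # -> 0
```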
The Perceptron

We can connect any number of McCulloch-Pitts neurons together in any way we like.

An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one output layer of McCulloch-Pitts neurons is known as a Perceptron.

[Diagram: N input units fully connected to M output units, with weights w_ij and thresholds θ_1 … θ_M]

    Y_j = sgn( Σ_{i=1}^{n} Y_i w_ij − θ_j )
Implementing Logic Gates with MP Neurons

We can use McCulloch-Pitts neurons to implement the basic logic gates (e.g. AND, OR, NOT). It is well known from logic that we can construct any logical function from these three basic logic gates.

All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs. We shall see explicitly how one can construct simple networks that perform NOT, AND, and OR.
Implementation of Logical NOT, AND, and OR

NOT:
  in | out
   0 |  1
   1 |  0

AND:
  in1  in2 | out
   0    0  |  0
   0    1  |  0
   1    0  |  0
   1    1  |  1

OR:
  in1  in2 | out
   0    0  |  0
   0    1  |  1
   1    0  |  1
   1    1  |  1

Problem: Train the network to calculate the appropriate weights and thresholds in order to classify correctly the different classes (i.e. form decision boundaries between classes).
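As a concrete sketch in Python, one common choice of weights and thresholds uses w_1 = w_2 = 1 with θ = 1.5 for AND and θ = 0.5 for OR (the same values given on the decision-boundary slide below), and a single weight of −1 with θ = −0.5 for NOT; the NOT values are an illustrative assumption rather than something stated in the lecture.

```python
# Illustrative MP-neuron implementations of NOT, AND, OR.
# AND/OR weights and thresholds match the decision-boundary slide;
# the NOT weight/threshold pair is one standard choice, assumed here.

def unit(inputs, weights, theta):
    """Threshold unit: 1 if sum_i w_i*in_i >= theta, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def NOT(a):     return unit([a],    [-1.0],      -0.5)
def AND(a, b):  return unit([a, b], [1.0, 1.0],  1.5)
def OR(a, b):   return unit([a, b], [1.0, 1.0],  0.5)

# Verify against the truth tables above
for a in (0, 1):
    print(f"NOT {a} = {NOT(a)}")
for a in (0, 1):
    for b in (0, 1):
        print(f"AND({a},{b}) = {AND(a, b)}   OR({a},{b}) = {OR(a, b)}")
```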
Decision Surfaces

The decision surface is the surface at which the output of the unit is precisely equal to the threshold, i.e.  Σ_i w_i in_i = θ

In 1-D the surface is just a point:  in_1 = θ / w_1

In 2-D the surface is  w_1 in_1 + w_2 in_2 = θ,  which we can re-write as

    in_2 = − (w_1 / w_2) in_1 + θ / w_2

So, in 2-D the decision boundaries are always straight lines.
Decision Boundaries for AND and OR

We can now plot the decision boundaries of our logic gates:

    AND:  w_1 = 1, w_2 = 1, θ = 1.5
    OR:   w_1 = 1, w_2 = 1, θ = 0.5

[Plots: the four input points (0,0), (0,1), (1,0), (1,1) in the (in_1, in_2) plane, with the straight-line decision boundary separating the output-1 points from the output-0 points, for AND and for OR]
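As a quick check, substituting the AND values above into the 2-D boundary equation from the previous slide gives

    in_2 = − (w_1 / w_2) in_1 + θ / w_2 = − in_1 + 1.5

This line crosses the axes at (1.5, 0) and (0, 1.5), so of the four inputs only (1, 1) lies on the side where w_1 in_1 + w_2 in_2 ≥ θ, and only that input produces output 1, exactly as the AND truth table requires.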
Decision Boundary for XOR

The difficulty in dealing with XOR is rather obvious. We need two straight lines to separate the different outputs/decisions:

XOR:
  in1  in2 | out
   0    0  |  0
   0    1  |  1
   1    0  |  1
   1    1  |  0

[Plot: the four input points in the (in_1, in_2) plane; no single straight line can separate the output-1 points (0,1) and (1,0) from the output-0 points (0,0) and (1,1)]

Solution: either change the transfer function so that it has more than one decision boundary, or use a more complex network that is able to generate more complex decision boundaries.
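One standard way to realise the "more complex network" option is a two-layer arrangement that feeds OR and NAND units into an AND unit, since XOR(a, b) = AND( OR(a, b), NAND(a, b) ). The Python sketch below is illustrative; the particular weight and threshold values are an assumption, not taken from the lecture.

```python
# Illustrative two-layer solution for XOR: XOR(a, b) = AND( OR(a, b), NAND(a, b) ).
# Weight/threshold choices are one standard assumption, not from the lecture.

def unit(inputs, weights, theta):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def XOR(a, b):
    hidden_or   = unit([a, b], [1.0, 1.0],   0.5)   # OR unit
    hidden_nand = unit([a, b], [-1.0, -1.0], -1.5)  # NAND unit (NOT of AND)
    return unit([hidden_or, hidden_nand], [1.0, 1.0], 1.5)  # AND of the two

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", XOR(a, b))   # prints 0, 1, 1, 0
```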
ANN Architectures

Mathematically, ANNs can be represented as weighted directed graphs. The most common ANN architectures are:

Single-Layer Feed-Forward NNs: One input layer and one output layer of processing units. No feedback connections (e.g. a Perceptron)

Multi-Layer Feed-Forward NNs: One input layer, one output layer, and one or more hidden layers of processing units. No feedback connections (e.g. a Multi-Layer Perceptron)

Recurrent NNs: Any network with at least one feedback connection. It may, or may not, have hidden units

Further interesting variations include: sparse connections, time-delayed connections, moving windows, …
Examples of Network Architectures

[Diagrams: a Single-Layer Feed-Forward network, a Multi-Layer Feed-Forward network, and a Recurrent Network]
Types of Activation/Transfer Function

Threshold Function
    f(x) = 1   if x ≥ 0
    f(x) = 0   if x < 0

Piecewise-Linear Function
    f(x) = 1         if x ≥ 0.5
    f(x) = x + 0.5   if −0.5 < x < 0.5
    f(x) = 0         if x ≤ −0.5

Sigmoid Function
    f(x) = 1 / (1 + e^(−x))

[Plots of f(x) against x for each of the three functions]
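A minimal Python sketch of these three transfer functions (the function names are illustrative):

```python
import math

# Illustrative implementations of the three transfer functions above.

def threshold(x):
    """Step function: 1 if x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0

def piecewise_linear(x):
    """Clamped ramp: 0 below -0.5, x + 0.5 in between, 1 above +0.5."""
    if x >= 0.5:
        return 1.0
    if x <= -0.5:
        return 0.0
    return x + 0.5

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, -0.25, 0.0, 0.25, 2.0):
    print(x, threshold(x), piecewise_linear(x), round(sigmoid(x), 3))
```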
The Threshold as a Special Kind of Weight

The basic Perceptron equation can be simplified if we consider that the threshold is another connection weight:

    Σ_{i=1}^{n} w_i x_i − θ = w_1 x_1 + w_2 x_2 + … + w_n x_n − θ

If we define w_0 = −θ and x_0 = 1 then

    Σ_{i=1}^{n} w_i x_i − θ = w_1 x_1 + w_2 x_2 + … + w_n x_n + w_0 x_0 = Σ_{i=0}^{n} w_i x_i

The Perceptron equation then becomes

    Y = sgn( Σ_{i=1}^{n} w_i x_i − θ ) = sgn( Σ_{i=0}^{n} w_i x_i )

So, we now only have to compute the weights w_i.
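A small Python sketch of the same trick: prepend a constant input x_0 = 1 and treat w_0 = −θ as an ordinary weight. The function names and example values are illustrative.

```python
# Absorbing the threshold into the weight vector: w_0 = -theta, x_0 = 1.
# Names and example values are illustrative.

def sgn(x):
    return 1 if x >= 0 else 0

def perceptron_with_threshold(x, w, theta):
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) - theta)

def perceptron_with_bias_weight(x, w, theta):
    w_aug = [-theta] + list(w)   # w_0 = -theta
    x_aug = [1.0] + list(x)      # x_0 = 1
    return sgn(sum(wi * xi for wi, xi in zip(w_aug, x_aug)))

x, w, theta = [1.0, 0.0], [1.0, 1.0], 1.5
print(perceptron_with_threshold(x, w, theta),
      perceptron_with_bias_weight(x, w, theta))   # same output either way
```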
Example: A Classification Task

A typical neural network application is classification. Consider the simple example of classifying trucks given their masses and lengths:

Mass     .      2.     5.     2.     2.     3.     .      5.     5.
Length   6      5      4      5      5      6      7      8      9
Class    Lorry  Lorry  Van    Van    Van    Lorry  Lorry  Lorry  Lorry

How do we construct a neural network that can classify any Lorry and Van?
Cookbook Recipe for Building Neural Networks

Formulating neural network solutions for particular problems is a multi-stage process:

1. Understand and specify the problem in terms of inputs and required outputs
2. Take the simplest form of network you think might be able to solve your problem
3. Try to find the appropriate connection weights (including neuron thresholds) so that the network produces the right outputs for each input in its training data
4. Make sure that the network works on its training data, and test its generalization by checking its performance on new testing data
5. If the network doesn't perform well enough, go back to stage 3 and try harder
6. If the network still doesn't perform well enough, go back to stage 2 and try harder
7. If the network still doesn't perform well enough, go back to stage 1 and try harder
8. Problem solved (or not)
Building a Neural Network (stages 1 & 2)

For our truck example, our inputs can be direct encodings of the masses and lengths. Generally we would have one output unit for each class, with activation 1 for 'yes' and 0 for 'no'. In our example, we still have one output unit, but the activation 1 corresponds to 'lorry' and 0 to 'van' (or vice versa).

The simplest network we should try first is the single layer Perceptron. We can further simplify things by replacing the threshold by an extra weight as we discussed before. This gives us:

    Class = sgn( w_0 + w_1·Mass + w_2·Length )

[Diagram: a single threshold unit with inputs 1, Mass, and Length, weighted by w_0, w_1, and w_2]
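A minimal Python sketch of this classifier; the particular weight values below are placeholders for illustration only, since the real ones have to be learned in stage 3.

```python
# Single-unit classifier for the truck example:
#   Class = sgn( w0 + w1*Mass + w2*Length )
# The weight values are illustrative placeholders; in practice they are learned.

def sgn(x):
    return 1 if x >= 0 else 0

def classify(mass, length, w0, w1, w2):
    """Return 1 for one class (e.g. lorry) and 0 for the other (e.g. van)."""
    return sgn(w0 + w1 * mass + w2 * length)

# Hypothetical weights, just to show the call; not trained values.
w0, w1, w2 = -5.0, 0.5, 0.5
print(classify(mass=3.0, length=6, w0=w0, w1=w1, w2=w2))
```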
Training the Neural Network (stage 3)

Whether our neural network is a simple Perceptron, or a much more complicated multi-layer network, we need to develop a systematic procedure for determining appropriate connection weights.

The common procedure is to have the network learn the appropriate weights from a representative set of training data.

For classification, a simple Perceptron uses decision boundaries (lines or hyperplanes), which it shifts around until each training pattern is correctly classified.

The process of "shifting around" in a systematic way is called learning. The learning process can then be divided into a number of small steps.
Supervised Training

1. Generate a training pair or pattern:
   - an input x = [ x_1  x_2  …  x_n ]
   - a target output y_target (known/given)
2. Then, present the network with x and allow it to generate an output y
3. Compare y with y_target to compute the error
4. Adjust the weights, w_i, to reduce the error
5. Repeat 2-4 multiple times
Perceptron Learning Rule

1. Initialize the weights at random
2. For each training pair/pattern (x, y_target):
   - Compute the output y
   - Compute the error, δ = (y_target − y)
   - Use the error to update the weights as follows:
        Δw = w_new − w_old = η*δ*x    or    w_new = w_old + η*δ*x
     where η is called the learning rate or step size, and it determines how smoothly the learning process takes place
3. Repeat 2 until convergence (i.e. the error δ is zero)

The Perceptron Learning Rule is then given by

    w_new = w_old + η*δ*x    where    δ = (y_target − y)
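A compact Python sketch of this procedure, shown here learning the AND function from the earlier slides; the learning rate, initial weights, and epoch limit are illustrative choices, and the threshold is folded in as w_0 with a constant input x_0 = 1 as described before.

```python
import random

# Perceptron learning rule: w_new = w_old + eta * delta * x, with delta = y_target - y.
# Training data (AND gate) and hyperparameters are illustrative choices.

def sgn(x):
    return 1 if x >= 0 else 0

training_data = [([1, 0, 0], 0),   # first component x_0 = 1 is the constant bias input
                 ([1, 0, 1], 0),
                 ([1, 1, 0], 0),
                 ([1, 1, 1], 1)]

eta = 0.1                                           # learning rate / step size
w = [random.uniform(-0.5, 0.5) for _ in range(3)]   # 1. initialize weights at random

for epoch in range(100):                            # 3. repeat until convergence
    total_error = 0
    for x, y_target in training_data:               # 2. for each training pair
        y = sgn(sum(wi * xi for wi, xi in zip(w, x)))          # compute output y
        delta = y_target - y                                   # compute error
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]    # update weights
        total_error += abs(delta)
    if total_error == 0:                            # every pattern classified correctly
        break

print("learned weights:", w)
for x, y_target in training_data:
    print(x[1:], "->", sgn(sum(wi * xi for wi, xi in zip(w, x))), "target", y_target)
```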