Introduction to Neural Networks

What are neural networks? Nonlinear function approximators. How do they relate to pattern recognition/classification? They implement nonlinear discriminant functions, giving more complex decision boundaries than linear discriminant functions (e.g. Fisher discriminants, or Gaussians with equal covariances).

Learning framework for NNs: inputs/outputs

An unknown mapping $G$ takes inputs $x_1, \dots, x_n$ to desired outputs $y_1, \dots, y_m$. A trainable model $\Gamma$ receives the same inputs and produces model outputs $z_1, \dots, z_m$.

Learning goal

Definitions: $\mathbf{y} = G(\mathbf{x})$ is the mapping we want to learn (e.g. a discriminant function), with inputs $\mathbf{x} = (x_1, \dots, x_n)$ and process outputs $\mathbf{y} = (y_1, \dots, y_m)$; the trainable model is $\mathbf{z} = \Gamma(\mathbf{x}, \mathbf{w})$, with adjustable parameters $\mathbf{w}$ and model outputs $\mathbf{z} = (z_1, \dots, z_m)$. Learning goal: find $\mathbf{w}^*$ such that

$$E(\mathbf{w}^*) \le E(\mathbf{w}) \quad \text{for all } \mathbf{w},$$

where $E(\mathbf{w})$ measures the error between $G$ and $\Gamma$. What should $E(\mathbf{w})$ be?

Error function (ideal)

Ideally, the error would be averaged over the whole input space, weighted by the input density:

$$E(\mathbf{w}) = \int \lVert \mathbf{y} - \mathbf{z} \rVert^2\, p(\mathbf{x})\, d\mathbf{x}.$$

How would we compute this?

Error function (practical)

Use input/output data: $p$ input-output training patterns

$$\mathbf{x}_i = (x_{i1}, \dots, x_{in}), \qquad \mathbf{y}_i = (y_{i1}, \dots, y_{im}), \qquad \mathbf{z}_i = \Gamma(\mathbf{x}_i, \mathbf{w}), \qquad i \in \{1, \dots, p\},$$

and the summed squared error

$$E(\mathbf{w}) = \sum_{i=1}^{p} \frac{1}{2} \lVert \mathbf{y}_i - \mathbf{z}_i \rVert^2 = \sum_{i=1}^{p} \sum_{j=1}^{m} \frac{1}{2} \left( y_{ij} - z_{ij} \right)^2.$$

Artificial neural networks (NNs)

Neural networks are one type of parametric model $\Gamma$: nonlinear function approximators. Biological inspiration: their structure and function are loosely based on biological neural networks (e.g. the brain): relatively simple building blocks, connected together in a massive and parallel network, with adjustable (trainable) parameters $\mathbf{w}$ (the weights), mapping inputs to outputs. [Figure: a biological neuron, with dendrites feeding the cell body and an axon carrying the output.] Why neural networks? What does a neuron do?
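Before looking inside a neuron, here is a minimal NumPy sketch of the practical error function above; the linear model standing in for $\Gamma$ and all the sample numbers are illustrative assumptions, not the slides' network:

```python
import numpy as np

def sum_squared_error(model, w, X, Y):
    """E(w) = sum_i 1/2 * ||y_i - Gamma(x_i, w)||^2."""
    Z = np.array([model(x, w) for x in X])    # model outputs z_i
    return 0.5 * np.sum((Y - Z) ** 2)

# Toy stand-in for Gamma (a linear map; purely illustrative):
model = lambda x, w: w @ x
X = np.array([[0.0, 1.0], [1.0, 0.0]])   # p = 2 patterns, n = 2 inputs
Y = np.array([[1.0], [0.0]])             # m = 1 desired output each
w = np.array([[0.5, 0.5]])
print(sum_squared_error(model, w, X, Y))  # 0.25
```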

Neuron transfer function

A rough approximation of what a neuron does: a threshold function, mapping the net stimulus collected by the dendrites to the axon output. [Figure: axon output vs. net stimulus from the dendrites, a step at a threshold.]

Neural networks: crude emulation of biology

Simple basic building blocks; individual units are connected massively and in parallel; individual units have threshold-type activation functions; learning occurs through adjustment of the strength of the connections (weights) between individual units. Caveat: artificial neural networks are much, much, much simpler than biological systems. Example: the human brain has numbers of neurons and connections that dwarf any artificial neural network.

Basic building blocks of neural networks

The basic building block is the unit: scalar inputs $\phi_1, \dots, \phi_q$, weights $\mathbf{w} = (\omega_1, \dots, \omega_q)$, and a nonlinear activation function $\sigma$ producing the output

$$\psi = \sigma\!\left( \sum_{i=1}^{q} \omega_i\, \phi_i \right).$$
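A tiny sketch of one such unit; the tanh activation and the sample numbers are arbitrary illustrative choices:

```python
import numpy as np

def unit(phi, w, sigma=np.tanh):
    """One unit: psi = sigma(sum_i omega_i * phi_i)."""
    return sigma(np.dot(w, phi))

phi = np.array([0.5, -1.0, 2.0])   # q = 3 scalar inputs
w   = np.array([0.1,  0.4, 0.3])   # weights omega_1..omega_q
print(unit(phi, w))                # tanh(0.25) ~= 0.2449
```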

Perceptrons: the simplest neural network

Threshold activation function:

$$t(u) = \begin{cases} 1, & u \ge \theta \\ 0, & u < \theta \end{cases}$$

A perceptron applies this to a weighted sum of its inputs: $z = t\!\left( \sum_{i=1}^{n} \omega_i x_i \right)$. What is this? [Figure: the step function $t(u)$ jumping from 0 to 1 at $u = \theta$.]

Perceptron output: limited mapping capability

The perceptron mapping is $z = 1$ when $\mathbf{w}^{\top}\mathbf{x} \ge \theta$ and $z = 0$ when $\mathbf{w}^{\top}\mathbf{x} < \theta$, so its decision boundary is a line (a hyperplane in general). [Figure: in the unit square of inputs $(x_1, x_2)$, a single line separates the OR function's classes, but no single line can separate the XOR function's classes.]
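A small sketch illustrating the point; the weights $(1, 1)$ and threshold $\theta = 0.5$ for OR are one workable choice, not taken from the slides:

```python
import numpy as np

def perceptron(x, w, theta):
    """Threshold unit: 1 if w.x >= theta, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# OR is linearly separable: w = (1, 1), theta = 0.5 realizes it.
w, theta = np.array([1.0, 1.0]), 0.5
print([perceptron(np.array(x), w, theta) for x in inputs])  # [0, 1, 1, 1]

# XOR targets are [0, 1, 1, 0]: no single line (w, theta) separates
# them, so no choice of weights makes this loop print that pattern.
```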

More general networks: activation functions

Replace the hard threshold with a smooth, differentiable activation function:

$$\sigma(u) = \frac{1}{1 + e^{-u}} \;\; \text{(sigmoid)}, \qquad \sigma(u) = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}} \;\; \text{(hyperbolic tangent)}.$$

[Figure: the sigmoid rises from 0 to 1; the hyperbolic tangent rises from -1 to 1.]

More general networks: multilayer perceptrons (MLPs)

Signal flows forward (feedforward) from the input layer $(x_1, \dots, x_n)$ through a hidden unit layer to the output layer $(z_1, \dots, z_m)$; larger MLPs stack several hidden unit layers (hidden unit layer #1, hidden unit layer #2, ...).

MLP application example: ALVINN

ALVINN, a neural network for autonomous steering, maps a 30x32 sensor input retina through 4 hidden units to 30 output units spanning steering directions from sharp left through straight ahead to sharp right.
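A minimal feedforward-pass sketch for a one-hidden-layer MLP with sigmoid hidden units; the layer sizes, random weights, and the linear output layer are illustrative assumptions:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def mlp_forward(x, W1, W2):
    """Feedforward signal flow: input -> hidden layer -> output."""
    h = sigmoid(W1 @ x)    # hidden-unit outputs
    return W2 @ h          # linear output layer (one design choice)

rng = np.random.default_rng(0)
n, n_hidden, m = 2, 3, 1                 # layer sizes (arbitrary)
W1 = rng.normal(size=(n_hidden, n))
W2 = rng.normal(size=(m, n_hidden))
print(mlp_forward(np.array([0.5, -0.2]), W1, W2))
```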

A simple example

Approximate the pulse function $f(x) = c\,[\,t(x - a) - t(x - b)\,]$ with a small network: one input $x$, two sigmoidal hidden units, and a linear output unit,

$$f(x) = \omega_5 + \omega_6\,\sigma(\omega_1 x + \omega_2) + \omega_7\,\sigma(\omega_3 x + \omega_4).$$

Derivation of the function f(x)

The threshold function is the limit of an ever-steeper sigmoid, $t(u) = \lim_{k \to \infty} \sigma(ku)$, so

$$f(x) \approx c\,\sigma[k(x - a)] - c\,\sigma[k(x - b)] \quad \text{for large } k.$$

Weight values for the simple example

Reading this off against the network form above gives (one consistent assignment; the slides tabulate two equivalent weight sets with the hidden units' roles swapped): $\omega_1 = \omega_3 = k$, $\omega_2 = -ka$, $\omega_4 = -kb$, $\omega_5 = 0$, $\omega_6 = c$, $\omega_7 = -c$.

Some theoretical properties of NNs

Single-input functions: what does the previous example say about single-input functions? Any reasonable single-input function can be approximated by summing enough such localized pulses. [Figure: a target single-input function $f(x)$ and the NN approximation error across the input range.]
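A quick numerical check of the pulse construction above, assuming illustrative values $a = 2$, $b = 6$, $c = 1$, $k = 50$:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# f(x) = c*sigmoid(k(x-a)) - c*sigmoid(k(x-b)) approaches the
# pulse c*[t(x-a) - t(x-b)] as the steepness k grows.
a, b, c, k = 2.0, 6.0, 1.0, 50.0
f = lambda x: c * sigmoid(k * (x - a)) - c * sigmoid(k * (x - b))

for x in [0.0, 4.0, 8.0]:
    print(x, round(f(x), 4))   # ~0 outside (a, b), ~c inside
```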

Multi-input functions: a universal function approximator?

Does the single-input example hold in general, for multi-input functions?

Neural networks in practice: 3 basic steps

1. Collect input/output training data.
2. Select an appropriate neural network architecture: the number of hidden layers, and the number of hidden units in each layer.
3. Train (adjust) the weights of the neural network to minimize the error measure

$$E = \sum_{i=1}^{p} \frac{1}{2} \lVert \mathbf{y}_i - \mathbf{z}_i \rVert^2.$$

Neural network training

Key problem: how should $\mathbf{w}$ be adjusted to minimize $E$? Answer: use derivative information on the error surface.

Gradient descent (one parameter)

1. Initialize $\omega$ to some random initial value.
2. Change $\omega$ iteratively at step $t$ according to

$$\omega(t+1) = \omega(t) - \eta\,\frac{dE}{d\omega(t)}.$$

[Figure: a one-dimensional error surface $E(\omega)$ with several minima, labeled a through f.] This implies convergence to a local, not global, minimum...
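A minimal one-parameter gradient descent sketch; the toy surface $E(\omega) = (\omega - 3)^2$ and the settings are illustrative assumptions (unlike the multi-minima surface sketched above, it has a single minimum, so descent finds it from any start):

```python
# One-parameter gradient descent: w(t+1) = w(t) - eta * dE/dw,
# on the toy error surface E(w) = (w - 3)**2 (illustrative only).

def dE_dw(w):
    return 2.0 * (w - 3.0)

w, eta = -5.0, 0.1        # initial value and learning rate
for t in range(50):
    w -= eta * dE_dw(w)
print(w)                  # close to the minimum at w = 3
```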

General gradient descent

1. Initialize $\mathbf{w}$ to some random initial value.
2. Change $\mathbf{w}$ iteratively at step $t$ according to

$$\mathbf{w}(t+1) = \mathbf{w}(t) - \eta\,\nabla E[\mathbf{w}(t)], \qquad \nabla E[\mathbf{w}(t)] = \left( \frac{\partial E}{\partial \omega_1(t)}, \dots, \frac{\partial E}{\partial \omega_q(t)} \right).$$

Simple example of gradient computation

Compute $\partial E / \partial \omega_4$ for the neural network below: one input $x$, two sigmoidal hidden units, and a linear output,

$$net_1 = \omega_1 x + \omega_2, \quad net_2 = \omega_3 x + \omega_4, \quad h_1 = \sigma(net_1), \quad h_2 = \sigma(net_2), \quad z = \omega_5 + \omega_6 h_1 + \omega_7 h_2,$$

with a single training pattern $(x, y)$ and error $E = \frac{1}{2}(y - z)^2$. (Generalization to multiple training patterns: $\partial E / \partial \omega_j = \sum_{i=1}^{p} \partial E_i / \partial \omega_j$, with $E_i = \frac{1}{2} \lVert \mathbf{y}_i - \mathbf{z}_i \rVert^2$.)

Derivation

By the chain rule,

$$\frac{\partial E}{\partial \omega_4} = -(y - z)\,\frac{\partial z}{\partial \omega_4} = -(y - z)\,\omega_7\,\sigma'(net_2)\,\frac{\partial net_2}{\partial \omega_4} = -(y - z)\,\omega_7\,\sigma'(net_2),$$

since $\partial net_2 / \partial \omega_4 = 1$.
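The result can be sanity-checked numerically. A small sketch comparing the analytic $\partial E / \partial \omega_4$ with a finite-difference estimate; the weight values and the training pattern are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def E(w, x, y):
    """E = 1/2 (y - z)^2 for the small two-hidden-unit network."""
    h1 = sigmoid(w[0] * x + w[1])          # h1 = sigma(net1)
    h2 = sigmoid(w[2] * x + w[3])          # h2 = sigma(net2)
    z = w[4] + w[5] * h1 + w[6] * h2
    return 0.5 * (y - z) ** 2

w = np.array([0.2, -0.1, 0.4, 0.3, 0.0, 0.5, -0.6])  # omega_1..omega_7
x, y = 1.0, 1.0

# Analytic: dE/dw4 = -(y - z) * omega_7 * sigma'(net2), sigma' = h2(1 - h2)
net2 = w[2] * x + w[3]
h1, h2 = sigmoid(w[0] * x + w[1]), sigmoid(net2)
z = w[4] + w[5] * h1 + w[6] * h2
analytic = -(y - z) * w[6] * h2 * (1.0 - h2)

# Finite-difference estimate of the same derivative:
eps = 1e-6
w_plus = w.copy(); w_plus[3] += eps
numeric = (E(w_plus, x, y) - E(w, x, y)) / eps
print(analytic, numeric)   # the two should closely agree
```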

Generalization: Backpropagation

Key problem: generalize this specific result to compute derivatives in a more general manner. Answer: the backpropagation algorithm [Rumelhart and McClelland, 1986], an efficient, algorithmic formulation for computing error derivatives. It computes gradients without hardcoding the derivatives (allowing on-the-fly adjustment of NN architectures).

Backpropagation derivation

Write $net_j = \sum_i \omega_{ij} h_i$ for the net input to unit $j$, where $h_i = \sigma(net_i)$ is the output of unit $i$ in the previous layer. Then for any weight $\omega_{ij}$,

$$\frac{\partial E}{\partial \omega_{ij}} = \frac{\partial E}{\partial net_j}\,\frac{\partial net_j}{\partial \omega_{ij}} = \delta_j\, h_i, \qquad \delta_j \equiv \frac{\partial E}{\partial net_j},$$

since $\partial net_j / \partial \omega_{ij} = h_i$.

Backpropagation derivation: output units

For an output unit $k$, with $E = \sum_k \frac{1}{2}(y_k - z_k)^2$ and $z_k = \sigma(net_k)$, we need $\delta_k = \partial E / \partial net_k$:

$$\delta_k = \frac{\partial E}{\partial net_k} = -(y_k - z_k)\,\sigma'(net_k),$$

so for a weight $\omega_{jk}$ into output unit $k$,

$$\frac{\partial E}{\partial \omega_{jk}} = \delta_k\, h_j = -(y_k - z_k)\,\sigma'(net_k)\, h_j.$$

Backpropagation derivation: hidden units

For a hidden unit $j$, the error depends on $net_j$ only through the net inputs $net_k$ of the units $k$ in the following layer:

$$\delta_j = \frac{\partial E}{\partial net_j} = \sum_k \frac{\partial E}{\partial net_k}\,\frac{\partial net_k}{\partial net_j} = \sum_k \delta_k\,\frac{\partial net_k}{\partial net_j}.$$

Backpropagation derivation: hidden units net k ω jk net unit k δ ----------- net s ω s ( net s Backpropagation derivation: hidden units net k ω jk unit k h i ω ij net ----------- ω ' ( j δ ω j ' ( h i ω ij δ ω j ' ( netj ω ij h i Output units: ω jk Hidden units: Backpropagation summary ( y k '( net k Basic steps in using neura networks. Coect training data. Preprocess training data 3. Seect neura network architecture 4. Seect earning agorithm 5. Weight initiaiation ω ij h i 6. Forward pass δ ω j ' ( netj 7. Backward pass 8. Repeat steps 6 and 7 unti satisfactory mode is reached.

The Forward Pass

1. Apply an input vector $\mathbf{x}_i$ to the network.
2. Compute the net input to each hidden unit ($net_j$).
3. Compute the hidden-unit outputs ($h_j$).
4. Compute the neural network outputs ($z_k$).

The Backward Pass

1. Evaluate $\delta_k = -(y_k - z_k)\,\sigma'(net_k)$ at the outputs, for each output unit $k$.
2. Backpropagate the $\delta$ values from the outputs backwards through the neural network.
3. Compute the gradients $\partial E / \partial \omega_{ij} = \delta_j h_i$.
4. Update the weights based on the computed gradient: $\mathbf{w}(t+1) = \mathbf{w}(t) - \eta\,\nabla E[\mathbf{w}(t)]$. (A training-loop sketch after the practical-issues list below shows how the passes fit together.)

Practical issues

1. What should your training data be? Is there sufficient training data? Is it biased? Is the task deterministic or stochastic? Stationary or non-stationary?
2. What should your neural network architecture be?
3. Preprocessing of data.
4. Weight initialization: why small, random values?

Practical issues (continued)

5. Selecting the learning parameter: in gradient descent, $\mathbf{w}(t+1) = \mathbf{w}(t) - \eta\,\nabla E[\mathbf{w}(t)]$, what should $\eta$ be? A difficult question to answer...
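Tying the two passes and the weight update together, a pattern-mode training loop reusing `sigmoid` and `backprop` from the sketch above; the teacher-generated data and the fixed $\eta = 0.5$ are ad hoc assumptions, and choosing $\eta$ well is exactly the question the next slides take up:

```python
import numpy as np
# Reuses sigmoid() and backprop() from the previous sketch.

rng = np.random.default_rng(1)
# Targets from a fixed "teacher" net of the same shape, so the
# data are learnable by construction (an ad hoc assumption).
W1_t, W2_t = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
X = rng.normal(size=(20, 2))
Y = np.array([sigmoid(W2_t @ sigmoid(W1_t @ x)) for x in X])

W1 = 0.1 * rng.normal(size=(3, 2))   # step 5: small random weights
W2 = 0.1 * rng.normal(size=(1, 3))
eta = 0.5                            # learning rate (chosen ad hoc)

def total_error():
    return sum(0.5 * np.sum((y - sigmoid(W2 @ sigmoid(W1 @ x))) ** 2)
               for x, y in zip(X, Y))

print(total_error())                 # error before training
for epoch in range(200):             # steps 6-8: repeated passes
    for x, y in zip(X, Y):           # pattern-mode updates
        g1, g2 = backprop(x, y, W1, W2)
        W1 -= eta * g1               # weight update
        W2 -= eta * g2
print(total_error())                 # typically much smaller now
```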

Selecting the learning parameter: an example

Sample error surface: $E = 2\omega_1^2 + \omega_2^2$ (realistic?). Where is the minimum of this error surface? At $(\omega_1, \omega_2) = (0, 0)$. [Figure: bowl-shaped surface plot of $E$ over the $(\omega_1, \omega_2)$ plane.] How many steps to convergence ($E < 10^{-6}$)? Try different initial weights and different learning rates.

Deriving the gradient descent equations

Gradient:

$$\nabla E = \left( \frac{\partial E}{\partial \omega_1}, \frac{\partial E}{\partial \omega_2} \right) = (4\omega_1,\; 2\omega_2).$$

Gradient descent:

$$\omega_1(t+1) = \omega_1(t) - \eta\,\frac{\partial E}{\partial \omega_1(t)} = \omega_1(t)\,(1 - 4\eta), \qquad \omega_2(t+1) = \omega_2(t)\,(1 - 2\eta).$$

Convergence experiments

Initial weights: e.g. $(\omega_1, \omega_2) = (1, 1)$. [Figure: number of steps to convergence as a function of the learning rate $\eta$.]
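The experiment is easy to reproduce in a few lines; the tolerance and the particular $\eta$ values tried are arbitrary choices:

```python
def steps_to_converge(eta, w0=(1.0, 1.0), tol=1e-6, max_steps=10_000):
    """Iterate gradient descent on E = 2*w1**2 + w2**2 until E < tol."""
    w1, w2 = w0
    for t in range(max_steps):
        if 2 * w1 ** 2 + w2 ** 2 < tol:
            return t
        w1 -= eta * 4 * w1          # dE/dw1 = 4*w1
        w2 -= eta * 2 * w2          # dE/dw2 = 2*w2
    return None                     # no convergence (diverges for eta > 0.5)

for eta in [0.05, 0.1, 0.25, 0.4, 0.49, 0.51]:
    print(eta, steps_to_converge(eta))
```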

A closer look

[Figure: gradient descent trajectories on the contours of $E$ for $\eta = 0.1$, $\eta = 0.4$, and $\eta = 0.5$: small steps descend smoothly, while larger steps oscillate back and forth across the valley.]

What happens at $\eta > 0.5$? The gradient descent equations

$$\omega_1(t+1) = \omega_1(t)\,(1 - 4\eta), \qquad \omega_2(t+1) = \omega_2(t)\,(1 - 2\eta)$$

are similar to the fixed-point iteration $\omega(t+1) = c\,\omega(t)$, which diverges for $|c| > 1$ (when $\omega(0) \ne 0$) and converges for $|c| < 1$.

Convergence of gradient descent equations

For $\omega_1(t+1) = \omega_1(t)\,(1 - 4\eta)$ we require

$$|1 - 4\eta| < 1 \iff -1 < 1 - 4\eta < 1 \iff 0 < \eta < 0.5.$$

Why not $\eta < 1$? The $\omega_2$ equation, $\omega_2(t+1) = \omega_2(t)\,(1 - 2\eta)$, alone would only require $\eta < 1$; the steeper $\omega_1$ direction imposes the tighter bound $\eta < 0.5$.

Learning rate discussion

If the learning rate is too small: slow convergence. If the learning rate is too large: possible divergence. Theoretical bounds are not available in the general case (only for specific, trivial examples like this one). Problematic error surfaces: long, steep-sided valleys. This is the motivation for looking at more advanced training algorithms that do more with the gradient information. Any thoughts?

Practical issues (continued)

6. Pattern vs. batch training.
7. Good generalization: use two data sets. [Figure: NN error vs. training time for the training data and for a held-out cross-validation data set; the cross-validation error reaches its minimum at the early stopping point and then rises.] Good generalization calls for a sufficiently constrained neural network architecture, cross validation, and early stopping.
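A minimal early-stopping sketch under stated assumptions: the caller supplies hypothetical `train_step` and `val_error` hooks, and training halts once the cross-validation error stops improving:

```python
def train_with_early_stopping(train_step, val_error,
                              patience=10, max_epochs=1000):
    """Stop when the cross-validation error has not improved
    for `patience` epochs; return the best epoch and error."""
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()                     # one pass over the training data
        e = val_error()                  # error on the held-out set
        if e < best:
            best, best_epoch = e, epoch  # best point so far
        elif epoch - best_epoch >= patience:
            break                        # validation error stopped improving
    return best_epoch, best
```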