We have been discussing some simple ideas from statistical learning theory. (PR NPTEL course)
We have been discussing some simple ideas from statistical learning theory. The risk minimization framework that we discussed gives us a better perspective on understanding the unifying theme in different learning algorithms. We will now go back to studying pattern classification algorithms. We will first briefly review algorithms for learning linear classifiers and then start looking at methods to learn nonlinear classifiers.

Linear Models

In the two-class case, the linear classifier is given by

h(X) = sign(w^T X + w_0)

We have seen that we can also think of h as h(X) = sign(w^T Φ(X) + w_0), where Φ(X) = [φ_1(X), ..., φ_m(X)]^T, as long as the φ_i are fixed (possibly nonlinear) functions.
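A minimal sketch of such a classifier in Python (the particular basis functions and weight values below are illustrative, not from the course):

```python
def sign(z):
    return 1 if z >= 0 else -1

def phi(x):
    # Fixed (possibly nonlinear) basis functions: here [x1, x2, x1*x2].
    x1, x2 = x
    return [x1, x2, x1 * x2]

def h(x, w, w0):
    # Linear classifier h(X) = sign(w^T Phi(X) + w_0) in the transformed space.
    s = sum(wi * fi for wi, fi in zip(w, phi(x)))
    return sign(s + w0)

# With w = [0, 0, 1] and w0 = 0, h computes sign(x1 * x2): an XOR-like rule
# that no linear classifier in the original features can represent.
print(h((1.0, 1.0), [0.0, 0.0, 1.0], 0.0))   # 1
print(h((1.0, -1.0), [0.0, 0.0, 1.0], 0.0))  # -1
```

The point of the example is that the classifier is still linear in the parameters even though it is nonlinear in the original features.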

We discussed many algorithms for learning W.

The Perceptron algorithm is a simple error-correcting method that is guaranteed to find a separating hyperplane if one exists. The perceptron convergence theorem shows that, given any training set of linearly separable patterns, the algorithm will find a separating hyperplane.

Our discussion of statistical learning theory gives us an idea of how many iid examples we should have before we can be confident that a hyperplane that separates the examples will also do well on test data.
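The error-correcting update can be sketched as follows (the training set here is a made-up AND-like problem; the epoch cap is an implementation convenience, since the convergence theorem guarantees termination on separable data):

```python
def perceptron(data, epochs=100):
    # data: list of (augmented feature vector X, label y in {+1, -1}).
    # Error-correcting update: W <- W + y * X on each misclassified example.
    d = len(data[0][0])
    W = [0.0] * d
    for _ in range(epochs):
        mistakes = 0
        for X, y in data:
            if y * sum(wi * xi for wi, xi in zip(W, X)) <= 0:
                W = [wi + y * xi for wi, xi in zip(W, X)]
                mistakes += 1
        if mistakes == 0:  # a separating hyperplane has been found
            break
    return W

# Linearly separable data (logical AND), augmented with a constant 1 component.
data = [([1.0, 0.0, 0.0], -1), ([1.0, 0.0, 1.0], -1),
        ([1.0, 1.0, 0.0], -1), ([1.0, 1.0, 1.0], 1)]
W = perceptron(data)
# At convergence every example satisfies y * W^T X > 0.
```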

We have also seen the least-squares method, where we find W to minimize

J(W) = (1/n) Σ_i (W^T X_i - y_i)^2

where, for simplicity of notation, we have assumed augmented feature vectors.

In our risk minimization framework, H is parameterized by W; we take h(X) = W^T X and minimize the empirical risk under the squared-error loss function.

We have seen how to obtain the least-squares solution:

W = (A^T A)^{-1} A^T Y

where the rows of matrix A are the feature vectors and the components of Y are the y_i.

The least-squares method can also be used to learn linear regression models. The only difference is that in a regression model the y_i are real-valued.
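The closed form W = (A^T A)^{-1} A^T Y can be worked out by hand in the 2-dimensional case, where the 2x2 inverse is explicit (the data below is an illustrative noiseless regression problem, not from the course):

```python
def least_squares_2d(A, Y):
    # A: list of rows [x0, x1] (augmented feature vectors), Y: list of targets.
    # Form the normal equations (A^T A) W = A^T Y and solve via the 2x2 inverse.
    s00 = sum(r[0] * r[0] for r in A)
    s01 = sum(r[0] * r[1] for r in A)
    s11 = sum(r[1] * r[1] for r in A)
    b0 = sum(r[0] * y for r, y in zip(A, Y))
    b1 = sum(r[1] * y for r, y in zip(A, Y))
    det = s00 * s11 - s01 * s01          # assumed nonsingular
    return [(s11 * b0 - s01 * b1) / det,
            (s00 * b1 - s01 * b0) / det]

# Fit y = w0 + w1*x to points lying on the line y = 1 + 2x (x0 = 1 is the bias input).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 3.0, 5.0]
W = least_squares_2d(A, Y)
print(W)  # [1.0, 2.0]
```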

We have seen that we can also minimize the empirical risk J(W) using gradient descent. We can also run this gradient descent in an incremental fashion by considering one example at a time. That gives us another classical algorithm called the LMS algorithm.
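The incremental (one-example-at-a-time) update can be sketched as below; the step size and epoch count are illustrative choices, not prescribed by the course:

```python
def lms(data, eta=0.1, epochs=300):
    # Incremental gradient descent on squared error (the LMS rule):
    # W <- W + eta * (y_i - W^T X_i) * X_i, one example at a time.
    d = len(data[0][0])
    W = [0.0] * d
    for _ in range(epochs):
        for X, y in data:
            err = y - sum(wi * xi for wi, xi in zip(W, X))
            W = [wi + eta * err * xi for wi, xi in zip(W, X)]
    return W

# Same noiseless line-fitting problem: targets generated by y = 1 + 2x,
# with augmented feature vectors [1, x]. W converges toward [1.0, 2.0].
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
W = lms(data)
```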

We have also seen that we can use the least-squares idea to learn a model g(W^T X) by redefining J as

J(W) = (1/n) Σ_i (g(W^T X_i) - y_i)^2

An important example is logistic regression, where we take g to be the sigmoid function. We minimize J by an incremental version of gradient descent.
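A minimal sketch of this, using the sigmoid's derivative g' = g(1 - g) in the per-example gradient (data, step size, and epoch count are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(data, eta=0.5, epochs=2000):
    # Incremental gradient descent on J(W) = (1/n) sum_i (g(W^T X_i) - y_i)^2,
    # with g the sigmoid.
    d = len(data[0][0])
    W = [0.0] * d
    for _ in range(epochs):
        for X, y in data:
            g = sigmoid(sum(wi * xi for wi, xi in zip(W, X)))
            grad = (g - y) * g * (1.0 - g)   # chain rule: (g - y) * g'(W^T X)
            W = [wi - eta * grad * xi for wi, xi in zip(W, X)]
    return W

# Tiny separable problem with targets in {0, 1}; X is augmented with a 1.
data = [([1.0, -2.0], 0.0), ([1.0, -1.0], 0.0),
        ([1.0, 1.0], 1.0), ([1.0, 2.0], 1.0)]
W = train_logistic(data)
preds = [1 if sigmoid(sum(wi * xi for wi, xi in zip(W, X))) > 0.5 else 0
         for X, _ in data]
print(preds)  # [0, 0, 1, 1]
```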

Another important method for learning linear classifiers is the Fisher Linear Discriminant. Here, we look for a direction W such that the patterns of the two classes become well separated when projected onto this one-dimensional subspace. As we mentioned, the Fisher Linear Discriminant can be thought of as a special case of the least-squares method of learning a linear regression model with special target values.
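A sketch of the standard Fisher solution, assuming the usual formula W ∝ S_W^{-1}(m_1 - m_2) (the formula and the toy data are not spelled out on the slide); it is written for 2-D patterns so the 2x2 within-class scatter matrix can be inverted by hand:

```python
def fisher_direction(class1, class2):
    # Fisher direction W proportional to S_W^{-1} (m1 - m2), for 2-D patterns.
    def mean(pts):
        n = len(pts)
        return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]

    def scatter(pts, m):
        s00 = s01 = s11 = 0.0
        for p in pts:
            d0, d1 = p[0] - m[0], p[1] - m[1]
            s00 += d0 * d0; s01 += d0 * d1; s11 += d1 * d1
        return s00, s01, s11

    m1, m2 = mean(class1), mean(class2)
    # Within-class scatter S_W = S_1 + S_2 (entries a, b, c of a symmetric 2x2).
    a, b, c = [x + y for x, y in zip(scatter(class1, m1), scatter(class2, m2))]
    det = a * c - b * b                      # assumed nonsingular
    dm0, dm1 = m1[0] - m2[0], m1[1] - m2[1]
    # 2x2 inverse applied to the mean difference.
    return [(c * dm0 - b * dm1) / det, (a * dm1 - b * dm0) / det]

# Two classes separated along the first coordinate; the Fisher direction
# comes out along that axis, so projections of the classes do not overlap.
class1 = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
class2 = [(3.0, 0.0), (4.0, 1.0), (4.0, -1.0)]
w = fisher_direction(class1, class2)
proj1 = [w[0] * p[0] + w[1] * p[1] for p in class1]
proj2 = [w[0] * p[0] + w[1] * p[1] for p in class2]
```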

Beyond linear models

Learning linear models (classifiers) is generally efficient. However, linear models are not always sufficient: the best linear function may still be a poor fit. We have looked at three broad approaches to learning nonlinear classifiers. We now discuss neural network models.

Neural network models

We need a good parameterized class of nonlinear functions to learn nonlinear classifiers. Artificial neural networks are one such class: nonlinear functions are built up through composition of summations and sigmoids. They are useful for both classification and regression.

In this course we will study only multilayer feedforward networks. They are useful because they offer a good parameterized class of nonlinear functions, and there are efficient algorithms to learn them. Historically, however, the development of (artificial) neural network models was motivated by ideas about the structure of the human brain. We briefly look at this perspective of neural networks as an approach to engineering intelligent systems.

What is an Artificial Neural Network?

"A parallel distributed information processor made up of simple processing units that has a propensity for acquiring problem-solving knowledge through experience"

- A large number of interconnected units
- Each unit implements a simple, nonlinear function
- The knowledge resides in the interconnection strengths
- Problem-solving ability is often acquired through learning
- An architecture inspired by the structure of the brain

The Human Brain

- Neuron: the basic computing unit
- The brain is a highly organized structure of networks of interconnected neurons

In the brain:
- Number of neurons: about 10^11 (100 billion)
- Average synapses per neuron: about 10^4
- Total synapses: about 10^15
- Neuron time constants: milliseconds
- A single neuron can send about 100 spikes per second

A rough estimate of processing power:

- One arithmetic operation per synapse gives about 10^4 operations per neuron per spike
- At about 100 spikes per second, that is about 10^6 operations per neuron per second
- Over about 10^11 neurons, that is about 10^17 operations per second!!

(A gigaflop is 10^9 operations; a teraflop is 10^12 operations!)

Massive parallelism can deliver massive computing power, if we know how to manage it.
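The back-of-the-envelope arithmetic above can be checked in a few lines (the figures are the slide's rough orders of magnitude, nothing more):

```python
neurons = 10**11             # ~100 billion neurons
synapses_per_neuron = 10**4  # average synapses per neuron
spikes_per_sec = 100         # spikes per neuron per second

ops_per_neuron_per_spike = synapses_per_neuron  # one operation per synapse
ops_per_neuron_per_sec = ops_per_neuron_per_spike * spikes_per_sec
total_ops_per_sec = ops_per_neuron_per_sec * neurons

print(ops_per_neuron_per_sec)  # 1000000        (10^6)
print(total_ops_per_sec)       # 10^17
```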

Digital computers: precise design, highly constrained, not very adaptive or fault-tolerant, centralized control, deterministic; basic switching times ~10^-9 sec.

Natural neural networks: massively parallel, highly adaptive and fault-tolerant, self-configuring, self-repairing, noisy, stochastic; basic switching time ~10^-3 sec.

Most capabilities of the brain are LEARNT.

Artificial Intelligence (AI)

- Understanding intelligence in computational terms
- Developing machines that are intelligent

At least two distinct approaches:
- Try to model intelligent behavior in terms of processing structured symbols. (The resulting methods and algorithms may not resemble the brain at the implementation level.)
- A second approach is based on mimicking the human brain at the architectural/implementation level.

The symbolic AI approach

- The brain is to be understood in computational terms only
- Physical symbol system hypothesis
- A digital computer is a universal symbol manipulator and can be programmed to be intelligent
- Many useful engineering applications, e.g. expert systems
- An implicit faith: the architecture of the brain per se is irrelevant for engineering intelligent artifacts

Artificial Neural Networks

- Can be viewed as one approach towards understanding the brain / building intelligent machines
- Computational architectures inspired by the brain
- Computational methods for learning dependencies in a data stream, e.g. pattern recognition, system identification
- Characteristics: emergent properties, learning, self-adaptation
- Modeling biology? Mathematically purified neurons!!

Artificial Neural Networks

- Computing machines that try to mimic brain architecture
- A large network of interconnected units
- Each unit has a simple input-output mapping
- Each interconnection has a numerical weight attached to it
- The output of a unit depends on the outputs and connection weights of the units connected to it
- Knowledge resides in the weights
- Problem-solving ability is often acquired through learning

Single neuron model

The x_i are inputs into the (artificial) neuron and the w_i are the corresponding weights; y is the output of the neuron.

Net input: η = Σ_j w_j x_j
Output: y = f(η), where f(·) is called the activation function.

(The Perceptron and Adaline are such models.)
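The single neuron model is just a weighted sum followed by an activation function; a minimal sketch (the input and weight values are illustrative):

```python
def neuron(x, w, f):
    # Net input eta = sum_j w_j x_j, output y = f(eta).
    eta = sum(wj * xj for wj, xj in zip(w, x))
    return f(eta)

step = lambda eta: 1 if eta > 0 else 0   # Perceptron-style hard limiter
identity = lambda eta: eta               # Adaline-style linear output

print(neuron([1.0, 2.0], [0.5, -0.5], step))      # eta = -0.5 -> 0
print(neuron([1.0, 2.0], [0.5, -0.5], identity))  # -0.5
```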

Networks of neurons

We can connect a number of such units or neurons to form a network. Inputs to a neuron can be outputs of other neurons (and/or external inputs).

Notation: y_j is the output of the j-th neuron; w_ij is the weight of the connection from neuron i to neuron j.

Each neuron computes a weighted sum of its inputs and passes it through its activation function to compute its output. For example, the output of neuron 5 is

y_5 = f_5(w_35 y_3 + w_45 y_4)
    = f_5(w_35 f_3(w_13 y_1 + w_23 y_2) + w_45 f_4(w_14 y_1 + w_24 y_2))

By convention, we take y_1 = x_1 and y_2 = x_2. Here, x_1, x_2 are the inputs and y_5, y_6 are the outputs.
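The forward pass through this small network can be written out directly; the weight values below are illustrative, and sigmoid activations are assumed for every unit:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Units 1, 2 are the inputs, units 3, 4 form a hidden layer, units 5, 6 are
# the outputs. w[(i, j)] is the weight of the connection from unit i to unit j.
w = {(1, 3): 0.5, (2, 3): -0.5, (1, 4): 0.3, (2, 4): 0.8,
     (3, 5): 1.0, (4, 5): -1.0, (3, 6): 0.2, (4, 6): 0.7}

def forward(x1, x2, f=sigmoid):
    y1, y2 = x1, x2  # convention: y_1 = x_1, y_2 = x_2
    y3 = f(w[(1, 3)] * y1 + w[(2, 3)] * y2)
    y4 = f(w[(1, 4)] * y1 + w[(2, 4)] * y2)
    y5 = f(w[(3, 5)] * y3 + w[(4, 5)] * y4)
    y6 = f(w[(3, 6)] * y3 + w[(4, 6)] * y4)
    return y5, y6

y5, y6 = forward(1.0, 0.0)  # both outputs lie in (0, 1)
```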

A single neuron represents a class of functions from R^m to R; a specific set of weights realizes a specific function. By interconnecting many units/neurons, networks can represent more complicated functions from R^m to R^m. The architecture constrains the function class that can be represented; the weights define a specific function in that class. To form meaningful networks, nonlinearity of the activation function is important.

Typical activation functions

1. Hard limiter:

f(x) = 1 if x > τ; 0 otherwise

We can keep τ at zero and add one more input line to the neuron. An example of a single neuron with this activation function is the Perceptron.

Activation functions (contd.)

2. Sigmoid function:

f(x) = a / (1 + exp(-bx)), a, b > 0

Activation functions (contd.)

3. tanh:

f(x) = a tanh(bx), a, b > 0
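The three activation functions above are easy to write down directly (default parameter values a = b = 1 are an illustrative choice):

```python
import math

def hard_limiter(x, tau=0.0):
    # f(x) = 1 if x > tau, else 0 (the Perceptron's activation).
    return 1 if x > tau else 0

def sigmoid(x, a=1.0, b=1.0):
    # f(x) = a / (1 + exp(-b x)), a, b > 0; smooth, with values in (0, a).
    return a / (1.0 + math.exp(-b * x))

def tanh_act(x, a=1.0, b=1.0):
    # f(x) = a tanh(b x), a, b > 0; smooth, with values in (-a, a).
    return a * math.tanh(b * x)

print(hard_limiter(0.3))       # 1
print(sigmoid(0.0))            # 0.5
print(tanh_act(0.0))           # 0.0
```

Unlike the hard limiter, the sigmoid and tanh are differentiable everywhere, which is what makes gradient-based training of networks possible.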

Why study such models?

- A belief that the architecture of the brain is critical for intelligent behavior
- The models can implement highly nonlinear functions; they are adaptive and can be trained
- Useful in many applications:
  - time series prediction
  - system identification and control
  - pattern recognition and regression
- The models can help us understand brain function (computational neuroscience)

Many different models are possible

- Evolution: discrete time / continuous time; synchronous / asynchronous; deterministic / stochastic
- Interconnections: feedforward / having feedback
- States or outputs of units: binary / finitely many / continuous

Recurrent networks

The network we saw earlier has no feedback. Here is an example of a network with feedback. It can model a dynamical system:

y(k) = f(y(k-1), x_1(k), x_2(k))
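A minimal sketch of such a unit: the output is fed back, so the network defines a dynamical system rather than a static function (the particular f and the weight values here are illustrative assumptions):

```python
def step(y_prev, x1, x2, w_fb=0.5, w1=1.0, w2=-1.0):
    # Illustrative f: a weighted sum of the fed-back output and the inputs,
    # giving y(k) = f(y(k-1), x1(k), x2(k)).
    return w_fb * y_prev + w1 * x1 + w2 * x2

y = 0.0
trajectory = []
for k in range(5):
    y = step(y, x1=1.0, x2=0.0)  # constant input sequence
    trajectory.append(y)
# With |w_fb| < 1 the state settles toward a fixed point:
# here 1, 1.5, 1.75, 1.875, 1.9375, ... converging to 2.
```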

We will consider only feedforward networks, which provide a general class of nonlinear functions. These can always be organized as a layered network. This network represents a class of functions from R^2 to R^2.

Each unit can also have a bias input. This is shown for a single unit below. One can always think of the bias as an extra input:

y = f( Σ_{i=1}^d w_i x_i + w_0 ) = f( Σ_{i=0}^d w_i x_i ),  with x_0 = +1
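The equivalence of the two forms is a one-line check: absorbing the bias as the weight on a constant input x_0 = +1 gives the same output (all values below are illustrative):

```python
def output_with_bias(x, w, w0, f):
    # y = f( sum_{i=1}^d w_i x_i + w_0 )
    return f(sum(wi * xi for wi, xi in zip(w, x)) + w0)

def output_augmented(x, w_aug, f):
    # Equivalent form: the bias becomes weight w_0 on a constant input x_0 = +1.
    x_aug = [1.0] + list(x)
    return f(sum(wi * xi for wi, xi in zip(w_aug, x_aug)))

f = lambda z: z  # linear activation, just to compare the two forms
x, w, w0 = [2.0, -1.0], [0.5, 1.5], 0.25
assert output_with_bias(x, w, w0, f) == output_augmented(x, [w0] + w, f)
```

This augmentation trick is the same one used earlier when we assumed augmented feature vectors for the least-squares method.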

Multilayer feedforward networks

Here is a general multilayer feedforward network.


More information

IFT3395/6390. Machine Learning from linear regression to Neural Networks. Machine Learning. Training Set. t (3.5, -2,..., 127, 0,...

IFT3395/6390. Machine Learning from linear regression to Neural Networks. Machine Learning. Training Set. t (3.5, -2,..., 127, 0,... IFT3395/6390 Historical perspective: back to 1957 (Prof. Pascal Vincent) (Rosenblatt, Perceptron ) Machine Learning from linear regression to Neural Networks Computer Science Artificial Intelligence Symbolic

More information

Neural Networks and Back Propagation Algorithm

Neural Networks and Back Propagation Algorithm Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland mirzac@gmail.com Abstract Neural Networks (NN) are important

More information

Introduction to Artificial Neural Networks

Introduction to Artificial Neural Networks POLYTECHNIC UNIVERSITY Department of Computer and Information Science Introduction to Artificial Neural Networks K. Ming Leung Abstract: A computing paradigm known as artificial neural network is introduced.

More information

3 An Illustrative Example

3 An Illustrative Example Objectives An Illustrative Example Objectives - Theory and Examples -2 Problem Statement -2 Perceptron - Two-Input Case -4 Pattern Recognition Example -5 Hamming Network -8 Feedforward Layer -8 Recurrent

More information

Neural Network Design in Cloud Computing

Neural Network Design in Cloud Computing International Journal of Computer Trends and Technology- volume4issue2-2013 ABSTRACT: Neural Network Design in Cloud Computing B.Rajkumar #1,T.Gopikiran #2,S.Satyanarayana *3 #1,#2Department of Computer

More information

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard

More information

Data Mining. Supervised Methods. Ciro Donalek donalek@astro.caltech.edu. Ay/Bi 199ab: Methods of Computa@onal Sciences hcp://esci101.blogspot.

Data Mining. Supervised Methods. Ciro Donalek donalek@astro.caltech.edu. Ay/Bi 199ab: Methods of Computa@onal Sciences hcp://esci101.blogspot. Data Mining Supervised Methods Ciro Donalek donalek@astro.caltech.edu Supervised Methods Summary Ar@ficial Neural Networks Mul@layer Perceptron Support Vector Machines SoLwares Supervised Models: Supervised

More information

Neural network software tool development: exploring programming language options

Neural network software tool development: exploring programming language options INEB- PSI Technical Report 2006-1 Neural network software tool development: exploring programming language options Alexandra Oliveira aao@fe.up.pt Supervisor: Professor Joaquim Marques de Sá June 2006

More information

Stock Prediction using Artificial Neural Networks

Stock Prediction using Artificial Neural Networks Stock Prediction using Artificial Neural Networks Abhishek Kar (Y8021), Dept. of Computer Science and Engineering, IIT Kanpur Abstract In this work we present an Artificial Neural Network approach to predict

More information

NEURAL NETWORK FUNDAMENTALS WITH GRAPHS, ALGORITHMS, AND APPLICATIONS

NEURAL NETWORK FUNDAMENTALS WITH GRAPHS, ALGORITHMS, AND APPLICATIONS NEURAL NETWORK FUNDAMENTALS WITH GRAPHS, ALGORITHMS, AND APPLICATIONS N. K. Bose HRB-Systems Professor of Electrical Engineering The Pennsylvania State University, University Park P. Liang Associate Professor

More information

Biological Neurons and Neural Networks, Artificial Neurons

Biological Neurons and Neural Networks, Artificial Neurons Biological Neurons and Neural Networks, Artificial Neurons Neural Computation : Lecture 2 John A. Bullinaria, 2015 1. Organization of the Nervous System and Brain 2. Brains versus Computers: Some Numbers

More information

A Simple Feature Extraction Technique of a Pattern By Hopfield Network

A Simple Feature Extraction Technique of a Pattern By Hopfield Network A Simple Feature Extraction Technique of a Pattern By Hopfield Network A.Nag!, S. Biswas *, D. Sarkar *, P.P. Sarkar *, B. Gupta **! Academy of Technology, Hoogly - 722 *USIC, University of Kalyani, Kalyani

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

A Simple Introduction to Support Vector Machines

A Simple Introduction to Support Vector Machines A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear

More information

Predictive Dynamix Inc

Predictive Dynamix Inc Predictive Modeling Technology Predictive modeling is concerned with analyzing patterns and trends in historical and operational data in order to transform data into actionable decisions. This is accomplished

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Learning to Process Natural Language in Big Data Environment

Learning to Process Natural Language in Big Data Environment CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah s Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci

More information

Neural Networks in Data Mining

Neural Networks in Data Mining IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department

More information

NEUROMATHEMATICS: DEVELOPMENT TENDENCIES. 1. Which tasks are adequate of neurocomputers?

NEUROMATHEMATICS: DEVELOPMENT TENDENCIES. 1. Which tasks are adequate of neurocomputers? Appl. Comput. Math. 2 (2003), no. 1, pp. 57-64 NEUROMATHEMATICS: DEVELOPMENT TENDENCIES GALUSHKIN A.I., KOROBKOVA. S.V., KAZANTSEV P.A. Abstract. This article is the summary of a set of Russian scientists

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Self Organizing Maps: Fundamentals

Self Organizing Maps: Fundamentals Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen

More information

TRAINING A LIMITED-INTERCONNECT, SYNTHETIC NEURAL IC

TRAINING A LIMITED-INTERCONNECT, SYNTHETIC NEURAL IC 777 TRAINING A LIMITED-INTERCONNECT, SYNTHETIC NEURAL IC M.R. Walker. S. Haghighi. A. Afghan. and L.A. Akers Center for Solid State Electronics Research Arizona State University Tempe. AZ 85287-6206 mwalker@enuxha.eas.asu.edu

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information

INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr.

INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr. INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr. Meisenbach M. Hable G. Winkler P. Meier Technology, Laboratory

More information

Artificial neural networks

Artificial neural networks Artificial neural networks Now Neurons Neuron models Perceptron learning Multi-layer perceptrons Backpropagation 2 It all starts with a neuron 3 Some facts about human brain ~ 86 billion neurons ~ 10 15

More information

Data Mining and Neural Networks in Stata

Data Mining and Neural Networks in Stata Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it

More information

Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

More information

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK

SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK N M Allinson and D Merritt 1 Introduction This contribution has two main sections. The first discusses some aspects of multilayer perceptrons,

More information

A Time Series ANN Approach for Weather Forecasting

A Time Series ANN Approach for Weather Forecasting A Time Series ANN Approach for Weather Forecasting Neeraj Kumar 1, Govind Kumar Jha 2 1 Associate Professor and Head Deptt. Of Computer Science,Nalanda College Of Engineering Chandi(Bihar) 2 Assistant

More information

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining 6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

More information

Advanced analytics at your hands

Advanced analytics at your hands 2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously

More information

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia Who am I? Senior Business Analyst @ Expedia Working within the competitive intelligence unit

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

Analecta Vol. 8, No. 2 ISSN 2064-7964

Analecta Vol. 8, No. 2 ISSN 2064-7964 EXPERIMENTAL APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS IN ENGINEERING PROCESSING SYSTEM S. Dadvandipour Institute of Information Engineering, University of Miskolc, Egyetemváros, 3515, Miskolc, Hungary,

More information

Role of Neural network in data mining

Role of Neural network in data mining Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

Neural Networks algorithms and applications

Neural Networks algorithms and applications Neural Networks algorithms and applications By Fiona Nielsen 4i 12/12-2001 Supervisor: Geert Rasmussen Niels Brock Business College 1 Introduction Neural Networks is a field of Artificial Intelligence

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

Power Prediction Analysis using Artificial Neural Network in MS Excel

Power Prediction Analysis using Artificial Neural Network in MS Excel Power Prediction Analysis using Artificial Neural Network in MS Excel NURHASHINMAH MAHAMAD, MUHAMAD KAMAL B. MOHAMMED AMIN Electronic System Engineering Department Malaysia Japan International Institute

More information

Deep Learning for Multivariate Financial Time Series. Gilberto Batres-Estrada

Deep Learning for Multivariate Financial Time Series. Gilberto Batres-Estrada Deep Learning for Multivariate Financial Time Series Gilberto Batres-Estrada June 4, 2015 Abstract Deep learning is a framework for training and modelling neural networks which recently have surpassed

More information

Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

More information

SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS

SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS UDC: 004.8 Original scientific paper SELECTING NEURAL NETWORK ARCHITECTURE FOR INVESTMENT PROFITABILITY PREDICTIONS Tonimir Kišasondi, Alen Lovren i University of Zagreb, Faculty of Organization and Informatics,

More information

Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,

More information

Neural Networks in Quantitative Finance

Neural Networks in Quantitative Finance Neural Networks in Quantitative Finance Master Thesis submitted to Prof. Dr. Wolfgang Härdle Institute for Statistics and Econometrics CASE - Center for Applied Statistics and Economics Humboldt-Universität

More information

An Introduction to Neural Networks

An Introduction to Neural Networks An Introduction to Vincent Cheung Kevin Cannons Signal & Data Compression Laboratory Electrical & Computer Engineering University of Manitoba Winnipeg, Manitoba, Canada Advisor: Dr. W. Kinsner May 27,

More information

Computational Intelligence Introduction

Computational Intelligence Introduction Computational Intelligence Introduction Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Fall 2011 Farzaneh Abdollahi Neural Networks 1/21 Fuzzy Systems What are

More information

Data Mining Using Neural Networks: A Guide for Statisticians

Data Mining Using Neural Networks: A Guide for Statisticians Data Mining Using Neural Networks: A Guide for Statisticians Basilio de Bragança Pereira UFRJ - Universidade Federal do Rio de Janeiro Calyampudi Radhakrishna Rao PSU - Penn State University June 2009

More information

degrees of freedom and are able to adapt to the task they are supposed to do [Gupta].

degrees of freedom and are able to adapt to the task they are supposed to do [Gupta]. 1.3 Neural Networks 19 Neural Networks are large structured systems of equations. These systems have many degrees of freedom and are able to adapt to the task they are supposed to do [Gupta]. Two very

More information

Machine Learning: Multi Layer Perceptrons

Machine Learning: Multi Layer Perceptrons Machine Learning: Multi Layer Perceptrons Prof. Dr. Martin Riedmiller Albert-Ludwigs-University Freiburg AG Maschinelles Lernen Machine Learning: Multi Layer Perceptrons p.1/61 Outline multi layer perceptrons

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

AN APPLICATION OF TIME SERIES ANALYSIS FOR WEATHER FORECASTING

AN APPLICATION OF TIME SERIES ANALYSIS FOR WEATHER FORECASTING AN APPLICATION OF TIME SERIES ANALYSIS FOR WEATHER FORECASTING Abhishek Agrawal*, Vikas Kumar** 1,Ashish Pandey** 2,Imran Khan** 3 *(M. Tech Scholar, Department of Computer Science, Bhagwant University,

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Machine Learning and Data Mining -

Machine Learning and Data Mining - Machine Learning and Data Mining - Perceptron Neural Networks Nuno Cavalheiro Marques (nmm@di.fct.unl.pt) Spring Semester 2010/2011 MSc in Computer Science Multi Layer Perceptron Neurons and the Perceptron

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Modelling and Big Data. Leslie Smith ITNPBD4, October 10 2015. Updated 9 October 2015

Modelling and Big Data. Leslie Smith ITNPBD4, October 10 2015. Updated 9 October 2015 Modelling and Big Data Leslie Smith ITNPBD4, October 10 2015. Updated 9 October 2015 Big data and Models: content What is a model in this context (and why the context matters) Explicit models Mathematical

More information

Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks

Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Method of Combining the Degrees of Similarity in Handwritten Signature Authentication Using Neural Networks Ph. D. Student, Eng. Eusebiu Marcu Abstract This paper introduces a new method of combining the

More information

TRAIN AND ANALYZE NEURAL NETWORKS TO FIT YOUR DATA

TRAIN AND ANALYZE NEURAL NETWORKS TO FIT YOUR DATA TRAIN AND ANALYZE NEURAL NETWORKS TO FIT YOUR DATA TRAIN AND ANALYZE NEURAL NETWORKS TO FIT YOUR DATA September 2005 First edition Intended for use with Mathematica 5 Software and manual written by: Jonas

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

More information

Parallel & Distributed Optimization. Based on Mark Schmidt s slides

Parallel & Distributed Optimization. Based on Mark Schmidt s slides Parallel & Distributed Optimization Based on Mark Schmidt s slides Motivation behind using parallel & Distributed optimization Performance Computational throughput have increased exponentially in linear

More information

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

More information

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

More information

THE HUMAN BRAIN. observations and foundations

THE HUMAN BRAIN. observations and foundations THE HUMAN BRAIN observations and foundations brains versus computers a typical brain contains something like 100 billion miniscule cells called neurons estimates go from about 50 billion to as many as

More information

Analysis of Multilayer Neural Networks with Direct and Cross-Forward Connection

Analysis of Multilayer Neural Networks with Direct and Cross-Forward Connection Analysis of Multilayer Neural Networks with Direct and Cross-Forward Connection Stanis law P laczek and Bijaya Adhikari Vistula University, Warsaw, Poland stanislaw.placzek@wp.pl,bijaya.adhikari1991@gmail.com

More information

A simple application of Artificial Neural Network to cloud classification

A simple application of Artificial Neural Network to cloud classification A simple application of Artificial Neural Network to cloud classification Tianle Yuan For AOSC 630 (by Prof. Kalnay) Introduction to Pattern Recognition (PR) Example1: visual separation between the character

More information

Tolerance of Radial Basis Functions against Stuck-At-Faults

Tolerance of Radial Basis Functions against Stuck-At-Faults Tolerance of Radial Basis Functions against Stuck-At-Faults Ralf Eickhoff 1 and Ulrich Rückert 1 Heinz Nixdorf Institute System and Circuit Technology University of Paderborn, Germany eickhoff,rueckert@hni.upb.de

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Machine Learning Logistic Regression

Machine Learning Logistic Regression Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

More information

Feature Subset Selection in E-mail Spam Detection

Feature Subset Selection in E-mail Spam Detection Feature Subset Selection in E-mail Spam Detection Amir Rajabi Behjat, Universiti Technology MARA, Malaysia IT Security for the Next Generation Asia Pacific & MEA Cup, Hong Kong 14-16 March, 2012 Feature

More information

Wes, Delaram, and Emily MA751. Exercise 4.5. 1 p(x; β) = [1 p(xi ; β)] = 1 p(x. y i [βx i ] log [1 + exp {βx i }].

Wes, Delaram, and Emily MA751. Exercise 4.5. 1 p(x; β) = [1 p(xi ; β)] = 1 p(x. y i [βx i ] log [1 + exp {βx i }]. Wes, Delaram, and Emily MA75 Exercise 4.5 Consider a two-class logistic regression problem with x R. Characterize the maximum-likelihood estimates of the slope and intercept parameter if the sample for

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

ANN Based Fault Classifier and Fault Locator for Double Circuit Transmission Line

ANN Based Fault Classifier and Fault Locator for Double Circuit Transmission Line International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Special Issue-2, April 2016 E-ISSN: 2347-2693 ANN Based Fault Classifier and Fault Locator for Double Circuit

More information

Prediction Model for Crude Oil Price Using Artificial Neural Networks

Prediction Model for Crude Oil Price Using Artificial Neural Networks Applied Mathematical Sciences, Vol. 8, 2014, no. 80, 3953-3965 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.43193 Prediction Model for Crude Oil Price Using Artificial Neural Networks

More information

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations

More information

APPLICATION OF ARTIFICIAL NEURAL NETWORKS USING HIJRI LUNAR TRANSACTION AS EXTRACTED VARIABLES TO PREDICT STOCK TREND DIRECTION

APPLICATION OF ARTIFICIAL NEURAL NETWORKS USING HIJRI LUNAR TRANSACTION AS EXTRACTED VARIABLES TO PREDICT STOCK TREND DIRECTION LJMS 2008, 2 Labuan e-journal of Muamalat and Society, Vol. 2, 2008, pp. 9-16 Labuan e-journal of Muamalat and Society APPLICATION OF ARTIFICIAL NEURAL NETWORKS USING HIJRI LUNAR TRANSACTION AS EXTRACTED

More information

Artificial Neural Networks are bio-inspired mechanisms for intelligent decision support. Artificial Neural Networks. Research Article 2014

Artificial Neural Networks are bio-inspired mechanisms for intelligent decision support. Artificial Neural Networks. Research Article 2014 An Experiment to Signify Fuzzy Logic as an Effective User Interface Tool for Artificial Neural Network Nisha Macwan *, Priti Srinivas Sajja G.H. Patel Department of Computer Science India Abstract Artificial

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

A Neural Support Vector Network Architecture with Adaptive Kernels. 1 Introduction. 2 Support Vector Machines and Motivations

A Neural Support Vector Network Architecture with Adaptive Kernels. 1 Introduction. 2 Support Vector Machines and Motivations A Neural Support Vector Network Architecture with Adaptive Kernels Pascal Vincent & Yoshua Bengio Département d informatique et recherche opérationnelle Université de Montréal C.P. 6128 Succ. Centre-Ville,

More information

Performance Evaluation of Artificial Neural. Networks for Spatial Data Analysis

Performance Evaluation of Artificial Neural. Networks for Spatial Data Analysis Contemporary Engineering Sciences, Vol. 4, 2011, no. 4, 149-163 Performance Evaluation of Artificial Neural Networks for Spatial Data Analysis Akram A. Moustafa Department of Computer Science Al al-bayt

More information

Follow links Class Use and other Permissions. For more information, send email to: permissions@pupress.princeton.edu

Follow links Class Use and other Permissions. For more information, send email to: permissions@pupress.princeton.edu COPYRIGHT NOTICE: David A. Kendrick, P. Ruben Mercado, and Hans M. Amman: Computational Economics is published by Princeton University Press and copyrighted, 2006, by Princeton University Press. All rights

More information

Applications to Data Smoothing and Image Processing I

Applications to Data Smoothing and Image Processing I Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is

More information

PLAANN as a Classification Tool for Customer Intelligence in Banking

PLAANN as a Classification Tool for Customer Intelligence in Banking PLAANN as a Classification Tool for Customer Intelligence in Banking EUNITE World Competition in domain of Intelligent Technologies The Research Report Ireneusz Czarnowski and Piotr Jedrzejowicz Department

More information

Implementation of Neural Networks with Theano. http://deeplearning.net/tutorial/

Implementation of Neural Networks with Theano. http://deeplearning.net/tutorial/ Implementation of Neural Networks with Theano http://deeplearning.net/tutorial/ Feed Forward Neural Network (MLP) Hidden Layer Object Hidden Layer Object Hidden Layer Object Logistic Regression Object

More information

Data Mining mit der JMSL Numerical Library for Java Applications

Data Mining mit der JMSL Numerical Library for Java Applications Data Mining mit der JMSL Numerical Library for Java Applications Stefan Sineux 8. Java Forum Stuttgart 07.07.2005 Agenda Visual Numerics JMSL TM Numerical Library Neuronale Netze (Hintergrund) Demos Neuronale

More information

Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp

More information

Machine Learning in FX Carry Basket Prediction

Machine Learning in FX Carry Basket Prediction Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines

More information