Artificial Neural Networks (1)
1 Artificial Neural Networks (1) John Kelleher & Brian Mac Namee Machine DIT
2 1 Artificial Neurons Cognitive Basis Basic Components Unthresholded linear units Activation Functions 2 Implementing logical functions 3 Network structures 4 Feed-forward networks Feed-forward networks as Parameterised Functions of Input Types of Feed-forward Networks 5 Perceptrons What is a Perceptron? Representational Power of Threshold Perceptrons Perceptron Learning (Perceptron Training Rule) Perceptron Learning (Gradient Descent Learning) Perceptron Learning Algorithm 6 Perceptrons versus Decision Trees 7 Summary
3 Cognitive Basis Figure: A diagram illustrating the structure of a neuron. The brain contains neurons of more than 20 types, connected by synapses, with a cycle time of 1 ms to 10 ms. Signals are noisy spike trains of electrical potential.
4 Basic Components Neural networks are composed of nodes or units connected by directed links. A link from unit $j$ to unit $i$ serves to propagate the activation $a_j$ from $j$ to $i$. Each link also has a numeric weight $W_{j,i}$ associated with it, which determines the strength and the sign of the connection. These units are a gross oversimplification of real neurons, but their purpose is to develop an understanding of what networks of simple units can do.
5 Unthresholded linear units The simplest form of unit is an unthresholded linear unit. Given a vector of real-valued inputs $\mathbf{a} = \langle a_0, \dots, a_n \rangle$ and a vector of real-valued weights, each of which is associated with an input, $\mathbf{w} = \langle w_0, \dots, w_n \rangle$, the output of an unthresholded linear unit is equal to the weighted sum of its inputs: $a_i \leftarrow \sum_j w_{ji} a_j$ or, in vector form, $a_i \leftarrow \mathbf{w} \cdot \mathbf{a}$. Recall that the dot product ($\cdot$) is defined as follows: given $\mathbf{w} = [w_1, w_2, \dots, w_n]$ and $\mathbf{a} = [a_1, a_2, \dots, a_n]$, then $\mathbf{w} \cdot \mathbf{a} = (w_1 a_1) + (w_2 a_2) + \dots + (w_n a_n)$.
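As a minimal sketch of the weighted sum described on this slide, the following Python function computes the dot product of a weight vector and an input vector. The function name `linear_unit` and the example values are illustrative assumptions, not part of the original slides.

```python
def linear_unit(weights, inputs):
    """Return the weighted sum w . a of an unthresholded linear unit."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * a for w, a in zip(weights, inputs))


# Hypothetical values chosen only for illustration.
w = [0.5, -1.0, 2.0]
a = [1.0, 3.0, 0.25]
print(linear_unit(w, a))  # 0.5*1.0 + (-1.0)*3.0 + 2.0*0.25 = -2.0
```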
6 Activation Functions Usually, however, the basic unthresholded linear unit is augmented with some form of activation function. In these more complex units, the processing stages of a unit are: (1) First, each unit $i$ computes a weighted sum of its inputs: $in_i \leftarrow \sum_j w_{ji} a_j$. (2) Then it applies an activation function $g$ to this sum to derive the output (activation) $a_i$: $a_i \leftarrow g(in_i) = g\left(\sum_j w_{ji} a_j\right)$.
7 Activation Functions Figure: The processing stages of such an augmented unit. Note: we have included a bias weight $W_{0,i}$ connected to a fixed input $a_0 = 1$. This bias weight is used to parameterize the behaviour of the activation function.
8 Activation Functions Figure: The outputs of two frequently used activation functions. (a) is a graph of the activation of a unit that is using a step activation function (aka threshold activation function). Units that use a threshold function as their activation function are called linear threshold units or McCulloch-Pitts units. The activation of these units is equal to: $a_i = g(in_i) = 1$ if $in_i > 0$ (i.e. $\mathbf{w} \cdot \mathbf{a} > 0$), and $0$ otherwise.
9 Activation Functions Figure: The outputs of two frequently used activation functions. (b) is a graph of the activation of a unit that is using a sigmoid (aka logistic) function $1/(1 + e^{-x})$. A unit using a sigmoidal activation function is known as a sigmoid unit. The activation of a sigmoid unit is a continuous function of its inputs that ranges between 0 and 1, increasing monotonically with its inputs. It is equal to: $a_i = g(in_i) = 1/(1 + e^{-(\mathbf{w} \cdot \mathbf{a})})$.
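The two activation functions can be sketched in Python as follows. This is an illustration only: the function names are assumptions, and the step function follows the convention used in the class exercise later in the deck, where the unit outputs 1 when $in_i > 0$.

```python
import math


def step(x):
    """Threshold (step) activation: 1 when the weighted sum is positive."""
    return 1 if x > 0 else 0


def sigmoid(x):
    """Sigmoid (logistic) activation: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(weights, inputs, g):
    """Compute in_i as a weighted sum, then apply the activation function g."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)
```

Both functions switch around zero: `step` jumps from 0 to 1 there, while `sigmoid` passes smoothly through 0.5.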
10 Activation Functions Figure: The outputs of two frequently used activation functions. Notice that both functions have a threshold (either hard or soft) at zero. For the linear threshold function, the bias weight $w_{0i}$ sets the actual threshold for the unit, in the sense that the unit is activated when the weighted sum of real inputs $\sum_{j=1}^{n} w_{ji} a_j$ (i.e. excluding the bias input) exceeds $-w_{0i} a_0$.
11 Activation Functions Why use the sigmoidal activation function? Although complex, the sigmoid function has the advantage of being differentiable, a property which is important for the weight-learning algorithm we will develop. When $g(in_i)$ is a sigmoidal function, $\frac{\partial g(in_i)}{\partial in_i} = g(in_i)(1 - g(in_i))$.
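The derivative identity above can be checked numerically. The sketch below compares $g(x)(1 - g(x))$ with a central finite-difference estimate; the helper names and the step size `h` are assumptions made for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Analytic derivative of the sigmoid: g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numeric_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 0.5, 3.0):
    gap = abs(sigmoid_prime(x) - numeric_derivative(sigmoid, x))
    print(x, gap < 1e-7)
```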
12 McCulloch and Pitts: every basic Boolean function can be implemented using a unit with a linear threshold activation function, given appropriate input and bias weights. This is important because it means we can use these units to build a network to compute any Boolean function of the inputs.
13 Figure: Threshold units that implement standard logical functions. Class exercise: Assuming $a_0 = 1$, show using a truth table how each of the above units models the corresponding logic function using the threshold function $g(in_i) = 1$ if $\mathbf{w} \cdot \mathbf{a} > 0$, and $0$ otherwise.
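14 Figure: Threshold units that implement standard logical functions. Table: Truth table for the AND unit, with columns for the inputs $a_0$, $a_1$, $a_2$, the weighted sum $in_i = \sum_{j=0}^{2} w_{ji} a_j$, and the output $a_i$ = ($in_i > 0$) ? 1 : 0.
15 Figure: Threshold units that implement standard logical functions. Table: Truth table for the OR unit, with columns for the inputs $a_0$, $a_1$, $a_2$, the weighted sum $in_i = \sum_{j=0}^{2} w_{ji} a_j$, and the output $a_i$ = ($in_i > 0$) ? 1 : 0.
16 Figure: Threshold units that implement standard logical functions. Table: Truth table for the NOT unit, with columns for the inputs $a_0$, $a_1$, the weighted sum $in_i = \sum_{j=0}^{1} w_{ji} a_j$, and the output $a_i$ = ($in_i > 0$) ? 1 : 0.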
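The truth-table exercise can be sketched in Python. The weights used in the original figure are not recoverable from this transcription, so the weights below are assumed values that happen to realise AND, OR and NOT with $a_0 = 1$ and the threshold rule $in_i > 0$; other choices work equally well.

```python
import itertools


def threshold_unit(weights, inputs):
    """Output 1 if the weighted sum of the inputs is positive, else 0."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return 1 if in_i > 0 else 0


# Assumed weights [w_0 (bias), w_1, ...]; not taken from the original figure.
AND_W = [-1.5, 1.0, 1.0]
OR_W = [-0.5, 1.0, 1.0]
NOT_W = [0.5, -1.0]


def truth_table(weights, n_inputs):
    """List (input bits, output) pairs, always prepending the bias input a_0 = 1."""
    rows = []
    for bits in itertools.product([0, 1], repeat=n_inputs):
        rows.append((bits, threshold_unit(weights, [1, *bits])))
    return rows


for name, w, n in (("AND", AND_W, 2), ("OR", OR_W, 2), ("NOT", NOT_W, 1)):
    print(name, truth_table(w, n))
```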
17 There are two main categories of neural network structures: (1) Acyclic or feed-forward networks. Feed-forward networks implement functions and have no internal state, e.g. single-layer perceptrons and multi-layer perceptrons. (2) Cyclic or recurrent networks. A recurrent network feeds its outputs back into its own inputs. Recurrent neural nets have directed cycles with delays, so they have internal state (like flip-flops), can oscillate, etc., and can therefore support short-term memory.
18 Feed-forward networks as Parameterised Functions of Input Figure: Example feed-forward network with two input units (1, 2), two hidden units (3, 4) and one output unit (5). To keep things simple we have omitted the bias units in this example.
19 Feed-forward networks as Parameterised Functions of Input The output of a feed-forward network is a parameterized function of the network inputs, e.g.: $a_5 = g(w_{3,5} a_3 + w_{4,5} a_4) = g(w_{3,5}\, g(w_{1,3} a_1 + w_{2,3} a_2) + w_{4,5}\, g(w_{1,4} a_1 + w_{2,4} a_2))$. By expressing the output of each hidden unit as a function of its inputs, we can see that the output of the network as a whole, $a_5$, is a function of the network's inputs. Furthermore, we see that the weights in the network act as parameters of this function. It follows that by adjusting the weights, we can change the function that the network represents. This is how learning occurs in neural nets!
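The expression for $a_5$ can be written directly as code. The sketch below assumes a sigmoid activation for $g$ and stores the weights in a dictionary keyed by (source unit, destination unit); both choices, and the weight values, are assumptions for illustration. Bias units are omitted, as on the slide.

```python
import math


def g(x):
    """Sigmoid activation (assumed here; the slide leaves g unspecified)."""
    return 1.0 / (1.0 + math.exp(-x))


def network_output(a1, a2, w):
    """Compute a_5 for the 2-2-1 network, with weights keyed by (from, to)."""
    a3 = g(w[(1, 3)] * a1 + w[(2, 3)] * a2)
    a4 = g(w[(1, 4)] * a1 + w[(2, 4)] * a2)
    return g(w[(3, 5)] * a3 + w[(4, 5)] * a4)


# Hypothetical weights; changing any of them changes the function computed.
weights = {(1, 3): 0.5, (2, 3): -0.5, (1, 4): 1.0, (2, 4): 1.0,
           (3, 5): 2.0, (4, 5): -1.0}
print(network_output(1.0, 0.0, weights))
```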
20 Types of Feed-forward Networks Feed-forward networks are usually arranged in layers, such that each unit receives input only from units in the immediately preceding layer. In single-layer (aka perceptron) networks there are no hidden units, i.e. all the inputs are connected directly to the outputs. In multi-layer networks there are one or more layers of hidden units.
21 What is a Perceptron? Figure: A perceptron. A perceptron is a feed-forward network with no hidden units. It takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise.
22 What is a Perceptron? Figure: A perceptron network consisting of three perceptron output units that share five inputs. Looking at a particular output (say the second one, outlined in bold), we see that the weights on its incoming links have no effect on the other output units. The output units therefore all operate separately, with no shared weights.
23 What is a Perceptron? Figure: A graph of the output of a two-input perceptron unit with a sigmoid activation function. Adjusting the weights changes the location, orientation, and steepness of the cliff.
24 Representational Power of Threshold Perceptrons A threshold perceptron is a single-layer network (no hidden units) with units that use a threshold activation function (i.e. McCulloch-Pitts units). We have already shown how such a network can represent the basic Boolean functions AND, OR and NOT. However, such a network cannot be used to implement XOR. Why is this?
25 Representational Power of Threshold Perceptrons We can think of a 2-input threshold perceptron as representing a line separator in 2D input space. In other words, the condition $\sum_j W_j a_j > 0$, or $\mathbf{w} \cdot \mathbf{a} > 0$, defines a line in the input space (its boundary $\mathbf{w} \cdot \mathbf{a} = 0$), and the perceptron outputs a 1 for instances lying on one side of the line and -1 for instances lying on the other side of the line.
26 Representational Power of Threshold Perceptrons Figure: The decision surface represented by a two-input perceptron. The + and - symbols represent training examples from two different classes that the decision surface can distinguish between. The figure above illustrates this concept. The line in this image is the boundary of the region $\mathbf{w} \cdot \mathbf{a} > 0$. The inputs into the perceptron are $x_1$ and $x_2$, i.e. $\mathbf{a} = \{x_1, x_2\}$.
27 Representational Power of Threshold Perceptrons This concept of a linear perceptron defining a line scales up into higher-dimensional input spaces. In 3D and higher input spaces (i.e. in situations where the linear perceptron has 3 or more inputs), the condition $\mathbf{w} \cdot \mathbf{a} > 0$ defines a hyperplane decision surface in the n-dimensional space of inputs. Data-sets of positive and negative examples that can be separated by a hyperplane are called linearly separable. Of course, not all data-sets of positive and negative examples are linearly separable. The XOR function is one example of a non-linearly separable function.
28 Representational Power of Threshold Perceptrons Linear Separability in threshold perceptrons. Figure: The circles represent the data points to be classified. The colour of the circles indicates their correct classification: black dots represent a point in the input space where the value of the function is 1, and white dots indicate a point where the value is 0. The diagonal lines illustrate potential linear borders of demarcation between the classes. As is evident from the images, in the AND and OR input spaces it is possible to draw a line that separates the black dots from the white dots. However, in the XOR space no such line exists. As a result, a threshold perceptron cannot represent the XOR function.
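The separability argument can be illustrated with a small brute-force search in Python. The sketch tries every weight triple on a coarse grid and reports whether any single threshold unit reproduces a given Boolean function. The grid, the function names and the 0/1 output convention are assumptions; a finite search is an illustration of the claim, not a proof of it.

```python
import itertools


def threshold_unit(weights, inputs):
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return 1 if in_i > 0 else 0


def find_weights(target, grid):
    """Return a (w_0, w_1, w_2) from the grid that realises target, or None."""
    points = list(itertools.product([0, 1], repeat=2))
    for w in itertools.product(grid, repeat=3):
        if all(threshold_unit(w, (1, x1, x2)) == target(x1, x2)
               for x1, x2 in points):
            return w
    return None


# Candidate weights from -3.0 to 3.0 in steps of 0.5 (an arbitrary choice).
grid = [k / 2 for k in range(-6, 7)]

print("AND:", find_weights(lambda a, b: a & b, grid))
print("OR: ", find_weights(lambda a, b: a | b, grid))
print("XOR:", find_weights(lambda a, b: a ^ b, grid))  # no unit found
```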
29 Representational Power of Threshold Perceptrons This limitation of threshold perceptrons was first highlighted by Minsky & Papert (1969) and resulted in a lot of people turning away from neural networks. Minsky & Papert showed that, in general, threshold perceptrons can represent only linearly separable functions. Sigmoid perceptrons are similarly limited, in the sense that they represent only soft linear separators. However, it's not all doom and gloom: we can represent the XOR function using a multilayer network. Minsky, M. and Papert, S. (1969). Perceptrons. MIT Press, Cambridge.
30 Representational Power of Threshold Perceptrons Figure: A multilayer neural network that implements the XOR function. XOR is easiest to construct using step-function units. Because XOR is not linearly separable, we will need a hidden layer. It turns out that just one hidden node suffices. We can think of the XOR function as OR with the AND case (both inputs on) ruled out. Thus the hidden layer computes AND, while the output layer computes OR but weights the output of the hidden node negatively.
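The construction described above can be sketched in a few lines of Python. The weights in the original figure are not available in this transcription, so the values below are assumptions: the hidden unit computes AND, and the output unit computes OR over the two inputs while giving the hidden unit a negative weight.

```python
def step(x):
    return 1 if x > 0 else 0


def xor_net(x1, x2):
    """XOR from one hidden AND unit plus an OR-like output unit."""
    # Hidden unit: fires only when both inputs are on (assumed bias -1.5).
    h = step(-1.5 + 1.0 * x1 + 1.0 * x2)
    # Output unit: OR of the inputs, with the AND case ruled out by weight -2.
    return step(-0.5 + 1.0 * x1 + 1.0 * x2 - 2.0 * h)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

Note that in this sketch the inputs feed the output unit directly as well as through the hidden unit, which is what allows a single hidden node to suffice.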
31 Perceptron Learning (Perceptron Training Rule) Because threshold perceptrons have limitations on their representational power, we will generally be interested in training multilayer networks of threshold units. However, as an introduction to training multilayer networks, we will first look at how to learn weights for a single perceptron. Here the precise learning problem is to determine a weight vector that causes the perceptron to produce the correct ±1 output for each of the given training examples.
32 Perceptron Learning (Perceptron Training Rule) Perhaps the most interesting aspect of neural networks is that the connection weights need not be set by hand or fixed in advance. Most models are born with their weights set to random values and then these weights are iteratively adjusted by a learning algorithm on the basis of a series of training examples that pair inputs with targets.
33 Perceptron Learning (Perceptron Training Rule) Perceptron Training Rule: Weights are modified at each step according to the perceptron training rule, which revises the weight $w_i$ associated with input $a_i$ according to the rule: $w_i \leftarrow w_i + \eta (t - o) a_i$ where $t$ = target output, $o$ = observed output, and $\eta$ is a positive constant called the learning rate. The learning rate moderates the degree to which weights are changed at each step; usually $\eta$ is a small value (e.g., 0.1), and it is sometimes made to decay as the number of weight-training iterations increases.
34 Perceptron Learning (Perceptron Training Rule) Why would this rule converge toward successful weight values? If the training example is correctly classified, then $(t - o) = 0$, so $\eta (t - o) a_i = 0$ and no weights are updated. In the case of a false negative ($o = 0$ and $t = 1$) we want to make the perceptron output a 1 instead of a 0, so the weights must be altered to increase the value of $\mathbf{w} \cdot \mathbf{a}$. Notice that in this case the rule will increase $w_i$ for a positive input $a_i$, because $(t - o)$, $\eta$ and $a_i$ are all positive. On the other hand, in the case of a false positive ($o = 1$ and $t = 0$) the weights associated with positive inputs $a_i$ will be decreased.
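A minimal sketch of the perceptron training rule in Python follows. Several details are assumptions made for the illustration: outputs are 0/1 with the unit firing when $\mathbf{w} \cdot \mathbf{a} > 0$, weights start at zero rather than at random values so the run is reproducible, the training data is the AND function with bias input $a_0 = 1$, and $\eta = 0.5$ is used so that the floating-point arithmetic is exact.

```python
def predict(weights, inputs):
    """Threshold unit output: 1 if w . a > 0, else 0."""
    return 1 if sum(w * a for w, a in zip(weights, inputs)) > 0 else 0


def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Apply w_i <- w_i + eta * (t - o) * a_i until an epoch has no errors."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        errors = 0
        for inputs, t in examples:
            o = predict(weights, inputs)
            errors += int(o != t)
            # When the example is classified correctly, (t - o) is 0 and
            # the weights are left unchanged.
            weights = [w + eta * (t - o) * a for w, a in zip(weights, inputs)]
        if errors == 0:
            break
    return weights


# AND function; each input vector starts with the bias input a_0 = 1.
AND_EXAMPLES = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
and_weights = train_perceptron(AND_EXAMPLES, eta=0.5)
print(and_weights)
```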
35 Perceptron Learning (Gradient Descent Learning) The learning procedure we have just described can be proven to converge within a finite number of applications of the perceptron training rule to a weight vector that correctly classifies all training examples, provided the training examples are linearly separable and provided that a sufficiently small $\eta$ is used (see Minsky & Papert, 1969). However, the perceptron training rule can fail to converge if the examples are not linearly separable. A second approach is to use gradient descent to search the hypothesis space of possible weights to find the weights that best fit the training examples. To understand the gradient descent algorithm, it is helpful to visualise the entire hypothesis space of possible weight vectors and their associated error values. (see next slide)
36 Perceptron Learning (Gradient Descent Learning) Figure: Graph of an error surface across a hypothesis space. $w_0$ and $w_1$ represent possible values for two weights of a simple linear unit. The $w_0$, $w_1$ plane therefore represents the entire hypothesis space. The vertical axis indicates the error relative to some fixed set of training examples.
37 Perceptron Learning (Gradient Descent Learning) Figure: Graph of an error surface across a hypothesis space. The error surface shown summarises the desirability of every weight vector in the hypothesis space. We are searching for the hypothesis (weight vector) with minimum error, i.e. the hypothesis at the global minimum in the error surface.
38 Perceptron Learning (Gradient Descent Learning) Figure: Graph of an error surface across a hypothesis space. The arrow shows the negated gradient at one particular point, indicating the direction in the $w_0$, $w_1$ plane producing steepest descent along the error surface.
39 Perceptron Learning (Gradient Descent Learning) Note that for linear units the error surface must always be parabolic with a single global minimum. Gradient descent search determines a weight vector that minimises the error $E$ by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps. At each step, the weight vector is altered in the direction that produces the steepest descent along the error surface. This process continues until the global minimum error is reached.
40 Perceptron Learning (Gradient Descent Learning) To apply this gradient descent approach we need to: (1) define the function that computes the error of the network; (2) be able to compute the slope of the error surface at a particular point as a function of the change in the weights at a node; (3) define a weight update rule that uses the slope information to produce the steepest descent along the error surface.
41 Perceptron Learning (Gradient Descent Learning) Defining the error of the network: The classical measure of error used in gradient descent search is the mean squared network error. This is computed by summing the squared error for each node. Formally, the mean squared error for a network is $E = \frac{1}{2} \sum_{i=1}^{m} (tn_i - on_i)^2 = \frac{1}{2} \sum_{i=1}^{m} (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))^2$ where the network has $m$ output units, $tn_i$ is the target output for unit $i$, $on_i$ is the actual output for unit $i$, $g$ is the activation function of the network units, $\mathbf{a}_i$ is the vector of inputs to unit $i$, $\mathbf{w}_i$ is the vector of weights in the network on the links into unit $i$, and $on_i = g(\mathbf{a}_i \cdot \mathbf{w}_i)$.
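The error definition translates directly into code. The sketch below assumes sigmoid output units and takes one (inputs, weights) pair per output unit; the function names and example values are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def network_error(targets, unit_inputs, unit_weights, g=sigmoid):
    """E = 1/2 * sum_i (tn_i - g(a_i . w_i))^2 over the m output units."""
    total = 0.0
    for tn, a, w in zip(targets, unit_inputs, unit_weights):
        on = g(sum(w_k * a_k for w_k, a_k in zip(w, a)))
        total += (tn - on) ** 2
    return 0.5 * total


# Two output units with all-zero weights, so each outputs g(0) = 0.5.
targets = [1.0, 0.0]
inputs = [[1.0, 0.3], [1.0, 0.3]]
weights = [[0.0, 0.0], [0.0, 0.0]]
print(network_error(targets, inputs, weights))  # 0.5 * (0.25 + 0.25) = 0.25
```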
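42 Perceptron Learning (Gradient Descent Learning) In order to compute slopes we need to know some calculus. Fundamental Calculus: In mathematics, the derivative is a measurement of how a function changes when the values of its inputs change, and a partial derivative (denoted by the symbol $\partial$) of a function of several variables is its derivative with respect to one of those variables with the others held constant. Basic rules: $\frac{\partial x}{\partial x} = 1$, $\frac{\partial x^2}{\partial x} = 2x$. Chain rule: $\frac{\partial g(f(x))}{\partial x} = g'(f(x)) \frac{\partial f(x)}{\partial x}$. Ex.1: if $f(x) = (x^2 + 1)^3$ then $f'(x) = 3(x^2 + 1)^2 (2x)$. Ex.2: if $f(x) = \frac{1}{2}(c - x)^2$ then $f'(x) = \frac{2}{2}(c - x) \frac{\partial (c - x)}{\partial x} = (c - x)(-1) = -(c - x)$. If the activation function $g(\mathbf{w} \cdot \mathbf{a})$ is the sigmoidal function $\frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{a})}}$ then $\frac{\partial g(in_i)}{\partial in_i} = g(in_i)(1 - g(in_i))$ with $in_i = \mathbf{w} \cdot \mathbf{a}$.
43 Perceptron Learning (Gradient Descent Learning) Computing the slope of the error surface as a function of the change in the weights at a node. We can compute the slope of a surface by taking the derivative of the function that defines that surface. The error surface is defined by the function: $E = \frac{1}{2} \sum_{i=1}^{m} (tn_i - on_i)^2 = \frac{1}{2} \sum_{i=1}^{m} (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))^2$. What we want is the rate of change of the network error $E$ as a function of the change in a particular weight $w_k$: $\frac{\partial E}{\partial w_k} = \frac{\partial E}{\partial on_i} \frac{\partial on_i}{\partial w_k}$. Chain rule: $\frac{\partial g(f(x))}{\partial x} = g'(f(x)) \frac{\partial f(x)}{\partial x}$.
44 Perceptron Learning (Gradient Descent Learning) How do we compute $\frac{\partial E}{\partial w_k} = \frac{\partial E}{\partial on_i} \frac{\partial on_i}{\partial w_k}$? Step 1: we compute the partial derivative of the total error $E$ with respect to each output unit: $\frac{\partial E}{\partial on_i} = \frac{\partial}{\partial on_i} \frac{1}{2} \sum_{i=1}^{m} (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))^2 = \frac{\partial}{\partial g(\mathbf{a}_i \cdot \mathbf{w}_i)} \frac{1}{2} (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))^2 = \frac{1}{2} \cdot 2 (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i)) \frac{\partial (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))}{\partial g(\mathbf{a}_i \cdot \mathbf{w}_i)} = (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))(-1) = -(tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))$. We can drop the summation because we are considering a node on the output layer, where its error will not affect any other node.
45 Perceptron Learning (Gradient Descent Learning) How do we compute $\frac{\partial E}{\partial w_k} = \frac{\partial E}{\partial on_i} \frac{\partial on_i}{\partial w_k}$? Step 2: we compute the partial derivative of the actual output at the $i$th node with respect to each weight at that node: $\frac{\partial on_i}{\partial w_k} = \frac{\partial g(\mathbf{a}_i \cdot \mathbf{w}_i)}{\partial w_k} = g'(\mathbf{a}_i \cdot \mathbf{w}_i) \frac{\partial (\mathbf{a}_i \cdot \mathbf{w}_i)}{\partial w_k} = g'(\mathbf{a}_i \cdot \mathbf{w}_i)\, a_k$ where $g'()$ is the derivative of the activation function of the network units and $a_k$ is the single input component $k$ on the input whose weight $w_k$ is being updated.
46 Perceptron Learning (Gradient Descent Learning) Putting these two derivations together we can now define $\frac{\partial E}{\partial w_k}$ as: $\frac{\partial E}{\partial w_k} = \underbrace{-(tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))}_{\partial E / \partial on_i} \; \underbrace{g'(\mathbf{a}_i \cdot \mathbf{w}_i)\, a_k}_{\partial on_i / \partial w_k}$ where $g'$ is the derivative of the activation function.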
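The combined expression for $\partial E / \partial w_k$ can be sanity-checked against a finite-difference estimate. The sketch below does this for a single sigmoid unit; the weights, inputs, target and step size are assumed values chosen only for the check.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def error(w, a, t):
    """E = 1/2 * (t - g(a . w))^2 for a single output unit."""
    out = sigmoid(sum(w_k * a_k for w_k, a_k in zip(w, a)))
    return 0.5 * (t - out) ** 2


def analytic_gradient(w, a, t):
    """dE/dw_k = -(t - g(a . w)) * g'(a . w) * a_k, with g' = g * (1 - g)."""
    out = sigmoid(sum(w_k * a_k for w_k, a_k in zip(w, a)))
    return [-(t - out) * out * (1.0 - out) * a_k for a_k in a]


def numeric_gradient(w, a, t, h=1e-6):
    """Central finite-difference estimate of each dE/dw_k."""
    grads = []
    for k in range(len(w)):
        up, down = list(w), list(w)
        up[k] += h
        down[k] -= h
        grads.append((error(up, a, t) - error(down, a, t)) / (2.0 * h))
    return grads


w, a, t = [0.2, -0.4, 0.7], [1.0, 0.5, -1.5], 1.0  # hypothetical values
print(analytic_gradient(w, a, t))
print(numeric_gradient(w, a, t))
```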
47 Perceptron Learning (Gradient Descent Learning) Gradient Descent Weight Update Rule: In the gradient descent algorithm, where we want to reduce $E$ (i.e. we want the weight to be changed in the direction of the negative gradient component), we update the weight using the following rule: $w_k \leftarrow w_k + \eta\, (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))\, g'(\mathbf{a}_i \cdot \mathbf{w}_i)\, a_k$ where $\eta$ is the learning rate.
48 Perceptron Learning (Gradient Descent Learning) Gradient descent weight update rule: $w_k \leftarrow w_k + \eta\, (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))\, g'(\mathbf{a}_i \cdot \mathbf{w}_i)\, a_k$. Intuitively, this makes a lot of sense. If the error $(tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))$ is positive (i.e., we have a false negative), the network output is too small, and so the weights are increased for the positive inputs $a_k > 0$ and decreased for the negative inputs $a_k < 0$. The rule does this because $\eta$, $(tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))$ and $g'(\mathbf{a}_i \cdot \mathbf{w}_i)$ are all positive, so $\eta\, (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))\, g'(\mathbf{a}_i \cdot \mathbf{w}_i)\, a_k$ will be positive when $a_k$ is positive and negative when $a_k$ is negative. The opposite happens when the error is negative.
49 Perceptron Learning (Gradient Descent Learning) Gradient descent weight update rule and activation functions: $w_k \leftarrow w_k + \eta\, (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))\, g'(\mathbf{a}_i \cdot \mathbf{w}_i)\, a_k$. For threshold perceptrons, the factor $g'(\mathbf{a}_i \cdot \mathbf{w}_i)$ is omitted from the weight update. Omitting $g'(\mathbf{a}_i \cdot \mathbf{w}_i)$ makes the weight update rule identical to the perceptron learning rule. Since $g'(\mathbf{a}_i \cdot \mathbf{w}_i)$ is the same for all weights, its omission changes only the magnitude and not the direction of the overall weight update for each example. If the units in the network are using a continuous activation function, we must be able to take the derivative $g'$. If the activation function $g(\mathbf{w} \cdot \mathbf{a})$ is the sigmoidal function $\frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{a})}}$ then $\frac{\partial g(in_i)}{\partial in_i} = g(in_i)(1 - g(in_i))$ with $in_i = \mathbf{w} \cdot \mathbf{a}$.
50 Perceptron Learning Algorithm The perceptron learning algorithm, listed on the next slide, runs the training examples through the net one at a time, adjusting the weights slightly after each example to reduce the error. Each cycle through the examples is called an epoch. Epochs are repeated until some stopping criterion is reached, typically that the weight changes have become very small. The hypothesis returned computes the network output for any given example.
51 Perceptron Learning Algorithm
function PERCEPTRON-LEARNING(examples, network, η) returns a perceptron hypothesis
  inputs: examples, a set of examples, each with input a = a_1, ..., a_n and output target t
          network, a perceptron with weights w = w_0 ... w_n, and activation function g
  repeat
    for each e in examples do
      in ← w · a[e]
      Err ← t[e] − g(in)
      W_j ← W_j + (η × Err × g'(in) × a_j[e])
  until some stopping criterion is satisfied
  return NEURAL-NET-HYPOTHESIS(network)
The perceptron learning rule converges to a consistent function for any linearly separable data set.
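The pseudocode can be sketched in runnable Python as follows. This is one possible reading, under stated assumptions: $g$ is the sigmoid, weights start at zero, the stopping criterion is simply a fixed number of epochs, and the training data is the AND function with bias input $a_0 = 1$. The names `perceptron_learning` and `hypothesis` are hypothetical stand-ins for the pseudocode's PERCEPTRON-LEARNING and NEURAL-NET-HYPOTHESIS.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def perceptron_learning(examples, eta=0.5, epochs=5000):
    """Online gradient descent for a single sigmoid unit."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):  # assumed stopping criterion: fixed epoch count
        for inputs, target in examples:
            in_ = sum(w * a for w, a in zip(weights, inputs))
            out = sigmoid(in_)
            err = target - out
            g_prime = out * (1.0 - out)  # sigmoid derivative at in_
            weights = [w + eta * err * g_prime * a
                       for w, a in zip(weights, inputs)]
    return weights


def hypothesis(weights):
    """Return a function that computes the unit's output for an input vector."""
    return lambda inputs: sigmoid(sum(w * a for w, a in zip(weights, inputs)))


AND_EXAMPLES = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
h = hypothesis(perceptron_learning(AND_EXAMPLES))
for inputs, target in AND_EXAMPLES:
    print(inputs, target, round(h(inputs), 3))
```

Because the sigmoid output is continuous, the trained unit's outputs approach 0 and 1 without reaching them; thresholding at 0.5 recovers a Boolean answer.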
52 Figure: Performance comparison of perceptrons and decision trees on learning the majority function (which outputs a 1 only if more than half of its $n$ inputs are 1). The perceptron learns the majority function easily (because the majority function is linearly separable), while DTL is hopeless (a decision tree would need $O(2^n)$ nodes to represent this function for $n$ inputs and won't learn that without a very large data set).
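To see why the majority function is linearly separable, note that a single threshold unit with every input weight set to 1 and a bias weight of $-n/2$ computes it exactly. The sketch below checks this exhaustively for small $n$; the function names are assumptions for this illustration.

```python
import itertools


def majority(bits):
    """Reference definition: 1 only if more than half of the inputs are 1."""
    return 1 if sum(bits) > len(bits) / 2 else 0


def majority_unit(bits):
    """Threshold unit with bias weight -n/2 and all input weights equal to 1."""
    n = len(bits)
    weights = [-n / 2] + [1.0] * n
    inputs = [1] + list(bits)  # bias input a_0 = 1
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return 1 if in_i > 0 else 0


for n in range(1, 7):
    agree = all(majority_unit(b) == majority(b)
                for b in itertools.product([0, 1], repeat=n))
    print(n, agree)
```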
53 Figure: Performance comparison of perceptrons and decision-trees on learning the restaurant example. The perceptron finds this problem very difficult (the solution to this problem is not linearly separable, the best plane through the data correctly classifies only 65%). However, the problem is easily represented as a decision tree.
54 Artificial Neurons: linear threshold versus sigmoidal activation functions. Network types: feed-forward (connections only in one direction) versus recurrent. Feed-forward networks: perceptrons (one-layer networks) versus multi-layer networks. Perceptrons cannot express non-linearly-separable functions. Learning is done by adjusting weights. Perceptron Training Rule: $w_i \leftarrow w_i + \Delta w_i$ where $\Delta w_i = \eta (t - o) a_i$. Gradient descent learning attempts to reduce the squared error by calculating the partial derivative of the squared error of the network with respect to each weight. Gradient descent weight update rule: $w_k \leftarrow w_k + \eta\, (tn_i - g(\mathbf{a}_i \cdot \mathbf{w}_i))\, g'(\mathbf{a}_i \cdot \mathbf{w}_i)\, a_k$.
55 1 Artificial Neurons Cognitive Basis Basic Components Unthresholded linear units Activation Functions 2 Implementing logical functions 3 Network structures 4 Feed-forward networks Feed-forward networks as Parameterised Func s of Input Types of Feed-forward Networks 5 Perceptrons What is a Perceptron? Representational Power of Threshold Perceptrons Perceptron Learning (Perceptron Training Rule) Perceptron Learning (Gradient Descent Learning) Perceptron Learning Algorithm 6 Perceptrons versus Decision Trees 7 Summary
Neural Networks and Support Vector Machines
INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines
More informationFeed-Forward mapping networks KAIST 바이오및뇌공학과 정재승
Feed-Forward mapping networks KAIST 바이오및뇌공학과 정재승 How much energy do we need for brain functions? Information processing: Trade-off between energy consumption and wiring cost Trade-off between energy consumption
More informationIntroduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support
More informationRole of Neural network in data mining
Role of Neural network in data mining Chitranjanjit kaur Associate Prof Guru Nanak College, Sukhchainana Phagwara,(GNDU) Punjab, India Pooja kapoor Associate Prof Swami Sarvanand Group Of Institutes Dinanagar(PTU)
More informationLecture 6. Artificial Neural Networks
Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm
More informationChapter 4: Artificial Neural Networks
Chapter 4: Artificial Neural Networks CS 536: Machine Learning Littman (Wu, TA) Administration icml-03: instructional Conference on Machine Learning http://www.cs.rutgers.edu/~mlittman/courses/ml03/icml03/
More informationMachine Learning and Data Mining -
Machine Learning and Data Mining - Perceptron Neural Networks Nuno Cavalheiro Marques (nmm@di.fct.unl.pt) Spring Semester 2010/2011 MSc in Computer Science Multi Layer Perceptron Neurons and the Perceptron
More informationMachine Learning: Multi Layer Perceptrons
Machine Learning: Multi Layer Perceptrons Prof. Dr. Martin Riedmiller Albert-Ludwigs-University Freiburg AG Maschinelles Lernen Machine Learning: Multi Layer Perceptrons p.1/61 Outline multi layer perceptrons
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationLecture 8 February 4
ICS273A: Machine Learning Winter 2008 Lecture 8 February 4 Scribe: Carlos Agell (Student) Lecturer: Deva Ramanan 8.1 Neural Nets 8.1.1 Logistic Regression Recall the logistic function: g(x) = 1 1 + e θt
More informationRecurrent Neural Networks
Recurrent Neural Networks Neural Computation : Lecture 12 John A. Bullinaria, 2015 1. Recurrent Neural Network Architectures 2. State Space Models and Dynamical Systems 3. Backpropagation Through Time
More information3 An Illustrative Example
Objectives An Illustrative Example Objectives - Theory and Examples -2 Problem Statement -2 Perceptron - Two-Input Case -4 Pattern Recognition Example -5 Hamming Network -8 Feedforward Layer -8 Recurrent
More informationNeural Network Design in Cloud Computing
International Journal of Computer Trends and Technology- volume4issue2-2013 ABSTRACT: Neural Network Design in Cloud Computing B.Rajkumar #1,T.Gopikiran #2,S.Satyanarayana *3 #1,#2Department of Computer
More informationLecture 8: Synchronous Digital Systems
Lecture 8: Synchronous Digital Systems The distinguishing feature of a synchronous digital system is that the circuit only changes in response to a system clock. For example, consider the edge triggered
More informationArtificial neural networks
Artificial neural networks Now Neurons Neuron models Perceptron learning Multi-layer perceptrons Backpropagation 2 It all starts with a neuron 3 Some facts about human brain ~ 86 billion neurons ~ 10 15
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationData Mining Techniques Chapter 7: Artificial Neural Networks
Data Mining Techniques Chapter 7: Artificial Neural Networks Artificial Neural Networks.................................................. 2 Neural network example...................................................
More informationFollow links Class Use and other Permissions. For more information, send email to: permissions@pupress.princeton.edu
COPYRIGHT NOTICE: David A. Kendrick, P. Ruben Mercado, and Hans M. Amman: Computational Economics is published by Princeton University Press and copyrighted, 2006, by Princeton University Press. All rights
More informationSUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK
SUCCESSFUL PREDICTION OF HORSE RACING RESULTS USING A NEURAL NETWORK N M Allinson and D Merritt 1 Introduction This contribution has two main sections. The first discusses some aspects of multilayer perceptrons,
More informationAn Introduction to Neural Networks
An Introduction to Vincent Cheung Kevin Cannons Signal & Data Compression Laboratory Electrical & Computer Engineering University of Manitoba Winnipeg, Manitoba, Canada Advisor: Dr. W. Kinsner May 27,
More informationSelf Organizing Maps: Fundamentals
Self Organizing Maps: Fundamentals Introduction to Neural Networks : Lecture 16 John A. Bullinaria, 2004 1. What is a Self Organizing Map? 2. Topographic Maps 3. Setting up a Self Organizing Map 4. Kohonen
More informationBig Data Analytics CSCI 4030