An Introduction to Artificial Neural Networks (ANN) - Methods, Abstraction, and Usage


Introduction

An artificial neural network (ANN) is a system based on the operation of biological neural networks and can therefore be described as an emulation of a biological neural system. ANNs are at the forefront of computational systems designed to produce, or at least mimic, intelligent behavior. Unlike classical Artificial Intelligence (AI) systems, which are designed to directly emulate rational, logical reasoning, neural networks aim at reproducing the underlying processing mechanisms that give rise to intelligence as an emergent property of complex, adaptive systems. Neural network systems have been successfully developed and deployed for pattern recognition, capacity planning, business intelligence, robotics, and other intuition-related problems. In computer science, neural networks have gained a lot of momentum over the last few years in areas such as forecasting, data analytics, and data mining.

Data analytics is normally defined as the science of examining raw data with the purpose of drawing conclusions from that information. It is used in many industries to allow companies to make better business decisions and, in science, to verify or disprove existing models or theories. Data analytics differs from data mining in the scope, purpose, and focus of the analysis. Data mining describes the process of discovering new patterns in very large data sets by applying a vast set of methods that originate in statistics, artificial intelligence, and database management. The actual data mining task is an automatic (or semi-automatic) analysis of large quantities of data, with the goal of extracting previously unknown patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), or dependencies (association rule mining).
Hence, data mining basically focuses on sorting through large data sets (utilizing sophisticated software applications) to identify undiscovered patterns and establish hidden relationships (i.e., to extract value). Data analytics focuses on inference, the process of deriving a conclusion based solely on what is already known to the researcher. To summarize, a neural network can be described as a highly parallel system that is capable of resolving paradigms that linear computing cannot tackle.

Some of the more common types of ANNs are (note: this list is far from complete, and deep learning is not covered in this introductory paper):

Feed-Forward Neural Network. A simple neural network type where synapses (connections) are made from an input layer to zero or more hidden layers and ultimately to an output layer. The feed-forward neural network is one of the most common neural networks in use and is suitable for many types of applications. Feed-forward neural networks are often trained via simulated annealing, genetic algorithms, or one of the propagation techniques. To illustrate, annealing is a term used in metallurgy: if a metal is heated to a very high temperature, the atoms move about at high speeds, but if they are then cooled very slowly, they settle into patterns and structures that render the metal much stronger than before. This principle can be employed as an optimization technique in computer science. More specifically, simulated annealing can be used to help a neural network avoid local minima in its energy function. Simulated annealing basically involves perturbing the independent variables (the ANN weights) by a random value and keeping track of the value with the least error.

Dominique A. Heger (dheger@dhtusa.com) 1
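The weight-perturbation scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not production code: the function name `anneal_weights`, the cooling schedule, and the default parameters are assumptions made for the example, and the error function is supplied by the caller.

```python
import math
import random

def anneal_weights(weights, error_fn, start_temp=10.0, stop_temp=0.01,
                   cooling=0.95, tries_per_temp=20):
    """Simulated annealing over an ANN weight vector: perturb the weights
    by a random value, occasionally accept a worse candidate while the
    system is still 'hot', and keep track of the weights with the least
    error seen so far."""
    current, current_err = list(weights), error_fn(weights)
    best, best_err = list(current), current_err
    temp = start_temp
    while temp > stop_temp:
        for _ in range(tries_per_temp):
            # The perturbation magnitude shrinks as the system cools.
            candidate = [w + random.uniform(-temp, temp) for w in current]
            err = error_fn(candidate)
            # Always accept improvements; accept worse moves with a
            # probability that decays with the error increase and the
            # temperature (this is what lets the search escape local minima).
            if err < current_err or random.random() < math.exp((current_err - err) / temp):
                current, current_err = candidate, err
            if err < best_err:
                best, best_err = list(candidate), err
        temp *= cooling  # cool slowly, as in metallurgical annealing
    return best, best_err
```

For example, annealing a two-weight vector against a simple sum-of-squares error drives the tracked best error toward the minimum at the origin.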

Self-Organizing Map (SOM). A neural network that contains two layers and implements a winner-take-all strategy in the output layer. Rather than taking the output of individual neurons, the neuron with the highest output is considered the winner. SOMs are typically used for clustering-related problems, where the output neurons represent groups into which the input neurons are to be classified. SOMs may employ a competitive learning strategy.

Hopfield Neural Network. A simple single-layer recurrent neural network. The Hopfield neural network is trained via an algorithm that teaches it to recognize patterns; the network indicates that a pattern is recognized by echoing it back. Hopfield neural networks are typically used for pattern recognition.

Simple Recurrent Network (SRN), Elman or Jordan style. A recurrent neural network that has a context layer. The context layer holds the previous output of the hidden layer and echoes that value back to the hidden layer's input, so the hidden layer always receives input from its own previous iteration's output. Elman or Jordan neural networks are generally trained via genetic algorithms, simulated annealing, or one of the propagation techniques, and are typically used for prediction-related problems.

Recurrent Self-Organizing Map (RSOM). A recurrent self-organizing map that has an input and an output layer, just like a regular SOM; however, the RSOM has a context layer as well. This context layer echoes the previous iteration's output back to the input layer of the network. RSOMs may be trained via a competitive learning algorithm (just like a non-recurrent SOM) and can be used to classify temporal data or to predict outcomes.

Feed-Forward Radial Basis Function (RBF) Network. A feed-forward network with an input layer, an output layer, and a hidden layer that is based on radial basis functions. The RBF generally employed is the Gaussian function.
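The Gaussian function just mentioned can be written as a short, self-contained sketch. The function name and parameterization here are illustrative assumptions: a center vector marks where the unit responds most strongly, and a width parameter controls how quickly the response decays.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian RBF: the response is 1.0 when the input coincides with the
    unit's center and decays with the squared Euclidean distance from it."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))
```

For instance, `gaussian_rbf([0.0, 0.0], [0.0, 0.0], 1.0)` returns 1.0, and the response falls off smoothly as the input moves away from the center.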
Several RBFs in the hidden layer allow the RBF network to approximate a more complex activation function than a typical feed-forward neural network. RBF networks can be used for pattern recognition and can be trained via genetic algorithms, simulated annealing, or one of the propagation techniques; other means must be employed to determine the structure of the RBFs used in the hidden layer.

ANN - Pros:
- A neural network can be used to solve linear as well as non-linear programming tasks.
- If a component of an ANN fails, the network continues to operate (based on its highly parallel nature).
- A neural network learns and does not have to be re-programmed.
- An ANN can be used to solve classification, clustering, and regression related problems.

ANN - Cons:
- Most ANNs require a training phase to operate/function.
- As an ANN's architecture differs from that of a microprocessor, ANNs have to be emulated.
- Large ANNs require rather powerful hardware to achieve reasonable execution times.

In many circumstances, a neural network based approach is chosen when a linear programming based method is not suitable to answer the question at hand. Linear Programming (LP) is defined as the process of finding the extreme values (maximum and minimum) of a linear function f(x1, ..., xn) over a region defined by inequalities (subject to given constraints that are linear inequalities

involving the variables xi). To illustrate, LP simulation models are used to determine the best possible solution for allocating limited resources in order to achieve maximum profit or minimum cost. The field of LP is based on work conducted by the Russian mathematician L. V. Kantorovich. Nonlinear programming (NLP), on the other hand, is described as the process of solving a system of equalities and inequalities (collectively termed constraints) over a set of unknown real variables, along with an objective function to be maximized or minimized, where some of the constraints or the objective function are nonlinear. The term nonlinear designates an equation whose terms are not all of the first degree.

Figure 1: A Simple ANN Example

Like its biological predecessor, an ANN is considered an adaptive system; in other words, each parameter is changed during its operation as it is deployed to solve the problem at hand (the ANN training phase). An ANN is designed and developed via a systematic step-by-step procedure that optimizes a criterion known as the learning rule. The input/output training data is fundamental for these networks, as the data conveys the information necessary to discover the optimal operating point. Further, the non-linear nature of an ANN's processing elements contributes to a very flexible system setup.

Generally speaking, an artificial neural network is considered a system (see Figure 1). A system represents a structure that receives an input, processes the data, and provides an output. Normally, the input consists of a data array (vector) that may reflect raw data, an image, a wave, or any other data that can be stored in an array/vector. As an input is presented to the ANN, a corresponding desired/target response is set at the output, and an error is computed based on the delta between the desired and the actual system output.

Depending on the type of net, the error data is fed back into the system, where the system adjusts the net parameters in a systematic manner (based on the learning rule). This process is repeated until the desired output is reached (or an applicable error state is posted). In most cases, the data used as input to an ANN is pre-processed (cleansed). While designing/developing an ANN,

based on the problem at hand, (1) a network type and topology, (2) a transfer and an activation function, and (3) a criterion for finishing the training phase (an error rate) have to be chosen.

The Biological Model

In 1943, McCulloch and Pitts introduced a set of simplified neurons. These neurons were presented as models of biological neurons and became conceptual components of circuits that could perform computational tasks. Hence, the basic model of the artificial neuron is derived from the functionality of the biological neuron (see Figure 2). By definition, neurons are the basic signaling units of the nervous system of a living being, where each neuron is a discrete cell whose several processes originate from its cell body.

Figure 2: Neuron Cells

The biological neuron has four main regions to its structure. The cell body (soma) has two kinds of offshoots, the dendrites and the axon (see Figure 2). The cell body is basically the heart of the cell; it contains the nucleus and maintains protein synthesis. A neuron has many dendrites (a tree structure) to receive signals from other neurons. A single neuron usually has one axon, which extends from the cell body at a region called the axon hillock. The axon's main purpose is to conduct the electrical signals generated at the axon hillock down its length. These signals are called action potentials.

The other end of the axon may split into several branches that end in pre-synaptic terminals. The electrical signals (action potentials) that neurons use to convey the brain's information are all identical; the brain determines the type of information being received from the actual path the signal takes. In other words, the brain analyzes the patterns of the sent signals and, based on that information, interprets the type of information received. Myelin insulates the axon, and the non-insulated parts of the axon are known as the Nodes of Ranvier.

At these nodes, the signal traveling down the axon is regenerated, which ensures that the signal travels down the axon quickly and at constant strength. The synapse is the contact area between two neurons; the two neurons are physically separated by a cleft, and the actual electric signals are transmitted via chemical interaction. The neuron sending the signal is called the pre-synaptic cell, whereas the receiving neuron is labeled the post-synaptic cell.

The electrical signals are generated by the membrane potential, which is based on the differences in concentration of sodium and potassium ions inside and outside the cell membrane. Biological neurons can be classified either by their function or by the quantity of processes they operate with. Classified by processes, three categories emerge: (1) unipolar neurons, which have a single process (their dendrites and axon are located on the same stem; these neurons are found in invertebrates); (2) bipolar neurons, which have two processes (their dendrites and axon are two separate processes); and (3) multipolar neurons, which are commonly found in mammals. Classified by function, three categories emerge as well: (1) sensory neurons, which provide all information for perception and motor coordination; (2) motor neurons, which provide information to muscles and glands; and (3) interneurons, which comprise all other neurons. Interneurons consist of two subclasses: relay or projection interneurons, which are normally found in the brain (connectors), and local interneurons, which are only used in local circuits.

ANN - Mathematical Abstraction

Figure 3: Mathematical Abstraction - A Simple ANN Example

When mapping the biological neuron model onto an ANN model, several key components have to be considered. First, the synapses of the biological neuron are modeled as weights in the ANN. Recall that the synapse of the biological neuron interconnects the neural network and provides the strength of the connection; for an ANN, the weight is a value representing the synapse. A negative weight reflects an inhibitory connection, while a positive value depicts an excitatory connection. Second (in this example), all the inputs are summed together and modified by the weights (a linear combination). Third, an activation function has to be defined that controls the amplitude of the output (see Figure 3).

To illustrate, in many cases an acceptable range of output values may be between 0 and 1, or between -1 and 1. Based on Figure 3, the internal activity of the neuron can be described by vk. Hence,

the output of the neuron yk would therefore be the outcome of some transformation or activation function applied to the value of vk. To simplify the discussion, an ANN basically operates within a framework such as:

1. The input to a neuron arrives as a signal
2. The signal builds up in the cell
3. Ultimately, the cell discharges (the cell fires) through the output (threshold dependent)
4. The cell starts to build up signals again

Activation Functions

Figure 4: Some ANN Activation Functions

An activation or transfer function acts as a transformation entity so that the output of a neuron in an ANN lies between certain values (such as 0 and 1, or -1 and 1; see Figure 4). Some of the more popular activation functions are: (1) the threshold function, which is set to 0 if the summed input is less than a certain threshold value v and to 1 if the summed input is greater than or equal to the threshold value; (2) the piecewise-linear function, which may be set to 0 or 1 but can also take any value in that interval, depending on the amplification factor in a certain region of linear operation; and (3) the sigmoid function, which can range between 0 and 1 (in some models it is also beneficial to use a -1 to 1 range, for which the hyperbolic tangent function is an example).

ANN - Processing Units

To reiterate, an ANN consists of a pool of simple processing units that communicate by sending signals to each other over a large number of weighted connections. Each unit performs a relatively basic task: receive input from neighbors or external sources and utilize that information to compute an output signal that is propagated to other units in the ANN. Next to the actual information processing task, the weights have to be adjusted. An ANN is inherently parallel in the sense that many units can perform their computations simultaneously. ANNs distinguish among three types of units: (1) input units, which receive data from outside the net; (2) output units, which act as the ANN endpoints; and (3) hidden units, whose input and output signals remain within the ANN framework. During operation, units can be updated either synchronously or asynchronously.
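The summed activity vk and the activation functions described above can be sketched in a few lines. This is a hedged illustration: the function names are assumptions made for the example, and the linear region of the piecewise-linear function is fixed at [-0.5, 0.5] for simplicity.

```python
import math

def threshold(v, theta=0.0):
    """Threshold (step) function: 0 below the threshold, 1 at or above it."""
    return 1.0 if v >= theta else 0.0

def piecewise_linear(v):
    """Linear in [-0.5, 0.5], clipped to the interval [0, 1] outside it."""
    return min(1.0, max(0.0, v + 0.5))

def sigmoid(v):
    """Logistic sigmoid: a smooth, S-shaped squashing of v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, activation=sigmoid):
    """v_k is the weighted sum of the inputs; y_k applies the activation."""
    v_k = sum(x * w for x, w in zip(inputs, weights))
    return activation(v_k)
```

With `activation=threshold`, the unit "fires" (outputs 1) only when the weighted sum reaches the threshold, mirroring the four-step framework above.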

ANN - Topologies

In general, ANN solutions can be classified as:

Feed-Forward ANNs. In such an ANN solution, the data moves from the input to the output units in a strictly feed-forward manner. The data processing may span multiple layers, but no feedback connections are implemented. Examples of feed-forward ANNs are Perceptron (Rosenblatt) and Adaline (Adaptive Linear Neuron) based nets.

Recurrent ANNs. These types of ANNs incorporate feedback connections. Compared to feed-forward ANNs, the dynamic properties of the network are paramount. In some circumstances, the activation values of the units undergo a relaxation process so that the network evolves into a stable state in which these activation values no longer change. Examples of recurrent ANNs are Kohonen (SOM) and Hopfield based solutions.

ANN - Training

An ANN has to be designed and implemented in such a way that a given set of input data produces the desired output (either directly or via a relaxation process). Several methods to quantify the strengths of the connections can be applied: the weights can be set explicitly (utilizing a priori knowledge), or the net can be trained by feeding learning patterns into the solution and letting the net adjust the weights according to some learning rule. Learning based solutions can be categorized as:

Supervised or associative learning. The net is trained by presenting matching input and output patterns (learning by example). These input/output pairs may be provided either by an external teaching component or by the net itself (a self-supervised approach).

Unsupervised learning (the self-organizing paradigm). An output unit is trained to respond to clusters of patterns within the input framework (only inputs but no output examples are provided). In this paradigm, the system is supposed to discover statistically salient features in the input population. Compared to the supervised learning method, there is no a priori set of categories into which the patterns are to be classified; rather, the system has to develop its own representation of the input stimuli. For supervised and unsupervised learning methods alike, many learning rules are variants of the Hebbian learning rule.

Reinforcement learning. The net applies a learning paradigm that is considered an intermediate form of the above two types of learning. In this method, the learning machine executes some action on the environment and, as a result, receives some feedback/response. The learning component grades its action (as either good or bad) based on the environmental response and adjusts its parameters accordingly. Generally speaking, the parameter adjustment process continues until an equilibrium state is reached in which no further adjustments are necessary.

Summary

An ANN represents a system of simple processing elements (neurons) that can exhibit complex, global behavior, determined by the connections among the processing elements and by the element parameters. Neural networks offer a number of advantages, including the ability to implicitly detect complex, nonlinear relationships between dependent and independent variables, the ability to detect all possible interactions between predictor variables, and the availability of multiple training algorithms. Disadvantages of an ANN solution may include the black-box nature of these systems and the greater computational burden placed on the hardware infrastructure available for the analysis. Nevertheless,

ANN based solutions have provided excellent results and insights in very complex forecasting, data-mining, task-scheduling, and optimized resource allocation problems.