TRAINING A LIMITED-INTERCONNECT, SYNTHETIC NEURAL IC



M.R. Walker, S. Haghighi, A. Afghan, and L.A. Akers
Center for Solid State Electronics Research
Arizona State University, Tempe, AZ 85287-6206
mwalker@enuxha.eas.asu.edu

ABSTRACT

Hardware implementation of neuromorphic algorithms is hampered by high degrees of connectivity. Functionally equivalent feedforward networks may be formed by using limited fan-in nodes and additional layers, but this complicates procedures for determining weight magnitudes. No direct mapping of weights exists between fully and limited-interconnect nets. Low-level nonlinearities prevent the formation of internal representations of widely separated spatial features, and the use of gradient descent methods to minimize output error is hampered by error magnitude dissipation. The judicious use of linear summations, or collection units, is proposed as a solution.

HARDWARE IMPLEMENTATIONS OF FEEDFORWARD, SYNTHETIC NEURAL SYSTEMS

The pursuit of hardware implementations of artificial neural network models is motivated by the need to develop systems which are capable of executing neuromorphic algorithms in real time. The most significant barrier is the high degree of connectivity required between the processing elements. Current interconnect technology does not support the direct implementation of large-scale arrays of this type. In particular, the high fan-ins and fan-outs of biology impose connectivity requirements such that the electronic implementation of a highly interconnected biological neural network of just a few thousand neurons would require a level of connectivity which exceeds the current or even projected interconnection density of ULSI systems (Akers et al., 1988). Highly layered, limited-interconnect architectures are, however, especially well suited for VLSI implementation. In previous work, we analyzed the generalization and fault-tolerance characteristics of a limited-interconnect perceptron architecture applied in three simple mappings between binary input space and binary output space, and proposed a CMOS architecture (Akers and Walker, 1988). This paper concentrates on developing an understanding of the limitations imposed on layered neural network architectures by hardware implementation, and on a proposed solution.
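To put rough numbers on the wiring argument, the following sketch (illustrative only; the layer sizes are assumptions, not figures from the paper) counts the wires in a single fully connected layer against a layered reduction built from fan-in-2 elements:

    # Illustrative sketch: compare the wiring cost of one fully connected
    # layer with an equivalent-width tree of limited fan-in nodes.

    def full_connections(n_inputs: int, n_outputs: int) -> int:
        """Every output unit receives a wire from every input."""
        return n_inputs * n_outputs

    def limited_fanin_connections(n_inputs: int, fan_in: int = 2) -> int:
        """Wires in a tree of fan-in-limited nodes reducing n_inputs to 1."""
        total, width = 0, n_inputs
        while width > 1:
            width = -(-width // fan_in)   # ceil division: nodes in next layer
            total += width * fan_in       # each node has fan_in input wires
        return total

    if __name__ == "__main__":
        n = 4096
        print("fully connected:", full_connections(n, n))        # ~16.8M wires
        print("fan-in-2 layers:", limited_fanin_connections(n))  # ~8.2K wires

The quadratic-versus-roughly-linear gap is the motivation for trading interconnect density for additional layers.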

TRAINING CONSIDERATIONS FOR LIMITED-INTERCONNECT FEEDFORWARD NETWORKS

The symbolic layout of the limited fan-in network is shown in Fig. 1. The individual input components are re-arranged to eliminate edge effects. Greater detail on the actual hardware architecture may be found in (Akers and Walker, 1988). As in linear filters, the total number of connections which fan in to a given processing element determines the degrees of freedom available for forming a hypersurface which implements the desired node output function (Widrow and Stearns, 1985). When processing elements with fixed, low fan-in are employed, the effects of reduced degrees of freedom must be considered in order to develop workable training methods which permit generalization of novel inputs. First, no direct or indirect relation exists between weight magnitudes obtained for a limited-interconnect, multilayered perceptron and those obtained for the fully connected case. Networks of these types adapted with identical exemplar sets must therefore form completely different functions on the input space. Second, low-level nonlinearities prevent direct internal coding of widely separated spatial features in the input set. A related problem arises when hyperplane nonlinearities are used: multiple hyperplanes required on a subset of input space are impossible when no two second-level nodes address identical positions in the input space. Finally, adaptation methods like backpropagation, which minimize output error with gradient descent, are hindered because the magnitude of the error is dissipated as it back-propagates through large numbers of hidden layers. The appropriate placement of linear summation elements, or collection units, is a proposed solution.

Figure 1. Symbolic Layout of Limited-Interconnect Feedforward Architecture
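The fan-in restriction amounts to a hard mask on the weight matrix. The sketch below is an illustration of one such layer (the pairing of adjacent inputs and the sigmoid discriminant are assumptions for demonstration, not the paper's hardware):

    # Illustrative sketch: one layer of fan-in-limited sigmoid units, the
    # mask realized by giving each unit only `fan_in` adjacent inputs.
    import numpy as np

    def limited_fanin_layer(x: np.ndarray, weights: np.ndarray,
                            theta: np.ndarray, fan_in: int = 2) -> np.ndarray:
        """Apply one layer of fan-in-limited sigmoid units.

        x       : (n,) input vector
        weights : (n // fan_in, fan_in) one weight row per unit
        theta   : (n // fan_in,) per-unit thresholds
        """
        groups = x.reshape(-1, fan_in)             # adjacent inputs share a unit
        activation = (groups * weights).sum(axis=1) - theta
        return 1.0 / (1.0 + np.exp(-activation))   # sigmoid discriminant

    rng = np.random.default_rng(0)
    x = rng.random(8)
    w = rng.normal(scale=0.1, size=(4, 2))         # small symmetric init
    y = limited_fanin_layer(x, w, theta=np.zeros(4))
    print(y.shape)  # (4,) -- each such layer halves the width

Stacking these layers reproduces the highly layered topology of Fig. 1: width shrinks by the fan-in factor per layer, so depth grows logarithmically in the input width.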

COMPARISON OF WEIGHT VALUES IN FULLY CONNECTED AND LIMITED-INTERCONNECT NETWORKS

Fully connected and limited-interconnect feedforward structures may be functionally equivalent by virtue of identical training sets, but nonlinear node discriminant functions in a fully connected perceptron network are generally not equivalent to those in a limited-interconnect, multilayered network. This may be shown by comparing the Taylor series expansions of the discriminant functions in the vicinity of the threshold for both types and then equating terms of equivalent order. A simple limited-interconnect network is shown in Fig. 2.

Figure 2. Limited-Interconnect Feedforward Network

A discriminant function with a fan-in of two may be represented with the functional form

    y = f(w1 x1 + w2 x2),

where θ is the threshold and the function is assumed to be continuously differentiable. The Taylor series expansion of the discriminant about the threshold is

    f(u) = f(θ) + f'(θ)(u - θ) + (1/2) f''(θ)(u - θ)^2 + ...

Expanding output node three in Fig. 2 to second order,

    y3 = f(θ) + f'(θ)(w5 y1 + w6 y2 - θ) + (1/2) f''(θ)(w5 y1 + w6 y2 - θ)^2,

where f(θ), f'(θ) and f''(θ) are constant terms. Substituting similar expansions for y1 and y2 into y3 yields a second-order polynomial expression in the inputs x1 through x4.

The output node in the fully connected case may also be expanded.

Figure 3. Fully Connected Network

Here the output node has the form

    y3 = f(w1 x1 + w2 x2 + w3 x3 + w4 x4).

Expanding to second order yields an analogous polynomial in the inputs. We seek the necessary and sufficient conditions for the two nonlinear discriminant functions to be analytically equivalent. This is accomplished by comparing terms of equal order in the expansions of each output node in the two nets. Equating the constant terms yields

    w5 = -w6.

Equating the first order terms,

    w5 = w6 = 1 / f'(θ).

Equating the second order terms yields a third constraint on w5 and w6.
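These conditions can be reproduced mechanically. The sketch below is an illustration (assuming the Fig. 2 and Fig. 3 wirings); a0, a1 and a2 stand in for f(θ), f'(θ) and f''(θ)/2, with the threshold absorbed into the expansion point, and sympy reports the coefficient mismatches:

    # Illustrative symbolic check of the term-matching argument. Each
    # printed expression must vanish for analytic equivalence; the
    # resulting constraints cannot all hold with nonzero weights,
    # mirroring the contradiction derived in the text.
    import sympy as sp

    x1, x2, x3, x4 = sp.symbols("x1:5")
    w1, w2, w3, w4, w5, w6 = sp.symbols("w1:7")
    a0, a1, a2 = sp.symbols("a0 a1 a2")  # f(theta), f'(theta), f''(theta)/2

    def f2(u):
        # Second-order Taylor model of the discriminant.
        return a0 + a1 * u + a2 * u**2

    # Limited-interconnect net (Fig. 2): two fan-in-2 hidden nodes feed y3.
    y1 = f2(w1 * x1 + w2 * x2)
    y2 = f2(w3 * x3 + w4 * x4)
    y3_limited = sp.expand(f2(w5 * y1 + w6 * y2))

    # Fully connected net (Fig. 3): the output node sees all four inputs.
    y3_full = sp.expand(f2(w1 * x1 + w2 * x2 + w3 * x3 + w4 * x4))

    diff = sp.Poly(y3_limited - y3_full, x1, x2, x3, x4)
    for monom in (sp.Integer(1), x1, x1 * x3):  # constant, first, second order
        print(monom, "->", sp.factor(diff.coeff_monomial(monom)))

The constant-term mismatch carries a factor (w5 + w6), and once that factor vanishes the first-order mismatch reduces to a1 w5 - 1, reproducing the two conditions displayed above.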

The first two of these conditions are obviously contradictory. In addition, solving for w5 or w6 using the first and second constraints, or the first and third constraints, yields the trivial result w5 = w6 = 0. Thus, no relation exists between discriminant functions occurring in the limited and fully connected feedforward networks. This eliminates the possibility that weights obtained for a fully connected network could be transformed and used in a limited-interconnect structure. More significant is the fact that full and limited-interconnect nets which are adapted with identical sets of exemplars must form completely different functions on the input space, even though they exhibit identical output behavior. For this reason, it is anticipated that the two network types could produce different responses to a novel input.

NON-OVERLAPPING INPUT SUBSETS

Signal routing becomes important for networks in which hidden units do not address identical subsets in the preceding layer. Figure 4 shows an odd-parity algorithm implemented with a limited-interconnect architecture. Large weight magnitudes are indicated by darker lines. Many nodes act as "pass-through" elements, in that they have few dominant input and output connections. These node types are necessary to pass lower-level signals to common aggregation points. In general, the use of limited fan-in processing elements implementing a nonlinear discriminant function decreases the probability that a given correlation within the input data will be encoded, especially if the "width" of the feature set is greater than the fan-in, which forces encoding at a high level within the net. In addition, since lower-level connections determine the magnitudes of upper-level connections in any layered net when backpropagation is used, the set of points in weight space available to a limited-interconnect net for realizing a given function is further reduced by the greater number of weight dependencies occurring in limited-interconnect networks, all of which must be satisfied during training. Finally, since gradient descent is basically a shortcut through an NP-complete search in weight space, reduced redundancy and overlapping of internal representations reduce the probability of convergence to a near-optimal solution on the training set.

Figure 4. Six-input odd parity function implemented with limited-interconnect architecture
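The construction of Fig. 4 can be imitated in a few lines. The sketch below is illustrative only: hard-wired two-input XOR nodes stand in for trained fan-in-2 perceptron units, and pass-through elements forward unpaired signals to the next layer, as in the figure:

    # Illustrative sketch: six-input odd parity from fan-in-2 nodes only,
    # echoing Fig. 4. Odd layer widths are handled by "pass-through"
    # elements that simply forward a signal upward.
    from itertools import product

    def xor_node(a: int, b: int) -> int:
        return a ^ b

    def parity_tree(bits: list[int]) -> int:
        """Reduce the input layer pairwise until one output node remains."""
        layer = bits
        while len(layer) > 1:
            nxt = [xor_node(layer[i], layer[i + 1])
                   for i in range(0, len(layer) - 1, 2)]
            if len(layer) % 2:          # odd width: pass-through element
                nxt.append(layer[-1])
            layer = nxt
        return layer[0]

    # Exhaustive check over all 64 six-bit inputs.
    assert all(parity_tree(list(bits)) == sum(bits) % 2
               for bits in product((0, 1), repeat=6))
    print("odd parity realized with fan-in-2 nodes only")

The point of the exercise is the topology, not the logic: a feature "wider" than the fan-in can only be encoded after several layers of aggregation, which is exactly where the training difficulties described above arise.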

DISSIPATION OF ERROR MAGNITUDE WITH INCREASING NUMBERS OF LAYERS

Following the derivation of backpropagation in (Plaut, 1986), the magnitude change for a weight connecting a processing element in the m-layer with a processing element in the l-layer is given by

    Δw_lm ∝ Σ_{k=1..f} Σ_{j=1..f} ... Σ_{a=1..f} (y_a - d_a) (dy_a/dx_a) w_ab ... (dy_j/dx_j) w_jk (dy_k/dx_k) (dy_l/dx_l) y_m,

where y is the output of the discriminant function, x is the activation level, w is a connection magnitude, d_a is the desired output of output element a, and f is the fan-in of each processing element. If N layers of elements intervene between the m-layer and the output layer, then each of the f^(N-1) terms in the above summation consists of a product of factors of the form

    w_ba (dy_j/dx_j).

If we replace the weight magnitudes and the derivatives in each term with their mean values, each term behaves as [mean(w) · mean(dy/dx)]^(N-1). The value of the first derivative of the sigmoid discriminant function is distributed between 0.0 and 0.5, and the weight values are typically initialized evenly between small positive and negative values. Thus, with more layers, the product of the derivatives occurring in each term approaches zero. The use of large numbers of perceptron layers therefore has the effect of dissipating the magnitude of the error. This is exacerbated by the low fan-in, which reduces the total number of terms in the summation. The use of linear collection units (McClelland, 1986), discussed in the following section, is a proposed solution to this problem.
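The geometric decay is easy to exhibit numerically. The sketch below is illustrative: the logistic sigmoid (whose derivative lies in (0, 0.25], rather than the 0.5 bound quoted above) and the initialization range are assumptions chosen to match the spirit of the text:

    # Illustrative mean-field sketch of error dissipation: the error
    # surviving N intervening layers scales like (mean |w * f'(x)|)**N.
    import numpy as np

    rng = np.random.default_rng(0)

    def mean_backprop_factor(n_samples: int = 100_000) -> float:
        """Average magnitude of one per-layer factor w * f'(x)."""
        w = rng.uniform(-0.1, 0.1, n_samples)        # small symmetric weights
        x = rng.normal(0.0, 1.0, n_samples)          # pre-activation levels
        fprime = np.exp(-x) / (1.0 + np.exp(-x))**2  # logistic f', in (0, 0.25]
        return float(np.mean(np.abs(w * fprime)))

    factor = mean_backprop_factor()
    for n_layers in (1, 2, 4, 8, 16):
        # Error magnitude surviving after back-propagating through n_layers.
        print(f"{n_layers:2d} layers: ~{factor ** n_layers:.2e}")

Each added layer multiplies the back-propagated error by a factor well below one, so the gradient available to the lowest layers vanishes rapidly with depth.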

LINEAR COLLECTION UNITS

As shown in Fig. 5, the output of the limited-interconnect net employing collection units is given by a function of the form

    y3 = f( w5 c1 (w1 x1 + w2 x2) + w6 c2 (w3 x3 + w4 x4) ),

where c1 and c2 are constants.

Figure 5. Limited-interconnect network employing linear summations (legend: linear summation and nonlinear discriminant elements)

The position of the summations may be determined by using Euclidean k-means clustering on the exemplar set to locate cluster centers and determine their widths a priori (Duda and Hart, 1973). The cluster members would be combined using linear elements until they reached a nonlinear discriminant, located higher in the net and at the cluster center. With this arrangement, weights obtained for a fully connected net could be mapped into the limited-interconnect network using a linear transformation. Alternatively, backpropagation could be used, since error dissipation would be reduced by setting the linear constant c of the summation elements to arbitrarily large values.

CONCLUSIONS

No direct transformation of weights exists between fully and limited-interconnect nets which employ nonlinear discriminant functions. The use of gradient descent methods to minimize output error is hampered by error magnitude dissipation. In addition, low-level nonlinearities prevent the formation of internal representations of widely separated spatial features. The use of strategically placed linear summations, or collection units, is proposed as a means of overcoming difficulties in determining weight values in limited-interconnect perceptron architectures. K-means clustering is proposed as the method for determining placement.

References

L.A. Akers, M.R. Walker, D.K. Ferry & R.O. Grondin, "Limited Interconnectivity in Synthetic Neural Systems," in R. Eckmiller and C. v.d. Malsburg, eds., Neural Computers, Springer-Verlag, 1988.

L.A. Akers & M.R. Walker, "A Limited-Interconnect Synthetic Neural IC," Proceedings of the IEEE International Conference on Neural Networks, p. II-151, 1988.

B. Widrow & S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, 1985.

D.C. Plaut, S.J. Nowlan & G.E. Hinton, "Experiments on Learning by Back Propagation," Carnegie-Mellon University, Dept. of Computer Science Technical Report, June 1986.

J.L. McClelland, "Resource Requirements of Standard and Programmable Nets," in D.E. Rumelhart and J.L. McClelland, eds., Parallel Distributed Processing, Volume 1: Foundations, MIT Press, 1986.

R.O. Duda & P.E. Hart, Pattern Classification and Scene Analysis, Wiley, 1973.