# On the Computational Power of Winner-Take-All


NOTE Communicated by John Lazzaro

**Wolfgang Maass**
Institute for Theoretical Computer Science, Technische Universität Graz, A-8010 Graz, Austria

This article initiates a rigorous theoretical analysis of the computational power of circuits that employ modules for computing winner-take-all. Computational models that involve competitive stages have so far been neglected in computational complexity theory, although they are widely used in computational brain models, artificial neural networks, and analog VLSI. Our theoretical analysis shows that winner-take-all is a surprisingly powerful computational module in comparison with threshold gates (also referred to as McCulloch-Pitts neurons) and sigmoidal gates. We prove an optimal quadratic lower bound for computing winner-take-all in any feedforward circuit consisting of threshold gates. In addition, we show that arbitrary continuous functions can be approximated by circuits employing a single soft winner-take-all gate as their only nonlinear operation. Our theoretical analysis also provides answers to two basic questions raised by neurophysiologists in view of the well-known asymmetry between excitatory and inhibitory connections in cortical circuits: how much computational power of neural networks is lost if only positive weights are employed in weighted sums, and how much adaptive capability is lost if only the positive weights are subject to plasticity.

## 1 Introduction

Computational models that involve competitive stages are widely used in computational brain models, artificial neural networks, and analog VLSI (see Arbib, 1995). The simplest competitive computational module is a hard winner-take-all gate that computes a function WTA_n : R^n → {0, 1}^n whose output ⟨b_1, ..., b_n⟩ = WTA_n(x_1, ..., x_n) satisfies

    b_i = 1, if x_i > x_j for all j ≠ i,
    b_i = 0, if x_j > x_i for some j ≠ i.
Thus, in the case of pairwise different inputs x_1, ..., x_n, a single output bit b_i has value 1, and it marks the position of the largest input x_i.¹

¹ Different conventions are considered in the literature in case there is no unique winner x_i, but we need not specify any convention in this article, since our lower-bound result for WTA_n holds for all of these versions.

Neural Computation 12 (2000) © 2000 Massachusetts Institute of Technology
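As a concrete reference, the input-output behavior of a hard winner-take-all gate can be sketched in a few lines of Python (an illustration, not part of the original article; the tie-handling convention here, all outputs 0 when there is no unique winner, is one of several used in the literature, as noted above):

```python
def wta(xs):
    """Hard winner-take-all WTA_n: bit b_i = 1 iff x_i is the unique maximum."""
    m = max(xs)
    b = [0] * len(xs)
    if xs.count(m) == 1:   # pairwise-different case: a unique winner exists
        b[xs.index(m)] = 1
    return b
```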

In this article we also investigate the computational power of two common variations of winner-take-all: k-winner-take-all, where the ith output b_i has value 1 if and only if x_i is among the k largest inputs, and soft winner-take-all, where the ith output is an analog variable r_i whose value reflects the rank of x_i among the input variables. Winner-take-all is ubiquitous as a computational module in computational brain models, especially in models involving computational mechanisms for attention (Niebur & Koch, 1998). Biologically plausible models for computing winner-take-all in biological neural systems exist under both coding assumptions: if the analog inputs x_i are encoded through firing rates, the most frequently firing neuron exerts the strongest inhibition on its competitors and thereby stops them from firing after a while; if the analog inputs x_i are encoded through the temporal delays of single spikes, the earliest-firing neuron (which encodes the largest x_i) inhibits its competitors before they can fire (Thorpe, 1990). We would like to point to another link between results of this article and computational neuroscience. There exists a notable difference between the computational role of weights of different signs in artificial neural network models on one hand, and anatomical and physiological data regarding the interplay of excitatory and inhibitory neural inputs in biological neural systems on the other hand (see Abeles, 1991, and Shepherd, 1998). Virtually any artificial neural network model is based on an assumed symmetry between positive and negative weights. Typically, weights of either sign occur on an equal footing as coefficients in weighted sums that represent the input to an artificial neuron. In contrast, there exists a strong asymmetry regarding positive (excitatory) and negative (inhibitory) inputs to biological neurons.
At most 15% of neurons in the cortex are inhibitory neurons, and these are the only neurons that can exert a negative (inhibitory) influence on the activity of other neurons. Furthermore, the location and structure of their synaptic connections to other neurons differ drastically from those formed by excitatory neurons. Inhibitory neurons are usually connected just to neurons in their immediate vicinity, and it is not clear to what extent their synapses are changed through learning. Furthermore, the location of their synapses on the target neurons (rarely on spines, often close to the soma, frequently with multiple synapses on the target neuron) suggests that their computational function is not symmetric to that of excitatory synapses. Rather, such data would support a conjecture that their impact on other neurons is more of a local regulatory nature, and that inhibitory neurons do not function as computational units per se like the (excitatory) pyramidal neurons, whose synapses are subject to fine-tuning by various learning mechanisms. These observations from anatomy and neurophysiology have given rise to the question of whether a quite different style of neural circuit design may be feasible, which achieves sufficient computational power without requiring symmetry between excitatory and inhibitory connections.

whether there are practically important computational tasks that can be implemented substantially more efficiently on a circuit with lateral connections than on a strictly feedforward circuit. In this article, we address this issue in the following manner: we view modules that compute winner-take-all (whose implementation usually involves lateral connections) as black boxes of which we only model their input-output behavior. We refer to these modules as winner-take-all gates in the following. We study possible computational uses of such winner-take-all gates in circuits where these gates (and other gates) are wired together in a feedforward fashion. We charge one unit of time for the operation of each gate. We will show that in this framework, the new modules, which may internally involve lateral connections, do in fact add substantial computational power to a feedforward circuit.

The arguably most powerful gates that have previously been studied in computational complexity theory are threshold gates (also referred to as McCulloch-Pitts neurons or perceptrons; see Minsky & Papert, 1969; Siu, Roychowdhury, & Kailath, 1995) and sigmoidal gates (which may be viewed as soft versions of threshold gates). A threshold gate with weights α_1, ..., α_n ∈ R and threshold Θ ∈ R computes the function G: R^n → {0, 1} defined by

    G(x_1, ..., x_n) = 1  ⇔  Σ_{i=1}^n α_i x_i ≥ Θ.

Note that AND and OR of n bits as well as NOT are special cases of threshold gates. A threshold circuit (also referred to as a multilayer perceptron) is a feedforward circuit consisting of threshold gates. The depth of a threshold circuit C is the maximal length of a directed path from an input node to an output gate. The circuit is called layered if all such paths have the same length. Note that a circuit with k hidden layers has depth k + 1 in this terminology. Except for theorem 1, we will discuss in this article only circuits with a single output.
The other results can be extended to networks with several outputs by duplicating the network so that each output variable is formally computed by a separate network. We will prove in section 2 that WTA_n is a rather expensive computational operation from the point of view of threshold circuits, since any such circuit needs quadratically in n many gates to compute WTA_n. We will show in section 3 that the full computational power of two layers of threshold gates can be achieved by a single k-winner-take-all gate applied to positive weighted sums of the input variables. Furthermore, we will show in section 4 that by replacing the single k-winner-take-all gate by a single soft winner-take-all gate, these extremely simple circuits become universal approximators for arbitrary continuous functions. We examine computations in such a circuit for only a single batch-input, not for streams of varying inputs. Hence, it does not matter for this analysis whether one implements a winner-take-all gate by a circuit that needs to be reinitialized before the next input, or by a circuit that immediately responds to changes in its input.
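For readers who prefer executable definitions, a threshold gate and the AND/OR/NOT special cases mentioned above can be sketched as follows (a minimal illustration under the definitions just given, not part of the original article):

```python
def threshold_gate(weights, theta):
    """Threshold gate: G(x_1,...,x_n) = 1  <=>  sum_i alpha_i * x_i >= theta."""
    return lambda xs: 1 if sum(w * x for w, x in zip(weights, xs)) >= theta else 0

AND2 = threshold_gate([1, 1], 2)    # 1 iff x1 + x2 >= 2
OR2  = threshold_gate([1, 1], 1)    # 1 iff x1 + x2 >= 1
NOT1 = threshold_gate([-1], 0)      # 1 iff -x1 >= 0, i.e. the bit x1 is 0
```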

## 2 An Optimal Quadratic Lower Bound for Hard Winner-Take-All

We show in this section that any feedforward circuit consisting of threshold gates needs quadratically in n many gates to compute WTA_n. This result also implies a lower bound for any circuit of threshold gates involving lateral and/or recurrent connections that computes WTA_n (provided one assumes that each gate receives at time t only outputs from other gates that were computed before time t). One can simulate any circuit with s gates whose computation takes t discrete time steps by a feedforward circuit with s · t gates. Hence, for example, if there exists some circuit consisting of O(n) threshold gates with lateral and recurrent connections that computes WTA_n in t discrete time steps, then WTA_n can be computed by a feedforward circuit consisting of O(t · n) threshold gates. Therefore theorem 1 implies that WTA_n cannot be computed in sublinear time by any linear-size circuit consisting of threshold gates with arbitrary (i.e., feedforward, lateral, and recurrent) connections. If linear-size implementations of (approximations of) WTA_n can be built in analog VLSI whose computation time grows sublinearly in n, then this negative result would provide theoretical evidence for the superior computational capabilities of analog circuits with lateral connections.

For n = 2 it is obvious that WTA_2 can be computed by a threshold circuit of size 2, and that this size is optimal. For n ≥ 3 the most straightforward design of a (feedforward) threshold circuit C that computes WTA_n uses (n choose 2) + n threshold gates. For each pair i, j with 1 ≤ i < j ≤ n, one employs a threshold gate G_ij that outputs 1 if and only if x_j ≥ x_i. The ith output b_i of WTA_n for a circuit input x is computed by a threshold gate G_i with

    G_i(x) = 1  ⇔  Σ_{j<i} G_ji(x) − Σ_{j>i} G_ij(x) ≥ i − 1.
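This straightforward construction is easy to check in code. The following sketch (an illustration of the design just described; 0-indexed, so the output gate for b_i compares against threshold i rather than i − 1) builds the n(n − 1)/2 comparator gates and the n output gates explicitly:

```python
def wta_threshold_circuit(xs):
    """WTA_n via the straightforward two-layer design: comparator gates
    G_ij (i < j, output 1 iff x_j >= x_i) feed n output threshold gates."""
    n = len(xs)
    G = {(i, j): 1 if xs[j] >= xs[i] else 0       # the n(n-1)/2 gates G_ij
         for i in range(n) for j in range(i + 1, n)}
    # Output gate for b_i fires iff sum_{j<i} G_ji - sum_{j>i} G_ij >= i
    return [1 if sum(G[(j, i)] for j in range(i))
                 - sum(G[(i, j)] for j in range(i + 1, n)) >= i else 0
            for i in range(n)]
```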
This circuit design appears to be suboptimal, since most of its threshold gates (the (n choose 2) gates G_ij) do not make use of their capability to evaluate weighted sums of many variables with arbitrary weights from R. However, the following result shows that no feedforward threshold circuit (not even with an arbitrary number of layers, and threshold gates of arbitrary fan-in with arbitrary real weights) can compute WTA_n with fewer than (n choose 2) + n gates.

**Theorem 1.** Assume that n ≥ 3 and WTA_n is computed by some arbitrary feedforward circuit C consisting of threshold gates with arbitrary weights. Then C consists of at least (n choose 2) + n threshold gates.

**Proof.** Let C be any threshold circuit that computes WTA_n for all x = ⟨x_1, ..., x_n⟩ ∈ R^n with pairwise different x_1, ..., x_n. We say that a threshold gate in the circuit C contains an input variable x_k if there exists a direct wire (i.e., an edge in the directed graph underlying C) from the kth input node to this gate, and the kth input variable x_k occurs in the weighted sum of

this threshold gate with a weight ≠ 0. This threshold gate may also receive outputs from other threshold gates as part of its input, since we do not assume that C is a layered circuit. Fix any i, j ∈ {1, ..., n} with i ≠ j. We will show that there exists a gate G_ij in C that contains the input variables x_i and x_j, but no other input variables. The proof proceeds in four steps:

Step 1. Choose h ∈ (0.6, 0.9) and ρ ∈ (0, 0.1) such that no gate G in C that contains the input variable x_j, but no other input variable, changes its output value when x_j varies over (h − ρ, h + ρ), no matter which fixed binary values a have been assigned to the other inputs of gate G. For any fixed a there exists at most a single value t_a for x_j at which G changes its output when x_j varies from −∞ to +∞. It suffices to choose h and ρ so that none of these finitely many values t_a (for arbitrary binary a and arbitrary gates G) falls into the interval (h − ρ, h + ρ).

Step 2. Choose a closed ball B ⊆ (0, 0.5)^{n−2} with a center c ∈ R^{n−2} and a radius δ > 0 so that no gate G in C changes its output when we set x_i = x_j = h and let the vector of the other n − 2 input variables vary over B (this is required to hold for any fixed binary values of the inputs that G may receive from other threshold gates). We exploit here that for fixed x_i = x_j = h, the gates in C (with inputs from other gates replaced by all possible binary values) partition (0, 0.5)^{n−2} into finitely many sets S, each of which can be written as an intersection of half-spaces, so that no gate in C changes its output when we fix x_i = x_j = h and let the other n − 2 input variables of the circuit vary over S (while keeping inputs to C from other gates artificially fixed).

Step 3.
Choose γ ∈ (0, ρ) so that in every gate G in C that contains, besides x_j, some other input variable x_k with k ∉ {i, j}, a change of x_j by an amount γ causes a smaller change of the weighted sum at this threshold gate G than a change by an amount δ of any of the input variables x_k with k ∉ {i, j} that it contains. This property can be satisfied because we can choose for γ an arbitrarily small positive number.

Step 4. Set x_i := h, ⟨x_k⟩_{k ∉ {i,j}} := c, and let x_j vary over [h − γ, h + γ]. By assumption, the output of C changes, since the output of WTA_n(x_1, ..., x_n) changes (x_i is the winner for x_j < h, x_j is the winner for x_j > h). Let G_ij be some gate in C that changes its output, whereas no gate that lies on a path from an input variable to G_ij changes its output. Hence, G_ij contains the input variable x_j. The choice of h and ρ > γ in step 1 implies that G_ij contains, besides x_j, some other input variable. Assume for a contradiction that G_ij contains an input variable x_k with k ∉ {i, j}. By the choice of γ in step 3,

this implies that the output of G_ij changes when we set x_i = x_j = h and move x_k by an amount up to δ from its value in c, while keeping all other input variables and the inputs that G_ij receives from other threshold gates fixed (even if some preceding threshold gates in C would change their output in response to this change in the input variable x_k). This yields a contradiction to the definition of the ball B in step 2. Thus we have shown that k ∈ {i, j} for any input variable x_k that G_ij contains. Therefore G_ij contains exactly the input variables x_i and x_j.

So far we have shown that for any i, j ∈ {1, ..., n} with i ≠ j, there exists a gate G_ij in C that contains the two input variables x_i, x_j, and no other input variables. It remains to be shown that apart from these (n choose 2) gates G_ij, the circuit C contains n other gates G_1, ..., G_n that compute the n output bits b_1, ..., b_n of WTA_n. It is impossible that G_k = G_l for some k ≠ l, since b_k ≠ b_l for some arguments of WTA_n. Assume for a contradiction that G_k = G_ij for some k, i, j ∈ {1, ..., n} with i ≠ j. We have k ≠ i or k ≠ j. Assume without loss of generality that k ≠ i. Let l be any number in {1, ..., n} − {k, i}. Assign to x_1, ..., x_n some pairwise different values a_1, ..., a_n so that a_l > a_{l′} for all l′ ≠ l. Since l ≠ k and i ≠ k, we have that for any x_i ∈ R, the kth output variable b_k of WTA_n(a_1, ..., a_{i−1}, x_i, a_{i+1}, ..., a_n) has value 0 (a_k cannot be the maximal argument for any value of x_i, since a_l > a_k). On the other hand, since G_ij contains the variable x_i, there exist values for x_i (let x_i → +∞ or x_i → −∞) so that G_ij outputs 1, no matter which binary values are assigned to the inputs that G_ij receives in C from other threshold gates (while we keep x_j fixed at value a_j). This implies that G_ij does not output b_k in circuit C; that is, G_k ≠ G_ij.
Therefore the circuit C contains, in addition to the (n choose 2) gates G_ij, n other gates that provide the circuit outputs b_1, ..., b_n.

## 3 Simulating Two Layers of Threshold Gates with a Single k-Winner-Take-All Gate as the Only Nonlinearity

A popular variation of winner-take-all (which has also been implemented in analog VLSI; Urahama & Nagao, 1995) is k-winner-take-all. The output of k-winner-take-all indicates for each input variable x_i whether x_i is among the k largest inputs. Formally, we define for any k ∈ {1, ..., n} the function k-WTA_n : R^n → {0, 1}^n, where k-WTA_n(x_1, ..., x_n) = ⟨b_1, ..., b_n⟩ has the property that

    b_i = 1  ⇔  x_j > x_i holds for at most k − 1 indices j.

We will show in this section that a single gate that computes k-WTA_n can absorb all nonlinear computational operations of any two-layer threshold circuit with one output gate and any number of hidden gates.
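The definition of k-WTA_n translates directly into a functional sketch (again just an illustration of the input-output behavior, not the article's own code):

```python
def k_wta(xs, k):
    """k-winner-take-all: b_i = 1 iff at most k-1 inputs are strictly larger
    than x_i, i.e. x_i is among the k largest inputs."""
    return [1 if sum(1 for xj in xs if xj > xi) <= k - 1 else 0 for xi in xs]
```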

**Theorem 2.** Any two-layer feedforward circuit C (with m analog or binary input variables and one binary output variable) consisting of threshold gates can be simulated by a circuit consisting of a single k-winner-take-all gate k-WTA_n applied to n weighted sums of the input variables with positive weights, except for some set S ⊆ R^m of inputs that has measure 0. In particular, any boolean function f: {0, 1}^m → {0, 1} can be computed by a single k-winner-take-all gate applied to weighted sums of the input bits. If C has polynomial size and integer weights whose size is bounded by a polynomial in m, then n can be bounded by a polynomial in m, and all weights in the simulating circuit are natural numbers whose size is bounded by a polynomial in m.

**Remark 1.** The exception set of measure 0 in this result is a union of up to n hyperplanes in R^m. This exception set is apparently of no practical significance, since for any given finite set D (not just for D ⊆ {0, 1}^m but, for example, for D := {⟨z_1, ..., z_m⟩ ∈ R^m : each z_i is a rational number with bitlength ≤ 1000}), one can move these hyperplanes (by adjusting the constant terms in their definitions) so that no point in D belongs to the exception set. Hence the k-WTA circuit can simulate the given threshold circuit C on any finite set D, with an architecture that is independent of D. On the other hand, the proof of the lower-bound result from theorem 1 requires that the circuit C computes WTA everywhere, not just on a finite set.

**Remark 2.** One can easily show that the exception set S of measure 0 in theorem 2 is necessary: the set of inputs z ∈ R^m for which a k-WTA_n gate applied to weighted sums outputs 1 is always a closed set, whereas the set of inputs for which a circuit C of depth 2 consisting of threshold gates outputs 1 can be an open set. Hence in general, a circuit C of the latter type cannot be simulated for all inputs z ∈ R^m by a k-WTA_n gate applied to weighted sums.

**Corollary 1.**
Any layered threshold circuit C of arbitrary even depth d can be simulated, for all inputs except for a set of measure 0, by a feedforward circuit consisting of l k-WTA gates on d/2 layers (each applied to positive weighted sums of circuit inputs on the first layer, and of outputs from preceding k-WTA gates on the subsequent layers), where l is the number of gates on even-numbered layers in C. Thus, one can view the simulating circuit as a d-layer circuit with alternating layers of linear and k-WTA gates. Alternatively, one can simulate the layers 3 to d of circuit C by a single k-WTA gate applied to a (possibly very large) number of linear gates. This yields a simulation of C (except for some input set S of measure 0) by a four-layer circuit with alternating layers of linear gates and k-WTA gates. The number of k-WTA gates in this circuit can be bounded by l + 1, where l is the number of threshold gates on layer 2 of C.

**Proof of Theorem 2.** Since the outputs of the gates on the hidden layer of C are from {0, 1}, we can assume without loss of generality that the weights α_1, ..., α_n of the output gate G of C are from {−1, 1}. (See, e.g., Siu et al., 1995, for details. One first observes that it suffices to use integer weights for threshold gates with binary inputs; one can then normalize these weights to values in {−1, 1} by duplicating gates on the hidden layer of C.) Thus, for any z ∈ R^m, we have

    C(z) = 1  ⇔  Σ_{j=1}^n α_j Ĝ_j(z) ≥ Θ,

where Ĝ_1, ..., Ĝ_n are the threshold gates on the hidden layer of C, α_1, ..., α_n are from {−1, 1}, and Θ is the threshold of the output gate G. In order to eliminate the negative weights in G, we replace each gate Ĝ_j for which α_j = −1 by another threshold gate G_j so that G_j(z) = 1 − Ĝ_j(z) for all z ∈ R^m except on some hyperplane. We utilize here that ¬(Σ_{i=1}^m w_i z_i ≥ Θ) ⇔ Σ_{i=1}^m (−w_i) z_i > −Θ for arbitrary w_i, z_i, Θ ∈ R. We set G_j := Ĝ_j for all j ∈ {1, ..., n} with α_j = 1. Then we have for all z ∈ R^m, except for z from some exception set S consisting of up to n hyperplanes,

    Σ_{j=1}^n α_j Ĝ_j(z) = Σ_{j=1}^n G_j(z) − |{j ∈ {1, ..., n}: α_j = −1}|.

Hence C(z) = 1 ⇔ Σ_{j=1}^n G_j(z) ≥ k for all z ∈ R^m − S, for some suitable k ∈ N. Let w_1^j, ..., w_m^j ∈ R be the weights and Θ_j ∈ R the threshold of gate G_j, j = 1, ..., n. Thus, G_j(z) = 1 ⇔ Σ_{i: w_i^j > 0} w_i^j z_i − Σ_{i: w_i^j < 0} |w_i^j| z_i ≥ Θ_j. Hence with

    S_j := Σ_{i: w_i^j < 0} |w_i^j| z_i + Θ_j + Σ_{l ≠ j} Σ_{i: w_i^l > 0} w_i^l z_i   for j = 1, ..., n

and

    S_{n+1} := Σ_{j=1}^n Σ_{i: w_i^j > 0} w_i^j z_i,

we have for every j ∈ {1, ..., n} and every z ∈ R^m:

    S_{n+1} ≥ S_j  ⇔  Σ_{i: w_i^j > 0} w_i^j z_i − Σ_{i: w_i^j < 0} |w_i^j| z_i ≥ Θ_j  ⇔  G_j(z) = 1.

This implies that the (n + 1)st output b_{n+1} of the gate (n − k + 1)-WTA_{n+1} applied to S_1, ..., S_{n+1} satisfies

    b_{n+1} = 1  ⇔  |{j ∈ {1, ..., n + 1}: S_j > S_{n+1}}| ≤ n − k
             ⇔  |{j ∈ {1, ..., n + 1}: S_{n+1} ≥ S_j}| ≥ k + 1
             ⇔  |{j ∈ {1, ..., n}: S_{n+1} ≥ S_j}| ≥ k
             ⇔  Σ_{j=1}^n G_j(z) ≥ k  ⇔  C(z) = 1.
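To make this construction concrete, the following sketch (a hypothetical illustration; hidden gates are given as dicts of positive and negative weight magnitudes plus a threshold, with the ±1 output weights assumed already normalized away) builds the sums S_1, ..., S_{n+1} and reads the circuit output off bit n + 1 of an (n − k + 1)-winner-take-all gate. It is checked below on a two-layer threshold circuit for XOR:

```python
def k_wta(xs, k):
    # hard k-winner-take-all as defined in the text
    return [1 if sum(1 for xj in xs if xj > xi) <= k - 1 else 0 for xi in xs]

def simulate_by_kwta(gates, k, z):
    """Theorem-2 construction: C(z) = 1 iff at least k hidden threshold
    gates fire.  Each gate j is (pos, neg, theta), firing iff
    sum(pos*z) - sum(neg*z) >= theta, with pos/neg nonnegative weight dicts."""
    n = len(gates)
    # S_{n+1}: sum of all positive-weight terms of all gates
    S_top = sum(w * z[i] for pos, _, _ in gates for i, w in pos.items())
    S = []
    for j, (pos, neg, theta) in enumerate(gates):
        s = sum(w * z[i] for i, w in neg.items()) + theta
        s += sum(w * z[i] for l, (p, _, _) in enumerate(gates) if l != j
                 for i, w in p.items())
        S.append(s)                      # S_{n+1} >= S_j  <=>  gate j fires
    S.append(S_top)
    return k_wta(S, n - k + 1)[n]        # output bit n+1

# XOR as a two-layer threshold circuit: output 1 iff both hidden gates fire
xor_gates = [({0: 1, 1: 1}, {}, 1),      # fires iff  x1 + x2 >= 1
             ({}, {0: 1, 1: 1}, -1)]     # fires iff -x1 - x2 >= -1
```

Note that, as in the proof, all coefficients of the input variables in the sums S_1, ..., S_{n+1} are positive; only the constant terms may be negative.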

Note that all the coefficients in the sums S_1, ..., S_{n+1} are positive.

**Proof of Corollary 1.** Apply the preceding construction separately to the first two layers of C, then to the second two layers of C, and so on. Note that the later layers of C receive just boolean inputs from the preceding layer. Hence one can simulate the computation of layers 3 to d also by a two-layer threshold circuit, which requires just a single k-WTA gate for its simulation in our approach from theorem 2.

## 4 Soft Winner-Take-All Applied to Positive Weighted Sums Is a Universal Approximator

We consider in this section a soft version soft-WTA of a winner-take-all gate. Its output ⟨r_1, ..., r_n⟩ for real-valued inputs x_1, ..., x_n consists of n analog numbers r_i whose values reflect the relative position of x_i within the ordering of x_1, ..., x_n according to their size. Soft versions of winner-take-all are also quite plausible as a computational function of cortical circuits with lateral inhibition. Efficient implementations in analog VLSI are still elusive, but in Indiveri (in press), an analog VLSI circuit for a version of soft winner-take-all has been presented where the time when the ith output unit is activated reflects the rank of x_i among the inputs. We show in this section that single gates from a fairly large class of soft winner-take-all gates can serve as the only nonlinearity in universal approximators for arbitrary continuous functions. The only other computational operations needed are weighted sums with positive weights. We start with a basic version of a soft winner-take-all gate soft-WTA_{n,k}, for natural numbers k, n with k ≤ n, which computes the following function ⟨x_1, ..., x_n⟩ ↦ ⟨r_1, ..., r_n⟩ from R^n into [0, 1]^n:

    r_i := π( (|{j ∈ {1, ..., n}: x_i ≥ x_j}| − n/2) / k ),

where π: R → [0, 1] is the piecewise linear function defined by

    π(x) = 1 if x > 1,   π(x) = x if 0 ≤ x ≤ 1,   π(x) = 0 if x < 0.
If one implements a soft winner-take-all gate via lateral inhibition, one can expect that its ith output r_i is lowered (down from 1) by every competitor x_j that wins the competition with x_i (i.e., x_j > x_i). Furthermore, one can expect that the ith output r_i is completely annihilated (i.e., is set equal to 0) once the number of competitors x_j that win the competition with x_i reaches a certain critical value c. This intuition has served as a guiding principle in the previously described formalization. However, for the sake of mathematical

simplicity, we have defined the outputs r_i of soft-WTA_{n,k} not in terms of the number of competitors x_j that win over x_i (i.e., x_j > x_i), but rather in terms of the number of competitors x_j that do not win over x_i (i.e., x_i ≥ x_j). Obviously one has |{j ∈ {1, ..., n}: x_i ≥ x_j}| = n − |{j ∈ {1, ..., n}: x_j > x_i}|. Hence the value of r_i goes down if more competitors x_j win over x_i, as desired. The critical number c of winning competitors that suffices to annihilate r_i is formalized through the threshold n/2. Our subsequent result shows that in principle, it suffices to set the annihilation threshold always equal to n/2. But in applications, one may, of course, choose other values for the annihilation threshold.

It is likely that an analog implementation of a soft winner-take-all gate will not be able to produce an output that is really linearly related to |{j ∈ {1, ..., n}: x_i ≥ x_j}| over a sufficiently large range of values. Instead, an analog implementation is more likely to output a value of the form

    g( (|{j ∈ {1, ..., n}: x_i ≥ x_j}| − n/2) / k ),

where the piecewise linear function π is replaced by some possibly rather complicated nonlinear warping g of the target output values. The following result shows that such gates with any given nonlinear warping g can serve just as well as the competitive stage (and as the only nonlinearity) in universal approximators. More precisely, let g be any continuous function g: R → [0, 1] that has value 1 for x > 1, value 0 for x < 0, and which is strictly increasing over the interval [0, 1]. We denote a soft winner-take-all gate that employs g instead of π by soft-WTA^g_{n,k}.

**Theorem 3.** Assume that h: D → [0, 1] is an arbitrary continuous function with a bounded and closed domain D ⊆ R^m (for example, D = [0, 1]^m).
Then for any ε > 0 and for any function g (satisfying the above conditions) there exist natural numbers k, n, biases α_0^j ∈ R, and nonnegative coefficients α_i^j for i = 1, ..., m and j = 1, ..., n, so that the circuit consisting of the soft winner-take-all gate soft-WTA^g_{n,k} applied to the n sums Σ_{i=1}^m α_i^j z_i + α_0^j for j = 1, ..., n computes a function³ f: D → [0, 1] with |f(z) − h(z)| < ε for all z ∈ D.

Thus, circuits consisting of a single soft winner-take-all gate applied to positive weighted sums of the input variables are universal approximators for continuous functions.

³ More precisely, we set f(z) := r_n for the nth output variable r_n of soft-WTA^g_{n,k}.
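The gate family in this theorem is easy to state operationally. The following sketch (an illustration under the definitions above, not the article's own code) implements soft-WTA^g_{n,k} with the piecewise linear π as the default warping; any continuous g that is 0 below 0, 1 above 1, and strictly increasing on [0, 1] may be passed in its place:

```python
def pi(x):
    """The piecewise linear squashing function from the definition above."""
    return min(1.0, max(0.0, x))

def soft_wta(xs, k, g=pi):
    """soft-WTA^g_{n,k}: r_i = g((|{j : x_i >= x_j}| - n/2) / k)."""
    n = len(xs)
    return [g((sum(1 for xj in xs if xi >= xj) - n / 2) / k) for xi in xs]
```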

**Remark 3.** The proof shows that the number n of sums that arise in the circuit constructed in theorem 3 can essentially be bounded in terms of the number of hidden gates in a one-hidden-layer circuit C of sigmoidal gates that approximates the given continuous function h (more precisely, the function g^{−1} ∘ h), the maximal size of weights (expressed as a multiple of the smallest nonzero weight) on the second layer of C, and 1/ε (where ε is the desired approximation precision). Numerous practical applications of backprop suggest that at least the first one of these three numbers can be kept fairly small for most functions h that are of practical interest.

**Proof of Theorem 3.** The proof has the following structure. We first apply the regular universal approximation theorem for sigmoidal neural nets in order to approximate the continuous function g^{−1}(h(z)) by a weighted sum Σ_{j=1}^{ñ} β_j G_j^1(z) of π-gates, that is, of sigmoidal gates G_j^1 that employ the piecewise linear activation function π that has already been defined. As the next step, we simplify the weights and approximate the weighted sum Σ_{j=1}^{ñ} β_j G_j^1(z) by another weighted sum Σ_{j=1}^{n̂} (α_j/k̂) G_j^2(z) of π-gates G_j^2 with α_j ∈ {−1, 1} for j = 1, ..., n̂ and some suitable k̂ ∈ N. We then eliminate all negative weights by making use of the fact that one can rewrite −G_j^2(z) as G_j^3(z) − 1 with another π-gate G_j^3, for all j with α_j = −1. By setting G_j^3 := G_j^2 for all j with α_j = 1, we then have

    Σ_{j=1}^{n̂} (α_j/k̂) G_j^2(z) = (Σ_{j=1}^{n̂} G_j^3(z) − T̂)/k̂   for all z ∈ D,

with some T̂ ∈ {0, ..., n̂}. As the next step, we approximate each π-gate G_j^3 by a sum (1/l) Σ_{i=1}^{l} G_(i) of l threshold gates G_(i). Thus, altogether we have constructed an approximation of the continuous function g^{−1}(h(z)) by a weighted sum (Σ_{j=1}^n G_j(z) − T)/k of threshold gates with a uniform value 1/k for all weights.⁴ By expanding our technique from the proof of theorem 2, we can replace this simple sum of threshold gates by a single soft-WTA^g_{n,k} gate applied to weighted sums in such a way that the resulting WTA circuit approximates the given function h. We now describe the proof in detail.
According to Leshno, Lin, Pinkus, and Schocken (1993), there exist ñ ∈ N, β_j ∈ R, and π-gates G_j^1 for j = 1, ..., ñ so that

    |Σ_{j=1}^{ñ} β_j G_j^1(z) − g^{−1}(h(z))| < ε̃/3   for all z ∈ D,

where g^{−1} is the inverse of the restriction of the given function g to [0, 1], and ε̃ > 0 is chosen so that |g(x) − g(y)| < ε for any x, y ∈ R with |x − y| < ε̃. We are relying here on the fact that g^{−1} ∘ h is continuous.

⁴ One could, of course, also apply the universal approximation theorem directly to threshold gates (instead of π-gates) in order to get an approximation of g^{−1} ∘ h by a weighted sum of threshold gates. But the elimination of negative weights would then give rise to an exception set S, as in theorem 2.

As the next step, we simplify the weights. It is obvious that there exist k̂ ∈ N and β̃_1, ..., β̃_ñ ∈ Z so that

    Σ_{j=1}^{ñ} |β_j − β̃_j/k̂| < ε̃/3.

We then have

    |Σ_{j=1}^{ñ} (β̃_j/k̂) G_j^1(z) − g^{−1}(h(z))| < 2ε̃/3   for all z ∈ D.

We set n̂ := Σ_{j=1}^{ñ} |β̃_j|, and for all j ∈ {1, ..., ñ} we create |β̃_j| copies of the π-gate G_j^1. Let G_1^2, ..., G_n̂^2 be the resulting sequence of π-gates. Then there exist α_j ∈ {−1, 1} for j = 1, ..., n̂ so that

    |Σ_{j=1}^{n̂} (α_j/k̂) G_j^2(z) − g^{−1}(h(z))| < 2ε̃/3   for all z ∈ D.

We now eliminate all negative weights from this weighted sum. More precisely, we show that there are π-gates G_1^3, ..., G_n̂^3 so that

    Σ_{j=1}^{n̂} (α_j/k̂) G_j^2(z) = (Σ_{j=1}^{n̂} G_j^3(z) − T̂)/k̂   for all z ∈ D,   (4.1)

where T̂ is the number of j ∈ {1, ..., n̂} with α_j = −1. Consider some arbitrary j ∈ {1, ..., n̂} with α_j = −1. Assume that G_j^2 is defined by G_j^2(z) = π(w · z + w_0) for all z ∈ R^m, with parameters w ∈ R^m and w_0 ∈ R. Because of the properties of the function π, we have

    −π(w · z + w_0) = −1 + π(−w · z − w_0 + 1)   for all z ∈ R^m.

Indeed, if 0 ≤ w · z + w_0 ≤ 1 we have −π(w · z + w_0) = −w · z − w_0 = −1 + π(−w · z − w_0 + 1). If w · z + w_0 < 0, then −π(w · z + w_0) = 0 = −1 + π(−w · z − w_0 + 1), and if w · z + w_0 > 1, then −π(w · z + w_0) = −1 = −1 + π(−w · z − w_0 + 1). Thus, if we define the π-gate G_j^3 by

    G_j^3(z) = π(−w · z − w_0 + 1)   for all z ∈ R^m,

we have −G_j^2(z) = −1 + G_j^3(z) for all z ∈ R^m. Besides transforming all π-gates G_j^2 with α_j = −1 in this fashion, we set G_j^3 := G_j^2 for all j ∈ {1, ..., n̂} with α_j = 1. Obviously equation 4.1 is satisfied with these definitions. Since the activation function π can be approximated arbitrarily closely by step functions, there exists for each π-gate G_j^3 a sequence G_(1), ..., G_(l) of threshold gates so that

    |G_j^3(z) − (1/l) Σ_{i=1}^{l} G_(i)(z)| < ε̃ k̂ / (3 n̂)   for all z ∈ R^m.

By applying this transformation to all π-gates G_1^3, ..., G_n̂^3, we arrive at a sequence G_1, ..., G_n of threshold gates with n := n̂ · l, so that for k := k̂ · l and T := T̂ · l one has

    |(Σ_{j=1}^n G_j(z) − T)/k − g^{−1}(h(z))| < ε̃   for all z ∈ D.

According to the choice of ε̃, this implies that

    |g((Σ_{j=1}^n G_j(z) − T)/k) − h(z)| < ε   for all z ∈ D.

By adding dummy threshold gates G_j that give constant output for all z ∈ D, one can adjust n and T so that n ≥ 2k and T = (n − 1)/2, without changing the value of (Σ_{j=1}^n G_j(z) − T)/k for any z ∈ D. It remains to show that

    g( (Σ_{j=1}^n G_j(z) − (n − 1)/2) / k )

can be computed for all z ∈ D by some soft winner-take-all gate applied to weighted sums with positive coefficients. We set n′ := n + 1. For j = 1, ..., n and i = 1, ..., m, let w_i^j, Θ_j ∈ R be parameters so that

    G_j(z) = 1  ⇔  Σ_{i: w_i^j > 0} w_i^j z_i − Σ_{i: w_i^j < 0} |w_i^j| z_i ≥ Θ_j.

Then Σ_{j=1}^n G_j(z) = |{j ∈ {1, ..., n}: S_{n′} ≥ S_j}| for

    S_{n′} := Σ_{j=1}^n Σ_{i: w_i^j > 0} w_i^j z_i   and   S_j := Σ_{i: w_i^j < 0} |w_i^j| z_i + Θ_j + Σ_{l ≠ j} Σ_{i: w_i^l > 0} w_i^l z_i.

This holds because we have for every j ∈ {1, ..., n}:

    S_{n′} ≥ S_j  ⇔  Σ_{i: w_i^j > 0} w_i^j z_i − Σ_{i: w_i^j < 0} |w_i^j| z_i ≥ Θ_j.

Since n′ = n + 1, and since S_{n′} ≥ S_{n′} always holds, the preceding implies that the n′th output variable r_{n′} of soft-WTA^g_{n′,k} applied to S_1, ..., S_{n′} outputs

    g( (|{j ∈ {1, ..., n′}: S_{n′} ≥ S_j}| − n′/2) / k ) = g( (Σ_{j=1}^n G_j(z) + 1 − (n + 1)/2) / k ) = g( (Σ_{j=1}^n G_j(z) − (n − 1)/2) / k )

for all z ∈ D. Note that all weights in the weighted sums S_1, ..., S_{n′} are positive.

5 Conclusions

We have established the first rigorous analytical results regarding the computational power of winner-take-all. The lower-bound result of section 2 shows that the computational power of hard winner-take-all is already quite large, even when compared with the arguably most powerful gate commonly studied in circuit complexity theory: the threshold gate (also referred to as a McCulloch-Pitts neuron or perceptron). Theorem 1 yields an optimal quadratic lower bound for computing hard winner-take-all on any feedforward circuit consisting of threshold gates. This implies that no circuit consisting of linearly many threshold gates with arbitrary (i.e., feedforward, lateral, and recurrent) connections can compute hard winner-take-all in sublinear time. Since approximate versions of winner-take-all can be computed very fast in linear-size analog VLSI chips (Lazzaro et al., 1989), this lower-bound result may be viewed as evidence for a possible gain in computational power that can be achieved by lateral connections in analog VLSI (and apparently also in cortical circuits).

It is well known (Minsky & Papert, 1969) that a single threshold gate is unable to compute certain important functions, whereas circuits of moderate (i.e., polynomial) size consisting of two layers of threshold gates with polynomial-size weights have remarkable computational power (Siu et al., 1995). We showed in section 3 that any such two-layer (i.e., one-hidden-layer) circuit can be simulated by a single competitive stage, applied to polynomially many weighted sums with positive integer weights of polynomial size.

In section 4, we analyzed the computational power of soft winner-take-all gates in the context of analog computation. It was shown that a single soft winner-take-all gate may serve as the only nonlinearity in a class of circuits that have universal computational power, in the sense that they can approximate any continuous function.
In addition, we showed that this result is robust with regard to any continuous nonlinear warping of the output of such a soft winner-take-all gate, which is likely to occur in an analog implementation of soft winner-take-all in hardware or wetware. Furthermore, our novel universal approximators require only positive linear operations besides soft winner-take-all, thereby showing that in principle no computational power is lost if inhibition is used exclusively for unspecific lateral inhibition in neural circuits, and no flexibility is lost if synaptic plasticity (i.e., "learning") is restricted to excitatory synapses. This result appears to be of interest for understanding the function of biological neural systems, since typically only 15% of synapses in the cortex are inhibitory, and the evidence for plasticity in those synapses is somewhat dubious.

Our somewhat surprising results regarding the computational power and universality of winner-take-all point to further opportunities for low-power analog VLSI chips, since winner-take-all can be implemented very efficiently in this technology. In particular, our theoretical results of section 4 predict that efficient implementations of soft winner-take-all will be useful

References

Elias, S. A., & Grossberg, S. (1975). Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 20.

Fang, Y., Cohen, M., & Kincaid, M. (1996). Dynamics of a winner-take-all neural network. Neural Networks, 9.

Horiuchi, T. K., Morris, T. G., Koch, C., & DeWeerth, S. P. (1997). Analog VLSI circuits for attention-based visual tracking. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems. Cambridge, MA: MIT Press.

Indiveri, G. (in press). Modeling selective attention using a neuromorphic analog-VLSI device. Neural Computation.

Lazzaro, J., Ryckebusch, S., Mahowald, M. A., & Mead, C. A. (1989). Winner-take-all networks of O(n) complexity. In D. Touretzky (Ed.), Advances in neural information processing systems, 1. San Mateo, CA: Morgan Kaufmann.

Leshno, M., Lin, V., Pinkus, A., & Schocken, S. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6.

Maass, W. (1999). Neural computation with winner-take-all as the only nonlinear operation. In Advances in neural information processing systems, 12. Cambridge, MA: MIT Press.

Meador, J. L., & Hylander, P. D. (1994). Pulse coded winner-take-all networks. In M. E. Zaghloul, J. Meador, & R. W. Newcomb (Eds.), Silicon implementation of pulse coded neural networks. Boston: Kluwer Academic Publishers.

Minsky, M. L., & Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.

Niebur, E., & Koch, C. (1998). Computational architectures for attention. In R. Parasuraman (Ed.), The attentive brain. Cambridge, MA: MIT Press.

Savage, J. E. (1998). Models of computation: Exploring the power of computing. Reading, MA: Addison-Wesley.

Shepherd, G. M. (Ed.). (1998). The synaptic organization of the brain. Oxford: Oxford University Press.

Siu, K.-Y., Roychowdhury, V., & Kailath, T. (1995). Discrete neural computation: A theoretical foundation. Englewood Cliffs, NJ: Prentice Hall.

Thorpe, S. J. (1990). Spike arrival times: A highly efficient coding scheme for neural networks. In R. Eckmiller, G. Hartmann, & G. Hauske (Eds.), Parallel processing in neural systems and computers. Amsterdam: Elsevier.

Urahama, K., & Nagao, T. (1995). k-winner-take-all circuit with O(N) complexity. IEEE Transactions on Neural Networks, 6.

Wegener, I. (1987). The complexity of Boolean functions. New York: Wiley.

Yuille, A. L., & Grzywacz, N. M. (1989). A winner-take-all mechanism based on presynaptic inhibition. Neural Computation, 1.

Received May 1, 1999; accepted November 15, 1999.
