STUDY OF MULTISTAGE SIMD INTERCONNECTION NETWORKS
Howard Jay Siegel and S. Diane Smith
Purdue University, School of Electrical Engineering
West Lafayette, Indiana

Abstract

Four SIMD multistage networks -- Feng's data manipulator, the STARAN flip network, the omega network, and the indirect binary n-cube -- are analyzed. Three parameters -- topology, interchange box, and control structure -- are defined. It is shown that the latter three networks use equivalent topologies and that differences in their capabilities result from the other parameters. An augmented data manipulator network, which uses a modified control structure to perform more single-pass interconnections than the other networks, is presented. Some problems may be solved more efficiently if the 2^n processing elements of an SIMD machine can be partitioned into submachines of size 2^r. Single and multiple control partitioning are defined. The capabilities of these multistage networks to perform in these partitioned environments are discussed.

Introduction

The age of the microprocessor has made feasible large-scale multiprocessor computer systems with as many as 2^14 to 2^16 processors [10,14]. The SIMD (single instruction stream - multiple data stream) [6] mode of parallel processing provides a structure for multimicroprocessor systems that is capable of performing a variety of tasks from matrix multiplication to image processing. Typically an SIMD machine has a control unit which broadcasts instructions to N = 2^n processors, numbered (addressed) from 0 to N-1, and all active processors execute the same instruction at the same time (for a detailed model of SIMD machines see [12]). The interconnection network can be used to connect processors to memory modules. Alternatively, the network can be used to interconnect processing elements, where each processing element (PE) consists of a processor and a memory module.

Three parameters for characterizing multistage interconnection networks are defined: topology, interchange box, and control structure. Two types of SIMD machine partitioning are introduced: single control and multiple control. These five features will be used to analyze and compare four multistage networks that have been proposed in the literature: Feng's data manipulator [5], Batcher's STARAN flip network [2], Lawrie's omega network [8], and Pease's indirect binary n-cube [10]. For each network, numerous interconnection capabilities that make the network very useful have been demonstrated [1-5,8,10]. The analyses presented here will allow an architect to determine which parameters are required for a network to perform the tasks the system is being designed to handle. The basic connection pattern underlying many of these SIMD multistage networks has also been suggested for MIMD (multiple instruction stream - multiple data stream) [6] machines [9,14].

This work was supported in part by the Purdue Research Foundation under an XL Grant and a David Ross Grant (PRF#8718).

Multistage Network Parameters

The topology of the network is the actual interconnection pattern used to connect the set of input lines and output lines. The generalized cube topology is defined to consist of n stages, where N = 2^n is the number of input and output lines, as shown in Figure 1. Each stage consists of an interconnection pattern of N lines and N/2 interchange boxes. For each box the upper and lower outputs are labeled with the same numbers as the upper and lower inputs, respectively.

Figure 1: 8-item generalized cube network.
At stage i the input lines that differ only in their i-th bit position are paired together as inputs to an interchange box, for 0 ≤ i < n. The data first passes through stage n-1, then n-2, etc. At each stage the interchange boxes connect their input lines to their output lines to complete the connections from one end of the network to the other. The fundamental connection patterns which form the basis for the generalized cube topology are discussed in [11-13].

The interchange box is a two-input, two-output device. Let the upper input and output lines be labeled i and the lower input and output lines be labeled j. The four legitimate states of an interchange box are: (1) straight - input i to output i, input j to output j; (2) exchange - input i to output j, input j to output i; (3) lower broadcast - input j to outputs i and j; and (4) upper broadcast - input i to outputs i and j [8]. There are also two illegitimate states: (1) lower conflict - inputs i and j both to output j; and (2) upper conflict - inputs i and j both to output i. Both of these states involve conflicts, the condition when both inputs attempt to feed the same output. A two-function interchange box is an interchange box capable of being in either the straight or exchange state. A four-function interchange box is an interchange box capable of being in any of the four legitimate states.

The control structure of a network sets the states of the interchange boxes. Individual stage control uses the same control signal to set the states of all the interchange boxes in a stage, i.e., all the boxes in a given stage must be in the same state. Individual box control uses a separate control signal to set the state of each interchange box. Partial stage control uses i+1 control signals to control stage i, 0 ≤ i < n.
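These three parameters can be made concrete with a short simulation. The following Python sketch (illustrative only; the function and variable names are not from the paper) routes data through the generalized cube topology with two-function interchange boxes under individual box control: at stage i the paired lines differ only in bit i, and an exchange setting swaps the data on the two lines of a box. Broadcast states are not modeled.

```python
# Minimal sketch (not from the paper): one pass through the generalized cube
# topology with two-function interchange boxes under individual box control.
# At stage i, lines whose labels differ only in bit i share a box; the box
# either passes its data straight through or exchanges it.

def route_generalized_cube(n, box_control):
    """box_control(stage, low_line) -> 'straight' or 'exchange', where
    low_line is the box's input line with bit `stage` equal to 0."""
    N = 1 << n
    data = list(range(N))                    # data[line] = source label currently on `line`
    for stage in range(n - 1, -1, -1):       # data passes stage n-1 first, then n-2, ...
        for low in range(N):
            if (low >> stage) & 1:
                continue                     # visit each box once, via its "upper" line
            high = low | (1 << stage)        # partner line differs only in bit `stage`
            if box_control(stage, low) == 'exchange':
                data[low], data[high] = data[high], data[low]
    return data                              # data[out_line] = source routed to out_line

# Example: setting every box to 'exchange' complements every address bit,
# so source S emerges on output line S XOR (N-1).
print(route_generalized_cube(3, lambda s, l: 'exchange'))
# -> [7, 6, 5, 4, 3, 2, 1, 0]
```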
In Figure 2, the letters a, b, c, d, e, and f each represent a different control line. An interchange box is controlled by the line corresponding to its label. This is used for "shift" permutations, as will be discussed later.

Figure 2: 8-item generalized cube network with partial stage control.

Comparisons of Network Parameters

The STARAN flip network [2] has N input lines and N output lines, both labeled from 0 to N-1. Input lines which differ in the i-th bit position are paired in the i-th stage of the network, and the data passes through stage 0, then stage 1, etc., as shown in Figure 3. Thus, the flip network topology is identical to the generalized cube, except the order of the stages is reversed. The interchange boxes are the two-function type. The network has two control mechanisms: the flip control, generating "flip" permutations, and the shift control, generating "shift" permutations. An n-bit flip control vector is used for individual stage control. The shift control allows shifts of 2^m places modulo 2^p, 0 ≤ m < p ≤ n, using i+1 control lines at stage i, 0 ≤ i < n, as shown in Figure 4. The flip control is implemented by controlling all the shift control lines for a given stage by a single signal.

Figure 3: 8-item flip network with individual stage (flip) control [2].

Figure 4: 8-item flip network with partial stage (shift) control [2]. 0A, 1A, etc. are the control signals.

Theorem: The flip network using flip control is functionally equivalent to the generalized cube network with two-function boxes using individual stage control.

Proof: Consider I = i_{n-1} ... i_1 i_0, an arbitrary input line to the flip network. If the flip control vector F equals f_{n-1} ... f_1 f_0, where f_i is 0 or 1, the data on input line I is sent to output line I ⊕ F = (i_{n-1} ⊕ f_{n-1}, ..., i_1 ⊕ f_1, i_0 ⊕ f_0), where ⊕ is the EXCLUSIVE-OR function [2]. This is because f_i = 0 sets stage i to straight and f_i = 1 sets it to exchange. Associate f_i with stage i of the generalized cube network. Then input I will be sent to output I ⊕ F since, no matter what line the data from a particular source is on when it passes through stage i, it will exchange if f_i = 1 and will go straight through if f_i = 0 (this can be proven formally by induction). □

Theorem: The generalized cube network with two-function interchange boxes and partial stage control can perform the same shifts of 2^m places modulo 2^p, 0 ≤ m < p ≤ n, that the flip network with shift control can perform.

  Permutation     Control Signal
                0A  1A  1B  2A  2B  2C
  1 mod 8        1   1   0   1   0   0
  2 mod 8        0   1   1   1   1   0
  4 mod 8        0   0   0   1   1   1
  1 mod 4        1   1   0   0   0   0
  2 mod 4        0   1   1   0   0   0
  1 mod 2        1   0   0   0   0   0
  identity       0   0   0   0   0   0

Table 1: Shift control specifications for Figure 4 [2].

Proof: In Table 1, when a control signal is 1, the interchange boxes to which it is connected use the exchange state; otherwise they use the straight state. The partial stage control version of the generalized cube network can perform the same shifting tasks by equating the following pairs of control signals: f and 0A; d and 1B; e and 1A; a and 2C; b and 2B; and c and 2A (see Figures 2 and 4). This can be proven formally for arbitrary N based on the "Cube to PM2I" algorithm presented in [12]. □
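The flip-control equivalence in the first proof can be checked mechanically. A minimal sketch (illustrative, not from [2]): with flip vector F, setting every box of stage i to exchange when f_i = 1 delivers the data entering on line I to line I ⊕ F.

```python
# Sketch (illustrative): with an n-bit flip vector F, every stage i of the
# flip network is set to 'exchange' iff f_i = 1, and the data entering on
# line I leaves on line I XOR F -- the same permutation the generalized cube
# gives under individual stage control.

def flip_network(n, F):
    N = 1 << n
    data = list(range(N))                    # data[line] = source currently on `line`
    for stage in range(n):                   # flip network: stage 0 first, then 1, ...
        if not (F >> stage) & 1:
            continue                         # f_i = 0: whole stage straight
        for low in range(N):
            if not (low >> stage) & 1:       # f_i = 1: every box in the stage exchanges
                high = low | (1 << stage)
                data[low], data[high] = data[high], data[low]
    return data

n, F = 3, 0b101
out = flip_network(n, F)
assert all(out[I ^ F] == I for I in range(1 << n))   # source I arrives at line I XOR F
```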
It should be noted that the generalized cube network with two-function interchange boxes and partial stage control is not directly equivalent to the flip network; e.g., the generalized cube network can connect 7 to 0 and 5 to 6, while the flip network can not. This problem arises due to the reversed order of the stages of the two networks, causing data that pass through different control groupings at stage i in one network to pass through the same control grouping at stage i in the other network.

These theorems prove that any network whose topology is functionally equivalent to the generalized cube and has two- or four-function interchange boxes with partial stage or individual box control can perform all the "flip" and "shift" permutations which the flip network can achieve.

An N by N omega network, shown in Figure 5, consists of n identical stages, where each stage is a perfect shuffle interconnection followed by a column of N/2 four-function interchange boxes under individual box control [7].
Figure 5: 8-item omega network [7].

The shuffle connects output p_{n-1} ... p_1 p_0 of stage i to input p_{n-2} ... p_1 p_0 p_{n-1} of stage i-1. Each interchange box in an omega network is controlled by the n-bit destination tags associated with the data on its input lines.

Theorem: The omega network is functionally equivalent to the generalized cube network with four-function interchange boxes and individual box control.

Proof: As a result of the shuffle interconnection prior to each column of interchange boxes, at stage j two inputs are paired if and only if they differ only in the j-th bit position (this can be proved formally by induction). The data passes through the stages of the omega and generalized cube networks in the same order. □

Any network functionally equivalent to a generalized cube network with four-function boxes and individual box control can perform the same interconnection tasks the omega network can. Furthermore, any network functionally equivalent to a generalized cube network with two-function boxes and individual box control can perform the subset of these tasks that do not use data duplication, i.e., that do not use upper or lower broadcasts.

An indirect binary n-cube network consists of n stages of N/2 two-function interchange boxes under individual box control, connected so that the two inputs to an interchange box at stage i differ only in the i-th bit position, and data passes through stage 0, then stage 1, etc., as shown in Figure 6 [10]. This is the reverse of the order of the stages in the generalized cube network.

Figure 6: 8-item indirect binary n-cube network [10].

Theorem: The indirect binary n-cube network is functionally equivalent to the generalized cube network with two-function boxes and individual box control if address transformations are used.

Proof: To show that the two networks are not functionally equivalent without address transformations, note that the n-cube can not connect 0 to 5 and 1 to 7, while the generalized cube can. This occurs because, as data passes through these networks, the data items that are paired in an interchange box depend on the order of the stages of the network. Consider an arbitrary set of input and output connections which the n-cube can perform. To perform these connections on the generalized cube, the following address transformations are necessary. Transform the number (address) of each input and output line from p_{n-1} ... p_1 p_0 to p_0 p_1 ... p_{n-1}. Use this same transformation to specify any desired set of input and output connections. For example, for N = 8 the n-cube can perform generalized cube connections 0 to 5 and 1 to 7 by using the transformations, i.e., it can connect 0 to 5 and 4 to 7. The correctness of this address transformation method can be proven formally by realizing that the renumbering causes the generalized cube network to first pair data lines whose numbers differ only in the 0-th bit, then the 1st bit, etc., just the way the n-cube does. (In [10] it is noted that the n-cube "is similar ... to the omega network itself with renumbering.") □

This theorem proves that any network functionally equivalent to a generalized cube network with two- or four-function boxes and individual box control can perform the same tasks the n-cube network can perform if an address transformation is used.
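The address transformation in this proof is simply a reversal of the n address bits. A small illustrative sketch (function names are not from the paper) of the transformation and of the N = 8 example above:

```python
# Sketch: the address transformation used to run generalized-cube connection
# requests on the indirect binary n-cube is a reversal of the n address bits.

def bit_reverse(x, n):
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y

# Paper's N = 8 example: generalized-cube requests 0->5 and 1->7 ...
requests = [(0, 5), (1, 7)]
# ... become the n-cube requests 0->5 and 4->7 after transforming both ends.
transformed = [(bit_reverse(s, 3), bit_reverse(d, 3)) for s, d in requests]
print(transformed)        # [(0, 5), (4, 7)]
```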
The data manipulator network consists of n stages of N cells, where for 0 ≤ j < N and 0 ≤ i < n there are three sets of interconnections going from cell j in stage i to stage i-1: to cell (j + 2^i) mod N, to cell (j - 2^i) mod N, and to cell j [5]. Each stage of the network is controlled by a pair of signals, specifying two of H1, H2, U1, U2, D1, and D2, as described in Figure 7.

Figure 7: 8-item data manipulator network [5]. The dashed lines represent the U control line interconnections, the dotted lines represent the H control line interconnections, and the solid lines represent the D control line interconnections. For stage i, U1, D1, and H1 control those cells whose i-th bit is 0, and U2, D2, and H2 control those cells whose i-th bit is 1.
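Written out, the three connections leaving a cell at stage i, and the control signal that selects each, are as follows (an illustrative sketch; the function name is not from [5]):

```python
# Sketch: the three connections out of cell j at stage i of the data
# manipulator.  Which one a cell uses is selected by its control signal:
# D -> cell j + 2**i, U -> cell j - 2**i, H -> straight through to cell j.

def data_manipulator_step(j, i, N, control):
    if control == 'D':
        return (j + (1 << i)) % N          # t_{+i}
    if control == 'U':
        return (j - (1 << i)) % N          # t_{-i}
    return j                               # 'H': no change

# e.g., at stage 1 of an 8-cell network, cell 6 under D sends its data to cell 0:
print(data_manipulator_step(6, 1, 8, 'D'))     # 0
```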
The topology is such that at stage i cell x is connected to cell x and to the cell which differs from x in only the i-th bit position, just as the topology of the generalized cube allows. In addition, the data manipulator connects x to both (x + 2^i) mod N and (x - 2^i) mod N, one of which differs from x in more than the i-th bit position. The stages are ordered from n-1 to 0, as in the generalized cube topology.

An augmented data manipulator (ADM) is a data manipulator with individual cell control, i.e., each cell can get none, one, or two of the signals H, U, and D. The effects of these controls are as described in Figure 7 for the regular data manipulator, except that each cell can be controlled independently.

Theorem: The ADM can perform all the interconnections the generalized cube with four-function boxes and individual box control can achieve and some that the generalized cube can not achieve.

Proof: Consider simulating the four-function box at stage i whose upper input is x and lower input is y. Let the subscript x denote the ADM control signal for cell x at stage i, and let the subscript y be used similarly. Then to perform the four functions use the following control signals: straight - H_x H_y; exchange - D_x U_y; lower broadcast - U_y H_y; and upper broadcast - D_x H_x. In addition, the ADM network can perform interconnections the generalized cube cannot, such as connecting 3 to 3, 5 to 2, and 6 to 6, for N = 8. □

Partitioning

Some computations may be more efficiently executed if the N PEs of the SIMD machine can be partitioned into 2^{n-r} groups of 2^r PEs. Each group may then behave like a smaller SIMD computer, and the system resources may be more efficiently utilized. As an example, consider an algorithm that is composed of k independent subalgorithms, each of which can be executed using 2^r or fewer PEs. Suppose that each subalgorithm requires at most j time periods to execute. The algorithm could be performed one subalgorithm at a time, using at most 2^r PEs at any time. The computation would require at most j*k time periods, and 2^n - 2^r PEs would be unused. If, on the other hand, the 2^n PEs were partitioned into 2^{n-r} groups of 2^r PEs, 2^{n-r} of the k subalgorithms could be performed in parallel (assuming that all groups use the same instruction stream or that multiple control units are available). Now the computer is more fully utilized, and if k ≤ 2^{n-r}, the algorithm can be executed in j time periods. The number of PEs that can be utilized for a given computation is highly algorithm dependent.

Two classes of partitioning for an SIMD machine will be considered: (1) use the system as one to 2^{n-r} SIMD machines, each having 2^r PEs, each performing the same algorithm (using the same instruction stream), but each on a different data set; and (2) use the system as one to 2^{n-r} SIMD machines, each of size 2^r, where each may be performing a different algorithm on a different data set. The first class will be referred to as single control partitioning, since only one control mechanism will serve all of the PEs. The second class will be referred to as multiple control partitioning, since a separate control mechanism will be required for each instruction stream. Note that multiple control partitioning includes having, for example, four groups following one instruction stream while four other groups each follow their own instruction streams. Furthermore, it may be the case that two or more groups may be executing different SIMD programs on the same logical data set, although each group would have its own physical copy.
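A worked instance of the utilization argument above, with illustrative numbers that do not come from the paper:

```python
# Worked instance (illustrative numbers) of the partitioning argument above.
n, r = 7, 4                 # 2**n = 128 PEs, submachines of 2**r = 16 PEs
k, j = 8, 10                # k independent subalgorithms, each <= j time periods

serial_time = j * k                          # one subalgorithm at a time: 80 periods,
idle_pes    = 2**n - 2**r                    # with 112 of the 128 PEs idle
groups      = 2**(n - r)                     # partitioned: 8 groups of 16 PEs
parallel_time = j * -(-k // groups)          # ceil(k / groups) * j = 10 periods

print(serial_time, idle_pes, parallel_time)  # 80 112 10
```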
Note that the single control class is a special case of the multiple control class, where all the groups use copies of the same instruction stream.

When the machine is partitioned, each submachine needs an interconnection network. An interconnection network may be described as a set of interconnection functions, where each interconnection function is a permutation (bijection) on the set of PE addresses [11]. If f is an interconnection function, then "f(x) = y" means that the function connects input x to output y. An n stage shuffle-exchange network, such as the omega network [8] shown in Figure 5, can be defined with two interconnection functions. For N PEs, let the binary representation of a PE address P be P = p_{n-1} ... p_1 p_0. Then the perfect shuffle is defined as [11]

  Shuffle(p_{n-1} p_{n-2} ... p_1 p_0) = p_{n-2} ... p_1 p_0 p_{n-1}.

The exchange function is defined as [11]

  Exchange(p_{n-1} p_{n-2} ... p_1 p_0) = p_{n-1} p_{n-2} ... p_1 p̄_0,

where p̄_0 denotes the complement of p_0. Each stage of the network consists of a shuffle and an optional exchange. Partitioning a multistage shuffle-exchange network is discussed in [7]. That discussion is extended here and is related to the two types of partitioning defined above.

Theorem: Let the system be partitioned into 2^{n-r} subsystems of 2^r PEs each. If only a shuffle and no exchange is allowed on each of the first n-r stages of the network, the network can serve as a complete r stage shuffle-exchange network for each of the 2^{n-r} groups of PEs of the partitioned machine.

Proof: Let the addresses of all PEs in each partition agree in the same n-r most significant bits. Let loc(i)(P) refer to the location of the data originally in PE(P) after passing through i stages of the network. After n-r shuffles with no exchanges, the intermediate address of the data that was initially in P is

  loc(n-r)(P) = p_{n-(n-r)-1} ... p_1 p_0 p_{n-1} ... p_{n-(n-r)} = p_{r-1} ... p_1 p_0 p_{n-1} ... p_r.

Lastly, r shuffle-exchange steps follow, and the final address of the data is

  loc(n)(P) = p_{n-1} ... p_r d_{r-1} ... d_1 d_0,

where d_i = p_i or p̄_i. All PEs with addresses that have the same n-r most significant bits will be in the same partition of the machine. Analogous arguments may be made to show that the PEs may be partitioned based on any common set of n-r bits of the PE addresses. □
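A small sketch of these two interconnection functions and of the theorem's first n-r shuffle-only stages (illustrative; the helper names are not from [11]):

```python
# Sketch (illustrative): the two interconnection functions that define each
# stage of an n-stage shuffle-exchange network such as the omega network.

def shuffle(p, n):
    return ((p << 1) | (p >> (n - 1))) & ((1 << n) - 1)   # cyclic left shift of the n bits

def exchange(p):
    return p ^ 1                                          # complement the low-order bit

# After n-r shuffles with no exchanges (the partitioning theorem above), the
# n-r high-order bits of a PE address end up in the low-order positions:
n, r, p = 5, 3, 0b10110
for _ in range(n - r):
    p = shuffle(p, n)
print(format(p, '05b'))      # '11010' = p2 p1 p0 p4 p3
```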
If the shuffle-exchange network employs individual box control, it is equally applicable for both single control partitioning and multiple control partitioning. Consider a 3 stage shuffle-exchange network that is to be partitioned into two groups of four PEs each. Let the two groups be G1 = (0, 1, 2, 3) and G2 = (4, 5, 6, 7). Consider the case where only G1 is using the network. Let c be a control matrix for the network of Figure 5. c(i,j) controls the i-th row exchange box at stage (n-1)-j. An exchange occurs at that box if and only if c(i,j) = 1. To control the network for G1, the matrix is

        0  c(0,1)  c(0,2)
  c =   0    -     c(1,2)
        0  c(2,1)    -
        0    -       -

Note that only the control bits that relate to the submachine that is using the network need be specified. All others may be ignored. To control the network for G2 alone, the matrix is

        0    -       -
  c =   0  c(1,1)    -
        0    -     c(2,2)
        0  c(3,1)  c(3,2)

If G1 and G2 are to share the network under a single control partitioning scheme, set c(0,1) = c(1,1), c(2,1) = c(3,1), c(0,2) = c(2,2), and c(1,2) = c(3,2). For multiple control partitioning, G1 is controlled by c(0,1), c(2,1), c(0,2), and c(1,2), while G2 is controlled by c(1,1), c(3,1), c(2,2), and c(3,2). Thus, if c(i,0) = 0, 0 ≤ i < 4, the two partitions can operate independently and asynchronously.

Theorem: An n stage shuffle-exchange network with individual box control c(i,j) may be used with multiple or single control partitioning.

Proof: Divide the 2^n PEs into 2^{n-r} groups of 2^r and assume that the PE addresses of a group G are consecutive and range from 2k to 2k + 2^r - 1, where 2k = j*2^r, j an integer, 0 ≤ j < 2^{n-r}. The n-r most significant bits, p_{n-1} ... p_r, of all PEs in G are the same. The first n-r stages allow only a shuffle and no exchange, so the entries of the first n-r columns of c are all 0, and loc(n-r)(P) = p_{r-1} ... p_1 p_0 p_{n-1} ... p_r. Stage r-1 is the first to allow an exchange to follow the shuffle. Let c(A, n-r) be a control bit for stage r-1, where A = a_{n-2} a_{n-3} ... a_1 a_0, since there are 2^{n-1} rows of c. Then c(A, n-r) controls PEs belonging to G if A = a_{n-2} ... a_{n-r} p_{n-1} ... p_r. At stage r-(1+j), 1 ≤ j < r, c(A, n-r+j) controls PEs of G if A = a_{n-2} ... a_{n-r+j} p_{n-1} ... p_r a_{j-1} ... a_0. For single and multiple control partitioning, if the group of PEs is known, the control signals that refer to that group are given by the above expression. □

A disadvantage of the scheme is that the c matrix requires n x 2^{n-1} storage locations, which may become prohibitive, both for storage and programming. In [7] an approach is presented that allows each exchange box to calculate its own control. However, this network can not perform all the permutations of a shuffle-exchange network. The omega network [8] passes destination tags, thus effectively passing each row of the c matrix with the data. As an alternative to the multistage implementation of the shuffle-exchange, a "recirculating" implementation may be used [12,13]. This would allow each PE to store one row, n bits, of the control matrix or to compute its own control bits based on its address or the data it contains.

In the STARAN flip network (Figure 3), stage i can perform the cube-type interconnection function cube_i, where cube_i(P) = p_{n-1} p_{n-2} ... p_{i+1} p̄_i p_{i-1} ... p_1 p_0, for 0 ≤ i < n [11]. The shuffle-exchange and the flip network have equivalent topologies, and the partitioning proofs are similar.

Theorem: The STARAN flip network may be partitioned under single control partitioning.

Proof: The control vector F of the flip network allows the partitioning of the network. In order to partition the machine into 2^{n-r} groups of 2^r PEs, where the addresses in each group have the same n-r most significant bits, the control vector used is F = (0 ... 0 f_{r-1} ... f_1 f_0). For an arbitrary PE address P, the first r stages of the network produce loc(r)(P) = p_{n-1} ... p_r d_{r-1} ... d_0, where d_i = p_i or p̄_i. If the last n-r stages allow no exchanges, the address of the final destination of the data is loc(n)(P) = p_{n-1} ... p_r d_{r-1} ... d_0. Thus, all PEs with the same n-r most significant bits will be in the same partition of the reconfigurable machine. Again, this procedure can be generalized to group PEs that have any set of n-r bits in common. □
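A minimal sketch of this single control partitioning of the flip network (illustrative; it assumes the I ⊕ F behavior proved earlier): forcing the n-r high-order flip bits to 0 keeps each PE's data inside the group that shares its n-r high-order address bits.

```python
# Sketch (illustrative): single control partitioning of the flip network.
# With the n-r high-order flip bits forced to 0, the output line I XOR F has
# the same n-r high-order bits as I, so data never leaves its group.

def flip_partition_ok(n, r, F):
    if F >> r:                       # f_{n-1} ... f_r must all be 0
        return False
    group = lambda I: I >> r         # the n-r high-order bits identify the group
    return all(group(I ^ F) == group(I) for I in range(1 << n))

print(flip_partition_ok(4, 2, 0b0011))   # True: groups of 4 PEs are preserved
print(flip_partition_ok(4, 2, 0b0100))   # False: f_2 = 1 would cross group boundaries
```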
For single control partitioning, the individual stage control of the flip network negates the need for equating control bits as in the shuffle-exchange case. Thus, for single control partitioning, the flip network can act as up to 2^{n-r} flip networks of size 2^r. If up to 2^{n-r} groups of PEs have access to the network at the same time to do different tasks, as in multiple control partitioning, then the control structure is not general enough. The flip network shift control cannot be effectively used for either type of partitioning.

Theorem: The indirect binary n-cube with individual box control can be partitioned under single or multiple control partitioning.

Proof: The proof is analogous to that for the shuffle-exchange network, since the two networks have the same topologies. The only difference between the two cases is the positioning of the control matrix entries which control each partition. □

The data manipulator network is based on the 2n "PM2I" interconnection functions

  t_{+i}(j) = (j + 2^i) mod N
  t_{-i}(j) = (j - 2^i) mod N

for 0 ≤ j < N, 0 ≤ i < n [11]. The data manipulator is an n stage network where stage i can perform t_{+i} (i.e., D), t_{-i} (i.e., U), and no change (i.e., H), as shown in Figure 7. Recall that for stage i, U1, D1, and H1 control those PEs whose i-th bit is 0, and U2, D2, and H2 control those PEs whose i-th bit is 1.

Theorem: The data manipulator network may be divided into 2^{n-1} groups of 2 adjacent PEs for either single or multiple control partitioning.

Proof: Let the two PEs be PE(i) and PE(i+1 mod N), for i even. Since N = 2 for each group, the group's network need consist only of t_{±0} (note that for N = 2, t_{+0} is the same as t_{-0}). This can be obtained using D1 U2 on stage 0 and H1 H2 on all other stages. Similarly, PE(i) and PE(i+1 mod N), for i odd, can exchange data if U1 D2 is used on stage 0 and H1 H2 is used on all other stages. This is true for multiple control partitioning since PEs not exchanging data can ignore the network. □
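A small sketch of the stage-0 setting used in this proof (illustrative names, not from [5]):

```python
# Sketch (illustrative): the stage-0 setting that swaps each even PE i with
# PE i+1 on the data manipulator.  Cells whose bit 0 is 0 obey (U1, D1, H1),
# cells whose bit 0 is 1 obey (U2, D2, H2); D sends data to j + 1 mod N and
# U to j - 1 mod N at stage 0, so the signal pair (D1, U2) exchanges the two.

def stage0_adjacent_swap(N):
    dest = [0] * N
    for j in range(N):
        if j % 2 == 0:               # bit 0 = 0: controlled by group 1 -> D1
            dest[j] = (j + 1) % N
        else:                        # bit 0 = 1: controlled by group 2 -> U2
            dest[j] = (j - 1) % N
    return dest                      # all other stages use H1 H2 (no change)

print(stage0_adjacent_swap(8))       # [1, 0, 3, 2, 5, 4, 7, 6]
```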
As an additional point, consider pairing two arbitrary PEs.

Theorem: For the data manipulator network, with the control functions given as in Figure 7, if only two PEs use the network at one time, then any two PEs can exchange data in one pass through the network.

Proof: For N PEs, let PE(A) and PE(B) exchange data, where A = a(n-1) ... a(0), B = b(n-1) ... b(0), and A ≠ B. In order for PE(A) and PE(B) to exchange data, transform the addresses such that a(n-1)' ... a(0)' = b(n-1) ... b(0) and b(n-1)' ... b(0)' = a(n-1) ... a(0). An algorithm to do so is as follows. For stage i, i = n-1, n-2, ..., 0:

  If a(i) = b(i), then no change (H1 H2).

  If a(i) = 0 and b(i) = 1, then apply t_{+i} to the data from A and apply t_{-i} to the data from B (D1 U2). Thus, a(i)' = a(i) + 1 = 1 = b(i), and b(i)' = b(i) - 1 = 0 = a(i).

  If a(i) = 1 and b(i) = 0, then apply t_{-i} to the data from A and apply t_{+i} to the data from B (D1 U2). Thus, a(i)' = a(i) - 1 = 0 = b(i), and b(i)' = b(i) + 1 = 1 = a(i).

After the i-th stage, loc(i)(A) = b(n-1) ... b(n-i) a(n-i-1) ... a(0), and loc(i)(B) = a(n-1) ... a(n-i) b(n-i-1) ... b(0). Finally, after n stages, loc(n)(A) = b(n-1) ... b(0), and loc(n)(B) = a(n-1) ... a(0). □

This algorithm may be adapted for the n stage shuffle-exchange, the indirect binary n-cube, and the flip networks. For the shuffle-exchange, if bit i of the two PE addresses A and B are the same, no exchange occurs at stage i, and c(A, n-1-i) = 0, 0 ≤ A < 2^{n-1}. If the two bits differ, then an exchange occurs, and c(A, n-1-i) = 1, 0 ≤ A < 2^{n-1}. For the flip network, at stage i, if the i-th bits of A and B are the same, then f_i = 0 and no exchange occurs. If the two bits differ, then f_i = 1 and an exchange occurs. The algorithm for the n-cube is similar. For this algorithm the data manipulator has behaved as a generalized cube network, so the similarities of the algorithms are not unexpected.
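The one-pass exchange algorithm in the proof above can be written compactly. A minimal sketch (illustrative; names not from the paper) that tracks only the cells holding the two data items:

```python
# Sketch (illustrative) of the stage-by-stage rule above that exchanges the
# data of two arbitrary PEs, A and B, in one pass through the data manipulator.

def exchange_two(A, B, n):
    N = 1 << n
    loc_a, loc_b = A, B                      # current cells holding A's and B's data
    for i in range(n - 1, -1, -1):           # stages n-1, n-2, ..., 0
        a_i, b_i = (A >> i) & 1, (B >> i) & 1
        if a_i == b_i:
            continue                         # H1 H2: no change at this stage
        if a_i == 0:                         # a(i) = 0, b(i) = 1: t_{+i} for A, t_{-i} for B
            loc_a, loc_b = (loc_a + (1 << i)) % N, (loc_b - (1 << i)) % N
        else:                                # a(i) = 1, b(i) = 0: t_{-i} for A, t_{+i} for B
            loc_a, loc_b = (loc_a - (1 << i)) % N, (loc_b + (1 << i)) % N
    return loc_a, loc_b                      # A's data ends at B, B's data at A

assert exchange_two(3, 6, 3) == (6, 3)
```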
If a restriction is made, the data manipulator network can be partitioned such that each group has a complete data manipulator network at its disposal. For N = 8, let the machine be partitioned into 2^1 groups of 2^2 PEs. Let one group contain PEs 0, 2, 4, and 6, while the other group contains PEs 1, 3, 5, and 7. The control structure of this network is not general enough to allow multiple control partitioning, and so only single control partitioning will be considered. If only the straight-through path (H1 H2) is allowed at stage 0, then each group of 4 PEs has a log2 4 stage data manipulator network available to it. The t_{±1} level now corresponds to a t_{±0} level for each partitioned group, since each PE address is two apart from its neighbors' addresses: t_{+1} moves data from position 0 to position 2, from 2 to 4, from 4 to 6, and from 6 to 0. Similarly, the t_{±2} stage corresponds to a t_{±1} stage for the group. Figure 8 illustrates this process.

Figure 8: 8-item data manipulator network for the use of four PEs only (0, 2, 4, and 6). Note that for N = 8, t_{+2} = t_{-2} (i.e., U = D).

Theorem: Let the data manipulator be partitioned into 2^{n-r} groups of 2^r PEs so that all PEs in a partition have the same n-r low order bits. Under this restriction, the data manipulator may be partitioned under single control partitioning.

Proof: Each group is composed such that consecutive PEs within the group have PE addresses that differ by 2^{n-r}. Then stage n-r behaves as a stage 0 for the group, stage n-r+1 behaves as a stage 1, and so on, and lastly, stage n-1 behaves as a stage r-1 for the group of 2^r PEs. □

While this method does group 2^r PEs together, the PE addresses are not consecutive, and this may cause confusion for the programmer. A "logical" t_{+0} interconnection for the group 0, 2^{n-r}, ..., (2^r - 1)*2^{n-r} means to send data from 0 to 2^{n-r}, from 2^{n-r} to 2*2^{n-r}, and so forth. It does not mean to send data from 0 to 1, etc. Also, this logical t_{+0} interconnection is a "physical" t_{+(n-r)} interconnection on the network. There are means of allowing 2^r PEs with consecutive addresses to be grouped together for this network and still have a complete network available. However, problems arise due to missing connections, such as the 0 to 2^r - 1 connection for t_{+0}.

In summary, for multiple control partitioning, the data manipulator network is adequate to divide the PEs into 2^{n-1} groups of two PEs, where a group consists of PE(i) and PE(i+1 mod N), 0 ≤ i < N. With single control partitioning, certain sets of 2^r PEs may have an r stage network available, but the grouping of 2^r PEs is more restricted than with the indirect binary n-cube, the flip, and shuffle-exchange networks. Like the flip network, the control structure here is not general enough to allow multiple control partitioning. The ADM network has the additional control capability to allow multiple control.

Conclusions

Three parameters for classifying multistage SIMD machine interconnection networks have been presented: topology, interchange box, and control structure. The generalized cube topology has been defined and used as a standard to which to compare other multistage networks. Six states for interchange box functions were described and three types of control schemes were defined.
It was shown that the STARAN flip network, the indirect binary n-cube network, and the omega network all use topologies equivalent to the generalized cube. The omega network sends its data through stage n-1, then n-2, etc., while the other two networks use the reverse order. The omega network uses four-function interchange boxes, while the other two networks use two-function boxes. The flip network uses partial stage control and the other two networks use individual box control. The networks may be ordered in terms of increasing capability as follows: flip network, indirect binary n-cube, and omega network. Due to the reversed order of the stages, the omega network may have to use address transformations to perform all of the n-cube interconnections. One possible alternative to address transformations is to construct the network so that it is bidirectional. Thus, a system architect beginning with the generalized cube topology can add features to improve the flexibility of the system network. The information presented about the flip, n-cube, and omega networks in [1-3,7,9] can be used in conjunction with the information presented here to do a cost-performance analysis based on the intended uses of the architecture being designed.

The augmented data manipulator (ADM) network was introduced. It is based on the data manipulator network, but it uses a more flexible control scheme. It was shown that this network can perform any interconnection task the generalized cube network with four-function boxes and individual box control can perform (hence, any task the flip, omega, or n-cube networks can perform). In addition, it was shown that the ADM network can perform tasks the generalized cube can not (hence, tasks the flip, omega, and n-cube networks can not perform). As in the case of choosing features for a generalized cube network, a system architect must decide if the added flexibility of the ADM network is worth the additional cost to construct it.

Two ways to partition an SIMD machine consisting of 2^n processing elements into two to 2^{n-r} groups of 2^r PEs, each group acting as an SIMD machine of size 2^r, have been defined. One way, single control partitioning, uses a single control mechanism to broadcast instructions to all groups. Thus, the same SIMD machine program may be executed on several different data sets simultaneously (e.g., multiplying two pairs of matrices). The other way, multiple control partitioning, assumes the availability of additional control units so that each 2^r PE submachine may have its own instruction stream. The data being acted on by different submachines may be copies of the same data file (e.g., processing two copies of the same image to find different features). The way in which various types of networks may function in both of these partitioning environments was examined.

Under single control partitioning, the shuffle-exchange, n-cube, and flip networks are readily partitioned into 2^{n-r} groups of 2^r PEs. The data manipulator network can group certain sets of 2^r PEs, but the ability to connect sets of 2^r PEs with consecutive addresses is restricted. The flip and data manipulator networks do not have a control structure that is general enough to accommodate multiple control partitioning. The omega, n-cube, and ADM networks have control structures which allow them to function in a multiple control environment.

References

[1] K. E. Batcher, "The multidimensional access memory in STARAN," IEEE Trans. Comput., Vol. C-26 (Feb. 1977), pp.
[2] K. E. Batcher, "The flip network in STARAN," 1976 Int. Conf. on Parallel Processing (Aug. 1976), pp.
Bauer, "Implementation of data manipulating functions on the SARAN associative processor," 1974 Sagamore Computer Conf. on arallel rocessing, (Aug., 1974), pp Feng, arallel rocessing Characteristics and Implementation of Data Manipulating Functions, Dept. of Elect-rmca'F---and Computer Englneerlng, Syracuse niversity, RADC-R (Jut., 1973).. Feng, "Data manipulating functions in parallel processors and their implementations," IEEE rans. Comput., Vol. C-23 (Mar., 1974), pp M. J. Flynn, "Very high-speed computing systems," roceedings of the IEEE, Vo{. 54 (Dec., 1966), pp Lang and H. S. Stone, "A shuffle-exchange network with simplified control," IEEE rans. Comput., Vol. C-25 (Jan., 1976), pp. 5~. - - D. Lawrie, "Access and alignment of data in an array processor," IEEE rans. Comput., Vot. C-24 (Dec., 1975), pp G. J. Lipovski and A. ripathi, "A reconfigurable varistructure array processor," 1977 Int. Conf. on arallel rocessing (Aug., 1977), pp. ~5--i-7-4. M. C. ease, "he indirect binary n-cube microprocessor array," IEEE rans. Comput., Vol. C-26 (May, 1977), pp. 4-5"~=-~73. [11] H. J. Siegel, "Analysis techniques for SIMD machine interconnection networks and the effects of processor address masks," IEEE rans. Comput., Vol. C-26 (Feb., 1977), pp. 1"5"-3r~61. [12] H. J. Siegel, "Single instruction stream - multiple data stream machine interconnection network design," 1976 Int. Conf. on arallel rocessing (Aug., 1976), pp [13] H. J. Siegel, "he universality of various types of SIMD machine interconnection networks," Fourth Annual Symposium on Computer Architecture (Mar., -i"-(~'ff), pp [14] H. Sullivan,. R. Bashkow, and K. Klappholz, "A large scale homogeneous, fully distributed parallel machine," Fourth Annual Symposium on Computer Architecture (Mar., 1977), pp References [1] K. E. Batcher, "he multidimensional access memory in SARAN," IEEE rans. Comput., Vol. C-26 (Feb., 1977), pp