A prototype data flow computer with token labelling


by IAN WATSON and JOHN GURD
University of Manchester
Manchester, England

INTRODUCTION

Computer users in many areas of scientific study appear to have an almost insatiable desire for processing power. Many of the computations exhibit a high degree of parallelism which is not exploited in conventional computer structures. Considerable interest has been shown in parallel computer architectures in the last decade in order to make use of this property. The two main approaches toward this goal have become known as SIMD (array processor) and MIMD (multiprocessor) architectures, which are typified by the ILLIAC IV and the C.mmp systems respectively. It is not appropriate to describe these architectures in detail here, but it should be noted that a number of inadequacies have come to light as these systems have been applied to many problems. There are two basic difficulties which appear in both architectures in slightly different guises. The first concerns the arbitrary choice of a fixed number of processing units in an architecture (64 in the case of ILLIAC IV, 16 in the case of C.mmp). The difficulty occasioned by this choice derives from the fact that it is often inconvenient, if not impossible, to organize the problem to be solved so that it is exactly the "width" of the architecture. The second problem arises when sections of essentially serial programs are processed. The problems in SIMD architectures are clear; in MIMD systems the difficulty usually appears when a critical area of memory becomes "locked-out" while many processors are trying to access it. Evidence is accumulating that relatively little of the potential speedup of SIMD and MIMD systems can actually be achieved for a broad spectrum of problems: Minsky's conjecture,1 Flynn's analysis2 and Enslow's view of operating system overheads in multiprocessors3 all attest to this opinion.
More recently, interest has arisen in data driven computation for the expression of programs (known as Data Flow). A computation is expressed as a data dependent directed graph, with nodes representing computational operations and arcs representing the flow of data. A node becomes ready for execution when all its input data are available. The expression of parallelism in terms of data dependencies, rather than in spite of them, leads to a far more natural and flexible picture of parallel program execution. The theoretical basis of data driven computation can be traced to the paper of Karp and Miller in 1966.4 An extension of this line into machine architecture was pioneered by the team led by J. B. Dennis at M.I.T.5,6 The architecture described in this paper follows the same broad principles as the M.I.T. work but introduces the concept of token labelling as a mechanism to support re-entrant code structures. It is believed that this leads to a more efficient use of storage resources and provides a greater potential for the utilization of parallelism. Another important aspect of the design is the use of pseudo-associative store to perform token matching; this is achieved by reducing the nodes to a simple machine instruction level with a maximum of two inputs.

DATA FLOW PRINCIPLES

In order to illustrate the principles of a directed graph representation, consider the 'Butterfly' of a Fast Fourier Transform shown in Figure 1. The expressions being evaluated are:

A' = A + C cos a + D sin a
B' = B - C sin a + D cos a
C' = A - C cos a - D sin a
D' = B - D cos a + C sin a

These can be built up from six simple nodes performing the functions of addition, subtraction, multiplication, sine, cosine and duplication of a value. A node becomes executable when its input values (normally called tokens) are available. Following this principle it can be seen that, assuming at least four execution units, the computation can be performed in seven steps, as indicated by the number adjacent to each node.
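The data driven firing rule described above, that a node executes as soon as all of its input tokens are available, can be sketched in a short simulation. This is hypothetical illustrative code, not the machine's actual mechanism; the graph below evaluates only the A' output of the butterfly, and the duplication node is represented simply by supplying the angle token twice:

```python
# Minimal simulation of data-driven execution: a node fires as soon as
# all of its input tokens have arrived.  Hypothetical sketch only.
import math

# Each node: (operation, input arc names, output arc names).
# Graph for A' = A + C cos a + D sin a (one output of the butterfly).
nodes = {
    "cos":  (math.cos, ["a1"], ["ca"]),
    "sin":  (math.sin, ["a2"], ["sa"]),
    "mul1": (lambda x, y: x * y, ["C", "ca"], ["t1"]),
    "mul2": (lambda x, y: x * y, ["D", "sa"], ["t2"]),
    "add1": (lambda x, y: x + y, ["t1", "t2"], ["t3"]),
    "add2": (lambda x, y: x + y, ["A", "t3"], ["A'"]),
}

def run(initial_tokens):
    arcs = dict(initial_tokens)   # arc name -> token value
    fired = set()
    progress = True
    while progress:
        progress = False
        for name, (op, ins, outs) in nodes.items():
            # Firing rule: all input tokens present, node not yet fired.
            if name not in fired and all(i in arcs for i in ins):
                result = op(*[arcs.pop(i) for i in ins])
                for o in outs:
                    arcs[o] = result
                fired.add(name)
                progress = True
    return arcs

# A' = A + C cos a + D sin a with A=1, C=2, D=3, a=0:
out = run({"A": 1.0, "C": 2.0, "D": 3.0, "a1": 0.0, "a2": 0.0})
print(out["A'"])   # 1 + 2*cos(0) + 3*sin(0) = 3.0
```

Note that the scheduling loop imposes no order of its own: any node whose inputs happen to be present may fire, which is exactly the property that exposes the parallelism.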
The total number of nodes in the graph is 21, and the computation therefore has an average parallelism of three. The graph shown uses only simple operational nodes. In order to specify a more comprehensive computational model it is necessary to introduce a much wider set of primitives which provide control functions such as conditional branching. A complete description of a practical order code is beyond the scope of this paper but can be found in other literature.7

National Computer Conference, 1979

Figure 1-Graph of an F.F.T. Butterfly.

PROBLEMS OF RE-ENTRANCY

A complete Fast Fourier Transform is an iterative computation performed on a large array structure. Ignoring, for the moment, practical implementation mechanisms, consider the problems associated with using the graph of Figure 1 as a complete description of the arithmetic computation. The input values are not simple scalars but arrays of the form A[1]...A[N], B[1]...B[N], etc., for each iteration. In a flexible parallel machine, there is no reason why further iteration steps may not proceed even though one iteration has not been completed for all points in an array. The inputs to the graph can then be of the general form A[n]i, B[n]i, etc., where n represents the array index and i the iteration level. It is then no longer possible to declare a node executable by the presence of any two tokens on its inputs, as they may belong to totally different parts of the computation. There are three possible solutions to this problem:

1. The use of a re-entrant graph is prohibited; each point of the array and each stage of the iteration must be described by a separate graph.
2. The use of the graph is limited by allowing only one token to reside on an arc of the graph at any one time. This can be achieved by only allowing a node to become executable when both its input tokens are present and no token exists on its output arc.
3. The tokens are assumed to carry with them their index and iteration level as a label. The rules are extended to require two tokens with the same label before a node can be declared executable.

The first mechanism may require large amounts of code storage and, in problems where the iteration depth is only known at run time, the necessity for dynamic code generation.
It is believed that this could be a significant overhead in a practical system. The second implies a sequential but pipelined use of the code. This will severely reduce the exploitation of potential parallelism. The labelling method permits the use of pure static code and enables maximum usage of any parallelism which exists in the problem specification. This is clearly at the expense of the extra information which needs to be carried by each token. Assuming that it is desirable to utilize the maximum amount of parallelism available, the limitation of one token per arc must be rejected in the general case. Of the remaining two methods, it is our contention that token labelling will result in much lower overheads in a practical system. One major use of re-entrant code, that of the procedure, has not yet been mentioned. Parallel invocation of procedures can be supported by ensuring that a unique identifier is allocated to each input token at the procedure call. It is not possible in this paper to describe all the implications of token labelling. Techniques are required to support nested iterations, recursive procedure calls and index manipulation in data structures. These can be achieved by nodes which operate on the label fields of tokens rather than the data values. More comprehensive information on the labelling concept can be found in descriptions of our own work7 and that of Arvind and Gostelow at the University of California, Irvine, U.S.A.8

THE MACHINE ARCHITECTURE

Although it may be profitable to use a Data Flow computational model to describe parallel activities in a wide range of architectures, it is likely that a hardware structure which reflects the model directly will result in the most efficient implementation. The basic elements of a Data Flow machine must contain parallel processing units which perform the nodal operations, a stored description of the directed graph and a mechanism for collecting and matching tokens.
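The labelled matching rule just described can be sketched as follows (a hypothetical illustration; the Token fields and node name are ours, not the machine's). Two tokens enable a node only when their labels, here an (index, iteration) pair, agree, so tokens from different iterations are never paired wrongly even when they arrive interleaved:

```python
# Sketch of the labelled firing rule: a two-input node becomes
# executable only when two tokens carrying the SAME label arrive.
from collections import namedtuple

Token = namedtuple("Token", "dest label value")   # dest = target node

waiting = {}          # (dest, label) -> first token of a pair
executable = []       # matched pairs, ready for a processing unit

def arrive(token):
    key = (token.dest, token.label)
    if key in waiting:                 # partner already waiting: match
        executable.append((waiting.pop(key), token))
    else:                              # first of the pair: store it
        waiting[key] = token

# Tokens from two different iterations arrive interleaved ...
arrive(Token("add", (4, 1), 10))   # index 4, iteration 1
arrive(Token("add", (4, 2), 30))   # index 4, iteration 2: no match yet
arrive(Token("add", (4, 1), 20))   # completes the iteration-1 pair
arrive(Token("add", (4, 2), 40))   # completes the iteration-2 pair

# ... yet each pair is formed from tokens of the same iteration only.
print([(a.value, b.value) for a, b in executable])   # [(10, 20), (30, 40)]
```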
The first two requirements can be realized using standard processing and storage techniques; the matching operation is more complex. During a computation, tokens will be produced by processing units and must await the arrival of other tokens with identical labels which are directed to the same node.

They can then form an executable package to be allocated to a free processing unit. This storage and matching function can be simplified greatly if the maximum number of node inputs is limited to two; an associative storage technique can then be used. Figure 2 shows the basic architecture of the Manchester Data Flow Prototype which uses these principles. The Instruction Store is a random access memory which holds the directed graph description. Each entry is in the form of a nodal operation and the addresses of the subsequent nodes to which the output token(s) will be directed. Note that in this practical implementation, nodes are able to specify two output destinations for their result. It is also possible to specify a literal value as one input to the node. The processing units are microprogrammed microprocessors with a distribution and arbitration system. The distribution system, on receipt of an executable package, will select any processor which is free and allocate the nodal operation. The arbitration system controls the output of tokens from the processors. The switch provides input and output for the system. Initial tokens are directed to the starting nodes of the computation. A special destination address in the final nodes of the graph allows tokens to be output. The Token Queue is a First In First Out buffer which equalizes data rates around the system. The Matching Store is associative in nature, although it is implemented using conventional random access store with hardware hashing techniques. The associative field is formed from a concatenation of the label and next instruction fields; the value field is the token value. There is a requirement in a practical instruction set for single input nodes which require no matching operation. A control digit in the next instruction information allows a bypass of the Matching Store in this case.
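In software terms, the Matching Store behaves like a table keyed on the concatenation of the label and next instruction fields. The following is a minimal sketch under our own token layout (the hardware uses parallel hashing, not a dictionary); it shows the three cases: write on no match, remove to form a pair on a successful search, and the bypass digit for single-input nodes:

```python
# Software sketch of the Matching Store.  Field names are illustrative.

class MatchingStore:
    def __init__(self):
        self.store = {}   # (label, next_instr) -> waiting token value

    def present(self, token):
        """Return an executable package, or None if the token must wait.

        token = (value, label, next_instr, bypass); bypass is the
        control digit carried by tokens bound for single-input nodes.
        """
        value, label, next_instr, bypass = token
        if bypass:                      # single-input node: no matching
            return (next_instr, label, (value,))
        key = (label, next_instr)
        if key in self.store:           # successful associative search:
            partner = self.store.pop(key)    # remove to form a pair
            return (next_instr, label, (partner, value))
        self.store[key] = value         # no match: write token to store
        return None

ms = MatchingStore()
print(ms.present((7, 0, 100, False)))  # None - first operand waits
print(ms.present((5, 0, 100, False)))  # (100, 0, (7, 5)) - pair formed
print(ms.present((9, 0, 101, True)))   # (101, 0, (9,)) - bypassed
```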
In order to execute a program, the graph description is entered in the Instruction Store. The initial data tokens are input to the Token Queue via the Switch. As an example of this, Figure 3 shows a very simple graph to form the sum of the products of two pairs of numbers, together with an indication of the Instruction Store entries and initial token formats. The label is not used in this case and is thus set to zero. The tokens, on reaching the front of the Token Queue, can access or bypass the Matching Store dependent on the Next Instruction Information. An access to the Matching Store will cause an associative search of the store. If a token is found with the same label and Instruction Address it is removed to form a token pair. If no match is found, the incoming token is written to the store.

Figure 2-Basic architecture.

Figure 3-A simple graph and its machine representation.

Token pairs from the Matching Store, or single tokens which have bypassed it, now access the Instruction Store

and form an executable package. This is distributed to any free processor for execution. Tokens produced by the processors are entered on the back of the Token Queue via the Arbitration Unit and Switch. This operation proceeds in a pipelined manner around the ring structure until the computation is complete and any output tokens have been produced. Each unit communicates with its successor and predecessor by a two-way handshake interface. This ensures that if, for example, all processing units are busy, the ring operation is suspended until the necessary resources become available. In order to facilitate the description of the architecture, certain simplifications have been made. It does, however, contain the underlying principles of the prototype Manchester Data Flow machine.

IMPLEMENTATION OF THE MACHINE UNITS

The token queue

The Token Queue is implemented using Random Access Memory with pointers indicating the front and back of the queue. The store is divided into two independent stacks which permit concurrent read and write operations. Although it is not intended to include storage hierarchies in the prototype design, it is felt that use could be made of shift register devices if a much larger queue were required.

The matching store

The pseudo-content-addressable memory required for the matching operation uses hardware parallel hashing techniques similar to those described by Goto and Ida.9,10 However, as the order of propagation of results around the ring is unimportant, use can be made of an overflow hashing mechanism rather than a re-hash. The advantage of this is that the average store access time can be reduced and a backing store structure can be introduced.

The instruction store

No special techniques are necessary in the Instruction Store.
It is simply a linear Random Access Memory which is addressable by the Next Instruction Information field.

The processing units

Ten Processing Units are constructed from Schottky 'bit-slice' microprocessor elements. They contain alterable microprogram store in order that the instruction set can be changed with ease. All processors are connected to a common bus with a distribution mechanism to allocate executable instructions to free processors and allow the output of results. A preliminary study of arithmetic operations has indicated that an average instruction execution time of 3 μs can be achieved in each processor.11

SYSTEM PERFORMANCE

Figure 4-Data rates.

The processor speed has already been stated as an average instruction time of 3 μs. If all ten processors can be utilized fully then an instruction execution rate of 3.3 MIPS can be achieved. It is necessary to estimate the storage speeds which are consistent with this. With reference to Figure 4 we define four values:

n - tokens per unit time read from the token queue.
P2i - probability of an instruction requiring two inputs.
P2o - probability of an instruction producing two outputs.
P0o - probability of an instruction producing zero outputs.

The following relationships can be derived:

2nP2i/(1 + P2i) tokens will access the matching store.
n(1 - P2i)/(1 + P2i) tokens will bypass the matching store.
nP2i/(1 + P2i) pairs of tokens will exit from the matching store.
n/(1 + P2i) instruction accesses will be made, and executable instructions formed.

n(1 + P2o - P0o)/(1 + P2i) tokens will be produced by the processors, pass through the Switch, and be written to the token queue.

The constraint imposed by the processing units is that:

n/(1 + P2i) = 3.3×10^6

From simulation experience, the following values of the probabilities appear realistic for typical problems:

P2i = 0.5, P2o = 0.6, P0o = 0.1

Using these we see that:

n = 3.3×10^6 × 1.5 = 4.95×10^6

From this we can derive the following required operation times for the units:

1. Token Queue Read - given by 1/n = 202 ns.
2. Matching Store Access - a token reaching the matching store requires one read cycle plus one write cycle whether or not it is a successful match. The time is therefore given by Read time + Write time = (1 + P2i)/(2nP2i) = 303 ns.
3. Instruction Store Read - given by (1 + P2i)/n = 303 ns.
4. Switch Operation - given by (1 + P2i)/(n(1 + P2o - P0o)) = 202 ns.
5. Token Queue Write - same as Switch Operation = 202 ns.

From these figures it can be seen that the storage units must have access times of the order of 200 ns to maintain the execution rate of ten processors. This speed can be readily achieved by low cost MOS storage devices currently available.

ARCHITECTURAL EXPANSION

In order to obtain very high speeds from any parallel computer system it is necessary to exploit parallelism in processing, storage and information transfer. The critical "bottlenecks," particularly in MIMD machines, usually appear in the form of crossbar switches, common highways or common stores through which all the processors in the system may wish to communicate. The reason for this can be traced to the need for a processor to demand, and receive rapidly, data from any other part of the system. In addition, there is a necessity to control access to data which has yet to be formed; this can also introduce significant communication overheads.
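The data-rate arithmetic of the SYSTEM PERFORMANCE section above can be checked mechanically. The following sketch (symbol names follow the text) recomputes the required operation times from the stated probabilities:

```python
# Re-derivation of the required unit speeds from the figures in the text.
P2i, P2o, P0o = 0.5, 0.6, 0.1   # input/output probabilities
mips = 3.3e6                    # ten processors at 3 us per instruction

n = mips * (1 + P2i)            # token rate from the queue: 4.95e6 / s

tq_read   = 1 / n                                # -> 202 ns
ms_access = (1 + P2i) / (2 * n * P2i)            # read + write: 303 ns
is_read   = (1 + P2i) / n                        # -> 303 ns
switch    = (1 + P2i) / (n * (1 + P2o - P0o))    # -> 202 ns (= queue write)

for name, t in [("token queue read", tq_read),
                ("matching store access", ms_access),
                ("instruction store read", is_read),
                ("switch operation", switch)]:
    print(f"{name}: {t * 1e9:.0f} ns")
```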
In a data driven environment, a processor is not required to perform a section of a computation until all the data are presented to it as an executable package. Rapid data access to other parts of the system and access control are then unnecessary. This suggests an architecture containing a mechanism which accepts tokens from all processors and then distributes them to parallel but independent storage and processing resources. In the Manchester architecture, this can be achieved by extending the input/output switch to become the intersection point of many identical "rings." A strategy is adopted, using the label and instruction address fields, which distributes tokens from different parts of the computation across the parallel rings. The switch could be a crossbar, but this would suffer from the same problems as in a MIMD machine. Due to the pipelined nature of the rings, the requirement is for a high throughput rate rather than a fast transfer across the switch. A parallel pipelined structure can therefore be used which does not create a "bottleneck" in the system. The prototype will contain only a single "ring" as described previously. The switch will be constructed to enable the connection of further identical rings at a later date. A more comprehensive description of the implementation and estimated performance of the multi-ring architecture can be found in other literature.7

PROGRAMMING A DATA FLOW MACHINE

This section is intended to provide a brief outline of the methods available for programming a Data Flow machine. It would seem that the natural way of expressing a directed graph would be via a graphical language, and such languages have been investigated.12 However, textual languages are far more familiar, and it can be argued that features such as data structures are easier to express in a textual form. One approach is to take a conventional language and translate it into a Data Flow graph.
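As a sketch of what such a translation involves (hypothetical, and not a description of any of the compilers cited below), a straight-line fragment can be turned into a dependence graph by reading each statement's uses as its input arcs; the earliest firing step of each node then exposes the available parallelism:

```python
# A straight-line fragment and the dependence graph it denotes.
# Each statement is recorded as (defined name, names it uses).
program = [
    ("a", ["x", "y"]),   # a = x * y
    ("b", ["y"]),        # b = y + 1
    ("c", ["a", "b"]),   # c = a - b
    ("d", ["a", "c"]),   # d = a * c
]

# Earliest step at which each definition can fire, given its inputs:
step = {"x": 0, "y": 0}          # inputs are available at step 0
for name, uses in program:
    step[name] = 1 + max(step[u] for u in uses)

print(step)   # a and b both fire at step 1, in parallel
```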
The principles involved in such a translation were originally suggested by Miller and Rutledge.13 More recently, Whitelock14 has developed an experimental compiler for a subset of PASCAL which compiles code for the Manchester machine. Another class of languages, the Single Assignment Languages, are more naturally suited to the expression of parallelism in a Data Flow form. Some of these languages, for example Id,15 TDFL16 and LAPSE,17 have been developed specifically for this purpose. Other similar languages, such as LUCID,18 have developed independently with emphasis on the proof of correctness of programs. It is too early to forecast with certainty which of these approaches will be most fruitful. The Manchester single assignment and conventional compilers produce code for an architecture simulator; they are being evaluated at the present time. It is certain that a Data Flow machine can be programmed without great difficulty, with no requirement for a knowledge of the underlying architecture which supports the execution. The exact form of languages which will gain acceptance is yet to be decided.

CONCLUSIONS

The expression of computations in data dependent graphical form provides a natural method of determining the parallelism which is present. In the design of practical execution mechanisms, problems exist in the implementation of re-entrant code. An architecture has been proposed which attempts to overcome many of these problems by introducing a label which is carried by every data token. This, together with a pseudo-associative token matching store, results in an architecture which can be constructed at low cost using components which are readily available. A prototype machine is in the process of design and its performance is being evaluated by simulation. It is not intended that this prototype should be of very high speed; rather, it should provide a research vehicle for studies of the potential of Data Flow machines for the solution of real problems. The architecture is extensible by using copies of the basic design, and this will proceed if the initial investigations are successful. No attempt has been made in this paper to address fully the problems of programming a Data Flow machine. Research is, however, being conducted both at Manchester and many other places which promises to produce high-level languages allowing a machine independent formulation of parallel programs.

REFERENCES

1. Minsky, M. and S. Papert, "On Some Associative, Parallel and Analog Computations," Associative Information Techniques (E. J. Jacks, ed.), Elsevier.
2. Flynn, M. J., "Some Computer Organizations and their Effectiveness," IEEE Transactions on Computers, Vol. C-21, No. 9, September 1972.
3. Enslow, P. H., "Multiprocessor Organization - A Survey," ACM Computing Surveys, Vol. 9, No. 1, March 1977.
4. Karp, R. M. and R. E. Miller, "Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing," SIAM J. Applied Mathematics, Vol. 14, November 1966.
5. Dennis, J. B. and D. P. Misunas, "A Preliminary Architecture for a Basic Data Flow Processor," Proc. 2nd Annual IEEE Symposium on Computer Architecture, January 1974.
6. Dennis, J. B., D. P. Misunas and C. K. Leung, "A Highly Parallel Processor Using a Data Flow Machine Language," CSG Memo 134, Laboratory for Computer Science, M.I.T.
7. Gurd, J. R., I. Watson and J. R. W. Glauert, "A Multilayered Data Flow Computer Architecture," Internal Report, Department of Computer Science, University of Manchester.
8. Arvind and K. P. Gostelow, "A Computer Capable of Exchanging Processors for Time," Information Processing 77, North-Holland, 1977.
9. Goto, E. and T. Ida, "Parallel Hashing Algorithms," Information Processing Letters, Vol. 6, No. 1.
10. Ida, T. and E. Goto, "Performance of a Parallel Hash Hardware with Key Deletion," Information Processing 77, North-Holland, 1977.
11. Zurawski, J., "The Design of a Processor Array for a Data Flow Architecture," M.Sc. Dissertation, Department of Computer Science, University of Manchester, England.
12. Dennis, J. B., "First Version of a Data Flow Procedure Language," Lecture Notes in Computer Science, Vol. 19, 1974.
13. Miller, R. E. and J. D. Rutledge, "Generating a Data Flow Model of a Program," IBM Technical Disclosure Bulletin, Vol. 8, No. 11.
14. Whitelock, P. J., "A Conventional Language for Data Flow Computation," M.Sc. Dissertation, Department of Computer Science, University of Manchester.
15. Arvind, K. P. Gostelow and W. Plouffe, "The (Preliminary) ID Report," Technical Report 114, Department of Information and Computer Science, University of California, Irvine.
16. Ackerman, W. B., "Preliminary Data Flow Language," CSG Note 36, Laboratory for Computer Science, M.I.T.
17. Glauert, J. R. W., "A Single Assignment Language for Data Flow Computing," M.Sc. Dissertation, Department of Computer Science, University of Manchester.
18. Ashcroft, E. A. and W. W. Wadge, "Lucid, a Nonprocedural Language with Iteration," CACM, Vol. 20, No. 7, July 1977, p. 519.


Contributions to Gang Scheduling CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,

More information

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest 1. Introduction Few years ago, parallel computers could

More information

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE

More information

Network Traffic Monitoring an architecture using associative processing.

Network Traffic Monitoring an architecture using associative processing. Network Traffic Monitoring an architecture using associative processing. Gerald Tripp Technical Report: 7-99 Computing Laboratory, University of Kent 1 st September 1999 Abstract This paper investigates

More information

Preserving Message Integrity in Dynamic Process Migration

Preserving Message Integrity in Dynamic Process Migration Preserving Message Integrity in Dynamic Process Migration E. Heymann, F. Tinetti, E. Luque Universidad Autónoma de Barcelona Departamento de Informática 8193 - Bellaterra, Barcelona, Spain e-mail: e.heymann@cc.uab.es

More information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information

Computer Network. Interconnected collection of autonomous computers that are able to exchange information Introduction Computer Network. Interconnected collection of autonomous computers that are able to exchange information No master/slave relationship between the computers in the network Data Communications.

More information

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine 7 Objectives After completing this lab you will: know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine Introduction Branches and jumps provide ways to change

More information

Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format

Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format , pp.91-100 http://dx.doi.org/10.14257/ijhit.2014.7.4.09 Parallel Compression and Decompression of DNA Sequence Reads in FASTQ Format Jingjing Zheng 1,* and Ting Wang 1, 2 1,* Parallel Software and Computational

More information

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING

FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING FAULT TOLERANCE FOR MULTIPROCESSOR SYSTEMS VIA TIME REDUNDANT TASK SCHEDULING Hussain Al-Asaad and Alireza Sarvi Department of Electrical & Computer Engineering University of California Davis, CA, U.S.A.

More information

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework

More information

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus A simple C/C++ language extension construct for data parallel operations Robert Geva robert.geva@intel.com Introduction Intel

More information

SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study

SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study SIMULATION OF LOAD BALANCING ALGORITHMS: A Comparative Study Milan E. Soklic Abstract This article introduces a new load balancing algorithm, called diffusive load balancing, and compares its performance

More information

Memory Systems. Static Random Access Memory (SRAM) Cell

Memory Systems. Static Random Access Memory (SRAM) Cell Memory Systems This chapter begins the discussion of memory systems from the implementation of a single bit. The architecture of memory chips is then constructed using arrays of bit implementations coupled

More information

Chapter 5 Linux Load Balancing Mechanisms

Chapter 5 Linux Load Balancing Mechanisms Chapter 5 Linux Load Balancing Mechanisms Load balancing mechanisms in multiprocessor systems have two compatible objectives. One is to prevent processors from being idle while others processors still

More information

University of Dayton Department of Computer Science Undergraduate Programs Assessment Plan DRAFT September 14, 2011

University of Dayton Department of Computer Science Undergraduate Programs Assessment Plan DRAFT September 14, 2011 University of Dayton Department of Computer Science Undergraduate Programs Assessment Plan DRAFT September 14, 2011 Department Mission The Department of Computer Science in the College of Arts and Sciences

More information

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV)

Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Interconnection Networks Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Interconnection Networks 2 SIMD systems

More information

TPCalc : a throughput calculator for computer architecture studies

TPCalc : a throughput calculator for computer architecture studies TPCalc : a throughput calculator for computer architecture studies Pierre Michaud Stijn Eyerman Wouter Rogiest IRISA/INRIA Ghent University Ghent University pierre.michaud@inria.fr Stijn.Eyerman@elis.UGent.be

More information

Floating Point Fused Add-Subtract and Fused Dot-Product Units

Floating Point Fused Add-Subtract and Fused Dot-Product Units Floating Point Fused Add-Subtract and Fused Dot-Product Units S. Kishor [1], S. P. Prakash [2] PG Scholar (VLSI DESIGN), Department of ECE Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu,

More information

Properties of Stabilizing Computations

Properties of Stabilizing Computations Theory and Applications of Mathematics & Computer Science 5 (1) (2015) 71 93 Properties of Stabilizing Computations Mark Burgin a a University of California, Los Angeles 405 Hilgard Ave. Los Angeles, CA

More information

C++ Programming Language

C++ Programming Language C++ Programming Language Lecturer: Yuri Nefedov 7th and 8th semesters Lectures: 34 hours (7th semester); 32 hours (8th semester). Seminars: 34 hours (7th semester); 32 hours (8th semester). Course abstract

More information

Masters in Information Technology

Masters in Information Technology Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101

More information

Application Design: Issues in Expert System Architecture. Harry C. Reinstein Janice S. Aikins

Application Design: Issues in Expert System Architecture. Harry C. Reinstein Janice S. Aikins Application Design: Issues in Expert System Architecture Harry C. Reinstein Janice S. Aikins IBM Scientific Center 15 30 Page Mill Road P. 0. Box 10500 Palo Alto, Ca. 94 304 USA ABSTRACT We describe an

More information

Bob Boothe. Education. Research Interests. Teaching Experience

Bob Boothe. Education. Research Interests. Teaching Experience Bob Boothe Computer Science Dept. University of Southern Maine 96 Falmouth St. P.O. Box 9300 Portland, ME 04103--9300 (207) 780-4789 email: boothe@usm.maine.edu 54 Cottage Park Rd. Portland, ME 04103 (207)

More information

Chapter 2 Parallel Architecture, Software And Performance

Chapter 2 Parallel Architecture, Software And Performance Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 8, August-2013 1300 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 8, August-2013 1300 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 8, August-2013 1300 Efficient Packet Filtering for Stateful Firewall using the Geometric Efficient Matching Algorithm. Shriya.A.

More information

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design Learning Outcomes Simple CPU Operation and Buses Dr Eddie Edwards eddie.edwards@imperial.ac.uk At the end of this lecture you will Understand how a CPU might be put together Be able to name the basic components

More information

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012)

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Lecture 18: Interconnection Networks CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Project deadlines: - Mon, April 2: project proposal: 1-2 page writeup - Fri,

More information

CS556 Course Project Performance Analysis of M-NET using GSPN

CS556 Course Project Performance Analysis of M-NET using GSPN Performance Analysis of M-NET using GSPN CS6 Course Project Jinchun Xia Jul 9 CS6 Course Project Performance Analysis of M-NET using GSPN Jinchun Xia. Introduction Performance is a crucial factor in software

More information

Masters in Human Computer Interaction

Masters in Human Computer Interaction Masters in Human Computer Interaction Programme Requirements Taught Element, and PG Diploma in Human Computer Interaction: 120 credits: IS5101 CS5001 CS5040 CS5041 CS5042 or CS5044 up to 30 credits from

More information

Performance Monitoring of Parallel Scientific Applications

Performance Monitoring of Parallel Scientific Applications Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure

More information

2) Write in detail the issues in the design of code generator.

2) Write in detail the issues in the design of code generator. COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage

More information

Synchronization of sampling in distributed signal processing systems

Synchronization of sampling in distributed signal processing systems Synchronization of sampling in distributed signal processing systems Károly Molnár, László Sujbert, Gábor Péceli Department of Measurement and Information Systems, Budapest University of Technology and

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

More information

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

KITES TECHNOLOGY COURSE MODULE (C, C++, DS) KITES TECHNOLOGY 360 Degree Solution www.kitestechnology.com/academy.php info@kitestechnology.com technologykites@gmail.com Contact: - 8961334776 9433759247 9830639522.NET JAVA WEB DESIGN PHP SQL, PL/SQL

More information

Masters in Computing and Information Technology

Masters in Computing and Information Technology Masters in Computing and Information Technology Programme Requirements Taught Element, and PG Diploma in Computing and Information Technology: 120 credits: IS5101 CS5001 or CS5002 CS5003 up to 30 credits

More information

Fault Analysis in Software with the Data Interaction of Classes

Fault Analysis in Software with the Data Interaction of Classes , pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental

More information

The mathematics of RAID-6

The mathematics of RAID-6 The mathematics of RAID-6 H. Peter Anvin 1 December 2004 RAID-6 supports losing any two drives. The way this is done is by computing two syndromes, generally referred P and Q. 1 A quick

More information

Programming Language Pragmatics

Programming Language Pragmatics Programming Language Pragmatics THIRD EDITION Michael L. Scott Department of Computer Science University of Rochester ^ШШШШШ AMSTERDAM BOSTON HEIDELBERG LONDON, '-*i» ЩЛ< ^ ' m H NEW YORK «OXFORD «PARIS»SAN

More information

Chapter 5 Instructor's Manual

Chapter 5 Instructor's Manual The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 5 Instructor's Manual Chapter Objectives Chapter 5, A Closer Look at Instruction

More information

Reliable Systolic Computing through Redundancy

Reliable Systolic Computing through Redundancy Reliable Systolic Computing through Redundancy Kunio Okuda 1, Siang Wun Song 1, and Marcos Tatsuo Yamamoto 1 Universidade de São Paulo, Brazil, {kunio,song,mty}@ime.usp.br, http://www.ime.usp.br/ song/

More information

MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL

MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL MULTISTAGE INTERCONNECTION NETWORKS: A TRANSITION TO OPTICAL Sandeep Kumar 1, Arpit Kumar 2 1 Sekhawati Engg. College, Dundlod, Dist. - Jhunjhunu (Raj.), 1987san@gmail.com, 2 KIIT, Gurgaon (HR.), Abstract

More information

Parallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation

Parallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation Parallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation Ying Peng, Bin Gong, Hui Liu, and Yanxin Zhang School of Computer Science and Technology, Shandong University,

More information

How To Get A Computer Science Degree At Appalachian State

How To Get A Computer Science Degree At Appalachian State 118 Master of Science in Computer Science Department of Computer Science College of Arts and Sciences James T. Wilkes, Chair and Professor Ph.D., Duke University WilkesJT@appstate.edu http://www.cs.appstate.edu/

More information

Resource Allocation Schemes for Gang Scheduling

Resource Allocation Schemes for Gang Scheduling Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian

More information

A Markovian Sensibility Analysis for Parallel Processing Scheduling on GNU/Linux

A Markovian Sensibility Analysis for Parallel Processing Scheduling on GNU/Linux A Markovian Sensibility Analysis for Parallel Processing Scheduling on GNU/Linux Regiane Y. Kawasaki 1, Luiz Affonso Guedes 2, Diego L. Cardoso 1, Carlos R. L. Francês 1, Glaucio H. S. Carvalho 1, Solon

More information

Deploying De-Duplication on Ext4 File System

Deploying De-Duplication on Ext4 File System Deploying De-Duplication on Ext4 File System Usha A. Joglekar 1, Bhushan M. Jagtap 2, Koninika B. Patil 3, 1. Asst. Prof., 2, 3 Students Department of Computer Engineering Smt. Kashibai Navale College

More information

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two

18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two age 1 18-742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,

More information

COMPUTER SCIENCE. FACULTY: Jennifer Bowen, Chair Denise Byrnes, Associate Chair Sofia Visa

COMPUTER SCIENCE. FACULTY: Jennifer Bowen, Chair Denise Byrnes, Associate Chair Sofia Visa FACULTY: Jennifer Bowen, Chair Denise Byrnes, Associate Chair Sofia Visa COMPUTER SCIENCE Computer Science is the study of computer programs, abstract models of computers, and applications of computing.

More information

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

More information

Kingdom of Saudi Arabia King Saud University

Kingdom of Saudi Arabia King Saud University Kingdom of Saudi Arabia King Saud University College of Computer & Information Sciences Department of Computer Engineering The MASTER S PROGRAM IN COMPUTER ENGINEERING ٢٠٠٣ M.S. PROGRAM IN COMPUTER ENGINEERING

More information

MICROPROCESSOR. Exclusive for IACE Students www.iace.co.in iacehyd.blogspot.in Ph: 9700077455/422 Page 1

MICROPROCESSOR. Exclusive for IACE Students www.iace.co.in iacehyd.blogspot.in Ph: 9700077455/422 Page 1 MICROPROCESSOR A microprocessor incorporates the functions of a computer s central processing unit (CPU) on a single Integrated (IC), or at most a few integrated circuit. It is a multipurpose, programmable

More information

Challenges and Opportunities for formal specifications in Service Oriented Architectures

Challenges and Opportunities for formal specifications in Service Oriented Architectures ACSD ATPN Xi an China June 2008 Challenges and Opportunities for formal specifications in Service Oriented Architectures Gustavo Alonso Systems Group Department of Computer Science Swiss Federal Institute

More information

GameTime: A Toolkit for Timing Analysis of Software

GameTime: A Toolkit for Timing Analysis of Software GameTime: A Toolkit for Timing Analysis of Software Sanjit A. Seshia and Jonathan Kotker EECS Department, UC Berkeley {sseshia,jamhoot}@eecs.berkeley.edu Abstract. Timing analysis is a key step in the

More information

Parallel Programming

Parallel Programming Parallel Programming Parallel Architectures Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Parallel Architectures Acknowledgements Prof. Felix

More information

Computer Science. Requirements for the Major (updated 11/13/03)

Computer Science. Requirements for the Major (updated 11/13/03) Computer Science Faculty: Knox Chair; Komagata,, Martinovic, Neff, Sampath, Wolz Faculty from mathematics with joint teaching appointments in computer science: Conjura, Greenbaun, Iannone The computer

More information

{emery,browne}@cs.utexas.edu ABSTRACT. Keywords scalable, load distribution, load balancing, work stealing

{emery,browne}@cs.utexas.edu ABSTRACT. Keywords scalable, load distribution, load balancing, work stealing Scalable Load Distribution and Load Balancing for Dynamic Parallel Programs E. Berger and J. C. Browne Department of Computer Science University of Texas at Austin Austin, Texas 78701 USA 01-512-471-{9734,9579}

More information

FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers

FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers Bogdan Mătăsaru and Tudor Jebelean RISC-Linz, A 4040 Linz, Austria email: bmatasar@risc.uni-linz.ac.at

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem

More information

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 2 Logic Gates and Introduction to Computer Architecture Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are

More information

SHARED HASH TABLES IN PARALLEL MODEL CHECKING

SHARED HASH TABLES IN PARALLEL MODEL CHECKING SHARED HASH TABLES IN PARALLEL MODEL CHECKING IPA LENTEDAGEN 2010 ALFONS LAARMAN JOINT WORK WITH MICHAEL WEBER AND JACO VAN DE POL 23/4/2010 AGENDA Introduction Goal and motivation What is model checking?

More information

Supercomputing applied to Parallel Network Simulation

Supercomputing applied to Parallel Network Simulation Supercomputing applied to Parallel Network Simulation David Cortés-Polo Research, Technological Innovation and Supercomputing Centre of Extremadura, CenitS. Trujillo, Spain david.cortes@cenits.es Summary

More information

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C Embedded Systems A Review of ANSI C and Considerations for Embedded C Programming Dr. Jeff Jackson Lecture 2-1 Review of ANSI C Topics Basic features of C C fundamentals Basic data types Expressions Selection

More information

Masters in Networks and Distributed Systems

Masters in Networks and Distributed Systems Masters in Networks and Distributed Systems Programme Requirements Taught Element, and PG Diploma in Networks and Distributed Systems: 120 credits: IS5101 CS5001 CS5021 CS4103 or CS5023 in total, up to

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Evaluation of Complexity of Some Programming Languages on the Travelling Salesman Problem

Evaluation of Complexity of Some Programming Languages on the Travelling Salesman Problem International Journal of Applied Science and Technology Vol. 3 No. 8; December 2013 Evaluation of Complexity of Some Programming Languages on the Travelling Salesman Problem D. R. Aremu O. A. Gbadamosi

More information

ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science

ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science Program Schedule CTech Computer Science Credits CS101 Computer Science I 3 MATH100 Foundations of Mathematics and

More information