OpenTransputer: reinventing a parallel machine from the past


Dissertation Type: enterprise

DEPARTMENT OF COMPUTER SCIENCE

OpenTransputer: reinventing a parallel machine from the past

David Keller, Andres Amaya Garcia

A dissertation submitted to the University of Bristol in accordance with the requirements of the degree of Master of Engineering in the Faculty of Engineering.

Thursday 25th June, 2015


Declaration

This dissertation is submitted to the University of Bristol in accordance with the requirements of the degree of MEng in the Faculty of Engineering. It has not been submitted for any other degree or diploma of any examining body. Except where specifically acknowledged, it is all the work of the Authors.

David Keller, Andres Amaya Garcia, Thursday 25th June, 2015


Contents

1 Contextual Background
    Project Context
    Why Re-implement the Transputer?
    Overview of Computer Architecture
    Project Aims and Objectives
2 Technical Background
    Occam
    Transputer Architecture
    Transputer Microarchitecture
    Transputer Versions and Enhancements
3 Project Execution
    The OpenTransputer CPU
    External Communication
    I/O Interface
    The OpenTransputer System
    Digital Design with Hardware Description Languages (HDL)
4 Critical Evaluation
    Evaluating Design Decisions
    Design Verification
    Synthesis Results
5 Conclusion
    Current Project Status
    Future Work
    Final Conclusions
A Transputer Instruction Set


List of Figures

1.1 A six-level computer as described in [31]. Below each level the way the abstraction is implemented is indicated (including the program responsible for it in parentheses)
Instruction sizes for the four different machines; the zero-address machine has the highest code density as each instruction only takes up 8 bits. It should be noted that destination and source operands are addresses in memory
Flow of events in sample Occam parallel program
Transputer evaluation stack [18]
Transputer instruction format [18]
The Transputer's hardware-implemented process scheduler [18]
Conditional execution of microcodes in the Simple 42 Transputer [2]
Next microinstruction state address generation
Stages of the microassembling process
Example datapath implementation using three buses
Example implementation using a wide datapath approach
OpenTransputer's simplified datapath schematic
Integration of the three main components of the OpenTransputer CPU
Signal timing for correct CPU behaviour
Example 8×8 Beneš network
Folded-over Beneš network with capacity for 8 OpenTransputers
Bit fields of the channel addresses for internal and external communication and I/O pins
Example route of a packet through a Beneš network of OpenTransputers
Internal connections of 4 input and 4 output switches
Data exchange protocol between two components of a network of OpenTransputers
Packet formats for external communication in the Inmos Transputer and the OpenTransputer
Interaction between input and output controllers to transfer a message over the network [16]
Major components of the OpenTransputer system


List of Tables

1.1 High-level overview of three different architectures
A comparison of how different architectures encode a simple add instruction
A comparison of the assembly code of different architectures for the same high-level statement. It should be noted that all operands are addresses in memory. [9]
Occam primitive processes [14]
Occam constructs [14]
Execution time (in clock cycles) of primary instructions in both the Inmos Transputer [15] and the OpenTransputer
Execution time (in clock cycles) of secondary instructions in both the Inmos Transputer [15] and the OpenTransputer. w is the number of 32-bit words to be moved
FPGA resources used by the design. The first row shows how many resources are consumed in total. The bottom three rows show how many of these resources the two cores and the switch use individually
FPGA resources used by a single core. The bottom three rows show the number of LUTs consumed by the individual components of the core
FPGA resources used by the major components within the datapath
Comparison of chip area and manufacturing process of the OpenTransputer and the original Transputer after synthesis
A.1 Primary instructions [28]
A.2 Implemented secondary instructions [28] in the OpenTransputer
A.3 Unimplemented secondary instructions [28] in the OpenTransputer


List of Listings

2.1 Occam purely sequential program
Occam parallel program
Simple 42 microcode for bitwise AND operation
OpenTransputer microinstruction for state GT (greater than instruction)
OpenTransputer microinstruction for state CCNT
Assembly program for a register machine
Assembly program using the Transputer Instruction Set Architecture (ISA)


Executive Summary

We have developed the OpenTransputer, a new microprocessor based on the Transputer architecture designed by Inmos in the 1980s. We believe that the features of the Transputer architecture, such as inbuilt communication and concurrency management, make it an interesting proposition for the emerging Internet of Things (IoT) market, where a small, general-purpose processor could serve as a building block for a broad range of networked applications. The OpenTransputer modernises many aspects of the original Transputer, featuring a new microarchitecture, external communication mechanism and an I/O interface.

The following is a list of our achievements during the project:

- Developed a functional simulator in C of the Transputer architecture, which was used to evaluate the trade-offs of different design approaches before they were implemented in Verilog. During the late stages of the project, the functional simulator was used as a reference model to verify the behavioural correctness of our design.

- Developed a new implementation of the Transputer architecture in Verilog: the OpenTransputer. The new design introduces significant changes to the microarchitecture that take advantage of state-of-the-art manufacturing technologies.

- Replaced the routing mechanism for inter-process communication of the Inmos Transputer with a switch-based network that distributes data packets among the OpenTransputer nodes. Switches form a rearrangeably non-blocking Beneš network, which greatly improves the performance and usability of the processor as a building block to assemble any kind of system.

- Replaced the existing input and output link controllers for external communication of the 1980s Transputer with a more scalable implementation that uses the idea of virtual channels.

- Designed and implemented an I/O interface that builds on the existing channel communication functionality of the processor and can be used to connect external hardware devices.


Supporting Technologies

The following is a list of third-party software and hardware tools, libraries and components used during development of the OpenTransputer project.

- We used the D7202 Inmos Occam 2 Toolset compiler sources developed by Inmos. The compiler was used to generate assembly code from Occam programs. The output would then be translated into machine code using an assembler script written by us. The compiler's C source code is publicly accessible, yet the program is difficult to compile using modern versions of the GNU C Compiler (GCC). We introduced minor changes to the C source and corrected syntax mistakes in existing makefiles before we were able to generate an ELF executable that could be run on modern x86 machines.

- We used PyYAML to implement an assembler that parses configuration information associated with an Occam program that is meant to execute in a network of OpenTransputers. PyYAML is a Python library for parsing and emitting YAML code, which is a human-readable serialisation language.

- We used Red Hat's hosting platform OpenShift to make the OpenTransputer website accessible. The OpenTransputer website was implemented using the WordPress software. Furthermore, we used the HTML5/CSS3-based theme onetone to produce the graphical interface of the website.

- We used the ZedBoard XC7Z020-CLG484 FPGA supplied by the Computer Science Department and programmed it using the Verilog description of the OpenTransputer to develop a demo application.

- The Xilinx Vivado Design Suite was used to simulate the Verilog description of the OpenTransputer and program the FPGA mentioned above.

- We used the Distributed Memory Generator v8.0 from the Xilinx Vivado Intellectual Property (IP) catalogue to create a single RAM and three ROM structures used within the OpenTransputer design.

- To estimate the area of a hypothetical OpenTransputer chip, we used Synopsys Design Vision version G SP5 to synthesize the design for a silicon target. Also, we used the UMC synthesis library for a 180 nm process.


Notation and Acronyms

The following is a list of acronyms used throughout this document.

ROM : Read-Only Memory
RAM : Random Access Memory
ISA : Instruction Set Architecture
HDL : Hardware Description Language
DMA : Direct Memory Access
FPGA : Field-Programmable Gate Array
CPU : Central Processing Unit
PL : Programmable Logic
Iptr : Instruction Pointer
Wptr : Workspace Pointer
I/O : Input/Output
IP : Intellectual Property
MUX : Multiplexer
DEMUX : Demultiplexer
RTL : Register-Transfer Level
RTE : Route Towards Edge
RTC : Route Towards Core
ALU : Arithmetic and Logic Unit
AU : Arithmetic Unit
LU : Logic Unit
LUT : Look-Up Table
CSP : Communicating Sequential Processes
CISC : Complex Instruction Set Computing
RISC : Reduced Instruction Set Computing
MISC : Minimal Instruction Set Computing
FSM : Finite State Machine
RTOS : Real-Time Operating System
IoT : Internet of Things
ILP : Instruction-Level Parallelism
LIFO : Last-In-First-Out


Acknowledgements

First and foremost, we would like to express our gratitude to our supervisor, Prof. David May, for his valuable guidance and support throughout the project. Without his patient explanations about the Inmos Transputer and his advice, our dissertation could not have been completed. Furthermore, we would like to thank Roger Shepherd for his early input and advice on how to develop the OpenTransputer. Thanks are also owed to Dr. Simon Hollis for his advice on digital design using the Vivado Design Suite and to Fred Barnes for his input on the compiler and configuration system of the Transputer. We would also like to thank Richard Grafton and the University of Bristol Computer Science Department, who supplied us with the FPGA used throughout the project. Finally, we thank Iman Malik for her help with the design of the OpenTransputer logo.


Chapter 1

Contextual Background

1.1 Project Context

The OpenTransputer is a re-implementation of the Transputer, a pioneering microprocessor architecture first released in the 1980s [3]. The original Transputer was considered revolutionary at its time for its integrated memory and serial communication links intended for parallel computing. Including memory and external links on the same chip made the Transputer essentially a computer on a chip. This was supposed to allow information systems to be designed at a higher level, with the Transputer functioning as a building block for parallel computing networks.

Over the last few years, with the shift to cloud computing, there has been a trend in the technology world towards building large clusters of powerful computers that serve data to an ever-growing number of client devices, which themselves only feature tiny and low-powered processors. These currently include mobile phones and tablets, but will soon also comprise every other device that connects to the internet, ranging from washing machines to cars [11]. We think that the Transputer and its unique feature set make it an excellent processor for this emerging Internet of Things (IoT) market, specifically the connected homes and wearables markets. As such, the OpenTransputer project aims to modernise the Transputer microarchitecture and introduce a more scalable network for communication based on switches and an easy-to-use I/O interface.

In the following, we will give a short overview of the Transputer and the recent developments in the field of computer architecture. This will help explain and give background on the design decisions made in the original Transputer, which are detailed in Chapter 2 of this document, and also motivate the changes we have introduced in the OpenTransputer that are explained in Chapter 3.

1.2 Why Re-implement the Transputer?
The Transputer, while revolutionary at its time for being a computer on a single chip comprising processor, memory and communication links, did not receive the attention it deserved. However, it found its way into spacecraft [1], satellites [32], set-top boxes and supercomputers [3] [12]. The Transputer has in-built support for concurrency through message passing, which is used for both off- and on-chip communication between processes, and a single Transputer can maintain and schedule multiple processes. This makes it both a multitasking and multiprocessing platform and provides a useful abstraction, since to the programmer communication operations are used in exactly the same way regardless of where the sender and receiver processes are running. In other architectures, approaches used

for inter-process and inter-processor communication often vary and may require programmers to know about the intricacies of the operating system in the former case or the processor in the latter.

With its own simple Real-Time Operating System (RTOS) implemented directly in hardware, the Transputer can not only perform context switches in a fraction of the time traditional platforms take, but it also simplifies the way programmers interact with the processor. That is, there is no need to know the low-level implementation details of the processor, which is often the case with other platforms. Instead, a programmer can simply write code in the high-level language Occam, which exposes constructs to explicitly describe concurrent processes and their communication patterns [18].

Its ability to serve essentially as both a microcontroller and a small building block for a parallel network makes it an attractive starting point to build upon for the IoT market. In this technological landscape, we foresee small chips being the dominant contenders due to their size, low cost and small energy footprint. On the other hand, this market is also defined by a need for communication between devices. The Transputer fits both descriptions and provides further useful features, as detailed above.

1.3 Overview of Computer Architecture

The study of computer architecture is the study of the organisation and interconnection of components of computer systems. [29] A computer can be seen as a hierarchy of abstractions, as shown in Figure 1.1, each level performing a certain function. These levels are the digital logic level, the microarchitecture level, the Instruction Set Architecture (ISA) level, the operating system machine level, the assembly language level and the problem-oriented language level. Each level builds on the one below it. "Processor" and "computer" are somewhat synonymous in this context; essentially, they refer to a device that performs some sort of computation.
"Processor" nowadays refers to the Central Processing Unit (CPU), which is usually part of a microchip and hence is also known as a microprocessor. The CPU is a collection of logic tasked with executing programs that generate results for the user. In this section we want to briefly introduce how a CPU works, mostly ignoring the details of the implementation in digital logic. Instead we aim to focus on the concepts and mechanisms that dictate how a CPU will execute programs. We will give an overview of microarchitecture, the design of the physical hardware underlying the processor, and instruction set design, which refers to the interface to the hardware offered to the programmer. We will focus on basic processor design and omit issues in Advanced Computer Architecture, such as Instruction-Level Parallelism (ILP), pipelining, superscalar and vector processing, as these are not necessary to understand the rest of this document. The inclined reader is advised to take a look at the material referenced, as it has more information on these and other issues.

On the microarchitecture level, we think of computers as constructed from basic building blocks such as memories, arithmetic units and buses. The functional behaviour of these building blocks is similar across most machines. The differences between computers lie in how the modules they are made up of are connected together, in the performance of these modules and in the way the entire computer is controlled by programs.

1.3.1 Instruction Set Architecture (ISA)

The instruction set of a processor is a common abstraction used in the literature to describe the way a processor's internals work. It abstracts away the details of the underlying implementation, instead providing an explanation of the high-level logical processor. Specifying a processor at this level means a logical processor with a certain ISA can have several different physical implementations or microarchitectures that ultimately work the same way but have

different performance characteristics and prices.

Level 5: Problem-oriented language level, implemented by translation (compiler)
Level 4: Assembly language level, implemented by translation (assembler)
Level 3: Operating system machine level, implemented by partial interpretation (operating system)
Level 2: Instruction set architecture level, implemented by interpretation (micro-program) or direct execution
Level 1: Micro-architecture level, implemented by hardware
Level 0: Digital logic level

Figure 1.1: A six-level computer as described in [31]. Below each level the way the abstraction is implemented is indicated (including the program responsible for it in parentheses).

Instruction set    Type               Design    Bits        Registers
ARM                Register-register  RISC      32/64       16/32
Transputer         Stack machine      MISC      16/32       -
x86                Register-memory    CISC      16/32/64    6/8/16

Table 1.1: High-level overview of three different architectures.

The ISA level is a very useful abstraction for programmers, providing a common platform on which to execute programs. It means a program can be written in various high-level programming languages, which are then translated (compiled) into ISA-level machine language the processor can actually understand. This machine language is commonly referred to as assembly language or assembly code. Most ISAs used in processors today can be classified as following either a Reduced Instruction Set Computing (RISC) or Complex Instruction Set Computing (CISC) design philosophy [9]. The former refers to instruction sets comprised of very simple instructions, each taking very few clock cycles to execute. In contrast, CISC designs typically include more complex instructions that take longer to execute, but perform more operations per instruction. Table 1.1 shows three different architectures: the Intel IA-32 architecture, commonly referred to as i386, is an example of CISC, while the ARM is an example of RISC. The Transputer is sometimes used as an example of a Minimal Instruction Set Computer (MISC).
MISC computers commonly have a very small number of basic operations and are typically stack-based. In reality, the Transputer falls somewhere in the middle of these three philosophies: a somewhat small and focused feature set would place it in the RISC camp, but its microcode design and

some of its more complex instructions, which schedule processes or move several memory words at once, resemble a CISC design.

1.3.2 Processor Design Issues

Having defined the concepts and established the level at which we want to lead the discussion, the issue we want to elaborate on now is the different options a processor designer has when implementing a processor at this level.

Number of Addresses

Architecture                         Instruction
Three-address                        add dest,src1,src2
Two-address                          add dest,src
One-address (accumulator machine)    add addr
Zero-address (stack machine)         add

Table 1.2: A comparison of how different architectures encode a simple add instruction.

A basic characteristic that has an implication on a processor's architecture is the number of addresses used in an instruction, as can be seen in Table 1.2. Most operations performed by an instruction are either binary or unary. A binary operation such as addition or multiplication requires two input operands, whereas a unary operation only has a single operand. Typically, an operation produces a single output result. This means a total of three addresses need to be encoded in the instruction: two addresses to specify the two input operands and one additional address to specify the output result. Many processors specify all three addresses explicitly in the instruction. In some architectures, such as the Intel IA-32, a two-address format is used where one of the input addresses serves as both source and destination address. It is possible to construct architectures in such a way that instructions have only one or zero addresses explicitly encoded. The former are called accumulator machines, while the latter are referred to as stack machines. The Transputer and the OpenTransputer are stack machines.
A comparison of the different assembly code, and, by extension, machine code generated from a simple statement written in a high-level language such as C will help to discuss the benefits and disadvantages of each type of architecture. It should be noted that in this example, all four hypothetical machines expect operands to be an address in memory. An operand is then just an offset to some base register or stack pointer. Consider the simple C statement:

A = B + C * D - E + F + A

Depending on the type of architecture, it will be translated into one of the pieces of assembly code in Table 1.3. RISC processors are usually three-address machines, meaning they define all three addresses explicitly in the instruction. As can be seen from the table, such an architecture needs the fewest instructions to execute the high-level statement. In the first column, the one depicting the assembly code for the three-address machine, we can identify T as both a source and result operand. This simple observation serves as a basic justification for two-address architectures: if most instructions use one operand as both source and destination address, then only encoding two addresses in an instruction will not lose much flexibility and only makes the generated assembly code a little more complex, while allowing for shorter instructions. As can be seen in the second column, a two-address machine uses slightly more instructions to execute the statement, which is due to the fact that the value of C has to be loaded into T in the first instruction.

Three-address    Two-address    One-address    Zero-address
mult T,C,D       load T,C       load C         push E
add T,T,B        mult T,D       mult D         push C
sub T,T,E        add T,B        add B          push D
add T,T,F        sub T,E        sub E          mult
add A,T,A        add T,F        add F          push B
                 add A,T        add A          add
                                store A        sub
                                               push F
                                               add
                                               push A
                                               add
                                               pop A

Table 1.3: A comparison of the assembly code of different architectures for the same high-level statement. It should be noted that all operands are addresses in memory. [9]

Examining the second column of Table 1.3, we realise that all of the instructions use the same operand, T. Making this the default forms the basis for a one-address machine, one where the destination register, or accumulator, is implicit in the instruction. This kind of architecture is often used in environments where memory is constrained. Finally, in a zero-address machine all operands are implicit to the instruction and hence assumed to be at default locations. A stack is used to obtain the source input operands, and the result is written back onto the stack. A stack is a Last-In-First-Out (LIFO) data structure, supported and used by most processors, even non-zero-address ones. All operations in a stack machine assume that the top two values of the stack are the input operands. Results are placed (pushed) on the top of the stack. Notice that the pseudo-assembly instructions push and pop are an exception, as they take an address as operand.

Comparing Different Architectures

Each of the four address schemes in Table 1.3 has advantages and disadvantages. Just by a quick glance at the table it can be noted that the number of instructions needed increases as we go from the left-hand side to the right-hand side of the table, i.e. as the number of addresses encoded in an instruction decreases. A possible performance metric is the number of memory accesses; the lower this number, the faster we assume the processor is.
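The zero-address scheme is simple enough to demonstrate in executable form. The following is a minimal sketch in Python (the opcode names and operand values are hypothetical, chosen only for illustration; this is not OpenTransputer or Transputer code) of a stack machine evaluating the example statement with all operands held in memory:

```python
# Tiny zero-address (stack) machine evaluating A = B + C * D - E + F + A.
mem = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}

program = [
    ("push", "B"), ("push", "C"), ("push", "D"),
    ("mul", None), ("add", None),   # B + C*D
    ("push", "E"), ("sub", None),   # ... - E
    ("push", "F"), ("add", None),   # ... + F
    ("push", "A"), ("add", None),   # ... + A
    ("pop", "A"),                   # store the result back into A
]

def run(program, mem):
    stack = []
    for op, addr in program:
        if op == "push":            # only push/pop carry an address
            stack.append(mem[addr])
        elif op == "pop":
            mem[addr] = stack.pop()
        else:                       # binary ops use the top two stack values
            b, a = stack.pop(), stack.pop()
            stack.append({"add": a + b, "sub": a - b, "mul": a * b}[op])
    return mem

run(program, mem)
print(mem["A"])  # 2 + 3*4 - 5 + 6 + 1 = 16
```

Note that only push and pop reference memory; the arithmetic instructions carry no addresses at all, which is exactly why zero-address code is so compact.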
Hypothetically, if the three-address machine does not have any registers, then every instruction takes four memory accesses, and because there are five instructions, it would take 20 memory accesses in total to execute the statement. The two-address machine also takes four accesses per instruction; it should be recalled that one address doubles as both source and destination operand. Including the load instruction, which requires three accesses to memory, the two-address machine needs 23 memory accesses in total. If we do introduce a single register to store T into both designs, to make the example a bit more realistic, we can reduce the number of memory accesses to 12 for the three-address machine and 13 for the two-address machine respectively. The accumulator and stack machine use registers to read and write values and need 14 and 19 accesses respectively. This, however, does not mean that a three-address or two-address machine with registers is faster than a stack machine. Code density, the number of instructions that can be stored per memory unit (byte, 32-bit word, etc.), is another factor that needs to be considered. A stack machine does not need to specify any addresses and therefore requires fewer bits to encode each instruction, as can be seen in Figure 1.2. This means we can store more instructions in memory, which makes for shorter programs and helps improve

memory bandwidth usage.

3-address format (23 bits): Opcode (8 bits) | Destination (5 bits) | Source 1 (5 bits) | Source 2 (5 bits)
2-address format (18 bits): Opcode (8 bits) | Destination/Source (5 bits) | Source (5 bits)
1-address format (13 bits): Opcode (8 bits) | Destination/Source 2 (5 bits)
0-address format (8 bits): Opcode (8 bits)

Figure 1.2: Instruction sizes for the four different machines; the zero-address machine has the highest code density as each instruction only takes up 8 bits. It should be noted that destination and source operands are addresses in memory.

Control Unit

The control unit (also called control path or decoder) is responsible for decoding instructions and telling the system what to do. There are two commonly used approaches to designing the physical logic within the control unit: hard-coded (or hardwired) and microcoded designs. RISC machines typically use a hard-coded control unit, while CISC processors use a more modular microcode design. A hard-coded design is a direct implementation of a state machine based on wires, flip-flops and logic gates that operates the datapath components. It is simple to implement, although the resulting state machines can be quite complicated. They are, however, efficient, as the state machine ideally does not contain any superfluous elements but is tailored precisely to what is required.

Microcode is tightly linked to CISC. In the 1970s memory used to be very expensive, so computer architects tried to minimise the amount of storage a program took up. This meant that each individual instruction a program is made up of had to do more work. On the other hand, designing hardware to carry out complex instructions was also a very expensive task. Microprograms offered the solution: a small run-time interpreter takes a complex instruction and executes simple (micro) instructions. Computer architects use microcode routines to implement more complex instructions.
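The microprogram idea can be sketched in a few lines: a single complex ISA-level instruction is interpreted at run time as a loop of simple micro-operations. The sketch below, in Python, models a hypothetical block-move instruction (the micro-op breakdown is ours for illustration and does not reproduce the Transputer's actual microcode):

```python
# Microcoded interpretation of one complex instruction: a block move
# that copies `count` words, expressed as a loop of simple micro-ops.
def run_move(mem, src, dst, count):
    while True:
        word = mem[src]              # micro-op: read source word
        mem[dst] = word              # micro-op: write destination word
        src, dst = src + 1, dst + 1  # micro-ops: increment both pointers
        count -= 1                   # micro-op: decrement word count
        if count == 0:               # micro-op: conditional micro-branch
            return

mem = [10, 20, 30, 0, 0, 0]
run_move(mem, src=0, dst=3, count=3)
print(mem)  # [10, 20, 30, 10, 20, 30]
```

To the programmer this is one instruction; to the control unit it is a small interpreted routine, which is precisely how CISC machines made single instructions do more work without extra hardware.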

1.4 Project Aims and Objectives

The aim of the OpenTransputer project is to produce a re-implementation of the Transputer, keeping the original ISA whilst updating the microarchitecture. All changes have been proposed and implemented with the IoT, wearables and connected homes markets in mind. Specifically, the following are the main areas of improvement over the original Transputer that the project is focussed on:

1. Develop an implementation of the Transputer instruction set taking advantage of state-of-the-art technology.

2. Replace the communication mechanism used in the original 1980s Transputer with a network implemented by switches to improve the usability of the processor as a building block for large parallel systems.

3. Introduce an easy-to-use I/O interface that can be used to connect hardware peripherals, such as sensors, commonly used in IoT applications.


Chapter 2

Technical Background

2.1 Occam

Occam is a programming language designed at Inmos to facilitate the development of concurrent, distributed systems [14]. The language was developed hand-in-hand with the Transputer to extract maximum performance from the architecture. In fact, the Transputer system description and documentation are written in Occam. Despite its purpose, Occam is a high-level language and not assembly. However, the fact that its model of concurrency was derived from Hoare's work on Communicating Sequential Processes (CSP) [13] means that Occam differs greatly from conventional programming languages such as C and Java [20]. Due to its simplicity, we use Occam to introduce the Transputer from a high-level point of view.

Occam enables systems to be described as collections of concurrent processes that communicate with each other. Contrary to conventional languages, Occam programs are based on processes rather than procedures. Procedures can be thought of as sequences of instructions that are enclosed in callable constructs, enabling code reusability and modularity. On the other hand, processes can be thought of as independent entities or self-contained programs with separate memory spaces and state. These processes are specifically designed to be run on identical Transputer system components containing on-chip memory and serial communication links to other components. The inter-process communication model is based on message passing [7], a scheme in which processes communicate by exchanging messages. Occam exposes point-to-point channel primitives to easily describe the connections between processes. Occam programs can be executed by a single Transputer that allocates CPU time to individual processes. Just as easily, the same concurrent programs can be executed by a network of Transputers without changing the Occam description. In this case, processes truly run in parallel and communicate by using the serial links.
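The synchronised, point-to-point message passing described above can be mimicked outside Occam. The following Python sketch (illustrative only; on a real Transputer a channel maps directly to memory or a hardware link) models an unbuffered channel where both sender and receiver block until the transfer completes, the rendezvous semantics of Occam's channels:

```python
# CSP-style rendezvous channel simulated with threads.
import threading

class Channel:
    """Point-to-point channel with unbuffered (rendezvous) semantics."""
    def __init__(self):
        self._value = None
        self._ready = threading.Semaphore(0)   # signalled by the sender
        self._done = threading.Semaphore(0)    # signalled by the receiver

    def output(self, value):        # analogous to Occam's  c ! value
        self._value = value
        self._ready.release()
        self._done.acquire()        # block until the receiver has taken it

    def input(self):                # analogous to Occam's  c ? v
        self._ready.acquire()       # block until the sender has offered
        value = self._value
        self._done.release()
        return value

c = Channel()
received = []

sender = threading.Thread(target=lambda: c.output(42))
receiver = threading.Thread(target=lambda: received.append(c.input()))
sender.start(); receiver.start()
sender.join(); receiver.join()
print(received)  # [42]
```

Whichever side reaches the channel first simply waits for the other, so communication doubles as synchronisation, which is why Occam needs no separate locking primitives.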
Programs are built from three primitive processes described in Table 2.1.

Syntax    Description
v := e    Assign expression e to variable v
c ! e     Output expression e to channel c
c ? v     Input from channel c to variable v

Table 2.1: Occam primitive processes [14].

These primitives are combined to form the constructs in Table 2.2. Similarly, multiple constructs can be combined to write complex processes that communicate using the point-to-point channels. It is important to mention that in Occam communication is synchronised,

Keyword    Description
SEQ        Components executed in sequence
PAR        Components executed in parallel
ALT        First ready component is executed
WHILE      Iterative instructions
IF         Conditional construct

Table 2.2: Occam constructs [14].

meaning that both sender and receiver processes wait until the transfer is complete before proceeding with execution.

Sequential Processes

The SEQ construct labels a process as sequential and causes its components to be executed one after another, just as in conventional programming languages. Listing 2.1 is an example of a process that first assigns the value 10 to variable x, then outputs it to channel buffer.out and finally inputs a value into x from buffer.in. Note that the instructions are completed exactly in this order.

VAR x:
SEQ
  x := 10
  buffer.out ! x
  buffer.in ? x

Listing 2.1: Occam purely sequential program.

Parallel Processes

The PAR keyword causes processes to be executed at the same time (if possible). Listing 2.2 shows two processes executing in parallel. The first process outputs x to channel c, while the other process receives it into a variable y. Also, note that the parallel construct is enclosed in a sequential construct and is followed by an assignment to z. In this case, the assignment will only be executed after both parallel processes have completed. The flow of events is as described in Figure 2.1.

SEQ
  PAR
    c ! x
    c ? y
  z := y

Listing 2.2: Occam parallel program.

2.2 Transputer Architecture

Internal Process Representation

In the Transputer, each process is associated with a workspace in memory [15]. This area can be thought of as a stack where local variables, channel information and other values that describe the state of the process are stored. When the process is being executed, its context is also held in six registers:

A, B and C. These registers form the evaluation stack used to compute results for expressions. Instructions push values onto the top of the stack and pop them when operations are executed, as shown in Figure 2.2. Source and destination operands do not need to be encoded within the instructions because all operations are performed on the stack.

Figure 2.1: Flow of events in sample Occam parallel program.

Figure 2.2: Transputer evaluation stack [18].

Workspace pointer (Wptr). Holds the memory address of the top of the workspace stack. This is a special-purpose register and can only be manipulated by selected instructions.

Instruction pointer (Iptr). Stores the byte address of the next instruction.

Operand. The Transputer's instructions are 8 bits wide. The most significant four bits encode the instruction code (opcode) and the remaining bits are an immediate operand. When an instruction is executed, the immediate is loaded into the four least significant bits of the operand register, as shown in Figure 2.3. There are also two special prefix instructions that can be used to accumulate operands in the operand register for later use by other instructions. It is also worth noting that the 4-bit opcode can only encode 16 different instructions, which clearly does not provide enough flexibility. For this reason, one of these opcodes causes the contents of the operand register to be treated as an instruction.

The Transputer can only execute a single process at a time, yet it must be able to keep track of many others and eventually allocate them CPU time. To do so, processes are queued in a linked list

Figure 2.3: Transputer instruction format [18].

with the processor maintaining pointers to the front and back nodes of the data structure in registers. Thus, processes can be either active or inactive. The former refers to processes that are currently being executed or are scheduled to be executed. In contrast, the latter refers to processes that are waiting, i.e. not currently executing and whose context is stored in memory. A process may be in this situation because it is waiting for a communication operation to complete, for a timer to expire, or because its allocated CPU time has finished. When the process is ready to be executed, it is placed at the back of a linked list that acts as a waiting queue, as shown in Figure 2.4. When the process reaches the front of the list it is executed [18].

The fact that the Transputer can automatically context switch and time-slice its processing resources means that it effectively implements a scheduler in hardware. In most architectures these operations are entrusted to the operating system. However, context switches often involve storing a large number of registers in memory to preserve the state of the process (its context), making the task slow and more complex when implemented in software. The Transputer's stack-based architecture means that it can perform context switches extremely efficiently. According to the documentation in [15], the Transputer is able to stop an active process and start a new one in approximately 12 clock cycles.

The tight relationship between Occam and the Transputer resulted in many of the programming language primitives being implemented as single assembly instructions. This includes not only scheduling operations, such as starting and terminating processes, but also inter-process communication tasks. Therefore, processes can perform input and output operations by executing single instructions.
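As a concrete illustration, the instruction format and operand-register accumulation described above can be sketched as a toy decoder. This is an assumed simplification, not the hardware implementation: the opcode values for pfix and ldc follow the standard Transputer encoding, and the negative prefix (nfix) is omitted.

```python
# Toy decoder (assumed simplification) for the Transputer's 8-bit
# instruction format: high nibble = opcode, low nibble = immediate.
# Prefix instructions accumulate larger operands in the operand register.
PFIX = 0x2   # "prefix" function code in the standard Transputer encoding
LDC  = 0x4   # "load constant" function code

def decode(byte, oreg):
    """Return (opcode, operand, new_oreg) for one instruction byte."""
    opcode, data = byte >> 4, byte & 0xF
    oreg |= data                       # immediate fills the low 4 bits
    if opcode == PFIX:
        # pfix keeps accumulating: shift up to make room for the next nibble
        return opcode, None, (oreg << 4) & 0xFFFFFFFF
    return opcode, oreg, 0             # operand consumed, register cleared

# ldc 0x42 cannot fit in one nibble, so it is encoded as two bytes:
# pfix 4 (0x24) followed by ldc 2 (0x42).
oreg = 0
op, operand, oreg = decode(0x24, oreg)   # pfix 4 -> Oreg becomes 0x40
op, operand, oreg = decode(0x42, oreg)   # ldc 2  -> operand is 0x42
assert operand == 0x42
```

This shows how arbitrarily large operands are built up four bits at a time while keeping every instruction exactly one byte long.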
It could be argued that the Transputer essentially comes with a Real-Time Operating System (RTOS) implemented in hardware.

Inter-process Communication

As mentioned above, two processes communicate by using the Occam channel primitives, which are operated directly by Transputer instructions. The processes might reside in the same or in different machines, yet the same instructions are used. In the former case, a channel is represented by a word in memory. When the first process becomes ready, it writes its workspace pointer (its identity) into the channel and is descheduled. Then, when the second process is ready, the message is copied by the processor to the specified location, the first process is rescheduled and the channel is returned to the empty state [28]. On the other hand, if the processes reside in different Transputers, both sender and receiver are descheduled while the transfer takes place and rescheduled when it concludes. In this case the communication is performed by autonomous link controllers. The links fetch or store data using a Direct Memory Access (DMA) mechanism and exchange messages with remote components through serial point-to-point connections.

Each Transputer comes with four bidirectional serial links that make it possible to connect the processor to up to four other devices. There is no limit on the number of Transputers that can be connected in this fashion, which allows large parallel networks to be constructed from individual Transputers [20]. However, as they are linked in a point-to-point fashion, the more components in the

system, the slower the communication will potentially be, as messages might have to be relayed over an increasing number of intermediate Transputers.

Figure 2.4: The Transputer's hardware-implemented process scheduler [18].

In the Occam communication model, channels are synchronised and unbuffered; therefore, both processes must be ready before messages are transferred [14]. Furthermore, the compiler ensures that before an input or output instruction is executed, the source and destination addresses, the channel address and the message length are available.

2.3 Transputer Microarchitecture

Microcode Implementation

Even though the Transputer is conceptually simple, many of its instructions are complex to implement in hardware and require multiple clock cycles to complete. At each step of the computation, the processor needs a different set of signals to control the datapath. The designers of the Transputer generated these signals using a microcode approach, which is commonly found in CISC processors. The idea is that each assembly instruction is executed as a sequence of simpler hardware-implemented microinstructions, each of which completes in a single clock cycle. Each microinstruction is associated with a number of control signals that the Transputer stores in high-speed Read-Only Memory (ROM). All Transputer models use a microcode ROM, yet its size varies with the complexity of the processor. For instance, one of the early designs, known as the Simple 42, contains 122 microinstruction words of 68 bits each [10]. However, the more complex T414 version of the Transputer has approximately five times that

number of microinstructions and over 100 bits per word. For simplicity, we focus on the microcodes defined for the Simple 42 Transputer.

The control signals of each microcode can be divided into groups, or bit fields. The most relevant ones for this discussion are listed and explained below [10].

X and Y bus source select and Z destination select. Due to the limited manufacturing capabilities available during the 1980s, processors needed to be designed with only two layers of interconnect. This greatly limited the number of connections between the different components of the design, since the number of wire crossings had to be minimised. Thus, the Transputer uses three buses onto which data is multiplexed to be transported to where it is required. Two of these (X and Y) carry source values for the computation and the remaining one (Z) carries the result. This mechanism greatly simplifies the connections between the different components of the processor at the expense of reducing the number of operations that can be performed by the processor simultaneously.

ALU operations. These bits of the microcode select which operation is to be performed on the operands in the X and Y buses, for example addition, subtraction, etc.

Next microinstruction address base. These bits contain the ROM address of the microinstruction that must be executed in the next clock cycle. It is also possible that a microinstruction concludes the execution of an instruction, in which case these bits are not used.

Conditional select. It is desirable to execute conditional statements within the microcode routines. For instance, when executing a conditional jump instruction, the processor must first evaluate an integer comparison and, based on the result, decide whether to increment the instruction pointer by a specified offset or by 1.
The Transputer implements conditional microcode execution by replacing the two least significant bits of the next microinstruction address base with conditional bits. The address so formed selects one of four possible microinstructions, as shown in Figure 2.5. The two condition bits are generated by the operation specified by the conditional select bits associated with each microinstruction.

AND : XfromA YfromB NoCarry ZfromXandY AfromZ Next ;

Listing 2.3: Simple 42 microcode for the bitwise AND operation.

To illustrate the idea, consider the Simple 42 microinstruction that executes a bitwise AND, shown in Listing 2.3. In this case, the X and Y buses carry the data stored in the A and B stack registers. The Z bus carries the result of the AND operation, and this value is stored back into A.

The ROM is managed by a microcode engine that ensures that the correct control signals are available at the right time. When an instruction is executed, the engine associates its opcode with a microinstruction and loads the required word from the ROM [10]. At the next clock cycle, the engine decides whether a new instruction or a further microinstruction is executed. In the first case, the next instruction is fetched from memory and executed as normal. In the latter case, the engine uses the address encoded in the microinstruction to fetch the next set of control signals. However, because the two least significant bits of the address are overwritten by the conditional bits, there are four possible microinstructions that could be executed, yet the decision is deferred for the conditional bits to

be computed. For this reason, the engine loads four contiguous words from the ROM each time. These are then multiplexed using the conditional bits as control signals, as shown in Figure 2.6.

Figure 2.5: Conditional execution of microcodes in the Simple 42 Transputer [2].

Process Scheduling

We mentioned above that the Transputer maintains a linked list of ready processes to be executed. When the CPU becomes available (no process is currently being executed), the process at the front of the list is dequeued and executed. In reality, the Transputer maintains two linked lists of process pointers: one manages the high-priority processes, while the other takes care of low-priority ones. The process at the front of the high-priority list is run first. It is possible that at any time a background operation requires the CPU's attention, such as a communicating process becoming ready, in which case the executing high-priority process waits until the request is completed. When the high-priority process completes or blocks, the next process at the front of the high-priority list is executed. This continues until the high-priority queue is empty, at which point the Transputer starts executing processes in the low-priority list in a similar fashion. However, if at any point in time a high-priority process becomes active, the context of the low-priority process is stored in memory and the CPU executes the high-priority process. For this reason, high priority should only be used in special cases where performance and responsiveness are paramount; otherwise high-priority processes will starve low-priority processes of CPU resources. In fact, when the Transputer is reset its priority is set to low by default [28].
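The two-level scheduling rule just described can be sketched as follows. This is a deliberately simplified software model, not the hardware mechanism: the class and method names are illustrative, and real Transputer queues are linked lists threaded through process workspaces rather than Python deques.

```python
# Toy model (assumed simplification) of the Transputer's two-priority
# scheduling rule: high-priority processes always run before low-priority
# ones, which is also why they can starve the low-priority queue.
from collections import deque

class TwoLevelScheduler:
    def __init__(self):
        self.queues = {"high": deque(), "low": deque()}

    def make_ready(self, proc, priority):
        # A process becoming ready joins the back of its priority queue.
        self.queues[priority].append(proc)

    def next_process(self):
        # The low-priority queue is only served when no high-priority
        # process is ready.
        if self.queues["high"]:
            return self.queues["high"].popleft()
        if self.queues["low"]:
            return self.queues["low"].popleft()
        return None   # idle

s = TwoLevelScheduler()
s.make_ready("background", "low")
s.make_ready("link-handler", "high")
assert s.next_process() == "link-handler"   # high priority runs first
assert s.next_process() == "background"
```

The model makes the starvation hazard plain: as long as `make_ready` keeps feeding the high-priority queue, the low-priority entry is never returned.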
In Occam it is possible to declare timers, which behave like channels that cannot be output to. Timers are a simple mechanism that lets processes wait for a specified time without consuming any processing resources. The Transputer implements two additional linked lists (high and low priority) to support this feature. The processor also implements two clock registers, one for each priority; the high- and low-priority clock registers are incremented every 1µs and 64µs respectively [28]. Processes in the timer lists are sorted with respect to the clock registers: processes whose wake-up time is closest to the time of the clock register are placed towards the front of the queue. When the time of the clock register is after the time of the process at the front of the queue, the Transputer dequeues that process and schedules it for execution.
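A minimal sketch of this timer-list behaviour is shown below. It is illustrative only: it ignores clock-register wraparound and models just one of the two priority levels, and the names are invented for the example.

```python
# Toy model (assumed simplification) of a Transputer timer list: waiting
# processes are kept sorted by wake-up time, and a process is dequeued
# once the clock register has passed its time.
import bisect

class TimerList:
    def __init__(self):
        self.entries = []   # (wake_time, process), kept sorted

    def insert(self, wake_time, proc):
        # Sorted insertion keeps the soonest wake-up at the front.
        bisect.insort(self.entries, (wake_time, proc))

    def expired(self, clock):
        """Return the processes whose wake time has been reached."""
        ready = []
        while self.entries and self.entries[0][0] <= clock:
            ready.append(self.entries.pop(0)[1])
        return ready

t = TimerList()
t.insert(200, "B")
t.insert(100, "A")
assert t.expired(150) == ["A"]     # only A's time has passed
assert t.expired(250) == ["B"]
```

Because the list is sorted, only the front entry ever needs to be compared against the clock register on each tick, which is what makes the hardware implementation cheap.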

Endianness and Memory Addressing

The Transputer is a purely little-endian processor [15], which means that the least significant byte of a word is stored at the lowest address. Also, contrary to most other processors, memory addresses in the Transputer are signed.

2.4 Transputer Versions and Enhancements

The first Transputer was released in 1984 and subsequent models were introduced thereafter. Inmos developed three Transputer variants: the T2, T4 and T8. All these designs maintain the core features of the architecture with regard to concurrency management and inter-process communication, yet additions and enhancements were introduced at each iteration.

The T2 variant is the 16-bit version of the processor. There were multiple releases, but one of the later ones was the T225 [24], which contains 4 KB of on-chip RAM, an external memory interface and four communication links. In contrast, the T4 series are 32-bit processors. The smallest of these is the T400 [25], which contained only two communication links and 2 KB of on-chip memory. The larger counterparts of the T400 are the T414 and T425 [26], with the latter containing twice as much memory as the T400 and four autonomous link controllers. Finally, the T800 Transputers were introduced in 1987 [27]. This variant was still 32-bit, but also came with an extended instruction set and a 64-bit floating-point unit.

To make it easier to connect large numbers of Transputers in a network, Inmos introduced the C004 programmable link switch [23]. This device had 32 link inputs and 32 outputs and complied with the serial link protocol already used by the Transputers.
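The little-endian word layout described earlier in this chapter can be demonstrated in a couple of lines using Python's standard struct module:

```python
# Little-endian storage, as used by the Transputer: the least significant
# byte of a word sits at the lowest memory address.
import struct

word = struct.pack("<i", 0x12345678)           # 32-bit signed, little endian
assert list(word) == [0x78, 0x56, 0x34, 0x12]  # LSB first in memory
assert struct.pack("<i", -1) == b"\xff\xff\xff\xff"
```

The signed format character also mirrors the Transputer's unusual choice of signed memory addresses.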


More information

A3 Computer Architecture

A3 Computer Architecture A3 Computer Architecture Engineering Science 3rd year A3 Lectures Prof David Murray david.murray@eng.ox.ac.uk www.robots.ox.ac.uk/ dwm/courses/3co Michaelmas 2000 1 / 1 6. Stacks, Subroutines, and Memory

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory 1 1. Memory Organisation 2 Random access model A memory-, a data byte, or a word, or a double

More information

(Refer Slide Time: 02:39)

(Refer Slide Time: 02:39) Computer Architecture Prof. Anshul Kumar Department of Computer Science and Engineering, Indian Institute of Technology, Delhi Lecture - 1 Introduction Welcome to this course on computer architecture.

More information

Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine

Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine Chapter 4 Lecture 5 The Microarchitecture Level Integer JAVA Virtual Machine This is a limited version of a hardware implementation to execute the JAVA programming language. 1 of 23 Structured Computer

More information

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com

Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and

More information

Test Driven Development of Embedded Systems Using Existing Software Test Infrastructure

Test Driven Development of Embedded Systems Using Existing Software Test Infrastructure Test Driven Development of Embedded Systems Using Existing Software Test Infrastructure Micah Dowty University of Colorado at Boulder micah@navi.cx March 26, 2004 Abstract Traditional software development

More information

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level

Computer System: User s View. Computer System Components: High Level View. Input. Output. Computer. Computer System: Motherboard Level System: User s View System Components: High Level View Input Output 1 System: Motherboard Level 2 Components: Interconnection I/O MEMORY 3 4 Organization Registers ALU CU 5 6 1 Input/Output I/O MEMORY

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

Transport Layer Protocols

Transport Layer Protocols Transport Layer Protocols Version. Transport layer performs two main tasks for the application layer by using the network layer. It provides end to end communication between two applications, and implements

More information

7a. System-on-chip design and prototyping platforms

7a. System-on-chip design and prototyping platforms 7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit

More information

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program.

Name: Class: Date: 9. The compiler ignores all comments they are there strictly for the convenience of anyone reading the program. Name: Class: Date: Exam #1 - Prep True/False Indicate whether the statement is true or false. 1. Programming is the process of writing a computer program in a language that the computer can respond to

More information

CPU Organization and Assembly Language

CPU Organization and Assembly Language COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:

More information

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

More information

A+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware

A+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware A+ Guide to Managing and Maintaining Your PC, 7e Chapter 1 Introducing Hardware Objectives Learn that a computer requires both hardware and software to work Learn about the many different hardware components

More information

RISC AND CISC. Computer Architecture. Farhat Masood BE Electrical (NUST) COLLEGE OF ELECTRICAL AND MECHANICAL ENGINEERING

RISC AND CISC. Computer Architecture. Farhat Masood BE Electrical (NUST) COLLEGE OF ELECTRICAL AND MECHANICAL ENGINEERING COLLEGE OF ELECTRICAL AND MECHANICAL ENGINEERING NATIONAL UNIVERSITY OF SCIENCES AND TECHNOLOGY (NUST) RISC AND CISC Computer Architecture By Farhat Masood BE Electrical (NUST) II TABLE OF CONTENTS GENERAL...

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance What You Will Learn... Computers Are Your Future Chapter 6 Understand how computers represent data Understand the measurements used to describe data transfer rates and data storage capacity List the components

More information

Computer Systems Design and Architecture by V. Heuring and H. Jordan

Computer Systems Design and Architecture by V. Heuring and H. Jordan 1-1 Chapter 1 - The General Purpose Machine Computer Systems Design and Architecture Vincent P. Heuring and Harry F. Jordan Department of Electrical and Computer Engineering University of Colorado - Boulder

More information

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Pentium vs. Power PC Computer Architecture and PCI Bus Interface Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the

More information

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune Introduction to RISC Processor ni logic Pvt. Ltd., Pune AGENDA What is RISC & its History What is meant by RISC Architecture of MIPS-R4000 Processor Difference Between RISC and CISC Pros and Cons of RISC

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Programming Logic controllers

Programming Logic controllers Programming Logic controllers Programmable Logic Controller (PLC) is a microprocessor based system that uses programmable memory to store instructions and implement functions such as logic, sequencing,

More information

Getting off the ground when creating an RVM test-bench

Getting off the ground when creating an RVM test-bench Getting off the ground when creating an RVM test-bench Rich Musacchio, Ning Guo Paradigm Works rich.musacchio@paradigm-works.com,ning.guo@paradigm-works.com ABSTRACT RVM compliant environments provide

More information

2) Write in detail the issues in the design of code generator.

2) Write in detail the issues in the design of code generator. COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage

More information

Design and Verification of Nine port Network Router

Design and Verification of Nine port Network Router Design and Verification of Nine port Network Router G. Sri Lakshmi 1, A Ganga Mani 2 1 Assistant Professor, Department of Electronics and Communication Engineering, Pragathi Engineering College, Andhra

More information

Computer Organization & Architecture Lecture #19

Computer Organization & Architecture Lecture #19 Computer Organization & Architecture Lecture #19 Input/Output The computer system s I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of

More information

Computer Organization

Computer Organization Computer Organization and Architecture Designing for Performance Ninth Edition William Stallings International Edition contributions by R. Mohan National Institute of Technology, Tiruchirappalli PEARSON

More information

The Central Processing Unit:

The Central Processing Unit: The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Objectives Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

Chapter 2: OS Overview

Chapter 2: OS Overview Chapter 2: OS Overview CmSc 335 Operating Systems 1. Operating system objectives and functions Operating systems control and support the usage of computer systems. a. usage users of a computer system:

More information

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C

Embedded Systems. Review of ANSI C Topics. A Review of ANSI C and Considerations for Embedded C Programming. Basic features of C Embedded Systems A Review of ANSI C and Considerations for Embedded C Programming Dr. Jeff Jackson Lecture 2-1 Review of ANSI C Topics Basic features of C C fundamentals Basic data types Expressions Selection

More information

8051 MICROCONTROLLER COURSE

8051 MICROCONTROLLER COURSE 8051 MICROCONTROLLER COURSE Objective: 1. Familiarization with different types of Microcontroller 2. To know 8051 microcontroller in detail 3. Programming and Interfacing 8051 microcontroller Prerequisites:

More information

Computer Organization and Architecture

Computer Organization and Architecture Computer Organization and Architecture Chapter 11 Instruction Sets: Addressing Modes and Formats Instruction Set Design One goal of instruction set design is to minimize instruction length Another goal

More information

1 The Java Virtual Machine

1 The Java Virtual Machine 1 The Java Virtual Machine About the Spec Format This document describes the Java virtual machine and the instruction set. In this introduction, each component of the machine is briefly described. This

More information

Processor Architectures

Processor Architectures ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture

More information

CHAPTER 6: Computer System Organisation 1. The Computer System's Primary Functions

CHAPTER 6: Computer System Organisation 1. The Computer System's Primary Functions CHAPTER 6: Computer System Organisation 1. The Computer System's Primary Functions All computers, from the first room-sized mainframes, to today's powerful desktop, laptop and even hand-held PCs, perform

More information

Instruction Set Architecture. Datapath & Control. Instruction. LC-3 Overview: Memory and Registers. CIT 595 Spring 2010

Instruction Set Architecture. Datapath & Control. Instruction. LC-3 Overview: Memory and Registers. CIT 595 Spring 2010 Instruction Set Architecture Micro-architecture Datapath & Control CIT 595 Spring 2010 ISA =Programmer-visible components & operations Memory organization Address space -- how may locations can be addressed?

More information

CS101 Lecture 26: Low Level Programming. John Magee 30 July 2013 Some material copyright Jones and Bartlett. Overview/Questions

CS101 Lecture 26: Low Level Programming. John Magee 30 July 2013 Some material copyright Jones and Bartlett. Overview/Questions CS101 Lecture 26: Low Level Programming John Magee 30 July 2013 Some material copyright Jones and Bartlett 1 Overview/Questions What did we do last time? How can we control the computer s circuits? How

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

PART B QUESTIONS AND ANSWERS UNIT I

PART B QUESTIONS AND ANSWERS UNIT I PART B QUESTIONS AND ANSWERS UNIT I 1. Explain the architecture of 8085 microprocessor? Logic pin out of 8085 microprocessor Address bus: unidirectional bus, used as high order bus Data bus: bi-directional

More information

Chapter 7D The Java Virtual Machine

Chapter 7D The Java Virtual Machine This sub chapter discusses another architecture, that of the JVM (Java Virtual Machine). In general, a VM (Virtual Machine) is a hypothetical machine (implemented in either hardware or software) that directly

More information

Systems I: Computer Organization and Architecture

Systems I: Computer Organization and Architecture Systems I: Computer Organization and Architecture Lecture : Microprogrammed Control Microprogramming The control unit is responsible for initiating the sequence of microoperations that comprise instructions.

More information

Lecture 12: More on Registers, Multiplexers, Decoders, Comparators and Wot- Nots

Lecture 12: More on Registers, Multiplexers, Decoders, Comparators and Wot- Nots Lecture 12: More on Registers, Multiplexers, Decoders, Comparators and Wot- Nots Registers As you probably know (if you don t then you should consider changing your course), data processing is usually

More information

An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008

An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008 An Introduction to Computer Science and Computer Organization Comp 150 Fall 2008 Computer Science the study of algorithms, including Their formal and mathematical properties Their hardware realizations

More information

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language Chapter 4 Register Transfer and Microoperations Section 4.1 Register Transfer Language Digital systems are composed of modules that are constructed from digital components, such as registers, decoders,

More information

SFWR 4C03: Computer Networks & Computer Security Jan 3-7, 2005. Lecturer: Kartik Krishnan Lecture 1-3

SFWR 4C03: Computer Networks & Computer Security Jan 3-7, 2005. Lecturer: Kartik Krishnan Lecture 1-3 SFWR 4C03: Computer Networks & Computer Security Jan 3-7, 2005 Lecturer: Kartik Krishnan Lecture 1-3 Communications and Computer Networks The fundamental purpose of a communication network is the exchange

More information

Switch Fabric Implementation Using Shared Memory

Switch Fabric Implementation Using Shared Memory Order this document by /D Switch Fabric Implementation Using Shared Memory Prepared by: Lakshmi Mandyam and B. Kinney INTRODUCTION Whether it be for the World Wide Web or for an intra office network, today

More information