VHDL DESIGN OF EDUCATIONAL, MODERN AND OPEN-ARCHITECTURE CPU




Martin Straka
Doctoral Degree Programme (1), FIT BUT
E-mail: strakam@fit.vutbr.cz

Supervised by: Zdeněk Kotásek
E-mail: kotasek@fit.vutbr.cz

ABSTRACT

The paper deals with the design of a modern, open-architecture CPU usable for educational purposes. Use of the CPU in the educational process is expected to contribute to a deeper understanding of key topics taught in the area of modern architectures. Our CPU is based on the Von Neumann architecture and is equipped with a five-stage pipeline, a cache memory unit and a simple branch prediction unit. The architecture is designed in VHDL and includes a set of 16 instructions. A rich variety of educational tasks can be performed by means of the CPU. It has been successfully simulated in ModelSim and synthesized in Precision RTL Synthesis in order to be implemented in an FPGA and used in practice as a real working CPU.

1. INTRODUCTION

It may seem nearly unbelievable from today's perspective, but the computer market was not always flooded with dozens of general-purpose CPUs. Systematic development enabled the advent of microcontrollers and control units as widespread components. However, these were still dedicated to a given task, and their inherently rigid interface put significant constraints on other application scenarios. This situation gradually changed with the introduction of modern CPUs and programmable circuits such as FPGAs, which have taken a dominant position in computing systems. The universal nature of such components offers a convenient platform for almost any kind of task.

Our interest is focused on more advanced processor architectures, specifically a CPU with a 5-stage pipeline, fast instruction and data caches, and a simple branch prediction unit. The resulting design follows the Von Neumann concept [3] and employs a RISC instruction set with 16 basic instructions. The built-in register file is optimized for efficient manipulation of data and instructions at the corresponding locations in both cache and RAM. Instructions are executed in a 5-stage pipeline whose stages work in parallel on overlapping instructions. The overall design has been implemented in VHDL [1, 2], which is a very convenient means of designing digital systems. The VHDL implementation and description was evaluated in the ModelSim environment. The final step was synthesis with Precision RTL Synthesis for the target FPGA. The results achieved can be found in [4].

2. DESIGN OF CPU

The initial phase of our effort to develop a processor with modern features was closely tied to a simple, sequential instruction processing method. This architecture has been completely reworked and modified towards a scalar CPU. Our processor is a 16-bit device with 16-bit data and address buses. Every instruction word is also 16 bits long.

2.1. 5-STAGE PIPELINE

Pipelining, or chained processing, is an implementation technique that partitions the execution of an operation into a number of subsequent steps, as far as possible of the same duration. Each step can use its own dedicated hardware resources. Executing a single instruction requires 5 clock cycles, but this latency can be effectively hidden by parallel operation. Instruction processing goes through the following stages:

IF   instruction fetch
ID   instruction decode
EX   execution
MA   memory access
WB   write back

Table 1 shows the overlapped processing of several consecutive instructions:

Instruction   CPU clock
              1    2    3    4    5    6    7    8
i             IF   ID   EX   MA   WB
i+1                IF   ID   EX   MA   WB
i+2                     IF   ID   EX   MA   WB

Table 1: Instruction processing flow.

The synchronous pipeline is composed of 5 stages, each dedicated to a different phase of instruction processing. Individual stages are separated by additional pipeline registers. The interconnection scheme of the 5-stage pipeline and its building blocks are shown in Figure 1.
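The overlap in Table 1 can be reproduced with a short behavioural model. This is a sketch in Python, not the author's VHDL: the function names are hypothetical, the pipeline is assumed ideal (no stalls or flushes), and the sequential count assumes that a non-pipelined processor also spends one cycle per step, which the paper does not state.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each instruction index to {clock cycle: stage} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):
        # Instruction i enters the first stage in cycle i + 1 (cycles are 1-based).
        schedule[i] = {i + 1 + s: name for s, name in enumerate(stages)}
    return schedule


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    # The first instruction needs n_stages cycles; each further one ends a cycle later.
    return n_stages + n_instructions - 1 if n_instructions > 0 else 0


def sequential_cycles(n_instructions, n_stages=len(STAGES)):
    # Assumption: without overlap, every instruction takes all n_stages cycles alone.
    return n_stages * n_instructions


def format_schedule(n_instructions):
    """Render the schedule as a text table in the layout of Table 1."""
    total = pipelined_cycles(n_instructions)
    sched = pipeline_schedule(n_instructions)
    lines = ["instr " + " ".join(f"{c:>3}" for c in range(1, total + 1))]
    for i in range(n_instructions):
        label = "i" if i == 0 else f"i+{i}"
        cells = " ".join(f"{sched[i].get(c, ''):>3}" for c in range(1, total + 1))
        lines.append(f"{label:<5} {cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_schedule(3))
    print("pipelined:", pipelined_cycles(3), "sequential:", sequential_cycles(3))
```

Under these assumptions three instructions finish in 7 cycles instead of 15, and for long instruction sequences the ratio approaches the number of stages. Real programs fall short of this because of the conflicts handled by the stalling, forwarding and prediction units.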

Figure 1: Structure of implementation for 5-stage pipeline.

Pipelined (chained) instruction processing may give rise to three types of conflicts (hazards), each of which is handled in our design by dedicated circuitry. These conflicts and the units that resolve them are summarized in Table 2:

Conflict type           Unit eliminating the conflict
Data (RAW, WAR, WAW)    Stalling & forwarding unit
Control                 Branch & prediction unit
Structural              Stalling unit

Table 2: Hazards in pipeline.

Every functional block of the CPU is characterized by a behavioural description. The relationships between these blocks are then set up at the structural level, where the physical interconnections are defined. The general block scheme of our CPU, together with its interface, is depicted in Figure 2.

Figure 2: Entity of CPU & CPU block

2.2. INSTRUCTIONS AND DATA CACHE

The other part of the implementation deals with the L2 cache memory. The data and instruction caches are based on a 2-way associative architecture. These two memory blocks work asynchronously and communicate with the CPU by means of acknowledgement signals. The implemented cache uses the LRU replacement algorithm and supports both data update methods, write-back and write-through. The cache interface is outlined in Figure 3.

Figure 3: Entity of Data and Instruction cache

2.3. BRANCH PREDICTION UNIT

A characteristic feature of modern processors is, among many others, a built-in branch prediction unit. This functional block keeps track of jump instructions that have already been executed. This information is used to estimate in advance, already during the decode phase, whether the jump currently being evaluated will be taken, and to determine its target address. The CPU uses this prediction to choose the next instruction according to the more likely outcome. The interface of the implemented 1-bit predictor is shown in Figure 4.

Figure 4: Entity of Prediction unit

3. RESULTS

The characteristics of the two processors (sequential and pipelined) differ significantly in all relevant parameters. From the architectural point of view, the pipelined implementation is much more complex than the simple sequential processor. Let us compare execution speed using the following code snippet, where z = 100, x = 2 and y = 2 are values stored in RAM (see Table 3).

Table 3: Instruction code for CPU
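How much a pipelined CPU gains on code with jumps depends on how often the prediction unit of Section 2.3 is right. The following is a minimal behavioural sketch in Python of a 1-bit predictor, not the author's VHDL entity: the class and function names, the table size, the indexing by address, the reset state and the loop trace are all illustrative assumptions, and target address prediction is left out.

```python
class OneBitPredictor:
    """Behavioural model: one 'taken last time' bit per table entry."""

    def __init__(self, entries=16):
        # entries is a hypothetical table size; the paper does not state one.
        self.entries = entries
        self.taken_bit = [False] * entries  # assumption: "not taken" after reset

    def _index(self, pc):
        # Assumption: the table is indexed by the low bits of the jump address.
        return pc % self.entries

    def predict(self, pc):
        return self.taken_bit[self._index(pc)]

    def update(self, pc, taken):
        # 1-bit scheme: remember only the most recent outcome of this jump.
        self.taken_bit[self._index(pc)] = taken


def count_mispredictions(predictor, trace):
    """trace is a sequence of (pc, taken) pairs; returns the number of wrong guesses."""
    wrong = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


def loop_trace(pc, iterations):
    # A loop-closing jump: taken on every iteration except the last one.
    return [(pc, True)] * (iterations - 1) + [(pc, False)]


if __name__ == "__main__":
    predictor = OneBitPredictor()
    trace = loop_trace(pc=0x20, iterations=100)
    print("mispredictions:", count_mispredictions(predictor, trace))
```

In this model a loop-closing jump is mispredicted on loop entry and on loop exit, while a jump whose outcome alternates is mispredicted every time. That is a known weakness of 1-bit schemes.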

Simulation results are recapitulated in Table 4.

Table 4: Results of sequential CPU and pipelined processor with cache

4. CONCLUSION

Future work will focus on implementing more sophisticated branch prediction and dynamic instruction scheduling. We will also thoroughly evaluate modifying the pipeline towards higher concurrency by adding functional units or even replicating the whole pipeline. However, a higher degree of parallelism raises new issues related to pipeline control and conflict elimination. With an extended instruction set, a floating-point unit integrated with the rest of the CPU, and some other changes, our CPU could probably be refined into a superscalar processor.

REFERENCES

[1] Cohen, B. VHDL Coding Styles and Methodologies, Kluwer, Dordrecht, 453 p., 2001, ISBN 0-7923-8474-1.
[2] Douša, J. VHDL Language, Czech Technical University Publishing House, Praha, 76 p., 2003, ISBN 80-01-02670-1.
[3] Hennessy, J.L., Patterson, D.A. Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, New York, 1000 p., 2003, ISBN 1-55860-596-7.
[4] Straka, M. CPU Design in VHDL, Brno University of Technology, Brno, Diploma work, 90 p., 2006.
[5] Dvořák, V., Drábek, V. Architektura procesorů, VUTIUM, Brno, 293 p., 1999, ISBN 80-214-1458-8.