Hybrid Simulation Framework for Virtual Prototyping Using OVP, SystemC & SCML


Hybrid Simulation Framework for Virtual Prototyping Using OVP, SystemC & SCML: A Feasibility Study
PRIYA AGRAWAL
VLSI DESIGN TOOLS & TECHNOLOGY
INDIAN INSTITUTE OF TECHNOLOGY, DELHI
2009

Hybrid Simulation Framework for Virtual Prototyping Using OVP, SystemC & SCML: A Feasibility Study
A thesis submitted in partial fulfilment of requirements for the degree of MASTER OF TECHNOLOGY in VLSI DESIGN TOOLS & TECHNOLOGY
by Priya Agrawal, 2007JVL2170
Under the guidance of Prof. Anshul Kumar and Mr. Desingh Devibalan B (NXP Semiconductors)
VLSI DESIGN TOOLS & TECHNOLOGY, Indian Institute of Technology, Delhi, 2009

CERTIFICATE

This is to certify that the thesis titled "Hybrid Simulation Framework for Virtual Prototyping Using OVP, SystemC and SCML: A Feasibility Study", being submitted by Priya Agrawal to the Indian Institute of Technology, Delhi for the award of the degree of Master of Technology in VLSI Design Tools and Technology, is a bona fide work carried out by her under our supervision and guidance. The research reports and the results presented in the thesis have not been submitted in part or in full to any other University or Institute for the award of any degree or diploma.

Dr. Anshul Kumar, Professor, Department of Computer Science & Engg., Indian Institute of Technology Delhi

Desingh Devibalan B, Technical Leader, CTO & IC Design Cluster, NXP Semiconductors, Bangalore

ACKNOWLEDGEMENT

I would like to express my heartfelt thanks to Professor Anshul Kumar, Department of Computer Science and Engineering, IIT Delhi, my academic guide, for his overall motivation, support and guidance during this project. I would like to sincerely thank my supervisor Desingh Devibalan B for giving me such a challenging project to work on. His constant guidance, invaluable suggestions and critical approach to problems throughout the project have led to its successful completion. I am also thankful to Duncan Graham, Lee Moore and Larry Lapides (Imperas), and to Raghunandan Balasubramaniam, Chandrashekhar and Mischa Jonker (NXP Semiconductors), for their support and encouragement during this project. It has been a very enlightening and enjoyable experience to work with them. I wish to express my great thanks to my family members, who supported me in all my endeavors during the thesis work. Finally, I owe many thanks to my colleagues and friends for making my stay at IIT Delhi and NXP Semiconductors, Bangalore memorable.

Priya Agrawal
M. Tech (VLSI Design Tools and Technology)
IIT Delhi

ABSTRACT

The increasing software development cost and effort, and the decreasing turnaround time required for multiprocessor SoCs, have made designers strive for fast virtual prototyping solutions capable of simulating the system at speeds of several hundred MIPS. Several fast prototyping solutions are provided by ESL designers worldwide. Open Virtual Platforms (OVP) enables simulation of embedded systems running real application code. This project explores this new technology and its interoperability with existing TLM based SystemC platforms. The present work covers the details of the technology, the experiments done to check its simulation performance, and the possibility of hybrid simulation with SCML. Experiments comparing the simulation speed of OVP with existing proprietary prototyping solutions, and experiments with hybrid simulation, yield some important observations, which are also reported.

Table of Contents

1. INTRODUCTION
   - Overview
   - Motivation
   - Organization
2. BACKGROUND
   - Need of Prototyping
   - SystemC / Transaction Level Modeling
   - System Simulators
3. OPEN VIRTUAL PLATFORMS
   - Introduction
   - OVP APIs
     - Innovative CPU Manager (ICM)
     - Virtual Machine Interface (VMI)
     - Behavioral Hardware Modeling (BHM) and Peripheral Programming Model (PPM)
   - OVP models
   - OVPSim: The OVP Simulator
   - Additional Features of OVP
   - WHY is OVP fast?
   - Hybrid Simulation support for OVP
   - Approach to TLM
   - Details of OVP inside TLM
4. SIMULATION PERFORMANCE EXPERIMENTS
   - Application under test: JPEG Decoder
   - Application Task Graph Mapping for Dual Core Platform
   - Single Core Platform
   - Backdoor Mode Simulations
   - Dual Core Platform
5. RESULTS AND ANALYSIS
   - Single Core Platform
   - Dual Core Platform
6. EXPERIMENTATION FOR HYBRID SIMULATION WITH SCML
   - Proposed Wrapper for Hybrid Simulation of OVP, SystemC and SCML
   - Initial Experimentation
   - Integrating SCML modeled SystemC TLM peripheral
   - Important Observations
   - Proposed Solutions
7. CONCLUSIONS
   - Summary
   - Future Scope
REFERENCES

List of Figures

OVP Interfaces Wrapper for Processor Model Processor Wrapper Implementation JPEG Decoder Dual Core System Architecture Partitioning for JPEG Decoder Single Core Platform Backdoor Memory Access Dual Core Platform Input Image Time variation with Nominal MIPs for single core platform Time variation with Quantum size for single core platform Time variation with Nominal MIPs for dual core platform Time variation with Quantum size for dual core platform Hybrid OVP/SCML Simulation Inter Processor Communication Block Interrupt Driven Dual Core System Dual core inter processor communication flow

List of Tables

- System Load distribution for JPEG Decoder
- Speed Comparison for Single Core System
- Speed Comparison for Dual Core System
- Simulation Statistics for IPC Based System

Chapter 1 INTRODUCTION

1.1 Overview

Today's embedded systems must be verified to ensure that the combination of hardware and software matches the expected functionality and performance. The turnaround time available for a design project decreases every year, so fast simulation is essential for designing and verifying prototypes of large systems. This project investigates the feasibility of adopting a virtual prototyping technology based on binary translation to improve the simulation speed of software verification. On March 07, 2008, Imperas announced the release of a virtual platform and modeling technology that enables simulation of embedded systems running real application code. This technology is called Open Virtual Platforms (OVP). In this project, an attempt has been made to explore the technology provided by Imperas.

1.2 Motivation

Virtual platforms (VP) have been used for some years to develop, analyze, optimize and validate system-level hardware architecture. Today's prototyping offerings are architected for single-core SoCs and do not scale to large numbers of embedded processors, specifically when it comes to simulation speed and debugging usability. Imperas provides multi-processor (MP) virtual prototyping, simulation, and debugging. Virtual prototypes built with Imperas tools simulate efficiently at speeds of hundreds of MIPS on desktop PCs. They are completely instruction accurate and model the whole system. OVP and its APIs help foster model interoperability, which is vitally needed now in electronic system level (ESL) design. [1][2][3]

As the virtual platform solutions offered by Imperas seem promising, we set out to analyze the feasibility of stitching OVP processor models together with other peripherals modeled in SystemC/Open SCML. The key objective is to create a proof-of-concept platform that demonstrates this hybrid simulation framework for virtual platforms. We have also benchmarked the simulation performance of OVP for single-core and multi-core platforms against the simulation framework provided by one of the industry-leading ESL vendors.

This hybrid simulation framework should open new avenues in the simulation of complex SoC platforms built from IPs supplied in SystemC/SCML by various ESL/IP vendors (e.g. CoWare, ARM), testing the true interoperability defined by the TLM2.0 standard. It should thus reduce the engineering effort in creating high speed multi-core virtual platforms for early software development to meet tight time-to-market windows.

1.3 Organization

The work is organized as follows. Chapter 2 presents the necessary background and discusses some existing prototyping solutions. Chapter 3 describes the details and components of the Open Virtual Platforms technology, together with the hybrid simulation support provided to integrate OVP models in a SystemC environment. Chapter 4 contains a detailed description of the platforms constructed in OVP and in the proprietary modeling environment, with the corresponding simulation statistics presented in Chapter 5. The proposed wrapper for hybrid simulation of OVP with SCML, and the simulation experiments and observations for it, are discussed in Chapter 6. Finally, Chapter 7 provides a summary of all the experimentation and suggestions for future exploration.

Chapter 2 BACKGROUND

2.1 Need of Prototyping

With the increasing complexity and integration in SoCs, software development costs are rising sharply. A simulation environment is necessary to simulate the system under design so that software developers can test the software and hardware developers can investigate design alternatives. Traditionally, techniques like FPGA prototyping and emulation have been proposed for software validation. [4] However, these solutions become available too late (only once the RTL is available) and significantly impact the design cycle. With software development determining project success or failure, modularity and fast prototyping have become important aspects of a simulation framework. The new SystemC and TLM based approaches to system level modeling help provide fast prototyping solutions.

2.2 SystemC / Transaction Level Modeling

SystemC supports modeling of complex hardware systems at different abstraction levels and with hierarchical components, as it is built on C++. The achievable simulation speed depends on the level of model abstraction, which also determines the platform's accuracy. SystemC has always been intended to support actual embedded software development, but it has not possessed all the necessary technology components to fully enable it. Within the Transaction-Level Modeling (TLM) working group of OSCI, several different abstraction levels have been introduced which enable faster simulations. [5][6] The transaction mechanism allows a process of an initiator module to call methods exported by a target module. TLM modules can therefore communicate with very little synchronization code, which significantly reduces communication overhead in the modeling of SoCs. The draft of the TLM2.0 standard introduces new transaction abstractions so that platform components can communicate and be interoperable [7]. The use of the default tlm_generic_payload transaction type enables this.

The further improvements provided by TLM2.0 which result in faster simulation of models are as follows:

1. Direct Memory Interface (DMI): This allows direct backdoor access to memory and thus uninhibited instruction set simulator execution, as the access does not actually go over the bus, avoiding any bus conflicts.
2. Loosely Timed modeling: There is no timing annotation in the model. This trades accuracy for speed.
3. Temporal Decoupling: The models can have their own local clock, which synchronizes with the SystemC global clock only at adequate synchronization points. This allows simulation speed-up for multi-core platforms.

Much work has been done in embedded software generation from transaction level descriptions. Some examples of this are discussed in the following section.

2.3 System Simulators

Full system simulation makes it possible to run the exact binary embedded software, including the operating system, on a totally simulated hardware platform. Simulation environments thus need to support full system simulation and should use suitable hardware modeling techniques. Moreover, the simulations should be fast to enable early software development. The most challenging part of achieving high simulation speed is simulating the processors. Processor simulation is achieved with Instruction Set Simulation (ISS) [8]. Instruction set simulators can be:

- Interpretive ISS
- Static compiled ISS
- Dynamic compiled ISS

In the past decade, many ISSs have adopted dynamic translation technology [9]. The binary target code to be executed is dynamically translated into an executable representation. There are typically two variants of dynamic translation technology:

1. The target code is translated directly into machine code for the simulation host.
2. The target code is translated into an intermediate representation that can be executed at high speed.
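The difference between an interpretive ISS and a dynamically translating one can be illustrated with a minimal sketch. The toy instruction set, the `PROGRAM` contents and the class and function names below are all hypothetical and are not taken from any simulator discussed here; the "translation" only builds host-level Python callables, where a real simulator would emit machine code or an intermediate representation.

```python
# Toy target program (hypothetical ISA): each instruction is (opcode, operand)
# and operates on a single accumulator register.
PROGRAM = [("add", 2), ("mul", 3), ("add", 1)]


def interpret(program, acc):
    """Interpretive ISS: every instruction is decoded each time it runs."""
    for opcode, operand in program:
        if opcode == "add":
            acc += operand
        elif opcode == "mul":
            acc *= operand
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


class TranslatingSimulator:
    """Dynamic translation: decode a block once, cache it, then reuse it."""

    def __init__(self):
        self.cache = {}        # block start address -> translated callable
        self.translations = 0  # number of times the translator actually ran

    def translate(self, program):
        # Decode each instruction once and bind its operand into a host callable.
        self.translations += 1
        steps = []
        for opcode, operand in program:
            if opcode == "add":
                steps.append(lambda acc, k=operand: acc + k)
            elif opcode == "mul":
                steps.append(lambda acc, k=operand: acc * k)
            else:
                raise ValueError(f"unknown opcode: {opcode}")

        def block(acc):
            for step in steps:
                acc = step(acc)
            return acc

        return block

    def run(self, address, program, acc):
        block = self.cache.get(address)
        if block is None:  # first visit: pay the translation cost
            block = self.translate(program)
            self.cache[address] = block
        return block(acc)  # later visits reuse the cached translation


sim = TranslatingSimulator()
results = [sim.run(0x100, PROGRAM, 0) for _ in range(1000)]
print(interpret(PROGRAM, 0), results[0], sim.translations)
```

The block at address `0x100` is executed a thousand times but translated only once, which is the reuse that lets translation cost be spread over the run.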

Dynamic translation introduces a compile phase as part of the overall simulation time. But as the resulting code is re-used, the compilation time is amortized over the run. Some simulators have been designed based on dynamic translation. SimSoC demonstrates an integrated simulation framework relying only upon SystemC and transaction-level modeling [10]. Its ISS uses dynamic binary translation of the second kind described above. The resulting speeds are lower than those achieved using binary translation to host machine code. Moreover, the solution does not scale well to multi-core platforms, as it makes heavy use of the time-costly wait() call for simulating cores executing in parallel. If wait() is instead called only after a large number of instructions to reduce simulation overhead, the simulations are not faithful enough.

Virtual machines such as QEMU and GXemul [11] also emulate, to a large extent, the behavior of a particular hardware platform. QEMU uses a form of dynamic translation based on technique 1. Though QEMU and GXemul include many device models written in open-source C code, these models lack interoperability. Besides, QEMU only supports simulation of fixed, predefined single-processor platforms.

Several providers of virtual platform technology have also come up with their own platform-driven electronic system-level (ESL) design solutions which promise high simulation speed and accuracy. Technologies such as those of Virtio, Simics from Virtutech, Platform Architect from CoWare and DesignWare from Synopsys are a few examples. [12] The drawback is that all of them rely on proprietary modeling solutions. Imperas, on the other hand, provides Open Virtual Platforms, an infrastructure technology which is open source and free, and which is focused on multi-core platform development and high simulation speed for embedded software development.

In the following chapter we discuss the basics of the OVP technology, its core components, its significant features and the extensions that enable it to work in SystemC platforms with TLM2.0.

Chapter 3 OPEN VIRTUAL PLATFORMS

3.1 Introduction

Imperas announced the formation of the Open Virtual Platforms alliance, or OVP, and seeded it with some of its technology serving the market requirements. This includes programming models, verification/debug/analysis tools, and simulation platforms. The interfaces provided by Imperas address the model interoperability problem. The key point is that Imperas has made its technology public. OVP has three main components [13]:

1. The OVP APIs that enable C models to be written.
2. A collection of open source processor and peripheral models.
3. OVPsim, a simulator that executes these models.

3.2 OVP APIs

To model an embedded system, several main items have to be modeled: platforms, processors, peripherals and the environment. The platform connects and configures the behavioral components. The processors fetch and execute object code instructions from the memories, and the peripherals model the components and environment that the operating system and application software interact with. OVP is thus made of four interfaces:

- Innovative CPU Manager
- Virtual Machine Interface
- Behavioral Hardware Modeling Interface
- Peripheral Programming Model Interface

The combination of these interfaces makes the complete platform. The interaction between these interfaces is shown in figure 3.1.

Innovative CPU Manager (ICM)

The ICM is a C API used to create the platform netlist of the design/system for use with the OVPsim simulator. It allows instantiation of multiple processors, buses, memories and peripherals, which can then be connected together, and application program executables can be loaded into the simulated memories. [14]

[Figure 3.1 OVP Interfaces]

Virtual Machine Interface (VMI)

VMI is the C based processor interface that allows the processor model to communicate with the simulation kernel and the other components of the system. VMI is central to the high performance execution provided by OVP. Processors in OVP use a code morphing approach, coupled with a just-in-time (JIT) compiler, to map the processor instructions onto those provided by the host machine. In between is a set of optimized opcodes into which the processor operations are mapped, and OVPsim provides fast and efficient interpretation or compilation of these into native machine capabilities. Some of the capabilities of VMI are listed below [15]:

1. VMI allows a form of virtualization for capabilities such as file I/O. This allows direct execution on the host using the standard libraries provided.
2. VMI makes it possible to encapsulate existing ISS models within OVPsim, provided that they export some basic features (for example, the existing ISS model should be available as a shared object, provide an API to allow it to be run instruction-by-instruction or for a number of instructions, and provide an API allowing memory to be modeled externally).
3. VMI enables modeling of the mode dependent behavior (kernel/user mode) of an instruction. Using the VMI, OVPsim can implement arbitrary multiprocessor systems.
4. The VMI can be used for both RISC and CISC processors. Any instruction format can be supported.

5. VMI also allows modeling of L2 caches and other extensions around the processor.

Behavioral Hardware Modeling (BHM) and Peripheral Programming Model (PPM)

These APIs are used to write behavioral models of hardware/software systems which are peripheral to the processors in the platform being developed. Each instance of a peripheral model runs on its own virtual machine with an address space large enough for the model. This virtual processor and its memory are separate from any processors, memories and buses in the platform being simulated; they exist only to execute the code of the peripheral model. This processor is called a Peripheral Simulation Engine, or PSE for short [16]. The difference between BHM and PPM is as follows.

BHM: This API gives access to:

- Behavioral modeling processes (threads)
- Simulated delays
- Events
- Diagnostic control and the simulator message stream

This API can support more general forms of communication and provides the piece that TLM is missing.

PPM: This API gives access to the connectivity of peripherals in platforms:

- Creation and control of ports and nets
- Address spaces
- Windows into memory address space
- Creation of behavior on memory region accesses
- Installation of callbacks

This API thus understands buses and networks and is similar in functionality to the OSCI TLM interface proposal. The BHM/PPM has similar concepts to SystemC, but each instance of each model exists in its own private address space. It is normally straightforward to wrap existing C functions in a BHM/PPM peripheral model.

3.3 OVP models

OVP provides a number of processor models, such as ARM7, several MIPS processors, Tensilica and the OpenRISC OR1K.

A number of standard embedded devices are also modeled to allow assembly of a complete platform, including various types of memories, traps, bridges, DMA engines, and UARTs, to name a few. OVP processor models are instruction accurate. In the realm of ISS models, instruction accurate models are approximately timed, in that they claim, on most occasions, to execute each instruction using the correct number of clock cycles and to perform their I/O operations at roughly the right place within the instruction [17]. OVP processor models, however, are instruction accurate purely in the functional space and not in the behavioral space. To make this point clear, functional models and behavioral models are strictly defined to be different. A functional model does not include timing, although it may include sequence. A behavioral model includes timing, although the level of detail of timing is not defined. Both kinds of model can exist at any level of abstraction. Thus the ISS models generally used by prototyping environments (e.g. the PV abstraction level of TLM compliant SystemC) are behavioral models, whereas OVP models are functional models. Instruction accuracy in OVP terms means that the registers hold the correct values at the end of each instruction and that executing the instruction creates the right side effects. The models progress one instruction at a time and know nothing about multi-execution pipelines, out of order execution or similar micro-architectural details.

3.4 OVPsim: The OVP Simulator

OVPsim provides infrastructure for describing multicore platforms. The OVPsim simulator can simulate arbitrary multiprocessor shared memory configurations and heterogeneous multiprocessor platforms. OVPsim is a very fast simulator. Its performance depends on several factors (for example, the processor variants used in the platform and the exact nature of the application itself), but typically speeds of hundreds of millions of simulated instructions per second can be expected. The simulation experiments conducted in this work for similar platforms in OVP and in a proprietary, industry-standard modeling environment support the claim of higher simulation speed for OVP. Since OVPsim platform models can be compiled as shared objects, they can be encapsulated in any simulation environment that is able to load shared objects. This includes C, C++, and SystemC simulation environments. The commercial Linux based Imperas simulator supports multiprocessor debugging, which is not provided in the free Windows based simulator, and offers even higher simulation performance.

3.5 Additional Features of OVP

1. Semihosting: Semihosting is the ability to provide host functionality to the simulated processor or peripheral. The semihosting library has full access to the simulated processor's registers, stack and memory space. Far more complex scenarios can be envisaged, for example using the host network interface or a host USB port to give the simulated platform connectivity to the outside world. The capabilities of the semihosting library interface provided by OVP can be used to model a huge range of system functions.

2. Mapping the processor address region to external memory: The processor address space can be explicitly specified to contain separate RAMs and ROMs. It is also possible to specify that certain address ranges will be modeled by callback functions in the ICM platform itself, which is useful for modeling memory-mapped devices. This capability is exploited in the current work to make an OVP processor work with SystemC/TLM based models, thereby establishing a hybrid simulation framework.

3. Integration with other environments: Simulators normally tend to want to be masters that call into other models or simulators. This creates a conflict when two simulators need to be bolted together, because neither of them wants to relinquish control. The Imperas OVP simulator is built as a slave and is thus callable from other environments such as SystemC. The reverse is however not true: OVPsim cannot call a SystemC model. This is quite natural, since calling SystemC would bring the entire simulator performance back down to the very thing it is trying to replace. On the other hand, substituting an OVP model for part of a SystemC based platform may bring about a large performance gain in relative terms. However, Amdahl's Law tells us that we get diminishing returns dominated by the slowest running piece of the entire system, and thus even one slow SystemC model will make the entire system crawl along at the slow rate. Putting OVP models in a SystemC environment therefore requires careful scheduling. OVP models and subsystems can be encapsulated in SystemC platforms and harnessed using:

- sc_clock(), i.e. at the detailed instruction or clock level
- TLM 2.0, i.e. the new OSCI transaction level approach
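The Amdahl's Law argument made above can be put into numbers with a short sketch. The function name and the 90% / 10% split are hypothetical illustration values, not measurements from this work.

```python
def amdahl_speedup(accelerated_fraction, local_speedup):
    """Overall speedup when only a fraction of the run time is accelerated.

    accelerated_fraction: share of the original run time spent in the part
                          that gets faster (for example, the processor model)
    local_speedup:        factor by which that part alone is accelerated
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / local_speedup)


# Hypothetical platform: 90% of the original simulation time is spent in the
# processor model (replaced by a faster model), 10% stays in a slow peripheral.
for factor in (10, 100, 1000):
    print(factor, round(amdahl_speedup(0.9, factor), 2))
```

With these assumed numbers, the slow 10% caps the overall gain below 10x however fast the processor model becomes, which is why a single slow model dominates the platform.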

Both cases of integration have been tested in this work. Since modeling in pure SystemC brings down the simulation speed and is not desirable for the software development use case, we emphasize integrating OVP models at the transaction level.

3.6 Why is OVP fast?

The OVP technology from Imperas makes it possible to create faster virtual platforms for software development. As a result of the key technologies incorporated in OVP, virtual platforms are able to run at execution speeds of several hundred MIPS. Some of these features are described below.

1. Just-in-Time Code Morphing: Conventional processor models written in an HDL or similar modeling language might be implemented by a loop that is activated by a clock signal. On activation of the system clock, the model might fetch the next instruction, decode it, and call specific functions to execute the instruction. Certain optimizations may be performed to speed up execution, but although models written in this conventional style can be accurate and straightforward in structure, they are not fast. Processor models designed for the Imperas tool set instead use just-in-time (JIT) code morphing technology. [15] The technique is quite similar to a dynamically compiled ISS. It works as follows:

   1. As each new processor instruction is encountered during program execution, the instruction is translated (morphed) into equivalent native machine code. The exact translations to be made are specified by the processor modeler using the Imperas Virtual Machine Interface (VMI) API.
   2. Contiguous sections of translated processor instructions are gathered into code blocks, which are held in a dictionary for the processor. Separate dictionaries are held for supervisor mode code fragments and user mode code fragments.
   3. If a processor performs a jump to a simulated address that has already been translated to a code block held in the dictionary, there is no need to perform the translation again: the simulator simply re-executes the existing code block.

   Imperas technology handles the generation of native machine code and the efficient management of code blocks and dictionaries to give extremely fast simulation. This is possible because, as simulation proceeds, run time (execution of translated code blocks) dominates morph time (JIT compilation). High performance processor models are created by doing as much work as possible at morph time and as little as possible at run time. Not all instructions may map closely to the JIT code morphing opcode set. Those that don't can be implemented using function calls from JIT morphed code at run time. Such a simulation method provides speed improvements if the application under test has portions of code that are used repeatedly, which in general all real time applications do.

2. Program Counter Modeling: The simulator always knows the address of the current instruction. Instead of maintaining the program counter value in the processor model at every instruction, it is fetched directly from the simulator when required. Thus the processor models do not explicitly model register values that are infrequently referenced and can easily be created on demand. The same is very often the case for processor status registers. This makes processor models execute at a faster rate.

3. Simulation Performance Options:

   ICM_ATTR_RELAXED_SCHED: The standard multiprocessor scheduling algorithm built in to the simulator normally simulates each processor for exactly the number of instructions implied by the processor MIPS rate and the time slice before moving on to the next processor in that time slice. Using the instance attribute ICM_ATTR_RELAXED_SCHED indicates to the scheduling algorithm that a closely approximate number of instructions can be used for that instance. This makes simulations much faster, as explained below. The exact number of instructions for which the processor needs to execute can be calculated as:

   $\text{No. of instructions} = \text{Processor Nominal MIPS} \times 10^{6} \times \text{time slice duration (in seconds)}$

   Consider an example of a single code block containing native code implementing four simulated arithmetic instructions and one simulated jump instruction, so five simulated instructions in total. Suppose that relaxed scheduling isn't enabled and the simulation is reaching the end of a time slice, with just three instructions left to perform in that time slice, and that the next block to run is the one described above, which actually contains five instructions. In this case, the simulator won't be able to use the code block as it stands, as that would result in execution of too many instructions in this time slice. It therefore has to discard that code block and generate a new one, containing only three instructions, so that the instruction count is exactly correct at the end of the time slice. This incurs significant overhead. In relaxed scheduling mode, the simulator won't execute the code block in this time slice, and won't discard it. This is much more efficient, but it means that not quite enough instructions have been executed in the time slice (in this example, three fewer than the exact count). The simulator will attempt to make up the difference in the next time slice (i.e. it will try to execute three more instructions than the exact count next time round) so errors do not build up over time.

   ICM_ATTR_APPROX_TIMER: Processor models often contain countdown timers that expire after a certain number of instructions, causing an exception. Once again, modeling these timers to an exact instruction imposes a significant simulation overhead. If a closely approximate number will do (as is usually the case, as instruction countdown timers are themselves often approximations of cycle countdown timers), simulation is much faster when the countdown counter expires frequently. Using the instance attribute ICM_ATTR_APPROX_TIMER indicates to the scheduling algorithm that a closely approximate number of instructions can be used for countdown timer expiry.

Besides the key features mentioned above, which enable fast system simulations, OVP also comes with the capability to be integrated with existing SystemC based platforms. To achieve this, a wrapper needs to be put around the OVP models. The next section describes the methodology which enables hybrid virtual prototyping using OVP models.
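The relaxed scheduling behaviour described above can be sketched with a small model. This is only an illustration of the idea, not OVPsim's scheduler: the function names, the 100 MIPS rate, the 1 ms slice and the block sizes are hypothetical, and a "retranslation" here simply counts how often a code block would have to be regenerated to hit the exact instruction count.

```python
def instructions_per_slice(nominal_mips, slice_seconds):
    """Exact instruction budget for one time slice (MIPS x 10^6 x duration)."""
    return round(nominal_mips * 1_000_000 * slice_seconds)


def run_slice(blocks, budget, relaxed):
    """Run one time slice over a queue of code blocks.

    blocks:  list of code-block sizes in instructions, consumed from the front
    budget:  number of instructions the slice should execute
    relaxed: True to mimic relaxed scheduling, False for exact scheduling
    Returns (executed, retranslations, shortfall).
    """
    executed = 0
    retranslations = 0
    while blocks:
        remaining = budget - executed
        size = blocks[0]
        if size <= remaining:
            executed += blocks.pop(0)      # whole block fits, run it as is
        elif relaxed or remaining == 0:
            break                          # keep the block for the next slice
        else:
            # Exact mode: regenerate a shorter block that fits the budget and
            # leave the rest of the original block for the next slice.
            blocks[0] = size - remaining
            executed += remaining
            retranslations += 1
            break
    return executed, retranslations, budget - executed


print(instructions_per_slice(100, 0.001))  # hypothetical 100 MIPS, 1 ms slice

# Three instructions are left in the slice when a five-instruction block is next.
exact = run_slice([5, 5, 5], budget=8, relaxed=False)
queue = [5, 5, 5]
relaxed_first = run_slice(queue, budget=8, relaxed=True)
# The shortfall is carried into the next slice so that errors do not build up.
relaxed_second = run_slice(queue, budget=8 + relaxed_first[2], relaxed=True)
print(exact, relaxed_first, relaxed_second)
```

In exact mode the slice ends on the precise count at the cost of one regenerated block; in relaxed mode no block is regenerated and the three missing instructions are added to the next slice's budget.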
3.7 Hybrid Simulation Support for OVP

SoCs make intensive use of various IPs. Component reuse becomes a necessity to reduce the design challenge. This requires design methodologies for inter-IP communication and implementation. This flattening of the design process can best be managed through platform based design at the transaction level. TLM2.0 provides a new level of performance and interoperability. With TLM2.0 it is possible to make models from different vendors work together in a virtual platform. OVP provides a C++ interface to encapsulate OVP models in the SystemC environment. New developments have been made to make OVP models work in TLM2.0 compliant SystemC platforms. The availability of SystemC TLM2.0 technology for use with OVP CPU models allows the encapsulation of OVP models in existing TLM2.0 compliant SystemC platforms, thereby solving the model interoperability issue and enabling fast solutions for successful deployment of virtual platforms through hybrid simulation of OVP and SystemC.

3.8 Approach to TLM2.0

In order to integrate already existing OVP models, wrappers are written and put around the existing code to make it compatible with the OSCI TLM APIs. The conventional APIs in OVP are built in C. To make a TLM2.0 compliant SystemC wrapper, several new classes are constructed in which the conventional C routines for the models are called. These classes build the wrapper around the binaries of the OVP processor, peripheral, memory and bus models, allowing them to be exported to an outer simulation environment other than OVP. Once exported to the SystemC environment, these models can be controlled from the SystemC interfaces. Of the various abstraction levels provided by TLM2.0, it is loosely timed modeling that gives the highest performance. It enables processes to run ahead of simulation time (temporal decoupling) and uses a quantum keeper. The wrappers have been built at this abstraction level so that the models can run as fast as possible. Features like the Direct Memory Interface are used to provide a direct pointer to memory in the target, bypassing the sockets used in the transport calls and enabling the faster simulation needed for the software development use case. The processor has the option to invalidate DMI, in which case the transport calls go over the bus. The wrappers support the TLM2.0 blocking transport interface with timing annotation.

3.9 Details of OVP inside TLM2.0

The wrapper that puts OVP processor models in the TLM2.0 environment is a generic wrapper that can be further extended according to the processor in use. The wrapper allows each processor to run freely for a large number of instructions rather than advancing all processors in lock-step. [18] The generic wrapper for the processor model is described in the form of a class derived from SC_MODULE.
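The effect of letting a processor run ahead for a quantum instead of synchronizing in lock-step can be illustrated with a minimal sketch. This is plain Python rather than SystemC: the class below only loosely mirrors the idea of the TLM2.0 quantum keeper, and its method names, the 10 ns per instruction cost and the quantum sizes are assumptions chosen for illustration.

```python
class QuantumKeeper:
    """Tracks how far a process has run ahead of global simulation time."""

    def __init__(self, quantum_ns):
        self.quantum_ns = quantum_ns
        self.local_ns = 0  # time consumed locally since the last sync

    def inc(self, delta_ns):
        self.local_ns += delta_ns

    def need_sync(self):
        return self.local_ns >= self.quantum_ns

    def reset(self):
        self.local_ns = 0


def run_processor(n_instructions, ns_per_instruction, quantum_ns):
    """Return (global_time_ns, number_of_synchronizations)."""
    keeper = QuantumKeeper(quantum_ns)
    global_ns = 0
    syncs = 0
    for _ in range(n_instructions):
        keeper.inc(ns_per_instruction)    # execute one instruction locally
        if keeper.need_sync():
            global_ns += keeper.local_ns  # stands in for yielding to the kernel
            keeper.reset()
            syncs += 1
    global_ns += keeper.local_ns          # flush any time left at the end
    return global_ns, syncs


lock_step = run_processor(1000, ns_per_instruction=10, quantum_ns=10)
decoupled = run_processor(1000, ns_per_instruction=10, quantum_ns=1000)
print(lock_step, decoupled)
```

Both runs reach the same simulated time, but the decoupled run hands control back to the kernel far less often; each such hand-over is a costly context switch in a real SystemC simulation.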
The details of the wrapper are shown in figure 3.2, and the implementation methodology in figure 3.3. To enable encapsulation at the TLM level, basic C++ wrappers are first built that place every instance of a processor, bus, etc. inside a separate class (the Processor/Bus objects in figure 3.2). These classes access the core OVP functionality of the respective model. The outer SC_MODULE then instantiates objects of these processor and bus classes (the CPU object in figure 3.2). A specific processor, e.g. MIPS or ARM, can then be derived from this basic processor, providing a third layer for the wrapper.

Based on this hierarchy of wrappers, the processor module shown in figure 3.3 has bus objects instantiated inside it. This allows the OVP processor address space to be mapped to a local OVP memory/peripheral (through the OVP bus) as well as to an external memory or peripheral with a TLM2.0 target socket. When the processor is connected to an internal OVP memory or peripheral, the connection is made directly from the OVP bus shown. To connect to an external memory/peripheral, a portion of the address space of the local OVP bus, which is directly connected to the OVP processor, is bridged to another bus (the TLM bus in the figure) on which read/write callbacks are registered. Initiator sockets are opened on the processor model. Any access to the part of the TLM bus address space that is mapped to an external memory/peripheral triggers these read/write callback functions on the TLM bus, which is indirectly connected to the processor. The callback functions then create the appropriate transaction request and forward the transport call with its generic payload over the initiator sockets.

Fig 3.2 Wrapper for Processor Model (layers shown, outermost first: MIPS/ARM Processor Object (Layer 3); CPU Object (Layer 2); Processor Object and Bus Object (Layer 1); OVP Processor Model and OVP Bus Model)

This generic wrapper around CPU models is used in a processor-configuration-specific layer to create specific processor wrappers, such as those for ARM and MIPS, which are then instantiated in the SystemC platform. On encountering an instruction that performs a load from or a store to a memory location on the bus, the processor calls a function in the wrapper code, which in turn issues the necessary blocking transactions on the bus. Wrappers for peripheral models are constructed in a similar fashion, using read/write callbacks registered on the bus connected to the peripheral model within an SC_MODULE.
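The bridging mechanism described above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: it uses neither the OVP API nor TLM2.0 sockets, the Transaction struct is a hypothetical stand-in for the generic payload, and the std::function member stands in for an initiator socket. Accesses outside the bridged window stay on a local map that plays the role of OVP-internal memory.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a TLM2.0 generic payload.
struct Transaction {
    bool is_write;
    std::uint32_t address;
    std::uint32_t data;
};

// Hypothetical processor wrapper with one bridged address window.
class ProcessorWrapper {
public:
    typedef std::function<void(Transaction&)> Socket;

    ProcessorWrapper(std::uint32_t bridge_base, std::uint32_t bridge_size, Socket socket)
        : bridge_base_(bridge_base), bridge_size_(bridge_size), socket_(std::move(socket)) {}

    // Store issued by the processor model.
    void store(std::uint32_t addr, std::uint32_t value) {
        if (is_bridged(addr)) {
            Transaction t = {true, addr, value};   // "write callback" builds the request
            socket_(t);                            // forward over the "initiator socket"
        } else {
            local_[addr] = value;                  // local memory, no transaction
        }
    }

    // Load issued by the processor model.
    std::uint32_t load(std::uint32_t addr) {
        if (is_bridged(addr)) {
            Transaction t = {false, addr, 0};      // "read callback" builds the request
            socket_(t);
            return t.data;                         // target filled in the data
        }
        std::map<std::uint32_t, std::uint32_t>::const_iterator it = local_.find(addr);
        return it == local_.end() ? 0 : it->second;
    }

private:
    bool is_bridged(std::uint32_t addr) const {
        return addr >= bridge_base_ && addr - bridge_base_ < bridge_size_;
    }

    std::uint32_t bridge_base_;
    std::uint32_t bridge_size_;
    Socket socket_;
    std::map<std::uint32_t, std::uint32_t> local_;
};
```

A platform would pass in a callable that plays the role of the external TLM2.0 target; only addresses inside the bridged window ever reach it.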
The TLM2.0 wrapper also provides a bus decoder with a configurable number of initiator and target sockets, which forwards each transaction arriving on one of its target ports to the proper initiator port based on the bus address map.
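The address-map lookup such a decoder performs can be sketched as below. This is a minimal plain C++ illustration, not the decoder shipped with the OVP TLM2.0 wrapper; the class name BusDecoder, the port numbering, and the choice to return an offset relative to the range base are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of the (hypothetical) bus address map.
struct AddressRange {
    std::uint32_t base;
    std::uint32_t size;
    int initiator_port;
};

class BusDecoder {
public:
    // Register an address window served by the given initiator port.
    void map(std::uint32_t base, std::uint32_t size, int port) {
        AddressRange r = {base, size, port};
        ranges_.push_back(r);
    }

    // Returns the initiator port for addr and writes the offset within the
    // matching range to 'offset'; returns -1 if no range matches.
    int decode(std::uint32_t addr, std::uint32_t& offset) const {
        for (std::size_t i = 0; i < ranges_.size(); ++i) {
            const AddressRange& r = ranges_[i];
            if (addr >= r.base && addr - r.base < r.size) {
                offset = addr - r.base;
                return r.initiator_port;
            }
        }
        return -1;
    }

private:
    std::vector<AddressRange> ranges_;
};
```

The comparison is written as addr - r.base < r.size so that a range ending at the top of the 32-bit address space does not overflow.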

Fig 3.3 Processor Wrapper Implementation (blocks shown: SC_MODULE, OVP Processor, OVP Bus, Bus Bridge, TLM Bus, TLM2 Initiator Socket)

The SystemC environment thus calls the OVP simulator through this wrapper. Proper synchronization between the two simulators must be maintained for the models in the platform to work correctly. When the simulation starts, each processor runs from a SystemC thread. The thread executes IPQ instructions on the processor without advancing SystemC time, where:

IPQ = Processor Nominal MIPS × Quantum Size

The function call asking the processor to simulate IPQ instructions comes from the OVP environment and is made through the wrapper. When the allotted instructions have completed, the thread calls SystemC wait() to advance time. The OVP simulator synchronizes with the SystemC simulation kernel every time the quantum is over. Thus each processor executes a number of instructions at a time in a round-robin schedule.

Based on this background, a wrapper is prepared to enable OVP models to communicate with Open SCML based models. The details of this wrapper and the experiments done with it are presented in chapter 6. The following chapter presents the simulation performance experiments done with OVP.
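The quantum-based scheduling described in section 3.9 can be illustrated with a small plain C++ sketch. It is not SystemC or OVP code: the inner statement stands in for the OVP call that simulates IPQ instructions, and advancing now_us stands in for the SystemC wait(). The quantum is assumed to be given in microseconds, so that MIPS (10^6 instructions per second) multiplied by microseconds yields a plain instruction count; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// IPQ = nominal MIPS x quantum size. With the quantum in microseconds the
// factors 1e6 (MIPS) and 1e-6 (us) cancel.
inline std::uint64_t instructions_per_quantum(std::uint64_t nominal_mips,
                                              std::uint64_t quantum_us) {
    return nominal_mips * quantum_us;
}

// Hypothetical processor record: nominal speed and instructions run so far.
struct Cpu {
    std::uint64_t nominal_mips;
    std::uint64_t executed;
};

// Runs 'quanta' scheduling rounds. In each round every processor executes its
// IPQ instructions without time advancing; then time moves on by one quantum.
// Returns the simulated time in microseconds.
inline std::uint64_t run_quanta(std::vector<Cpu>& cpus,
                                std::uint64_t quantum_us,
                                unsigned quanta) {
    std::uint64_t now_us = 0;
    for (unsigned q = 0; q < quanta; ++q) {
        for (std::size_t i = 0; i < cpus.size(); ++i) {
            // Stand-in for "simulate IPQ instructions on this processor".
            cpus[i].executed += instructions_per_quantum(cpus[i].nominal_mips, quantum_us);
        }
        now_us += quantum_us;  // stand-in for wait(quantum)
    }
    return now_us;
}
```

For example, a processor with a nominal speed of 100 MIPS and a quantum of 1000 microseconds (1 ms) is given 100 × 1000 = 100,000 instructions per quantum.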

CHAPTER 4 SIMULATION PERFORMANCE EXPERIMENTS

Imperas claims that its solutions simulate platforms consisting of one or more processors running real-time applications at speeds of hundreds of MIPS, which is needed for today's embedded software development environments. To validate this claim made for OVP, we compare the simulation statistics of similar platforms constructed using OVP and another modeling technology. The proprietary virtual prototyping solutions provided by a leading ESL vendor are chosen for comparison against Open Virtual Platforms technology. Similar single and dual core platforms are constructed in the different environments and their simulation statistics are compared.

4.1 Application under test - JPEG Decoder

To simulate the platforms, a suitable application must be chosen to execute on the processor. The application should place a high workload on the processor. A baseline JPEG decoder is chosen as the benchmark application for the current simulation framework. JPEG, short for Joint Photographic Experts Group, is a widely used image compression technique. It is used in image processing systems such as copiers, scanners and digital cameras. A JPEG decoder reconstructs image data from a stream of compressed image data by applying a series of transformations to the compressed data. The baseline method forms the basic coding method for all DCT-based JPEG decoders, which makes it an interesting decoding method, and for that reason it was selected for implementation in this project. The JPEG decoder is a streaming multimedia application that has a degree of parallelism and consists of 5-6 tasks. The JPEG decoding process is depicted in Figure 4.1. Before the operations performed by the decoder are explained, we look at the encoder. The JPEG encoder divides an image into blocks of 8 by 8 pixels.
The encoder then has a number of blocks which, when placed in the right order, form the original image. The encoder applies a number of operations to each of these blocks: a discrete cosine transform, quantization, zigzag scan and variable length encoding. The result of these operations, and of the encoder, is a compressed image.

Fig 4.1 JPEG Decoder (pipeline shown: Compressed Image Data -> VLD -> ZZ -> DQ -> IDCT -> Color Conversion -> Re-order -> Reconstructed Image)

The decoder reverses the transformations applied by the encoder to the image data. It takes the compressed image data as its input and applies the following operations to it in sequence:

1. Variable Length Decoding (VLD)
2. Zigzag scan (ZZ)
3. De-quantization (DQ)
4. Inverse Discrete Cosine Transform (IDCT)
5. Color Conversion
6. Reordering

The decoder then obtains the reconstructed bitmap image. The compressed image data forms a byte stream input for the decoder. This byte stream contains so-called markers. A marker is a two-byte combination that identifies a structural part of the compressed image data. The incoming byte stream is parsed, based on the markers, to get the header information and image data, and the various transformations are then applied. Details of the different transformations and markers can be found in [19].

4.2 Application Task Graph Mapping for Dual Core Platform

Besides single core platforms, homogeneous multi-core platforms using the MIPS processor are also constructed, on which the same JPEG decoder application is tested. The study is limited to dual core systems but could be extended to more cores depending on the workload of the application. To execute the same application on two processors, the tasks must be partitioned between the two processors such that each processor has almost equal computation and communication load [20].
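The marker-driven parsing mentioned in section 4.1 can be illustrated with a short sketch. In the JPEG byte stream a marker is the byte 0xFF followed by a code byte other than 0x00 (a stuffed zero inside entropy-coded data) or 0xFF (a fill byte); for instance 0xFFD8 marks the start of the image and 0xFFD9 its end. The function below simply scans a buffer for such pairs. It is a simplified illustration rather than the decoder used in this work: a real parser would also use the segment length fields to skip segment payloads, and the names Marker and find_markers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Position and code byte of one marker found in the stream.
struct Marker {
    std::size_t offset;  // index of the 0xFF byte
    std::uint8_t code;   // byte following 0xFF, e.g. 0xD8 for start of image
};

// Scans the whole buffer for two-byte markers (0xFF followed by a code byte
// that is neither 0x00 nor 0xFF).
inline std::vector<Marker> find_markers(const std::vector<std::uint8_t>& bytes) {
    std::vector<Marker> found;
    for (std::size_t i = 0; i + 1 < bytes.size(); ++i) {
        if (bytes[i] != 0xFF) {
            continue;
        }
        const std::uint8_t next = bytes[i + 1];
        if (next == 0x00 || next == 0xFF) {
            continue;  // stuffed zero or fill byte, not a marker
        }
        Marker m = {i, next};
        found.push_back(m);
        ++i;  // skip the code byte that was just consumed
    }
    return found;
}
```

A decoder would dispatch on the code byte, for example reading quantization tables on 0xDB and Huffman tables on 0xC4, before decoding the image data.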

As seen from figure 4.1, the tasks in the decoder are performed one after the other, so the processor cores in the platform are active one after the other. The architecture of the dual core system therefore looks like Figure 4.2.

Fig 4.2 Dual Core System Architecture (connection 1 -> Core 1 -> connection 2 -> Core 2 -> connection 3)

To select a proper task partitioning for the application, the application is studied carefully to find the match between the JPEG decoder and the dual core platform. The compressed image data enters our two-processor system over connection 1. Since the compressed image data feeds the VLD in the JPEG decoder, the VLD must be placed on the first processor. As seen in Figure 4.1, re-ordering is connected to the output, which is connection 3 in the system, so it is mapped to core 2. To divide ZZ, DQ, IDCT and color conversion over the two cores, the data consumption and production rates of the various parts of the system are examined. The VLD consumes data from the outside world and produces data in blocks. The zigzag scan, de-quantization and IDCT each consume and produce one block at a time. Color conversion and re-ordering require one or more (up to 10) blocks before they can run. Color conversion, however, produces data block by block and sends it to the re-ordering unit, which then produces the output data. This implies that communication over connection 2 of our dual core system is always in blocks. Thus every division of the JPEG decoder over two cores requires the same data rate, and the subdivision of the JPEG decoder does not influence the communication load of the system. Still, for a proper partitioning of the decoder between the 2 cores, the computation load on the two cores must be more or less the same. This enables core 2 to start as soon as core 1 has produced one block. The survey result of the system load for the various parts of the JPEG decoder is shown in Table 4.1.
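The balancing criterion stated above, nearly equal computation load on both cores with the pipeline order preserved, can be sketched as a search over the possible split points. The function below is an illustrative helper written for this text, not part of the thesis tooling; best_split is a hypothetical name, and the loads passed to it are meant to be the per-task percentages from Table 4.1 in pipeline order (VLD, ZZ, DQ, IDCT, Color Conversion, Re-ordering).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Given per-task loads in pipeline order, returns k such that tasks [0, k) go
// to core 1 and tasks [k, n) go to core 2, minimizing the load difference.
// Both cores get at least one task, so the first task stays on core 1 and the
// last task stays on core 2.
inline std::size_t best_split(const std::vector<int>& loads) {
    assert(loads.size() >= 2);
    int total = 0;
    for (std::size_t i = 0; i < loads.size(); ++i) {
        total += loads[i];
    }
    std::size_t best = 1;
    int best_diff = total + 1;  // larger than any achievable difference
    int prefix = 0;
    for (std::size_t k = 1; k < loads.size(); ++k) {
        prefix += loads[k - 1];                          // load on core 1
        const int diff = std::abs(prefix - (total - prefix));
        if (diff < best_diff) {
            best_diff = diff;
            best = k;
        }
    }
    return best;
}
```

With the loads 35, 5, 10, 20, 15 and 15 the best split is k = 3: VLD, ZZ and DQ (50%) on core 1, and IDCT, color conversion and re-ordering (50%) on core 2.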
Table 4.1 shows that partitioning at the IDCT function, between DQ and IDCT, is the easiest to realize. This choice gives an almost 50-50 load split between the two cores. It also has the advantage that the Huffman decoding and de-quantization tables required by the VLD and DQ units respectively do not need to be shared by both processors. Based on this task partitioning, the data flow between the two processors in the system is shown in Figure 4.3.

Table 4.1 System Load distribution for JPEG Decoder

Part      Task in JPEG decoder    Computation Load (% of total load)
CORE 1    VLD                     35
CORE 1    ZZ                      5
CORE 1    DQ                      10
CORE 2    IDCT                    20
CORE 2    Color Conversion        15
CORE 2    Re-ordering             15

Fig 4.3 Partitioning for JPEG Decoder (Proc 1: input JPEG image -> VLD -> ZZ -> DQ; image properties and blocks passed from Proc 1 to Proc 2; Proc 2: IDCT -> Color Conv. -> Re-order -> output BITMAP image)

4.3 Single Core Platform

To study the simulation performance, platforms having the same configuration are constructed for the following cases:

- Pure OVP simulation framework
- ESL vendor supplied virtual prototyping framework
- Hybrid OVP + SystemC in OVP simulation framework
- Hybrid OVP + Open SCML in OVP simulation framework

For simplicity of experimentation, dedicated peripherals are not added to the platform. To keep the comparison fair, the same variant of the processor model must be used in all cases. We have chosen the Instruction Accurate (IA) MIPS32_24Kc processor model. The details of the platform can be seen in Figure 4.4. The program memory shown in the figure stores the executable binary of the application, in the standard Executable and Linkable Format (.elf). The input and output image memories hold the compressed image data to be read and the final reconstructed image data after decoding, respectively. Loading and storing the image in memory is automated in the platform. Once the image is loaded, the processor executes the application, which reads the data from the input image memory. The quantization and Huffman tables needed for the various transformations during decoding are present in the image. As the image is read byte by byte from the memory, these tables are read, based on the markers that are found, and stored in storage local to the processor. They are then used to decode the image pixel data, and the final reconstructed image is obtained in Sun raster image format. This image is stored in the output image memory.

The hybrid single core platform, in which the OVP processor model is placed inside a TLM2.0 compliant SystemC wrapper, is also tested. In such platforms, the wrapped OVP processor interacts with simple TLM2.0 target memories. This is done by connecting the processor model to a SystemC based bus decoder that has TLM2.0 target and initiator sockets. The bus decodes the incoming address and, based on it, forwards the transaction to one of its initiator ports, which is connected to the TLM2.0 target socket of the memory.

Fig 4.4 Single Core Platform (blocks shown: MIPS32_24Kc Processor Model (IA), Bus Decoder, Program Memory, Input image Memory, Output image Memory)

4.4 Backdoor Mode simulations

In all cases, simulations have been carried out in backdoor mode. This is a way to access a memory/peripheral in which the transaction request does not actually go over the bus. In the pure OVP environment, the processor accesses the memory through a pointer. The proprietary solutions also provide options for simulation in backdoor mode. In the hybrid simulation case, it is the Direct Memory Interface (DMI) support provided by TLM2.0 that enables backdoor mode simulation. DMI provides a means by which an initiator can get direct access to an area of memory owned by a target, thereafter accessing that memory through a direct pointer rather than through the transport interface. This offers a potential increase in simulation speed for memory accesses between initiator and target models. Figure 4.5 shows how DMI works. Once established, DMI bypasses the normal path of multiple transport calls (blocking transport in the current framework) from the initiator through the interconnect components to the target.

Fig 4.5 Backdoor Memory Access (blocks shown: Wrapper, OVP MIPS32_24K CPU, Memory, Direct Memory Interface)

4.5 Dual Core Platform

Based on the task partitioning explained in section 4.2, the application was split into two parts, each executed on a separate processor. The platform constructed to simulate such a system is illustrated in Figure 4.6. Each processor has its own program memory, which contains the executable in .elf format. In the current framework, the cores communicate with each other via shared memory. This shared memory is used to transfer the necessary information between the processors, such as the image properties (image size, number of components, sampling rate) and the blocks passed from core 1 to core 2 for IDCT computation.
To maintain the correctness of the application, the two processors synchronize via a polling mechanism in which a semaphore in shared memory is constantly polled by both processors. Processor 1 reads the input image from the memory. Each time processor 1 generates an 8x8 block after de-quantization, it places the block in the shared memory and sets the semaphore high. It then waits for the semaphore to return to low, which is done by Processor 2. Processor 1 writes a block to the shared memory only when the semaphore is low. Processor 2, on the other hand, waits for the semaphore to go high. When it finds the semaphore high, it reads the block from the shared memory and resets the semaphore to low so that the next block can be written by Processor 1. Processor 2 then performs the IDCT on the block. When a sufficient number of blocks has been obtained, color conversion and re-ordering are performed and the reconstructed data is stored in the output memory.

Fig 4.6 Dual Core Platform (blocks shown: MIPS32_24K Core 1 (IA), MIPS32_24K Core 2 (IA), Bus 1, Bus 2, Bus 3, two Program Memories, Input Memory, Output Memory, Shared Memory)

Based on this description, the next chapter presents the results of our experimentation.
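The semaphore handshake described in section 4.5 can be sketched as two polling functions operating on one shared structure. The code is a single-threaded plain C++ illustration in which the two "cores" are polled in turn, much as the simulator runs the processors round-robin; it is not the application code that runs on the MIPS models. All names are hypothetical, and on real multi-core hardware the shared flag would additionally need volatile or atomic accesses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::array<int, 64> Block;  // one 8x8 block of coefficients

// Hypothetical layout of the shared memory region.
struct SharedMemory {
    bool semaphore;  // false = low (slot free), true = high (block ready)
    Block block;
};

// Core 1: writes the next block only while the semaphore is low, then sets it
// high. Returns true if a block was written during this poll.
inline bool producer_poll(SharedMemory& shm,
                          const std::vector<Block>& input,
                          std::size_t& next) {
    if (shm.semaphore || next >= input.size()) {
        return false;
    }
    shm.block = input[next];
    ++next;
    shm.semaphore = true;
    return true;
}

// Core 2: reads the block only while the semaphore is high, then resets it to
// low. Returns true if a block was read during this poll.
inline bool consumer_poll(SharedMemory& shm, std::vector<Block>& received) {
    if (!shm.semaphore) {
        return false;
    }
    received.push_back(shm.block);  // IDCT would be applied to this block next
    shm.semaphore = false;
    return true;
}
```

Calling the two poll functions alternately transfers every block exactly once and in order, because each side makes progress only in the semaphore state that the other side has just left.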


More information

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

More information

evm Virtualization Platform for Windows

evm Virtualization Platform for Windows B A C K G R O U N D E R evm Virtualization Platform for Windows Host your Embedded OS and Windows on a Single Hardware Platform using Intel Virtualization Technology April, 2008 TenAsys Corporation 1400

More information

Operating System Support for Multiprocessor Systems-on-Chip

Operating System Support for Multiprocessor Systems-on-Chip Operating System Support for Multiprocessor Systems-on-Chip Dr. Gabriel marchesan almeida Agenda. Introduction. Adaptive System + Shop Architecture. Preliminary Results. Perspectives & Conclusions Dr.

More information

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to:

Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to: 55 Topic 3 Computer Performance Contents 3.1 Introduction...................................... 56 3.2 Measuring performance............................... 56 3.2.1 Clock Speed.................................

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Virtualization. Dr. Yingwu Zhu

Virtualization. Dr. Yingwu Zhu Virtualization Dr. Yingwu Zhu What is virtualization? Virtualization allows one computer to do the job of multiple computers. Virtual environments let one computer host multiple operating systems at the

More information

CSE 237A Final Project Final Report

CSE 237A Final Project Final Report CSE 237A Final Project Final Report Multi-way video conferencing system over 802.11 wireless network Motivation Yanhua Mao and Shan Yan The latest technology trends in personal mobile computing are towards

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

More information

Chapter 3 Operating-System Structures

Chapter 3 Operating-System Structures Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual

More information

Linux Driver Devices. Why, When, Which, How?

Linux Driver Devices. Why, When, Which, How? Bertrand Mermet Sylvain Ract Linux Driver Devices. Why, When, Which, How? Since its creation in the early 1990 s Linux has been installed on millions of computers or embedded systems. These systems may

More information

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014)

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014) Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014) ManolisMarazakis (maraz@ics.forth.gr) Institute of Computer Science (ICS) Foundation

More information

SYSTEM ecos Embedded Configurable Operating System

SYSTEM ecos Embedded Configurable Operating System BELONGS TO THE CYGNUS SOLUTIONS founded about 1989 initiative connected with an idea of free software ( commercial support for the free software ). Recently merged with RedHat. CYGNUS was also the original

More information

Full and Para Virtualization

Full and Para Virtualization Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels

More information

MICROPROCESSOR. Exclusive for IACE Students www.iace.co.in iacehyd.blogspot.in Ph: 9700077455/422 Page 1

MICROPROCESSOR. Exclusive for IACE Students www.iace.co.in iacehyd.blogspot.in Ph: 9700077455/422 Page 1 MICROPROCESSOR A microprocessor incorporates the functions of a computer s central processing unit (CPU) on a single Integrated (IC), or at most a few integrated circuit. It is a multipurpose, programmable

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE

PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE PERFORMANCE ANALYSIS OF KERNEL-BASED VIRTUAL MACHINE Sudha M 1, Harish G M 2, Nandan A 3, Usha J 4 1 Department of MCA, R V College of Engineering, Bangalore : 560059, India sudha.mooki@gmail.com 2 Department

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks

More information

(Refer Slide Time: 02:39)

(Refer Slide Time: 02:39) Computer Architecture Prof. Anshul Kumar Department of Computer Science and Engineering, Indian Institute of Technology, Delhi Lecture - 1 Introduction Welcome to this course on computer architecture.

More information

Multi-core Programming System Overview

Multi-core Programming System Overview Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Networking Remote-Controlled Moving Image Monitoring System

Networking Remote-Controlled Moving Image Monitoring System Networking Remote-Controlled Moving Image Monitoring System First Prize Networking Remote-Controlled Moving Image Monitoring System Institution: Participants: Instructor: National Chung Hsing University

More information

Introduction to Embedded Systems. Software Update Problem

Introduction to Embedded Systems. Software Update Problem Introduction to Embedded Systems CS/ECE 6780/5780 Al Davis logistics minor Today s topics: more software development issues 1 CS 5780 Software Update Problem Lab machines work let us know if they don t

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design Applying the Benefits of on a Chip Architecture to FPGA System Design WP-01149-1.1 White Paper This document describes the advantages of network on a chip (NoC) architecture in Altera FPGA system design.

More information

NetFlow probe on NetFPGA

NetFlow probe on NetFPGA Verze #1.00, 2008-12-12 NetFlow probe on NetFPGA Introduction With ever-growing volume of data being transferred over the Internet, the need for reliable monitoring becomes more urgent. Monitoring devices

More information

Testing of Digital System-on- Chip (SoC)

Testing of Digital System-on- Chip (SoC) Testing of Digital System-on- Chip (SoC) 1 Outline of the Talk Introduction to system-on-chip (SoC) design Approaches to SoC design SoC test requirements and challenges Core test wrapper P1500 core test

More information

System Software Integration: An Expansive View. Overview

System Software Integration: An Expansive View. Overview Software Integration: An Expansive View Steven P. Smith Design of Embedded s EE382V Fall, 2009 EE382 SoC Design Software Integration SPS-1 University of Texas at Austin Overview Some Definitions Introduction:

More information

Processor Architectures

Processor Architectures ECPE 170 Jeff Shafer University of the Pacific Processor Architectures 2 Schedule Exam 3 Tuesday, December 6 th Caches Virtual Memory Input / Output OperaKng Systems Compilers & Assemblers Processor Architecture

More information

A SystemC Transaction Level Model for the MIPS R3000 Processor

A SystemC Transaction Level Model for the MIPS R3000 Processor SETIT 2007 4 th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 25-29, 2007 TUNISIA A SystemC Transaction Level Model for the MIPS R3000 Processor

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

Hardware accelerated Virtualization in the ARM Cortex Processors

Hardware accelerated Virtualization in the ARM Cortex Processors Hardware accelerated Virtualization in the ARM Cortex Processors John Goodacre Director, Program Management ARM Processor Division ARM Ltd. Cambridge UK 2nd November 2010 Sponsored by: & & New Capabilities

More information

Codesign: The World Of Practice

Codesign: The World Of Practice Codesign: The World Of Practice D. Sreenivasa Rao Senior Manager, System Level Integration Group Analog Devices Inc. May 2007 Analog Devices Inc. ADI is focused on high-end signal processing chips and

More information

Chapter 2 System Structures

Chapter 2 System Structures Chapter 2 System Structures Operating-System Structures Goals: Provide a way to understand an operating systems Services Interface System Components The type of system desired is the basis for choices

More information

EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

More information

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study

CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what

More information

On Demand Loading of Code in MMUless Embedded System

On Demand Loading of Code in MMUless Embedded System On Demand Loading of Code in MMUless Embedded System Sunil R Gandhi *. Chetan D Pachange, Jr.** Mandar R Vaidya***, Swapnilkumar S Khorate**** *Pune Institute of Computer Technology, Pune INDIA (Mob- 8600867094;

More information

Driving force. What future software needs. Potential research topics

Driving force. What future software needs. Potential research topics Improving Software Robustness and Efficiency Driving force Processor core clock speed reach practical limit ~4GHz (power issue) Percentage of sustainable # of active transistors decrease; Increase in #

More information

Application Note: AN00141 xcore-xa - Application Development

Application Note: AN00141 xcore-xa - Application Development Application Note: AN00141 xcore-xa - Application Development This application note shows how to create a simple example which targets the XMOS xcore-xa device and demonstrates how to build and run this

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operatin g Systems: Internals and Design Principle s Chapter 11 I/O Management and Disk Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles An artifact can

More information

Building Applications Using Micro Focus COBOL

Building Applications Using Micro Focus COBOL Building Applications Using Micro Focus COBOL Abstract If you look through the Micro Focus COBOL documentation, you will see many different executable file types referenced: int, gnt, exe, dll and others.

More information

Operating Systems. Lecture 03. February 11, 2013

Operating Systems. Lecture 03. February 11, 2013 Operating Systems Lecture 03 February 11, 2013 Goals for Today Interrupts, traps and signals Hardware Protection System Calls Interrupts, Traps, and Signals The occurrence of an event is usually signaled

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

1. Computer System Structure and Components

1. Computer System Structure and Components 1 Computer System Structure and Components Computer System Layers Various Computer Programs OS System Calls (eg, fork, execv, write, etc) KERNEL/Behavior or CPU Device Drivers Device Controllers Devices

More information

Eight Ways to Increase GPIB System Performance

Eight Ways to Increase GPIB System Performance Application Note 133 Eight Ways to Increase GPIB System Performance Amar Patel Introduction When building an automated measurement system, you can never have too much performance. Increasing performance

More information

Using a Generic Plug and Play Performance Monitor for SoC Verification

Using a Generic Plug and Play Performance Monitor for SoC Verification Using a Generic Plug and Play Performance Monitor for SoC Verification Dr. Ambar Sarkar Kaushal Modi Janak Patel Bhavin Patel Ajay Tiwari Accellera Systems Initiative 1 Agenda Introduction Challenges Why

More information

Network connectivity controllers

Network connectivity controllers Network connectivity controllers High performance connectivity solutions Factory Automation The hostile environment of many factories can have a significant impact on the life expectancy of PCs, and industrially

More information

Addressing the Challenges of Synchronization/Communication and Debugging Support in Hardware/Software Cosimulation

Addressing the Challenges of Synchronization/Communication and Debugging Support in Hardware/Software Cosimulation Addressing the Challenges of Synchronization/Communication and Debugging Support in Hardware/Software Cosimulation Banit Agrawal Timothy Sherwood Department of Computer Science University of California,

More information

Virtualization. P. A. Wilsey. The text highlighted in green in these slides contain external hyperlinks. 1 / 16

Virtualization. P. A. Wilsey. The text highlighted in green in these slides contain external hyperlinks. 1 / 16 1 / 16 Virtualization P. A. Wilsey The text highlighted in green in these slides contain external hyperlinks. 2 / 16 Conventional System Viewed as Layers This illustration is a common presentation of the

More information