On-Chip Memory Architecture Exploration of Embedded System on Chip


A Thesis Submitted for the Degree of Doctor of Philosophy in the Faculty of Engineering

by T.S. Rajesh Kumar

Supercomputer Education and Research Centre
Indian Institute of Science
Bangalore

September 2008


To my Family, Sree, Amma, Advika and Adarsh


Abstract

Today's feature-rich multimedia products require embedded system solutions with complex Systems-on-Chip (SoCs) to meet market expectations of high performance at low cost and low energy consumption. SoCs are complex designs with multiple embedded processors, memory subsystems, and application-specific peripherals. The memory architecture of an embedded SoC strongly influences the area, power and performance of the entire system. Further, the memory subsystem constitutes a major part (typically up to 70%) of the silicon area of a current-day SoC.

The on-chip memory organization of embedded processors varies widely from one SoC to another, depending on the application and market segment for which the SoC is deployed. A wide variety of choices is available to embedded designers, from a simple on-chip SPRAM based architecture to a more complex cache-SPRAM based hybrid architecture. The performance of a memory architecture also depends on how the data variables of the application are placed in the memory. For each memory architecture there are multiple data layouts that are efficient from a power and performance viewpoint. Further, the designer would be interested in multiple optimal design points to address various market segments. Hence memory architecture exploration for an embedded system involves evaluating a large design space, on the order of 100,000 design points, with each design point having several tens of thousands of data layouts. Due to its large impact on system performance parameters, the memory architecture is often hand-crafted by experienced designers who explore only a very small subset of this design space. The vast memory design space rules out an exhaustive manual analysis.

In this work, we propose an automated framework for on-chip memory architecture exploration. Our proposed framework integrates memory architecture exploration and data layout to search the design space efficiently. While the memory exploration selects specific memory architectures, the data layout efficiently maps the given application onto the memory architecture under consideration and thus helps in evaluating the memory architecture. The proposed memory exploration framework works at both the logical and the physical memory architecture level. Our work addresses on-chip memory architectures for DSP processors that are organized as multiple memory banks, where each bank can be single- or dual-ported and the bank sizes can be non-uniform. Further, our work also addresses memory architecture exploration for on-chip memory architectures that are SPRAM and cache based. Our proposed method is based on a multi-objective Genetic Algorithm and outputs several hundred Pareto-optimal design solutions that are interesting from area, power and performance viewpoints, within a few hours of running on a standard desktop configuration.

Acknowledgments

There are many people I would like to thank who have helped me in various ways. First and foremost I would like to thank my supervisors, Prof. R. Govindarajan and Dr. C.P. Ravikumar, who have guided and supported me in various aspects through the entire journey to the completion of my thesis work. I profusely thank them for the encouragement they provided and for their perseverance in keeping me focused on the Ph.D. work.

I would like to express my gratitude to Texas Instruments for giving me the time and opportunity to pursue my studies. I would like to thank my colleagues at Texas Instruments for their support and reviews, in particular my manager, Balaji Holur. I would also like to thank my previous managers, Pamela Kumar and Manohar Sambandam.

Last but not the least, I would like to thank my dearest family members for the encouragement they provided and the sacrifices they made to help me achieve my goals.


Contents

Abstract
Acknowledgments
List of Publications from this Thesis

1 Introduction
  Application Specific Systems
  Memory Subsystem
  On-chip Memory Organization
  Cache-based Memory Organization
  Scratch Pad Memory-based Organization
  Data Layout
  Memory Architecture Exploration
  Embedded System Design Flow
  Contributions
  Thesis Overview

2 Background
  On-chip Memory Architecture of Embedded Processors
  DSP On-chip SPRAM Architecture
  Microcontroller Memory Architecture
  Software Optimizations
  DSP Software Optimizations
  MCU Software Optimizations
  Cache Based Embedded SOC
  Cache-SPRAM Based Hybrid On-chip Memory Architecture
  Genetic Algorithms - An Overview
  Multi-objective Multiple Design Points

3 Data Layout for Embedded Applications
  Introduction
  Method Overview and Problem Statement
  Method Overview
  Problem Statement
  ILP Formulation
  Basic Formulation
  Handling Multiple Memory Banks
  Handling SARAM and DARAM
  Overlay of Data Sections
  Swapping of Data
  Genetic Algorithm Formulation
  Heuristic Algorithm
  Data Partitioning into Internal and External Memory
  DARAM and SARAM placements
  Experimental Methodology and Results
  Experimental Methodology
  Integer Linear Programming - Results
  Heuristic and GA Results
  Comparison of Heuristic Data Layout with GA
  Comparison of Different Approaches
  Related Work
  Conclusions

4 Logical Memory Exploration
  Introduction
  Method Overview
  Memory Architecture Parameters
  Memory Architecture Exploration Objectives
  Memory Architecture Exploration and Data Layout
  Genetic Algorithm Formulation
  GA Formulation for Memory Architecture Exploration
  Pareto Optimality and Non-Dominated Sorting
  Simulated Annealing Formulation
  Memory Subsystem Optimization
  Experimental Results
  Experimental Methodology
  Experimental Results
  Related Work
  Conclusions

5 Data Layout Exploration
  Introduction
  Problem Definition
  MODLEX: Multi Objective Data Layout EXploration
  Method Overview
  Mapping Logical Memory to Physical Memory
  Genetic Algorithm Formulation
  Experimental Results
  Experimental Methodology
  Experimental Results
  Comparison of MODLEX and Stand-alone Optimizations
  Related Work
  Conclusions

6 Physical Memory Exploration
  Introduction
  Logical Memory Exploration to Physical Memory Exploration (LME2PME)
  Method Overview
  Physical Memory Exploration
  Genetic Algorithm Formulation
  Direct Physical Memory Exploration (DirPME) Framework
  Method Overview
  Genetic Algorithm Formulation
  Experimental Methodology and Results
  Experimental Methodology
  Experimental Results from LME2PME
  Experimental Results from DirPME
  Comparison of LME2PME and DirPME
  Related Work
  Conclusions

7 Cache Based Architectures
  Introduction
  Solution Overview
  Data Partitioning Heuristic
  Cache Conscious Data Layout
  Overview
  Graph Partitioning Formulation
  Cache Offset Computation
  Experimental Methodology and Results
  Experimental Methodology
  Cache-Conscious Data Layout
  Cache-SPRAM Data Partitioning
  Memory Architecture Exploration
  7.6 Related Work
  Cache Conscious Data Layout
  SPRAM-Cache Data Partitioning
  Memory Architecture Exploration
  Conclusions

8 Conclusions
  Thesis Summary
  Future Work
  Standardization of Input and Output Parameters
  Impact of platform change on system performance
  Impact of Application IP library rework on system performance
  Impact of semiconductor library rework on the system performance
  Multiprocessor Architectures

Bibliography

List of Tables

1.1 Explanation of Xchart Steps
List of Symbols Used
Memory Architecture for the Experiments
Experimental Results
Results from Heuristic Placement (HP) and Genetic Placement (GP) on 4 Embedded Applications, VE = Voice Encoder, JP = JPEG Decoder, LLP = Levinson's Linear Predictor, 2D = 2D Wavelet Transform
Comparative Ranking of Algorithms
Memory Architecture Parameters
Evaluation of Multi-Objective Cost Function
Memory Architecture Exploration
Non-dominant Points Comparison GA-SA
Memory Architectures Used for Data Layout
Memory Architectures Explored - Using DirPME Approach
Non-dominant Points Comparison LME2PME-DirPME
Input Parameters for Data Partitioning Algorithm
Data Layout Comparison
Data Layout for Different Cache Configurations

List of Figures

1.1 Architecture of an Embedded SoC
Embedded Application Development Flow
Memory Trends in SoC
Application Specific SoC Design Flow Illustration with X-chart
Mapping Chapters to X-chart Steps
Example DSP Memory Map
Cache-SPRAM Based On-Chip Memory Architecture
Genetic Algorithm Flow
Overview of Data Layout
Illustration of Parallel and Self Conflicts
Heuristic Algorithm for Data Layout
Relative performance of the Genetic Algorithm w.r.t. Heuristic, for Varying Number of Generations
Comparison of Heuristic Data Layout Performance with GA Data layout
DSP Processor Memory Architecture
Two-stage Approach to Memory Subsystem Optimization
Comparison of GA and SA Approaches for Memory Exploration
Vocoder Non-dominated Points Comparison Between GA and SA
Vocoder: Memory Exploration (All Design Points Explored and Non-dominated Points)
4.6 MPEG: Memory Exploration (All Design Points Explored and Non-dominated Points)
JPEG: Memory Exploration (All Design Points Explored and Non-dominated Points)
DSL: Memory Exploration (All Design Points Explored and Non-dominated Points)
MODLEX: Multi Objective Data Layout EXploration Framework
Data Layout Exploration: MPEG Encoder
Data Layout Exploration: Voice Encoder
Data Layout Exploration: Multi-Channel DSL
Individual Optimizations vs Integrated Memory Architecture Exploration
Memory Architecture Exploration - Integrated Approach
Logical to Physical Memory Exploration - Overview
Logical to Physical Memory Exploration - Method
GA Formulation of LME2PME
MAX: Memory Architecture exploration Framework
GA Formulation of Physical Memory Exploration
Voice Encoder: Memory Architecture Exploration - Using LME2PME Approach
MPEG: Memory Architecture Exploration - Using LME2PME Approach
DSL: Memory Architecture Exploration - Using LME2PME Approach
Voice Encoder (3D view): Memory Architecture Exploration - Using DirPME Approach
Voice Encoder: Memory Architecture Exploration - Using DirPME Approach
MPEG Encoder: Memory Architecture Exploration - Using DirPME Approach
DSL: Memory Architecture Exploration - Using DirPME Approach
7.1 Target Memory Architecture
Memory Exploration Framework
Example: Temporal Relationship Graph
Heuristic Algorithm for Data Partitioning
Cache Conscious Data Layout
Heuristic Algorithm for Offset Computation
AAC: Performance for different Hybrid Memory Architecture
MPEG: Performance for different Hybrid Memory Architecture
JPEG: Performance for different Hybrid Memory Architecture
AAC: Power consumed for different hybrid memory architecture
MPEG: Power consumed for different hybrid memory architecture
JPEG: Power consumed for different hybrid memory architecture
AAC: Non-dominated Solutions
MPEG: Non-dominated Solutions
JPEG: Non-dominated Solutions


List of Publications from this Thesis

1. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar. On-chip Memory Architecture Exploration Framework for DSP Processor Based Embedded SoC. Submitted to the ACM Transactions on Embedded Computing Systems, May
2. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar. Memory Architecture Exploration Framework for Cache-based Embedded SoC. In Proceedings of the International Conference on VLSI Design, Jan
3. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar. MODLEX: A Multi-Objective Data Layout EXploration Framework for Embedded SoC. In Proceedings of the 12th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan
4. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar. MAX: A Multi-Objective Memory Architecture Exploration Framework for Embedded SoC. In Proceedings of the International Conference on VLSI Design, Jan
5. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar. Embedded Tutorial on Multi-Processor Architectures for Embedded SoC. In Proceedings of the VLSI Design and Test, Aug
6. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar. Optimal Code and Data Layout for Embedded Systems. In Proceedings of the International Conference on VLSI Design, Jan
7. T.S. Rajesh Kumar, R. Govindarajan, and C.P. Ravikumar. Memory Exploration for Embedded Systems. In Proceedings of the VLSI Design and Test, Aug 2002.


Chapter 1

Introduction

1.1 Application Specific Systems

Today's VLSI technology allows us to integrate tens of processor cores on the same chip along with embedded memories, application-specific circuits, and interconnect infrastructure. As a result, it is possible to integrate an entire system onto a single chip. The single-chip phone, which has been introduced by several semiconductor vendors, is an example of such a system-on-chip; it includes the modem, radio transceiver, power management functionality, a multimedia engine and security features, all on the same chip.

An embedded system is an application-specific system which is optimized to perform a single function or a small set of functions [70]. We distinguish this from a general-purpose system, which is software-programmable to perform multiple functions. A personal computer is an example of a general-purpose system; depending on the software we run on the computer, it can be useful for playing games, word processing, database operations, scientific computation, etc. On the other hand, a digital camera is an example of an embedded system, which can perform a limited set of functions such as taking pictures, organizing them, or transferring them to another device through a suitable I/O interface. Other examples of embedded systems include mobile phones, audio/video players, videogame consoles, set-top boxes, car infotainment systems, personal digital assistants, telephone central-office switches, and dedicated network routers and bridges.

Note that a large number of embedded systems are built for the consumer market. As a result, in order to be competitive, the cost of an embedded system cannot be very high. Yet consumers demand higher performance and more features from embedded system products. It is easy to appreciate this point if we compare the performance and feature set offered by a mobile phone that costs Rs 5000 (about $100) today with those of a phone that cost the same a few years ago. We also see that a large number of embedded systems are being built for the mobile market. This trend is not surprising; the number of mobile phone subscribers increased from 500 million in the year 2000 to 2.6 billion in 2007 [7]. Because of such high volumes, embedded systems are extremely cost sensitive and their design demands careful silicon-area optimization. Since mobile devices use batteries as the main source of power, embedded systems must also be optimized for energy dissipation. Power, which represents the rate at which energy is consumed, must also be kept low to avoid heating and to improve reliability. In summary, the designer of an embedded system must simultaneously consider and optimize price, performance, energy, and power dissipation. Application-specific embedded systems designed today demand innovative methods to optimize these system cost functions [11, 19].

Many of today's embedded systems are based on system-on-chip platforms [16], which, in turn, consist of one or more embedded microcontrollers, digital signal processors (DSPs), application-specific circuits and read-only memory, all integrated into a single package. These blocks are available from vendors of intellectual property (IP) as hard cores or soft cores [42, 28]. A hard core, or hard IP block, is one where the circuit is available at a lower level of abstraction such as the layout level [42, 28]; it is impossible to customize a hard IP to suit the requirements of the embedded system.
As a result, there are limited opportunities for optimizing the cost functions by modifying the hard IP. For example, if some functionality included in the IP is not required in the present application, we cannot remove the function to save area. Soft IP refers to circuits which are available at a higher level of abstraction, such as the register-transfer level [28, 42]. It is possible to customize the soft IP for the specific application. The designer of an embedded SoC integrates the IP cores for processors, memories, and application-specific hardware to create the SoC.

Figure 1.1 illustrates the architecture of an embedded system-on-chip (SoC). As can be seen in the figure, there are four principal components in such an SoC.

1. An Analog Front End, which includes the analog/digital and digital/analog converters.
2. Programmable Components, which include microprocessors, microcontrollers, and DSPs. The number of embedded processors is increasing every year. An interesting statistic shows that of the nine billion processors manufactured in 2005, less than 2% were used for general-purpose computers. The other 8.8 billion went into embedded systems [13]. The microcontroller/microprocessor is useful for handling interrupts, house-keeping and timing-related functions. The DSP is useful for processing audio and video information, e.g., compression and decompression. The application software is normally preloaded in the memory and is not user programmable, unlike in general-purpose processor-based systems.
3. Application-specific components: these include hardware accelerators for compute-intensive functions. Examples of hardware accelerators include digital image processors, which are useful in cameras.

Figure 1.1: Architecture of an Embedded SoC

1.2 Memory Subsystem

On-chip Memory Organization

The memory architecture of an embedded processor core is complex and is custom designed to improve run-time performance and power consumption. In this section we describe only the memory architecture of the DSP, as this is the focus of the thesis. The memory architecture of a DSP is more complex than that of a microcontroller (MCU) for the following reasons: (a) DSP applications are more data dominated than the control-dominated software executed on an MCU. Memory bandwidth requirements for DSP applications range from 2 to 3 memory accesses per processor clock cycle. For an MCU, this figure is, at best, one memory access per cycle. (b) It is critical in DSP applications to extract maximum performance from the memory subsystem in order to meet the real-time constraints of the embedded application. As a consequence, the DSP software for critical kernels is developed mostly as hand-optimized assembly code. In contrast, the software for an MCU is typically developed in high-level languages. The memory architecture of a DSP is unique in that the DSP has multiple on-chip buses and multiple address generation units to service its higher bandwidth needs.

The on-chip memory of embedded processors can include (a) only a Level-1 cache (L1-cache) (e.g., [1]), (b) only scratch-pad RAM (SPRAM) (e.g., [75, 76]), or (c) a combination of L1-cache and SPRAM (e.g., [2, 77]).

Cache-based Memory Organization

A purely cache-based on-chip memory organization is generally not preferred by embedded system designers, as this organization cannot guarantee worst-case execution time constraints. This is because the access time in a cache-based system varies depending on whether the access results in a cache miss or a hit [33]. As a consequence, the run-time performance of cache-based memory subsystems varies with the execution path of the application and is data dependent. However, a cache architecture is advantageous in the sense that it reduces the programmer's responsibility for placing data to achieve better memory access time. Further, the movement of data from off-chip memory to the cache is transparent. In [12], the authors present a comparison study of SPRAM and cache for embedded applications and conclude that SPRAM has 34% smaller area and 40% lower power consumption than a cache of the same capacity. There is published literature on estimating the worst-case execution time [81] and on finding an upper bound on run-time [78] for cache-based embedded systems. Hence it has been argued that for real-time embedded systems which require stringent worst-case performance guarantees, a purely cache-based on-chip organization is not suitable.

Scratch Pad Memory-based Organization

An on-chip memory organization based only on scratch pad memory ensures single-cycle access times and guarantees worst-case execution time for data that resides in Scratch-Pad RAM (SPRAM). However, it is the responsibility of the programmer to identify the data sections that should be placed in SPRAM, or to place code in the program that appropriately moves data from off-chip memory to SPRAM. A DSP core can include the following types of memories: static RAM (SRAM), ROM, and/or dynamic RAM (DRAM). The scratch pad memory in the DSP core is organized into multiple memory banks to facilitate multiple simultaneous data accesses. A memory bank can be organized as a single-access RAM (SARAM) or a dual-access RAM (DARAM), providing one or two accesses to the memory bank in a single cycle. The on-chip memory banks can also be of different sizes. Smaller memory banks consume less power per access than larger ones. The embedded system may also be interfaced to off-chip memory, which can include SRAM and DRAM.
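The effect of the bank organization described above can be illustrated with a small sketch. It is not the cost model used in this thesis: the `Bank` class, the buffer names, the access counts and the rule of one stall cycle per conflicting access pair are all assumptions made purely for illustration. Two accesses issued in the same cycle conflict only when they fall in the same bank and that bank is a SARAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bank:
    """One on-chip SPRAM bank (hypothetical model, not taken from the thesis)."""
    size_words: int    # kept for completeness; capacity checks are omitted here
    dual_access: bool  # True = DARAM (two accesses per cycle), False = SARAM


def count_conflict_stalls(layout, parallel_accesses, banks):
    """Count stall cycles caused by two same-cycle accesses to one SARAM bank.

    layout: buffer name -> bank name
    parallel_accesses: (buffer_a, buffer_b) -> number of cycles in which both
        buffers are accessed together; buffer_a == buffer_b models two
        same-cycle accesses to a single buffer
    Assumption: every conflicting pair costs exactly one stall cycle.
    """
    stalls = 0
    for (buf_a, buf_b), cycles in parallel_accesses.items():
        bank_a, bank_b = layout[buf_a], layout[buf_b]
        if bank_a == bank_b and not banks[bank_a].dual_access:
            stalls += cycles
    return stalls


# Made-up architecture and access profile, for illustration only.
banks = {
    "SARAM0": Bank(size_words=4096, dual_access=False),
    "SARAM1": Bank(size_words=4096, dual_access=False),
    "DARAM0": Bank(size_words=2048, dual_access=True),
}
parallel_accesses = {
    ("coeff", "samples"): 1000,   # two different buffers read in one cycle
    ("samples", "samples"): 200,  # one buffer accessed twice in one cycle
}

naive_layout = {"coeff": "SARAM0", "samples": "SARAM0"}
split_layout = {"coeff": "SARAM0", "samples": "SARAM1"}
daram_layout = {"coeff": "SARAM0", "samples": "DARAM0"}

naive_stalls = count_conflict_stalls(naive_layout, parallel_accesses, banks)
split_stalls = count_conflict_stalls(split_layout, parallel_accesses, banks)
daram_stalls = count_conflict_stalls(daram_layout, parallel_accesses, banks)
```

Under these assumptions, separating the two buffers into different SARAM banks removes the conflicts between them, but only the DARAM placement also removes the conflicts of the buffer with itself.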
A purely SPRAM based on-chip organization is suitable only for embedded applications of low to medium complexity. SPRAM based systems do not use the on-chip RAM efficiently, as they require the entire data sections that are currently accessed to be placed exclusively in the SPRAM. It is possible to accommodate different data sections in SPRAM at different points in execution time by moving data dynamically between off-chip memory and SPRAM, but this results in run-time overhead and an increase in code size. For medium to large applications, which have a large number of critical data variables, a large amount of on-chip RAM becomes necessary to meet the real-time performance constraints. Hence for such applications a pure SPRAM architecture is not preferred.

1.3 Data Layout

To use the on-chip memory efficiently, the critical data variables of the application need to be identified and mapped to the on-chip RAM. The memory architecture may contain both on-chip cache and SPRAM. In such a case it is important to partition the data sections and assign them appropriately to on-chip cache and SPRAM such that the memory performance of the application is optimized. Further, among the data sections assigned to on-chip cache and SPRAM, a proper placement of the data sections in the cache and SPRAM is required to ensure that cache misses are reduced and that the multiple memory banks of the SPRAM and the dual-ported SPRAMs are efficiently utilized. Identifying such a placement for data sections, referred to as the data layout problem, is a complex and critical step [10, 53]. This task is typically performed manually, as the compiler cannot assume that the code under compilation represents the entire system [10].

The application program in a modern embedded system is complex since it must support a variety of device interfaces such as networking interfaces, credit card readers, USB interfaces, parallel ports, and so on. The application also has many multimedia components like MP3, AAC and MIDI [8]. This necessitates an IP reuse methodology [74], where software modules developed and optimized independently by different vendors are integrated. Figure 1.2 explains the typical flow in embedded application development.
This integration is a very challenging job with multiple objectives: (a) it has to be done under tight time-to-market constraints, (b) it has to be repeated for different variants of SoCs with different custom memory architectures, and (c) it has to be performed in such a way that the embedded application is optimized for performance, power consumption and cost.

Figure 1.2: Embedded Application Development Flow

Since the IPs/modules are independently optimized, the integrator is under pressure to deliver the complete product with the expectation that each component performs at the same level as it did in isolation. This is a major challenge. When a module is optimized independently, the developer has all the resources of the SoC (MIPS and memory) available to optimize the module. When these modules are integrated at the system level, the system resources are shared among the modules. So the application integrator needs to know the MIPS and memory requirements of the modules unambiguously to be able to allocate the shared resources to critical needs [74]. Usually, the modules' memory requirements are given only at a high level. To be able to optimize the whole application/system, the integrator needs a detailed memory analysis at the module level, e.g., which data buffers need to be placed in dual-ported memories and which data buffers should not be placed in the same memory bank. This data is usually not available. Moreover, the critical code is usually written in low-level assembly language to meet real-time constraints and/or due to legacy reasons. For these reasons, application integration and optimization, that is, analyzing the application and mapping software modules in order to obtain optimal cost and performance, takes a significant amount of time (approximately 1-2 man-months). Currently, in most SoC designs, data layout is also performed manually, which has two major problems: (1) the development time is significant and not acceptable for current-day time-to-market requirements, and (2) the quality of the solution varies with the expertise of the designer.

1.4 Memory Architecture Exploration

In modern embedded systems, the area and power consumed by the memory subsystem are up to 10 times those of the data path, making memory a critical component of the design [11]. Further, the memory subsystem constitutes a large part (typically up to 70%) of the silicon area of a current-day SoC, and this is expected to go up to 94% in 2014, as shown in Figure 1.3 [6]. The main reason for this is that embedded memory has a relatively small per-area design cost in terms of man-power, time-to-market and power consumption [60]. Hence the memory plays an important role in the design of embedded SoCs. Further, the memory architecture strongly influences the cost, performance and power dissipation of an embedded SoC.

As discussed earlier, the on-chip memory organization of embedded processors varies widely from one SoC to another, depending on the application and market segment for which the SoC is deployed. A wide variety of choices is available to embedded designers, from a simple on-chip SPRAM based architecture to a more complex cache-SPRAM based hybrid architecture. To begin with, the system designer needs to decide whether the SoC requires a cache and what the right size of on-chip RAM is. Once the high-level memory organization is decided, the finer parameters need to be defined to complete the memory architecture definition.
For an on-chip SPRAM based architecture, the parameters, namely size, latency, number of memory banks, number of read/write ports per memory bank, and connectivity, collectively define the memory organization and strongly influence the performance, cost, and power consumption. For cache based on-chip RAM, the finer parameters are the cache size, associativity, line size, miss latency and write policy.

Figure 1.3: Memory Trends in SoC

Due to its large impact on system performance parameters, the memory architecture is often hand-crafted by the designer based on the targeted applications. However, with the combination of on-chip SPRAM and cache, the memory design space is too large for a manual analysis [31]. Also, with the projected growth in the complexity of embedded systems and the vast design space in memory architecture, hand optimization of the memory architecture will soon become impossible. This warrants an automated framework which can explore the memory architecture design space and identify interesting design points that are optimal from a performance, power consumption and VLSI area (and hence cost) perspective. As the memory architecture design space itself is vast, a brute-force design space exploration tool may take a large amount of computation time and hence is unlikely to be useful in meeting tight time-to-market constraints. Further, for each given memory architecture, there are several possible data section layouts which are optimal in terms of performance and power. This further compounds the memory architecture exploration problem.
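To give a feel for how quickly this design space grows, the following sketch enumerates a toy space built from the SPRAM and cache parameters listed above. The parameter values and the assumed number of data layouts per design point are hypothetical and chosen only for illustration; they are not the values explored in this thesis.

```python
from itertools import product

# Hypothetical parameter ranges (illustrative assumptions only).
spram_space = {
    "spram_kwords": [16, 32, 64, 128],
    "num_banks": [1, 2, 4, 8],
    "ports_per_bank": [1, 2],
}
cache_space = {
    "cache_kbytes": [0, 4, 8, 16, 32],   # 0 means "no cache"
    "associativity": [1, 2, 4],
    "line_bytes": [16, 32, 64],
    "write_policy": ["write-back", "write-through"],
}


def enumerate_points(*spaces):
    """Yield every combination of parameter values as a dict."""
    merged = {}
    for space in spaces:
        merged.update(space)
    names = list(merged)
    for values in product(*(merged[name] for name in names)):
        yield dict(zip(names, values))


points = list(enumerate_points(spram_space, cache_space))
num_points = len(points)

# Assumed number of candidate data layouts per design point.
layouts_per_point = 20_000
evaluations = num_points * layouts_per_point
```

Even this small space contains 2,880 architectures (some of them redundant, since the cache parameters are irrelevant when the cache size is 0), and a brute-force search over 20,000 layouts per architecture would need 57.6 million evaluations. Adding latency and connectivity options multiplies the count further.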

1.5 Embedded System Design Flow

In this section, we present our view of the embedded system design flow to set the context for our work. For this purpose, we introduce the notion of the X-chart, which is inspired by the well-known Y-chart introduced by Gajski to capture the process of VLSI system design [29]. In a Y-chart, the three levels of design abstraction form the three arms of the letter Y; these are (a) design behavior, (b) design structure and (c) physical aspects of the design. A design flow starts from a behavior specification, which is then mapped to a structure, which in turn is mapped to a physical realization. We can view the process of transforming a behavior into a physical realization as a successive refinement process. Optimization of design metrics such as area, performance, and power is the goal of each of these refinement steps. The design process may spiral from the behavioral axis to the structural axis to the physical design axis in multiple stepwise refinement steps.

The X-chart is illustrated in Figure 1.4. The X-chart representation has four axes: (a) Behavior, (b) Logical Architecture, (c) Physical Architecture and (d) Software Data Layout. The logical memory architecture (LMA) defines the embedded cache size, cache associativity, cache block size, size of the scratch pad memory, number of memory banks, and the number of ports. The physical memory architecture (PMA) is an actual realization of an LMA using the memory library components provided by the semiconductor vendor. The fourth dimension, namely Software Data Layout, is necessary for capturing the process of embedded system design. We have identified several steps in the embedded system design flow and marked them with circled numbers. Table 1.1 explains the individual steps in the X-chart representation.
The design of an embedded system begins with a behavioral description (point (1) in Figure 1.4, which is shown on the behavioral axis). Today, there are many languages available to capture the system behavior, e.g., SystemVerilog [5], SystemC [4], and so on. Hardware-software partitioning is performed to identify which functionalities of the description are best performed in hardware and which are best implemented in software. Hardware implementation is cost-intensive, but improves the performance.

We show point (2) on the LMA axis, since hardware-software partitioning adds a considerable amount of detail for deciding the LMA parameters. The next step is to select hardware and software IP blocks. Depending on the time schedule (for designing the embedded system) and the cost constraint, the designer may wish to use readily available IP blocks from a vendor or implement a custom version of the IP. The target platform is then defined to implement the embedded system. As mentioned earlier, a platform includes one or more processors, memory, and hardware accelerators for specific functions. Platforms also come with software tools such as compilers and simulators, so that the development cycle can be accelerated. In other words, one does not need to wait for the hardware implementation to complete before trying out the software. We show point (4) on the software data layout axis, since the selection of a platform defines many aspects of the software implementation. Software partitioning is now performed to decide which software IP blocks are executed on which processor.

This completes one spiral cycle in the design life cycle of the embedded system. To recapitulate, the following components are defined at the end of the first cycle: (a) the platform on which the embedded system will be built, (b) the hardware and software IP blocks that are selected for the target application, and (c) the assignment of software IP blocks to the target processors where the software will be executed. We show point (5) on the behavioral axis, since the next spiral cycle will begin from here. The next step is to define the logical memory architecture for the memory subsystem.
Guided by considerations such as cost, performance, and power, the designer must decide the basic architectural parameters of the memory sub-system, such as whether or not to provide cache memory, how many memory banks to provide, whether or not dual-ported memories are necessary for guaranteeing performance, etc. The next step is to perform design space exploration in the logical space. Each logical memory architecture is also characterized by the selection of values for parameters such as cache size, cache associativity, cache block size, etc. There is often a cost/performance tradeoff between two solutions in the architectural space. Hence the designer must consider different Pareto-optimal solutions that exhibit a cost/performance tradeoff. This results in point (6) in Figure 1.4.

Figure 1.4: Application Specific SoC Design Flow Illustration with X-chart

A logical memory architecture must be translated into a physical implementation by selecting components from the semiconductor vendor's memory library. There are multiple realizations, i.e., physical memory architectures (PMA), for the same LMA. This involves choosing the appropriate modules based on the process technology selected in step (7) and the corresponding semiconductor vendor memory library. These represent tradeoffs in terms of power consumed and VLSI area. This leads to point (7) in Figure 1.4. The mapping of an LMA to a PMA is similar to the technology mapping step in logic synthesis [53].

Data Layout (DL) is the subsequent step in the design life cycle. During this step, the placement of data variables is determined, considering every possible implementation of the physical memory architecture. Once again, there are multiple solutions for data layout for a given PMA. These solutions may exhibit tradeoffs in power, performance, and area.

Table 1.1: Explanation of X-chart Steps

In this thesis, we use the phrase Physical Memory Architecture Exploration (PMAE) to refer to the search for Pareto-optimal LMA/PMA/DL solutions. We capture this in the form of the following equation.

\[
\text{PMAE} = \text{Logical Memory Architecture Exploration} + \text{Memory Allocation Exploration} + \text{Data Layout Exploration} \tag{1.1}
\]
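To make the notion of a Pareto-optimal solution concrete, the following is a minimal sketch, assuming each design point is summarized by two objectives to be minimized (area and stall cycles). The function names and the candidate values are hypothetical illustrations and are not taken from the thesis.

```python
from typing import List, Tuple

# Each design point is (area, stall_cycles); lower is better for both.
# The numbers below are made-up illustrations, not results from the thesis.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(10.0, 900.0), (12.0, 700.0), (12.0, 950.0), (15.0, 700.0), (20.0, 400.0)]
front = pareto_front(candidates)
```

Here (12.0, 950.0) is discarded because (10.0, 900.0) is better in both objectives, and (15.0, 700.0) is discarded because (12.0, 700.0) achieves the same stall count with less area. The remaining three points each represent a different cost/performance tradeoff.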

In this thesis, the focus is on memory sub-system optimization, constituted by steps (5) to (9) in Figure 1.4. The size of the solution space increases manifold during each step of the memory exploration. If $N_1$ optimal solutions (logical memory architectures) are identified during memory sub-system definition, memory allocation must be explored for each one of them, which can potentially result in $N_1 N_2$ solutions during memory allocation exploration. Similarly, data layout must be performed for each of the $N_1 N_2$ solutions from the memory allocation exploration step, and we may in general obtain $N_1 N_2 N_3$ Pareto-optimal points in the PMAE solution space. As mentioned earlier, this leads to a combinatorial explosion of the design space to be explored.

1.6 Contributions

First, we propose methods for data layout optimization, assuming a fixed memory architecture for a DSP-based embedded system. Data layout is a critical component in the embedded design cycle and decides the final configuration of the embedded system. Data layout happens at the final stage in the life cycle of an embedded system, as illustrated in the X-chart of Figure 1.4, and it forms the foundation for memory subsystem optimization. Hence, we first formulate data section layout as an Integer Linear Programming (ILP) problem. The proposed ILP formulation can handle: (i) partitioning of data between on-chip and off-chip memory, (ii) placing simultaneously accessed data variables (parallel conflicts) in different on-chip memory banks, (iii) placing data variables that are accessed concurrently (self conflicts) in dual-access RAMs, (iv) overlay of data sections with non-overlapping lifetimes, and (v) swapping of data sections from/to off-chip memory. An important contribution of this work is the development of a simple unified ILP formulation that handles all the above-mentioned optimizations.
The ILP-based approach is very effective for many moderately complex applications and delivers optimal results. However, as the application complexity increases, the execution time of the ILP method increases drastically, making it unsuitable for large applications and for situations (such as memory architecture exploration) where the data layout problem needs to be solved repeatedly.
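The cost that an exact method minimizes can be illustrated with a small sketch. The code below is not the ILP formulation of this thesis; it is a brute-force stand-in, assuming a toy memory with two single-access banks and one dual-access bank, and it covers only parallel conflicts, self conflicts, and bank capacity. All names, sizes, and conflict counts are hypothetical.

```python
from itertools import product

# Hypothetical toy instance; names, sizes and counts are illustrative only.
BANKS = {"SARAM0": (4, False), "SARAM1": (4, False), "DARAM": (2, True)}  # (size, dual-access?)
SIZES = {"a": 2, "b": 2, "c": 2, "d": 2}
PARALLEL = {("a", "b"): 50, ("c", "d"): 30}  # pairs accessed in the same cycle
SELF = {"c": 40}                              # section accessed twice in one cycle


def stalls(layout):
    """Count conflict stalls for a layout mapping section name -> bank name."""
    total = 0
    for (x, y), n in PARALLEL.items():
        # A parallel conflict stalls only if both sections share a single-access bank.
        if layout[x] == layout[y] and not BANKS[layout[x]][1]:
            total += n
    for s, n in SELF.items():
        # A self conflict stalls unless the section sits in a dual-access bank.
        if not BANKS[layout[s]][1]:
            total += n
    return total


def fits(layout):
    """Check that no bank is filled beyond its capacity."""
    used = {b: 0 for b in BANKS}
    for s, b in layout.items():
        used[b] += SIZES[s]
    return all(used[b] <= BANKS[b][0] for b in BANKS)


def best_layout():
    """Try every assignment: len(BANKS) ** len(SIZES) candidates."""
    names = sorted(SIZES)
    best = None
    for choice in product(sorted(BANKS), repeat=len(names)):
        layout = dict(zip(names, choice))
        if fits(layout):
            cost = stalls(layout)
            if best is None or cost < best[0]:
                best = (cost, layout)
    return best


cost, layout = best_layout()
```

With three banks and four sections the search visits 81 assignments. The count grows exponentially with the number of data sections, which is one way to see why an exact approach becomes expensive when the layout must be recomputed for every candidate architecture.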

Hence we looked at developing faster methods to solve this problem. We propose a heuristic algorithm that maps the data sections to the given memory architecture and reduces the number of memory access conflicts resulting from both self conflicts and parallel conflicts. Finally, we also formulate the same problem as a Genetic Algorithm (GA) and compare the results of the heuristic with those of the GA. We find that the heuristic algorithm performs within 5% of the GA's results, with the GA performing better. However, the heuristic algorithm's run-time is an order of magnitude lower than the GA's run-time, making it suitable for use in memory architecture exploration.

Next, we address logical memory architecture exploration for DSP-based embedded systems (steps (5) to (7) in the X-chart of Figure 1.4). The input is a set of high-level memory parameters, such as the number of memory banks, the size of each memory bank, the number of ports, etc., that define the memory sub-system. The goal of the exploration is to find an optimal on-chip memory organization that can run the given applications with the minimum number of memory stalls. When an LMA is generated, it must be evaluated for cost (in terms of VLSI area) and performance. But these depend on the data layout. Hence, to evaluate a memory architecture properly, we must first generate an efficient data layout, for which we use our fast heuristic method. We have implemented the memory architecture exploration problem as a two-level hierarchical search, with architectural exploration at the outer level and data layout exploration at the inner level. A multi-objective GA and a Simulated Annealing (SA) algorithm are used as alternative search mechanisms for the architectural exploration problem. As the memory architecture exploration framework considers both performance and cost (VLSI area) objectives, we use the Pareto-optimality constraint proposed in [25] to identify design points that are interesting from one or the other objective.
The proposed memory exploration framework is fully automatic and flexible. The framework is also scalable, and additional objectives like power consumption can be added easily. We have used four different applications from multimedia and communication domains for our experiments and found Pareto-optimal design choices (memory architectures) for each of the applications.
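The two-level structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework of this thesis: the outer level here enumerates a handful of architectures exhaustively (standing in for the GA and SA searches), the inner level is a simple greedy placement (standing in for the proposed heuristic), and the area model, penalties, and application profile are all hypothetical.

```python
# Hypothetical application profile; all numbers are illustrative.
SIZES = {"a": 2, "b": 2, "c": 2, "d": 2}
ACCESSES = {"a": 100, "b": 100, "c": 60, "d": 60}
CONFLICTS = {("a", "b"): 50, ("c", "d"): 30, ("a", "c"): 10}
OFFCHIP_PENALTY = 2  # assumed stall cycles per off-chip access
BANK_OVERHEAD = 1    # assumed fixed area cost per bank


def conflict(x, y):
    """Symmetric lookup of the conflict count between two sections."""
    return CONFLICTS.get((x, y), 0) + CONFLICTS.get((y, x), 0)


def greedy_layout(num_banks, bank_size):
    """Inner level: place the most conflict-prone sections first, each in the
    bank where it adds the fewest stalls; spill off-chip if nothing fits."""
    banks = [[] for _ in range(num_banks)]

    def weight(s):
        return sum(conflict(s, t) for t in SIZES if t != s)

    stalls = 0
    for s in sorted(SIZES, key=lambda s: (-weight(s), s)):
        options = [
            (sum(conflict(s, t) for t in bank), i)
            for i, bank in enumerate(banks)
            if sum(SIZES[t] for t in bank) + SIZES[s] <= bank_size
        ]
        if options:
            added, i = min(options)
            banks[i].append(s)
            stalls += added
        else:
            stalls += OFFCHIP_PENALTY * ACCESSES[s]
    return stalls


def explore(bank_counts, bank_sizes):
    """Outer level: enumerate architectures and score each with the inner heuristic."""
    results = []
    for n in bank_counts:
        for size in bank_sizes:
            area = n * (size + BANK_OVERHEAD)
            results.append(((n, size), area, greedy_layout(n, size)))
    return results


results = explore(bank_counts=[1, 2, 4], bank_sizes=[2, 4, 8])
```

Each entry of `results` pairs an architecture with its area proxy and the stall count of the layout found for it. In this toy profile a single 8-unit bank has slightly less area than two 4-unit banks but incurs 90 stalls instead of 0, which is the kind of cost/performance tradeoff the exploration is meant to expose.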

Next, we explore the data layout design space for a given physical memory architecture in order to optimize the performance and power consumption of the memory subsystem. Note that data layout exploration forms steps (8) to (9) in the X-chart representation. We propose MODLEX, a Multi Objective Data Layout EXploration framework based on a Genetic Algorithm, which explores the data layout design space for a given logical and physical memory architecture and obtains a list of Pareto-optimal data layout solutions from performance and power perspectives. Most of the existing work in the literature assumes that performance and power are non-conflicting objectives with respect to data layout. However, we show that a significant trade-off (up to 70%) is possible between power and performance.

Our next step is physical memory architecture exploration (steps (5) to (8) in Figure 1.4). We propose two different methods for physical memory exploration. The first approach is an extension of the Logical Memory Architecture Exploration (LMAE) method described in Chapter 4 and represented in the X-chart by steps (5) to (6). Physical memory exploration is performed by taking the output of LMAE and, for each of the Pareto-optimal logical memory architectures, performing a memory allocation exploration (steps (6) to (7)) with the objective of optimizing power and area in the physical memory space. Note that the data layout is fixed at the logical memory exploration stage itself, and hence the performance does not change at this step. The memory allocation exploration is formulated as a multi-objective genetic search that explores the design space with power and area as objectives. We refer to this approach as LME2PME.

The second approach is a direct and integrated approach for physical memory exploration, which we refer to as DirPME. This approach corresponds to a direct move from point (5) to point (8) in Figure 1.4. In this approach, we integrate three critical components: (i) Logical Memory Architecture Exploration, (ii) Memory Allocation Exploration, and (iii) Data Layout Exploration. The core engine of the memory architecture exploration framework is formulated as a multi-objective Non-Dominated Sorting Genetic Algorithm (NSGA) [25]. For the data layout problem, which needs to be solved for thousands of memory architectures, we use our fast and efficient heuristic data layout method.

Our integrated memory architecture exploration framework searches the design space by exploring thousands of memory architectures and lists Pareto-optimal design solutions that are interesting from an area, power, and performance viewpoint.

Next, we address the memory architecture exploration problem for hybrid memory architectures that have a combination of SPRAM and cache. For such a hybrid architecture, a critical step is to partition the data between on-chip SPRAM and external RAM. Data partitioning aims at improving the overall memory sub-system performance by placing in SPRAM the data that has the following characteristics: (a) a higher access frequency, (b) a lifetime that overlaps with that of many other data, and (c) poor spatial access characteristics. Placing all data that exhibits the above characteristics in SPRAM reduces the number of potentially conflicting data in the cache and hence the number of cache misses, leading to an overall improvement in memory sub-system performance. But typically the SPRAM size is small, and it is not possible to accommodate all the data identified for SPRAM placement. Hence, even after data partitioning, there will be a significant number of potentially conflicting data sections that need to be placed in external RAM. These data need to be placed such that the conflict misses they cause among themselves in the cache are reduced. Cache-conscious data layout addresses this problem and aims at placing data in external RAM (off-chip RAM) with the objective of reducing cache misses. This is achieved by an efficient data layout heuristic that is independent of instruction caches, optimizes run-time, and keeps the off-chip memory address space usage in check. We extend the above approach and perform hybrid memory architecture exploration with the objective of optimizing run-time performance, power consumption, and area.

The salient features of our work are as follows.
First, we provide a unified framework for logical memory exploration, memory allocation exploration, and data layout.
Our work addresses power, performance, and area optimization in an integrated framework.
Our work addresses memory architecture exploration for a hybrid memory architecture involving on-chip SPRAM and cache.
Our work does not rely on source-code optimization for power and performance optimization. Hence it is suitable for platform-based/IP-based system design.

1.7 Thesis Overview

The rest of the thesis is organized as follows. In the following chapter, we provide the background material for the thesis. We begin by explaining the memory architecture of a DSP and an MCU. We summarize the software optimizations used in the literature to improve memory access efficiency. We explain cache-based embedded SoCs and their challenges with respect to predictability. Finally, we introduce the concepts of a Genetic Algorithm (GA) for optimization, since GA is used in our optimization framework in the later chapters.

In Chapter 3, we propose different methods to address the data layout problem for on-chip SPRAM-based memory architectures. First, we propose an Integer Linear Programming (ILP) based approach. Further, we also propose a fast and efficient heuristic for the data layout problem. Finally, we formulate the data layout problem as a Genetic Algorithm (GA).

In Chapter 4, we present a multi-objective memory architecture exploration framework to search the memory design space for the on-chip memory architecture with performance and memory cost as the two objectives. We address the memory architecture exploration problem at the logical level.

The multi-objective data layout exploration problem is addressed in Chapter 5. Here, the data layout design space is explored for a given logical memory architecture and application with respect to performance and power.

In Chapter 6, we address the memory architecture exploration problem at the physical memory level. In this chapter we propose two different approaches for addressing physical memory architecture exploration.


Computer Performance. Topic 3. Contents. Prerequisite knowledge Before studying this topic you should be able to: 55 Topic 3 Computer Performance Contents 3.1 Introduction...................................... 56 3.2 Measuring performance............................... 56 3.2.1 Clock Speed.................................

More information

BY STEVE BROWN, CADENCE DESIGN SYSTEMS AND MICHEL GENARD, VIRTUTECH

BY STEVE BROWN, CADENCE DESIGN SYSTEMS AND MICHEL GENARD, VIRTUTECH WHITE PAPER METRIC-DRIVEN VERIFICATION ENSURES SOFTWARE DEVELOPMENT QUALITY BY STEVE BROWN, CADENCE DESIGN SYSTEMS AND MICHEL GENARD, VIRTUTECH INTRODUCTION The complexity of electronic systems is rapidly

More information

Qsys and IP Core Integration

Qsys and IP Core Integration Qsys and IP Core Integration Prof. David Lariviere Columbia University Spring 2014 Overview What are IP Cores? Altera Design Tools for using and integrating IP Cores Overview of various IP Core Interconnect

More information

From Control Loops to Software

From Control Loops to Software CNRS-VERIMAG Grenoble, France October 2006 Executive Summary Embedded systems realization of control systems by computers Computers are the major medium for realizing controllers There is a gap between

More information

Chapter 2 Heterogeneous Multicore Architecture

Chapter 2 Heterogeneous Multicore Architecture Chapter 2 Heterogeneous Multicore Architecture 2.1 Architecture Model In order to satisfy the high-performance and low-power requirements for advanced embedded systems with greater fl exibility, it is

More information

Computer Systems Structure Main Memory Organization

Computer Systems Structure Main Memory Organization Computer Systems Structure Main Memory Organization Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output Ward 1 Ward 2 Storage/Memory

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Weighted Total Mark. Weighted Exam Mark

Weighted Total Mark. Weighted Exam Mark CMP2204 Operating System Technologies Period per Week Contact Hour per Semester Total Mark Exam Mark Continuous Assessment Mark Credit Units LH PH TH CH WTM WEM WCM CU 45 30 00 60 100 40 100 4 Rationale

More information

Memory Systems. Static Random Access Memory (SRAM) Cell

Memory Systems. Static Random Access Memory (SRAM) Cell Memory Systems This chapter begins the discussion of memory systems from the implementation of a single bit. The architecture of memory chips is then constructed using arrays of bit implementations coupled

More information

WORKFLOW ENGINE FOR CLOUDS

WORKFLOW ENGINE FOR CLOUDS WORKFLOW ENGINE FOR CLOUDS By SURAJ PANDEY, DILEBAN KARUNAMOORTHY, and RAJKUMAR BUYYA Prepared by: Dr. Faramarz Safi Islamic Azad University, Najafabad Branch, Esfahan, Iran. Workflow Engine for clouds

More information

Introducción. Diseño de sistemas digitales.1

Introducción. Diseño de sistemas digitales.1 Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1

More information

How mobile operators can monetize 3G investments through an effective applications platform

How mobile operators can monetize 3G investments through an effective applications platform Technology for Innovators TM How mobile operators can monetize 3G investments through an effective applications platform By Mike Yonker mikey@ti.com Director of Technology Strategy, Wireless Terminals

More information

Development of an Internet based Embedded System for Smart House Controlling and Monitoring

Development of an Internet based Embedded System for Smart House Controlling and Monitoring Development of an Internet based Embedded System for Smart House Controlling and Monitoring Ahmed Abd-Elkarim Abd- Ellatif Salih Maged Ali Mohammed Asa'ad Yousif Elhadi Elsideeg Ahmed Department of Computer

More information

The SA601: The First System-On-Chip for Guitar Effects By Thomas Irrgang, Analog Devices, Inc. & Roger K. Smith, Source Audio LLC

The SA601: The First System-On-Chip for Guitar Effects By Thomas Irrgang, Analog Devices, Inc. & Roger K. Smith, Source Audio LLC The SA601: The First System-On-Chip for Guitar Effects By Thomas Irrgang, Analog Devices, Inc. & Roger K. Smith, Source Audio LLC Introduction The SA601 is a mixed signal device fabricated in 0.18u CMOS.

More information

EEC 119B Spring 2014 Final Project: System-On-Chip Module

EEC 119B Spring 2014 Final Project: System-On-Chip Module EEC 119B Spring 2014 Final Project: System-On-Chip Module Dept. of Electrical and Computer Engineering University of California, Davis Issued: March 14, 2014 Subject to Revision Final Report Due: June

More information

White Paper: Pervasive Power: Integrated Energy Storage for POL Delivery

White Paper: Pervasive Power: Integrated Energy Storage for POL Delivery Pervasive Power: Integrated Energy Storage for POL Delivery Pervasive Power Overview This paper introduces several new concepts for micro-power electronic system design. These concepts are based on the

More information

Mobile Operating Systems Lesson 05 Windows CE Part 1

Mobile Operating Systems Lesson 05 Windows CE Part 1 Mobile Operating Systems Lesson 05 Windows CE Part 1 Oxford University Press 2007. All rights reserved. 1 Windows CE A 32 bit OS from Microsoft Customized for each specific hardware and processor in order

More information

System Software Integration: An Expansive View. Overview

System Software Integration: An Expansive View. Overview Software Integration: An Expansive View Steven P. Smith Design of Embedded s EE382V Fall, 2009 EE382 SoC Design Software Integration SPS-1 University of Texas at Austin Overview Some Definitions Introduction:

More information

Operating Systems 4 th Class

Operating Systems 4 th Class Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science

More information

Full-Band Capture Cable Digital Tuning

Full-Band Capture Cable Digital Tuning White Paper Full-Band Capture Cable Digital Tuning Cable operators are demanding devices that support an increasing number of simultaneous channels, which translates to multiple cable tuners and demodulators

More information

FPGAs in Next Generation Wireless Networks

FPGAs in Next Generation Wireless Networks FPGAs in Next Generation Wireless Networks March 2010 Lattice Semiconductor 5555 Northeast Moore Ct. Hillsboro, Oregon 97124 USA Telephone: (503) 268-8000 www.latticesemi.com 1 FPGAs in Next Generation

More information

On some Potential Research Contributions to the Multi-Core Enterprise

On some Potential Research Contributions to the Multi-Core Enterprise On some Potential Research Contributions to the Multi-Core Enterprise Oded Maler CNRS - VERIMAG Grenoble, France February 2009 Background This presentation is based on observations made in the Athole project

More information

3 - Introduction to Operating Systems

3 - Introduction to Operating Systems 3 - Introduction to Operating Systems Mark Handley What is an Operating System? An OS is a program that: manages the computer hardware. provides the basis on which application programs can be built and

More information

Building Blocks for PRU Development

Building Blocks for PRU Development Building Blocks for PRU Development Module 1 PRU Hardware Overview This session covers a hardware overview of the PRU-ICSS Subsystem. Author: Texas Instruments, Sitara ARM Processors Oct 2014 2 ARM SoC

More information

Chapter 1: Introduction. What is an Operating System?

Chapter 1: Introduction. What is an Operating System? Chapter 1: Introduction What is an Operating System? Mainframe Systems Desktop Systems Multiprocessor Systems Distributed Systems Clustered System Real -Time Systems Handheld Systems Computing Environments

More information

High-Level Synthesis for FPGA Designs

High-Level Synthesis for FPGA Designs High-Level Synthesis for FPGA Designs BRINGING BRINGING YOU YOU THE THE NEXT NEXT LEVEL LEVEL IN IN EMBEDDED EMBEDDED DEVELOPMENT DEVELOPMENT Frank de Bont Trainer consultant Cereslaan 10b 5384 VT Heesch

More information

Universal Flash Storage: Mobilize Your Data

Universal Flash Storage: Mobilize Your Data White Paper Universal Flash Storage: Mobilize Your Data Executive Summary The explosive growth in portable devices over the past decade continues to challenge manufacturers wishing to add memory to their

More information

Networking Remote-Controlled Moving Image Monitoring System

Networking Remote-Controlled Moving Image Monitoring System Networking Remote-Controlled Moving Image Monitoring System First Prize Networking Remote-Controlled Moving Image Monitoring System Institution: Participants: Instructor: National Chung Hsing University

More information

Software engineering for real-time systems

Software engineering for real-time systems Introduction Software engineering for real-time systems Objectives To: Section 1 Introduction to real-time systems Outline the differences between general-purpose applications and real-time systems. Give

More information

BDTI Solution Certification TM : Benchmarking H.264 Video Decoder Hardware/Software Solutions

BDTI Solution Certification TM : Benchmarking H.264 Video Decoder Hardware/Software Solutions Insight, Analysis, and Advice on Signal Processing Technology BDTI Solution Certification TM : Benchmarking H.264 Video Decoder Hardware/Software Solutions Steve Ammon Berkeley Design Technology, Inc.

More information

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:

More information

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Seeking Opportunities for Hardware Acceleration in Big Data Analytics Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who

More information

Rapid System Prototyping with FPGAs

Rapid System Prototyping with FPGAs Rapid System Prototyping with FPGAs By R.C. Coferand Benjamin F. Harding AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Newnes is an imprint of

More information

EMBEDDED SYSTEM BASICS AND APPLICATION

EMBEDDED SYSTEM BASICS AND APPLICATION EMBEDDED SYSTEM BASICS AND APPLICATION TOPICS TO BE DISCUSSED System Embedded System Components Classifications Processors Other Hardware Software Applications 2 INTRODUCTION What is a system? A system

More information

PowerPC Microprocessor Clock Modes

PowerPC Microprocessor Clock Modes nc. Freescale Semiconductor AN1269 (Freescale Order Number) 1/96 Application Note PowerPC Microprocessor Clock Modes The PowerPC microprocessors offer customers numerous clocking options. An internal phase-lock

More information

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework

More information

The MOST Affordable HD Video Conferencing. Conferencing for Enterprises, Conferencing for SMBs

The MOST Affordable HD Video Conferencing. Conferencing for Enterprises, Conferencing for SMBs The MOST Affordable HD Video Conferencing Video conferencing has become an increasingly popular service, being widely used by enterprises, organizations and individuals. Thanks to the enormous growth in

More information

Embedded Systems: Technologies and Markets

Embedded Systems: Technologies and Markets Jan 2012 IFT016D Use this report to: Understand the market for embedded technology through 2015, considering macroeconomic factors and dynamics of the markets for various end products. Gain an understanding

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik

Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen. Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik Architekturen und Einsatz von FPGAs mit integrierten Prozessor Kernen Hans-Joachim Gelke Institute of Embedded Systems Professur für Mikroelektronik Contents Überblick: Aufbau moderner FPGA Einblick: Eigenschaften

More information

Specification and Design of a Video Phone System

Specification and Design of a Video Phone System Specification and Design of a Video Phone System PROJECT REPORT G roup Members: -Diego Anzola -H anirizk Contents Introduction Functional Description - Spec. Components Controller Memory Management Feasibility

More information