Performance Monitor Unit Design for an AXI-based Multi-Core SoC Platform

Hyun-min Kyung, Gi-ho Park, Jong Wook Kwak, WooKyeong Jeong, Tae-Jin Kim, Sung-Bae Park
Processor Architecture Lab, SOC R&D Center, System LSI Division, Semiconductor Business, Samsung Electronics, Yongin-City, Kyeong-gi Do, Korea
Hyunmin.kyung, giho.park, jongwook.kwak, wk.jeong, taejinkim, samsung.com

ABSTRACT
As the physical gate count of System-on-Chip (SOC) designs increases and system design complexity grows steadily, it becomes more and more difficult to achieve good resource utilization by assigning each task to a particular hardware IP, and to trace the execution pattern of each task efficiently. Performance monitoring is therefore becoming increasingly important for easy system monitoring and performance debugging. In this paper, we present a performance monitoring unit (PMU) for the AMBA Advanced extensible Interface (AXI) bus. The PMU can measure major performance metrics, such as the bus latency of specific master requests and the amount of memory traffic over specific durations. It can also measure contention among the bus masters and slaves in the SOC. We also present a distributor and a synchronization method for using multiple performance counting units. The PMU has been verified on a platform FPGA board with a 9-by-4 AXI interconnect configuration. These monitoring features can give system architects insight by helping them find and analyze the performance bottlenecks of the target system.

Categories and Subject Descriptors
C.4 [Performance of Systems]: Performance attributes

General Terms
Performance

Keywords
Performance monitor, AXI, AMBA, SOC platform, architecture exploration

1. INTRODUCTION
As the complexity of the SOC increases, understanding the internal interactions of the SOC becomes crucial for effective SOC design.
Because an SOC integrates various IP blocks into a single chip, conventional board-level system debug approaches, such as probing signals with logic analyzers, are not effective: the most interesting transactions, such as bus contention, cannot be observed off-chip. To design the bus system, SOC architects usually hand-calculate the memory traffic of a specific function IP, such as an H.264 decoder, from the IP specification and an analysis of the algorithm. This method can be effective for a simple SOC that has only a single CPU and one function IP, where the CPU is usually idle while the function IP is running, but it cannot take the dynamic behavior of the SOC into account. For a multi-core SOC, especially one whose programmable cores run large applications such as multimedia codecs, understanding the dynamic behavior of the system is crucial for designing the bus architecture. This information is also very useful for optimizing the software of the SOC.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'07, March 11-15, 2007, Seoul, Korea. Copyright 2007 ACM /07/0003 $5.00.

In this paper, we present a performance monitoring unit (PMU) designed to monitor and analyze system behavior on the AXI bus of a multi-core SOC. The PMU will be embedded in the SAVm IV SOC platform. We are building the first version of the SAViT (Standard/Synergistic/Superior Architecture for Versatile information Technology) platform for multimedia processing [11], and the SAVm IV SOC platform is the first prototype of the SAViT series.
This platform SOC has a heterogeneous multiprocessor architecture with an ARM CPU and a StarCore DSP. Its on-chip system bus is based on the AMBA AXI bus system for high-performance, high-bandwidth applications. The PMU can measure major performance metrics, such as the bus latency of specific master requests and the amount of memory traffic over specific durations. We expect it to be used for performance analysis of the bus system and for software optimization on the SOC.

The rest of this paper is organized as follows. Section 2 describes related work on performance monitoring units. An overview of the AXI protocol and the architecture of the PMU are described in Sections 3 and 4, respectively. Section 5 presents the implementation and a sample usage of the PMU. Finally, Section 6 concludes the paper.

2. RELATED WORK
Adding performance monitoring features has clearly created many opportunities to analyze and tune current microarchitectures and embedded systems. To evaluate system performance efficiently, hardware performance monitors (HPMs) have been studied. One example is the evolution of the PowerPC performance monitors [3]: when the Power2 processors were designed, a full-scale performance monitor with 22 counters was integrated into the 4-chip processor. The performance monitoring features of the Pentium 4 [4] are another such HPM; to support simultaneous multithreaded execution, they provide qualification of event detection by thread ID and qualification of event counting by thread mode. These monitors are usually embedded in the processor itself and are mainly used for software optimization. Collard et al. [5] present a system-wide instruction-level performance monitor called SWIFT (System-Wide Information for Tuning), and Borrill et al. [6] introduce IPM (Integrated Performance Monitoring), a performance profiling tool applied to a cosmology application. DSPs also require performance monitoring features to achieve maximum performance; performance monitoring tools for DSPs, however, must address DSP-specific requirements, such as those of the multimedia video processor (MVP) [9]. In addition, Wisniewski et al. [7] and Mink et al. [8] introduce performance monitoring tools for operating systems and the PCI bus, respectively.

Recently, IBM proposed the PLB Performance Monitor (PPM) [10], which provides hardware for counting certain events associated with processor local bus (PLB) transactions. The PPM contains a set of counters whose contents can be read by software and used to analyze and improve PLB performance. The main roles of the PPM are event-occurrence counting and event-duration counting. The main difference between the PPM and previous work is that the PPM is designed for system-wide performance measurement, especially of PLB performance. In contrast to the previous work mentioned above, we design and implement a performance monitoring unit (PMU) for a System-on-Chip (SOC) platform.
It is designed to gather bus-performance metrics: bus-transaction events, such as the number of requests and the burst transfer size of each request, and bus-contention events. The PMU also takes the characteristics of the AMBA AXI protocol into account.

3. THE OVERVIEW OF AMBA AXI
The SAViT platform is based on the AMBA Advanced extensible Interface (AXI) bus system. AXI is an enhancement of the existing Advanced High-performance Bus (AHB) protocol, and we carefully considered the nature of the AXI protocol when designing the performance monitoring unit of our SOC. This section briefly describes the features of the AXI protocol and interconnect that are relevant to the performance counter design.

3.1 AXI Protocol [1]
AMBA AXI is targeted at high-performance, high-frequency system designs and improves on the existing AHB and APB protocols in terms of bus performance. The key features of the AXI protocol are as follows:
- separate address/control and data phases;
- support for unaligned data transfers using byte strobes;
- burst-based transactions with only the start address issued;
- separate read and write data channels;
- the ability to issue multiple outstanding addresses.
The AXI protocol has five independent channels, each consisting of a set of information signals, and each channel uses a two-way VALID and READY handshake mechanism. The five channels are the read address, write address, read data, write data, and write response channels. The read address, write address, and write data channels are driven by the master device; the read data and write response channels are driven by the slave device.

3.2 AXI Interconnect
An AXI interconnect is needed to connect multiple AXI buses. A number of master and slave devices are connected together through some form of interconnect, as shown in Figure 1. The AXI interconnect has interfaces to connect master and slave devices.
Each interface is either a master port or a slave port, and each port acts as the symmetrical counterpart of a real device: the interconnect presents a slave port to each real master device and a master port to each real slave device. The key feature of the AXI interconnect is its full crossbar structure between master and slave ports, which allows multiple master devices to access different slave devices simultaneously. However, if two or more masters target the same slave device, contention between the masters can occur. Similarly, multiple slave devices can respond to different master devices concurrently. This is one of the most distinctive features compared with the AHB protocol, in which only one master can occupy the bus at a time.

Figure 1. AXI Interconnect (example: 3x2)

We define a Master Bus as the interface from a master device to a slave port of the interconnect, and a Slave Bus as the interface from a master port of the interconnect to a slave device. The AXI interconnect developed for our system is based on the PrimeCell AXI Configurable Interconnect (PL300) [2]. A 9-by-4 AXI interconnect (i.e., 9 masters and 4 slaves) is used in our system implementation for the performance monitoring unit design.

4. THE PERFORMANCE MONITORING UNITS
This section explains the performance monitoring unit (PMU) for the AXI bus system in detail. The overall architecture of the PMU is presented first, and the sub-module architectures are described in the following subsections.

4.1 Architectural Overview
The PMU provides monitoring hardware for counting certain events related to AXI bus transactions. It is a set of hardware monitors containing counters whose contents can be set and read by software and used to analyze and improve the performance of the entire system. Figure 2 shows the overall structure of the performance monitoring unit. The PMU is designed to monitor and analyze both bus-transaction events (BTE) and bus-contention events (BCE). BTE can be measured by observing a single specified bus; they are counted by a set of 32-bit counters, each of which increments once for every occurrence of the selected event. BCE, in contrast, must be counted by monitoring bus-occupation and arbitration signals from multiple buses, because contention and arbitration are resolved inside the AXI interconnect. BCE are measured by a set of counters that increment by one for each conflict between a selected target port and other target ports, where a target port can be either a master port or a slave port. A conflict occurs when two or more targets attempt to transfer data to the same destination at the same time.

The PMU is composed of two independent modules: the Bus Monitor (BM) and the AXI Interconnect Contention Monitor (CM). The BM counts BTE on a master bus, and the CM measures BCE. The CM is further divided into the Master Contention Monitor (MCM) and the Slave Contention Monitor (SCM) according to the type of contending device. To increase the effectiveness of the PMU, a distributor is placed in front of the bus monitors and the interconnect contention monitors. As shown in Figure 2, the distributor provides flexible connections between the master/slave buses and the monitoring units, so any combination of master buses can be connected to bus monitor modules, up to the number of available monitor modules. For example, if three bus monitor modules are available in the 9-by-4 AXI interconnect, any three of the nine master ports can be observed simultaneously by setting the distributor register.
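The distributor's role in the three-of-nine example can be pictured as one multiplexer per bus monitor. The following is a minimal behavioral sketch in Python, not the authors' RTL: it assumes a hypothetical per-BM selection value that is either a master-bus index (0 to 8) or `None` for a disabled BM, and the function name `route_master_buses` is illustrative only.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")
NUM_MASTER_BUSES = 9  # master buses 0..8 of the 9x4 interconnect


def route_master_buses(
    bus_signals: Sequence[T],
    selections: Sequence[Optional[int]],
) -> List[Optional[T]]:
    """Model a (9 x n) BM distributor as n independent multiplexers.

    bus_signals[i] stands for the transaction-related signals of master bus i.
    selections[k] is a hypothetical selection value for BM unit k: a
    master-bus index, or None when that BM is disabled.
    Returns, for each BM unit, the bus signals it observes (None if disabled).
    """
    if len(bus_signals) != NUM_MASTER_BUSES:
        raise ValueError("expected one signal bundle per master bus")
    routed: List[Optional[T]] = []
    for k, sel in enumerate(selections):
        if sel is None:
            routed.append(None)  # BM k is not connected to any bus
        elif 0 <= sel < NUM_MASTER_BUSES:
            routed.append(bus_signals[sel])  # full crossbar: any bus to any BM
        else:
            raise ValueError(f"BM{k}: invalid master bus index {sel}")
    return routed
```

For instance, `route_master_buses([f"M{i}" for i in range(9)], [1, 4, 7])` gives `["M1", "M4", "M7"]`, i.e., three BMs watching master buses 1, 4, and 7 at the same time. Each BM still sees only one bus, which matches the text's point that concurrent observation needs several BMs.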
The detailed features of each module are explained in the following subsections.

4.2 Bus Monitor
The Bus Monitor (BM) counts bus-transaction events. Since bus transactions are generated by master devices, the BM observes the signals of a master bus. The BM gathers the following information:
- the address range of the request (statistics are gathered for three address ranges specified by register settings);
- the total transfer count (read/write);
- the total transfer size in bytes (read/write);
- the read latency distribution;
- the write latency distribution;
- the global clock count.
The BM has three address-range divisions because a master transfers data to several destinations (slaves), as shown in Figure 2. An architect can change the address ranges used for observing bus-transaction events by setting the control register of the bus monitor. The address ranges can be chosen either to separate the destination slaves or to divide one slave's address space into three sections. The BM contains three performance counter sets (PCSs) that count bus-transaction events independently for the three address ranges, as shown in Figure 3. Each PCS, denoted S0_PC, S1_PC, and S2_PC in Figure 3, counts the bus-transaction events that the master generates within its address range.
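To make the address-range division concrete, here is a small Python sketch of how a request address could be steered to one of the three PCSs. It assumes half-open `[base, limit)` ranges, a first-match rule, and that addresses outside all three ranges are simply not counted; these rules, and the names `AddressRangeDecoder` and `PerfCounterSet`, are assumptions for illustration rather than the register semantics of the actual BM. Only the transfer counters are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PerfCounterSet:
    """One PCS (S0_PC, S1_PC or S2_PC); only transfer counters are modeled."""
    read_count: int = 0
    write_count: int = 0


@dataclass
class AddressRangeDecoder:
    # Three (base, limit) pairs, each a half-open range [base, limit) (assumed)
    ranges: List[Tuple[int, int]]
    pcs: List[PerfCounterSet] = field(
        default_factory=lambda: [PerfCounterSet() for _ in range(3)]
    )

    def __post_init__(self) -> None:
        if len(self.ranges) != 3:
            raise ValueError("the BM supports exactly three address ranges")

    def select(self, addr: int) -> Optional[int]:
        """Index of the first range that contains addr, or None."""
        for i, (base, limit) in enumerate(self.ranges):
            if base <= addr < limit:
                return i
        return None

    def record(self, addr: int, is_write: bool) -> None:
        """Count one request in the PCS whose range matches its address."""
        i = self.select(addr)
        if i is None:
            return  # outside all three ranges: not counted in this sketch
        if is_write:
            self.pcs[i].write_count += 1
        else:
            self.pcs[i].read_count += 1
```

With ranges set to three 256 MB windows, for example `[(0x0000_0000, 0x1000_0000), (0x1000_0000, 0x2000_0000), (0x2000_0000, 0x3000_0000)]` (a hypothetical memory map), a read at `0x1000_0040` increments `pcs[1].read_count`.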
Figure 2. Overall Structure of Performance Monitoring Unit for 9x4 AXI interconnect

Figure 3. Structure of Bus Monitor

The counters in each PCS store the bus-transaction event counts: the total transfer count, the total transfer size (in bytes), the read latency distribution, and the write latency distribution. The total transfer count is the number of transactions a master generates during the simulation time. The transfer count does not distinguish between single and burst transactions; it is incremented by one in both cases. The transfer size, however, does distinguish them: when an address transfer occurs, the PCS adds the total size of the current transaction, computed from the ARLEN/AWLEN and ARSIZE/AWSIZE signals.
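The counting rule above combines two AXI facts: an address transfer happens only on a cycle where VALID and READY are both high, and a burst encoded with AxLEN and AxSIZE carries (AxLEN + 1) beats of 2^AxSIZE bytes each (the AXI v1.0 encoding, with the 4-bit ARLEN[3:0]/AWLEN[3:0] shown in Figure 3). The Python sketch below applies that rule to a cycle-by-cycle trace of one address channel; the trace format `AddrCycle` is a hypothetical representation, and 32-bit counter wrap-around is not modeled.

```python
from typing import Iterable, NamedTuple, Tuple


class AddrCycle(NamedTuple):
    """One clock-cycle sample of an AXI address channel (AR* or AW*)."""
    valid: bool
    ready: bool
    length: int  # ARLEN/AWLEN[3:0]: the burst has length + 1 beats
    size: int    # ARSIZE/AWSIZE[2:0]: each beat carries 2**size bytes


def transaction_bytes(length: int, size: int) -> int:
    """Total bytes of one burst as encoded by AxLEN[3:0] and AxSIZE[2:0]."""
    if not (0 <= length <= 15 and 0 <= size <= 7):
        raise ValueError("AxLEN must fit in 4 bits and AxSIZE in 3 bits")
    return (length + 1) << size


def count_address_transfers(cycles: Iterable[AddrCycle]) -> Tuple[int, int]:
    """Return (transfer_count, transfer_size_bytes) for one address channel.

    A transfer is recognized only when VALID and READY are both high.
    Single and burst transactions each add 1 to the count, while the size
    counter accumulates the full burst size.
    """
    count = 0
    total_bytes = 0
    for c in cycles:
        if c.valid and c.ready:  # VALID/READY handshake completes this cycle
            count += 1
            total_bytes += transaction_bytes(c.length, c.size)
    return count, total_bytes
```

For a trace in which a 4-beat burst of 32-bit words waits one cycle for READY and is followed by a single 32-bit transfer, `count_address_transfers` returns `(2, 20)`: two transactions, 16 + 4 bytes. The stalled cycle (VALID high, READY low) is not counted.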

Read and write latency are defined as the time a master waits for a transaction to complete. This waiting time has two possible sources: other masters competing for the bus, and the response time of the slave device. For a read, the master must wait to receive the read data from the slave after issuing the read address. Read latency is therefore defined as the time from the start of the read address transfer to the beginning of the read data transfer. For overlapping transactions, to make the analysis of read latency more effective, we define read latency differently: it is the time from the end of the read data transfer of the previous transaction to the beginning of the read data transfer of the current transaction, as shown in Figure 4. Write latency is defined as the time the master needs to complete the write data transfer. Write latency does not include the interval from the write address transfer to the write data transfer, because that interval depends on the master's operating time rather than on the bus status.

Because read and write latencies are widely distributed, from a couple of cycles to several hundred cycles, it is not practical to dedicate a counter to each latency value. Instead, we count read and write latencies with an interval-based method. Software can set eight intervals for read latency and four intervals for write latency. A set of 32-bit counters (eight for read and four for write) is provided, and each counter is incremented once for every event whose latency falls within its interval. The bus monitor also has four global clock counters, each 64 bits wide, which measure the total clock count over the simulation time.
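The two read-latency definitions and the interval-based counting can be illustrated together. The Python sketch below assumes in-order completion and treats a read as "overlapping" when its address is issued before the previous read's data transfer has finished; it also assumes each interval is given by its lower bound, with the last interval open-ended. These interpretations, and the names `ReadTxn`, `read_latencies`, and `interval_histogram`, are illustrative assumptions, since Figure 4 and the exact register encoding are not reproduced here.

```python
import bisect
from typing import List, NamedTuple, Optional, Sequence


class ReadTxn(NamedTuple):
    addr_start: int  # cycle at which the read address transfer starts
    data_start: int  # cycle at which the first read data beat is transferred
    data_end: int    # cycle at which the last read data beat is transferred


def read_latencies(txns: Sequence[ReadTxn]) -> List[int]:
    """Read latency of each transaction in cycles (in-order completion assumed).

    Non-overlapping: from the read address transfer to the first data beat.
    Overlapping (address issued before the previous data finished): from
    the end of the previous data transfer to the first data beat.
    """
    latencies: List[int] = []
    prev_end: Optional[int] = None
    for t in txns:
        if prev_end is not None and t.addr_start < prev_end:
            start = prev_end  # overlapping transaction
        else:
            start = t.addr_start
        latencies.append(t.data_start - start)
        prev_end = t.data_end
    return latencies


def interval_histogram(latencies: Sequence[int], lower_bounds: Sequence[int]) -> List[int]:
    """Count latencies per interval [lower_bounds[k], lower_bounds[k + 1]).

    lower_bounds must start at 0 and be strictly ascending; the last
    interval is open-ended. Eight bounds model the eight read counters.
    """
    if not lower_bounds or lower_bounds[0] != 0 or any(
        a >= b for a, b in zip(lower_bounds, lower_bounds[1:])
    ):
        raise ValueError("lower_bounds must start at 0 and be strictly ascending")
    counts = [0] * len(lower_bounds)
    for lat in latencies:
        if lat < 0:
            raise ValueError("negative latency: trace violates in-order assumption")
        counts[bisect.bisect_right(lower_bounds, lat) - 1] += 1
    return counts
```

For example, reads `ReadTxn(0, 10, 13)`, `ReadTxn(2, 20, 23)`, and `ReadTxn(40, 45, 48)` give latencies `[10, 7, 5]`: the second read overlaps the first, so its latency is measured from cycle 13 rather than cycle 2. With hypothetical bounds `[0, 4, 8, 16, 32, 64, 128, 256]`, the histogram is `[0, 2, 1, 0, 0, 0, 0, 0]`. Only twelve counters are needed however long the latencies are, which is the point of the interval method.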
Figure 4. BM read latency for overlapping transactions

4.3 Contention Monitor
The Contention Monitor (CM) counts bus-contention events. Bus contention can occur between master devices or between slave devices. Since the AXI interconnect is a full crossbar, bus-contention events must be observed at the convergence points where multiple devices share the same destination. In the 9x4 interconnect there are 13 convergence points: 9 points where the slave devices converge onto a master bus and 4 points where the master devices converge onto a slave bus, as shown in Figure 2. The monitoring features of the CM are as follows:
- conflict count between a specified master and the other masters on the address channel / write data channel;
- conflict count among the other masters, excluding the specified master, on the address channel / write data channel;
- conflict count between a specified slave and the other slaves on the read data channel / write response channel;
- conflict count among the other slaves, excluding the specified slave, on the read data channel / write response channel.
The CM measures conflict counts for four independent channels: the address channel, write data channel, read data channel, and write response channel. Since each channel has dedicated signals in the AXI bus, the four channels can be active independently without interfering with one another. The CM is further divided into two modules according to the transaction source: the Master Contention Monitor (MCM) and the Slave Contention Monitor (SCM). The address and write data channels carry signals that a master generates and a slave responds to, so contention on these channels is handled by the MCM. The read data and write response channels, on the other hand, carry signals that a slave generates and a master responds to; the SCM measures conflicts between slave devices on these channels.

The MCM block is connected to internal interconnect signals, which carry each master's transaction information and the ID of the master currently occupying the bus (ADDRMASTER[3:0], WRITEMASTER[3:0]), and to the slave bus signals, as shown in Figure 5. The slave bus signals indicate the point in time at which a transaction is issued on the slave bus. If, at that time, masters other than the one currently occupying the bus try to issue transactions, this is counted as a conflict. Because the number of possible conflict cases grows rapidly with the number of masters, the user selects one master of interest, named MasterX. For each channel, the MCM measures the conflict count between MasterX and the other masters and the conflict count among the other masters, excluding MasterX. With this feature, we can observe the contention between the master of most interest and the other masters, in addition to the contention among the less interesting masters.

Figure 5. Master Contention Monitor (Interconnect 9x4)

The SCM block is similar to the MCM block, as shown in Figure 6. The SCM measures conflict counts between a user-defined slave (i.e., SlaveX) and the other slaves. It also counts conflicts among the other slaves, excluding the user-defined slave, on the read data and write response channels.
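The MasterX classification can be sketched as follows for one slave bus and one channel. This is an assumed interpretation, not the MCM's documented logic: at each cycle where a transaction is issued on the slave bus, the monitor sees the granted master (as signaled by ADDRMASTER or WRITEMASTER) and the set of masters requesting that slave. A conflict involving MasterX, as winner or loser, is counted once in the MasterX counter, and any other conflict is counted in the "among others" counter. The names `count_master_contention` and `ContentionCounts` are hypothetical. The SCM case is analogous, with slaves contending on a master bus.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Tuple


@dataclass
class ContentionCounts:
    masterx_vs_others: int = 0  # conflicts in which MasterX takes part
    among_others: int = 0       # conflicts among the remaining masters only


def count_master_contention(
    events: Iterable[Tuple[int, AbstractSet[int]]],
    master_x: int,
) -> ContentionCounts:
    """Classify conflicts on one slave bus and one channel.

    Each event is (granted_master, requesting_masters), sampled at a cycle
    where a transaction is issued on the slave bus. A conflict exists when
    some master other than the granted one is also requesting that slave.
    """
    counts = ContentionCounts()
    for granted, requesting in events:
        losers = set(requesting) - {granted}
        if not losers:
            continue  # only the granted master wanted the bus: no conflict
        if granted == master_x or master_x in losers:
            counts.masterx_vs_others += 1
        else:
            counts.among_others += 1
    return counts
```

With MasterX set to Master 1, the events `(1, {1})`, `(1, {1, 7})`, `(7, {1, 7})`, and `(3, {3, 5})` yield two MasterX conflicts and one conflict among the other masters. The first event is an uncontended access.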

Figure 6. Slave Contention Monitor (Interconnect 9x4)

4.4 Distributor: Configurable and Scalable Design
The distributor provides configurable connections between the AXI interconnect bus signals and the bus/contention monitors, as shown in Figure 2. It consists of three independent distributors: the BM distributor, the MCM distributor, and the SCM distributor. Each distributor allows its monitors to measure any of several AXI interconnect buses according to a configurable setting. In a 9x4 (9 masters and 4 slaves) AXI interconnect crossbar, the BM distributor connects to the 9 master buses (not to all AXI bus signals, only to the transaction-related ones). Since the BM distributor has a full crossbar structure, it can link any master bus to a BM unit, so one BM can measure the transactions of any master bus. However, a single BM unit cannot observe multiple master buses simultaneously, so multiple BMs must be connected to the BM distributor to monitor multiple master buses concurrently. The number of BM units in use can be set by software, up to the total number of BM units connected to the BM distributor. Figure 7 shows the (9 x n) BM distributor, which connects the 9 master buses to n BM units according to the control register setting.

The MCM distributor connects to the 4 slave buses and to the internal AXI signals that carry the transaction information of the 9 masters and the ID of the master currently transferring on each slave bus. Since the MCM distributor also has a full crossbar structure, it can flexibly link any MCM unit to any slave bus and its associated signals. Similarly, the SCM distributor connects multiple SCM units to the signals related to contention between slave devices. As with the BM distributor, the MCM and SCM distributors can flexibly connect MCM and SCM units to any bus. A single BM distributor, MCM distributor, and SCM distributor is needed per AXI interconnect, while multiple monitors can be connected to each distributor.

Figure 7. BM distributor (9 x n)

4.5 Multi-PM Synchronization
This section explains the measurement capabilities available when multiple Performance Monitor (PM) instances, each either a Bus Monitor or a Contention Monitor, are implemented in the system. When multiple PMs are activated concurrently, a method is required for them to interact with each other for synchronization. The Enable-In and Enable-Out ports of the PMs can be connected so that the monitors are synchronized. Figure 8 illustrates an example of wiring all PM instances to operate concurrently. One instance acts as the primary, and its Enable-Out signal is connected to the Enable-In input of all remaining (secondary) PM instances. This configuration is set by enabling Sync_out in the primary instance and Sync_in in the secondary instances. This feature allows all instances to start and finish measuring at the same time: all secondary instances are programmed first, but none of them begins operating until the primary instance has been programmed and started. Therefore, all instances operate concurrently.

Figure 8. Multiple PMUs synchronization
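The primary/secondary wiring can be checked with a toy cycle model. In the Python sketch below, each monitor is either stand-alone, a primary (Sync_out), or a secondary (Sync_in). It assumes that a secondary counts whenever it is programmed and its Enable-In is high, and that the primary and stand-alone monitors count once they are programmed and started by software. The class `PerfMonitor`, its attributes, and `step` are hypothetical names, not the PMU's register interface.

```python
from typing import List, Optional


class PerfMonitor:
    """Toy model of one PM instance with Enable-In/Enable-Out ports.

    sync is "out" for the primary (Sync_out), "in" for a secondary
    (Sync_in), or None for a stand-alone monitor.
    """

    def __init__(self, name: str, sync: Optional[str] = None) -> None:
        if sync not in (None, "in", "out"):
            raise ValueError("sync must be None, 'in' or 'out'")
        self.name = name
        self.sync = sync
        self.programmed = False  # control registers written by software
        self.started = False     # software start (used by primary/stand-alone)
        self.cycles = 0          # clock cycles counted while active

    def active(self, enable_in: bool) -> bool:
        if not self.programmed:
            return False
        if self.sync == "in":
            return enable_in  # secondary: gated by the primary's Enable-Out
        return self.started

    def enable_out(self) -> bool:
        return self.sync == "out" and self.active(enable_in=False)


def step(monitors: List[PerfMonitor]) -> None:
    """Advance one clock; the primary's Enable-Out drives every Enable-In."""
    enable_line = any(m.enable_out() for m in monitors)
    for m in monitors:
        if m.active(enable_line):
            m.cycles += 1
```

In a run mirroring the BM0/MCM0 pairing used later in Section 5.2, MCM0 (secondary) is programmed first and stays idle. Once BM0 (primary) is programmed and started, both count the same number of cycles, and stopping BM0 stops MCM0 in the same cycle.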

5. IMPLEMENTATION OF PMU
In this section, we present the implementation and verification of the performance monitoring unit and discuss a possible example usage of the PMU.

5.1 PMU Implementation
We have implemented and verified the performance monitoring unit on a SAVm IV platform FPGA board. The board has two Xilinx xc2v8000 devices and one xc2v4000 device for the bus and IPs, as well as SDRAM/DDR, UART, LCD, SIM card, keypad, and modem interfaces. It also carries ARM1136 and StarCore SC2400 test chips to reflect the SAVm IV architecture, a heterogeneous multiprocessor architecture with an ARM11 processor and an SC2400 DSP. Figure 9 shows the architecture and IP features of the SAVm IV SOC platform. It provides an external bus interface, the AXI/AHB bus, so that other IPs can easily be integrated with the platform. The SAVm IV adopts a 32-bit bus architecture comprising the AXI, AHB, and APB buses. The primary system bus is the AXI spine bus; AHB/APB buses and bridges are used to interface with various IPs. The AXI-spine interconnect is implemented as a 9-by-4 interconnect on the FPGA board. The ARM acts as the primary processor and performs control processing, while the StarCore carries out data processing such as H.264 decoding. The system clock and both core clocks run at 20 MHz on the FPGA board. The simulation on the FPGA board is executed with the following monitors: one (9x6) BM distributor, one (4x4) MCM distributor, one (9x9) SCM distributor, one BM, one MCM, and one SCM unit. The location of the PMU on the SAVm IV platform is shown in Figure 9.

5.2 Verification of the PMU
The performance monitoring unit has been verified on the SAVm IV FPGA board. The functionality of the PMU was verified by running a test scenario for each module in the PMU, as follows. Table 1 shows the bus monitor parameter settings used to verify the functionality of the bus monitor and the BM distributor.
We wrote a simple program that generates master transactions and checked the resulting bus monitor counter values against the configuration presented in Table 1.

Table 1. Bus Monitoring Parameters
IP                       Configuration registers
Bus Monitor 0            BM enable
                         Three address ranges
                         Eight read latency ranges
                         Three write latency ranges
                         Global clock enable
                         Sync enable
BM distributor (9 x 6)   BM distributor enable
                         BM0~5 bus selection (ex. BM0: Master1 bus)

As explained earlier, the BM distributor can connect any master bus to any BM module. In our test scenario, BM0 is used to monitor the master transactions, and the BM distributor register is set accordingly, as shown in Table 1: the BM0 bus selection register is configured as the Master1 bus to connect the Master 1 bus to BM0, and the bus selections of BM1~5, which are not connected to any bus, are programmed as disabled. The control registers of BM0 are then configured to monitor the specified events, such as bus latencies and bus-transaction events within the specified address ranges. Synchronization with other monitors is controlled by the Sync_Enable register. By applying various parameter settings to the BM distributor and the Bus Monitor, we verified the function of both.

The Master Contention Monitor is activated by selecting the slave bus on which contention is to be monitored and the ID of the master whose contention on that slave bus is of interest. For example, to monitor contention between Master 1 and the other masters on Slave bus 2, the registers are set as shown in Table 2. The Sync register can be set to synchronize with other monitors.

Table 2. Master Contention Monitoring Parameters
IP                             Configuration registers
Master Contention Monitor 0    MCM Enable
                               User-defined Master ID (MasterX) selection (ex. Master 1 device)
                               Sync Enable
MCM distributor (4 x 4)        MCM distributor enable
                               MCM0~3 bus selection (ex. MCM0: Slave bus 2)

Similarly, the Slave Contention Monitor is operated by selecting the master bus on which contention is to be monitored and the ID of the slave whose contention on that master bus is of interest. To measure contention between Slave 0 and the other slave devices on Master bus 1, the control registers are configured as shown in Table 3.

Table 3. Slave Contention Monitoring Parameters
IP                             Configuration registers
Slave Contention Monitor 0     SCM Enable
                               User-defined Slave (SlaveX) selection (ex. Slave 0 device)
                               Sync Enable
SCM distributor (9 x 9)        SCM distributor enable
                               SCM0~8 bus selection (ex. SCM0: Master bus 1)
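The contention-monitor settings in Tables 2 and 3 reduce to a few bounded fields per monitor. The Python sketch below captures them as validated records for the 9x4 interconnect. The Sync Enable encodings 2'b01 (Sync In) and 2'b10 (Sync Out) come from Table 4. `SYNC_OFF = 0b00`, the field names, and the classes `MCMSetting` and `SCMSetting` are assumptions for illustration. No register addresses or bit positions are implied.

```python
from dataclasses import dataclass

NUM_MASTERS = 9  # master IDs / master buses 0..8 (9x4 interconnect)
NUM_SLAVES = 4   # slave IDs / slave buses 0..3

# Sync Enable encodings listed in Table 4; SYNC_OFF is an assumed value.
SYNC_OFF, SYNC_IN, SYNC_OUT = 0b00, 0b01, 0b10


def _check_range(name: str, value: int, upper: int) -> None:
    if not 0 <= value < upper:
        raise ValueError(f"{name}={value} is outside 0..{upper - 1}")


def _check_sync(sync: int) -> None:
    if sync not in (SYNC_OFF, SYNC_IN, SYNC_OUT):
        raise ValueError(f"invalid Sync Enable encoding {sync:#04b}")


@dataclass(frozen=True)
class MCMSetting:
    """Master Contention Monitor: watched slave bus and chosen MasterX."""
    slave_bus: int   # MCM distributor bus selection
    master_x: int    # user-defined Master ID
    sync: int = SYNC_OFF

    def __post_init__(self) -> None:
        _check_range("slave_bus", self.slave_bus, NUM_SLAVES)
        _check_range("master_x", self.master_x, NUM_MASTERS)
        _check_sync(self.sync)


@dataclass(frozen=True)
class SCMSetting:
    """Slave Contention Monitor: watched master bus and chosen SlaveX."""
    master_bus: int  # SCM distributor bus selection
    slave_x: int     # user-defined Slave ID
    sync: int = SYNC_OFF

    def __post_init__(self) -> None:
        _check_range("master_bus", self.master_bus, NUM_MASTERS)
        _check_range("slave_x", self.slave_x, NUM_SLAVES)
        _check_sync(self.sync)
```

The Table 2 example corresponds to `MCMSetting(slave_bus=2, master_x=1)`, and the Table 3 example corresponds to `SCMSetting(master_bus=1, slave_x=0)`. Validating the ranges up front rejects settings such as a fifth slave bus before they reach the hardware.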

Figure 9. SAVm IV Platform block diagram

Table 4. Example Test Scenario of Bus Monitor and Contention Monitor
Measured events: number of bus transactions of Master1 (ARM) and Master7 (DMAX), and bus contention related to Master 1 on Slave bus 2
IP                       Configuration register                       Setting
BM distributor (9 x 6)   BM distributor enable                        Enable
                         BM0 bus selection                            Master1 or Master7 bus select
                         BM1~5 bus selection                          Disable
    Measurement: connection of the Master1 (or Master7) bus to BM0
BM0                      BM enable                                    Enable
                         Three address ranges                         Select an address range matching the SDRAM access address map for one of the three ranges
                         Eight read latency ranges                    Select the desired read latency ranges
                         Three write latency ranges                   Select the desired write latency ranges
                         Global clock enable                          Enable
                         Sync Enable                                  Sync Out Enable (2'b10)
    Measurement: bus-transaction events, such as transfer counts, size, read latency distribution, write latency distribution, and simulation time, on the Master1 (or Master7) bus
MCM distributor (4 x 4)  MCM distributor enable                       Enable
                         MCM0 bus selection                           Slave bus 2 select
                         MCM1~3 bus selection                         Disable
    Measurement: connection of Slave bus 2 to MCM0
MCM0                     MCM Enable                                   Disable
                         User-defined Master ID (MasterX) selection   Master1 select
                         Sync Enable                                  Sync In Enable (2'b01)
    Measurement: bus-contention events between Master1 and the other masters

Table 4 shows a test scenario that verifies the functionality of the bus monitor and the contention monitor simultaneously. We assume a simple case in which the ARM1136 core (Master 1) and the DMAX (Master 7) access the SDRAM (Slave bus 2) simultaneously. The configuration in Table 4 enables the bus monitor and the contention monitor to gather information about the contention between the two IPs, i.e., Master 1 (ARM) and Master 7 (DMAX). To synchronize BM0 and MCM0, the Sync Enable of BM0 is set to Sync Out and the Sync Enable of MCM0 is set to Sync In. With this configuration, the operation of MCM0 is synchronized to the BM0 enable. Although we present only a couple of sample test scenarios in this section, many other scenarios were used to check the functionality of the PMU, including checks of the bus monitor and the slave contention monitor.

5.3 Usage of PMU
We will use the PMU to evaluate system performance with an H.264 decoding example on the FPGA board. Figure 10 shows the data flow of H.264 decoding, with the transactions that master devices generate to slaves drawn as dotted lines. As shown in Figure 10, the most congested path is SDRAM access via the DMC (PL340). Since the AXI spine is the primary system bus, we will estimate system performance at the AXI-spine interconnect. The AXI-spine interconnect is 9x4, and the DMC (Dynamic Memory Controller) is connected to Slave bus 2 of the AXI-spine interconnect. We are currently porting the H.264 decoding software to the FPGA board to measure system performance, including bus utilization and latencies. The effectiveness of the performance monitoring unit will be verified using this real application, and the features of the performance counting unit can be adjusted based on the analysis.

6. CONCLUSION
In this paper, we presented a performance monitoring unit (PMU) for the AMBA AXI bus. The PMU can measure major performance metrics, such as the bus latency of specific master requests and the amount of memory traffic over specific durations. It can also measure contention among the bus masters and slaves in the SOC. We also designed a distributor and a synchronization method for using multiple performance monitoring units. The PMU has been verified on a platform FPGA board with a 9-by-4 AXI interconnect configuration.
The performance monitoring unit will be used to evaluate the system architecture, especially the bus architecture of our platform SOC.

Figure 10. H.264 Decoding Data Flow with regard to SDRAM access

7. REFERENCES
[1] AMBA AXI Protocol Specification v1.0, ARM, 2003.
[2] PrimeCell AXI Configurable Interconnect (PL300) Technical Reference Manual, ARM, 2004.
[3] Roth, C., and Levine, F. PowerPC Performance Monitor Evolution. IEEE Performance, Computing, and Communications Conference (IPCCC), Feb. 1997, pp.
[4] Sprunt, B. Pentium 4 performance monitoring features. IEEE Micro, July-Aug.
[5] Collard, J.F., Jouppi, N., and Yehia, S. System-wide performance monitors and their application to the optimization of coherent memory access. PPoPP, ACM, June 2005.
[6] Borrill, J., Carter, J., Oliker, L., Skinner, D., and Biswas, R. Integrated performance monitoring of a cosmology application on leading HEC platforms. IEEE Parallel Processing, 2005, pp.
[7] Wisniewski, R.W., and Rosenburg, B. Efficient, unified, and scalable performance monitoring for multiprocessor operating systems. ACM/IEEE Supercomputing Conference, Nov.
[8] Mink, A., Salamon, W., Hollingsworth, J.K., and Arunachalam, R. Performance measurement using low perturbation and high precision hardware assists. Real-Time Systems Symposium, Dec. 1998, pp.
[9] Kim, J., and Kim, Y. Performance analysis and tuning for a single-chip multiprocessor DSP. IEEE Concurrency, Jan.-Mar. 1997, pp.
[10] PLB Performance Monitor User's Manual, IBM, 2002.
[11] Park, G., et al. Architecture exploration and performance verification environments of multi-core SOC for mobile multimedia embedded systems. ISOCC,

More information

Software engineering for real-time systems

Software engineering for real-time systems Introduction Software engineering for real-time systems Objectives To: Section 1 Introduction to real-time systems Outline the differences between general-purpose applications and real-time systems. Give

More information

Design of a High Speed Communications Link Using Field Programmable Gate Arrays

Design of a High Speed Communications Link Using Field Programmable Gate Arrays Customer-Authored Application Note AC103 Design of a High Speed Communications Link Using Field Programmable Gate Arrays Amy Lovelace, Technical Staff Engineer Alcatel Network Systems Introduction A communication

More information

Model-based system-on-chip design on Altera and Xilinx platforms

Model-based system-on-chip design on Altera and Xilinx platforms CO-DEVELOPMENT MANUFACTURING INNOVATION & SUPPORT Model-based system-on-chip design on Altera and Xilinx platforms Ronald Grootelaar, System Architect RJA.Grootelaar@3t.nl Agenda 3T Company profile Technology

More information

The Motherboard Chapter #5

The Motherboard Chapter #5 The Motherboard Chapter #5 Amy Hissom Key Terms Advanced Transfer Cache (ATC) A type of L2 cache contained within the Pentium processor housing that is embedded on the same core processor die as the CPU

More information

Multicore Programming with LabVIEW Technical Resource Guide

Multicore Programming with LabVIEW Technical Resource Guide Multicore Programming with LabVIEW Technical Resource Guide 2 INTRODUCTORY TOPICS UNDERSTANDING PARALLEL HARDWARE: MULTIPROCESSORS, HYPERTHREADING, DUAL- CORE, MULTICORE AND FPGAS... 5 DIFFERENCES BETWEEN

More information

The Research and Application of College Student Attendance System based on RFID Technology

The Research and Application of College Student Attendance System based on RFID Technology The Research and Application of College Student Attendance System based on RFID Technology Zhang Yuru, Chen Delong and Tan Liping School of Computer and Information Engineering, Harbin University of Commerce,

More information