How To Test The Performance Of Different Communication Architecture On A Computer System


Evaluation of the Traffic-Performance Characteristics of System-on-Chip Communication Architectures

Kanishka Lahiri, Dept. of ECE, UC San Diego
Anand Raghunathan, NEC USA C&C Research Labs, Princeton, NJ
Sujit Dey, Dept. of ECE, UC San Diego

Abstract

The emergence of several communication architectures for System-on-Chips provides designers with a variety of design alternatives. In addition, the need to customize the system architecture for a specific application or domain makes it critical for a designer to be aware of (and to evaluate) the trade-offs involved in selecting an optimal system-level communication architecture. While it is generally known that different communication architectures may be better suited to serve the needs of different applications, very little work has been done on quantitatively comparing and characterizing their performance for different classes of on-chip communication traffic. In this paper, we present a detailed analysis of the performance of various System-on-Chip communication architectures under different classes of on-chip communication traffic. We present high-level models of a few commonly used on-chip architectures, which take into account key architectural features, including their characteristic topologies and communication protocols. We present an efficient methodology to study the performance of each architecture, making use of (i) parameterized traffic generators that help create a wide variety of on-chip communication traffic, and (ii) an implementation independent communication abstraction, to enable plug-and-play evaluation of alternative communication architectures. Our experiments show that the effectiveness of each architecture varies significantly, depending on the characteristics of the communication traffic (average communication rates of common architectures were seen to vary by as much as 49%).
Additionally, they demonstrate the criticality of judiciously selecting an on-chip communication architecture for a given application. We discuss the implications of our experiments, including the relative strengths and weaknesses of the considered architectures, the classes of traffic that each is well suited to, and requirements for system design tools and methodologies in order to support efficient communication architecture selection and customization.

This work was supported by NEC USA Inc., and by the California Micro Program.

1 Introduction

The System-on-Chip paradigm in electronic system design creates new opportunities to improve several system design metrics, including performance, cost, size, power dissipation, and design turn-around time. In order to exploit these potential advantages to the fullest, it is essential to thoroughly explore the large design space made available by the System-on-Chip (SoC) approach. In particular, any SoC design methodology should adequately address two dimensions of system design. First, it is necessary to efficiently and optimally map an application's computation requirements to a set of high-performance computing elements, like CPUs, DSPs, application specific cores, custom logic, etc. In this work we address a second, equally important, dimension of system design: selecting an optimal system-level architecture that provides suitable mechanisms for high-speed on-chip communication. Increasingly complex systems, with heterogeneous (often predesigned) components implementing the application's computation, result in increased volume and diversity of on-chip communication traffic. Hence, it is necessary to judiciously select a communication architecture that best suits the needs of the communication traffic generated by the application.
In addition to selecting a communication architecture from a variety of alternatives [], it is necessary to customize (or tune) the selected architecture for the specific application or domain. Both these factors make it critical for a designer to be aware of, and to evaluate, the trade-offs involved in the selection of an optimal system-level communication architecture. Various competing communication architecture topologies and protocols have been developed and used commercially [2, 3, 4, 5], and it is known that different architectures may be better suited to serve the needs of different applications. For large systems with complex communications, ad hoc techniques that rely on a designer's intuition to assess the characteristics of each application, and to select an architecture to implement its communications, could result in a system with significantly sub-optimal performance. Recognizing this, there has been recent interest in developing fast and accurate system-level performance analysis and communication architecture synthesis techniques [6, 7, 8, 9,,, 2, 3, 4]. However, these efforts do not provide any quantitative assessment or analysis of the relative merits and demerits of commonly available communication architectures for various classes of application traffic. The focus of this paper is to systematically and quantitatively explore the dependence between the performance of various communication architectures and the characteristics of the traffic generated by an application which has been mapped to a set of system components. In addition, we identify parts of the application's communication traffic space for which different communication architectures are well-suited.

1.1 Paper overview and contributions

In this paper, we present a detailed analysis of the performance of various commonly used SoC communication architectures, under several classes of on-chip communication traffic.
The architectures we consider in this paper include a static priority based shared system bus, a two-level hierarchical bus, a TDMA based architecture, and a ring based architecture. For our analysis, we developed high-level simulation models of each of the above communication architectures, taking into account their characteristic topologies, communication protocols, and various architectural parameters. We developed an efficient methodology to study the performance of the above architectures, based on the use of (i) parameterized traffic generators to create a wide variety of on-chip communication traffic, and (ii) an implementation independent abstraction for inter-component communication, in order to minimize the effort involved in composing and analyzing the various candidate communication architectures. Through experiments (based on simulation of the developed models of the architectures, components, and interfaces) we show that the effectiveness of each architecture varies significantly with the characteristics of the communication traffic (performance metrics for common architectures were seen to vary by as much as 49%). We compare the performance of each architecture across a variety of on-chip traffic classes, and study the relative ability of each architecture to handle particular classes of traffic. Overall, we demonstrate the criticality of carefully selecting the on-chip communication architecture with the communication requirements and traffic characteristics of the application in mind. Our work motivates future research in system-level design, to develop techniques that methodically and efficiently explore the trade-offs that arise when selecting a high-performance communication architecture to satisfy an application's communication requirements.

In the next section, we describe each of the architectures that we consider in this paper. In Section 3, we present the experimental methodology, and in Section 4, we present the results of our experiments, including a discussion of the results and their implications. We conclude by commenting on the requirements for system design tools and methodologies to support efficient communication architecture selection and customization.

2 Modeling On-Chip Communication Architectures

In this section, we first introduce some concepts and terminology frequently used in connection with on-chip communication architectures. Next, we present high-level models of each communication architecture considered in this paper, highlighting the architectural features captured by each model.

2.1 Background

The various steps in designing a communication architecture for a System-on-Chip include the following:

Selection of an appropriate topology: The topology typically consists of several shared and dedicated communication channels, to which the various SoC components are connected. Components that can initiate communications are called masters. Examples include CPUs, DSPs, DMA controllers, etc.
Passive SoC components, or slaves (e.g., on-chip memories and peripherals), merely respond to transactions initiated by a master. To enable communication between master-slave pairs connected to different channels, bridges are introduced where appropriate.

Selection of communication protocols: For each channel, the associated protocol specifies the exact manner in which communications across the channel take place, including handshaking conventions, burst mode transfer characteristics, endianness, etc. In addition, shared channels are equipped with mechanisms to manage accesses from multiple SoC masters. These include popular resource management approaches such as round-robin access, priority based selection, and time-division multiplexed access, implemented in centralized or distributed bus arbiters.

Specification of architectural parameters: These define certain properties of the channels and their associated protocols (e.g., bus widths, burst transfer size, priorities, etc.).

Mapping communications onto the architecture: The various communications of the application are mapped to sequences of channels (or paths) in the architecture by choosing appropriate component to channel assignments.

Other than the topology, mapping, protocols and parameters of the communication architecture, another important factor that determines the performance of an application is the clock speed of each channel. For a given process technology, the clock frequency depends on the complexity of the logic, the placement of the various components, and the physical characteristics (capacitance values) and routing of the wires.

An increasingly important issue in designing cores for use in HW/SW systems is the use of a consistent communication interface to facilitate a design methodology where cores can be easily integrated with other system components. Several on-chip bus standards are evolving to realize this goal, most notably that put forward by VSIA [], and more recently, the Open Core Protocol [5].
Using standard interfaces is advantageous because (i) it frees the core developer from having to make any assumptions about the system in which the core will be used, and (ii) it facilitates the development of a variety of novel communication architectures not constrained by the detailed interfacing requirements of each SoC component that they may potentially need to serve.

2.2 Static Priority Based Shared Bus

[Figure 1: Static priority based shared bus architecture. Masters and slaves S1 to S4 share a single bus with an arbiter. Parameters shown: BLK_TRAN_SIZE, a PRIORITY value per master, WIDTH, FREQ.]

The shared system bus with a static priority based arbitration protocol is one of the more commonly used on-chip bus architectures [3]. The bus (Figure 1) is a set of address, data and control lines that are shared by a set of masters that contend among themselves for access to one or more slaves. In our model, the arbiter periodically examines accumulated requests from the four master interfaces, and grants bus access to the master that is of highest priority among all the requesting masters. The bus also supports a burst mode of data transfer, where the master negotiates with the arbiter to send or receive multiple words of data over the bus without incurring the overhead of handshaking for each word. Parameters that characterize this architecture include the maximum size of a burst transfer, the width of the bus (bytes per word), its frequency of operation, and the address space associated with each slave.

2.3 Hierarchical Bus

A hierarchical bus architecture consists of multiple busses, optionally interconnected by bridges. For our experiments, we modeled the architecture of Figure 2, a two level hierarchy connected by a bridge. A static priority based protocol (as described above) is implemented on each of the two busses. Each bus is connected to two master interfaces, two slave interfaces, an arbiter, and a bridge.
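The static priority arbitration and burst-size limit of Section 2.2 can be sketched as follows. This is a minimal illustration, not the simulation model used in the paper: the function names, the convention that a larger number means higher priority, and the priority and burst values in the usage note are all assumptions made for the example.

```python
from typing import Dict, Optional, Set


def static_priority_grant(requests: Set[str],
                          priority: Dict[str, int]) -> Optional[str]:
    """Pick the requesting master with the highest static priority.

    Assumption: a larger integer means a higher priority.
    Returns None when no master is requesting the bus.
    """
    if not requests:
        return None
    return max(requests, key=lambda master: priority[master])


def burst_length(requested_words: int, max_burst: int) -> int:
    """Words granted for one burst: the request, capped at the maximum
    block transfer size (hypothetical parameter name)."""
    return min(requested_words, max_burst)
```

For instance, with hypothetical priorities `{"M1": 3, "M2": 2, "M3": 1, "M4": 4}`, a cycle in which M1 and M2 both request the bus grants M1, and a 20-word request under a 16-word burst limit is served as a 16-word burst followed by a new request for the remainder.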
The bridge needs to support both master and slave interfaces to carry out data transfers initiated by masters on either bus. It does this by activating its slave interface on the bus where the transaction is initiated, and its master interface on the bus where the transaction is targeted. A transaction going across the bridge involves a fair amount of overhead, and during the transfer, both busses remain inaccessible to other components. However, multiple word communications can proceed across the bridge in a pipelined manner. Parameters for this architecture include those mentioned for the shared system bus and are associated with each of the two busses. Priorities determine the access rights of the bridge's master interfaces, and an address space assignment characterizes its slave interfaces.

[Figure 2: Hierarchical bus architecture. Two busses, each with its own arbiter, connected by a bridge with a master/slave interface on each side. Parameters shown for each bus: PRIORITY values for the masters and the bridge, FREQ.]

In such an architecture, the key feature is that each bus is of shorter length, with fewer components contributing to the capacitive loading. Therefore, with other conditions remaining the same, each bus can be clocked at a higher rate than the single shared bus architecture described earlier. Also, transactions can proceed in parallel on the two busses.

Note that the exact implementation of each architecture considered in this paper was chosen to be representative of, but is not necessarily identical to, any commercial implementation. Also, variations and enhancements (e.g., pre-emption, multi-threaded transactions, dynamic bus splitting, etc.) could be applied to any of the architectures, but were not considered in this paper for the sake of simplicity and fair comparison.

2.4 Two-level TDMA based architecture

The third architecture we consider is based on time-division multiplexing. The topology we consider here is also a shared system bus, i.e., all components are connected to the same communication channel. However, in this architecture, the components are provided access to the communication channel in an interleaved manner, using a two level arbitration protocol.

[Figure 3: Two-level TDMA based architecture. (a) Timing wheel and second level round robin. (b) Pipelined word level arbitration and data transfer.]

The first level of arbitration uses a timing wheel where each slot is statically reserved for a unique master (Figure 3(a)). In a single rotation of the wheel, a master which has reserved more than one slot is potentially granted access to the channel multiple times (e.g., M2 and M3 have reserved multiple slots). If the master associated with the current slot has an outstanding request, a single word transfer is granted, and the timing wheel is rotated by one slot. To alleviate the problem of wasted slots (inherent in TDMA based approaches), a second level of arbitration is supported.
The policy is to keep track of the last master to be granted access via the second level of arbitration, and issue a grant to the next requesting master in a round-robin fashion. In the example, the current slot is reserved for M1, but it has no data to communicate. The second level increments a round-robin pointer rr2 from its current position at M2 to the next outstanding request at M4. In this manner, as long as there are outstanding requests, slots are not wasted. Additionally, arbitration is pipelined with word transfers as shown in Figure 3(b). Architectural parameters include the number of slots in the timing wheel, their reservations (these can be used to assign some components a guaranteed fraction of the channel bandwidth), and the width and clock frequency of the channel.

2.5 Ring Based Architecture

The last architecture we consider is a ring (Figure 4), based on a token passing protocol often used in local area networks [6].

[Figure 4: Ring based communication architecture.]

Ring based architectures have also been used in high speed ATM switches [5], their high clock rates making them an attractive alternative to conventional bus architectures. The figure shows our model of a ring based architecture with 8 components attached to the ring through ring interfaces. A special data word circulates on the ring, which each interface can recognize as a token. A ring interface which receives a token is allowed to initiate a transaction. If the interface that receives the token has no pending request, then it forwards the token to its ring neighbor. If it does have a pending request, the ring interface captures the token and writes data into the ring (or reads data off it), one word per ring cycle, for a fixed number of ring cycles. When the transaction is complete, it releases the token. For an arriving data word, a ring interface must examine the address associated with it and check if it belongs to the address space of any slave to which it may be connected.
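The two-level TDMA arbitration of Section 2.4 can be sketched as a single arbitration step. This is a hedged illustration rather than the authors' model: the function name, the zero-based master indices, and the choice to rotate the timing wheel on every step (including steps served by the second level) are assumptions, and pipelining of arbitration with data transfer is not modeled.

```python
from typing import Optional, Sequence, Tuple


def tdma_grant(wheel: Sequence[int], slot: int,
               pending: Sequence[bool],
               rr2: int) -> Tuple[Optional[int], int, int]:
    """One word-level arbitration step.

    wheel[i] is the master index that reserved slot i; pending[m] is True
    when master m has an outstanding request; rr2 is the last master
    granted by the second (round-robin) level.
    Returns (granted master or None, next slot, updated rr2).
    """
    next_slot = (slot + 1) % len(wheel)  # assumed: wheel rotates every step
    owner = wheel[slot]
    if pending[owner]:
        # First level: the slot owner has data, so it gets the word transfer.
        return owner, next_slot, rr2
    # Second level: scan round-robin, starting just after the rr2 pointer.
    n = len(pending)
    for step in range(1, n + 1):
        candidate = (rr2 + step) % n
        if pending[candidate]:
            return candidate, next_slot, candidate
    return None, next_slot, rr2  # no outstanding requests: slot is idle
```

With masters M1 to M4 mapped to indices 0 to 3, the example in the text (slot owned by M1, which is idle; rr2 at M2; only M4 requesting) corresponds to `tdma_grant([0, 1, 1, 2, 2, 3], 0, [False, False, False, True], 1)`, where the wheel layout is hypothetical. The call grants index 3 (M4) and moves rr2 to 3.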
The advantage of the ring based architecture is that the channel is connected to all the components, but each link is point-to-point, and therefore can support much higher clock rates than the previously described architectures. An important parameter is the maximum token holding time, which bounds the maximum number of words a ring interface can send or receive each time it seizes the token.

3 Experimental Methodology

In this section, we present the experimental framework used to evaluate the considered communication architectures. We describe a system test-bed, the use of parameterized traffic generators, and an architecture independent communication interface. Finally, we present the performance metrics used in our study, and illustrate how they were obtained.

3.1 Test-Bed for Performance Evaluation

For our experiments we made use of the POLIS [7] HW/SW codesign environment. All components of the test-bed were modeled using Esterel and C, from which simulation models were generated using POLIS. Schematic capture and HW/SW cosimulation were performed using PTOLEMY [8]. Figure 5 shows a system level test-bed to evaluate the performance of alternative communication architectures for different classes of application traffic. The test-bed was designed to provide flexibility in two respects. First, the components should be capable of generating communication traffic with widely varying characteristics. This is made possible through the use of a set of configurable traffic generators. Second, the system should allow plug-and-play of alternative communication architectures for a given configuration of the traffic generators (or class of application traffic). This is enabled by making use of an architecture independent communication interface.
For our experiments, we chose a flexible test-bed which lends itself to systematic and convenient experimentation instead of specific benchmark applications, since such a test-bed provides better control over the characteristics of the generated communication traffic and is easily scalable. The test-bed consists of 8 components exchanging variable quantities of data and control messages during the course of their execution. Components M1 through M4 are masters, while components S1 through S4 are slaves. Each master is connected to a parameterized traffic generator. The parameters of the traffic generator can be varied to control the characteristics of the communication traffic generated by the SoC component to which it is connected.

[Figure 5: Test-bed for communication architecture evaluation. Traffic generators drive the master components, which connect through master interfaces to the communication architecture; slave components S1 to S4 connect through slave interfaces.]

3.2 Traffic Generators

Each traffic generator can be configured by setting several parameter values prior to a co-simulation run. Parameters to be chosen include the following:

The type of distribution followed by the size of the communication requests (currently supported distributions are Gaussian, uniform, Poisson, exponential);
The distribution parameters of each communication request (e.g., mean and variance in the case of Gaussian requests);
The distribution type of the inter-request intervals;
Distribution parameters of the inter-request interval;
Probabilities of accessing each of the slave devices.

The first four sets of parameters control the bandwidth requirements of each SoC master, as well as the temporal properties of its communications, such as regularity, burstiness, etc. The last set of parameters controls the spatial distribution of the communication traffic. A specification of parameter values for all the traffic generators defines a point in the communication traffic space.

3.3 Architecture Independent Communication Interface

The basic signaling involved at the communication interface is shown in Figure 6(a). The req lines from the master to the communication architecture are asserted whenever the master needs to transfer data to or from a slave.
Doubling as the address bus, the request lines (with configurable width) also uniquely determine the target slave (slave addresses are mapped into a single flat address space). n_word specifies the number of bus words to be transferred. rnw specifies a read operation when set and a write operation when unset. data_bus (of parameterized width) is used for both reads and writes. gnt_size defines how many words a master can transfer. wake_up is used to check if the slave is busy. ready is asserted when the slave is ready.

Figure 6(b) shows a timing diagram of an example transaction executing at the communication architecture. The transaction proceeds in two phases. In the first phase, the master asserts req (which carries the value of the address to be accessed) and n_word, and drives the rnw bit. In the example, the master requests a 10 word write starting at address x. After 5 cycles, it receives gnt_size with value 5, which also indicates that transmission can start. In the second phase, the master executes the data transfer, writing 5 words onto the communication fabric, at the rate of one word per bus cycle. At the end of this transfer, since the original request of 10 words is not completely satisfied, another 5 word write request is asserted. Figure 6(b) also shows the wait and transmit cycles incurred by the SoC master device during the transaction. In this example, the component spent 21 cycles waiting, and 20 cycles writing to the data bus, for a transaction of 10 words.

3.4 Performance Measurement

To measure the performance of the architectures under study for different points in the communication traffic space, the following procedure was adopted. For each SoC component in the top level schematic, three counters were instantiated:

bus_words: counts the total number of bus words read or written by the SoC component during a simulation run.
wait_cycles: counts the total number of clock cycles the SoC component spends between assertion of the req signal and reception of the gnt_size signal.
tx_cycles: sums the total transmission delay for all communication transactions. For each transaction, the delay is given by the elapsed cycles from the arrival of the grant signal to the point when the last word of the transaction arrives (either at the master or the slave, depending on whether the transaction is a read or a write).

After a co-simulation run, the average communication rate is calculated as follows. The average number of clock cycles per bus word is given by (wait_cycles + tx_cycles) / bus_words. For example, in Figure 6(b), (21 + 20) or 41 clock cycles are spent transferring 10 words. If this pattern repeats indefinitely, the average delay is 4.1 clock cycles per word. For a bus of frequency 100 MHz and word size 64 bits, the average communication rate of the architecture is (64 x 10^8 / 4.1) bits/sec, or 195 Mbytes/sec. Note that the metric described above represents the average rate of communication supported by the communication architecture during periods when it is being utilized for a communication transaction. Therefore the metric will not be spuriously reduced when the communication architecture is idle due to lack of access requests.

4 Experimental Results

In this section, we describe the experiments conducted on each of the architectures described in Section 2. We discuss the results of the experiments and their implications, point out the relative strengths and weaknesses of the considered architectures, identify classes of traffic that each is well suited to, and comment on the requirements for system design tools and methodologies in order to support efficient communication architecture selection and customization. We conducted several experiments to study the comparative performance of the various architectures at different points in the communication traffic space.
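The average communication rate metric of Section 3.4 can be computed from the three counters as sketched below. The function names and the convention that 1 Mbyte equals 10^6 bytes are assumptions made for this illustration; the input values in the usage note are illustrative rather than measured data.

```python
def cycles_per_word(wait_cycles: int, tx_cycles: int, bus_words: int) -> float:
    """Average clock cycles per bus word: (wait + tx) / words."""
    if bus_words <= 0:
        raise ValueError("bus_words must be positive")
    return (wait_cycles + tx_cycles) / bus_words


def avg_rate_mbytes_per_sec(wait_cycles: int, tx_cycles: int, bus_words: int,
                            freq_hz: float, word_bits: int) -> float:
    """Average communication rate while the architecture is in use.

    Idle periods do not contribute, because only cycles spent waiting
    for a grant or transferring data are counted.
    Assumption: 1 Mbyte = 10**6 bytes.
    """
    cpw = cycles_per_word(wait_cycles, tx_cycles, bus_words)
    bits_per_sec = word_bits * freq_hz / cpw
    return bits_per_sec / 8 / 1e6
```

As an illustration, a component that waits 21 cycles and transmits for 20 cycles to move 10 words averages 4.1 cycles per word; on a 64-bit channel clocked at 100 MHz, `avg_rate_mbytes_per_sec(21, 20, 10, 100e6, 64)` evaluates to roughly 195 Mbytes/sec.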
The characteristics of the communication traffic were controlled by configuring the parameterized traffic generators, and for each configuration, we measured the average communication rate of each architecture (as described in Section 3) by simulating extensive input traces. In all cases, measured values converged, and further simulation did not cause significant variation. For unbiased comparison, system parameters were carefully assigned to ensure that (i) characteristics of the traffic presented to each architecture were identical and (ii) each architecture was assigned parameter values that were consistent across equivalent cost implementations. For example, since the clock period of the communication channels is influenced by capacitive loading of the interconnect wires, the frequency of each channel was assigned in inverse proportion to the number of components attached to it. In our experimental set up, this resulted in highest clock rates for the ring based architecture, proportionally lower rates for the hierarchical architecture, and lowest rates for the TDMA and static priority based busses. The maximum block transfer size of the static priority and hierarchical architectures, the maximum token holding time of the ring based architecture, and the number of contiguously reserved slots for each component in the TDMA architecture were set to an identical value ( bus words), while all the traffic generators were configured to generate communication traffic with similar characteristics.

[Figure 6: Architecture independent communication interface. (a) Master and slave signals: the master interface uses req, n_word, rnw, data_bus and gnt_size; the slave interface uses wake_up, ready, rnw, address and data_bus. (b) Example of a write transaction at a master, annotated with wait_cycles and tx_cycles.]

4.1 Effect of Request Sizes and Intervals

The aim of the first set of experiments is to evaluate the performance characteristics of each architecture under communication traffic with a variety of average request sizes and inter-request intervals. Figures 7(a)-(d) present results of these experiments. The X-Y plane in each figure denotes different points in the communication traffic space, with communication request sizes and intervals following Gaussian distributions. The X-axis denotes different values of the mean inter-request interval while the Y-axis denotes different values of the mean request size. In all cases, the standard deviation was 25% of the mean. The master-slave access probabilities were held constant, such that each master was equally likely to access each slave. The Z-axis denotes the average communication rate delivered by the communication architecture. From the graphs, we make the following observations:

For each architecture, the average communication rate increases with increasing values of the mean inter-request interval.
For example, in the ring based architecture, for requests with mean size, the communication rate improves from 33 Mbytes/sec (at intervals of cycles) to 28 Mbytes/sec (at intervals of cycles). This is because larger intervals result in fewer conflicts (hence smaller waiting times) for access to shared communication resources.

In most cases, with increasing mean request size, the average communication rate initially improves, and then deteriorates. For example, in the case of the static priority based bus, for requests at mean intervals of cycles, with increasing request size, the average communication rate increases from 3 Mbytes/sec to 68 Mbytes/sec, and eventually decreases to 73 Mbytes/sec. This is due to a tradeoff between the increased efficiency of transferring larger amounts of data (mechanisms such as burst transfers and pipelining help reduce arbitration overheads) and the price of increased system load (larger transfers result in more conflicts, and hence increased waiting times).

At a given point in the traffic space, different architectures can have significantly different performance. For example, when the mean size is words, and the mean interval is cycles, the average rates of architectures (a) through (d) are 87, 92, 78, and 22 Mbytes/sec respectively.

The percentage variation in performance across different parts of the communication traffic space is significant, and differs from architecture to architecture. The ring and hierarchical bus architectures exhibit greater sensitivity to the characteristics of the communication traffic (communication rates vary by 49% and 355% respectively) than the static priority and TDMA based architectures (where rates vary by 28% and 26% respectively). Consequently, architectures with higher sensitivity should be chosen only if the characteristics of the application's traffic are likely to remain in parts of the space where performance is acceptable.

No single architecture uniformly outperforms others.
For example, at request size, and interval size, the highest average rate (4 Mbytes/sec), is provided by the ring based architecture, while at size, and interval, it is provided by the TDMA architecture (29 Mbytes/sec).

The ring based architecture performs well for most parts of the space, providing up to 36 Mbytes/sec for certain types of traffic. This is because the superior bandwidth of each link in the ring permits high-speed pipelined data transfers. However, for small request sizes, the dominating factor in each transaction is the latency of communicating with a remote slave. In such cases, the total transmission delay (including overheads) is highly dependent on the number of ring segments each word has to traverse. The observed communication rates (under uniformly distributed slave accesses) were as low as 62 Mbytes/sec.

The hierarchical bus also exhibits comparatively high average communication rates (up to 3 Mbytes/sec). This is due to the higher degree of parallelism inherent in the architecture, and the high bandwidth of each bus. However, here too, the advantages are limited when the requests are small and have uniformly distributed addresses (observed communication rates were as low as 65 Mbytes/sec). The main factor which contributes to these low rates is the high latency of communications involving both busses, each of which incurs two bus transmission delays, and extra overhead due to additional handshaking at the bridge. At request sizes of, communications across the bridge take place in a pipelined manner, effectively reducing the impact of these high latencies. However, this benefit is limited for larger request sizes, because during each bridge transaction, both the busses remain inaccessible to other system components for longer periods of time, resulting in lower average communication rates.
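The traffic configuration used in this set of experiments (Gaussian request sizes and inter-request intervals with a standard deviation of 25% of the mean, and equal access probability for every slave) can be sketched as a small generator. The class and field names are hypothetical, the clamping of sampled values to valid ranges is an assumption, and the mean values in the usage note are illustrative rather than the values used in the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GaussianTrafficGenerator:
    """Hypothetical sketch of one master's parameterized traffic generator."""
    mean_size: float          # mean request size, in bus words
    mean_interval: float      # mean inter-request interval, in cycles
    slave_probs: List[float]  # probability of targeting each slave
    rel_std: float = 0.25     # standard deviation as a fraction of the mean
    seed: int = 0             # fixed seed for repeatable traces

    def __post_init__(self) -> None:
        if abs(sum(self.slave_probs) - 1.0) > 1e-9:
            raise ValueError("slave probabilities must sum to 1")
        self._rng = random.Random(self.seed)

    def next_request(self) -> Tuple[int, int, int]:
        """Return (interval in cycles, size in words, target slave index)."""
        size = round(self._rng.gauss(self.mean_size,
                                     self.rel_std * self.mean_size))
        interval = round(self._rng.gauss(self.mean_interval,
                                         self.rel_std * self.mean_interval))
        slave = self._rng.choices(range(len(self.slave_probs)),
                                  weights=self.slave_probs)[0]
        # Assumed clamping: at least one word, never a negative interval.
        return max(0, interval), max(1, size), slave
```

A generator built as `GaussianTrafficGenerator(16, 100, [0.25] * 4)` yields a repeatable stream of requests; sweeping `mean_size` and `mean_interval` over a grid corresponds to moving across the X-Y plane of the traffic space described above.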
Figure 7: Performance of alternative communication architectures across the communication traffic space. (a) Static priority based shared bus architecture (b) Hierarchical bus architecture (c) Two level TDMA architecture (d) Ring based architecture

4.2 Effect of Spatial Locality of Communication Traffic

The next experiment we carried out examined the effect of varying the slave access probabilities for each master device. Note that changing these probabilities directly affects the spatial locality of the communication traffic. Here we chose 5 cases of probability values, where each case consists of a specification of 16 probabilities, one for each master-slave pair. In Case 1, communication occurs only between the following pairs: (M1, S1), (M2, S2), (M3, S3) and (M4, S4). In Case 2, M1 and M2 each have 8% likelihood of accessing one of (S1, S2), while M3 and M4 are each 8% likely to access one of (S3, S4). In Case 3, communication occurs between pairs (M1, S4), (M2, S1), (M3, S2) and (M4, S3). Case 4 is the opposite of Case 2 (M3 and M4 are 8% likely to access either of S1, S2), while Case 5 is its extreme (M3 and M4 communicate only with S1 and S2; M1 and M2 communicate only with S3 and S4). For this experiment, other traffic characteristics were held constant (request sizes averaged words at a mean interval of bus cycles). For each of the above cases, the average communication rate of the various architectures is shown in Figure 8. From the figure, we make the following observations:

Variations in the access patterns have no effect on the static priority or TDMA based busses. For these architectures, irrespective of which master and slave pairs are communicating, the number of clock cycles spent in transferring a single word is constant. Also, only one master-slave pair can be actively communicating at any given time.

In the ring based architecture, for a given arrangement of components on the ring, the spatial distribution of accesses has an impact on performance. With the components arranged as shown (Figure 4), Case 1 represents the best case, where each master only communicates with its immediately neighboring slave. This results in an average communication rate of 23 Mbytes/sec. Case 3 represents the worst case, where each communication traverses the entire ring, resulting in an average rate of 96 Mbytes/sec.
However, since transactions take place in a pipelined manner, the effect diminishes with increasing transaction size.

The performance of the hierarchical bus is very sensitive to slave access patterns, with the average communication rate varying from 42 Mbytes/sec to 299 Mbytes/sec, a variation of %. The best performance is seen when communication transactions in the system have high spatial locality and remain within the address space of each bus. Cases 1 through 5 represent decreasing locality for the arrangement of components shown in Figure 2. In Case 1, all transactions are highly localized, and no communications go across the bridge, while Cases 3, 4 and 5 represent cases where 2%, %, 8% and % of the communications involve the bridge.

Figure 8: Effect of varying slave access patterns (series: ring, hierarchical, static priority, TDMA; horizontal axis: slave access patterns, Case 1 through Case 5)

4.3 Performance at Low Levels of Contention

In the third experiment, we illustrate the ability of each communication architecture to deal with traffic conditions where, most often, exactly one master needs to communicate, i.e., there are very few conflicts for access to the communication architecture. We modeled such conditions by shutting down all the traffic generators except one, and having it generate single-word requests at short intervals. The results (Figure 9) indicate the following: (a) the static priority bus performs the best, providing the highest average communication rate of 43 Mbytes/sec; (b) the hierarchical bus at 4 Mbytes/sec is 25% poorer. The reasons for this are (i) the potential benefit of parallelism offered by this architecture is not utilized by this type of traffic, and (ii) small communication requests equally distributed throughout the address space have significant latencies owing to frequent transactions that involve the bridge; (c) the TDMA based architecture provides poorer performance than the previous two, because many of the TDMA slots are assigned via the second level of arbitration (Section 2), since only one component accesses the bus. The extra cycles spent in arbitration result in an average rate of Mbytes/sec, 29% poorer performance than the static priority based bus; (d) the ring based architecture provides the poorest performance, because each time the active component releases the token, it must wait until the token traverses the entire ring before it can start using the channel again. The observed rate (8 Mbytes/sec) is 75% lower than that offered by the static priority based single shared bus.

Figure 9: Performance comparison at low contention (vertical axis: average communication rate; series: ring, hierarchical, static priority, TDMA)

4.4 Summary

In summary, it is very important to carefully assess the properties of the communication traffic generated by an application while selecting a communication architecture. An incorrect choice may result in significantly sub-optimal performance. For instance, choosing a ring based architecture for a system where, most of the time, the only active master is a processor core could severely impede the application's performance.
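The low-contention penalty of the ring described in Section 4.3 can be made concrete with a back-of-the-envelope model. This is a minimal sketch under stated assumptions, not the model used in the experiments: the function names, the one-cycle token hop, the one-cycle bus arbitration, and the example parameters are all hypothetical.

```python
def ring_cycles_per_request(transfer_cycles, ring_nodes, token_hop_cycles=1):
    """Cycles between successive requests of the only active master on a token ring.

    Assumed model: after each transfer the master releases the token, which
    must pass through all `ring_nodes` positions before it returns.
    """
    return transfer_cycles + ring_nodes * token_hop_cycles


def bus_cycles_per_request(transfer_cycles, arbitration_cycles=1):
    """Cycles per request on a single shared bus with a fixed arbitration cost."""
    return transfer_cycles + arbitration_cycles


def ring_to_bus_throughput_ratio(transfer_cycles, ring_nodes,
                                 token_hop_cycles=1, arbitration_cycles=1):
    """Ring throughput divided by bus throughput for back-to-back requests."""
    ring = ring_cycles_per_request(transfer_cycles, ring_nodes, token_hop_cycles)
    bus = bus_cycles_per_request(transfer_cycles, arbitration_cycles)
    return bus / ring


if __name__ == "__main__":
    # Hypothetical numbers: 2-cycle single-word transfer, 8 ring positions.
    print(ring_to_bus_throughput_ratio(transfer_cycles=2, ring_nodes=8))
```

With these illustrative parameters the ring delivers less than a third of the bus throughput. The exact figure depends entirely on the assumed cycle counts, but the token round trip dominates whenever transfers are short.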
Depending on the characteristics of the on-chip communication traffic, the choice of the best communication architecture could vary, since there is no architecture that is optimal for all types of traffic. The above experiments suggest that one needs to be cautious when choosing a traffic-sensitive communication architecture (such as a hierarchical bus), to ensure compatibility of the communication traffic and the chosen architecture.

Our experiments demonstrate that there is a crucial need for system design tools and methodologies to support methodical and careful exploration of available choices when selecting a communication architecture for an application with specific communication traffic characteristics. Current design methodologies (including the one used for our experiments) are far from being able to provide such frameworks. Simulation based methods, though accurate, can be very expensive and time consuming, and hence are infeasible for achieving the above goals when dealing with large, complex systems. Our experiments motivate a new research direction whose aim is to develop techniques to extract an application's on-chip communication traffic characteristics, and then efficiently and accurately predict the performance of the system under candidate communication architectures. A step further would be to develop techniques that automatically suggest to the designer the most suitable communication architecture for the target application. Tools such as these will aid a designer in achieving better optimized solutions for high-performance systems, while at the same time helping to reduce design turn-around time.

5 Conclusions and Future Work

In this work, we evaluated the performance characteristics of several communication architectures under different classes of communication traffic. We demonstrated the importance of selecting an architecture that is well suited to the characteristics of the traffic generated by an application.
We presented our use of high-level models of commonly used communication architectures, and a framework that aided systematic and efficient comparison of their performance. Our conclusion is that the optimality of a communication architecture is highly dependent on specific properties of the traffic generated by the application. Consequently, the selection and design of communication architectures should be aided by automated, systematic analysis of the system's communication behavior. Since current system design methodologies are not geared towards providing a designer with such feedback, we plan to address this gap in the future by (i) developing models to extensively capture the characteristics of on-chip communication traffic, and (ii) developing efficient analysis and exploration tools that use such characterizations to guide the design of communication architectures that are highly optimized for a target application.
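As a supplement to the ring results of Section 4.2, the following sketch computes the mean number of ring segments traversed under a given slave access pattern. The ring ordering used here (M1, S1, M2, S2, M3, S3, M4, S4, traversed in one direction) is an assumption, since the arrangement of Figure 4 is not reproduced in this text; the function and variable names are likewise hypothetical. All masters are assumed to be equally active.

```python
# Assumed unidirectional ring order; the actual arrangement (Figure 4) may differ.
RING_ORDER = ["M1", "S1", "M2", "S2", "M3", "S3", "M4", "S4"]


def hops(src, dst, order=RING_ORDER):
    """Segments traversed from `src` to `dst` moving in one direction only."""
    return (order.index(dst) - order.index(src)) % len(order)


def mean_hops(pattern, order=RING_ORDER):
    """Mean hops per request; `pattern` maps master -> {slave: probability}."""
    per_master = [
        sum(p * hops(master, slave, order) for slave, p in slaves.items())
        for master, slaves in pattern.items()
    ]
    return sum(per_master) / len(per_master)


# Case 1: each master talks only to its own slave.
CASE_1 = {"M1": {"S1": 1.0}, "M2": {"S2": 1.0},
          "M3": {"S3": 1.0}, "M4": {"S4": 1.0}}

# Case 3: pairs (M1, S4), (M2, S1), (M3, S2), (M4, S3).
CASE_3 = {"M1": {"S4": 1.0}, "M2": {"S1": 1.0},
          "M3": {"S2": 1.0}, "M4": {"S3": 1.0}}

if __name__ == "__main__":
    print("Case 1 mean hops:", mean_hops(CASE_1))
    print("Case 3 mean hops:", mean_hops(CASE_3))
```

Under the assumed ordering, Case 1 needs a single segment per request while Case 3 needs seven of the eight, which is consistent with the best-case and worst-case descriptions in Section 4.2.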


More information

SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs

SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND Flash-Based SSDs SOS: Software-Based Out-of-Order Scheduling for High-Performance NAND -Based SSDs Sangwook Shane Hahn, Sungjin Lee, and Jihong Kim Department of Computer Science and Engineering, Seoul National University,

More information

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses Overview of Real-Time Scheduling Embedded Real-Time Software Lecture 3 Lecture Outline Overview of real-time scheduling algorithms Clock-driven Weighted round-robin Priority-driven Dynamic vs. static Deadline

More information

DESIGN CHALLENGES OF TECHNOLOGY SCALING

DESIGN CHALLENGES OF TECHNOLOGY SCALING DESIGN CHALLENGES OF TECHNOLOGY SCALING IS PROCESS TECHNOLOGY MEETING THE GOALS PREDICTED BY SCALING THEORY? AN ANALYSIS OF MICROPROCESSOR PERFORMANCE, TRANSISTOR DENSITY, AND POWER TRENDS THROUGH SUCCESSIVE

More information

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy

Hardware Implementation of Improved Adaptive NoC Router with Flit Flow History based Load Balancing Selection Strategy Hardware Implementation of Improved Adaptive NoC Rer with Flit Flow History based Load Balancing Selection Strategy Parag Parandkar 1, Sumant Katiyal 2, Geetesh Kwatra 3 1,3 Research Scholar, School of

More information

Load Balancing in Structured Peer to Peer Systems

Load Balancing in Structured Peer to Peer Systems Load Balancing in Structured Peer to Peer Systems DR.K.P.KALIYAMURTHIE 1, D.PARAMESWARI 2 Professor and Head, Dept. of IT, Bharath University, Chennai-600 073 1 Asst. Prof. (SG), Dept. of Computer Applications,

More information

TCP over Multi-hop Wireless Networks * Overview of Transmission Control Protocol / Internet Protocol (TCP/IP) Internet Protocol (IP)

TCP over Multi-hop Wireless Networks * Overview of Transmission Control Protocol / Internet Protocol (TCP/IP) Internet Protocol (IP) TCP over Multi-hop Wireless Networks * Overview of Transmission Control Protocol / Internet Protocol (TCP/IP) *Slides adapted from a talk given by Nitin Vaidya. Wireless Computing and Network Systems Page

More information

Load Balancing in Structured Peer to Peer Systems

Load Balancing in Structured Peer to Peer Systems Load Balancing in Structured Peer to Peer Systems Dr.K.P.Kaliyamurthie 1, D.Parameswari 2 1.Professor and Head, Dept. of IT, Bharath University, Chennai-600 073. 2.Asst. Prof.(SG), Dept. of Computer Applications,

More information

Controlled Random Access Methods

Controlled Random Access Methods Helsinki University of Technology S-72.333 Postgraduate Seminar on Radio Communications Controlled Random Access Methods Er Liu liuer@cc.hut.fi Communications Laboratory 09.03.2004 Content of Presentation

More information

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow

Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow Asynchronous IC Interconnect Network Design and Implementation Using a Standard ASIC Flow Bradley R. Quinton Dept. of Electrical and Computer Engineering University of British Columbia bradq@ece.ubc.ca

More information

Introduction to PCI Express Positioning Information

Introduction to PCI Express Positioning Information Introduction to PCI Express Positioning Information Main PCI Express is the latest development in PCI to support adapters and devices. The technology is aimed at multiple market segments, meaning that

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,

More information

MLPPP Deployment Using the PA-MC-T3-EC and PA-MC-2T3-EC

MLPPP Deployment Using the PA-MC-T3-EC and PA-MC-2T3-EC MLPPP Deployment Using the PA-MC-T3-EC and PA-MC-2T3-EC Overview Summary The new enhanced-capability port adapters are targeted to replace the following Cisco port adapters: 1-port T3 Serial Port Adapter

More information

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances: Scheduling Scheduling Scheduling levels Long-term scheduling. Selects which jobs shall be allowed to enter the system. Only used in batch systems. Medium-term scheduling. Performs swapin-swapout operations

More information

A Power Efficient QoS Provisioning Architecture for Wireless Ad Hoc Networks

A Power Efficient QoS Provisioning Architecture for Wireless Ad Hoc Networks A Power Efficient QoS Provisioning Architecture for Wireless Ad Hoc Networks Didem Gozupek 1,Symeon Papavassiliou 2, Nirwan Ansari 1, and Jie Yang 1 1 Department of Electrical and Computer Engineering

More information

Performance Analysis of Storage Area Network Switches

Performance Analysis of Storage Area Network Switches Performance Analysis of Storage Area Network Switches Andrea Bianco, Paolo Giaccone, Enrico Maria Giraudo, Fabio Neri, Enrico Schiattarella Dipartimento di Elettronica - Politecnico di Torino - Italy e-mail:

More information

Power Reduction Techniques in the SoC Clock Network. Clock Power

Power Reduction Techniques in the SoC Clock Network. Clock Power Power Reduction Techniques in the SoC Network Low Power Design for SoCs ASIC Tutorial SoC.1 Power Why clock power is important/large» Generally the signal with the highest frequency» Typically drives a

More information

A case study of mobile SoC architecture design based on transaction-level modeling

A case study of mobile SoC architecture design based on transaction-level modeling A case study of mobile SoC architecture design based on transaction-level modeling Eui-Young Chung School of Electrical & Electronic Eng. Yonsei University 1 EUI-YOUNG(EY) CHUNG, EY CHUNG Outline Introduction

More information

COMPUTER HARDWARE. Input- Output and Communication Memory Systems

COMPUTER HARDWARE. Input- Output and Communication Memory Systems COMPUTER HARDWARE Input- Output and Communication Memory Systems Computer I/O I/O devices commonly found in Computer systems Keyboards Displays Printers Magnetic Drives Compact disk read only memory (CD-ROM)

More information

AXI Performance Monitor v5.0

AXI Performance Monitor v5.0 AXI Performance Monitor v5.0 LogiCORE IP Product Guide Vivado Design Suite Table of Contents IP Facts Chapter 1: Overview Advanced Mode...................................................................

More information

Chapter 11 I/O Management and Disk Scheduling

Chapter 11 I/O Management and Disk Scheduling Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization

More information

Dynamic Load Balance Algorithm (DLBA) for IEEE 802.11 Wireless LAN

Dynamic Load Balance Algorithm (DLBA) for IEEE 802.11 Wireless LAN Tamkang Journal of Science and Engineering, vol. 2, No. 1 pp. 45-52 (1999) 45 Dynamic Load Balance Algorithm () for IEEE 802.11 Wireless LAN Shiann-Tsong Sheu and Chih-Chiang Wu Department of Electrical

More information

Fractal Networking Advantages and Disadvantages

Fractal Networking Advantages and Disadvantages Halmstad University Post-Print Fibre-ribbon ring network with inherent support for earliest deadline first message scheduling Carl Bergenhem and Magnus Jonsson N.B.: When citing this work, cite the original

More information

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012)

Lecture 18: Interconnection Networks. CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Lecture 18: Interconnection Networks CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Project deadlines: - Mon, April 2: project proposal: 1-2 page writeup - Fri,

More information

Ethernet. Ethernet Frame Structure. Ethernet Frame Structure (more) Ethernet: uses CSMA/CD

Ethernet. Ethernet Frame Structure. Ethernet Frame Structure (more) Ethernet: uses CSMA/CD Ethernet dominant LAN technology: cheap -- $20 for 100Mbs! first widely used LAN technology Simpler, cheaper than token rings and ATM Kept up with speed race: 10, 100, 1000 Mbps Metcalfe s Etheret sketch

More information

Computer Networks Vs. Distributed Systems

Computer Networks Vs. Distributed Systems Computer Networks Vs. Distributed Systems Computer Networks: A computer network is an interconnected collection of autonomous computers able to exchange information. A computer network usually require

More information

Computer Organization & Architecture Lecture #19

Computer Organization & Architecture Lecture #19 Computer Organization & Architecture Lecture #19 Input/Output The computer system s I/O architecture is its interface to the outside world. This architecture is designed to provide a systematic means of

More information

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.

International journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer. RESEARCH ARTICLE ISSN: 2321-7758 GLOBAL LOAD DISTRIBUTION USING SKIP GRAPH, BATON AND CHORD J.K.JEEVITHA, B.KARTHIKA* Information Technology,PSNA College of Engineering & Technology, Dindigul, India Article

More information

Chapter 13 Selected Storage Systems and Interface

Chapter 13 Selected Storage Systems and Interface Chapter 13 Selected Storage Systems and Interface Chapter 13 Objectives Appreciate the role of enterprise storage as a distinct architectural entity. Expand upon basic I/O concepts to include storage protocols.

More information

AN ECONOMICS-BASED POWER-AWARE PROTOCOL FOR COMPUTATION DISTRIBUTION IN MOBILE AD-HOC NETWORKS

AN ECONOMICS-BASED POWER-AWARE PROTOCOL FOR COMPUTATION DISTRIBUTION IN MOBILE AD-HOC NETWORKS AN ECONOMICS-BASED POWER-AWARE PROTOCOL FOR COMPUTATION DISTRIBUTION IN MOBILE AD-HOC NETWORKS Li Shang, Robert P. Dick, and Nira K. Jha Department of Electrical Engineering Princeton University, Princeton,

More information

Performance Workload Design

Performance Workload Design Performance Workload Design The goal of this paper is to show the basic principles involved in designing a workload for performance and scalability testing. We will understand how to achieve these principles

More information

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip

A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip www.ijcsi.org 241 A CDMA Based Scalable Hierarchical Architecture for Network- On-Chip Ahmed A. El Badry 1 and Mohamed A. Abd El Ghany 2 1 Communications Engineering Dept., German University in Cairo,

More information

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General

More information

A Hardware-Software Cosynthesis Technique Based on Heterogeneous Multiprocessor Scheduling

A Hardware-Software Cosynthesis Technique Based on Heterogeneous Multiprocessor Scheduling A Hardware-Software Cosynthesis Technique Based on Heterogeneous Multiprocessor Scheduling ABSTRACT Hyunok Oh cosynthesis problem targeting the system-on-chip (SOC) design. The proposed algorithm covers

More information

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Abstract AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Mrs. Amandeep Kaur, Assistant Professor, Department of Computer Application, Apeejay Institute of Management, Ramamandi, Jalandhar-144001, Punjab,

More information

Quality of Service versus Fairness. Inelastic Applications. QoS Analogy: Surface Mail. How to Provide QoS?

Quality of Service versus Fairness. Inelastic Applications. QoS Analogy: Surface Mail. How to Provide QoS? 18-345: Introduction to Telecommunication Networks Lectures 20: Quality of Service Peter Steenkiste Spring 2015 www.cs.cmu.edu/~prs/nets-ece Overview What is QoS? Queuing discipline and scheduling Traffic

More information