Bare-metal message passing API for many-core systems


Johannes Scheller
Institut Supérieur de l'Aéronautique et de l'Espace, Toulouse, France

Eric Noulard, Claire Pagetti and Wolfgang Puffitsch
ONERA-DTIM, Toulouse, France

This paper describes a bare-metal multiple instruction, multiple data message passing library for the Intel Single Chip Cloud Computer. The library's design, the derived notion of a global time across the cores, and the verification of the send and receive functions are highlighted. Finally, a use-case example implementing a pseudo AFDX network shows that the message passing performance is not limited by the mesh network but by the workloads of the cores.

I. Introduction

A. Context

Safety-critical embedded systems require a timing-predictable implementation in order to prove their correctness. For both cost and time reasons, embedded systems rely heavily on commercial off-the-shelf (COTS) hardware. Ever more complex chip designs and shared caches in commercially available chip multiprocessors (CMP) make it difficult, if not impossible, to ensure predictable timing behavior. For instance, the time to execute a simple memory access may vary greatly due to mechanisms such as pipelining, speculative and out-of-order execution, etc. [1] Many-core technology potentially holds the key to resolving some of these issues. These processors combine numerous cores with a Network-on-Chip (NoC) based communication network. This structure has various advantages. One of them, achieved by explicit message passing, is reducing conflicts and allowing full control over when cores communicate. Furthermore, the large number of cores on these chips allows dedicating a core, or a cluster of cores, to a single application. The Intel Single-chip Cloud Computer (SCC) is an experimental platform designed to evaluate many-core processors and hardware-assisted message passing using an on-die mesh network. [2]

B. Programming solutions for predictability

In addition to adding considerable overhead, the use of an operating system (OS) also introduces additional non-predictable mechanisms such as interrupts, preemption, etc. Bare-metal programming usually means programming directly on the underlying hardware, thereby avoiding this additional overhead. In addition, the programmer has full control over the execution of the application, resource usage, and the allocation on the chip. The user has the choice between a single program, multiple data (SPMD) and a multiple instruction, multiple data (MIMD) programming model. In SPMD programming, the same executable is launched on each core, whereas in the MIMD programming model each core is provided with its individual executable, thereby significantly reducing the memory footprint on each core. In addition, MIMD reduces the complexity of the individual programs run on the cores and increases predictability, as a core's application contains only the parts essential for that specific core.

Young graduate, address: johannes.scheller@gmail.com
Researcher, address: eric.noulard@onera.fr
Researcher, address: claire.pagetti@onera.fr
Post-doctoral research fellow, address: wolfgang.puffitsch@onera.fr

C. Related work

The work presented in this paper is based upon the BareMichael bare-metal framework. [3] BareMichael is a minimalistic bare-metal framework for the Intel SCC. The framework brings the cores from real into protected mode and sets up a flat memory space. Since version 6, BareMichael supports SPMD message passing using RCCEv2. RCCE is the message passing library provided by Intel. [4,5] It is based upon put and get primitives which conduct a memory copy from and to the desired memory address. RCCE was designed to run both under Linux and bare-metal. It uses a static SPMD model; therefore, the user has to specify the cores with which to communicate upon startup. The same executable is then launched on the specified cores. Since the RCCE message passing library only allows SPMD programs, it did not fit our demands. iRCCE, which was developed by RWTH Aachen, extends RCCE with non-blocking send and receive primitives and uses an improved memory copy algorithm. [6] Nevertheless, iRCCE still relies on a static SPMD programming model; hence it is not suitable for our purposes either. On the other hand, both the send and receive functions of our message passing library use the iRCCE put and get functions in order to copy data.

ET International (ETI) also offers a bare-metal framework for the Intel SCC. [7] ETI provides full libc support and also includes a simulation environment for Linux. Unfortunately, the ETI framework is closed source and only supports SPMD programs. The provided streaming API, which allows for inter-core communication, does not use the MPBs to transfer messages. Instead, the API uses the on-tile MPBs only to signal the transfer of a message; the messages themselves are transferred using the external DRAM. Since the ETI framework uses the MPB only for control and not for messages, and it only supports an SPMD programming model, it does not fulfill all our needs.

MetalSVM, [8] which was developed by RWTH Aachen, implements a bare-metal hypervisor on the Intel SCC. Their implementation goes beyond simply exchanging messages between independent applications: they create a shared virtual memory that can be accessed by the applications. Finally, Altreonic ported OpenComRTOS, [9] a network-centric real-time operating system, to the SCC.

D. Contribution

The BareMichael bare-metal framework (BMBMF) provides the ability to run bare-metal programs on the SCC. [3] Building upon BMBMF, this paper proposes a message passing library suitable for multiple instruction, multiple data (MIMD) applications, which enables the user to run independent code on each core. This significantly reduces the memory footprint on the cores. In order to ensure the correctness of the created library, the send and receive functions are modeled using UPPAAL, a tool for modeling and verifying networks of timed automata. [10,11]

The remainder of this paper is organized as follows. Section II takes a closer look at message passing on the Intel SCC. To that end, the SCC's architecture is detailed and the basic functions making up the message passing library are explained. The send and receive process between two communicating cores is illustrated, as is the message passing performance of the designed library. Section III describes the verification of the library using UPPAAL. Furthermore, the derivation of a global clock on all cores is presented. Finally, the dependence of the message passing performance on the mapping of senders and receivers on the NoC is analyzed.
This analysis is done by means of a case study based upon a pseudo Avionics Full DupleX switched Ethernet (AFDX) network.

II. A message passing library for the SCC

A. Brief description of the Intel SCC

The Intel SCC, shown in Figure 1(a), is a research processor consisting of 24 tiles arranged in a 6x4 matrix. The tiles are interconnected by a mesh network. Each tile holds two P54C cores, [12] which are derived from very basic second-generation Pentium processors. These P54C cores each have a private L1 (16 KB data and instruction cache) and L2 (256 KB) cache. In addition, they share a local memory buffer, the message passing buffer (MPB, 16 KB). Data in one tile's MPB can be accessed from any other tile. Requests to access the MPB of another tile are translated by the network and executed using simple (X,Y) routing. [13] Therefore, the MPBs can be seen as distributed shared memory with non-uniform access time. For data written to and read from the MPB, Intel has also introduced a special memory type, the message passing buffer type (MPBT). Data flagged as MPBT is not cached in the L2 cache. Furthermore, cache lines in the L1 cache flagged as MPBT can be invalidated using a special instruction, enabling users to force reads and writes.
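As a hedged illustration of how such a forced read might look in bare-metal C: the invalidation instruction added to the SCC's P54C cores is CL1INVMB, whose opcode bytes (0F 0A) are documented in the SCC EAS [2]; the helper names below are ours, not the library's API.

```c
#include <stdint.h>

/* Invalidate all MPBT-tagged lines in the L1 cache. CL1INVMB is the
 * instruction added to the SCC's P54C cores for this purpose; its
 * opcode bytes (0F 0A) are emitted directly, since compilers do not
 * know the mnemonic. */
static inline void mpbt_invalidate(void)
{
    __asm__ volatile (".byte 0x0f, 0x0a" ::: "memory");
}

/* Force a fresh read of an MPBT-flagged byte: drop any stale L1 copy
 * first (the L2 cache is bypassed for MPBT data anyway). */
static inline uint8_t read_mpbt_flag(volatile const uint8_t *flag)
{
    mpbt_invalidate();
    return *flag;
}
```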

Each tile also contains the mesh interface unit (MIU), which translates between the core's bus interface and the network, and the write combine buffer (WCB). This WCB stores up to a whole cache line of MPBT-flagged data before writing it onto the network. The WCB can be flushed by issuing a write to another memory line flagged as MPBT.

[Figure 1: Intel SCC and MPB. (a) Illustration of the Intel SCC (R = router, VRC = voltage regulator controller, DC = DDR3 controller). (b) Layout of a core's MPB space: fool line, send flags, ack flags, setup space, data space.]

B. Message passing paradigms

Based on the structure of the SCC, two basic message passing methods can be used: either the data is written locally and read remotely (PULL), or the data is written remotely and read locally (PUSH). To implement these methods, we split the MPB into two parts: one part dedicated to overhead data such as flags, and one part for data. The flags are used to inform the receiver of a pending message and to acknowledge its reception to the sender. As illustrated in Figure 1(b), 64 bytes of MPB space are allocated for each of the send and receive flags. Each core has its send and receive flag padded up to 32 bytes, hence a total of 128 bytes. Furthermore, a fool line is allocated in order to flush the WCB. Finally, a single line is allocated for setup information and clock synchronization (see Section III.B). The data space for both PUSH and PULL is statically partitioned. Static partitioning was chosen to avoid locking, reduce complexity, and avoid requiring a priori knowledge of the currently communicating cores. Different cores can employ different message passing methods (PUSH or PULL); communication between a PUSH- and a PULL-based core is not supported for the moment. PUSH-based sending requires a strict partitioning between the possible sending cores in order to avoid message inconsistency. The developed library employs a simple division of the remaining MPB space between all 48 cores. Hence, each core has 160 bytes of dedicated message passing space in each of the other cores' MPBs. Figure 2 illustrates a PUSH-based sending process. In this example, cores 0 and 47 are communicating. The sender first puts the data into its dedicated partition in the receiver's MPB (orange). It then raises its send flag on the receiver side (green). When reading its send flags, the receiver will see the raised flag and can read the data out of its own MPB. The receiver then acknowledges the communication by setting its dedicated receive flag at the sender (blue); in other words, the receiver writes to the memory space allocated to it in the sender's MPB to acknowledge message reception. Following this handshake, the sender is able to send the next data packet to the receiver. The process is similar for PULL-based sending; in this case, the message is stored in the MPB of the sender. Since only the sender writes to its own MPB, no partitioning is needed and the full 8000 bytes can be used for data transfers. In other words, while PUSH-based sending combines a remote write with a local read, PULL-based sending combines a local write with a remote read.
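A minimal C sketch of the PUSH handshake just described, assuming the layout of Figure 1(b); the helper mpb_of, the exact offset values, and the flag encoding are our assumptions, not the library's actual API (read_mpbt_flag is from the sketch in Section II.A).

```c
#include <stdint.h>
#include <string.h>

#define MPB_PARTITION 160             /* per-core data partition (bytes) */
/* Offsets into an MPB, following Figure 1(b); exact values assumed. */
#define FOOL_LINE    0
#define SEND_FLAG(c) (32 + (c))
#define ACK_FLAG(c)  (96 + (c))
#define DATA(c)      (256 + (c) * MPB_PARTITION)

extern volatile uint8_t *mpb_of(int core); /* base of a core's MPB (assumed) */
extern uint8_t read_mpbt_flag(volatile const uint8_t *flag); /* see above */
extern int me;                             /* this core's id */

/* PUSH one packet (len <= MPB_PARTITION) to core dest, as in Figure 2. */
void push_send_packet(int dest, const void *buf, size_t len)
{
    volatile uint8_t *rmpb = mpb_of(dest);
    volatile uint8_t *lmpb = mpb_of(me);

    /* 1. Remote write: copy the packet into our partition of the
     *    receiver's MPB (the library uses the iRCCE memory copy here). */
    memcpy((void *)&rmpb[DATA(me)], buf, len);
    lmpb[FOOL_LINE] = 0;               /* dummy MPBT write flushes the WCB */

    /* 2. Raise our send flag on the receiver's side. */
    rmpb[SEND_FLAG(me)] = 1;
    lmpb[FOOL_LINE] = 0;

    /* 3. Wait for the receiver to set our ack flag in our own MPB. */
    while (read_mpbt_flag(&lmpb[ACK_FLAG(dest)]) == 0)
        ;                              /* spin until acknowledged */
    lmpb[ACK_FLAG(dest)] = 0;          /* clear for the next packet */
}
```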

The choice to use either PUSH or PULL is up to the user, but has to be made at compile time. Each of the mechanisms has certain advantages: while the PUSH-based mechanism is beneficial when sending multiple different messages in short intervals to different receivers, the PULL method is advantageous when issuing multicasts, as the send overhead is spread.

[Figure 2: Illustration of the communication for a PUSH-based send between two cores (MPBs of cores 0 and 47, each with fool line, setup space, and data partitions).]

C. Send and receive functions

Send and receive can be conducted in four different ways, governed by the type. These types are asynchronous (async), blocking (blocking), non-blocking (non_blocking), and non-blocking with timeout (non_blocking_timeout). For a send, the types lead to the following behavior (a sketch of the blocking variant follows the list):

async: The sender checks if it can send to the receiver. If it can, it sends all requested bytes to the receiver, returning only after the whole send operation has finished. If the receiver is not ready to receive, the send function returns immediately.

non_blocking: The sender checks if it can send to the receiver. If this check is successful, the sender transmits the first packet, whose size is governed by the size of the available MPB partition. If the receiver directly acknowledges the successful reception of the packet, the next packet is sent. Otherwise, the sender returns the number of transmitted bytes.

non_blocking_timeout: A non_blocking send implementation which sets a timeout after each send. When called, a send of this type not only checks whether a previous send has occurred and/or whether it has terminated, but also whether the timeout value has been reached for any unterminated send. In that case, the receiver is signalled of the timeout occurrence and the send process continues.

blocking: The sender waits until it can send to the receiver and then transmits the complete message. If the message exceeds the size of the partition available to the sender, the sender splits the message into multiple packets. The function does not return until all bytes have been successfully transmitted.
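A hedged sketch of the packetization just described for the blocking type, building on push_send_packet from the previous sketch; can_send_to is an assumed readiness check, not the library's API.

```c
#include <stddef.h>
#include <stdint.h>

#define MPB_PARTITION 160   /* per-core partition size, as above */

/* From the handshake sketch above. */
extern void push_send_packet(int dest, const void *buf, size_t len);
/* Assumed readiness check (receiver can accept a message). */
extern int can_send_to(int dest);

/* Blocking send: wait until the receiver is ready, then split the
 * message into partition-sized packets, each one acknowledged by the
 * handshake before the next is sent. Returns only when all bytes
 * have been transmitted. */
int send_blocking(int dest, const uint8_t *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        size_t pkt = len - sent;
        if (pkt > MPB_PARTITION)
            pkt = MPB_PARTITION;      /* split into multiple packets */
        while (!can_send_to(dest))
            ;                         /* blocking: wait until possible */
        push_send_packet(dest, buf + sent, pkt);
        sent += pkt;
    }
    return (int)sent;
}
```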

Similarly, the different types also influence the receive function:

async: The receiver checks whether or not data has been received. If data was received, the receiver does not return until all the requested bytes have been successfully received. If no data is present, the receive function returns. The receiver returns the number of bytes that were received (either 0 or the number of requested bytes).

non_blocking: The receiver checks if data was received. If so, it receives as many bytes as the size of the MPB partition available to it. If further data is available, it continues the receive; otherwise it finishes the receive process. If no data was present after the first check, the receiver returns directly. The receiver returns the number of bytes that were received.

non_blocking_timeout: A non_blocking implementation which also checks whether the sender has signalled a timeout by setting the flag accordingly, in which case it returns, indicating the timeout to the calling function.

blocking: A blocking receive causes the receiver to wait until all the requested data has been received before returning.

Both the send and receive functions rely on the memory copy function originally designed for RCCE [4,5] to transfer messages into the destination core's MPB. Furthermore, message transfer can be conducted asynchronously: a non-blocking send does not directly require an acknowledgement, and a PUSH-based send can deliver different messages to different receivers without waiting for the first receiver to acknowledge.

D. Comparison of PUSH and PULL mechanisms

As mentioned previously, two different message passing mechanisms were implemented in the library. Since the PUSH-based mechanism is based upon a remote write and a local read, each core has only 160 bytes of message passing space available in every other core. On the other hand, using the PULL-based mechanism, a core can exploit the entire 8000 bytes available in its own MPB. This difference in message passing space results in a noticeable performance difference, as shown in Figure 3. For every message that exceeds 160 bytes, the PUSH-based mechanism has to wait for the receiver to acknowledge the receipt before continuing the send process. Hence, at 160 bytes, a performance gap can be observed. The decrease in performance for messages larger than 8000 bytes for both the PUSH- and PULL-based mechanisms is due to the nature of the experiment. In this simple PING-PONG example, the sender transmits the message to the receiver and then waits for the receiver to respond using the same message. The receiver stores the received message and the destination buffer in its L1 cache. Therefore, for messages larger than half the size of the L1 cache, conflicts between the message reads and writes lead to a decrease in performance until no more useful data remains in the L1 cache (at about 16 KB). This performance reflects the bandwidth sustainable by the L2 cache. Consequently, when messages exceed the size of the L2 cache, another drop in performance occurs. [5]

In addition to illustrating the performance of the PUSH- and PULL-based mechanisms for MPBT-mapped memory, Figure 3 also compares the newly introduced MPBT memory type to uncached memory (UC). Since UC-mapped data is not combined in the WCB but instead put directly on the network, the number of packets increases significantly and hence the performance decreases, resulting in significantly lower performance for UC memory compared to MPBT memory.

III. Validation of the library

A. Verification of the API with Uppaal

We want to verify that the message passing mechanisms work well if the functions are correctly employed by the user. For instance, if core i makes a blocking send to core j and j conducts a blocking read from i, then the transaction will succeed and the cores can continue their processing. For that purpose, we have defined several communication patterns and their expected behaviors. These were verified using the model checker Uppaal.
For each function, send and receive, we modeled its behavior in a generic timed automaton (see Figure 4) which is independent of the type (blocking, ...) and of the paradigm (PUSH or PULL). The types appear in the guards of the transitions. The blocking send automaton works as follows: the sender (see Figure 4(a)) first checks if it can send to the particular destination(s) (eval_check). If a send is not possible, the sender waits, continuously checking if the send can be conducted. Once a send to the receiver(s) can be conducted, the number of packets to be sent is calculated (send_rdy) and the first packet is sent (send_data).

[Figure 3: Performance comparison of PULL and PUSH using a simple PING-PONG example. MPBT = memory mapped as message passing buffer type; UC = memory mapped as uncached.]

[Figure 4: Automata representations of the (a) send and (b) receive functions.]

The sender then waits for the acknowledgement from the receiver(s). Once the acknowledgement has been received, the sender either sends the next packet (send_data) or, if all packets have been sent, returns (send_bdone). The communication patterns check the exchange between one emitter and one receiver. The expected behavior is expressed as a formula in temporal logic. In the case of a blocking sender and a blocking receiver, we want to verify that the message will be received correctly. In the automata, it means that the sender will always reach the state send_bdone while the receiver will always reach the state rec_done. This can be expressed by the formula:

A<> (send.send_bdone and rec.rec_done)

The property has been verified with Uppaal. We have also proven that in the case of a blocking sender and a non_blocking receiver, the message is received correctly if the sender has issued the send before the receiver has conducted the receive and the message is not larger than the available partition in the MPB. In the automata, this means that if the receiver

is still in the Init state and the sender has already sent its message (check_ack), the sender will terminate in the state send_bdone while the receiver will reach the state rec_done. Similarly, it was verified that if no receiver is present, a blocking send will never terminate; in the send automaton, this means that the sender will never reach send_bdone. The communication using a non_blocking sender and a blocking receiver was also verified. Once again, it has to be assumed that the message size does not exceed the size of the available MPB partition. If that is true, the receiver will always terminate (i.e., the state rec_done will always hold at some point in the future). Independent of this, the sender will always terminate by reaching the state send_nbdone. The communication using a non_blocking sender and a non_blocking receiver was verified by checking that if the send occurred before the receive, both send and receive will terminate correctly: the sender will reach the state send_nbdone and the receiver will reach the state rec_done. Finally, it was verified that even if no receiver is present, a non_blocking send will terminate; in other words, the send automaton will reach the state send_nbdone.

B. Global clock and synchronization library

To validate the real-time execution of the message passing library, we need real-time clock primitives. Each SCC core has a Pentium TSC (TimeStamp Counter) which can be used as a locally accessible high-resolution clock. The only problem is that the different TSCs are not synchronized at startup. In order to start the execution of code on the cores, the cores' reset bits have to be cleared; this causes a core to leave the reset state and start execution. As there is no global reset available, the reset bits of all cores have to be released individually, causing discrepancies between the local TSC values. The BareMichael framework allows access to the global clock on the FPGA which is connected to the SCC. Since the cost of accessing this global clock is large compared to the local TSC, and its accuracy depends on the number of cores simultaneously requesting access to the global clock, a local notion of the global clock had to be derived. To generate this notion of global time, each core can simply poll the global time from the FPGA and calculate the local clock's offset via t_global = t_local + offset_local. To create an average value for the access time of the global clock via the mesh network, the global clock is accessed multiple times. The average round-trip time (RTT) is then subtracted from the local time obtained after the last access of the global clock. The result of this computation is subtracted from the scaled global time, yielding the local core's offset value. The scaling of the global clock is necessary because the local and global clocks run at different frequencies. In order to avoid collisions on the mesh network, the global clock should be polled by each core individually. The thereby obtained notion of a global clock can then be used to synchronize the cores. Since we do not necessarily know how many cores are active and what kind of message passing paradigm they use, the synchronization uses the dedicated setup space in the MPB. A master core, which has to be specified at compile time, sends each core a global start time at which the cores can start execution.
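A hedged C sketch of this offset derivation, following the steps just described; read_tsc, read_fpga_clock, the sample count, and the scale factor are our assumptions, not the library's actual interface.

```c
#include <stdint.h>

#define N_SAMPLES 16                    /* number of global-clock accesses */

extern uint64_t read_tsc(void);         /* local TimeStamp Counter */
extern uint64_t read_fpga_clock(void);  /* global clock on the FPGA */
extern double   global_to_tsc_scale;    /* clocks run at different rates */

static int64_t offset_local;            /* t_global = t_local + offset_local */

void derive_offset(void)
{
    uint64_t rtt_sum = 0, t_global = 0, t_after = 0;

    /* Access the global clock multiple times to average the RTT. */
    for (int i = 0; i < N_SAMPLES; i++) {
        uint64_t t_before = read_tsc();
        t_global = read_fpga_clock();
        t_after  = read_tsc();
        rtt_sum += t_after - t_before;
    }
    uint64_t avg_rtt = rtt_sum / N_SAMPLES;

    /* Subtract the average RTT from the local time taken after the last
     * access, then subtract the result from the scaled global time. */
    uint64_t local_at_access = t_after - avg_rtt;
    offset_local = (int64_t)(t_global * global_to_tsc_scale)
                 - (int64_t)local_at_access;
}

/* Local notion of global time, usable for core synchronization. */
uint64_t global_time_now(void)
{
    return (uint64_t)((int64_t)read_tsc() + offset_local);
}
```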
Using this method, a maximum clock discrepancy of 4 µs can be observed between the cores and the master core, which in this case is core 0. This difference can be attributed to the drift between the SCC's clock and the FPGA's clock: since the sequential polling prevents the cores from accessing the global clock at the same time, the drift causes a discrepancy in the cores' notion of global time.

C. Timing experiments - Case study

In order to evaluate the behavior of the NoC, a case study was constructed emulating an AFDX network. Different virtual links (VLs), each defined by a maximum packet size, a period, and a group of receivers, are mapped to different cores (a hedged sketch of such a descriptor follows below). The goal was to evaluate how the delay between the sending and the reception of a message depends on the mapping of the VLs to cores. To that end, two mappings were created. Both mappings put the same overall workload on the cores. The major difference between the two is that in the first mapping, shown in Figure 5(a), the distances between the sending and receiving nodes are minimized, whereas in the second mapping, shown in Figure 5(b), the distances between sender and receiver are maximized, as is the number of messages passing through the same router. The first mapping should therefore create as little congestion as possible on the network; the second mapping was designed specifically to create network congestion. In addition to evaluating both an uncongested and a congested network, we also compare the PULL and the PUSH mechanisms.
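A hedged sketch of how such a virtual link might be described and driven, reusing global_time_now() from the sketch above; all names and the descriptor layout are illustrative assumptions, not the case study's actual code.

```c
#include <stdint.h>

#define N_CORES 48

/* A virtual link as characterized above: maximum packet size, period,
 * and a group of receiving cores. */
typedef struct {
    int      max_size;            /* maximum packet size (bytes)   */
    uint64_t period;              /* sending period (global ticks) */
    int      n_receivers;
    int      receivers[N_CORES];  /* multicast group               */
} virtual_link;

extern uint64_t global_time_now(void);  /* see the Section III.B sketch */
extern void send_vl_frame(const virtual_link *vl, uint64_t timestamp);

/* Periodically emit frames for one VL, stamping each frame with the
 * global time at which it is sent, so that the receiver can compute
 * the transmission delay by subtraction. */
void drive_vl(const virtual_link *vl, uint64_t start)
{
    uint64_t next = start;            /* common global start time */
    for (;;) {
        while (global_time_now() < next)
            ;                         /* busy-wait until the release */
        send_vl_frame(vl, next);      /* frame encodes its send time */
        next += vl->period;
    }
}
```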

[Figure 5: Network traffic (a) without congestion and (b) with congestion.]

During the experiment, each core sends messages at specific intervals according to the VLs mapped to it. The sent messages are encoded with the global time at which the message was sent. A core sends a message every 100 µs; when it is not sending, it tries to receive. Initially, an implementation of receive-any was used to check for the reception of a packet. This implementation read a whole flag line at once in order to reduce the penalty of consecutively accessing single bytes from memory. As this mechanism restarted the send and receive process after each successful read, it was possible that cores with a higher core number sending to a core with a lower core number were never able to successfully send their message; hence, the sending core was blocked. In order to circumvent this behavior and be as fair as possible, the cores sequentially go through all possible senders and check whether they have received a message from that sender. This still penalizes cores with a larger core number, but they do not risk being blocked entirely. Upon receipt of a message, the receiving core evaluates the message's transmission time by subtracting the send time from the time of reception. The maximum message size is 160 bytes. In order to increase the number of messages traversing the network, all messages are sent at the same global time.

The results for PUSH- and PULL-based sending in the uncongested case (Figure 6(a) for PULL and Figure 6(b) for PUSH) are quite distinct. Whereas in the PUSH case a maximum transmission time mirroring the core workload can be observed, the maximum transmission times for the PULL case are subject to stronger variations. Furthermore, little to no difference can be seen between the maximum delays when comparing the congested case (Figure 7(a) for PULL and Figure 7(b) for PUSH) with the uncongested case. Both cases have transmission time maxima on cores 0, 10, 37, and 47. While those peaks are higher in the congested case, the average delay between the transmission and reception of messages is higher for the uncongested case. This is especially obvious for the sends conducted using PULL. The peaks in Figures 6(b) to 7(a) can be understood when considering the number of frames to be sent and received by each core, which can be regarded as a measure of a core's workload. Cores 0, 10, 37, and 47, the cores exhibiting the peaks, are conducting multicasts. Hence, they have more frames to send and flags to set; as this requires time, receiving experiences an additional delay. These multicast cores are also the main reason for the increased maximum delay in the uncongested case for the PULL mechanism. In the PULL case, a multicast can be conducted by writing one message into the sender's MPB and setting the flags of the receiving cores. As this is done sequentially, some cores get notified later than others and observe a larger delay. When examining the mapping for the uncongested case, one can see that the concerned cores are all targets of a multicast. The higher delay at these cores is caused by the way the multicast is conducted when using PULL: while the message only has to be written once, the cores' flags are set sequentially. Therefore, a core can suffer a higher delay when its flag is set later in the multicast sequence. This effect is increased by the way the reception is conducted.
The receiving cores check sequentially, for each core, whether they have received a message from that particular core. Therefore, messages from cores with a higher core number are disadvantaged. From the results presented above, we conclude that the network is not the dominating factor for the delay. Rather, the delay can be attributed to the workload of the cores; cores with a high workload experience higher message delays. Receiving cores may also be penalized when a core is conducting a multicast. Application design should take this into account in order to achieve the required timing precision.
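To make the scanning scheme concrete, a hedged sketch of one polling round; recv_nonblocking and MAX_MSG are assumed names, not the library's API.

```c
#include <stdint.h>

#define N_CORES 48
#define MAX_MSG 160   /* maximum message size in the case study */

/* Assumed non-blocking per-sender receive; returns bytes received (0 if none). */
extern int recv_nonblocking(int src, uint8_t *buf, int max_len);

/* One fair polling round: scan all possible senders in core order and do
 * NOT restart the scan after a successful read. Higher-numbered senders
 * are still checked later (and thus delayed), but can never be starved. */
int receive_round(uint8_t bufs[N_CORES][MAX_MSG], int lens[N_CORES])
{
    int received = 0;
    for (int src = 0; src < N_CORES; src++) {
        lens[src] = recv_nonblocking(src, bufs[src], MAX_MSG);
        if (lens[src] > 0)
            received++;   /* keep scanning; no restart from core 0 */
    }
    return received;      /* number of senders heard this round */
}
```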

[Figure 6: Delay per core without congestion, (a) PULL and (b) PUSH.]

IV. Conclusion

This paper presented a multiple instruction, multiple data bare-metal message passing library for the Intel Single Chip Cloud Computer. The library's send and receive functions were verified using UPPAAL. In addition, a mechanism to arrive at a global notion of time across the distributed cores was presented; this global notion of time is used to synchronize the cores. Using a use case based upon a pseudo AFDX network implementation, it was shown that the main bottleneck of the message passing performance is not the performance of the network but rather the workload of the cores. Therefore, in order to arrive at an evenly distributed transmission time between the cores, an evenly distributed load is desirable. Future work should try to optimize the workload of the cores in order to arrive at the smallest possible transmission time. Furthermore, an a priori analysis of the inter-core communication and core workloads, similar to a schedulability analysis, could be interesting.

[Figure 7: Delay per core with network congestion, (a) PULL and (b) PUSH.]

References

[1] Wilhelm, R., Engblom, J., Ermedahl, A., Holsti, N., Thesing, S., Whalley, D. B., Bernat, G., Ferdinand, C., Heckmann, R., Mitra, T., Mueller, F., Puaut, I., Puschner, P. P., Staschulat, J., and Stenström, P., "The worst-case execution-time problem - overview of methods and survey of tools," ACM Trans. Embedded Comput. Syst., Vol. 7, No. 3, 2008.

[2] Intel Labs, "SCC External Architecture Specification (EAS)," Tech. rep., Intel Corporation, May.

[3] Ziwisky, M. and Brylow, D., "BareMichael: A Minimalistic Bare-metal Framework for the Intel SCC," in Noulard and Vernhes [14].

[4] Mattson, T., Riepen, M., Lehnig, T., Brett, P., Haas, W., Kennedy, P., Howard, J., Vangal, S., Borkar, N., Ruhl, G., et al., "The 48-core SCC processor: the programmer's view," Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE Computer Society, 2010.

[5] Mattson, T., Riepen, M., Lehnig, T., Brett, P., Haas, W., Kennedy, P., Howard, J., Vangal, S., Borkar, N., Ruhl, G., et al., "The 48-core SCC processor: the programmer's view," presentation.

[6] Clauss, C., Lankes, S., Galowicz, J., and Bemmerl, T., "iRCCE: a non-blocking communication extension to the RCCE communication library for the Intel Single-chip Cloud Computer," Chair for Operating Systems, RWTH Aachen University, December 17, 2010.

[7] ET International Inc., "ETI SCC Bare Metal OS Development Framework: User's Manual," Tech. rep., ET International Inc., January.

[8] Reble, P., Galowicz, J., Lankes, S., and Bemmerl, T., "Efficient Implementation of the bare-metal Hypervisor MetalSVM for the SCC," in Noulard and Vernhes [14].

[9] Sputh, B. H., Lukin, A., and Verhulst, E., "Transparent Programming of Many/Multi Cores with OpenComRTOS: Comparing Intel 48-core SCC and TI 8-core TMS320C6678," in Noulard and Vernhes [14].

[10] Larsen, K., Pettersson, P., and Yi, W., "UPPAAL in a Nutshell," International Journal on Software Tools for Technology Transfer (STTT), Vol. 1, No. 1, 1997, pp. 134-152.

[11] Behrmann, G., David, A., and Larsen, K., "A tutorial on Uppaal," Formal Methods for the Design of Real-Time Systems, 2004, pp. 200-236.

[12] Kaiser, R. and Wagner, S., "Pentium Processor Family Developer's Manual, Volume 3: Architecture and Programming Manual," Intel Corporation.

[13] Millberg, M., "Architectural Techniques for Improving Performance in Networks on Chip," Ph.D. thesis, KTH Royal Institute of Technology, 2011.

[14] Noulard, E. and Vernhes, S., editors, ONERA, The French Aerospace Lab, July 2012, onera.fr/marconera.


More information

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng

Architectural Level Power Consumption of Network on Chip. Presenter: YUAN Zheng Architectural Level Power Consumption of Network Presenter: YUAN Zheng Why Architectural Low Power Design? High-speed and large volume communication among different parts on a chip Problem: Power consumption

More information

D1.2 Network Load Balancing

D1.2 Network Load Balancing D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June ronald.vanderpol@sara.nl,freek.dijkstra@sara.nl,

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

In-Vehicle Networking

In-Vehicle Networking In-Vehicle Networking SAE Network classification Class A networks Low Speed (

More information

AFDX Emulator for an ARINC-based Training Platform. Jesús Fernández Héctor Pérez J. Javier Gutiérrez Michael González Harbour

AFDX Emulator for an ARINC-based Training Platform. Jesús Fernández Héctor Pérez J. Javier Gutiérrez Michael González Harbour AFDX Emulator for an ARINC-based Training Platform Jesús Fernández Héctor Pérez J. Javier Gutiérrez Michael González Harbour 2 2 Motivation Mature standards for safety-critical applications ARINC-653 for

More information

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK

AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Abstract AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Mrs. Amandeep Kaur, Assistant Professor, Department of Computer Application, Apeejay Institute of Management, Ramamandi, Jalandhar-144001, Punjab,

More information

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors

Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors 2011 International Symposium on Computer Networks and Distributed Systems (CNDS), February 23-24, 2011 Hyper Node Torus: A New Interconnection Network for High Speed Packet Processors Atefeh Khosravi,

More information

C-GEP 100 Monitoring application user manual

C-GEP 100 Monitoring application user manual C-GEP 100 Monitoring application user manual 1 Introduction: C-GEP is a very versatile platform for network monitoring applications. The ever growing need for network bandwith like HD video streaming and

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Single-chip Cloud Computer IA Tera-scale Research Processor

Single-chip Cloud Computer IA Tera-scale Research Processor Single-chip Cloud Computer IA Tera-scale esearch Processor Jim Held Intel Fellow & Director Tera-scale Computing esearch Intel Labs August 31, 2010 www.intel.com/info/scc Agenda Tera-scale esearch SCC

More information

ANALYSIS OF LONG DISTANCE 3-WAY CONFERENCE CALLING WITH VOIP

ANALYSIS OF LONG DISTANCE 3-WAY CONFERENCE CALLING WITH VOIP ENSC 427: Communication Networks ANALYSIS OF LONG DISTANCE 3-WAY CONFERENCE CALLING WITH VOIP Spring 2010 Final Project Group #6: Gurpal Singh Sandhu Sasan Naderi Claret Ramos (gss7@sfu.ca) (sna14@sfu.ca)

More information

Effects of Filler Traffic In IP Networks. Adam Feldman April 5, 2001 Master s Project

Effects of Filler Traffic In IP Networks. Adam Feldman April 5, 2001 Master s Project Effects of Filler Traffic In IP Networks Adam Feldman April 5, 2001 Master s Project Abstract On the Internet, there is a well-documented requirement that much more bandwidth be available than is used

More information

Design Issues in a Bare PC Web Server

Design Issues in a Bare PC Web Server Design Issues in a Bare PC Web Server Long He, Ramesh K. Karne, Alexander L. Wijesinha, Sandeep Girumala, and Gholam H. Khaksari Department of Computer & Information Sciences, Towson University, 78 York

More information

Communication Networks. MAP-TELE 2011/12 José Ruela

Communication Networks. MAP-TELE 2011/12 José Ruela Communication Networks MAP-TELE 2011/12 José Ruela Network basic mechanisms Introduction to Communications Networks Communications networks Communications networks are used to transport information (data)

More information

Thingsquare Technology

Thingsquare Technology Thingsquare Technology Thingsquare connects smartphone apps with things such as thermostats, light bulbs, and street lights. The devices have a programmable wireless chip that runs the Thingsquare firmware.

More information

Question: 3 When using Application Intelligence, Server Time may be defined as.

Question: 3 When using Application Intelligence, Server Time may be defined as. 1 Network General - 1T6-521 Application Performance Analysis and Troubleshooting Question: 1 One component in an application turn is. A. Server response time B. Network process time C. Application response

More information

524 Computer Networks

524 Computer Networks 524 Computer Networks Section 1: Introduction to Course Dr. E.C. Kulasekere Sri Lanka Institute of Information Technology - 2005 Course Outline The Aim The course is design to establish the terminology

More information

Synchronization. Todd C. Mowry CS 740 November 24, 1998. Topics. Locks Barriers

Synchronization. Todd C. Mowry CS 740 November 24, 1998. Topics. Locks Barriers Synchronization Todd C. Mowry CS 740 November 24, 1998 Topics Locks Barriers Types of Synchronization Mutual Exclusion Locks Event Synchronization Global or group-based (barriers) Point-to-point tightly

More information

Quality of Service Testing in the VoIP Environment

Quality of Service Testing in the VoIP Environment Whitepaper Quality of Service Testing in the VoIP Environment Carrying voice traffic over the Internet rather than the traditional public telephone network has revolutionized communications. Initially,

More information

Chapter 3. Internet Applications and Network Programming

Chapter 3. Internet Applications and Network Programming Chapter 3 Internet Applications and Network Programming 1 Introduction The Internet offers users a rich diversity of services none of the services is part of the underlying communication infrastructure

More information