FPGA PCIe Bandwidth

Mike Rose
Department of Computer Science and Engineering
University of California San Diego

June 9, 2010

Abstract

The unique fusion of hardware and software that characterizes FPGAs has had a strong impact on the research community. Here at UCSD, there have been many successful applications of FPGAs, especially in implementations of machine learning techniques that react in real time. I have demonstrated that, with the addition of DMA transfers, the bandwidth between the FPGA and the host computer is tripled for large data transfers. This increase in communication bandwidth will expand the capabilities of previous work and will permit otherwise impossible future applications.

1 Introduction

Field Programmable Gate Arrays (FPGAs) have already shown an unparalleled ability to provide inexpensive real-time video processing solutions that are simply unavailable with competing technologies. This allows for unrivaled human-machine interaction by achieving visually undetectable response times of less than 100 ms. Many ongoing projects at UCSD have been using these FPGA devices, made by Xilinx, with great success. An especially impressive UCSD project that exploited FPGAs is the real-time face detector written by Junguk Cho et al. [1], which achieved a thirty-five-fold increase in system performance over an equivalent software implementation. Another exciting use of FPGA boards, currently in development, is a 60 frames per second point tracker by Matt Jacobson and Sunsern Cheamanunkul [4]. Mayank Cabra has also been looking into incorporating FPGAs into his work on analyzing pathology images for cancer detection.

Most of our FPGA solutions rely on both hardware and software components. With purely software systems, the latency is too high for real-time use because large amounts of incoming data must be buffered and parsed. And while it is relatively easy to integrate small, specialized hardware modules to accelerate simple tasks, the development effort and resources required to implement complex systems entirely in hardware are often unattainable. This means that for the majority of complex systems we must rely on hardware and software cooperating.

There are two ways that software programs run in our systems: on MicroBlaze, a small soft processor that runs directly on the FPGA using a portion of the chip's programmable hardware, or on a workstation (desktop computer) attached to and powering the FPGA. MicroBlaze is much closer to our hardware, since it runs on the same chip. This results in low communication latency, but MicroBlaze is limited by a slow clock speed of only about 100 MHz and a small amount of available memory. Workstations, in comparison, offer easier programming environments, seemingly limitless memory, multiple CPUs, and fast clock speeds. For these reasons, my goal was to increase the bandwidth of communication between our FPGA devices and our workstations so that we can take advantage of the superior performance of these powerful machines and increase the capabilities of the projects designed by my fellow students.

2 PCIe Overview

On our Xilinx Virtex-5 FPGA board, the Peripheral Component Interconnect Express (PCIe) is ideal for our communication needs because it is capable of both high-bandwidth and low-latency data transfer. Its major advantage over our other communication options is that it plugs the FPGA directly into the motherboard itself, like an expansion card. PCIe connections are built from serial links (called lanes) rather than a parallel interconnect. The new Xilinx FPGA boards support multiple lanes (up to 16x) on the PCIe interconnect, which allows for greater communication bandwidth, especially for unrelated or bidirectional data streams. Above the physical layer lies the Transaction Layer, which is the heart of the PCIe communication protocol. The Transaction Layer segments data into Transaction Layer Packets (TLPs), which are guaranteed in-order transmission across the bus.

Even though the protocol claims to be capable of 250 MB/s throughput across a single PCIe lane, this is usually unattainable in practice. In fact, in preliminary tests here at UCSD we were only able to send, process, and send back data at a rate of about 3.5 MB/s [2]. My first goal when I undertook this project was to investigate why our throughput numbers were so low when the protocol had given us such high hopes. Upon reading about PCIe, it quickly became evident that it is optimized for large data transfers (such as those that come constantly streaming from a graphics card). Since the entire memory size of the system we were testing was only 64 KB, data transfers larger than that could not occur. I also discovered that using only normal read and write calls via Programmed Input/Output (PIO) in the drivers means that each individual TLP can carry a payload of only 4 bytes, instead of something closer to the defined maximum of 4 KB.

Figure 1: TLP Packet Breakdown

Xilinx performed a theoretical analysis of the protocol based solely on the space overhead in every packet (see Figure 1) and demonstrated that the maximum theoretical efficiency can be computed with the following equation:

    Efficiency = Payload / (Payload + Overhead)

So if we use only 4-byte payloads, even the theoretical maximum efficiency drops to 16.7%, instead of the 99% achievable with 2 KB payloads [8]. But what complicated things in our case was that our systems were not actually using just basic reads and writes through the driver. We had a memory mapping in place that allowed us to access the memory of the FPGA as if it were our own. How memory mappings are implemented is completely machine dependent, so it was very difficult to know whether the same limitations applied in this case or whether Linux was using this mapping to optimize our data transfers. The recommended way to take advantage of the full payload size of TLPs is Direct Memory Access (DMA). In a DMA transfer, the PCIe logic knows the amount of data to be transferred beforehand and can segment the data accordingly.
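To make the overhead argument concrete, the following minimal sketch evaluates the efficiency equation for a few payload sizes. The 20-byte per-TLP overhead is my assumption, chosen so the output matches the 16.7% and 99% figures quoted above; the actual overhead in the Xilinx analysis comes from the header and framing fields shown in Figure 1.

    #include <stdio.h>

    /* Assumed fixed per-TLP overhead (header + framing), picked to match
     * the figures in the text: 4 B -> 16.7%, 2 KB -> 99%. */
    #define TLP_OVERHEAD_BYTES 20.0

    static double tlp_efficiency(double payload_bytes)
    {
        return payload_bytes / (payload_bytes + TLP_OVERHEAD_BYTES);
    }

    int main(void)
    {
        const double payloads[] = { 4, 64, 256, 1024, 2048 };
        for (unsigned i = 0; i < sizeof payloads / sizeof payloads[0]; i++)
            printf("%5.0f-byte payload: %4.1f%% efficient\n",
                   payloads[i], 100.0 * tlp_efficiency(payloads[i]));
        return 0;   /* prints 16.7% for 4 B and 99.0% for 2 KB */
    }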
3 Related Work

Patrick Lai, Matt Jacobson, and I began exploring the capabilities and possibilities of using FPGAs to implement machine learning algorithms under the direction of Yoav Freund here at UCSD in the Fall. It was at this time that we uncovered and created much of the framework still used today when beginning UCSD's FPGA-integrated projects [4].

At first we could communicate with our workstation only through an extremely limited JTAG interface, but Patrick went on to find and adapt a driver by Xilinx, allowing us to communicate via PCIe with much more ease and higher bandwidth [3]. I cannot stress enough that the technologies and ideas used in this project were not groundbreaking or new. Xilinx has had the framework for DMA communication via PCIe for quite some time, and others have managed to implement similar systems in the past [7]. However, my objective was to investigate and design a solution so that those of us researching here at UCSD could attain analogous results for our systems. Evan Ettinger designed the initial bandwidth benchmark of our PCIe communication; I edited it, used it for the baseline performance, and drew my comparisons against it [2].

4 Design and Implementation

The main communication concern we wished to address was how to maximize the throughput (data per second) that could be sent from the FPGA to the workstation, processed, and then returned to the FPGA. Since we wanted to benchmark the bandwidth of a round trip, we reasoned that the communication bandwidth would be the same whether the data originated on the workstation or the FPGA, as long as a full cycle was completed. It was easier at the time to locate the controlling application on the workstation, since development of a normal C program is much quicker and taking timing measurements is far simpler. The other reason it was easier to create a benchmark with the workstation in control was that our communication with the FPGA was still asymmetric: the workstation was completely in control of the source and destination addresses of any data transmissions across the PCIe. An important byproduct of my research was to enable the FPGA components to control reads and writes of workstation memory as well, creating a fully symmetric communication system.

With these goals in mind, the following benchmark was devised by Evan Ettinger (see Figure 2).

Figure 2: Original Design Flow

First, the workstation fills a buffer with test data. It then begins timing and sends the data over the PCIe using its mapping into BRAM (the FPGA's fast access memory). When the transmission is complete, the benchmark sets a flag at a specific location in BRAM, signaling that the data is ready for processing. Meanwhile, a hardware module on the FPGA named the bitflipper polls this location. Once it has been updated, the bitflipper wakes and begins negating each word of data that was received. The bitflipper then signals the workstation by writing to a different location in BRAM. In the meantime, the workstation must poll BRAM across the PCIe until it receives notification that the processing is complete. The workstation then reads all of the data back from the FPGA. Lastly, it stops the timer and checks the validity of the data by comparing the returned data to the inverse of the input data. The bitflipper was added because it proved a useful tool in verifying that our data was indeed received by the FPGA and that no caching or errors were occurring in our data transfers. A sketch of this round trip appears below.
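This round trip can be summarized in a short user-space sketch. Everything here is illustrative: the mapped BRAM window, the flag offsets, and the buffer size are hypothetical stand-ins for the benchmark's actual layout.

    #include <stdint.h>

    /* Hypothetical layout of the BRAM window as seen through the driver's
     * memory mapping; the real benchmark's offsets and sizes differ. */
    #define DATA_WORDS 1024
    #define READY_FLAG (DATA_WORDS)      /* workstation -> bitflipper */
    #define DONE_FLAG  (DATA_WORDS + 1)  /* bitflipper -> workstation */

    /* One timed round trip: write the data, raise the ready flag, poll
     * across the PCIe for completion, then read back and verify. */
    static int round_trip(volatile uint32_t *bram, const uint32_t *src)
    {
        for (int i = 0; i < DATA_WORDS; i++)
            bram[i] = src[i];            /* PIO writes across the PCIe */
        bram[READY_FLAG] = 1;            /* signal the bitflipper */

        while (bram[DONE_FLAG] == 0)
            ;                            /* costly: polls across the PCIe */
        bram[DONE_FLAG] = 0;             /* re-arm for the next iteration */

        for (int i = 0; i < DATA_WORDS; i++)
            if (bram[i] != ~src[i])      /* bitflipper negated each word */
                return -1;               /* caching or transfer error */
        return 0;
    }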
Based on my initial research, there were two areas that I perceived as the most likely bottlenecks in the benchmark. The first was using the memory map as our main mechanism for data transfers. Since the mechanisms used inside the memory map were such an unknown, it seemed very plausible that the resulting transmissions were sending only one word at a time. I decided that the most worthwhile optimization I could hope to achieve would be to convert the system to use DMA transfers, which are capable of taking full advantage of the PCIe protocol.

The other issue was having the workstation completely in control of data transmission. Not only did this create a strange, unnatural dependency and an ugly interface between the two halves of our systems, but I also believed that having the workstation poll across the PCIe channel created latency that could dominate the running time of small data transfers.

4.1 DMA Daemon

To realize these optimizations, the first step was to incorporate a DMA controller into the system. Luckily, Xilinx has an IP core available that can be generated for exactly this purpose. The XPS Central DMA Controller [6] transfers a programmable amount of data from a source address to a destination address. I generated the core and attached it to the PLB (Processor Local Bus). The PLB is the central bus of the FPGA; it can be accessed from the MicroBlaze and is used to reach other peripherals such as BRAM and the PCIe bridge.

The next step was completing a local, standalone DMA transfer. To achieve this, I wrote a small program running on the MicroBlaze that put some data into BRAM. For the MicroBlaze to initiate a DMA transfer, it must write four registers, located at the XPS Central DMA Controller's base address on the PLB plus corresponding offsets: source address, destination address, DMA control (DMACR), and length (sketched at the end of this subsection). The source and destination in this case were both addresses within BRAM, so the transfer would be local. Lastly, when the length register is set, the DMA transfer commences. It was quite useful to discover that while the transaction is underway you can poll the DMA status register to find out when the transfer has completed. Upon finishing and debugging this code, I had my first successful DMA transfer.

The bitflipper IP core was only ever connected directly to BRAM and never interfaced to the PLB. This created an interesting problem, since the bitflipper was therefore incapable of using the DMA controller on its own. A colleague advised me that writing hardware-level code to interface to the PLB was achievable but not a good use of my time. So instead of having the bitflipper IP core communicate with the DMA controller directly, I wrote a daemon that runs on the MicroBlaze, polls specific BRAM locations, and passes the parameters on to the DMA controller. I realized this would also be very useful for ordering DMA transfers from the workstation, since we already had working access to BRAM. This simple daemon made a clean and useful interface to the DMA controller and bridged the communication gap.
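The register-level sequence, as I understand it from the controller's data sheet [6], looks roughly like the sketch below. The base address is hypothetical, and the register offsets and status bit reflect my reading of the documentation; verify them against the version of the core you generate.

    #include <stdint.h>

    #define DMA_BASE   0x80200000u  /* hypothetical PLB base address of the core */
    #define DMACR      0x04u        /* DMA control register */
    #define DMA_SA     0x08u        /* source address */
    #define DMA_DA     0x0Cu        /* destination address */
    #define DMA_LENGTH 0x10u        /* byte count; writing it starts the DMA */
    #define DMA_DMASR  0x14u        /* DMA status register */
    #define DMASR_BUSY 0x80000000u  /* DMABSY: transfer still in progress */

    static inline void reg_write(uint32_t off, uint32_t val)
    {
        *(volatile uint32_t *)(DMA_BASE + off) = val;
    }

    static inline uint32_t reg_read(uint32_t off)
    {
        return *(volatile uint32_t *)(DMA_BASE + off);
    }

    /* Program a local BRAM-to-BRAM copy and poll until it completes. */
    void dma_copy(uint32_t src, uint32_t dst, uint32_t bytes)
    {
        reg_write(DMACR, 0xC0000000u);  /* SINC | DINC: increment both addresses */
        reg_write(DMA_SA, src);
        reg_write(DMA_DA, dst);
        reg_write(DMA_LENGTH, bytes);   /* last write: the transfer commences */
        while (reg_read(DMA_DMASR) & DMASR_BUSY)
            ;                           /* poll the status register */
    }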
4.2 Workstation Driver

Now that I had verified my ability to make local DMA transfers, it was necessary to make the workstation's memory addressable from the FPGA. The only place this can be done is at the kernel driver level, because user-level applications live in a world of virtual addresses that are process specific and would mean nothing to the out-of-context hardware on the FPGA. At the kernel level you can allocate true physical addresses, but here I uncovered another interesting problem. The workstation we were using had a 64-bit address space, so a buffer allocated with a simple call to kmalloc could easily lie in a region not addressable with only 32 bits. After some searching, I found exactly the system call required for this situation. The pci_alloc_consistent call takes a size and returns both a CPU virtual address to be used by the PC and a hardware address (guaranteed to lie within the first 32 bits of the address space) that maps to the same memory, to be used by the device. This memory is also managed to ensure consistency: either the workstation or the FPGA may be accessing it, but not both at any given time (using a mutex locking system). With this useful call in mind, I could now make some changes to the driver. Upon loading, the driver now allocates two buffers, one for sending data to the FPGA and one for receiving data from the FPGA. Using the previously working mechanism for writing across the PCIe (writel), I manually write the two hardware addresses to specific locations in BRAM, where they will be picked up by the daemon and stored for later use.
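A condensed sketch of this allocation step follows. The buffer size, the BRAM offsets, and the bar0 mapping are hypothetical; pci_alloc_consistent and writel are the kernel APIs described above.

    #include <linux/pci.h>
    #include <linux/io.h>

    #define BUF_SIZE     4096   /* hypothetical buffer size */
    #define BRAM_TX_ADDR 0x100  /* hypothetical BRAM offsets where the daemon */
    #define BRAM_RX_ADDR 0x104  /* expects to find the two hardware addresses */

    static void *tx_buf, *rx_buf;    /* CPU virtual addresses */
    static dma_addr_t tx_hw, rx_hw;  /* hardware addresses, 32-bit reachable */

    static int setup_dma_buffers(struct pci_dev *pdev, void __iomem *bar0)
    {
        /* One buffer per direction; the returned hardware addresses are
         * guaranteed to lie within the first 32 bits of address space. */
        tx_buf = pci_alloc_consistent(pdev, BUF_SIZE, &tx_hw);
        rx_buf = pci_alloc_consistent(pdev, BUF_SIZE, &rx_hw);
        if (!tx_buf || !rx_buf)
            return -ENOMEM;

        /* Publish the hardware addresses in BRAM so the MicroBlaze daemon
         * can hand them to the DMA controller later. */
        writel((u32)tx_hw, bar0 + BRAM_TX_ADDR);
        writel((u32)rx_hw, bar0 + BRAM_RX_ADDR);
        return 0;
    }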

I could now begin changing the read and write functions of the driver. Instead of issuing an individual write for each word, the write function now copies the user data into my preallocated workstation-to-FPGA buffer and then simply sends the DMA parameters of source, destination, and length to my DMA daemon. I was happy to discover that with my new system in place, reads became even easier. The bitflipper was now responsible for initiating the transfers from the FPGA to the workstation on its own, so the only functionality required in the workstation driver's read function was to copy the contents of the shared buffer back to the user. For a read operation there was no longer any need to transmit data across the PCIe.

4.3 PCIe Bridge

Most of the pieces of my project were now in place. The DMA controller was complete and the workstation driver was able to accept data transfers. But the final piece connecting it all together proved very challenging. The PCIe bridge (more specifically, the Xilinx PLBv46 PCIe core [5]) that we had always relied on for PCIe communication was still configured to allow only data requests that originated from the workstation. We had always known it should be capable of allowing the FPGA the same freedom, but had never understood how to accomplish this. The PCIe bridge itself is actually broken into two parts: the PCIe Controller, which we had used before to let the workstation access BRAM, and the PCIe Bar, which is meant to be an address window on the PLB that maps directly to workstation memory. The Bar had so far gone unused, and it was now my task to use the Controller to initialize and configure it.

The first issue I faced was that initially I could only find a mechanism for setting, at compile time, the workstation memory location the Bar should map to. This was not very useful, since I did not know the hardware address of my buffer until the driver was up and running. About half the time the driver would get the same address, but occasionally it would not, and it was unacceptable to consider a solution dependent on something so arbitrary. I finally found that if I rebuilt the Controller with the parameter C_INCLUDE_BAROFFSET_REG set to 1, there would be an allocated register named IPIFBAR2PCIBAR_0L where I could write my destination address at runtime [5].

The next problem I confronted can be summed up in one word: endianness. The MicroBlaze architecture is big endian, while the workstation, like any x86 computer, is little endian. This means that when I transferred over the hardware address the Bar should point to, it was received with its bytes scrambled. A bug like this is easily fixed with a short routine that flips the endianness of the data, though it was a difficult bug to detect. The hardest part was confirming which endianness I should even be using, since I could find no documentation stating which byte order the PCIe Controller expects an address to be written in.
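The fix itself is a short byte-order flip applied to the address before it is written to IPIFBAR2PCIBAR_0L; a minimal sketch (the helper name is mine):

    #include <stdint.h>

    /* Reverse the byte order of a 32-bit word, e.g. 0x12345678 -> 0x78563412,
     * bridging the big-endian MicroBlaze and little-endian x86 views. */
    static inline uint32_t swap32(uint32_t x)
    {
        return ((x & 0x000000FFu) << 24) |
               ((x & 0x0000FF00u) <<  8) |
               ((x & 0x00FF0000u) >>  8) |
               ((x & 0xFF000000u) >> 24);
    }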
The final difficulty I dealt with in altering the PCIe bridge was due to alignment. A subtlety of the PCIe Bar's design is that it can only be configured to begin at addresses that are a multiple of its allocated address range. At first, I had given it a very large address range without much thought. I later found that this was causing the Bar to ignore many of the least significant bits of the hardware address, which made the bridge appear broken when it was in fact working: the beginning of the window was simply located far before the beginning of my buffers. This problem can be alleviated either by lowering the address size of the Bar or by using bit manipulation on the hardware address to add the correct relative offset within the Bar.
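The bit-manipulation workaround amounts to splitting the hardware address into an aligned base for the Bar and an offset within its window; a minimal sketch, assuming a hypothetical power-of-two window size:

    #include <stdint.h>

    /* The Bar may only start at multiples of its window size, so point it
     * at the aligned base below the buffer and reach the buffer through
     * the remaining offset. BAR_SIZE is a hypothetical 64 KB window. */
    #define BAR_SIZE 0x10000u

    void split_for_bar(uint32_t hw_addr, uint32_t *bar_base, uint32_t *offset)
    {
        *bar_base = hw_addr & ~(BAR_SIZE - 1); /* written to IPIFBAR2PCIBAR_0L */
        *offset   = hw_addr &  (BAR_SIZE - 1); /* added on the PLB side */
    }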

4.4 The Completed Benchmark

The resulting data flow of the benchmark can be seen in Figure 3. The blue arrows represent the process of sending data to the FPGA, and the red arrows represent the return trip. As the diagram shows, the driver and the bitflipper both communicate with the DMA daemon, but neither is responsible for sending the bulk of the data being transferred.

Figure 3: Modified Design Flow

This system also allows both the workstation and the FPGA to poll only local memory, which eliminates the more costly process of polling across the PCIe.

5 Results

The results presented in this section were obtained by running the baseline benchmark, which did not include the configured PCIe Bar or DMA cores, against the new system that I designed. All of the timings in this paper were calculated using the RDTSC instruction to count CPU clock cycles; the cycle counts were divided by the workstation's clock speed to yield times (a sketch of this appears at the end of this section). I also checked these values using gettimeofday, which returned very similar results. Each test consisted of 10 trials, each requesting 1000 DMA transfers in both directions for the varying data sizes. I was careful to record both averages and standard deviations to ensure the quality of my results.

Figure 4 shows the results of running the original benchmark in green (on the left) and the results of running our new benchmark in blue (on the right). The error bars represent the standard deviation in our test data, which remained very small.

Figure 4: Processed Data Throughput

Table 1 shows the results from the same trials with the ratio of improvement added. It can clearly be seen that throughput improved for all sizes of data processed after the optimizations were added.

Table 1: Processed Data Throughput (MB/sec); columns: Bytes, Original, Modified, Ratio

It was especially interesting to observe that even the small 4-byte transfers showed a slight improvement, despite incurring extra overhead to begin the DMA transfers. This can only be explained by the fact that we eliminated the need to poll for data readiness across the PCIe. As the data size increases, the advantage of using DMA transfers begins to dominate: by the time we transfer 2 kilobytes of data, we have surpassed a 3x improvement in our data processing capabilities. Even though the results show great improvement, the numbers still do not approach the maximum bandwidth limits imposed by the PCIe protocol. However, I do not believe this in any way undercuts the impact that this added bandwidth will have.
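A minimal sketch of the cycle-counting timer described above, assuming an x86 workstation and a hypothetical CPU_HZ constant for the clock speed:

    #include <stdint.h>

    #define CPU_HZ 2.4e9  /* hypothetical workstation clock speed */

    /* Read the x86 time-stamp counter via the RDTSC instruction. */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Elapsed cycles divided by the clock speed yields seconds. */
    double time_once(void (*benchmark_iteration)(void))
    {
        uint64_t start = rdtsc();
        benchmark_iteration();
        return (double)(rdtsc() - start) / CPU_HZ;
    }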

6 Future Work

Accelerating bulk communication between the workstation and the FPGA will remain an open research problem here at UCSD, as it will always limit the abilities of our researchers. I believe the next step is to contact Xilinx on their forums and ask what can be done to improve our results further. An interesting topic that I was unable to test due to time constraints is the impact of wider lane widths on the FPGA's PCIe communication. The ML506 board we are currently using supports only 1x communication, but the slightly newer ML555 series has 8x capability, and the just-released Virtex-6 FPGA boards have been reported to support up to 16x. I have read mixed figures on how much is gained from increasing the width, but an 8x lane width did yield close to an eight-fold speedup in a bandwidth test run by Xilinx [8] and is said to be especially beneficial for bidirectional communication.

The other great improvement to the benchmark would be to take advantage of the time in which the PCIe is not full. The benchmark could be pipelined so that there are multiple buffers of data and one buffer is being transferred while earlier data is being processed. I think this is the only true way to maximize the usage of the PCIe. I hope that Xilinx is able to diagnose the limiting factors in our current communication method. But at the very least, when pipelined data flow is incorporated with the wider lane widths of new boards, we will see far superior performance.

References

[1] Cho, J., Mirzaei, S., Oberg, J., and Kastner, R. FPGA-based face detection system using Haar classifiers. In FPGA '09: Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (New York, NY, USA, 2009), ACM.

[2] Ettinger, E. CSE291 BRAM PCIe. php/cse291_bram_pcie.

[3] Lai, P. PatrickLabDiary. edu/mediawiki/index.php/patricklabdiary.

[4] UCSD. FPGA. bin/view/fpgaweb/webhome.

[5] Xilinx. LogiCORE IP PLBv46 RC/EP Bridge for PCI Express (v4.04a). support/documentation/ip_documentation/plbv46_pcie.pdf.

[6] Xilinx. LogiCORE IP XPS Central DMA Controller (v2.01c). support/documentation/ip_documentation/xps_central_dma.pdf.

[7] Xilinx. PCI Express forum. PCI-Express/bd-p/PCIe.

[8] Xilinx. Spartan-6 FPGA Connectivity Targeted Reference Design Performance. http://ip_documentation/xps_central_dma.pdf.
