Since the advent of electronic. Smart, Fast Trading Platforms Start with FPGAs
|
|
|
- Paul Hopkins
- 10 years ago
- Views:
Transcription
1 Smart, Fast Trading Platforms Start with FPGAs A new framework speeds application development of ultralow-latency financial systems. by Rafeh Hulays, PhD Vice President, Business Development AdvancedIO Systems Inc. [email protected] Since the advent of electronic trading, a race for speed has ensued to build the fastest and smartest trading platforms. Smart and fast translates into money. The trading platform that is able to identify a trading opportunity and execute on it first will win the day. Response time has decreased from seconds, to milliseconds, to microseconds. The drive for microsecond and submicrosecond response time is simply not possible with traditional software or simple hardware architectures, a fact that is propelling the adoption of FPGA technology in ultralow-latency systems. However, programming FPGAs requires the development and migration of existing trading strategies (largely written in C and C++) into HDL machine language. This is a fundamentally different set of skills than standard C and C++ programming and requires a new investment in people, tools and time. 00 Xcell Journal Second Quarter 2012
2 C-to-HDL compiler tools that port existing C code to a hardware description language (HDL) are available, and are faster than developing the code using HDL from scratch. However, the person working on porting the code must be familiar with hardware programming or must follow specific guidelines in order to generate acceptable code. Even then, the resulting code may be opaque and less efficient than code developed natively in HDL. To reduce the risk involved in developing HDL code natively on an FPGA Ethernet card while also slashing development time, AdvancedIO has pioneered the use of FPGA frameworks for 10-Gigabit Ethernet (10GE) communications. Our expressxg development framework tool set provides the infrastructure necessary to ensure rapid deployment of financial services and allows seamless portability to the latest generation of FPGA cards. It integrates and optimizes all necessary core functionality required to develop applications, minimizing third-party licensing costs and shaving months off the development cycle. Development teams can then integrate a C-to-HDL compiler tool into the framework and thereby have a choice of options for developing different parts of the applications. Indeed, both methods have a place within a project. Where efficiency and compact code are essential or where components are used in multiple projects, it s best to develop HDL components natively using the expressxg framework. Where timeto-market and customization are necessary, and where code is available in C, it may be desirable to use a C-to- HDL compiler tool. This approach addresses the fear barrier that many in the financial industry have with regard to FPGAs, without compromising performance. AN OVERVIEW OF ALGORITHMIC TRADING Trading in securities, derivatives, futures and other financial instruments invokes images of trading floors filled with hundreds of people shouting, milling about or peering intently at computer screens. The reality is that today, algorithms running on computer servers conduct more than 70 percent of trades in the United States. In the financial industry, speed matters. The company that can first detect and act upon an opportunity stands to profit handsomely. For some trading strategies, getting in the queue first wins the trade. No wonder, then, that companies go to great lengths to ensure they have an edge even if just a microsecond over the competition. In an algorithmic trading system (ATS), algorithms running on high-performance computers handle trades and make decisions. Such systems use multiple techniques to minimize latency and gain a competitive edge. These techniques include co-location with the different exchanges, use of shortestpath networks and, increasingly, utilizing 10GE to achieve lower latency. The major trading companies are adopting all of these techniques to gain an advantage over the competition. Brokers/Traders Real-Time Market Data Feed Order Management System (OMS) Quantitative Analysts Algorithmic Trading System (ATS) Exchange Interface Sever (EIS) Risk Management Control System (RMCS) Exchange Order Server/ Matching Engine -NYSE -EREX -CME -CBOE Historical Market Data System Figure 1 A simplified diagram of a trading ecosystem. The boxes outlined in blue are points where FPGAs will significantly improve performance. Second Quarter 2012 Xcell Journal 00
3 An FPGA brings a level of parallelism to financial applications that is impossible to match with general-purpose processors. Designers now are striving to pack as much functionality as possible into these FPGAs. Trading on a stock exchange is restricted to broker-dealers and market-making firms that are members of said exchange. Some of these firms provide services to other trading companies and enable them to use Direct Market Access to the exchange order book through their accounts. Firms providing Direct Market Access must implement pretrade risk-management controls in order to limit their financial exposure and to meet regulatory requirements. To do so, they must deploy servers with software to check on all trades going through their account. This must happen at wire speed, because any delay will put them and their clients at a disadvantage. Existing software-based pretrade risk-management platforms take on the order of tens of microseconds to perform the required policy check on financial transactions. Clearly, this is not fast enough for today s traders. Figure 1 shows the elements of a simplified trading system. In reality, an ATS may have many more elements, such as additional servers or computing clouds to perform more-sophisticated algorithms that we will not discuss here. A trading system may connect to multiple exchanges directly or through one or more exchange interface servers, electronic communications networks or other means. In addition, firewalls and intrusion-detection systems are key components of a robust trading infrastructure. LIMITATIONS OF SOFTWARE- BASED SOLUTIONS An ATS must be able to receive multiple market feeds, decode the different protocols used (and there are several, Response Time (µsec) CPU-based server with colorful names like FIX, FAST, ITCH and OUCH), filter the desired trading symbols and send them to predefined baskets that the trading algorithms (TAs) will analyze. Once the TAs decide that a trade must be acted upon, they send a trade request to the ordermanagement system (OMS), which communicates with the exchange interface server or with the exchanges themselves, places orders and receives confirmations. An ATS server can also stamp the incoming data with an ultraaccurate time stamp and send it for archiving. Time-stamping and market data capture can be done on either the same server (with the data going to a database server over an internal network) or a separate server. An ATS may receive data from multiple exchanges at an aggregate burst data rate exceeding 6 Gbits/second (Gbps). Recently, Nasdaq has declared its intention to offer connectivity at 40 Gbps to its data center customers, subject to regulatory approval. This increasing rate of data is putting a significant strain on system processors and is taking away from the important task of analyzing the data and making decisions on trades. Tests on file servers using commodity 10GE network interface cards have shown that when performing TCP transfers through their native stacks, even 2.2- GHz processors can be close to 100 percent utilized just processing the IP protocols, while only achieving data rates of less than 5 Gbps. To remedy this situation, one approach is to use a piece of hardware called a TCP offload engine (TOE) to accelerate the network stack. This yields some improvement in response time. Even with TOE being done on the network card, the amount of data that passes to the main processor is so large as to cause unacceptable delay. To reduce FPGA Acceleration Session Load Figure 2 Response time for applications run on a CPU-based server and those on an FPGA Ethernet card (FPGA acceleration) 00 Xcell Journal Second Quarter 2012
4 latency, it is necessary to filter the relevant data at the network card before passing it to the main system processor. When all of the workload is done at the FPGA level, the response time is substantially faster. In addition, an FPGA will provide the same response time regardless of the level of load. This is not true with a regular processor. When a processor is lightly loaded, the response time is predictable, but at high load the response time increases and becomes unpredictable. Today, communication between processors and Ethernet cards takes place via the PCI Express bus. In theory, an eight-lane PCI Express Gen 2 bus offers a peak throughput of 4 Gbps. However, PCI Express suffers from latencies inherent in device drivers and operating-system interrupt handling. There are clear advantages to performing financial computations on the network card without having to cross the PCI Express bus to the host processor. FPGA technology is an excellent candidate for financial applications because of its ability to process massive amounts of data in real time, with low latency and with consistency. Simply put, an FPGA brings a level of parallelism that is impossible to match with general-purpose processors. Because of the increased performance and blazing speed they are bringing to trading systems, designers are striving to pack as much functionality as possible (such as the OMS, which communicates with the exchanges, and some of the algorithms) into these FPGAs in order to reduce response time. Figure 2 illustrates the difference in response time between an FPGA-based and a CPU-based system. Not only is the FPGA considerably faster, but it also behaves consistently with increasing load, whereas the response time in a CPU-based solution increases significantly with load. FPGA TECHNOLOGY ADVANTAGES FPGA technology offers an unmatched capability to provide optimized solutions to the challenging problems encountered in high-bandwidth, real-time and computationally intensive financial trading applications. A solution where an FPGA executes trades at the Ethernet card has many advantages: Trade execution on an FPGA located close to the network s physical interface eliminates the latencies caused by the host bus, the host processor and the operating system. This drastically improves trading response time. FPGAs provide wire-speed performance, allowing the execution of the part of the algorithm that detects and acts on trading opportunities almost instantaneously, before others even notice the opportunity. FPGAs can be reprogrammed during operation, making it possible to change parameters and update algorithms to keep ahead of competitors. FPGAs are excellent at parallel processing, enabling them to act on multiple trades simultaneously. Ultralow-latency financial applications are challenging to implement since they must operate at wire speed at very high bandwidth and must execute complex algorithms. Special architectures, designed with the application in mind, are required to achieve cutting-edge performance. These must take into account the need to balance computationally intensive algorithms with the need for blazing response speed to trades. Designers must make a special effort to remove bottlenecks and minimize communication and processing latencies. To avoid latencies associated with the host system bus, it is important to choose powerful FPGAs to implement the essential parts of trading algorithms, along with generous amounts of SRAM and SDRAM memory and low-latency communication ports. Programming FPGAs using an offthe-shelf hardware module typically requires more effort and a specialized skill set than that required to create software for single or multicore processors. In addition, significant effort and investment are required to learn the documented and undocumented nuances of implementing FPGA solutions, especially at very high speeds. These characteristics may increase FPGAs perceived risk and implementation time frames, causing some project managers and teams to avoid them and to settle instead for suboptimal software implementations. Moreover, existing trading algorithms are written almost exclusively in C and C++, and porting these into HDL code is not trivial. There have been many approaches to simplify the task of programming FPGAs, such as embedded hard cores, software cores and tools to port C code to HDL languages. While each of these schemes has its place and each accelerates the deployment of applications, they all suffer from performance drawbacks. C-to-HDL tools are emerging as clear candidates to simplify the task of developing applications using FPGAs. However, the code these PCI Express Interface Wrapper Sandbox User Applications Software Libraries Drivers PCI Express Interface Wrapper Sandbox Figure 3 A complete system-level FPGA development framework solution includes optimized low-latency Linux drivers and APIs as well as the optimized controllers shown in Figure 4. Second Quarter 2012 Xcell Journal 00
5 tools generate is largely opaque and is rather difficult to optimize. Simply put, at times there is no substitute for direct HDL programming. NATIVE HDL VS. C-TO-HDL TOOLS A study done at the George Washington University [1, 2] reviewed the high-level language (HLL) tools that generate HDL code for FPGAs used in high-performance reconfigurable computers, and developed metrics to measure the efficiency and productivity of the various tools vs. native HDL. The researchers selected four workloads and drafted users of different levels of experience to implement the code using various tools. The results show that the HLL-to-HDL tools invariably shortened the development time by as much as 61 percent depending on the tool used. However, the results also show that the resulting code is invariably less efficient than native HDL code, with the area of utilization increasing by up to 36 percent and a frequency reduction of up to 50 percent. In addition, the research shows a considerable reduction in throughput when using some of these tools. [2] Moreover, the code these tools generated is opaque, making debugging on the HDL level challenging. We should note, however, that the four designs used in the test were relatively simple. Some of the financial algorithms and applications are substantially more complex, and implementing them in native HDL is more challenging. We expect that as more-complex algorithms are implemented in FPGA, we will see more savings in time when using HLL-to-HDL tools. This is especially true when the algorithms are already available in C and need to be ported into HDL rather than being written in C first and then ported. It is important to remember that C and HDL are fundamentally different, and not everything that is written in C translates well into HDL. As seen in Table 1, there are currently several variants or subsets of the C languages that vendors are using. There is an attempt under way to standardize on OpenCL, an open standard for cross-platform, parallel programming of modern processors that is being adapted for use on FPGAs. [3] FPGA DEVELOPMENT FRAMEWORK IN HDL Another approach to simplifying and shortening the development cycle is to use a framework written in native HDL which is highly optimized for latency and performance (Figures 3 and 4). The framework abstracts the details of Ethernet protocols and interfaces, memory controllers and host fabric interfaces, thereby reducing the development effort and schedule for designers to implement custom algorithms. This allows developers to focus 100 percent of their time on application development and integration rather than on getting all the external interfaces working on the FPGA card. A proper development framework must ensure application portability among FPGA device families and within the same family of cards. This significantly reduces the costs of future migration or upgrade cycles. We have implemented and validated our development framework on the various families of high-performance 10GE cards from AdvancedIO Systems, which are used for multiple applications in the Clock Logic Remote Upgrade UART PCI Express 10GbE SDRAM SDRAM Peripheral Packet Bus Packet Bus Memory Bus Memory Bus Peripheral Bus FIFO Packet Processing Memory Tester Memory Tester Register File Sandbox Figure 4 A high-level view of the expressxg FPGA development framework from AdvancedIO. The top-row components (in blue) are the controllers integrated into the hardware. The components at bottom provide an infrastructure for development. 00 Xcell Journal First Quarter 2012
6 Tool Development Time (hrs) defense, financials and telecommunications markets. The V5022 card, optimized for financial trading, features the Xilinx Virtex -6 HXT family of FPGAs and has Frequency (MHz) Area (% Utilization) 1 C Impulse-C Handel-C Carte-C Mitrion-C SysGen RC Toolbox HDLs Table 1 This comparison of HLL-to-HDL tools shows that while they deliver a considerable saving in time, the resulting code is not as efficient as that created natively in HDL. [1] all the necessary elements that an application architect needs when implementing a high-capacity, ultralowlatency trading solution. The Virtex-6 HXT family offers a large number of logic resources for complex algorithm implementations. It has two independent banks of up to 8-Gbyte, 533-MHz DDR3 SDRAM and four independent banks of up to 144-Mbit, 350-MHz QDRII+ SRAM, ideal for advanced algorithms requiring buffering or ultrafast lookup tables. The V5022 card has four 10GE ports and offers a significant reduction in latency measured from the optical cable to the MAC interface (Layer 2) inside the FPGA device. The V5022 supports a PCI Express Gen 2 host interface. An intercard high-speed port ensures ultrafast communications between different cards in the system without the need to use the host system bus, providing a further reduction in latency. This supplies the trading platform with additional high-speed processing capability for the implementation of more-complex trading algorithms. The development framework provides the major functionality that must be integrated into the board to ensure FRONT PANEL Intercard Connector Dual BPI Flash DDR3-1 SDRAM X64 / 533MHz DDR3-2 SDRAM X64 / 533MHz Serial EPROM SFP + module LED SFP + module LED SFP + module LED SFP + module FPGA Xilinx Virtex-6 HX380T HX565T JTAG PCIe x8 QDRII+ SRAM 1 18Bit / 350MHz QDRII+ SRAM 2 18Bit / 350MHz QDRII+ SRAM 3 18Bit / 350MHz QDRII+ SRAM 4 18Bit / 350MHz Thermal Sensor POWER LED PCIe Edge Connector Figure 5 A high-level diagram for the V5022 shows the different elements that an FPGA designer needs to communicate with when implementing a solution. Second Quarter 2012 Xcell Journal 00
7 that everything works the way it should. Suffice it to say that the logistics of acquiring, integrating and optimizing such functionality is a costly and time-consuming proposition. Hence, the development framework offers a great value for project managers and help with time-to-market. AdvancedIO has done extensive work to ensure that its framework is optimized to provide the best performance possible and that it occupies a small footprint and performs efficiently. This leaves the bulk of the FPGA resources available for the development of applications. For example, on the V5022 and when using the Xilinx Virtex-6 HX565T, the framework occupies less than 7.5 percent of the resources of the FPGA. This includes a PCIe interface, four 10GE interfaces, two SDRAM controllers and four SRAM controllers. The development framework provides a sandbox where programmers can develop their applications. It has easy-to-understand interfaces to the outside world, allowing the quick integration of applications on the FPGA card. Sample code and examples are provided to showcase how to exercise the interfaces and get data flowing through the system right out of the box, giving developers more confidence. SHAVING MONTHS OFF DEVELOPMENT CYCLE To reduce the risk involved in developing HDL code on FPGA devices and to reduce development time as well, our FPGA development framework integrates and optimizes all the controllers necessary to develop applications on the FPGA card. This shaves at least many months off the development of projects. We have also proposed the integration of a C-to-HDL compiler tool into the development framework, as studies have shown that while the code it generates is less efficient than native HDL coding, it delivers a significant reduction in development time. Figure 6 The V5022 is a single-slot, half-length card deployed worldwide in financial trading systems. In real-world scenarios, there is no single approach that works best for all situations. A design team should have many tools and choices when weighing development options. Where efficiency and compact code are essential or where components are used in multiple projects, the best choice is to develop HDL components using the FPGA development framework. Where speed getting to market and customization are necessary, and where code is already available in C, it may be desirable to use a C-to-HDL compiler tool. It s best to develop fixed-function modules such as TCP/IP and UDP/IP stacks in HDL using the FPGA development framework, while algorithms that require frequent changes can make use of User Applications Trading Logic OMS Filtering MDS FIX FAST OUCH ITCH TCP UDP IP Figure 7 -- Among the elements of an ATS, the blue lines indicate the software components in which FPGAs will increase performance. high-level language tools such as the Xilinx AutoESL [4] or the Impulse [5] high-level synthesis tools. APPLICATION TO FINANCIAL TRADING SYSTEMS Figure 7 shows the different parts that an algorithmic trading system needs to implement at a very high level. An ATS has to be able to read one or more market data feeds, perform filtering on this data against different baskets, undertake analysis, make trading decisions and communicate with one or more exchanges. Trading logic and strategy components are subject to frequent changes and are specific to different trading companies. They are perfect candidates to be implemented using a C-to-HDL compiler in order to meet time-to-market requirements and, if the market demands, move to a more-efficient implementation at a later stage (using the FPGA development framework). On the other side, network and financial protocols do not change often and an efficient implementation of such protocols significantly affects the performance of the system. Therefore, we recommend implementing these natively in HDL using the expressxg development framework. 00 Xcell Journal First Quarter 2012
8 SPEED, RESPONSIVENESS AND PREDICTABILITY In the financial trading sector, the drive for speed and ultralow latency is necessitating the adoption of FPGA technology. Users in this sector face many challenges, including unfamiliarity with FPGA design, different skill sets and a large existing code base in high-level languages, among others. Our expressxg FPGA development framework promises to simplify and shorten the development of applications on FPGA-powered high-performance Ethernet PCI Express cards. To assist in the porting of existing C code or where time-to-market is paramount, we propose integrating a C-to- HDL compiler within this framework. We believe that expressxg, integrated with a C-to-HDL compiler, will assist in the adoption of FPGA technology, improving the speed, responsiveness and predictability of trading systems. For more details on the expressxg system, see Get Published Would you like to write for Xcell Publications? References 1. E. El-Araby, S.G. Merchant and T. El-Ghazawi, A Framework for Evaluating High-Level Design Methodologies for High-Performance Reconfigurable Computers. IEEE Transactions on Parallel and Distributed Systems, Vol. 22, No. 1., pp , January Abstract available at xpl/freeabs_all.jsp?arnumber= E. El-Araby, M. Taher, M. Abouellail, T. El-Ghazawi and G.B. Newby, Comparative Analysis of High-Level Programming for Reconfigurable Computers: Methodology and Empirical Study. Proceedings of the Third Southern Conference on Programmable Logic (SPL 07), February Abstract available at xpl/freeabs_all.jsp?arnumber= Chronus Group, 4. Xilinx, Inc., 5. Impulse Accelerated Technologies, It's easier than you think! Submit an article draft for our Web-based or printed publications and we will assign an editor and a graphic artist to work with you to make your work look as good as possible. For more information on this exciting and highly rewarding program, please contact: Mike Santarini Publisher, Xcell Publications [email protected] See all the new publications on our website. Second Quarter 2012 Xcell Journal 00
Enhance Service Delivery and Accelerate Financial Applications with Consolidated Market Data
White Paper Enhance Service Delivery and Accelerate Financial Applications with Consolidated Market Data What You Will Learn Financial market technology is advancing at a rapid pace. The integration of
7a. System-on-chip design and prototyping platforms
7a. System-on-chip design and prototyping platforms Labros Bisdounis, Ph.D. Department of Computer and Communication Engineering 1 What is System-on-Chip (SoC)? System-on-chip is an integrated circuit
A Low Latency Library in FPGA Hardware for High Frequency Trading (HFT)
A Low Latency Library in FPGA Hardware for High Frequency Trading (HFT) John W. Lockwood, Adwait Gupte, Nishit Mehta (Algo-Logic Systems) Michaela Blott, Tom English, Kees Vissers (Xilinx) August 22, 2012,
Open Flow Controller and Switch Datasheet
Open Flow Controller and Switch Datasheet California State University Chico Alan Braithwaite Spring 2013 Block Diagram Figure 1. High Level Block Diagram The project will consist of a network development
White Paper Solarflare High-Performance Computing (HPC) Applications
Solarflare High-Performance Computing (HPC) Applications 10G Ethernet: Now Ready for Low-Latency HPC Applications Solarflare extends the benefits of its low-latency, high-bandwidth 10GbE server adapters
Nutaq. PicoDigitizer 125-Series 16 or 32 Channels, 125 MSPS, FPGA-Based DAQ Solution PRODUCT SHEET. nutaq.com MONTREAL QUEBEC
Nutaq PicoDigitizer 125-Series 16 or 32 Channels, 125 MSPS, FPGA-Based DAQ Solution PRODUCT SHEET QUEBEC I MONTREAL I N E W YO R K I nutaq.com Nutaq PicoDigitizer 125-Series The PicoDigitizer 125-Series
Data Center and Cloud Computing Market Landscape and Challenges
Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution
The Next Generation Trading Infrastructure What You Will Learn. The State of High-Performance Trading Architecture
The Next Generation Trading Infrastructure What You Will Learn Optimizing latency and throughput is essential to high performance trading companies. They are challenged, however, by the complexity of their
HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring
CESNET Technical Report 2/2014 HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring VIKTOR PUš, LUKÁš KEKELY, MARTIN ŠPINLER, VÁCLAV HUMMEL, JAN PALIČKA Received 3. 10. 2014 Abstract
Enabling Cloud Architecture for Globally Distributed Applications
The increasingly on demand nature of enterprise and consumer services is driving more companies to execute business processes in real-time and give users information in a more realtime, self-service manner.
The Advantages of Multi-Port Network Adapters in an SWsoft Virtual Environment
The Advantages of Multi-Port Network Adapters in an SWsoft Virtual Environment Introduction... 2 Virtualization addresses key challenges facing IT today... 2 Introducing Virtuozzo... 2 A virtualized environment
Networking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller
White Paper From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller The focus of this paper is on the emergence of the converged network interface controller
CFD Implementation with In-Socket FPGA Accelerators
CFD Implementation with In-Socket FPGA Accelerators Ivan Gonzalez UAM Team at DOVRES FuSim-E Programme Symposium: CFD on Future Architectures C 2 A 2 S 2 E DLR Braunschweig 14 th -15 th October 2009 Outline
XMC Modules. XMC-6260-CC 10-Gigabit Ethernet Interface Module with Dual XAUI Ports. Description. Key Features & Benefits
XMC-6260-CC 10-Gigabit Interface Module with Dual XAUI Ports XMC module with TCP/IP offload engine ASIC Dual XAUI 10GBASE-KX4 ports PCIe x8 Gen2 Description Acromag s XMC-6260-CC provides a 10-gigabit
Evaluation Report: Emulex OCe14102 10GbE and OCe14401 40GbE Adapter Comparison with Intel X710 10GbE and XL710 40GbE Adapters
Evaluation Report: Emulex OCe14102 10GbE and OCe14401 40GbE Adapter Comparison with Intel X710 10GbE and XL710 40GbE Adapters Evaluation report prepared under contract with Emulex Executive Summary As
Solving I/O Bottlenecks to Enable Superior Cloud Efficiency
WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one
Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking
Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking Burjiz Soorty School of Computing and Mathematical Sciences Auckland University of Technology Auckland, New Zealand
Flash Memory Arrays Enabling the Virtualized Data Center. July 2010
Flash Memory Arrays Enabling the Virtualized Data Center July 2010 2 Flash Memory Arrays Enabling the Virtualized Data Center This White Paper describes a new product category, the flash Memory Array,
Application Server Platform Architecture. IEI Application Server Platform for Communication Appliance
IEI Architecture IEI Benefits Architecture Traditional Architecture Direct in-and-out air flow - direction air flow prevents raiser cards or expansion cards from blocking the air circulation High-bandwidth
VMWARE WHITE PAPER 1
1 VMWARE WHITE PAPER Introduction This paper outlines the considerations that affect network throughput. The paper examines the applications deployed on top of a virtual infrastructure and discusses the
Accelerate Cloud Computing with the Xilinx Zynq SoC
X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance
Architecture of distributed network processors: specifics of application in information security systems
Architecture of distributed network processors: specifics of application in information security systems V.Zaborovsky, Politechnical University, Sait-Petersburg, Russia [email protected] 1. Introduction Modern
How To Monitor And Test An Ethernet Network On A Computer Or Network Card
3. MONITORING AND TESTING THE ETHERNET NETWORK 3.1 Introduction The following parameters are covered by the Ethernet performance metrics: Latency (delay) the amount of time required for a frame to travel
VDI Solutions - Advantages of Virtual Desktop Infrastructure
VDI s Fatal Flaw V3 Solves the Latency Bottleneck A V3 Systems White Paper Table of Contents Executive Summary... 2 Section 1: Traditional VDI vs. V3 Systems VDI... 3 1a) Components of a Traditional VDI
Gigabit Ethernet Design
Gigabit Ethernet Design Laura Jeanne Knapp Network Consultant 1-919-254-8801 [email protected] www.lauraknapp.com Tom Hadley Network Consultant 1-919-301-3052 [email protected] HSEdes_ 010 ed and
100 Gigabit Ethernet is Here!
100 Gigabit Ethernet is Here! Introduction Ethernet technology has come a long way since its humble beginning in 1973 at Xerox PARC. With each subsequent iteration, there has been a lag between time of
How to Build a Massively Scalable Next-Generation Firewall
How to Build a Massively Scalable Next-Generation Firewall Seven measures of scalability, and how to use them to evaluate NGFWs Scalable is not just big or fast. When it comes to advanced technologies
Extreme Networks: Building Cloud-Scale Networks Using Open Fabric Architectures A SOLUTION WHITE PAPER
Extreme Networks: Building Cloud-Scale Networks Using Open Fabric Architectures A SOLUTION WHITE PAPER WHITE PAPER Building Cloud- Scale Networks Abstract TABLE OF CONTENTS Introduction 2 Open Fabric-Based
Observer Analysis Advantages
In-Depth Analysis for Gigabit and 10 Gb Networks For enterprise management, gigabit and 10 Gb Ethernet networks mean high-speed communication, on-demand systems, and improved business functions. For enterprise
3G Converged-NICs A Platform for Server I/O to Converged Networks
White Paper 3G Converged-NICs A Platform for Server I/O to Converged Networks This document helps those responsible for connecting servers to networks achieve network convergence by providing an overview
Xeon+FPGA Platform for the Data Center
Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system
Linux. Reverse Debugging. Target Communication Framework. Nexus. Intel Trace Hub GDB. PIL Simulation CONTENTS
Android NEWS 2016 AUTOSAR Linux Windows 10 Reverse ging Target Communication Framework ARM CoreSight Requirements Analysis Nexus Timing Tools Intel Trace Hub GDB Unit Testing PIL Simulation Infineon MCDS
Optimizing Data Center Networks for Cloud Computing
PRAMAK 1 Optimizing Data Center Networks for Cloud Computing Data Center networks have evolved over time as the nature of computing changed. They evolved to handle the computing models based on main-frames,
PCI Express* Ethernet Networking
White Paper Intel PRO Network Adapters Network Performance Network Connectivity Express* Ethernet Networking Express*, a new third-generation input/output (I/O) standard, allows enhanced Ethernet network
How To Make Gigabit Ethernet A Reality
Off-loading TCP/IP into hardware makes Gigabit Ethernet a reality for your application Coupling a TCP/IP Offload Engine (TOE) with FPGA technology can deliver over 100MBytes/s data rate (in each direction),
Low Latency Market Data and Ticker Plant Technology. SpryWare.
Low Latency Market Data and Ticker Plant Technology. SpryWare. Direct Feeds Ultra Low Latency Extreme Capacity High Throughput Fully Scalable SpryWare s state-of-the-art Ticker Plant technology enables
FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25
FPGA Acceleration using OpenCL & PCIe Accelerators MEW 25 December 2014 FPGAs in the news» Catapult» Accelerate BING» 2x search acceleration:» ½ the number of servers»
Upgrading Data Center Network Architecture to 10 Gigabit Ethernet
Intel IT IT Best Practices Data Centers January 2011 Upgrading Data Center Network Architecture to 10 Gigabit Ethernet Executive Overview Upgrading our network architecture will optimize our data center
Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck
Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering
Getting the most TCP/IP from your Embedded Processor
Getting the most TCP/IP from your Embedded Processor Overview Introduction to TCP/IP Protocol Suite Embedded TCP/IP Applications TCP Termination Challenges TCP Acceleration Techniques 2 Getting the most
Cisco Integrated Services Routers Performance Overview
Integrated Services Routers Performance Overview What You Will Learn The Integrated Services Routers Generation 2 (ISR G2) provide a robust platform for delivering WAN services, unified communications,
Best Practises for LabVIEW FPGA Design Flow. uk.ni.com ireland.ni.com
Best Practises for LabVIEW FPGA Design Flow 1 Agenda Overall Application Design Flow Host, Real-Time and FPGA LabVIEW FPGA Architecture Development FPGA Design Flow Common FPGA Architectures Testing and
How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet
How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet Professor Jiann-Liang Chen Friday, September 23, 2011 Wireless Networks and Evolutional Communications Laboratory
Fibre Channel over Ethernet in the Data Center: An Introduction
Fibre Channel over Ethernet in the Data Center: An Introduction Introduction Fibre Channel over Ethernet (FCoE) is a newly proposed standard that is being developed by INCITS T11. The FCoE protocol specification
Mellanox Academy Online Training (E-learning)
Mellanox Academy Online Training (E-learning) 2013-2014 30 P age Mellanox offers a variety of training methods and learning solutions for instructor-led training classes and remote online learning (e-learning),
Switching Architectures for Cloud Network Designs
Overview Networks today require predictable performance and are much more aware of application flows than traditional networks with static addressing of devices. Enterprise networks in the past were designed
high-performance computing so you can move your enterprise forward
Whether targeted to HPC or embedded applications, Pico Computing s modular and highly-scalable architecture, based on Field Programmable Gate Array (FPGA) technologies, brings orders-of-magnitude performance
Achieving Low-Latency Security
Achieving Low-Latency Security In Today's Competitive, Regulatory and High-Speed Transaction Environment Darren Turnbull, VP Strategic Solutions - Fortinet Agenda 1 2 3 Firewall Architecture Typical Requirements
Performance Beyond PCI Express: Moving Storage to The Memory Bus A Technical Whitepaper
: Moving Storage to The Memory Bus A Technical Whitepaper By Stephen Foskett April 2014 2 Introduction In the quest to eliminate bottlenecks and improve system performance, the state of the art has continually
How Solace Message Routers Reduce the Cost of IT Infrastructure
How Message Routers Reduce the Cost of IT Infrastructure This paper explains how s innovative solution can significantly reduce the total cost of ownership of your messaging middleware platform and IT
Technical Brief. DualNet with Teaming Advanced Networking. October 2006 TB-02499-001_v02
Technical Brief DualNet with Teaming Advanced Networking October 2006 TB-02499-001_v02 Table of Contents DualNet with Teaming...3 What Is DualNet?...3 Teaming...5 TCP/IP Acceleration...7 Home Gateway...9
Nine Use Cases for Endace Systems in a Modern Trading Environment
FINANCIAL SERVICES OVERVIEW Nine Use Cases for Endace Systems in a Modern Trading Environment Introduction High-frequency trading (HFT) accounts for as much as 75% of equity trades in the US. As capital
Beyond the Data Center: How Network-Function Virtualization Enables New Customer-Premise Services
Beyond the Data Center: How Network-Function Virtualization Enables New Customer-Premise Services By Tom R. Halfhill Senior Analyst February 2016 www.linleygroup.com Beyond the Data Center: How Network-Function
The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group
The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links Filippo Costa on behalf of the ALICE DAQ group DATE software 2 DATE (ALICE Data Acquisition and Test Environment) ALICE is a
Understanding the Performance of an X550 11-User Environment
Understanding the Performance of an X550 11-User Environment Overview NComputing's desktop virtualization technology enables significantly lower computing costs by letting multiple users share a single
Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com
Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com W H I T E P A P E R O r a c l e V i r t u a l N e t w o r k i n g D e l i v e r i n g F a b r i c
Architecting Low Latency Cloud Networks
Architecting Low Latency Cloud Networks Introduction: Application Response Time is Critical in Cloud Environments As data centers transition to next generation virtualized & elastic cloud architectures,
Network connectivity controllers
Network connectivity controllers High performance connectivity solutions Factory Automation The hostile environment of many factories can have a significant impact on the life expectancy of PCs, and industrially
Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University
Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University Napatech - Sharkfest 2009 1 Presentation Overview About Napatech
Why 25GE is the Best Choice for Data Centers
Why 25GE is the Best Choice for Data Centers Gilles Garcia Xilinx Director Wired Communication Business Santa Clara, CA USA April 2015 1 Outline - update Data center drivers Why 25GE The need for 25GE
High-Density Network Flow Monitoring
Petr Velan [email protected] High-Density Network Flow Monitoring IM2015 12 May 2015, Ottawa Motivation What is high-density flow monitoring? Monitor high traffic in as little rack units as possible
Accelerating Data Compression with Intel Multi-Core Processors
Case Study Predictive Enterprise Intel Xeon processors Intel Server Board Embedded technology Accelerating Data Compression with Intel Multi-Core Processors Data Domain incorporates Multi-Core Intel Xeon
Simplify VMware vsphere* 4 Networking with Intel Ethernet 10 Gigabit Server Adapters
WHITE PAPER Intel Ethernet 10 Gigabit Server Adapters vsphere* 4 Simplify vsphere* 4 Networking with Intel Ethernet 10 Gigabit Server Adapters Today s Intel Ethernet 10 Gigabit Server Adapters can greatly
10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications
10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications Testing conducted by Solarflare Communications and Arista Networks shows that
Putting it on the NIC: A Case Study on application offloading to a Network Interface Card (NIC)
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE CCNC 2006 proceedings. Putting it on the NIC: A Case Study on application
Brocade Solution for EMC VSPEX Server Virtualization
Reference Architecture Brocade Solution Blueprint Brocade Solution for EMC VSPEX Server Virtualization Microsoft Hyper-V for 50 & 100 Virtual Machines Enabled by Microsoft Hyper-V, Brocade ICX series switch,
10 Gigabit Ethernet: Scaling across LAN, MAN, WAN
Arasan Chip Systems Inc. White Paper 10 Gigabit Ethernet: Scaling across LAN, MAN, WAN By Dennis McCarty March 2011 Overview Ethernet is one of the few protocols that has increased its bandwidth, while
Seeking Opportunities for Hardware Acceleration in Big Data Analytics
Seeking Opportunities for Hardware Acceleration in Big Data Analytics Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto Who
Unified Computing Systems
Unified Computing Systems Cisco Unified Computing Systems simplify your data center architecture; reduce the number of devices to purchase, deploy, and maintain; and improve speed and agility. Cisco Unified
LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren. Alexander Hahn Senior Field Application Engineer Lattice Semiconductor
Von der Hardware zur Software in FPGAs mit Embedded Prozessoren Alexander Hahn Senior Field Application Engineer Lattice Semiconductor AGENDA Overview Mico32 Embedded Processor Development Tool Chain HW/SW
HARDWARE ACCELERATION IN FINANCIAL MARKETS. A step change in speed
HARDWARE ACCELERATION IN FINANCIAL MARKETS A step change in speed NAME OF REPORT SECTION 3 HARDWARE ACCELERATION IN FINANCIAL MARKETS A step change in speed Faster is more profitable in the front office
Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability
White Paper Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability The new TCP Chimney Offload Architecture from Microsoft enables offload of the TCP protocol
SummitStack in the Data Center
SummitStack in the Data Center Abstract: This white paper describes the challenges in the virtualized server environment and the solution that Extreme Networks offers a highly virtualized, centrally manageable
Going to the wire: The next generation financial risk management platform
Going to the wire: The next generation financial risk management platform Ari Studnitzer, MD of Platform Development, CME Group Oskar Mencer, CEO and Founder, Maxeler CME Group CME Group is the world s
Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database
Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a
Windows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration Table of Contents Overview of Windows Server 2008 R2 Hyper-V Features... 3 Dynamic VM storage... 3 Enhanced Processor Support... 3 Enhanced Networking Support...
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada [email protected] Micaela Serra
Meeting the Five Key Needs of Next-Generation Cloud Computing Networks with 10 GbE
White Paper Meeting the Five Key Needs of Next-Generation Cloud Computing Networks Cloud computing promises to bring scalable processing capacity to a wide range of applications in a cost-effective manner.
The Role of Precise Timing in High-Speed, Low-Latency Trading
The Role of Precise Timing in High-Speed, Low-Latency Trading The race to zero nanoseconds Whether measuring network latency or comparing real-time trading data from different computers on the planet,
Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware
Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware Shaomeng Li, Jim Tørresen, Oddvar Søråsen Department of Informatics University of Oslo N-0316 Oslo, Norway {shaomenl, jimtoer,
The proliferation of the raw processing
TECHNOLOGY CONNECTED Advances with System Area Network Speeds Data Transfer between Servers with A new network switch technology is targeted to answer the phenomenal demands on intercommunication transfer
Simplifying Embedded Hardware and Software Development with Targeted Reference Designs
White Paper: Spartan-6 and Virtex-6 FPGAs WP358 (v1.0) December 8, 2009 Simplifying Embedded Hardware and Software Development with Targeted Reference Designs By: Navanee Sundaramoorthy FPGAs are becoming
Brennan Hughes Partner WWFIRS, LLC 312.924.1458 [email protected]
Brennan Hughes Partner WWFIRS, LLC 312.924.1458 [email protected] A program trading platform that uses powerful computers to transact a large number of orders at very fast speeds. High-frequency trading
What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications.
What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications. 2 Contents: Abstract 3 What does DDS do 3 The Strengths of DDS 4
Observer Probe Family
Observer Probe Family Distributed analysis for local and remote networks Monitor and troubleshoot vital network links in real time from any location Network Instruments offers a complete line of software
Definition of a White Box. Benefits of White Boxes
Smart Network Processing for White Boxes Sandeep Shah Director, Systems Architecture EZchip Technologies [email protected] Linley Carrier Conference June 10-11, 2014 Santa Clara, CA 1 EZchip Overview
Frequently Asked Questions
Frequently Asked Questions 1. Q: What is the Network Data Tunnel? A: Network Data Tunnel (NDT) is a software-based solution that accelerates data transfer in point-to-point or point-to-multipoint network
Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine
Where IT perceptions are reality Test Report OCe14000 Performance Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine Document # TEST2014001 v9, October 2014 Copyright 2014 IT Brand
VPX Implementation Serves Shipboard Search and Track Needs
VPX Implementation Serves Shipboard Search and Track Needs By: Thierry Wastiaux, Senior Vice President Interface Concept Defending against anti-ship missiles is a problem for which high-performance computing
Accelerating High-Speed Networking with Intel I/O Acceleration Technology
White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing
Digitale Signalverarbeitung mit FPGA (DSF) Soft Core Prozessor NIOS II Stand Mai 2007. Jens Onno Krah
(DSF) Soft Core Prozessor NIOS II Stand Mai 2007 Jens Onno Krah Cologne University of Applied Sciences www.fh-koeln.de [email protected] NIOS II 1 1 What is Nios II? Altera s Second Generation
Hardware Acceleration for High-density Datacenter Monitoring
Hardware Acceleration for High-density Datacenter Monitoring Datacenter IaaS Workshop 2014 Denis Matoušek [email protected] Company Introduction Czech university spin-off company Tight cooperation with
HP Moonshot System. Table of contents. A new style of IT accelerating innovation at scale. Technical white paper
Technical white paper HP Moonshot System A new style of IT accelerating innovation at scale Table of contents Abstract... 2 Meeting the new style of IT requirements... 2 What makes the HP Moonshot System
