A Low Latency Library in FPGA Hardware for High Frequency Trading (HFT)



Similar documents
Brennan Hughes Partner WWFIRS, LLC

Networking Virtualization Using FPGAs

Latency in High Performance Trading Systems Feb 2010

Enhance Service Delivery and Accelerate Financial Applications with Consolidated Market Data

Data Center and Cloud Computing Market Landscape and Challenges

Open Flow Controller and Switch Datasheet

Xeon+FPGA Platform for the Data Center

RoCE vs. iwarp Competitive Analysis

HANIC 100G: Hardware accelerator for 100 Gbps network traffic monitoring

10Gb Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Latency-Sensitive Applications

High Frequency Trading Acceleration using FPGAs

10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications

High-performance vswitch of the user, by the user, for the user

Technical Brief. DualNet with Teaming Advanced Networking. October 2006 TB _v02

Stateful Inspection Technology

Wireshark in a Multi-Core Environment Using Hardware Acceleration Presenter: Pete Sanders, Napatech Inc. Sharkfest 2009 Stanford University

Definition of a White Box. Benefits of White Boxes

Router Architectures

Mellanox Cloud and Database Acceleration Solution over Windows Server 2012 SMB Direct

Gigabit Ethernet Design

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

A way towards Lower Latency and Jitter

High-Density Network Flow Monitoring

Stress-Testing a Gbps Intrusion Prevention Device on DETER

QoS & Traffic Management

The Next Generation Trading Infrastructure What You Will Learn. The State of High-Performance Trading Architecture

Putting it on the NIC: A Case Study on application offloading to a Network Interface Card (NIC)

TickView Live Feed. Product Overview. April Document Number:

Performance Evaluation of Linux Bridge

Low Latency Test Report Ultra-Low Latency 10GbE Switch and Adapter Testing Bruce Tolley, PhD, Solarflare

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine

Network Performance Evaluation of Latest Windows Operating Systems

Datacenter Operating Systems

Research Report: The Arista 7124FX Switch as a High Performance Trade Execution Platform

- An Essential Building Block for Stable and Reliable Compute Clusters

1000Mbps Ethernet Performance Test Report

Network Performance Optimisation and Load Balancing. Wulf Thannhaeuser

Next Generation Operating Systems

Foundation for High-Performance, Open and Flexible Software and Services in the Carrier Network. Sandeep Shah Director, Systems Architecture EZchip

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Leveraging NIC Technology to Improve Network Performance in VMware vsphere

Embedded Systems: map to FPGA, GPU, CPU?

SMB Direct for SQL Server and Private Cloud

IxChariot Virtualization Performance Test Plan

Since the advent of electronic. Smart, Fast Trading Platforms Start with FPGAs

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Bricata Next Generation Intrusion Prevention System A New, Evolved Breed of Threat Mitigation

DSL Programmable Engine for High Frequency Trading Acceleration

I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology

TCP Offload Engines. As network interconnect speeds advance to Gigabit. Introduction to

Ultra-Low Latency, High Density 48 port Switch and Adapter Testing

Direct GPU/FPGA Communication Via PCI Express

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Monitoring of Tunneled IPv6 Traffic Using Packet Decapsulation and IPFIX

Getting the most TCP/IP from your Embedded Processor

Design Issues in a Bare PC Web Server

Optimising the resource utilisation in high-speed network intrusion detection systems.

Frequently Asked Questions

Informatica Ultra Messaging SMX Shared-Memory Transport

Hardwired: FX trading in a chip

Low Latency Market Data and Ticker Plant Technology. SpryWare.

High Performance OpenStack Cloud. Eli Karpilovski Cloud Advisory Council Chairman

InfiniBand Software and Protocols Enable Seamless Off-the-shelf Applications Deployment

Enabling Open-Source High Speed Network Monitoring on NetFPGA

Your Old Stack is Slowing You Down. Ajay Patel, Vice President, Fusion Middleware

HP Mellanox Low Latency Benchmark Report 2012 Benchmark Report

I3: Maximizing Packet Capture Performance. Andrew Brown

Hyper-V R2: What's New?

Distributed Compute for Both Performance and Cost Optimization - Maximizing Data Center Performance in Financial Services

Emerging storage and HPC technologies to accelerate big data analytics Jerome Gaysse JG Consulting

Performance of Software Switching

BME Direct Market Access Services the fastest direct access for non member firms

THE USE OF PTP IN FINANCIAL APPLICATIONS PIERRE BICHON, SR. CONSULTING ENGINEER, EMEA ITSF, EDINBURGH, NOVEMBER 2ND, 2011

10 Gbps Line Speed Programmable Hardware for Open Source Network Applications*

Big Data Technologies for Ultra-High-Speed Data Transfer and Processing

Low Latency 10 GbE Switching for Data Center, Cluster and Storage Interconnect

DPDK Summit 2014 DPDK in a Virtual World

IEEE Congestion Management Presentation for IEEE Congestion Management Study Group

Building an Accelerated and Energy-Efficient Traffic Monitor onto the NetFPGA platform

The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group

Why 25GE is the Best Choice for Data Centers

Seeking Opportunities for Hardware Acceleration in Big Data Analytics

Arista Application Switch: Q&A

Parallel Firewalls on General-Purpose Graphics Processing Units

Introduction to SFC91xx ASIC Family

Evaluation Report: Emulex OCe GbE and OCe GbE Adapter Comparison with Intel X710 10GbE and XL710 40GbE Adapters

Layered Protocol Wrappers for Internet Packet Processing in Reconfigurable Hardware

FPGA-based MapReduce Framework for Machine Learning

Architectures and Platforms

Challenges in high speed packet processing

Architectural Breakdown of End-to-End Latency in a TCP/IP Network

Intel Xeon +FPGA Platform for the Data Center

Understanding Latency in IP Telephony

How To Improve Performance On A Linux Based Router

High-Density Network Flow Monitoring

RDMA Performance in Virtual Machines using QDR InfiniBand on VMware vsphere 5 R E S E A R C H N O T E

Exploiting Stateful Inspection of Network Security in Reconfigurable Hardware

White Paper Solarflare High-Performance Computing (HPC) Applications

Transcription:

A Low Latency Library in FPGA Hardware for High Frequency Trading (HFT) John W. Lockwood, Adwait Gupte, Nishit Mehta (Algo-Logic Systems) Michaela Blott, Tom English, Kees Vissers (Xilinx) August 22, 2012, Santa Clara, CA Algo-Logic Systems Inc., All rights reserved 1

2 Outline Introduction High Frequency Trading (HFT) Survey of HFT Platforms Software, Hardware, and Hybrid Approaches Field Programmable Gate Arrays (FPGAs) Advantages and Disadvantages Algo-Logic s Low Latency Library Implementation on NetFPGA Platform Exposure and Position Tracking Application Protocols Supported Results

3 High Frequency Trading (HFT) HFT is Trading of equities, options, futures at high speed in large volumes Earning money by exploiting the fleeting variation in stock price or demand HFT accounts 70% of all trades in US Markets in 2010 And it continues to grow HFT involves Using computers to place orders based on pre-defined algorithms

4 Challenges in Financial Markets Main Challenges Latency Execute orders faster than other investors to capture fleeting variations in price and demand in the markets Jitter Provide consistent and fair executions Secondary challenges Throughput Handle large volume of orders Flexibility Adapt to changing risks and trading strategies

5 Recent Problems in HFT Knightmare (Knight Capital) Test script executed live trades $450M loss in 45 minutes Nasdaq - Facebook IPO Order confirmations delayed $62M loss in direct damages BATS Failure Software bug in order auctions Forced to cancel IPO Have not only hurt the banks/institutions financially, but also the credibility of the market

6 Latency in Current Approaches Software Linux E NICs 15-20 µs for Half-Round Trip Time (½ RTT) through un-optimized kernel TCP Offload: 2.9 µs Transmit + 6 µs Receive for ½ RTT Datagram Bypass (DBL) 3.5 µs for UDP and 4.0 µs for TCP Infiniband MPI 1 µs (excluding application layer) Hardware Graphics Processor (GPU) Optimized for throughput, but not optimized for low latency Incurs additional overhead of passing data through PCIe bus ASIC Achieves sub-µs latency But lacks flexibility to handle new protocols and features FPGA Provides 0.2 µs latency w/tcp Has the flexibility to support new protocols and features

7 FPGA Approaches Hybrid Computing CPU + NIC + FPGA Offload Pure FPGA Computing FPGA handles all computing Unpredictable PCIe bus transfer Memory copy Potential cache misses Amdahl s law No bus copy No memory copy No cache misses Parallel execution

8 FPGA outperforms Software Latency Hardware: 0.2 µs Software: 5 µs Jitter Hardware: 6 ns Software: 600 ns

Latency 9 Latency vs. Development Time Software on CPU Higher Latency Longer Development Time FPGA Lower Latency Hours Days Weeks Months Years Development Time (Time to Market) Software solutions Require less development time to get started But FPGA hardware solutions achieve lower latency That fundamentally cannot be achieved with software

Data Link Application Network Physical NIC 10 Algo-Logic s Low Latency Library Infrastructure Protocol Parsers Market data in local memory Host PC Algo-Logic Platform Host Application Stock Price Tables Application (Custom Logic) Algo-Logic C++ API Financial Protocol Parser Register Interface Networking Stack Session Datagram OS Kernel IP Processing FPGA µblaze Soft CPU SRAM Controller QDR II SRAM Host (UDP) Broker Exchange Ethernet Links

Application Network Data Link Physical NIC 11 Infrastructure Host PC Algo-Logic Platform Host Application Stock Price Tables Application (Custom Logic) Algo-Logic C++ API Financial Protocol Parser Register Interface Networking Stack Session Datagram OS Kernel IP Processing FPGA µblaze Soft CPU SRAM Controller QDR II SRAM Host (UDP) Broker Exchange Ethernet Links

Application Network Data Link Physical NIC 12 Protocol Parsers Host PC Host Application Algo-Logic Platform Stock Price Tables Application (Custom Logic) FIX Financial Information exchange Algo-Logic C++ API Networking Stack Financial Protocol Parser Session Datagram Register Interface OUCH NASDAQ OS Kernel IP Processing FPGA XPRS DirectEdge µblaze Soft CPU SRAM Controller BOE BATS BZX Host (UDP) Broker Exchange QDR II SRAM Ethernet Links ArcaDirect NYSE Arca Native Trading Gateway LSE

13 View of an Nasdaq order in the FPGA OUCH Packet Data O = OUCH Enter Order Message 14-byte Order Token Buy/Sell Indicator Number of Shares Stock Name Price

Application Network Data Link Physical NIC 14 Market Data and On-chip Storage Host PC Algo-Logic Platform Host Application Stock Price Tables Application (Custom Logic) Algo-Logic C++ API Financial Protocol Parser Register Interface Networking Stack Session Datagram OS Kernel IP Processing FPGA µblaze Soft CPU SRAM Controller QDR II SRAM Host (UDP) Broker Exchange Ethernet Links

Latency Algo-Logic Reduces Time to Market Starting with Pre-built Library Software on CPU FPGA w/library Higher Latency Lower Latency Hours Days Weeks Months Years Development Time (Time to Market) Starting with a pre-built low-latency library Reduces the initial development effort But maintains fundamental benefits of FPGA solution Full data-path remains in low-latency hardware 15

16 Areas of Application Algo-Logic s Low Latency Libraries provide lowest latency for Traders, Brokers, Market Makers and Exchanges in the areas of: Trading Strategy Internalization Feed Processing Smart Order Routing Risk Management Matching

17 Operations performed in hardware Parsing FIX execution reports Update price per Security to calculate Long Exposure Short Exposure Value/Security (across all sessions) Position/Security (across all sessions) FPGA Sending these values to the customer On a programmed, periodic basis

Example: Position & Exposure Monitor 18

19 Specifications & Performance Hardware platform Application Parameter Value (demo) NetFPGA- Position & Exposure calculation Protocol FIX 4.2 # sessions supported 10 # securities supported 100 Clock frequency Latency (Logic processing inside FPGA) be delay Line rate, 10 Gbps 200ns 400ns (one-way) Total Latency (pin-to-pin) 1µs

20 Results Low latency library Implemented as FPGA gateware Implements Order flow processing infrastructure Protocol parsing for all major exchanges Maintains market data in local memory Demonstrated Ten sessions of FIX 4.2 on NetFPGA Achieves Jitter-free processing with 200ns latency

More about Accelerated Finance 21

22 Algo-Logic Systems, Inc. Web http://algo-logic.com Email Solutions@ Algo-Logic.com Phone (408) 707-3740 Office Address 2255-D Martin Ave. Santa Clara, CA 95050