Aeron High-Performance Open Source Message Transport. Martin Thompson - @mjpt777



Similar documents
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

Ethernet. Ethernet. Network Devices

Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family

Strategies. Addressing and Routing

Question: 3 When using Application Intelligence, Server Time may be defined as.

Internet Architecture and Philosophy

Big Data Pipeline and Analytics Platform

Informatica Ultra Messaging SMX Shared-Memory Transport

Chapter 11 I/O Management and Disk Scheduling

Qualogy M. Schildmeijer. Whitepaper Oracle Exalogic FMW Optimization

20-CS X Network Security Spring, An Introduction To. Network Security. Week 1. January 7

Voice over IP. Demonstration 1: VoIP Protocols. Network Environment

Graylog2 Lennart Koopmann, OSDC /

Latency in High Performance Trading Systems Feb 2010

Intel DPDK Boosts Server Appliance Performance White Paper

Network Attached Storage. Jinfeng Yang Oct/19/2015

CHAPTER 3 PROBLEM STATEMENT AND RESEARCH METHODOLOGY

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Scaling Progress OpenEdge Appservers. Syed Irfan Pasha Principal QA Engineer Progress Software

Datacenter Operating Systems

Mojolicious. Marcos Rebelo

White Paper Abstract Disclaimer

Tomcat Tuning. Mark Thomas April 2009

Parallel Replication for MySQL in 5 Minutes or Less

Computer Networks. Definition of LAN. Connection of Network. Key Points of LAN. Lecture 06 Connecting Networks

The BSN Hardware and Software Platform: Enabling Easy Development of Body Sensor Network Applications

point to point and point to multi point calls over IP

Practical Performance Understanding the Performance of Your Application

Giving life to today s media distribution services

Enhance Service Delivery and Accelerate Financial Applications with Consolidated Market Data

Network Layer: Network Layer and IP Protocol

Improved Digital Media Delivery with Telestream HyperLaunch

Ground up Introduction to In-Memory Data (Grids)

Architectures for massive data management

Lecture 15: Congestion Control. CSE 123: Computer Networks Stefan Savage

WSO2 Message Broker. Scalable persistent Messaging System

Measurement Study of Wuala, a Distributed Social Storage Service

Basic Networking Concepts. 1. Introduction 2. Protocols 3. Protocol Layers 4. Network Interconnection/Internet

Enabling High performance Big Data platform with RDMA

Network Programming TDC 561

NETWORK LAYER/INTERNET PROTOCOLS

Tool - 1: Health Center

Common Lisp Networking API Advantages and Disadvantages

Module 7 Internet And Internet Protocol Suite

Microkernels, virtualization, exokernels. Tutorial 1 CSC469

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Replication on Virtual Machines

I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology

Quantifind s story: Building custom interactive data analytics infrastructure

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Network Simulation Traffic, Paths and Impairment

Java and Real Time Storage Applications

Distributed File Systems

CPS221 Lecture: Layered Network Architecture

An Introduction to VoIP Protocols

Performance of Software Switching

Agenda. Some Examples from Yahoo! Hadoop. Some Examples from Yahoo! Crawling. Cloud (data) management Ahmed Ali-Eldin. First part: Second part:

Application Note: AN00121 Using XMOS TCP/IP Library for UDP-based Networking

Requirements of Voice in an IP Internetwork

Open Source building blocks for the Internet of Things. Benjamin Cabé JFokus 2013

Amazon Glacier. Developer Guide API Version

Vortex White Paper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems

dnstap: high speed DNS logging without packet capture Robert Edmonds Farsight Security, Inc.

Lecture Computer Networks

Using jvmstat and visualgc to Solve Memory Management Problems

Are Second Generation Firewalls Good for Industrial Control Systems?

InfoScale Storage & Media Server Workloads

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Michael Kagan.

Mellanox Academy Online Training (E-learning)

What can DDS do for You? Learn how dynamic publish-subscribe messaging can improve the flexibility and scalability of your applications.

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Repeat Success, Not Mistakes; Use DDS Best Practices to Design Your Complex Distributed Systems

Your Old Stack is Slowing You Down. Ajay Patel, Vice President, Fusion Middleware

IP - The Internet Protocol

Microsoft Windows Server Hyper-V in a Flash

TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

Architecture and Performance of the Internet

Windows Azure Storage Scaling Cloud Storage Andrew Edwards Microsoft

Post-production Video Editing Solution Guide with Microsoft SMB 3 File Serving AssuredSAN 4000

Final for ECE374 05/06/13 Solution!!

High Performance OpenStack Cloud. Eli Karpilovski Cloud Advisory Council Chairman

Iperf Tutorial. Jon Dugan Summer JointTechs 2010, Columbus, OH

Chapter 11. User Datagram Protocol (UDP)

JoramMQ, a distributed MQTT broker for the Internet of Things

Transport Layer Protocols

Microsoft Windows Server in a Flash

Middleware Support for Real-Time Stream Processing Luigi Romano

Unified Fabric: Cisco's Innovation for Data Center Networks

Memory Channel Storage ( M C S ) Demystified. Jerome McFarland

D1.2 Network Load Balancing

High Frequency Trading and NoSQL. Peter Lawrey CEO, Principal Consultant Higher Frequency Trading

Software-Defined Networking for the Data Center. Dr. Peer Hasselmeyer NEC Laboratories Europe

Storage I/O Control: Proportional Allocation of Shared Storage Resources

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

ECLIPSE Performance Benchmarks and Profiling. January 2009

Transcription:

Aeron High-Performance Open Source Message Transport Martin Thompson - @mjpt777

1. Why build another Product? 2. What Features are really needed? 3. How does one Design for this? 4. What did we Learn on the way? 5. What s the Roadmap?

1. Why build another product?

Not Invented Here!

There s a story here...

Clients Gateway Gateway Gateway Gateway Gateway Gateway Matching or Trading Engine Matching or Trading Engine

But many others could benefit

Feature Bloat & Complexity

Not Fast Enough

Low-Latency is key

We are in a new world Multi-core, Multi-socket, Cloud...

We are in a new world Multi-core, Multi-socket, Cloud... UDP, IPC, InfiniBand, RDMA, PCI-e

Aeron is trying a new approach

The Team Todd Montgomery Richard Warburton Martin Thompson

2. What features are really needed?

Messaging Publishers Channel Subscribers Stream Channel

A library, not a framework, on which other abstractions and applications can be built

Composable Design

OSI layer 4 Transport for message oriented streams

OSI Layer 4 (Transport) Services 1. Connection Oriented Communication 2. Reliability 3. Flow Control 4. Congestion Avoidance/Control 5. Multiplexing

Connection Oriented Communication

Reliability

Flow Control

Congestion Avoidance/Control

Multiplexing

Multi-Everything World!

Multi-Everything World Publishers Subscribers Channel Stream

Endpoints that scale

3. How does one design for this?

Design Principles 1. Garbage free in steady state running 2. Smart Batching in the message path 3. Wait-free algos in the message path 4. Non-blocking IO in the message path 5. No exceptional cases in message path 6. Apply the Single Writer Principle 7. Prefer unshared state 8. Avoid unnecessary data copies

It s all about 3 things

It s all about 3 things 1. System Architecture

It s all about 3 things 1. System Architecture 2. Data Structures

It s all about 3 things 1. System Architecture 2. Data Structures 3. Protocols of Interaction

Architecture Publisher Subscriber Subscriber Publisher IPC Log Buffer

Architecture Publisher Sender Receiver Subscriber Media Subscriber Receiver Sender Publisher IPC Log Buffer Media (UDP, InfiniBand, PCI-e 3.0)

Architecture Publisher Sender Receiver Subscriber Admin Events Conductor Media Conductor Events Admin Subscriber Receiver Sender Publisher IPC Log Buffer Media (UDP, InfiniBand, PCI-e 3.0) Function/Method Call Volatile Fields & Queues

Architecture Client Media Driver Media Driver Client Publisher Sender Receiver Subscriber Admin Events Conductor Conductor Media Conductor Conductor Events Admin Subscriber Receiver Sender Publisher IPC Log Buffer Media (UDP, InfiniBand, PCI-e 3.0) Function/Method Call Volatile Fields & Queues IPC Ring/Broadcast Buffer

Data Structures Maps IPC Ring Buffers IPC Broadcast Buffers ITC Queues Dynamic Arrays Log Buffers

What does Aeron do? Creates a replicated persistent log of messages

How would you design a log?

File Tail

File Header Message 1 Tail

File Header Message 1 Header Message 2 Tail

File Header Message 1 Header Message 2 Tail

File Header Message 1 Header Message 2 Message 3 Tail

File Header Message 1 Header Message 2 Header Message 3 Tail

Persistent data structures can be safe to read without locks

One big file that goes on forever?

No!!! Page faults, page cache churn, VM pressure,...

Clean Dirty Header Message Header Message Active Header Message Header Message Header Message Header Message Header Tail Message Header Message

How do we stay wait-free?

File Header Message 1 Header Message 2 Header Message 3 Message X Tail Message Y

File Header Message 1 Header Message 2 Header Message 3 Message X Tail Message Y

File Header Message 1 Header Message 2 Header Message 3 Message X Message Y Tail

File Header Message 1 Header Message 2 Header Message 3 Message X Header Message Y Tail

File Header Message 1 Header Message 2 Header Message 3 Message X Header Message Y Padding Tail

File Header Message 1 Header File Tail Message 2 Header Message 3 Message X Header Message Y Padding

File Header Message 1 Header File Header Message X Message 2 Tail Header Message 3 Header Message Y Padding

What s in a header?

Data Message Header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Version B E Flags Type +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+ R Frame Length +-+-------------------------------------------------------------+ R Term Offset +-+-------------------------------------------------------------+ Session ID +---------------------------------------------------------------+ Stream ID +---------------------------------------------------------------+ Term ID +---------------------------------------------------------------+ Encoded Message...... +---------------------------------------------------------------+

Unique identification of a byte within each stream across time (streamid, sessionid, termid, termoffset)

How do we replicate a log?

We need a protocol of messages

Sender Receiver

Setup Sender Receiver

Sender Receiver Status

Data Data Sender Receiver Status

Heartbeat Data Data Sender Receiver

Heartbeat Data Data Sender Receiver NAK

How are message streams reassembled?

Completed File High Water Mark

Completed File Header Message 1 High Water Mark

File Completed Header Message 1 Header Message 3 High Water Mark

File Header Message 1 Header Message 2 Header Message 3 Completed High Water Mark

What if a gap is never filled?

How do we know what is consumed?

Publishers, Senders, Receivers, and Subscribers all keep position counters

Counters are the key to flow control and monitoring

Protocols can be more subtle than you think

What about Self similar behaviour?

4. What did we learn on the way?

Humans suck at estimation!!!

Building distributed systems is Hard!

We have more defensive code than feature code

This does not mean the code is riddled with exception handlers Yuk!!!

Building distributed systems is Rewarding!

Monitoring and Debugging

Loss, throughput, and buffer size are all strongly related!!!

Pro Tip: Know your OS network parameters and how to tune them

We can track application consumption No need for the Disruptor

Some parts of Java really suck!

Some parts of Java really suck! Unsigned Types?

Some parts of Java really suck! Unsigned Types? NIO (most of) - Locks

Some parts of Java really suck! Unsigned Types? NIO (most of) - Locks Off-heap, PAUSE, Signals, etc.

Some parts of Java really suck! Unsigned Types? NIO (most of) - Locks Off-heap, PAUSE, Signals, etc. String Encoding

Some parts of Java really suck! Unsigned Types? NIO (most of) - Locks Off-heap, PAUSE, Signals, etc. String Encoding Managing External Resources

Some parts of Java really suck! Unsigned Types? NIO (most of) - Locks Off-heap, PAUSE, Signals, etc. String Encoding Managing External Resources Selectors - GC

Bytes!!! public void main(final String[] args) { byte a = 0b0000_0001; byte b = 0b0000_0010; byte flags = a b; } System.out.printf( "flags=%s\n", Integer.toBinaryString(flags));

Bytes!!! public void main(final String[] args) { byte a = 0b0000_0001; byte b = 0b0000_0010; byte flags = a b; } System.out.printf( "flags=%s\n", Integer.toBinaryString(flags));

Bytes!!! public void main(final String[] args) { byte a = 0b0000_0001; byte b = 0b0000_0010; byte flags = a b; } System.out.printf( "flags=%s\n", Integer.toBinaryString(flags));

Some parts of Java are really nice!

Some parts of Java are really nice! Tooling IDEs, Gradle, HdrHistogram

Some parts of Java are really nice! Tooling IDEs, Gradle, HdrHistogram Lambdas & Method Handles

Some parts of Java are really nice! Tooling IDEs, Gradle, HdrHistogram Lambdas & Method Handles Bytecode Instrumentation

Some parts of Java are really nice! Tooling IDEs, Gradle, HdrHistogram Lambdas & Method Handles Bytecode Instrumentation Unsafe!!! + Java 8

Some parts of Java are really nice! Tooling IDEs, Gradle, HdrHistogram Lambdas & Method Handles Bytecode Instrumentation Unsafe!!! + Java 8 The Optimiser

Some parts of Java are really nice! Tooling IDEs, Gradle, HdrHistogram Lambdas & Method Handles Bytecode Instrumentation Unsafe!!! + Java 8 The Optimiser Love/Hate

Some parts of Java are really nice! Tooling IDEs, Gradle, HdrHistogram Lambdas & Method Handles Bytecode Instrumentation Unsafe!!! + Java 8 The Optimiser Love/Hate Garbage Collection!!!

5. What s the Roadmap?

We are major feature complete!

Just finished Profiling and Tuning

Things are looking very good

20 Million 40 byte messages per second!!!

7.0 7.1 7.2 7.2 7.2 7.3 7.4 9.2 9.6 9.9 10.5 11.0 11.2 11.7 12.2 12.5 13.3 13.9 14.5 15.8 18.1 18.8 19.5 20.9 23.9 26.7 37.9 10000 5034 5330 4596 Latency Distribution (µs) 2677 2645 2362 2577 2184 1000 1311 1423 1106 1028 1014 654 612612579 622 313313316327 301 151156159152 163 100 757578 8282 42 3839 42 29 1818 20 2120 10 10 11 9 9 8 6 5 5 4 4 3 3 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1

C++ Port coming next

Then IPC and Infiniband

Have discussed FPGA implementations with 3 rd Parties

In closing

Where can I find it? https://github.com/real-logic/aeron

Questions? Blog: http://mechanical-sympathy.blogspot.com/ Twitter: @mjpt777 Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction. - Albert Einstein