HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS
1 HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS CS6410 Moontae Lee (Nov 20, 2014) Part 1
2 Overview: Background; User-level Networking (U-Net); Remote Direct Memory Access (RDMA); Performance
3 Index: Background
4 Network Communication: Send: the application buffer is copied into the socket buffer, headers are attached, and the data is pushed to the NIC buffer. Receive: data moves from the NIC buffer to the socket buffer, headers are parsed, the data is copied into the application buffer, and the application is scheduled (a context switch).
5 Today's Theme: Faster and more lightweight communication!
6 Terms and Problems: Communication latency has two components: processing overhead (message-handling time at the sending/receiving ends) and network latency (message transmission time between the two ends, i.e., end-to-end latency).
7 Terms and Problems (cont.): Suppose the network environment satisfies: high bandwidth / low network latency; long connection durations / relatively few connections...
8 TCP Offload Engine (TOE): the usual answer for that setting. THIS IS NOT OUR STORY!
9 Our Story: Large vs. small messages. Large: transmission-dominant; new networks improve this (e.g., video/audio streams). Small: processing-dominant; a new paradigm improves this (e.g., messages of just a few hundred bytes).
10 Our Story (cont.): Our underlying picture: sending many small messages in a LAN, where processing overhead is overwhelming (e.g., buffer management, message copies, interrupts).
11 Traditional Architecture: Problem: messages pass through the kernel. Low performance: several duplicate copies; multiple abstraction layers between the device driver and user applications. Low flexibility: all protocol processing happens inside the kernel, so it is hard to support new protocols and new message send/receive interfaces.
12 History of High-Performance Networking: User-level Networking (U-Net): one of the first kernel-bypassing systems. Virtual Interface Architecture (VIA): the first attempt to standardize user-level communication; combines the U-Net interface with a remote DMA service. Remote Direct Memory Access (RDMA): modern high-performance networking; it goes by many other names, but they share common themes.
13 Index: U-Net
14 U-Net Ideas and Goals: Move the protocol processing parts into user space! Move the entire protocol stack to user space; remove the kernel completely from the data communication path.
15 U-Net Ideas and Goals (cont.): Focusing on small messages, the key goals are high performance and high flexibility.
16 U-Net Ideas and Goals (cont.): Low communication latency in a local-area setting; exploit the full network bandwidth; emphasis on protocol design and integration flexibility; portability to off-the-shelf communication hardware.
21 U-Net Architecture: Traditionally, the kernel controls the network and all communication passes through it. In U-Net, applications access the network directly via a multiplexer (MUX); the kernel is involved only in connection setup. Virtualizing the network interface (NI) provides each process with the illusion of owning its own interface to the network.
22 U-Net Building Blocks: Endpoints: the application's/kernel's handle into the network. Communication segments: memory buffers holding the data of messages being sent and received. Message queues: hold descriptors for messages that are to be sent or that have been received.
23 U-Net Communication: Initialize: Create one or more endpoints for each application, and associate a communication segment and send/receive/free message queues with each endpoint.
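To make these pieces concrete, here is a minimal C sketch of what the endpoint, segment, and queues might look like; all names (unet_desc, unet_queue, unet_endpoint, and the ring helpers) are illustrative assumptions, not the actual U-Net API.

    /* Illustrative sketch of U-Net's building blocks (hypothetical names,
     * not the real U-Net interface). An endpoint ties a communication
     * segment to its send/receive/free descriptor queues. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct unet_desc {            /* message descriptor stored in a queue     */
        uint32_t offset;          /* where the data lives in the segment      */
        uint32_t length;          /* message length in bytes                  */
        uint32_t tag;             /* originating/destination endpoint address */
    };

    #define QCAP 64               /* fixed ring capacity (illustrative) */

    struct unet_queue {           /* fixed-size descriptor ring */
        struct unet_desc slots[QCAP];
        uint32_t head, tail;      /* free-running consumer/producer counters */
    };

    static bool queue_full(const struct unet_queue *q)  { return q->tail - q->head == QCAP; }
    static bool queue_empty(const struct unet_queue *q) { return q->tail == q->head; }
    static void enqueue(struct unet_queue *q, struct unet_desc d) { q->slots[q->tail++ % QCAP] = d; }
    static struct unet_desc dequeue(struct unet_queue *q) { return q->slots[q->head++ % QCAP]; }

    struct unet_endpoint {
        void             *segment;     /* the communication segment           */
        size_t            segment_len;
        struct unet_queue send_q;      /* descriptors waiting to be sent      */
        struct unet_queue recv_q;      /* descriptors of arrived messages     */
        struct unet_queue free_q;      /* free buffer space for new arrivals  */
    };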
24 U-Net Communication: Send: The application composes the data in the communication segment and pushes a descriptor for the message onto the send queue; if the queue is full, the NI exerts backpressure on the user process. The NI picks up the message and inserts it into the network; if the network is backed up, the NI leaves the descriptor in the queue. The NI indicates the message's injection status with a flag, after which the associated send buffer can be reused. Sending is as simple as changing one or two pointers!
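The send path above might look like the following sketch, reusing the hypothetical structures from the previous block; the real U-Net interface differs, but the essential point survives: the application only writes a descriptor, and the NI does the rest asynchronously.

    /* Hypothetical sketch of the U-Net send path. */
    int unet_send(struct unet_endpoint *ep, uint32_t off, uint32_t len, uint32_t dst)
    {
        /* 1. The message was already composed in the communication segment
         *    at offset `off`; no copy into the kernel is needed.            */
        struct unet_desc d = { .offset = off, .length = len, .tag = dst };

        /* 2. A full send queue is the NI's backpressure signal: descriptors
         *    were left in place because the network backed up.              */
        if (queue_full(&ep->send_q))
            return -1;                 /* caller backs off and retries */

        /* 3. Queue the descriptor. The NI picks it up, injects the message
         *    into the network, and flags the descriptor so the send buffer
         *    in the segment can be reused.                                  */
        enqueue(&ep->send_q, d);
        return 0;
    }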
25 U-Net Communication: Receive: The NI demultiplexes incoming messages to their destinations: it gets available space from the free queue, transfers the data into the appropriate communication segment, and pushes a message descriptor onto the receive queue. The receive model is either polling (the application periodically checks the status of the queue) or event-driven (an upcall signals the arrival); either way, the application reads the data using the descriptor from the receive queue. Receiving is as simple as the NIC changing one or two pointers!
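A matching sketch of the polling receive model, again with hypothetical names; an event-driven variant would replace the loop with an upcall invoked when the NI queues a descriptor.

    /* Hypothetical sketch of polling receive. By this point the NI has
     * already demultiplexed the message, taken space off the free queue,
     * placed the payload in the communication segment, and pushed a
     * descriptor onto the receive queue. */
    void unet_poll_recv(struct unet_endpoint *ep,
                        void (*deliver)(const void *data, uint32_t len))
    {
        while (!queue_empty(&ep->recv_q)) {
            struct unet_desc d = dequeue(&ep->recv_q);
            deliver((const char *)ep->segment + d.offset, d.length);

            /* Return the buffer space to the free queue for future arrivals. */
            enqueue(&ep->free_q, d);
        }
    }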
26 U-Net Protection: Owning-process protection: endpoints, communication segments, and send/receive/free queues can be accessed only by the owning process. Tag protection: outgoing messages are tagged with the originating endpoint address, and incoming messages are delivered only to the correct destination endpoint.
27 U-Net Zero Copy: Base-level U-Net (might not be zero-copy): send/receive needs a buffer, which requires a copy between application data structures and the buffer in the communication segment; alternatively, the application data structures can be kept in the buffer itself, avoiding the copy. Direct-Access U-Net (true zero-copy): communication segments span the entire process address space, but this requires special hardware support to check addresses.
28 Index: RDMA
29 RDMA Ideas and Goals: Move buffers between two applications across the network. Once programs implement RDMA, it aims for the lowest latency, the highest throughput, and the smallest CPU footprint.
30 RDMA Architecture (1/2): Traditionally, the socket interface involves the kernel. RDMA instead has a dedicated verbs interface: the kernel is involved only on the control path, and on the data path the RNIC can be accessed directly from user space, bypassing the kernel.
31 RDMA Architecture (2/2): To initiate RDMA, a data path is established from the RNIC to application memory; the verbs interface provides the API to establish these data paths. Once a data path is established, the application can read from and write to buffers directly. The verbs interface is different from the traditional socket interface.
32 RDMA Building Blocks: Applications use the verbs interface to: register memory (the kernel ensures the memory is pinned and accessible by DMA); create a queue pair (QP), a pair of send/receive queues; create a completion queue (CQ), into which the RNIC puts a new completion-queue element after an operation has completed; and send/receive data.
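A minimal sketch of these verbs using the libibverbs C API; it assumes a single RDMA-capable device, the queue sizes are arbitrary, and QP state transitions, address exchange, and error handling are omitted.

    /* Minimal libibverbs setup sketch: register memory, create a CQ,
     * create a QP. Error handling and connection setup are omitted. */
    #include <stdlib.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int n;
        struct ibv_device **devs = ibv_get_device_list(&n);
        struct ibv_context *ctx  = ibv_open_device(devs[0]);
        struct ibv_pd      *pd   = ibv_alloc_pd(ctx);

        /* Register memory: the kernel pins it for DMA; the returned
         * lkey/rkey authorize local and remote access to it. */
        size_t len = 4096;
        void  *buf = malloc(len);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);

        /* Completion queue: the RNIC deposits a completion element
         * here after each posted operation finishes. */
        struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

        /* Queue pair: a send queue plus a receive queue. */
        struct ibv_qp_init_attr attr = {
            .send_cq = cq, .recv_cq = cq,
            .cap     = { .max_send_wr = 16, .max_recv_wr = 16,
                         .max_send_sge = 1, .max_recv_sge = 1 },
            .qp_type = IBV_QPT_RC,     /* reliable connected */
        };
        struct ibv_qp *qp = ibv_create_qp(pd, &attr);

        /* ...then move the QP to ready-to-send, exchange addresses and
         * rkeys with the peer, post work requests, and poll the CQ
         * with ibv_poll_cq(). */
        (void)mr; (void)qp;
        return 0;
    }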
33 RDMA Communication (1/4): Step 1 [diagram]
34 RDMA Communication (2/4): Step 2 [diagram]
35 RDMA Communication (3/4): Step 3 [diagram]
36 RDMA Communication (4/4): Step 4 [diagram]
37 Index: Performance
38 U-Net Performance: Bandwidth [plots: UDP and TCP bandwidth vs. message size and rate of data generation by the application]
39 U-Net Performance: Latency [plot: end-to-end round-trip latency vs. message size]
40 RDMA Performance: CPU Load [plot: CPU load]
41 Modern RDMA: Several major vendors: QLogic (InfiniBand), Mellanox, Intel, Chelsio, and others. RDMA has evolved from the U-Net approach to have three modes. InfiniBand (QLogic PSM API): one-sided, no connection setup. More standard: a queue pair on each side plus a binding mechanism (one queue is for the sends or receives, and the other is for sensing completions). One-sided RDMA: after some setup, allows one side to read or write the memory managed by the other side, but pre-permission is required. RDMA + VLAN: needed in data centers with multitenancy.
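To illustrate the one-sided mode concretely, here is a hedged libibverbs sketch that posts a single RDMA write; it assumes a reliable connected queue pair already in the ready-to-send state and a remote buffer address and rkey obtained during setup (the pre-permission step above).

    /* Sketch: one-sided RDMA write. The remote CPU is not involved;
     * the RNIC on the far side places the bytes directly in memory. */
    #include <stdint.h>
    #include <infiniband/verbs.h>

    int rdma_write_once(struct ibv_qp *qp, struct ibv_mr *local_mr,
                        void *local_buf, uint32_t len,
                        uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_buf,
            .length = len,
            .lkey   = local_mr->lkey,
        };
        struct ibv_send_wr wr = {
            .sg_list    = &sge,
            .num_sge    = 1,
            .opcode     = IBV_WR_RDMA_WRITE,  /* one-sided operation  */
            .send_flags = IBV_SEND_SIGNALED,  /* request a completion */
            .wr.rdma    = { .remote_addr = remote_addr, .rkey = rkey },
        };
        struct ibv_send_wr *bad = NULL;
        return ibv_post_send(qp, &wr, &bad);  /* 0 on success */
    }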
42 Modern RDMA: Memory management is tricky: pages must be pinned and mapped into the IOMMU. The kernel will zero pages on the first allocation request, which is slow. If a page is a mapped region of a file, the kernel may try to automatically issue a disk write after updates, which is costly. Integration with modern NVRAM storage is awkward. On multicore NUMA machines it is hard to know which core owns a particular memory page, yet this matters. The main reason we should care? RDMA runs at 20-40 Gb/s, and soon 100, 200, even 1,000 Gb/s, but memcpy and memset run at perhaps 30 Gb/s.
43 SoftRoCE: A useful new option. With standard RDMA, many people worry that programs won't be portable and will run on only one kind of hardware. SoftRoCE allows use of the RDMA software stack (libibverbs.dll) but tunnels via TCP, hence doesn't use hardware RDMA at all: sends are zero-copy, but receives need one copy. Intel iWARP aims at something similar: it tries to offer RDMA with zero copy on both sides under the TCP API by dropping TCP into the NIC, but it requires a special NIC with an RDMA chipset.
44 HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS CS6410 Jaeyong Sung (Nov 20, 2014) Part 2
45 Hadoop: Big Data is very common in industry (e.g., Facebook, Google, Amazon). Hadoop: open-source MapReduce for handling large data; it requires lots of data transfers.
46 Hadoop Distributed File System: The primary storage for Hadoop clusters; both Hadoop MapReduce and HBase rely on it. Communication-intensive middleware, layered on top of TCP/IP.
47 Hadoop Distributed File System (cont.): Highly reliable: fault-tolerant replication. In data-intensive applications, network performance becomes the key component. [diagram: HDFS write with replication]
48 Software Bottleneck: [plot: TCP echo benchmark using TCP/IP on Linux]
49 RDMA for HDFS: Hadoop's data structure: <key, value> pairs stored in the data blocks of HDFS. Both write (replication) and read can take advantage of RDMA.
50 FaRM: Fast Remote Memory: Relies on cache-coherent DMA. An object's version number is stored both in the first word of the object header and at the start of each cache line; these versions are NOT visible to the application (e.g., HDFS).
51 Traditional Lock-free reads: For updating the data, ... [diagram]
52 Traditional Lock-free reads (cont.): Reading requires three accesses. [diagram]
53 FaRM Lock-free Reads: FaRM relies on cache-coherent DMA, with version info in each cache line.
54 FaRM Lock-free Reads (cont.): A single RDMA read suffices. [diagram]
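A sketch of the validation this enables, under an assumed layout in which the version occupies the first word of the header and of every cache line (the real FaRM layout differs in its details):

    /* Sketch of FaRM-style lock-free read validation: after one RDMA
     * read fetches the whole object, check that every cache line saw
     * the same version; a mismatch means a concurrent writer was
     * active and the read must be retried. obj is assumed to be
     * cache-line aligned. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 64

    static bool read_is_consistent(const uint8_t *obj, size_t obj_len)
    {
        uint64_t v0 = *(const uint64_t *)obj;        /* header version */
        for (size_t off = CACHE_LINE; off < obj_len; off += CACHE_LINE)
            if (*(const uint64_t *)(obj + off) != v0)
                return false;                        /* torn read: retry */
        return true;
    }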
55 FaRM: Distributed Transactions: A general mechanism to ensure consistency: two-stage commit, which checks version numbers.
56 Shared Address Space: The shared address space consists of many shared memory regions. Consistent hashing maps a region identifier to the machine that stores the object; each machine is mapped into k virtual rings.
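The mapping step can be sketched with the standard virtual-node variant of consistent hashing (which may differ from FaRM's exact scheme); hash64, K, and the linear scan are all illustrative choices.

    /* Sketch: consistent hashing with k virtual points per machine.
     * A region is owned by the machine whose point follows the
     * region's hash clockwise around the ring. Real systems keep the
     * points sorted instead of rescanning them per lookup. */
    #include <stdint.h>

    #define K 4                               /* virtual points per machine */

    static uint64_t hash64(uint64_t x)        /* illustrative mixer */
    {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        return x ^ (x >> 33);
    }

    static int region_to_machine(uint64_t region_id, int num_machines)
    {
        uint64_t h = hash64(region_id);
        uint64_t best = UINT64_MAX;
        int owner = 0;
        for (int m = 0; m < num_machines; m++)
            for (int v = 0; v < K; v++) {
                uint64_t point = hash64(((uint64_t)m << 8) | (uint64_t)v);
                uint64_t dist  = point - h;   /* clockwise distance, wraps */
                if (dist < best) { best = dist; owner = m; }
            }
        return owner;
    }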
57 Transactions in Shared Address Space: Strong consistency; atomic execution of multiple operations. [diagram: read/write/alloc/free operations against the shared address space]
58 Communication Primitives: One-sided RDMA reads to access data directly. RDMA writes: a circular buffer is used as a unidirectional channel, with one buffer for each sender/receiver pair; the buffer is stored on the receiver.
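A rough sketch of the receiver's side of such a channel, under assumed framing (a length word followed by the payload; wraparound and memory-ordering details are omitted): the ring lives in the receiver's registered memory, the sender RDMA-writes each message into it, and the receiver polls.

    /* Sketch: receiver side of a circular-buffer RDMA-write channel.
     * The sender writes [len][payload] at the receiver's head; the
     * length word becoming nonzero signals a complete message. */
    #include <stdint.h>
    #include <string.h>

    #define RING_SIZE (1u << 20)

    struct ring {                    /* registered with the RNIC */
        uint8_t  data[RING_SIZE];
        uint32_t head;               /* receiver's read position */
    };

    /* Returns the message length, or 0 if no (fitting) message yet. */
    static uint32_t ring_poll(struct ring *r, uint8_t *out, uint32_t max_len)
    {
        uint32_t len;
        memcpy(&len, &r->data[r->head], sizeof len);
        if (len == 0 || len > max_len)
            return 0;
        memcpy(out, &r->data[r->head + sizeof len], len);
        memset(&r->data[r->head], 0, sizeof len + len);   /* reclaim space */
        r->head = (r->head + sizeof len + len) % RING_SIZE;
        return len;
    }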
59 [plot: benchmark of the communication primitives]
60 Limited cache space in NIC: Some Hadoop clusters can have hundreds or thousands of nodes. RDMA performance can suffer as the amount of registered memory increases: the NIC runs out of space to cache all the page tables.
61 Limited cache space in NIC (cont.): FaRM's solution is PhyCo, a kernel driver that allocates a large number of physically contiguous, naturally aligned 2 GB memory regions at boot time and maps each region into the virtual address space aligned on a 2 GB boundary.
62 PhyCo [diagram]
63 Limited cache space in NIC (cont.): PhyCo still suffered as cluster size increased, because the NIC can also run out of space to cache all the queue pairs. With a connection between every pair of threads, there are 2 × m × t² queue pairs per machine (m = number of machines, t = number of threads per machine); with a single connection between a thread and each remote machine, 2 × m × t; with queue pair sharing among q threads, 2 × m × t / q.
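As a worked example with hypothetical numbers: on a cluster of m = 100 machines with t = 8 threads per machine, thread-to-thread connections would need 2 × 100 × 8² = 12,800 queue pairs per machine; a single connection between a thread and each remote machine needs 2 × 100 × 8 = 1,600; and sharing each queue pair among q = 4 threads cuts that to 2 × 100 × 8 / 4 = 400.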
64 Connection Multiplexing: The best value of q depends on the cluster size. [plot]
65 Experiments: Key-value store, lookup scalability. [plot]
66 Experiments: Key-value store, varying update rates. [plot]