perfSONAR Host Hardware

This document is a result of work by the perfSONAR Project (http://www.perfsonar.net) and is licensed under CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/).
Event Presenter, Organization, Email, Date

Outline
- Use Cases
- Hardware Selection
- Virtualization
- Host Configuration
- Successes and Failures

Use Cases
There are several deployment strategies for perfSONAR hardware:
- Bandwidth-only testing
- Latency-only testing
- Combined:
  - Individual NIC for bandwidth and latency testing
  - Shared NIC

Bandwidth Use Case
- The bandwidth host is designed to saturate a network to gain a measure of achievable throughput (e.g. how much information can be sent, given current end-to-end conditions)
- It can test using TCP (which will back off) or UDP (which won't back off); the end result is still the same
- Connectivity can be any size; typically you will want a host that matches the bottleneck of your network

Latency Use Case
- Tests are lightweight (e.g. smaller packets, and fewer of them)
- They are designed to measure things like jitter (variation in the arrival times of data), packet loss due to congestion, and the time it takes to travel from source to destination
- The connection can be smaller: 100Mbps or 1Gbps connections will typically do fine. 10Gbps latency testing is not really necessary
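To make "jitter" and "packet loss" concrete, here is a minimal sketch, assuming a hypothetical `Sample` record of sequence number plus send/receive timestamps. It is not OWAMP code; jitter is computed as inter-packet delay variation (IPDV), the change in one-way delay between consecutive received packets, and loss is inferred from how many of the expected packets arrived.

```python
"""Summarize latency-test samples: one-way delay, jitter, and loss.

Illustrative sketch only (not perfSONAR/OWAMP code). Sample and its
field names are hypothetical. Jitter here = inter-packet delay variation
(IPDV) between consecutive received packets.
"""
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    seq: int          # sequence number assigned by the sender
    sent: float       # send timestamp in seconds (sender clock)
    received: float   # receive timestamp in seconds (receiver clock)


def summarize(samples: list[Sample], expected: int) -> dict:
    """Return median delay, mean/max |IPDV|, and loss fraction."""
    ordered = sorted(samples, key=lambda s: s.seq)
    delays = [s.received - s.sent for s in ordered]
    # IPDV between consecutive *received* packets; lost ones are skipped.
    ipdv = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "median_delay": median(delays),
        "mean_abs_ipdv": sum(ipdv) / len(ipdv) if ipdv else 0.0,
        "max_abs_ipdv": max(ipdv, default=0.0),
        "loss": 1 - len(ordered) / expected,
    }
```

One-way delays are only meaningful if sender and receiver clocks are synchronized, which is one reason accurate timekeeping matters so much for latency hosts.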

Why Separate These?
- Bandwidth testing is "heavy", in that it is designed to fill the network as quickly as possible
  - E.g. the memory on the host, the queues on the NIC, the LAN, the WAN, etc.
  - Most throughput tests will cause loss, even if it is only temporary
- Latency testing is "light", in that it wants to know if there is something that is perturbing the network
  - Congestion from other sources, a failing interface, etc.

Why Separate These?
- Because of the conflicting use cases, running these at the same time is problematic
  - A heavy bandwidth test could cause loss in the latency testing. This makes it challenging to figure out where the loss is coming from: the host or the network
- If operating two machines isn't possible, both kinds of testing can be run on a single host. There are two ways to do this:
  - Dual NICs
  - Single NIC, with isolated testing

Dual NIC Testing Use Case
- Newer releases of the perfSONAR software facilitate the use of two interfaces
  - Host-level routing directs the test traffic to each of the interfaces
- Bottlenecks are still possible:
  - If the host has a single CPU managing both sets of test traffic
  - If there is a memory bottleneck
  - If the NICs do not have an offload engine, they will both need to rely on the CPU to manage traffic flow internally

Single NIC/Dual Testing Use Case
- If the host has a single NIC, tests can be configured to share access:
  - BWCTL and OWAMP tests will be mutually exclusive (they share a common scheduler)
  - However, this prevents OWAMP from working in its normal streaming mode, so it will not pick up as many problems
  - The previous bottlenecks surrounding the NIC, CPU, and memory are not as impactful (they will still be a problem, but they impact both sets of tests equally, and one at a time)
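The idea of a shared scheduler can be sketched as interval reservation: a test only starts in a gap where no other test is running. This is a hypothetical illustration; the `Scheduler` class and `request` method are invented for the example and do not describe how BWCTL or OWAMP schedule internally.

```python
"""Mutual exclusion of tests on one shared NIC (illustrative sketch).

Not BWCTL/OWAMP internals; Scheduler and request() are hypothetical.
Each test reserves a time window that must not overlap any other.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scheduler:
    # Reserved (start, end, name) windows, in seconds.
    slots: list[tuple[float, float, str]] = field(default_factory=list)

    def request(self, name: str, earliest: float, duration: float) -> float:
        """Reserve the first free gap at or after `earliest`; return its start."""
        start = earliest
        for s, e, _ in sorted(self.slots):
            if start + duration <= s:
                break            # fits entirely before this reservation
            if start < e:
                start = e        # overlaps: push past this reservation
        self.slots.append((start, start + duration, name))
        return start


sched = Scheduler()
print(sched.request("throughput", 0, 20))   # starts immediately
print(sched.request("latency", 5, 10))      # pushed back until throughput ends
```

The cost of exclusivity shows up directly: the latency test is delayed rather than running alongside, which is why a continuously streaming latency measurement is not possible in this mode.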

Hardware Selection
- Selecting hardware for the job of measurement is not impossible
- Optimize for the use case of memory-to-memory testing; e.g. we don't care about the disk subsystem
- Things that matter:
  - CPU speed/number
  - Motherboard architecture
  - Memory availability
  - Peripheral interconnection
  - NIC design + driver support

CPU/Motherboard/Memory
- Motherboard/CPU
  - Intel Sandy Bridge or Ivy Bridge CPU architecture
    - Ivy Bridge is about 20% faster in practice
  - A high clock rate is better than a high core count for measurement
  - Faster QPI for communication between processors
  - Multi-processor is a waste given that multi-core CPUs are more and more common
  - Motherboard/system possibilities:
    - SuperMicro motherboard X9DR3-F
    - Sample Dell server (PowerEdge R320-R720)
    - Sample HP server (ProLiant DL380p Gen8 High Performance model)
- Memory
  - Faster memory speed is better
  - We recommend at least 8GB of RAM for a test node (the minimum to support the operating system and tools). More is better, especially for testing over larger distances and to multiple sites

System Bus
- PCIe Gen 3 (full 40G requires PCIe Gen 3; some 10G cards will require Gen 3, but most need only Gen 2)
- PCIe slots are defined by:
  - Slot width: physical card and form factor; maximum number of lanes
  - Lane speed (PCIe generation): maximum bandwidth per lane
- Most cards will run slower in a slower slot
- Not all cards will use all lanes
- Examples:
  - 10GE NICs require an 8-lane PCIe-2 slot
  - 40G/QDR NICs require an 8-lane PCIe-3 slot
  - Most RAID controllers require an 8-lane PCIe-2 slot
  - A high-end Fusion-io card may require a 16-lane PCIe-3 slot
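The slot requirements follow from per-lane arithmetic. This sketch uses the published PCIe line rates (2.5, 5, and 8 GT/s for Gen 1 to 3) and line encodings (8b/10b for Gen 1 and 2, 128b/130b for Gen 3). It ignores packet/protocol overhead, so real usable bandwidth is somewhat lower; the function names are illustrative.

```python
"""Approximate PCIe slot bandwidth vs. NIC line rate.

Per-lane rates and encodings are from the PCIe 1.x-3.0 specs.
Protocol overhead (TLP headers, flow control) is ignored, so these
are upper bounds. Function names are hypothetical.
"""

# generation -> (gigatransfers per second per lane, encoding efficiency)
PCIE_GEN = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def slot_gbps(gen: int, lanes: int) -> float:
    """Approximate one-direction data rate of a slot, in Gbit/s."""
    gts, efficiency = PCIE_GEN[gen]
    return gts * efficiency * lanes


def slot_fits(gen: int, lanes: int, nic_gbps: float) -> bool:
    """True if the slot's post-encoding bandwidth exceeds the NIC rate."""
    return slot_gbps(gen, lanes) > nic_gbps


for gen, lanes, nic in [(2, 8, 10), (2, 8, 40), (3, 8, 40)]:
    print(f"PCIe Gen{gen} x{lanes}: {slot_gbps(gen, lanes):5.1f} Gb/s "
          f"-> {nic}G NIC fits: {slot_fits(gen, lanes, nic)}")
```

A Gen 2 x8 slot works out to about 32 Gb/s, enough for a 10GE card (even a dual-port one at 20 Gb/s) but not for a 40G port; Gen 3 x8 gives roughly 63 Gb/s, which is why 40G NICs call for Gen 3.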

NIC
- There is a difference between 1G and 10G (or larger) testing
- As network speeds increase (requiring more packets per second to pass through interfaces), problems that are very nuanced become easier to see:
  - Failing equipment with small (<0.01%) packet loss
  - CRC errors
  - Microbursts of congestion
- Consider these options when choosing a NIC speed
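The packet-rate argument can be quantified. This sketch assumes full-size frames: a 1500-byte MTU plus 38 bytes of per-frame wire overhead (14-byte Ethernet header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap). It converts a 0.01% loss rate into lost packets per second at each link speed.

```python
"""Packets per second at line rate, and what a tiny loss rate looks like.

Assumes full-size frames: 1500-byte MTU + 38 bytes of wire overhead
(14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap).
"""

WIRE_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def packets_per_second(link_bps: float, mtu: int = 1500) -> float:
    """Maximum full-size frames per second on a saturated Ethernet link."""
    return link_bps / ((mtu + WIRE_OVERHEAD_BYTES) * 8)


for label, rate in [("1G", 1e9), ("10G", 10e9), ("40G", 40e9)]:
    pps = packets_per_second(rate)
    # A 0.01% loss rate expressed as lost packets per second at this rate.
    print(f"{label}: {pps:,.0f} pkt/s, 0.01% loss = {pps * 1e-4:,.1f} pkt/s")
```

At 1G a 0.01% loss rate is about 8 lost packets per second; at 10G the same faulty component drops about 81 per second, so it stands out much sooner in a short test.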

NIC
- Driver support is key: if it doesn't have a (recent) Linux driver, avoid it
- There is a huge performance difference between cheap and expensive 10G NICs
  - Please don't cheap out on the NIC or optics
  - If you have heard of the brand, it will probably do fine
- NIC features to look for include:
  - Support for interrupt coalescing
  - Support for MSI-X
  - TCP Offload Engine (TOE)
- Note that many 10G and 40G NICs come with dual ports, but that does not mean that using both ports at the same time gives you double the performance. Often the second port is meant to be used as a backup port
- Examples: Myricom 10G-PCIE2-8C2-2S, Mellanox MCX312A-XCBT

Virtualization Introduction
- Virtualization is the process of dividing up a physical resource into multiple logical units
- Why would we want to do this?
  - Scale a larger server with lots of capacity to do a number of tasks
  - Separate functions into different logical containers (e.g. a Windows server that runs one function, a Linux server that runs another)
  - Reduce cooling/power cost by not requiring multiple servers

Virtualization Introduction
- A virtualized system has two components:
  - Host: the physical server itself, having some number of resources (CPUs, memory, disks, network cards, etc.)
  - Guest: virtual workloads that are run by the host. These share the underlying resources
- Virtualization platform: VMware, Hyper-V, Citrix, Xen, etc.
  - Software abstraction between the hardware host and the guests
- Hypervisor: management/monitoring software that is used to look after the guest resources
  - Isolates functions
  - Creates a layer between the physical hardware and the guests, e.g. manages all of the interactions

[Figure: Virtualization Introduction]

What Time Is It?
- Known complication: the ability to keep accurate time
- perfSONAR uses NTP (Network Time Protocol), which is designed to keep time monotonically increasing
  - It slows a fast clock and skips ahead a slow clock; it never reverses time
- VM environments rely on the hypervisor to tell them what time it is; this means time could skip forwards or backwards
  - If NTP sees this, it turns off; this is normally catastrophic for measurement purposes (when do I start? when do I end?)
- The picture on the right shows jitter observed after a hypervisor adjusted the clock
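One way to notice a hypervisor stepping a guest's clock is to compare wall-clock progress against a monotonic counter. This is a hypothetical check, not part of perfSONAR or NTP. It assumes the monotonic source itself is trustworthy, and the 1 ms tolerance is an arbitrary illustrative threshold.

```python
"""Flag wall-clock steps relative to a monotonic counter (sketch).

Hypothetical helper, not part of perfSONAR or NTP. Assumes the
monotonic counter is trustworthy; the tolerance is illustrative.
"""
from typing import NamedTuple


class Step(NamedTuple):
    index: int    # position in the series where the step was seen
    kind: str     # "backward", "forward", or "slowed"
    error: float  # wall-clock delta minus elapsed monotonic time, seconds


def find_clock_steps(readings, tolerance=0.001):
    """readings: (monotonic_s, wall_s) pairs, in the order they were taken."""
    steps = []
    for i in range(1, len(readings)):
        mono_dt = readings[i][0] - readings[i - 1][0]
        wall_dt = readings[i][1] - readings[i - 1][1]
        error = wall_dt - mono_dt
        if abs(error) <= tolerance:
            continue
        if wall_dt < 0:
            kind = "backward"   # time actually reversed
        elif error > 0:
            kind = "forward"    # clock skipped ahead
        else:
            kind = "slowed"     # advanced less than real elapsed time
        steps.append(Step(i, kind, error))
    return steps
```

Readings could be gathered with Python's `time.monotonic()` and `time.time()` at a fixed interval; any "backward" entry marks exactly the kind of event that makes one-way delay measurements meaningless.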

Functionality Comparison
- Pros:
  - Ability to have many ecosystems (Windows, FreeBSD, Linux, etc.) invoked through a standard management layer
  - Utilize resources horizontally on the machine. E.g. most of the time a server sits idle if it has no task. By stacking multiple guest machines onto a single host, the probability of the resource being better utilized increases
- Cons:
  - The limit is reached when machines require resources beyond what is available. You can plan for this and design the system so it is underutilized, or overprovision in the hope that there will be no conflicts
  - Because this is a shared resource, it won't do any one job very well

E2E Implications
By adding new layers into our original end-to-end drawing, we add more sources of delay:
- Application delay will be the same; we would use iperf in either case
- There are now 2 operating system delays we must contend with:
  - Guest OS: the perfSONAR Toolkit operating environment
  - Host OS: perhaps this is Windows, perhaps it's Linux, etc. This is what touches the real hardware
- There are now 2 sets of hardware:
  - Guest hardware, which is just an emulation of a processor, memory, and network card. The application makes calls to these, but they get translated through the hypervisor into real system calls to the base hardware
  - Host hardware: same as before, but shared
- We have an additional software layer (the hypervisor) that sits between the virtual and the real

[Figure: Virtual End-to-End Network]

Virtual End-to-End Network
- VM Src Host Delay:
  - Application writing to VOS
  - VKernel writing via memory to VHardware
  - VNIC writing to hypervisor
- Src Host Delay:
  - Hypervisor writing to OS
  - Kernel writing via memory to hardware
  - NIC writing to network
- Src LAN and Dst LAN:
  - Buffering on ingress interface queues
  - Processing data for destination interface
  - Egress interface queuing
  - Transmission/serialization to wire
- WAN:
  - Propagation delay for long spans
  - Ingress queuing/processing/egress queuing/serialization for each hop
- Dst Host Delay:
  - NIC receiving data
  - Kernel allocating space, sending to hypervisor
  - Hypervisor reading/acting on received data to a guest
- VM Dst Host Delay:
  - VNIC receiving data from hypervisor
  - VKernel allocating space, sending to application
  - Application reading/acting on received data
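Two items in this list are pure physics and can be estimated: serialization (frame bits divided by link rate) and propagation (distance divided by signal speed, with about 2e8 m/s in fiber used here as a common approximation). The sketch below sums only those two terms over a hypothetical list of links. Queuing and the host/VM processing delays are deliberately not modeled, since they vary with load.

```python
"""Serialization + propagation delay over a path (illustrative sketch).

Only the two physical terms are modeled. Queuing, kernel, hypervisor,
and VNIC delays vary with load and are left out. ~2e8 m/s in fiber is
an approximation; function names are hypothetical.
"""

FIBER_M_PER_S = 2.0e8


def serialization_s(frame_bytes: int, link_bps: float) -> float:
    """Time to clock one frame onto the wire."""
    return frame_bytes * 8 / link_bps


def propagation_s(distance_km: float) -> float:
    """Time for the signal to traverse the span."""
    return distance_km * 1000 / FIBER_M_PER_S


def path_delay_s(hops, frame_bytes: int = 1500) -> float:
    """hops: list of (link_bps, distance_km), one entry per link traversed."""
    return sum(serialization_s(frame_bytes, bps) + propagation_s(km)
               for bps, km in hops)


# Hypothetical path: 10G LAN, 4000 km 100G WAN span, 1G LAN.
hops = [(10e9, 0.1), (100e9, 4000), (1e9, 0.1)]
print(f"{path_delay_s(hops) * 1e3:.3f} ms one way")
```

On long paths the WAN propagation term dominates (about 20 ms for 4000 km), while serialization is measured in microseconds; the extra virtual layers add variable delays on top of these fixed terms.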

Realities: New Sources of Delay
- The hypervisor is now managing traffic for a number of other hosts
  - Think of this as a software-controlled LAN: it is a switch (running on shared hardware) that must route traffic to the hosts, in addition to making sure none are starved for memory/compute resources
- The VNIC on each guest can't receive an entire hardware NIC to itself (unless there are many available, and they are allocated for private use)
- The VCPU won't receive an entire dedicated CPU unless configured to do so. Even if it can be bound, the handling of interrupts is still slower than on bare metal
- If another guest is doing work and requesting resources at the same time as a network measurement, what happens?
  - Competing for a processor/core/memory: there will be a race condition and someone may get starved
  - The work of either machine will suffer, and this may happen a lot
  - Do you want the campus DNS server down, or the perfSONAR box? Also, you don't usually get to make that choice; the hypervisor will

Reaction of Tools: Realities
- Recall that iperf/owamp etc. don't know what's in the middle; they are designed to test, and report some numbers
- The addition of new delays (e.g. due to queuing/processing of data between the guest, hypervisor, and host operating system) is not negligible. It can be easily witnessed, and it propagates into the measurements
- Recourse?
  - Dedicating specific resources to the guests
  - Running fewer guests on a host to ensure higher levels of performance
  - Both of these defeat the purpose of a virtual environment, of course (i.e. sharing resources)

Consolation Prize
- Virtualization can be useful for:
  - Testing virtual environments (e.g. cloud providers)
  - Testing that is not latency- or bandwidth-sensitive (passive monitoring, etc.)
  - Cases where the performance expectation is small relative to the network
    - E.g. if you are supporting NDT testing for hundreds of 100Mb-connected laptops, a 1G or 10G NDT server in a virtual machine is far faster than the performance bottleneck

Examples of Hardware Performance
The following examples will demonstrate:
- The role of host tuning
- Testing against hosts with different sized capacity
- Hosts that are of a different hardware lineage, and the impact on performance
- Comparison of virtual and real machine performance

Host Tuning of TCP Settings
- Long path (~70ms), single-stream TCP, 10G cards, tuned hosts
- Why the nearly 2x uptick? The net.ipv4.tcp_rmem/wmem maximums (used in autotuning) were adjusted to 64M instead of 16M
- As the path length/throughput expectation increases, this is a good idea. There are limits (e.g. beware of buffer bloat on short RTTs)
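The reason larger buffer maximums help on long paths is the bandwidth-delay product (BDP): TCP can deliver at most one window per round trip. This idealized sketch assumes the whole socket buffer is usable as window (on Linux part of it is reserved for overhead, so real ceilings are lower) and uses the ~70 ms RTT from the example above.

```python
"""Bandwidth-delay product and the throughput ceiling a buffer imposes.

Idealized: treats the full socket buffer as usable TCP window. Linux
reserves part of tcp_rmem for overhead, so real ceilings are lower.
"""

MiB = 2**20


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bytes in flight needed to keep a path of this rate and RTT full."""
    return rate_bps * rtt_s / 8


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """At most one window can be delivered per round trip."""
    return window_bytes * 8 / rtt_s


rtt = 0.070  # ~70 ms path from the example
print(f"BDP at 10 Gb/s: {bdp_bytes(10e9, rtt) / MiB:.1f} MiB")
for mib in (16, 64):
    cap = max_throughput_bps(mib * MiB, rtt)
    print(f"{mib} MiB buffer -> at most {cap / 1e9:.2f} Gb/s")
```

Filling 10 Gb/s at 70 ms needs roughly 83 MiB in flight. A 16 MiB ceiling caps a single stream near 1.9 Gb/s, while 64 MiB raises the ceiling to about 7.7 Gb/s. The idealized ceilings differ by 4x, whereas the observed gain was nearly 2x, so other factors presumably limited the result as well. The third value of `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem` is the maximum that autotuning may grow to.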

[Figure: Host Tuning of TCP Settings (Long RTT)]

Host Tuning of TCP Settings
The role of MTUs and host tuning (i.e. it's all related):
[Figure]

Speed Mismatch 1G to 10G
- Sometimes this happens. Is it a problem? Yes and no
- Cause: this is called overdriving and is common. A 10G host and a 1G host are testing to each other
  - 1G to 10G is smooth and expected (~900Mbps, blue)
  - 10G to 1G is choppy (variable between 900Mbps and 700Mbps, green)
- http://fasterdata.es.net/performance-testing/troubleshooting/interface-speed-mismatch/
- http://fasterdata.es.net/performance-testing/evaluating-network-performance/impedence-mismatch/

Speed Mismatch 1G to 10G
- A NIC doesn't stream packets out at some average rate; it's a binary operation:
  - Send (i.e. at max rate) vs. not send (i.e. nothing)
- 10G of traffic needs buffering to support it along the path. A 10G switch/router can handle it. So could another 10G host (if both are tuned, of course)
- A 1G NIC is designed to hold bursts of 1G. Sure, they can be tuned to expect more, but they may not have enough physical memory
  - Ditto for switches in the path
- At some point things step down to a slower speed, which drops packets on the ground, and TCP reacts as if it were any other loss event
[Figure: 10GE links carrying DTN traffic with wire-speed bursts, plus background traffic or competing bursts]
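A simple fluid model shows why the step-down point drops packets. While a burst arrives at the input rate and drains at the slower output rate, the queue grows at the difference; anything beyond the available buffer is lost. This sketch assumes a single burst, no competing traffic, and no sender pacing; the burst and buffer sizes in the usage line are hypothetical.

```python
"""Queue growth when a line-rate burst steps down to a slower link.

Fluid model: one burst, no competing traffic, no pacing. While the
burst arrives at in_bps and drains at out_bps, the queue grows at
(in_bps - out_bps); excess beyond the buffer is dropped.
"""


def peak_queue_bytes(burst_bytes: float, in_bps: float, out_bps: float) -> float:
    """Queue depth at the end of the burst, with unlimited buffer."""
    if out_bps >= in_bps:
        return 0.0
    # The burst lasts burst_bytes*8/in_bps seconds; out_bps drains meanwhile.
    return burst_bytes * (1 - out_bps / in_bps)


def dropped_bytes(burst_bytes: float, in_bps: float, out_bps: float,
                  buffer_bytes: float) -> float:
    """Bytes lost when the queue is capped at buffer_bytes."""
    return max(0.0, peak_queue_bytes(burst_bytes, in_bps, out_bps) - buffer_bytes)


# Hypothetical: a 1 MB wire-speed burst from a 10G host into a 1G port
# with 500 KB of buffer.
print(dropped_bytes(1e6, 10e9, 1e9, 5e5))
```

A 10G-to-1G step-down needs buffer for 90% of every burst; traffic in the other direction (1G into 10G) never queues in this model, which matches the smooth 1G-to-10G result.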

Hardware Differences Between Hosts
- We have seen some expectation-management problems with the tools
  - Some feel that if they have 10G, they will get all of it
  - Some may not understand the makeup of the test
  - Some may not know what they should be getting
- Let's start with an ESnet-to-ESnet test, between very well tuned and recent pieces of hardware
- 5Gbps is awesome for:
  - A 20 second test
  - 60ms latency
  - Homogeneous servers
  - Using fasterdata tunings
  - On a shared infrastructure

Hardware Differences Between Hosts
- Another example: ESnet (Sacramento, CA) to Utah, ~20ms of latency
- Is it 5Gbps? No, but it is still outstanding given the environment:
  - 20 second test
  - Heterogeneous hosts
  - Possibly different configurations (e.g. similar tunings of the OS, but not exact in terms of things like BIOS, NIC, etc.)
  - Different congestion levels on the ends

Hardware Differences Between Hosts
- Similar example: ESnet (Washington, DC) to Utah, ~50ms of latency
- Is it 5Gbps? No. Should it be? No! Could it be higher? Sure, run a different diagnostic test
  - Longer latency, but still the same length of test (20 sec)
  - Heterogeneous hosts
  - Possibly different configurations (e.g. similar tunings of the OS, but not exact in terms of things like BIOS, NIC, etc.)
  - Different congestion levels on the ends
- Takeaway: you will know bad performance when you see it. This is consistent and jibes with the environment

Virtual Machine to Bare Metal Ex.
- The next example compares the results of testing between domains:
  - ESnet Pacific Northwest GigaPoP location (Seattle, WA)
  - Rutherford Lab (Swindon, UK)
- ESnet host = 10Gbps-connected server
- RL Host 1 = 10Gbps-connected server
- RL Host 2 = VM with a 1Gbps VNIC, 10Gbps NIC on host

[Figures: Virtual Machine to Bare Metal Ex., real host results]

Real Host Observations/Comments
- 80ms one-way delay (160ms RTT), stable over time
- RL -> ESnet is slower than ESnet -> RL
  - Could be differences in host hardware and TCP tuning
- No packet loss observed on the network
  - This is a good observation; if loss were seen, it could contribute to lower TCP performance

[Figures: Virtual Machine to Bare Metal Ex., virtual host results]

Virtual Host Observations/Comments
- 80ms one-way delay (160ms RTT)
  - Mostly stable over time; a period of instability on the host caused a latency change
- RL -> ESnet is slower than ESnet -> RL
  - The virtual host is underpowered vs. the server: it has less memory, CPU, and NIC capacity
- Packet loss observed
  - More severe in the ESnet -> RL direction: a factor of the virtual and real hosts at RL having problems dealing with the influx of network traffic
  - In either case, packet loss contributes to low (and unpredictable) throughput
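How badly loss hurts at this RTT can be estimated with the well-known Mathis et al. approximation for loss-based (Reno-style) TCP: throughput <= (MSS / RTT) * (C / sqrt(p)), with C about sqrt(3/2). This is a rough steady-state model used only for illustration. Modern congestion control (e.g. CUBIC, BBR) responds differently, and the 1460-byte MSS is an assumption.

```python
"""Mathis et al. steady-state bound on single-stream, loss-based TCP.

    throughput <= (MSS / RTT) * (C / sqrt(p)),  C ~= sqrt(3/2)

Approximation for Reno-style congestion control only; CUBIC and BBR
behave differently. The 1460-byte MSS is an assumption.
"""
import math

C = math.sqrt(3 / 2)


def mathis_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate upper bound on throughput in bits per second."""
    return (mss_bytes * 8 / rtt_s) * (C / math.sqrt(loss))


for p in (1e-6, 1e-5, 1e-4):
    bps = mathis_bps(1460, 0.160, p)  # 160 ms RTT, as in the RL <-> ESnet test
    print(f"loss {p:.0e}: at most ~{bps / 1e6:,.0f} Mb/s")
```

At 160 ms RTT even one loss per million packets bounds a standard-MSS stream to roughly 89 Mb/s under this model, and each 10x increase in loss cuts the bound by about 3.2x, which is consistent with loss on a long path producing low, unpredictable throughput.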
