CrossBow: From Hardware ized NICs t\ to ized Networks Sunay Tripathi, Nicolas Droux, Thirumalai Srinivasan, Kais Belgaied Aug 17 th. 2009 Sigcomm VISA 2009, Barcelona Sunay Tripathi, Distinguished Engineer, Sun Microsystems Inc Sunay.Tripathi@Sun.Com
Key Issues in Network ization Fair or Policy based resource sharing in virtualized environments > Bandwidth > NIC Hardware resources including descriptors > Processing CPUs Overheads due to ization > Latency > Extra processing > Throughput Security > New threats to L2 network Where to solve the problem? > Switches > L3/L4 devices > Hosts www.opensolaris.org/os/project/crossbow 2
Crossbow: Solaris Networking Stack 8 years of development work to achieve > Scalability across multi-core CPUs and multi-10gige bandwidth > ization, QoS, High Availibility designed in > Exploit advanced NIC features Key Enabler for > Server and Network Consolidation > Open Networking > Cloud computing www.opensolaris.org/os/project/crossbow 3
Crossbow Hardware Lanes Ground-Up Design for multi-core and multi-10gige Linear Scalability using 'Hardware Lanes' with dedicated resources Network ization and QoS designed in the stack More Efficiency due to 'Dynamic Polling and Packet Chaining' Physical NIC Physical Machine Switch Hardware Lane VLAN Separated C L A S S I F I E R Hardware Rings/DMA Hardware Rings/DMA Hardware Rings/DMA Kernel Threads and Queues Kernel Threads and Queues Kernel Threads and Queues NIC NIC Squeue Machine/Zone Machine/Zone Application www.opensolaris.org/os/project/crossbow 4
Hardware Lanes and Dynamic Polling Partition the NIC Hardware ( rings, DMA), kernel queues/threads, and CPU to allow creation of Hardware Lane which can be assigned to VNICs & Flows Use Dynamic Polling on rings to schedule rate of packet arrival and transmission on a per lane bassis Effect of dynamic polling Mpstat (older driver) intr ithr csw icsw migr smtx srw syscl usr sys wt idl 10818 8607 4558 1547 161 1797 289 19112 17 69 0 12 Mpstat (GLDv3 based driver) intr ithr csw icsw migr smtx srw syscl usr sys wt idl 2823 1489 875 151 93 261 1 19825 15 57 0 27 Use Dynamic polling for B/W partitioning and isolation without any support from switches and routers ~75% Fewer Interrupts ~85% Fewer Context Switches ~85% Fewer Mutexes ~15% More CPU Free www.opensolaris.org/os/project/crossbow 5
Network Containers ization Flows NICs & Switches Wire Solaris Global Zone Zone xb1-z1 Zone xb1-z2 Resource Control Bandwidth Partitioning NIC H/W Partitioning CPUs/pri assignment bge0 Exclusive IP Instance VNIC1 (100Mbps) Exclusive IP Instance VNIC2 (200Mbps) Observability Real time usage for each Link/flow Finer grained stats per Link/flow History at no cost NIC DMA DMA Flow Classifier DMA Client xb2 Client xb3 www.opensolaris.org/os/project/crossbow 6
NIC (VNIC) & Switches NICs > Functionally physical NICs: > IP address assigned statically or via DHCP and snooped individually > Appear in MIB as separate 'if' with configured link speed shown as 'ifspeed' > VNICs can be created over Link Aggregation on can be assigned to IPMP groups for load balancing and failover support > VNICs Can have multiple hardware lanes assigned to them > Can be created over physical NIC (without needing a Vswitch) to provide external connectivity with switching done in NIC H/W > VNICs have configurable link speed, CPU and priority assignment > Standards based End to End Network ization > VLAN tags and Priority Flow Control (PFC) assigned to VNIC extend Hardware Lanes to Switch > No configuration changes needed on switch to support virtualization Switches > Can be created to provide private connectivity between Machines www.opensolaris.org/os/project/crossbow 7
NIC & Switch Usage # dladm create-vnic -l bge1 vnic1 # dladm create-vnic -l bge1 -m random -p maxbw=100m -p cpus=4,5,6 vnic2 # dladm create-etherstub vswitch1 # dladm show-etherstub LINK vswitch1 # dladm create-vnic -l vswitch1 -p maxbw=1000m vnic3 # dladm show-vnic LINK OVER MACTYPE MACVALUE BANDWIDTH CPUS vnic1 bge1 factory 0:1:2:3:4:5 - - vnic2 bge1 random 2:5:6:7:8:9 max=100m 4,5,6 vnic3 vswitch1 random 4:3:4:7:0:1 max=1000m - # dladm create-vnic -l ixgbe0 -v 1055 -p maxbw=500m -p cpus=1,2 vnic9 www.opensolaris.org/os/project/crossbow 8
Physical Wire w/physical Machines Client Router Host 1 Host 2 Port 6 20.0.03 Port 9 20.0.01 Port 3 10.0.03 Port 1 10.0.01 Port 2 10.0.02 1 Gbps 1 Gbps 1 Gbps 100 Mbps 1 Gbps Switch 3 Switch 1 Wire w/ Network Machines Client Router ( Router) Host 1 Host 2 VNIC6 VNIC9 VNIC3 VNIC1 VNIC2 20.0.03 20.0.01 10.0.03 10.0.01 10.0.02 1 Gbps 1 Gbps 1 Gbps 100 Mbps 1 Gbps EtherStub 3 EtherStub 1 www.opensolaris.org/os/project/crossbow 9
Related Work Commercial/Products > Vmware Hypervisor > Linux/Xen Hypervisor > Cisco UCS/VMware based solutions Research Community > OpenFlow programmable switch > Various Linux/BSD based efforts www.opensolaris.org/os/project/crossbow 10
BACKUP
Solaris Core Network Functionality Networking Services > Routing Protocols using Quagga > L3/L4 Load Balancer kernel modules > IP Firewall (IPFilter) > DNS, DHCP, NTP, SIP, VOIP, etc Scalable & ized Network Stack > Kernel Socket & Socket Filter > Modernized TCP/IP Stack > QoS: B/W limits, Priorities, CPU bindings > IP Multi Pathing (IPMP) > IP Tunneling > Defense against DDoS attacks Crossbow: Networking > VNICs, VSwitches, VWire > Service ization (Flows) > L2 Services: Classification, Filtering Generic LAN Driver v3 GLDv3 > Aggregation > Vanity Names > Drivers (1GbE and 10GbE, FCoE, IPoIB) User Kernel Driver Developer Tools and Management Interfaces Routing Protocols (Quagga) IP Tunnels L2 Bridge Scalable ized TCP/IP Stack Crossbow: Network ization NICs Flows IP Multi Pathing Kernel Sockets Switches QoS IPFilter (Firewall) L3/L4 Load Balancer Wire L2 Classification, Filtering Observability Generic LAN Driver GLDv3 Aggr, SR-IOV, Vanity Names 1gigE/10gigE (Neptune, Niantic, etc) VRRP (Routing HA) FCOE Perf Diag Tools IPoIB Kernel Socket API IP Hooks API MAC Client API MAC Driver API S Y S A P I s www.opensolaris.org/os/project/crossbow 12
Crossbow Flows: Service ization Services and Protocols Compute Resources CPU 1 VIRTUAL CPU 2 VIRTUAL CPU 'n' VIRTUAL CPU 1 Squeue CPU 2 Squeue VOIP HTTPS DEFAULT TCP UDP DEFAULT Kernel threads/qs Kernel threads/qs Kernel threads/qs Kernel threads/qs Kernel threads/qs Kernel threads/qs Memory Partition Memory Partition Memory Partition Memory Partition Memory Partition Memory Partition NIC 1 Flow Classifier NIC 2 Flow Classifier www.opensolaris.org/os/project/crossbow 13
Crossbow Flows Crossbow Flows based on: > Services (protocol + remote/local ports) > Transport (TCP, UDP, SCTP, iscsi, etc) > Remote and local IP addresses > Remote IP Subnets > DSCP labels Following attributes can be set on each Flow > B/W limits > Priorities > CPUs # flowadm create-flow -l bge0 protocol=tcp,local_port=443 -p maxbw=50m http-1 # flowadm set-flowprop -l bge0 -p maxbw=100m http-1 www.opensolaris.org/os/project/crossbow 14
Machines Solaris Host OS NIC ization Engine Host OS VIRTUAL All Traffic Solaris Guest OS 1 NIC ization Engine HTTP Guest OS 1 VIRTUAL HTTPS DEFAULT Solaris Guest OS 2 NIC ization Engine Guest OS 2 VIRTUAL All Traffic Host OS VNIC NIC NIC NIC Guest OS 2 VNIC Host OS All traffic Guest OS 1 HTTP Guest OS 1 HTTPS Guest OS 1 DEFAULT Guest OS 2 All Traffic H/W Flow Classifier NIC www.opensolaris.org/os/project/crossbow 15
Dynamic Polling: Effect on Throughput Bi-Directional Thruput (Gbps) High Load TCP Read/Write Test 5 Clients (pktsz=1500; wrtsz=8k) 13 12 11 10 9 8 7 6 5 4 3 2 1 0 5 5 Client Read/Write 3 Reading/2 Writing 10 thread/client Config Details: 5 Client; 1 Server 10GigE Links 3 Clients reading (10 thread each) 2 Clients writing (10 thread each) All Client/Sever: x4150 dual soc 8x2.8Ghz Intel CPU 10 GigE NIC Intel Oplin (ixgbe) Xbow2 Fedora 2.6 1 2 3 4 www.opensolaris.org/os/project/crossbow 16 Number of Packets Chain Lengths Pkts Rcv'd via interrupt/poll 5000000 4500000 4000000 3500000 3000000 2500000 2000000 1500000 1000000 500000 0 100.00% 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% Lane Number Chain Lengths 1 2 3 4 Lane Numbers Pkts by Interrupt Pkts by Poll Total Pkts Chains > 50 pkts Chains 10 50 Pkts Chains < 10 Pkts
Dynamic Polling: Effect on Latency UDP 66byte pkt Low Load Latency Test Pkts Received via Interrupt/Poll Ratio Number of Chains < 10 pkts Txn/Sec (66byte packets) 20 18 16 14 12 10 8 6 4 2 0 1 2 3 4 5 Number of Clients Xbow2 Fedora 2.6 Interrupt/Poll Percent 100.00% 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% 1 2 3 4 5 Number of Clients Pkts by Poll Pkts by Interrupts Chains < 10 pkts 600000 550000 500000 450000 400000 350000 300000 250000 200000 150000 100000 50000 0 1 2 3 4 5 Number of Clients Chains < 10 pkts Txn/Sec (66 byte packets) UDP 66byte pkt High Load Latency Test 120 110 100 90 80 70 60 50 40 30 20 10 0 1 2 3 4 5 Number of Clients Xbow2 Fedora 2.6 Pkts Received via Interrupt/Poll Ratio Interrupt/Poll Ratio 100.00% 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% 1 2 3 4 5 Number of Clients Poll Interrupts Chain Lengths 100.00% 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% Pkt Chain Lengths 1 2 3 4 5 Number of clients Chains > 50 pkts Chains 10 50 pkts Chains < 10 pkts www.opensolaris.org/os/project/crossbow 17
Defense against DOS/DDOS DDOS have the ability to cripple entire server farms and all services offered by them Only the impacted services or virtual machine takes the hit instead of the entire grid Under attack, impacted services start all new connections under lower priority flow with limited bandwidth Connections transition to appropriate priority stacks after application authentication IDS systems can use Crossbow APIs to create '0' B/W flows based on remote IP addresses or subnets of the attackers and minimize their impact www.opensolaris.org/os/project/crossbow 18
Network Machines Networking as a Service (NaaS) Network Machines Networking as a Service Monetize via the subscription model in cloud using virtualized networking services like vrouter, vloadbalancer, vfirewall, vdhcpserver, vdnsserver, etc ized Networking Services wrapped in Solaris Zone/Xen/VB running on dedicated Networking blades/appliance Open Source ized Networking Services VNICs and Vswitches provide the virtualized ports similar to physical ports Enable Networks with configurable link speeds using Wire Management for a Network Machines Solaris command line Cisco Style 'cli' Web based www.opensolaris.org/os/project/crossbow 19
Network Machine Appliance for the cloud P e r i m e t e r F W X M L M e s s a g e S w i t c h I n t r u s i o n D e t e c t i o n A p p l i c a t i o n S w i t c h 1 R o u t e r - O S P F S S L V P N G a t e w a y A s tr i s k V o I P P B X The network is the computer Crossbow Networking as a Service (Naas) Subscription based or dedicated appliance www.opensolaris.org/os/project/crossbow 20
Network Machines over 10Gbe OpenSolaris vfirewall, vvpn vrouter vntp, vdhcp, vdns, vldap,.. Dedicated CPUs N2/x64 Server/Blades TCP/ UDP IP NIC A NIC A TCP/ UDP IP NIC B TCP/ UDP IP NIC B APIs for ISVs at each layer 10Gbe NIC/ NIU DMA DMA DMA Flow Classifier & Offload Eng. NIC A DMA DMA DMA Flow Classifier & Offload Eng. NIC B WAN Data Center VLAN'd ETH Fabric www.opensolaris.org/os/project/crossbow 21
Cloud Machines over 40Gbs IB OpenSolaris Dom0 DomU... DomU Dedicated CPUs N2/x64 Server/Blades iser, NFS,... RDMA, IPoIB IBTF VNIC/ EoIB Apps TCP/IP VNIC/ EoIB...... Apps TCP/IP VNIC/ EoIB APIs for ISVs at each layer Q-Pair Q-Pair Q-Pair Q-Pair Q-Pair Q-Pair Infiniband Firmware HCA A Infiniband Firmware HCA B Dom0 IB Partition DomU IB Partition www.opensolaris.org/os/project/crossbow 22
Open Storage Networking Priority Based Flow Control (PFC) > 8 ethernet virtual lanes with their own pause mechanism > Extend the Crossbow H/W ized Lanes to the switch Enhanced Transmission Selection (ETS) > Add Class of service support within the ethernet virtual lane > Extend the Crossbow flow based QoS to the switch Link Layer Discovery Protocol (LLDP) and Congestion notification (optional) PFC and ETS is useful in normal virtualization and server QoS scenarios PFC, ETS, and LLDP are necessary to implement Data Center Bridge Exchange protocol (DCBX) and FCOE www.opensolaris.org/os/project/crossbow 23
Join Us... Our communities and projects are open on OpenSolaris.org: > CrossBow: http://opensolaris.org/os/project/crossbow > VNM: http://opensolaris.org/os/project/vnm > Networking: http://opensolaris.org/os/community/networking Where you will find: > Active discussions, design docs, FAQs, source code drops, preliminary binary releases, etc... www.opensolaris.org/os/project/crossbow 24