Data Center Infrastructure of the Future
Alexei Agueev, Systems Engineer
Traditional DC Architecture Limitations

Legacy 3-tier DC model with multiple Layer 2 domains:
- Oversubscription: ports on devices are oversubscribed ~8:1, with higher oversubscription (~20:1) as traffic traverses north
- Mobility: what happens if my IP changes? What happens if traffic patterns change?
- Vendor proprietary: lock-in to a vendor and an architecture; non-open-source implementations; proprietary protocols
- Monitoring: reactive vs. proactive; no flow information; troubleshooting is tough
- Latency: high latency, low predictability
- Management: each device has to be managed individually; increased management footprint
- Cost: with multiple layers, it can get expensive
- Scalability: scales up, not out; dependent on specific hardware (mix & match); not scalable to 40GbE / 100GbE
- DC automation: no automation available to quickly deploy infrastructure; no automation tools available
Three Ways to Scale: L2 / L3 / VXLAN

Layer 2 (MLAG spine) - all-active multipath for L2 and L3
- Standards-based protocols (LACP)
- Simplifies or eliminates the Spanning Tree topology
- Simple to understand and easy to engineer traffic

Layer 3 (ECMP) - all-active multipath using ECMP, up to 32-way (see the hashing sketch below)
- Standards-based protocols (OSPF, BGP)
- Eliminates L2 for exceptional scalability and fault tolerance
- Exceptional scale with consistent performance

VXLAN and L3 (Layer 2 over Layer 3) - best of both worlds
- All-active multipath using ECMP (up to 32-way)
- Single L3 network for all applications
- L2 extensions for stateful VM-to-VM traffic over Layer 3
- Extends L2 with exceptional scalability (16M virtual networks)
- Cloudburst over public infrastructure
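The Layer 3 design relies on ECMP spreading flows across up to 32 equal-cost uplinks. Below is a minimal sketch of how a switch might hash a flow's 5-tuple onto one of N next hops so that packets of the same flow stay on the same path; the field names and the use of CRC32 are illustrative assumptions, not any vendor's actual hash function.

```python
import zlib

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Pick one of the equal-cost next hops for a flow.

    Hashing the 5-tuple keeps every packet of a flow on the same path
    (no reordering) while different flows spread across all uplinks.
    CRC32 stands in for whatever hash the forwarding ASIC uses.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Example: a leaf with 4 spine uplinks (the designs above allow up to 32).
spines = ["spine1", "spine2", "spine3", "spine4"]
print(ecmp_next_hop("10.0.1.5", "10.0.9.7", 6, 49152, 443, spines))
```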
One Network, All Apps (covers >95% of worldwide DCs)
- 1.28 Tbps DC egress traffic
- 11,520 10GbE host ports available
- Leaves: 2x storage, 1x services, 1x edge, 1x management, 120-240x compute
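As a rough illustration of the figures on this slide, the sketch below works out the total host-facing capacity and the implied host-to-egress ratio; the per-leaf port count used to back out the compute leaf count is an assumption for illustration, not from the slide.

```python
# Figures from the slide.
host_ports = 11_520          # 10GbE host ports
port_speed_gbps = 10
egress_tbps = 1.28

host_capacity_tbps = host_ports * port_speed_gbps / 1000
print(f"Host-facing capacity: {host_capacity_tbps:.1f} Tbps")             # 115.2 Tbps
print(f"Host-to-egress ratio: {host_capacity_tbps / egress_tbps:.0f}:1")  # ~90:1

# Assuming 48 host ports per compute leaf (an assumption for illustration):
print(f"Leaves needed at 48 ports each: {host_ports // 48}")              # 240
```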
Dedicated Leaf Architectures (capacity ratios are downlink-to-uplink; see the sketch below)
- 1:1 capacity with service throughput; focus on offload and flow assist
- 1:1 to 2:1 capacity; focus on deep buffering to handle TCP incast and speed mismatch
- 3:1+ capacity; focus on reliability and service availability
- 1:2 capacity; get traffic to the edge routers and optimize the return path
- 1:1 to 3:1 capacity; use LANZ to monitor congestion and VXLAN for workload portability
- 1:1 to 3:1 capacity; primarily for workloads where the application provides reliability and availability
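A small sketch of how such an oversubscription ratio falls out of a leaf's port counts; the port counts and speeds are made-up examples, not a specific product.

```python
def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of host-facing (downlink) bandwidth to uplink bandwidth on a leaf."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)

# Hypothetical compute leaf: 48x10GbE down, 4x40GbE up -> 3:1
print(oversubscription(48, 10, 4, 40))   # 3.0
# Hypothetical services leaf: 16x10GbE down, 4x40GbE up -> 1:1
print(oversubscription(16, 10, 4, 40))   # 1.0
```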
Universal Cloud Infrastructure Options
- Spline (servers attached middle-of-row): server scale 100 to 1,000
- Layer 2 / MLAG: server scale 100 to 10,000
- Layer 3 / ECMP: server scale 100 to 100,000+
- L2 over L3 (VXLAN): server scale 100 to 100,000+
All standards-based. No proprietary fabrics!
Importance of Standards: BGP vs OSPF
- Test bed that emulates 72 containers
- Each container has 2 shim nodes (shim 1 through shim 144), attached to switches S1-S8
- Objective: study system and route-table behavior when the control plane is operating in a state that mimics production
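To make the scale of the test bed concrete, a small sketch that enumerates it in code: 72 containers with 2 shim nodes each gives 144 shims, matching the shim-143/shim-144 labels on the diagram. The naming scheme is illustrative.

```python
# Enumerate the emulated test bed: 72 containers, 2 shim nodes each.
containers = {
    f"container-{c}": [f"shim-{2 * (c - 1) + i}" for i in (1, 2)]
    for c in range(1, 73)
}

shims = [s for nodes in containers.values() for s in nodes]
print(len(containers), "containers,", len(shims), "shim nodes")   # 72 containers, 144 shim nodes
print(containers["container-1"], containers["container-72"])      # ['shim-1', 'shim-2'] ['shim-143', 'shim-144']
```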
BGP vs OSPF: Control Plane Traffic Load
- Wireshark pcap files captured for both the OSPF and the BGP spine designs
- Using the Wireshark statistics tools, we can study the control-plane traffic sent/received by leaf-node-1 when operating in OSPF and in BGP
(Chart: control-plane traffic volume, BGP vs OSPF)
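The same per-protocol counts can also be pulled from a capture with a short script rather than the Wireshark GUI. A hedged sketch using Scapy (a different tool than the one referenced on the slide); the capture filename is a placeholder.

```python
from scapy.all import rdpcap, IP, TCP   # pip install scapy

BGP_PORT = 179        # BGP runs over TCP port 179
OSPF_PROTO = 89       # OSPF is IP protocol 89

def control_plane_counts(pcap_file):
    """Count OSPF and BGP packets and bytes in a capture file."""
    counts = {"ospf": [0, 0], "bgp": [0, 0]}   # protocol -> [packets, bytes]
    for pkt in rdpcap(pcap_file):
        if IP not in pkt:
            continue
        if pkt[IP].proto == OSPF_PROTO:
            bucket = counts["ospf"]
        elif TCP in pkt and BGP_PORT in (pkt[TCP].sport, pkt[TCP].dport):
            bucket = counts["bgp"]
        else:
            continue
        bucket[0] += 1
        bucket[1] += len(pkt)
    return counts

# e.g. control_plane_counts("leaf-node-1.pcap")   # placeholder filename
```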
VXLAN: The Concept
Virtual Extensible LAN (VXLAN) introduction
- VXLAN creates logical L2 domains over a standard L3 infrastructure
- VXLAN is an extended version of regular bridging: it connects bridges through an L3 multipoint tunnel
- Like traditional bridging, it works using flooding, learning, and reverse path forwarding
- VM traffic is encapsulated into the tunnel by a VTEP (Virtual Tunnel End Point); see the sketch below
- The VTEP can be realized in software directly in the hypervisor or by a physical switch at the ToR
Diagram: a Layer 3 core connecting a software VTEP (hypervisor with VM-1 on VNI=10), a hardware VTEP at the ToR, and a VXLAN gateway serving non-VXLAN-aware end nodes
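To make the encapsulation step concrete, here is a minimal sketch of what a VTEP prepends to a VM's Ethernet frame: the 8-byte VXLAN header from RFC 7348 carrying the 24-bit VNI, which then rides inside UDP (destination port 4789) over the L3 underlay. Building the outer Ethernet/IP/UDP layers is left out for brevity.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned VXLAN destination port

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend an RFC 7348 VXLAN header to an inner Ethernet frame.

    Header layout (8 bytes): flags (8 bits, I-flag 0x08 = VNI present),
    24 reserved bits, VNI (24 bits), 8 reserved bits.
    The result becomes the UDP payload exchanged between VTEPs.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits (16M virtual networks)")
    header = struct.pack("!II", 0x08 << 24, vni << 8)
    return header + inner_frame

# Example: encapsulate a dummy inner frame for VNI 10, as in the diagram.
payload = vxlan_encapsulate(b"\x00" * 64, vni=10)
print(len(payload), payload[:8].hex())   # 72 0800000000000a00
```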
VXLAN: Software-Defined Overlay Network
- Underlay (192.168.1.x): rigid, hierarchical, structured, and designed to be easily managed
- Overlay: variable, flexible, mobile, agile; designed to be provisioned via cloud platforms
- Example tenant networks: VNI "Rogers" (10.20.20.x), VNI "Darby" (10.11.10.x), VNI "Ranger" (10.10.10.x)
VXLAN: Routing Between VNIs - Direct
- Every ToR is a hardware VTEP (VTEP1-VTEP6 at the leaf tier, over the IP fabric spine)
- Every VTEP has a router interface in every virtual Layer 2 domain
- Every VTEP is configured with an anycast gateway (VARP) address to avoid tromboning of egress traffic (see the sketch below)
Diagram: hypervisors 1 and 2 host VMs A1/B1 and A2/B2; IP storage and bare-metal servers attach to the leaf VTEPs; un-encapsulated and encapsulated VXLAN traffic paths are shown
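A small conceptual sketch of the direct-routing model, assuming illustrative VNIs and addresses: every VTEP owns the same anycast (VARP) gateway address for every VNI, so a VM's first routed hop is always its local ToR and egress traffic never trombones to a remote gateway.

```python
# Illustrative VNIs, gateway addresses, and VTEP names (assumptions, not from the slide).
ANYCAST_GATEWAYS = {10: "10.10.10.1", 20: "10.20.20.1"}
VTEPS = ["vtep1", "vtep2", "vtep3", "vtep4", "vtep5", "vtep6"]

# Direct routing: every VTEP owns every anycast gateway address (VARP),
# so inter-VNI traffic is routed on the first-hop ToR.
gateway_owners = {vni: set(VTEPS) for vni in ANYCAST_GATEWAYS}

def first_hop_router(local_vtep, vni):
    """Return the VTEP that routes a VM's inter-VNI traffic."""
    return local_vtep if local_vtep in gateway_owners[vni] else None

print(first_hop_router("vtep1", 10))   # vtep1 -- routed locally, no tromboning
print(first_hop_router("vtep5", 20))   # vtep5
```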
VXLAN: Routing Between VNIs - Indirect/Naked
- Dedicated external default gateways for VNIs: every hardware gateway VTEP is the default gateway for a small subset of the virtual Layer 2 bridging domains
- Traffic forwarding is very similar to direct routing, except that the VXLAN packets are received and forwarded on remote VTEPs
- Rather than placing the default gateway on every ToR, only a specific set of switches within the infrastructure is dedicated to hosting the VNIs and routing traffic between them
- This provides a more scalable design
Diagram: leaf VTEPs 1 and 2 serve hypervisors 1 and 2 (VMs A1/B1, A2/B2) and IP storage; VTEP3 acts as the external default gateway for the VNIs
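By contrast with the direct model above, here only a dedicated set of gateway VTEPs owns the router interfaces, with each VNI's default gateway pinned to one of them. A sketch of that partitioning, with hypothetical VNIs and gateway switch names:

```python
# Hypothetical gateway VTEPs and tenant VNIs.
GATEWAY_VTEPS = ["gw-vtep3", "gw-vtep4"]
VNIS = [10, 20, 30, 40, 50, 60]

# Each VNI's default gateway lives on one dedicated gateway switch rather
# than on every ToR; leaf VTEPs only bridge within a VNI and forward
# encapsulated traffic to that gateway for inter-VNI routing.
vni_gateway = {vni: GATEWAY_VTEPS[i % len(GATEWAY_VTEPS)] for i, vni in enumerate(VNIS)}

def route_inter_vni(src_vni):
    """Return the gateway VTEP that routes traffic leaving this VNI."""
    return vni_gateway[src_vni]

print(vni_gateway)            # VNIs split across the two gateway VTEPs
print(route_inter_vni(30))    # gw-vtep3
```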
Open, Universal, Highly Automatable Network
- Works with a cloud management system and/or network controller: VMware NSX or native OpenStack, VMware NSX or native vSphere (3.x, 4.x, 5.x)
Underlay Network Challenges
- Automated deployments: Zero Touch Provisioning (see the sketch below)
- Congestion management: Latency Analyzer, Virtual Output Queuing, Data Center Bridging
- End-to-end visibility: Tracer toolset
- Traffic analysis: Data Analyzer toolset
- Device management: multi-switch CLI
- Proactive notification: Advanced Event Monitor
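As one example of what "zero touch" means in practice, here is a hedged sketch of a bootstrap script a factory-default switch might fetch and run after learning a provisioning server from DHCP; the URL, file paths, and device-ID scheme are placeholders, not a specific vendor's ZTP workflow.

```python
import urllib.request

# Placeholder provisioning server and identity/config locations (assumptions).
PROVISIONING_URL = "http://provisioning.example.com/configs"
SERIAL_FILE = "/etc/serial-number"
CONFIG_DEST = "/mnt/flash/startup-config"

def ztp_bootstrap():
    """Fetch this switch's configuration from the provisioning server and stage it."""
    with open(SERIAL_FILE) as f:
        serial = f.read().strip()
    url = f"{PROVISIONING_URL}/{serial}.cfg"
    with urllib.request.urlopen(url) as resp:
        config = resp.read()
    with open(CONFIG_DEST, "wb") as out:
        out.write(config)
    print(f"Staged {len(config)} bytes of config for {serial}; reboot to apply.")

if __name__ == "__main__":
    ztp_bootstrap()
```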