BGP dans le ToR (BGP in the ToR). FRnOG 23. Antoine Sibout, Datacenter Architect, Juniper Networks




Traditional datacenter: underlay & VLANs
[Diagram: two L3 blocks handle routing and filtering between VLANs, with FW and LB at the L2/L3 boundary; no VLANs stretch across L3; below, STP or MC-LAG interconnects the L2 ToR switches down to the servers; the VLAN span is limited.]

Evolution toward Clos architectures: IP fabrics
[Diagram: a Clos IP fabric with spine (middle stage) and leaf nodes; traffic enters at an ingress leaf, crosses the spines, and exits at an egress leaf; the fabric scales horizontally.]

Emergence of 3- to 5-stage Clos fabrics
[Diagrams: a 3-stage Clos (spine and leaf), a 5-stage Clos built from PODs, and a larger 5-stage Clos sized for performance, with spine (S), leaf (L), and access (A) layers.]
The topology is flexible, depending on the oversubscription ratios. One subnet per rack (leaf/access node). Hence the need to choose a routing protocol, plus a few tools (plug & play?).
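For intuition (this arithmetic is mine, not from the slides), a leaf's oversubscription ratio is simply its server-facing capacity divided by its fabric-facing capacity; the port counts below are hypothetical:

    # Hypothetical leaf: 48x10GE server-facing ports, 6x40GE fabric uplinks.
    def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
        # Server-facing capacity divided by fabric-facing capacity.
        return (down_ports * down_gbps) / (up_ports * up_gbps)

    print(oversubscription(48, 10, 6, 40))  # 2.0, i.e. a 2:1 ratio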

Why L3 all the way to the ToR?
- MSDC (very large scale datacenters)
- New scale-out applications: Hadoop clusters
- New distributed cloud applications
- New traffic distribution (east/west)
- Convergence onto Ethernet, and now onto IP: IP-based storage (block over IP, distributed file systems, object storage); overlay networking
- L3 scales very well (load sharing, aggregation, hierarchy)
- Containment of L2 side effects (e.g., broadcast storms, blast radius)
- Based on a well-known, widely interoperable standard (multi-vendor, multi-chip, multi-protocol)

Choosing a routing protocol
Requirements: a simple and well-known protocol; robust and scalable (to support large clusters); easy to understand and debug; multi-vendor; flexible and extensible.
BGP: been around forever; powers the Internet; RIB-in / RIB-out; the Internet, again (more on this later).

    Requirement              OSPF      IS-IS     BGP
    Advertise prefixes       Yes       Yes       Yes
    Scale                    Limited   Limited   Yes
    Traffic engineering      Limited   Limited   Yes
    Traffic tagging          Limited   Limited   Yes
    Multi-vendor stability   Yes       Yes       Even more so

Options: eBGP vs. iBGP
iBGP: the natural tendency (a single AS); needs route reflectors for scale/simplification; BGP add-path to get RIB ECMP into the FIB.
eBGP: one (private) AS per switch; multipath multiple-as to get RIB ECMP into the FIB; more natural handling of policies (traffic engineering).

What needs to be configured?
[Diagram: 4 spines (S) and 16 leaves (L).]
- BGP ASN assignments
- IP address scheme
- P2P network assignments
- P2P address assignments
- Server-facing network assignments
- Loopbacks
- BGP import/export policies
- A couple of knobs (ECMP)
4 spines + 16 leaves = 64 links (i.e., P2P networks) and 128 interface IPs, just for the infrastructure, and this has to be done for each and every ToR.
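A minimal sketch of that bookkeeping in Python (the 192.168.0.0/24 supernet and the private-ASN base are hypothetical, not from the presentation):

    # 4 spines x 16 leaves: 64 P2P links, each a /31, hence 128 interface IPs.
    import ipaddress

    spines, leaves = 4, 16
    links = spines * leaves
    p2p = list(ipaddress.ip_network("192.168.0.0/24").subnets(new_prefix=31))[:links]
    print(links, "links,", 2 * links, "interface IPs")  # 64 links, 128 interface IPs

    # One private ASN per leaf, for the eBGP option (hypothetical numbering).
    leaf_asn = {"leaf%d" % i: 64512 + i for i in range(leaves)}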

Detailed example: BGP all the way down to the ToR
[Diagram: vspine1 (ASN 1) and vspine2 (ASN 2); the spine nodes act as BGP route reflectors and peer via iBGP down to the leaves; the 32x40GE leaf nodes peer via iBGP up to the spines and via eBGP down to the access layer; the 96x10GE access switches (ASN 11 through ASN 18) peer via eBGP up to the leaves.]

Configuration example (4 spines + 16 leaves: 64 links, i.e. P2P networks, and 128 interface IPs, just for the infrastructure):

    protocols {
        bgp {
            log-updown;
            import bgp-clos-in;
            export bgp-clos-out;
            graceful-restart;
            group CLOS {
                type external;
                mtu-discovery;
                bfd-liveness-detection {
                    minimum-interval 350;
                    multiplier 3;
                    session-mode single-hop;
                }
                multipath multiple-as;
                neighbor 192.168.0.1 { peer-as 200; }
                neighbor 192.168.0.3 { peer-as 201; }
                neighbor 192.168.0.5 { peer-as 202; }
                neighbor 192.168.3.7 { peer-as 303; }
                neighbor 192.168.3.9 { peer-as 304; }
                neighbor 192.168.3.11 { peer-as 305; }
                neighbor 192.168.3.13 { peer-as 306; }
                neighbor 192.168.3.15 { peer-as 307; }
                neighbor 192.168.3.17 { peer-as 308; }
                neighbor 192.168.3.19 { peer-as 309; }
                neighbor 192.168.3.21 { peer-as 310; }
                neighbor 192.168.10.1 { peer-as 1001; }
                neighbor 192.168.10.2 { peer-as 1002; }
                neighbor 192.168.10.3 { peer-as 1003; }
                neighbor 192.168.10.4 { peer-as 1004; }
                neighbor 192.168.10.5 { peer-as 1005; }
                neighbor 192.168.10.6 { peer-as 1006; }
                neighbor 192.168.10.7 { peer-as 1007; }
                neighbor 192.168.10.8 { peer-as 1008; }
            }
        }
    }

Automation: tooling
Need to generate the configuration automatically: a list of variables plus the topology (ASNs, subnets in a YAML file/script); Python with templates (Jinja2).
Initial load of the configuration: Zero Touch Provisioning (ZTP), or via the switch configuration interfaces: NETCONF / XML API; Python wrapper: the PyEZ mini-framework; Ansible.
Tools to validate the configuration and the operational state of the infrastructure: PyEZ / JSNAP (easy handling of the XML structure); BMP/SNMP.
Open-source tools on GitHub.
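A minimal sketch of that template-driven generation, assuming the neighbor list comes from the topology file; the template text and variable names are hypothetical:

    # Render BGP neighbor stanzas from topology variables with Jinja2.
    from jinja2 import Template

    TEMPLATE = Template(
        "protocols {\n"
        "    bgp {\n"
        "        group CLOS {\n"
        "            type external;\n"
        "            multipath multiple-as;\n"
        "{% for n in neighbors %}"
        "            neighbor {{ n.ip }} { peer-as {{ n.asn }}; }\n"
        "{% endfor %}"
        "        }\n"
        "    }\n"
        "}\n"
    )

    # Hypothetical topology data; in practice this comes from the YAML file.
    neighbors = [{"ip": "192.168.10.%d" % i, "asn": 1000 + i} for i in range(1, 9)]
    print(TEMPLATE.render(neighbors=neighbors))

The rendered text can then be delivered at first boot via ZTP, or pushed over NETCONF; PyEZ provides load/commit helpers for that.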

Conclusion
The first use case, described in this presentation: IP routing in the infrastructure, all the way to the ToR. Simple, scalable, robust, interoperable, and extensible.
There are many other benefits to using BGP:
- Routing on the server, multihoming/anycast (Quagga)
- Exchanging L2 state (MAC addresses): EVPN (over VXLAN or MPLS)
- Building overlays in the fabric (L2 over an L3 fabric)
- Exchanging state between an SDN/overlay network and the underlay (infrastructure): the controller peers via BGP with the physical network, and the (L2/L3) routes are both exchanged AND controlled
All of it using standards: VPN-IP routes (RFC 4364, aka 2547 VPNs) and MAC routes (draft-ietf-l2vpn-evpn).

Questions? asibout@juniper.net

MULTI-STAGE CLOS ROLES
Spine: the backplane of the multi-stage Clos; always 1:1 oversubscription; provides BGP route reflection; peers via iBGP to the leaf nodes.
Leaf: the NNI of the multi-stage Clos; variable oversubscription; peers via iBGP to the spine nodes and via eBGP to the access nodes.
vSpine: the combination of spine and leaf; acts as a logical switch and a virtual peering point for access; oversubscription depends on the spine and leaf roles; a single BGP autonomous system number; peers via eBGP to the access switches.
Access: provides access to end-points such as compute and storage; typically 3:1 oversubscription in enterprise and SP environments, 1:1 for HPC; peers via eBGP to the vspine nodes; provides L3 gateway services and link aggregation to end-points.

SDN/Overlay - NVO3 & Doughnuts

State management for a Cloud Computing service
A Cloud Computing service requires managing a (very) large amount of state, due to the need to support a (very) large number of customers (aka tenants), with multiple Virtual Machines (VMs) per customer; this is driven by the need to take advantage of economies of scale.
It also requires managing (fairly) dynamic state, due to creation, termination, and moves of VMs; this is driven by the need to support fast on-demand delivery of the service, and by the need to optimize usage of the resources (e.g., servers) it consumes.
Scalability, both in terms of the total amount of state and in terms of how dynamic that state is, is of critical importance.
Note: in the rest of the slides, the term "state" refers to the state associated with the VMs of individual customers/tenants of the Cloud Computing service.

Scalability via the virtualization doughnut
[Diagram: three data centers (DC 1, DC 2, DC 3), each with an intra-DC network and NVEs at its edge, interconnected by an inter-DC network.]
VMs are on the outside, connected to the outer virtualization edge; the intra-DC/inter-DC networks are on the inside, connected to the inner virtualization edge. Network Virtualization Edge devices (NVEs) use some form of tunnel encapsulation to form an overlay over the network infrastructure that interconnects them (just like PEs in 2547 VPNs).
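The slides do not name a specific encapsulation, but VXLAN is a typical example; below is a minimal sketch of the 8-byte VXLAN header (RFC 7348) an NVE prepends to a tenant frame before tunnelling it over UDP/IP. The inner frame bytes are placeholders:

    # Build a VXLAN header (RFC 7348): flags byte with the I bit set,
    # then a 24-bit VNI; all reserved fields are zero.
    import struct

    def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
        header = struct.pack("!II", 0x08 << 24, vni << 8)
        return header + inner_frame  # then sent over UDP/IP to the remote NVE

    packet = vxlan_encap(b"\x00" * 64, vni=5001)  # placeholder inner frame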

What is in the doughnut?
The virtualization doughnut has to include the servers -> servers must be in the doughnut. Where are the ToR switches, in or out of the doughnut? In other words, where is the NVE?
[Diagrams: a thick doughnut and a thin doughnut, each showing servers 1-7 attached to ToR1-ToR4.]
The thick doughnut places NVEs on the ToRs -> the ToRs are part of the doughnut. The thin doughnut places NVEs on the servers -> the ToRs are NOT part of the doughnut.

Thick vs. thin doughnut
[Diagrams: the same thick and thin doughnut topologies as on the previous slide.]
Thick doughnut: service creation/management requires coordination between servers and ToRs; an NVE on a ToR has to maintain state for the VMs of a given customer/tenant as long as that customer/tenant has at least one VM hosted on any of the servers connected to that ToR.
Thin doughnut: no such orchestration between servers and ToRs is needed; an NVE on a server has to maintain state for the VMs of a given customer/tenant only as long as that customer/tenant has at least one VM hosted on that server.
The thin doughnut thus improves scalability by reducing the amount of state an individual NVE has to maintain (relative to the thick doughnut).
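A toy model of that difference (the VM placement below is made up for illustration): count, per NVE, how many tenants it must hold state for under each option.

    # Tenant -> servers hosting at least one of its VMs (made-up placement).
    placement = {"tenantA": {"server1"},
                 "tenantB": {"server2"},
                 "tenantC": {"server1", "server3"}}
    tor_of = {"server1": "ToR1", "server2": "ToR1", "server3": "ToR2"}

    # Thin doughnut: a server NVE holds state only for tenants with a VM on it.
    thin = {s: {t for t, ss in placement.items() if s in ss} for s in tor_of}
    # Thick doughnut: a ToR NVE holds state for tenants with a VM on ANY
    # server attached to it, so state aggregates upward.
    thick = {}
    for t, servers in placement.items():
        for s in servers:
            thick.setdefault(tor_of[s], set()).add(t)

    print(max(len(ts) for ts in thin.values()))   # 2 tenants on the busiest server
    print(max(len(ts) for ts in thick.values()))  # 3 tenants on the busiest ToR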

Distributed state management with BGP
Scalability via a state management hierarchy: a state management endpoint need not have control-plane peering with every other state management endpoint. The state management hierarchy MUST NOT force a hierarchy on the data plane; this is important for efficient use of resources.
BGP supports such a hierarchy through BGP Route Reflectors: a state management endpoint need not have BGP peering with every other endpoint, only with a few Route Reflectors. The endpoints form the bottom level of the hierarchy; the Route Reflectors form the top level.
Use of BGP third-party next hop makes it possible to impose a state management hierarchy without forcing a hierarchy on the data plane.
Additional levels of state management hierarchy could be introduced by forming a hierarchy of BGP Route Reflectors.
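The scaling benefit is easy to quantify: a full iBGP mesh needs n(n-1)/2 sessions, while route reflection needs roughly one session per endpoint per Route Reflector. A quick illustration, with hypothetical sizes:

    # iBGP session count: full mesh vs. route reflection (hypothetical sizes).
    n_endpoints, n_rrs = 1000, 4

    full_mesh = n_endpoints * (n_endpoints - 1) // 2  # 499500 sessions
    with_rrs = n_endpoints * n_rrs                    # 4000 sessions
    print(full_mesh, with_rrs)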

Scalability via divide and conquer
The distribution scope of the state information associated with creation, termination, and moves of a given VM should be constrained to only the NVEs connected to VMs of the single Cloud Computing service customer who owns that VM.
BGP can constrain the distribution scope of state information by using Route Targets and Route Target Constraints (RFC 4684). An NVE uses Route Target Constraints to inform its Route Reflector(s) of the customers for which it needs to receive state information; a Route Reflector uses the Route Target Constraints it receives from an NVE to send that NVE only the state information of the customers that have at least one of their VMs connected to that NVE.
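A minimal sketch of the filtering this describes, as a Route Reflector might apply it (route and NVE names are hypothetical):

    # RFC 4684-style filtering: send an NVE only the routes whose Route
    # Target appears in the RT Constraint set that the NVE advertised.
    routes = [("10.1.1.0/24", "target:64512:100"),
              ("10.2.2.0/24", "target:64512:200")]
    rtc_from_nve = {"nve1": {"target:64512:100"},
                    "nve2": {"target:64512:100", "target:64512:200"}}

    def routes_for(nve):
        wanted = rtc_from_nve[nve]
        return [prefix for prefix, rt in routes if rt in wanted]

    print(routes_for("nve1"))  # only the target:64512:100 route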

Scalability via divide and conquer plus the state management hierarchy
Divide and conquer can be combined with the state management hierarchy: use BGP Route Reflectors for the hierarchy, and partition the Route Reflectors within the Cloud Computing service provider among the customers served by that provider.
As a result, no single component in the system (neither the NVEs nor the Route Reflectors) is required to maintain all state for all VMs, and the capacity of the system is not bounded by the capacity of its individual components.

BGP: a wide variety of state information
BGP can distribute a wide variety of state information:
- State associated with IP routes (RFC 4271)
- State associated with VPN-IP routes (RFC 4364, aka 2547 VPNs)
- State associated with MAC routes (draft-ietf-l2vpn-evpn)
- State associated with RT Constraint routes (RFC 4684)
- State associated with Flow Spec (RFC 5575)
- State associated with multicast routing information (RFC 6514)
All of the above are either IETF standards or on the IETF standards track, and all of them may be applicable in the context of state management for a Cloud Computing service: a single tool for distributing multiple types of state information.