FabricPath for Service Provider Data Center and IXP
Francois Tallet, Cisco Systems
Agenda
- TRILL: Transparent Interconnection of Lots of Links
- FabricPath overview
- How FabricPath works
- FabricPath designs
- Conclusion
TRILL is an IETF standard for Layer 2 multipathing, driven by multiple vendors (http://datatracker.ietf.org/wg/trill/). It combines the benefits of switching (easy configuration, plug & play, provisioning flexibility) with those of routing (multipathing/ECMP, fast convergence, high scalability). FabricPath brings Layer 3 routing benefits to flexible Layer 2 bridged Ethernet networks.
Connect a group of switches using an arbitrary topology and, with minimal configuration, aggregate them into a fabric. A control protocol based on Layer 3 technology provides fabric-wide intelligence and ties the elements together.
A single MAC address lookup at the ingress edge identifies the exit switch across the fabric (e.g., MAC A on local port e1/1, MAC B behind switch s8). Traffic is then switched using the shortest path available. The result is reliable any-to-any L2 connectivity, as if the whole fabric were a single switch, with no STP inside.
Equal Cost Multipathing (ECMP)
Traffic between two edge switches is load-balanced across all equal-cost paths. In case of link failure, traffic is redistributed across the remaining links, providing fast convergence.
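A common way to implement ECMP while keeping per-flow packet ordering is to hash a flow identifier onto the list of equal-cost links. The sketch below is a simplified illustration of that idea (the function name and the 5-tuple flow key are assumptions, not FabricPath internals); it also shows how a flow is redistributed when its link disappears.

```python
import hashlib

def ecmp_next_hop(flow, links):
    """Pick a next-hop link by hashing the flow identifier (sketch).

    All packets of one flow hash to the same link, so per-flow
    ordering is preserved while distinct flows spread across the
    equal-cost links.
    """
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]

links = ["L1", "L2", "L3", "L4"]
flow = ("10.0.0.1", "10.0.0.2", 6, 49152, 80)  # src, dst, proto, ports
first = ecmp_next_hop(flow, links)

# The same flow always maps to the same link...
assert all(ecmp_next_hop(flow, links) == first for _ in range(10))

# ...and if that link fails, the flow lands on a surviving link,
# which is the fast-convergence behavior described above.
remaining = [l for l in links if l != first]
assert ecmp_next_hop(flow, remaining) in remaining
```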
Can Route Anywhere
The fabric provides seamless L3 integration: an arbitrary number of routed interfaces can be created at the edge or within the fabric.
Plug-and-Play: L2 IS-IS Manages the Forwarding Topology
- IS-IS assigns an address (nickname) to every switch
- Computes shortest pair-wise paths
- Supports equal-cost paths between any switch pair

Example: a leaf switch connected to spines S10, S20, S30, S40 over links L1-L4 builds a routing table such as:

  Switch   Next-hop
  S200     L1, L2, L3, L4
  S400     L1, L2, L3, L4
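The routing table above can be reproduced with a shortest-path computation that keeps every equal-cost first hop instead of a single one. This is a minimal sketch on a unit-cost topology (a BFS stand-in for the IS-IS SPF run; the function name and the tiny leaf/spine topology are illustrative assumptions):

```python
from collections import deque

def equal_cost_next_hops(topology, src, dst):
    """All first hops of shortest paths from src to dst (unit-cost BFS),
    mirroring how the routing table keeps every equal-cost path."""
    dist = {src: 0}
    first_hops = {src: set()}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in topology[node]:
            # The first hop toward nbr is nbr itself when leaving src,
            # otherwise whatever first hops reach the current node.
            hops = first_hops[node] or {nbr}
            if nbr not in dist:                  # first (shortest) arrival
                dist[nbr] = dist[node] + 1
                first_hops[nbr] = set(hops)
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:    # another equal-cost path
                first_hops[nbr] |= hops
    return sorted(first_hops.get(dst, set()))

# Leaf switches S100/S200 each connect to four spines S10..S40.
topology = {
    "S100": ["S10", "S20", "S30", "S40"],
    "S200": ["S10", "S20", "S30", "S40"],
    "S10": ["S100", "S200"], "S20": ["S100", "S200"],
    "S30": ["S100", "S200"], "S40": ["S100", "S200"],
}
print(equal_cost_next_hops(topology, "S100", "S200"))
# All four spines come back as valid equal-cost next hops.
```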
Two address spaces are used:
- Nickname space: routing decisions across the fabric are based on the routing table. A frame from A to B travels as S100 to S300 (on S300, the routing table maps S100 to links L1, L2, L3, L4).
- MAC address space: at the edge, switching is based on Classical Ethernet (CE) MAC address tables. S300's CE table holds B on local port 1/2 and A behind switch S100.
The MAC address/nickname association is maintained at the edge, and traffic is encapsulated across the fabric.
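The ingress edge behavior can be sketched as a single table lookup whose result is either a local port (plain CE switching) or a remote switch nickname (encapsulate across the fabric). The data structures and the function name below are simplified illustrations, not the actual forwarding pipeline:

```python
def ingress_forward(frame, mac_table, my_nickname):
    """Edge lookup (sketch): a hit on a remote entry encapsulates the
    CE frame in a fabric header addressed by nickname, not by MAC."""
    entry = mac_table.get(frame["dst"])
    if entry is None:
        # Unknown destination: encapsulate for flooding (next slide).
        return {"outer_dst": "flood", "outer_src": my_nickname, "inner": frame}
    kind, value = entry
    if kind == "port":                       # destination is local: plain CE
        return {"port": value, "inner": frame}
    return {"outer_dst": value, "outer_src": my_nickname, "inner": frame}

# S100's CE MAC table: A is local on 1/1, B is behind remote switch S300.
mac_table = {"A": ("port", "1/1"), "B": ("switch", "S300")}
pkt = ingress_forward({"src": "A", "dst": "B"}, mac_table, "S100")
print(pkt["outer_src"], "->", pkt["outer_dst"])   # S100 -> S300
```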
Unknown unicast: host A sends to B. The lookup for B misses in S100's CE MAC address table, so the frame is flooded across the fabric (A-to-B encapsulated with source S100). The lookup for B also misses on S200 and S300, which flood on their Classical Ethernet ports; meanwhile the edge switches learn MAC A against source switch S100 (S100 itself learns A on local port 1/1).
Return traffic: B replies to A. On S300, the lookup for A hits (A is behind S100), so the frame is sent to S100 (B-to-A encapsulated as S300-to-S100) and source B is learned on local port 1/2. On S100, the lookup for A hits and the frame is delivered on port 1/1, while B is learned against switch S300. From this point on, traffic between A and B is unicast across the fabric.
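The learning step in the two slides above can be sketched in a few lines: on receiving a fabric frame, the edge switch learns the inner source MAC against the outer source nickname, then looks up the inner destination. This is a simplified model (hypothetical function and table layout), ignoring VLANs and the actual conversational-learning rules:

```python
def learn_and_forward(fabric_frame, mac_table, local_ports):
    """Edge processing (sketch): learn the inner source MAC against the
    outer source nickname, then forward the inner frame locally."""
    inner = fabric_frame["inner"]
    # Remote MACs are learned pointing at the ingress switch's
    # nickname rather than at a physical port.
    mac_table[inner["src"]] = ("switch", fabric_frame["outer_src"])
    entry = mac_table.get(inner["dst"])
    if entry and entry[0] == "port":
        return [entry[1]]          # deliver on the known local port
    return local_ports             # miss: flood on local CE ports

# S300 already knows B on local port 1/2; flooded frame A->B arrives
# from S100, so S300 delivers to B and learns A behind S100.
table = {"B": ("port", "1/2")}
out = learn_and_forward(
    {"outer_src": "S100", "inner": {"src": "A", "dst": "B"}},
    table, ["1/1", "1/2"])
print(out, table["A"])   # ['1/2'] ('switch', 'S100')
```

After this exchange, B's reply to A hits the freshly learned entry and goes unicast to S100, exactly as the slide describes.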
Time To Live (TTL) and Reverse Path Forwarding (RPF) Check
With Spanning Tree, the control protocol is the only mechanism preventing loops: if STP fails, a loop forms, there is no backup mechanism in the data plane, and the resulting flooding impacts the whole network. FabricPath adds a TTL in the frame header and an RPF check for multi-destination traffic, so the data plane itself protects against loops.
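Both data-plane protections fit in one small forwarding check: drop the frame when its TTL expires, and drop a multi-destination frame that arrives on a port other than the one the tree expects for that source. A minimal sketch, assuming a per-source RPF port has already been computed (names are illustrative):

```python
def forward_multidest(frame, rpf_port, arrival_port):
    """Data-plane loop protection (sketch): drop on TTL expiry, and
    drop multi-destination frames failing the RPF check."""
    if frame["ttl"] <= 1:
        return None                      # TTL exhausted: a looping frame dies
    if arrival_port != rpf_port:
        return None                      # wrong incoming port: duplicate/loop
    return dict(frame, ttl=frame["ttl"] - 1)   # decrement and forward

# The expected port toward source S100's distribution tree is L1.
ok     = forward_multidest({"src": "S100", "ttl": 3}, "L1", "L1")
looped = forward_multidest({"src": "S100", "ttl": 3}, "L1", "L2")
dead   = forward_multidest({"src": "S100", "ttl": 1}, "L1", "L1")
print(ok["ttl"], looped, dead)   # 2 None None
```

Unlike STP, where a control-plane failure leaves nothing to stop a loop, here a frame caught in a loop self-destructs within a bounded number of hops.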
Uses two Q-tags on the fabric: segment IDs (e.g., 10,000, 20,000, 30,000) allow the same edge VLAN (V10) to be reused across independent segments.
FabricPath vs. STP/Distributed Port Channel (DPC)
A FabricPath core between L3 edges and STP/DPC PODs offers:
- Simple configuration
- No constraint on the design
- Seamless L3 integration
- No STP, no traditional bridging
- Virtually unlimited bandwidth
- Easy extension without operational impact
Efficient POD Interconnect
With FabricPath in the L2+L3 core and DPC PODs at the edge, VLANs can terminate at the distribution layer or extend between PODs. Because STP is not extended between PODs, remote PODs or even remote data centers can be aggregated. Bandwidth or scale can be introduced in a non-disruptive way.
Dark Fiber Interconnect (DWDM) Between Sites
- Requires dark fiber
- Arbitrary interconnect topology (not dependent on port channels)
- Any number of sites
- High bandwidth, fast convergence
- Spanning tree isolation
- VLANs can be selectively extended or terminated
IXP requirements:
- Layer 2 peering between providers
- 10GE non-blocking fabric
- Scale to thousands of ports
FabricPath benefits for an IXP:
- Layer 2 fabric, non-blocking up to thousands of 10GE ports
- Simple to manage
- No design constraint, easy to grow
The Network Can Evolve With No Disruption
Need more edge ports? Add more leaf switches. Need more bandwidth? Add more links and spines.
Example: 2,048 x 10GE Server Design
A traditional Spanning Tree based network relies on blocked links and is oversubscribed 16:1, 8:1, and 2:1 across its tiers, requiring 4 PODs and 64 access switches for 2,048 servers. The FabricPath based network is fully non-blocking and serves the same 2,048 servers with 8 access switches. The results:
- 16x improvement in bandwidth performance
- 6-to-1 device consolidation (from 74 managed devices to 12)
- 2x+ increase in network availability
- Simplified IT operations (fewer devices, VLANs anywhere)
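The 16x bandwidth figure follows directly from the worst-tier oversubscription ratio. A quick arithmetic check (assuming 10GE per server, as the slide's title implies):

```python
# Worked check of the slide's bandwidth figures.
servers = 2048
edge_gbps = servers * 10                 # 20,480 Gb/s offered at the edge

oversub = 16                             # worst-case ratio in the STP design
stp_core_gbps = edge_gbps / oversub      # usable cross-sectional bandwidth
fabric_core_gbps = edge_gbps             # non-blocking fabric: full rate

print(stp_core_gbps, fabric_core_gbps, fabric_core_gbps / stp_core_gbps)
# 1280.0 20480 16.0
```

So the oversubscribed design can carry at most 1.28 Tb/s between servers in different PODs, while the non-blocking fabric carries the full 20.48 Tb/s, hence the 16x claim.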
Conclusion
- FabricPath is simple: it keeps the attractive aspects of Layer 2, is transparent to L3 protocols, requires no addressing, and is easy to configure and deploy.
- FabricPath is efficient: high bisectional bandwidth (ECMP) and an optimal path between any two nodes.
- FabricPath is scalable: it can extend a bridged domain without extending the risks generally associated with Layer 2, thanks to frame routing, TTL, and the RPF check.