High-Availability Enterprise Network Design
haviland@cisco.com
Staying On Target: HA Focus vs. Distractions
- "Flat networks are easier" - beware!
- Variety of vendors, protocols, designs, etc.
- Inherited complexity is hard to purge
- Five nines is job one!
- "Feature rich - let's use all the knobs!"
- "The latest cool stuff" - older is more stable
- Change is hard, and sometimes $$$
HA Features of the Catalyst 6500
Consider for backbones and server farms:
- Fabric redundancy: switch fabric module in CatOS 6.1
- Supervisor redundancy: HA feature in CatOS 5.4.1; stateful recovery; image versioning on the fly
- MSFC redundancy: config-sync feature (IOS 12.1.3 / CatOS 6.1); HSRP pair
Thinking Outside the Box
- For HA/HP design "outside the box," the logical design is critical: network features and protocols
- Geophysical diversity is powerful
- Inside the box: HA, RAID, UPS, MTBF, etc.
Dramatis Personae: Our Cast of Symbols
- Links: GE, DPT, SONET, etc.
- L2 switching: L2 forwarding in hardware
- L3 switching: L3/L2 forwarding in hardware
- Routing: L3 forwarding (SW or HW)
- Control plane: IOS routing protocols and features
- QoS where required
- Application intelligence
Platforms: GigE Channel, Catalyst 4000, Catalyst 6500, Cisco 7500, Cisco 12000
HA Gigabit Campus Architecture
Survivable modules + survivable backbone
- Client blocks: access L2, distribution L3
- Server block: server farm, distribution L3, access L2
- Backbone: Ethernet or ATM, Layer 2 or Layer 3
- Define the mission-critical parts first!
Legend: E or FE port; GE or GEC
High Availability Design: Why a Modular ABC Approach
- Many new products, features, technologies
- HA and HP application operation is the goal
- Start with a modular, structured approach (the logical design)
- Then add multicast, VoIP, DPT, DWDM...
Design the Solution, Then Pick the Products
[Chart: price per 10/100 port ($100-$350) across module generations - new Catalyst 2912G/2948G/2980G, 4XXX, 5XXX, and 6XXX modules compared on 10/100 ports, gigabit ports, backplane capacity (1.2 Gbps up to 250+ Gbps), and switching capacity (20 Mpps up to 150 Mpps)]
HA Design Reality Check! Assume Things Fail - Then What?
- Networks are complex
- Things break; people make mistakes
- What happens if a failure occurs?
- A simple, structured, deterministic design is required for fast recovery
- The tradeoffs you choose are important
Network Recovery: How Long? What Happens?
[Diagram: campus topology with numbered failure points - (1, 2) server farm Layer 2, (3, 4) server distribution, (5, 6) core L3; also building/branch access Layer 2, distribution Layer 3, WAN and WAN backup]
Network Recovery Times: If You Follow the Rules

  Failure Scenario   Recovery Mode          Recovery Time
  1,2 server         Server NIC             < 2 seconds
  3,4 uplink         HSRP (& UplinkFast)    tune to 3 seconds
  5,6 core           HSRP track             tune to 3 seconds
  dual-path L3       alternate path used    < 2 seconds
  EtherChannel       channel recovery       < 1 second
  L3 routing         EIGRP or OSPF          depends on tuning
  L2 general         L2 spanning tree       tune (up to 50 seconds)
  DPT                IPS                    50 milliseconds
Design for High Availability: How to Build Boring Networks!
- The Concepts
- The Rules
- Design Building Block
- Design Backbone
- Notes on Tuning
HA Network Design Concepts: Thinking Outside the Box
1) Simplicity & Determinism
2) Collapse the Sandwich
3) Spanning Tree Failure Domain
4) Map L3 to L2 to L1
5) Scaling and Hierarchy
6) ABCs of Module + Backbone Design
7) The Four Corners
1) Simplicity and Determinism: Reducing the Degrees of Freedom
The HA continuum: simple, structured, deterministic, boring! vs. flexible, complex, varied, interesting!
Every choice affects availability! Determinism or flexibility?
- Would you support 27 desktop environments?
- Would you support 13 network vendors?
- Would you use 57 varieties of Cisco IOS?
2) Collapse the Sandwich: Route IP over Glass
Traditional model: IP over FR/ATM (service, traffic engineering) over SONET over fiber (fiber management)
Optical internetworking: IP directly over fiber - one big fat pipe
- Lower equipment cost
- Lower operational cost
- Simplified architecture
- Scalable capacity
3) Minimize the Failure Domain: Public Enemy Number One
Avoid highly meshed, non-deterministic, large-scale L2 (= VLAN) topologies spanning buildings:
- Where should root go? What happens when something breaks? How long to converge?
- Many blocking links; large failure domain!
- Broadcast flooding, multicast flooding, loops within loops - spanning tree from heck
- Now multiply times 100 VLANs!
4) Map L3 to L2 to L1
Easier administration and troubleshooting when everything matches:
- Clients in subnet 10.0.55.0
- VLAN 55
- Wiring closet 55 on floor 55
- Access switch 55, interface VLAN 55
All match and life is good - go fishing with your kids
Legend: 10/100BaseT; GE or GEC
5) Scaling and Hierarchy
- Strong hierarchies - like the telephone system and the Internet - segment addressing and therefore scale
- Flat L2 Ethernet is easy but does not scale
- ATM LANE is logically flat; it scales as N squared
[Chart: complexity (C) grows to unmanageable (U) as the number of devices (N) increases]
© 1999, Cisco Systems, Inc.
6) Building Block & Backbone Design ABCs
- A: design the building block (bb)
- B: design the backbone (BB)
- C: connect bb to BB
- Divide and conquer; cookie-cutter configuration; deterministic L3 demarcation
Blocks around a common core: server farm (access/distribution), LAN access/distribution, WAN access/distribution, e-commerce solution (Internet, PSTN)
7) Four-Square Network Redundancy, or the Four Corners Problem (L3)

                    One Chassis                  Two Chassis
  One Supervisor    Simplest - no redundancy     GeoPhysical - effective
  Two Supervisors   HA - when space is limited   Most complex - belt and suspenders
Dos and Don'ts for HA Design
1) Eliminate STP Loops
2) L3 Dual-Path Design
3) EtherChannel Across Cards
4) Workgroup Servers
5) Use HSRP Track
6) Passive Interfaces
7) Issues with Single-Path Design
8) Oversubscription Guidelines
9) HA for Single-Attached Servers
10) Protocol Tradeoffs
11) UDLD Protection
Rule 1) Eliminate STP Loops
In the backbone and at mission-critical points: too many cooks spoil the broth - L3 control is better
- No blocking links to waste bandwidth
- Avoids slow STP convergence
- Very deterministic
- Use routed links, not VLAN trunks, to the L2 gigabit switch in the backbone
Legend: subnet X = VLAN X
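The "routed links, not VLAN trunks" point can be sketched in native IOS on an L3 switch; a minimal illustration (interface and addressing are invented for the example):

```
interface GigabitEthernet1/1
 description Routed uplink to backbone - not a VLAN trunk
 no switchport
 ip address 10.1.1.1 255.255.255.252
```

Because the link is routed rather than trunked, no VLAN spans it, so spanning tree never blocks it and recovery is governed by the routing protocol instead of STP.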
Rule 2) Dual Equal-Cost Path L3
- Load balance - don't waste bandwidth (unlike L1 and L2 redundancy)
- Fast recovery to the remaining path: detect L1 down and purge - about 1 s
- Works with any routed fat pipes
[Diagram: equal-cost routes via Path A and Path B to destination network X]
Rule 3) EtherChannel Across Cards
- Increased availability: sub-second recovery
- Spans cards on the 6500; up to 8 ports in a channel
- Small complexity increase (less if channel is set on)
- Appears as a single L2 STP link and a single L3 subnet
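A hedged CatOS sketch of a channel spanning two line cards (module/port numbers are illustrative):

```
! Bundle one port from module 3 and one from module 4 into a single channel,
! so the loss of either line card leaves the channel up on the surviving port
set port channel 3/1,4/1 on
```

Setting the channel mode to "on" (rather than auto/desirable negotiation) is the deterministic choice this deck favors: a small configuration cost up front instead of protocol negotiation at run time.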
Rule 4a) Connect Workgroup Server
With no L2 recovery path, what happens if a link breaks?
- VLAN X includes clients and workgroup servers attached at different places
- Client X.1 is on access switch C; workgroup server X.100 is attached to the distribution layer (A)
- The L2 path from server to client runs over link C-B; A and B have links to the core
- Link C-B breaks...
Rule 4b) Connect Workgroup Server
- Subnet X is now discontiguous: X.100 cannot reach X.1, and vice versa
- Routers A and B continue to advertise reachability of subnet X...
- ...so incoming traffic is drawn in and dropped (a black hole)
Rule 4c) Connect Workgroup Server
- Introduce L2/STP redundancy: a VLAN trunk A-B forms an L2 loop (a band-aid fix)
- The loop gives STP a recovery path and prevents the black hole
Rule 4d) Connect Workgroup Server
Real lessons:
- Enterprise server farms are better
- L3 demarcation is better
- This is an example of why extended L2 is difficult
Rule 5a) Use HSRP Track
Review - Hot Standby Router Protocol; fast recovery can be tuned to 3 s or less
- Router X acts as gateway for subnet M (hosts M.1, M.2, M.3) at virtual IP address M.100
- X is HSRP primary, priority 200; Y is HSRP backup, priority 100
- If link Z fails, router Y takes over as the M.100 gateway with the same MAC address
Rule 5b) Use HSRP Track
Track extends HSRP to monitor links to the backbone, ensuring the shortest path to the best outbound gateway
- Track interface A: lower priority by 75
- Track interface B: lower priority by 75
- HSRP fails over to Y only if both A and B are lost (200 - 75 - 75 = 50, which is below Y's priority of 100)
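The tracking scheme above can be sketched on router X in classic IOS HSRP syntax; interface names and addresses are illustrative:

```
interface Vlan55
 ip address 10.0.55.2 255.255.255.0
 standby 1 ip 10.0.55.100
 standby 1 priority 200
 standby 1 preempt
 ! Losing one uplink drops priority to 125 (still above Y's 100);
 ! losing both drops it to 50, triggering failover to Y
 standby 1 track GigabitEthernet1/1 75
 standby 1 track GigabitEthernet1/2 75
```

Router Y carries the same standby group at priority 100, so failover happens exactly when X has lost both backbone uplinks.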
Rule 6a) Use Passive Interfaces
- L3 switches X and Y in the distribution layer
- 4 VLANs per wiring closet (ABCD, EFGH, IJKL, MNOP, ...), 10 wiring closets in total
Rule 6b) Use Passive Interfaces
- What X and Y see: 4 x 10 = 40 routed links between them (A.1/A.2, B.1/B.2, C.1/C.2, ...)
- Increased protocol overhead and CPU
Rule 6c) Use Passive Interfaces
- Making wiring-closet interfaces passive turns off routing updates and their overhead
- Leave two routed links non-passive for redundant paths
- CDP, VTP, HSRP, etc. still function on all links
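A minimal IOS sketch of the passive-interface approach, using EIGRP as the example protocol (AS number and interface names are illustrative):

```
router eigrp 100
 network 10.0.0.0
 ! Wiring-closet VLAN interfaces: no routing adjacencies needed here
 passive-interface Vlan10
 passive-interface Vlan11
 passive-interface Vlan12
 passive-interface Vlan13
 ! The two backbone-facing routed links stay active,
 ! preserving redundant L3 paths between X and Y
```

With the closet interfaces passive, X and Y maintain only the two intended adjacencies instead of forty, while the passive subnets are still advertised.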
Rule 7a) Issues with Single-Path Designs
Outbound case:
- The L3 engine (MSFC) on core switch X reloads: the lights are on but nobody's home - HSRP does not recover
- Fix: remove passive-interface for the wiring-closet subnets A and B to provide a longer routed recovery path (new, longer outbound routes via Y)
Rule 7b) Issues with Single-Path Design
Inbound case:
- Recovery must take place in both directions
- The routing protocol must recover a longer route to subnets A and B via Y
- Therefore dual-path L3 is better and faster than single-path
Rule 8a) Oversubscription Guidelines
- Oversubscription is part of all networks - it is not bad in itself
- Non-blocking switches do not make a non-blocking network
- You determine the amount of blocking
- Example: two GE access links feeding one GE uplink is a 2:1 blocking design; a GE uplink per GE link is non-blocking
Rule 8b) Oversubscription Guidelines
Rules of thumb that work well:
- 20:1 at the wiring closet (e.g., 200 100BaseT ports; 8 uplinks at n:1)
- Less in the distribution layer and server farm; use non-blocking switches in the core (GE, dual-link GEC)
- QoS is required if and only if congestion occurs: protect real-time flows at congested points
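As a worked example of the 20:1 closet rule (port count from the slide; uplink sizing is the author's illustration):

```latex
\underbrace{200 \times 100\ \text{Mbps}}_{\text{offered access load}} = 20\ \text{Gbps},
\qquad
\frac{20\ \text{Gbps}}{20} = 1\ \text{Gbps}
```

So a single GE uplink meets the 20:1 guideline for a 200-port closet, and a dual-link GEC both halves the oversubscription to 10:1 and adds link redundancy.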
Rule 9) Dual Supervisors: HA for Single-Attached Servers
- A single-attached server running a mission-critical application is a single point of failure
- Dual supervisors in the Catalyst 6XXX provide fast stateful recovery with no increase in complexity
- Combine with redundant uplinks
Legend: 10/100BaseT; GE or GEC
Rule 10) Protocol Tradeoffs: Automatic or Manual Configuration
Configuration effort up front rather than CPU overhead later, for example:
- set VTP mode transparent
- set/clear VLANs for each trunk
- set trunks on or off
- set channel on or off
Choose flexibility or determinism
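A hedged CatOS sketch of the manual, deterministic choices listed above (port, VLAN, and encapsulation values are illustrative):

```
set vtp mode transparent
! Force the trunk on rather than negotiating it
set trunk 1/1 on dot1q
! Prune the trunk down to only the VLANs it should carry
clear trunk 1/1 2-9,11-1005
set trunk 1/1 10
! Force the channel on rather than negotiating it
set port channel 2/1-2 on
```

Each line trades automatic negotiation for an explicit, predictable configuration - the "determinism over flexibility" theme of this deck.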
Rule 11) UniDirectional Link Detection
- UDLD detects a Tx/Rx fiber mismatch even when the physical layer checks out OK
- Prevents various failure conditions, including crossed wiring
- "The lights are on, but..."
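Enabling UDLD is a one-line matter; a CatOS sketch (the module/port is illustrative):

```
! Enable UDLD globally, or per fiber port
set udld enable
set udld enable 3/1
```

UDLD must run on both ends of the link to detect a unidirectional condition, so enable it consistently across fiber-connected switches.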
Building Block Means Survivable
- Self-contained, autonomous survivability unit (ASU)
- HSRP; L3 broadcast/multicast demarcation
- Cookie-cutter configuration
- The ASU delimits the failure domain with L3 demarcation: simple, repeatable, deterministic
- Redundancy adds 15% cost at mission-critical points like the server farm
Building Block Templates: Use As Is or Combine
1) Standard model - simple, structured
2) VLAN model - more flexible
3) Large-scale server farm model - accommodates dual NICs
4) Small-scale server farm model - accommodates dual NICs
1) Standard Building Block
No loops - no STP complexity
- Access L2 subnets/VLANs 10-17; GE/GEC VLAN trunks up to distribution
- Left distribution switch: STP root and HSRP primary for VLANs/subnets 10, 12, 14, 16
- Right distribution switch: HSRP primary for VLANs/subnets 11, 13, 15, 17
- Highly deterministic: L1 maps to L2 maps to L3; no blocking links; shortest path always - but not flexible
- Dual path with tracking to the core
2) VLAN Building Block
Make the L2 design match the L3 design; all VLANs terminate at the L3 boundary
- Every access switch carries all VLANs/all subnets; UplinkFast on access switches
- Uplink roles per VLAN: FO = forwarding odd, BE = blocking even, FE = forwarding even, BO = blocking odd
- Left distribution switch: STP root and HSRP primary for VLANs/subnets 10, 12, 14, 16
- Right distribution switch: STP root and HSRP primary for VLANs/subnets 11, 13, 15, 17
- More flexible; GE/GEC VLAN trunks; dual path with tracking to the core
3) Large-Scale Server Farm Building Block
Based on the VLAN building block; aggregates traffic at high bandwidth
- Dual-NIC server example: fault-tolerant mode (FTM), same IP address for seamless recovery
- Access L2 with UplinkFast; GE/GEC VLAN trunks
- One distribution switch is STP root and HSRP primary for even VLANs/subnets, the other for odd
- Dual path with tracking to the core
4) Small-Scale Server Farm Building Block
Simplified building block with no STP loops
- Use if port density permits and if non-blocking (no oversubscription) is a requirement
- Dual-NIC server example: fault-tolerant mode (FTM), same IP address for seamless recovery
- HSRP primary for even subnets on one switch, odd on the other; dual path with tracking
Redundant Backbone Models
All good, in order of increasing scale:
1) Collapsed L3 Backbone
2) Full Mesh
3) Partial Mesh
4) Dual-Path L2 Switched
5) Dual-Path L3 Switched
1) Collapsed L3 Backbone
Large building or small campus
- Clients at access L2; collapsed GE/GEC core L3; server farm attached below
- Scale depends on the physical plant and policy more than performance
2) Full Mesh Backbone
Small campus - the n-squared limitation
- Client blocks (access L2, distribution L3) and server block fully meshed at distribution
- With two distribution switches per block, peerings grow quadratically:
  2 blocks - 6 peerings; 3 blocks - 15 peerings; 4 blocks - 28 peerings; 5 blocks - 45 peerings
- Note the importance of passive wiring-closet interfaces in meshed designs!
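The peering counts on this slide follow from fully meshing the distribution switches, assuming two per block:

```latex
P(k) = \binom{2k}{2} = k(2k-1),
\qquad
P(2)=6,\quad P(3)=15,\quad P(4)=28,\quad P(5)=45
```

The quadratic growth in adjacencies is exactly the "n-squared limitation" that caps full mesh at small-campus scale.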
3) Partial Mesh Backbone
Medium campus - traffic flows to the server farm
- Client blocks (access L2, distribution L3) connect through a distribution/core L3 layer to the server block
- The predominant traffic pattern is client-to-server, so a full mesh among client blocks is unnecessary
4) Dual-Path L2 Switched Backbone
No STP loops or VLAN trunks in the core
- Client blocks (North, West, South): access L2, distribution L3
- Dual L2 backbone: red core (subnet = VLAN = ELAN) and blue core (subnet = VLAN = ELAN)
5a) Benefits of an L3 Backbone
- Multicast PIM routing control
- Load balancing; no blocked links
- Fast convergence with EIGRP/OSPF
- Greater scalability overall; reduced router peering
- IOS features in the backbone
5b) Dual-Path L3 Backbone
Largest scale, intelligent multicast
- Client blocks and server farm block: access L2, distribution L3, dual-path core L3
- All routed links - consider the subnet count!
Restore Considerations
- Restoring can take longer than failover in some cases - it is more complex, so schedule it
- On power-up, L1 may come up before L3 has built its routing table: a temporary black hole for HSRP
- Use preempt delay for HSRP
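The preempt-delay fix can be sketched in IOS HSRP configuration (exact syntax varies by release; values are illustrative):

```
interface Vlan55
 standby 1 ip 10.0.55.100
 standby 1 priority 200
 ! Wait 60 s after coming up before reclaiming the active role,
 ! giving the routing protocol time to converge first
 standby 1 preempt delay minimum 60
```

Without the delay, a rebooting primary preempts as soon as L1 is up and black-holes traffic until its routing table is rebuilt.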
Campus Failover: Layer 2 Recovery & Tuning
- STP: tune diameter on the root switch; improves recovery time (maxage)
- UplinkFast: no tuning, 2 seconds; wiring closet only; only applies with a forwarding and a blocking link
- PortFast: server or desktop ports only, 1 s; moves directly from link-up into forwarding
- BackboneFast: converges in 2 s + 2 x Fwd_delay for indirect link failures; eliminates the maxage timeout
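A CatOS sketch of the four L2 tuning knobs above (VLAN ranges, diameter, and port numbers are illustrative):

```
! Root macro: tunes hello/maxage/forward-delay for the real network diameter
set spantree root 1-100 dia 4
! Wiring-closet switches only
set spantree uplinkfast enable
! Server/desktop ports only - never on uplinks
set spantree portfast 3/1-48 enable
! Must be enabled on every switch in the domain to take effect
set spantree backbonefast enable
```

The root macro is the safe way to tighten STP timers, since it derives consistent values from the stated diameter instead of hand-set timers.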
Campus Failover: Layer 3 Recovery & Tuning
Caution with aggressive tuning - good when the network is stable and highly summarized
- OSPF (fast LAN links): tune hello timer to 1 s, dead timer to 3 s; < 4 s to recognize the problem, then converge
- HSRP (fast LAN links): tune hello timer to 1 s, dead timer to 3 s; < 4 s to converge
- EIGRP (fast LAN links): tune hello timer to 1 s, hold timer to 3 s; < 4 s to recognize the problem, then converge
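The 1 s / 3 s tunings above can be sketched as IOS interface commands (interface name and EIGRP AS number are illustrative; timers must match on both ends of each link):

```
interface Vlan55
 ! OSPF: 1 s hello, 3 s dead
 ip ospf hello-interval 1
 ip ospf dead-interval 3
 ! EIGRP: 1 s hello, 3 s hold
 ip hello-interval eigrp 100 1
 ip hold-time eigrp 100 3
 ! HSRP: 1 s hello, 3 s holdtime
 standby 1 timers 1 3
```

Apply these only where the slide's caution holds: stable, well-summarized networks, since aggressive timers turn transient flaps into full reconvergence events.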
Keeping Networks Available!
- KISS - eliminate complex L2
- ASU building blocks
- Redundant backbone; redundant L3 paths
- L3 segments the failure domain