High-Availability Enterprise Network Design




High-Availability Enterprise Network Design
haviland@cisco.com

Staying On Target
HA Focus vs Distractions!
Flat networks are easier - beware!
Variety of vendors, protocols, designs, etc.
Inherited complexity - hard to purge
Five nines is job one!
Feature rich - let's use all the knobs!
The latest cool stuff - older is more stable
Change is hard, sometimes $$$

HA Features of the Catalyst 6500
Consider for Backbones & Server Farms
Fabric Redundancy
  switch fabric module in CatOS 6.1
Supervisor Redundancy
  HA feature in CatOS 5.4.1
  stateful recovery
  image versioning on the fly
MSFC Redundancy
  config-sync feature
  IOS 12.1.3, CatOS 6.1
  HSRP pair

Thinking Outside the Box
For HA/HP design outside the box:
  the logical design is critical
  network features & protocols
  geophysical diversity is powerful
Inside: HA, RAID, UPS, MTBF, etc.

Dramatis Personae
Our Cast of Symbols
Links: GE, DPT, SONET, etc.
L2 switching: L2 forwarding in hardware
L3 switching: L3/L2 forwarding in hardware
Routing: L3 forwarding (SW or HW)
  Control plane = IOS routing protocols & features
  QoS where required
  Application intelligence
[Symbol legend: GigE Channel; Catalyst 4000; Catalyst 6500; Cisco 7500; Cisco 12000]

HA Gigabit Campus Architecture
survivable modules + survivable backbone
Define the mission critical parts first!
[Diagram labels: Client Blocks (Access L2, Distribution L3); Backbone (Ethernet or ATM, Layer 2 or Layer 3); Server Block (Distribution L3, Access L2, Server Farm). Legend: E or FE Port; GE or GEC]

High Availability Design
Why a Modular ABC Approach
Many new products, features, technologies
HA and HP application operation is the goal
Start with modular, structured approach (the logical design)
Add multicast, VoIP, DPT, DWDM...

Design the Solution, Then Pick the Products
[Chart: price per 10/100 port (axis from $100 to $350) for the Catalyst 2912G, 2948G, 2980G, 4XXX, 5XXX, and 6XXX families, with rows for 10/100 ports, Gigabit ports, backplane, and switching capacity. The per-product values cannot be matched to products in this transcription.]

HA Design Reality Check!
Assume Things Fail - Then What?
Networks are complex
Things break, people make mistakes
What happens if a failure occurs?
Simple, structured, deterministic design required for fast recovery
The tradeoffs - your choices are important

Network Recovery
How Long? What Happens?
[Diagram: campus topology with numbered failure points 1 through 6. Labels: Building; Branches; Access Layer 2; Distribution Layer 3; Core L3; WAN; WAN backup; Server Distribution; Server Farm Layer 2]

Network Recovery Times
If You Follow the Rules

Failure Scenario   Recovery Mode          Recovery Time
1,2 server         Server NIC             < 2 seconds
3,4 uplink         HSRP (& UplinkFast)    tune to 3 seconds
5,6 core           HSRP track             tune to 3 seconds
dual-path L3       alternate path used    < 2 seconds
EtherChannel       channel recovery       < 1 second
L3 routing         EIGRP or OSPF          depends on tuning
L2 general         L2 spanning tree       tune (up to 50 seconds)
DPT                IPS                    50 milliseconds

Design for High Availability
How to Build Boring Networks!
The Concepts
The Rules
Design Building Block
Design Backbone
Notes on Tuning

HA Network Design Concepts
thinking outside the box
1) Simplicity & Determinism
2) Collapse the Sandwich
3) Spanning Tree Failure Domain
4) Map L3 to L2 to L1
5) Scaling and Hierarchy
6) ABCs of Module + Backbone Design
7) The Four Corners

1) Simplicity and Determinism
reducing the degrees of freedom
HA Continuum: from Simple, Structured, Deterministic (Boring!) to Flexible, Complex, Varied (Interesting!)
Every Choice Affects Availability!
Determinism or Flexibility?
Would you support 27 desktop environments?
Would you support 13 network vendors?
Would you use 57 varieties of Cisco IOS?

2) Collapse the Sandwich
route IP over glass
Lower equipment cost
Lower operational cost
Simplified architecture
Scalable capacity
[Diagram: the Traditional Model stacks IP, FR/ATM, and SONET over fiber (side labels: Service, Traffic Eng, Fiber Mgmt); Optical Internetworking runs IP directly over fiber as a "Big Fat Pipe"]

3) Minimize the Failure Domain
public enemy number one
Avoid highly meshed, non-deterministic, large-scale L2 = VLAN topology
Where should root go?
What happens when something breaks?
How long to converge?
Many blocking links
Large failure domain!
Broadcast flooding
Multicast flooding
Loops within loops
ST from heck
Times 100 VLANs?
[Diagram labels: Building 1; Building 2; Building 3; Building 4]

4) Map L3 to L2 to L1
Easier administration & troubleshooting
Clients in subnet 10.0.55.0
VLAN 55
wiring closet 55 on floor 55
access switch 55
interface VLAN 55
all match and life is good - go fishing with your kids
[Legend: 10/100 BaseT; GE or GEC]
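The "everything is 55" convention on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `closet_plan` and the /24 prefix length are assumptions (the slide gives the subnet 10.0.55.0 but no mask), and the dictionary keys are made up for readability.

```python
import ipaddress


def closet_plan(n: int) -> dict:
    """Derive matching L3, L2, and L1 identifiers from one number n.

    Assumption: each wiring closet gets a /24 under 10.0.0.0, as the
    slide's 10.0.55.0 example suggests; the mask is not in the source.
    """
    if not 1 <= n <= 255:
        raise ValueError("n must fit in the third octet (1-255)")
    subnet = ipaddress.ip_network(f"10.0.{n}.0/24")
    return {
        "subnet": str(subnet),          # L3
        "vlan": n,                      # L2
        "svi": f"interface VLAN {n}",   # routed interface for the VLAN
        "wiring_closet": n,             # L1 location
        "floor": n,
        "access_switch": n,
    }


if __name__ == "__main__":
    for key, value in closet_plan(55).items():
        print(f"{key:14} {value}")
```

With a scheme like this, an address seen in a trace points straight at a VLAN, a switch, and a floor, which is the troubleshooting benefit the slide is after.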

5) Scaling and Hierarchy
Strong hierarchies like the telephone system and the Internet segment addressing and therefore scale
Flat L2 Ethernet is easy but does not scale
ATM LANE is logically flat, scales as N squared
[Graph: complexity (C) against number of devices (N), rising into an unmanageable (U) region]
1999, Cisco Systems, Inc.
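To make the N-squared claim concrete, the following Python sketch counts pairwise relationships in a flat design and in a simple two-level hierarchy. The hierarchy model (equal-sized blocks, fully meshed inside each block, one mesh between blocks) is an illustrative assumption, not something the slide specifies.

```python
def flat_pairs(n: int) -> int:
    """Pairwise relationships when every device can see every other: n(n-1)/2."""
    return n * (n - 1) // 2


def hierarchical_pairs(n: int, block_size: int) -> int:
    """Relationships when n devices are split into equal blocks.

    Assumed model: full mesh inside each block, plus one full mesh
    between the blocks themselves.
    """
    blocks, remainder = divmod(n, block_size)
    if remainder:
        raise ValueError("n must be a multiple of block_size")
    return blocks * flat_pairs(block_size) + flat_pairs(blocks)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, flat_pairs(n), hierarchical_pairs(n, 10))
```

For 100 devices the flat count is 4950 while the blocked count is 495, and the gap widens as N grows.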

6) Building Block & Backbone Design ABCs
A - design bb (building block)
B - design BB (backbone)
C - connect bb to BB
Divide and conquer
Cookie cutter configuration
Deterministic L3 demarcation
[Diagram labels: Server Farm; LAN Access; Distribution; Core; WAN Distribution; WAN Access; Ecommerce Solution; Internet; PSTN]

7) Four Square Network Redundancy
or the Four Corners Problem (L3)

                  One Chassis                 Two Chassis
One Supervisor    Simplest, No Redundancy     GeoPhysical, Effective
Two Supervisors   When space is limited, HA   Most Complex, Belt and Suspenders

Dos and Don'ts for HA Design
1) Eliminate STP Loops
2) L3 Dual-Path Design
3) EtherChannel Across Cards
4) Workgroup Servers
5) Use HSRP Track
6) Passive Interfaces
7) Issues with Single-Path Design
8) Oversubscription Guidelines
9) HA for single attached servers
10) Protocol Tradeoffs
11) UDLD Protection

Rule 1) Eliminate STP Loops
in the backbone and mission critical points
Too many cooks spoil the broth
L3 control is better
No blocking links to waste bandwidth
Avoids slow STP convergence
Very deterministic
Routed links not VLAN trunks
[Diagram labels: hosts X.1, X.2, X.3; Root VLAN X; L2 Gigabit switch in backbone; subnet X = VLAN X]

Rule 2) Dual Equal-Cost Path L3
Load balance - don't waste bandwidth
  unlike L1 and L2 redundancy
Fast recovery to remaining path
  detect L1 down & purge - about 1s
Works with any routed fat pipes
[Diagram labels: equal cost routes to X; Path A; Path B; destination network X]

Rule 3) EtherChannel Across Cards
Increased availability
Sub second recovery
Spans cards on 6500
Up to 8 ports in channel
Small complexity increase
Single L2 STP link
Single L3 subnet
less if channel set on

Rule 4a) Connect Workgroup Server
With no L2 recovery path, what happens if link breaks?
Link CB breaks.
VLAN X (in purple) includes clients and workgroup servers attached at different places.
[Diagram labels: client X.1 on switch C; distribution switches A and B; workgroup server X.100 attached to distribution layer; L2 path to client X.1; links to core]

Rule 4b) Connect Workgroup Server
Subnet X now discontiguous
Incoming traffic gets dropped
Routers A & B continue to advertise reachability of subnet X...
X.100 not reachable
X.1 not reachable
[Diagram labels: client X.1 on switch C; distribution switches A and B; workgroup server X.100 attached to distribution layer; L2 path to client X.1]

Rule 4c) Connect Workgroup Server
Introduce L2/STP redundancy
Adds a loop (band-aid fix)
VLAN trunk AB forms L2 loop - recovery path for STP prevents black hole
[Diagram labels: client X.1 on switch C; distribution switches A and B; workgroup server X.100 attached to distribution layer; L2 path to client X.1]

Rule 4d) Connect Workgroup Server
Real Lessons:
Enterprise Server Farms are better
L3 demarcation is better
Example of why extended L2 is difficult

Rule 5a) Use HSRP Track
Review - Hot Standby Router Protocol
Fast recovery can be tuned to 3s or less
Router X acts as gateway router for subnet M, IP address M.100.
If link Z fails, router Y will take over as M.100 gateway with same MAC address.
[Diagram labels: subnet M hosts M.1, M.2, M.3; link Z; X is M.100, HSRP Primary, Priority 200; Y (becomes M.100), HSRP Backup, Priority 100. Legend: 10/100 BaseT; GE or GEC]

Rule 5b) Use HSRP Track
Track extends HSRP to monitor links to backbone
Ensures shortest path - best outbound gateway
Track interface A - lower priority 75
Track interface B - lower priority 75
HSRP triggers if both A and B lost
[Diagram labels: subnet M hosts M.1, M.2, M.3; link Z; X is M.100, HSRP Primary, Priority 200; Y (becomes M.100), HSRP Backup, Priority 100; backbone links A and B. Legend: 10/100 BaseT; GE or GEC]
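The arithmetic behind "HSRP triggers if both A and B lost" can be checked with a short Python sketch. It uses the slide's numbers (primary priority 200, backup priority 100, decrement 75 per tracked interface); the function names are hypothetical, and the model assumes the backup is allowed to preempt once its priority is higher, which the slide does not state.

```python
def effective_priority(base: int, decrement: int, links_down: int) -> int:
    """Priority after subtracting the tracking decrement for each lost link."""
    return base - decrement * links_down


def active_gateway(x_links_down: int,
                   x_base: int = 200,
                   y_priority: int = 100,
                   decrement: int = 75) -> str:
    """Return which router holds the M.100 gateway role.

    Assumption: router Y preempts as soon as its priority exceeds X's.
    """
    x_priority = effective_priority(x_base, decrement, x_links_down)
    return "X" if x_priority > y_priority else "Y"


if __name__ == "__main__":
    for down in range(3):
        print(down, effective_priority(200, 75, down), active_gateway(down))
```

Losing one backbone link leaves X at 125, still above Y's 100, so X keeps the gateway role. Losing both drops X to 50 and Y takes over, which matches the slide.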

Rule 6a) Use Passive Interfaces
L3 switches X & Y in distribution layer
4 VLANs per wiring closet
10 wiring closets
[Diagram labels: wiring closet switches carrying VLANs ABCD, EFGH, IJKL, MNOP, ten in total; distribution switches X and Y]

Rule 6b) Use Passive Interfaces
What X and Y see is 4*10=40 routed links
Increased protocol overhead & CPU
[Diagram: X and Y joined by one routed link per VLAN; VLAN A has interfaces A.1 on X and A.2 on Y, likewise B, C, D, E, F, G, etc.]

Rule 6c) Use Passive Interfaces
Turns off routing updates & overhead
Leave two routed links for redundant paths
CDP, VTP, HSRP etc. still function on all links
[Diagram: the same per-VLAN links between X and Y; interfaces on VLANs A and E remain active, while those on B, C, D, F, G, etc. are marked passive]
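The counting in Rules 6a through 6c is simple enough to sketch in Python. The function names are hypothetical; the inputs (10 closets, 4 VLANs each, 2 links left active) come from the slides.

```python
def routed_links(wiring_closets: int, vlans_per_closet: int) -> int:
    """Each VLAN that reaches both X and Y looks like one routed link between them."""
    return wiring_closets * vlans_per_closet


def passive_plan(total_links: int, keep_active: int = 2) -> dict:
    """Split the links into those still exchanging routing updates and those made passive."""
    if keep_active > total_links:
        raise ValueError("cannot keep more links active than exist")
    return {"active": keep_active, "passive": total_links - keep_active}


if __name__ == "__main__":
    total = routed_links(wiring_closets=10, vlans_per_closet=4)
    print(total, passive_plan(total))
```

With the slide's numbers, 38 of the 40 links stop carrying routing updates while two remain as redundant routed paths.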

Rule 7a) Issues With Single-Path Designs
Outbound case...
L3 engine MSFC on core-X reloads
Lights are on but nobody home - HSRP does not recover
Remove passive interface to wiring closet subnets A, B
Provide longer routed recovery path
[Diagram labels: Access L2; Subnet A; Subnet B; HSRP primary; GE; single path to core; Core L3 switches X and Y; new, longer outbound routes]

Rule 7b) Issues With Single-Path Designs
Inbound case...
Recovery must take place in both directions
Routing protocol recovers longer route from X to subnets A, B
Therefore dual-path L3 is better & faster than single-path
[Diagram labels: Access L2; Subnet A; Subnet B; HSRP primary; GE; single path to core; Core L3 switches X and Y; new, longer routes to A, B]

Rule 8a) Oversubscription Guidelines
Oversubscription is part of all networks - not bad
Non-blocking switches do not mean a non-blocking network
You determine the amount of blocking
[Diagram: non-blocking design (GE in, GE out); blocking design 2:1 (two GE in, one GE out)]

Rule 8b) Oversubscription Guidelines
Oversubscription rules of thumb work well
  20:1 at wiring closet
  Less in distribution and server farm
QoS required IFF congestion occurs
  Protect real time flows at congested points
[Diagram labels: 200 100BaseT; 20:1; GE; Distribution L3; Dual-link GEC; 8 uplinks; n:1; Core L3 - use nonblocking switches]
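The 20:1 and 2:1 figures follow from dividing offered access bandwidth by uplink bandwidth. The Python sketch below shows the calculation; the function name is hypothetical, and pairing the slide's "200 100BaseT" ports with a single GE uplink is an assumption read off the diagram labels.

```python
def oversubscription(ports: int, port_mbps: float,
                     uplinks: int, uplink_mbps: float) -> float:
    """Ratio of total downstream port bandwidth to total uplink bandwidth."""
    if uplinks <= 0 or uplink_mbps <= 0:
        raise ValueError("need at least one uplink with positive bandwidth")
    return (ports * port_mbps) / (uplinks * uplink_mbps)


if __name__ == "__main__":
    # Wiring closet: 200 ports of 100BaseT behind one Gigabit Ethernet uplink.
    print(oversubscription(200, 100, 1, 1000))
    # Rule 8a blocking example: two GE links feeding one GE link.
    print(oversubscription(2, 1000, 1, 1000))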

Rule 9) Dual Supervisors
HA for Single Attached Servers
Single point of failure
Dual supervisors - fast stateful recovery
No increase in complexity
[Diagram labels: single attached server, mission critical application; HA dual supervisors, Catalyst 6XXX; redundant uplinks. Legend: 10/100 BaseT; GE or GEC]

Rule 10) Protocol Tradeoffs
Automatic or Manual Configuration
Configuration up front rather than CPU overhead later, for example:
  set VTP mode transparent
  set/clear VLANs for each trunk
  set trunks on or off
  set channel on or off
Choose flexibility or determinism

Rule 11) UniDirectional Link Detection
UDLD detects mismatch when physical layer checks out OK
Prevents various failure conditions including crossed wiring
The lights are on, BUT...
[Diagram labels: Tx Fiber; Rx Fiber]

Building Block Means Survivable, Self-contained
Autonomous Survivability Unit (ASU) - HSRP
L3 broadcast/multicast demarcation
Cookie cutter configuration
L3 demarcation of failure domain
Simple, repeatable, deterministic
Redundancy adds 15% cost at mission critical points like server farm
[Diagram labels: Backbone; L2; L3; ASU delimits failure domain]

Building Block Templates
Use As Is or Combine
1) Standard Model - simple, structured
2) VLAN Model - more flexible
3) Large Scale Server Farm Model - accommodate dual NIC
4) Small Scale Server Farm Model - accommodate dual NIC

1) Standard Building Block
no loops - no STP complexity
Highly deterministic
L1 maps L2 maps L3
No blocking links
Shortest path always
Not flexible
[Diagram labels: Subnets 10 through 17 at Access L2; root switch VLAN 10/11; GE/GEC VLAN trunks; HSRP primary for subnets/VLANs 10, 12, 14, 16; HSRP primary for subnets/VLANs 11, 13, 15, 17; Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC]

2) VLAN Building Block
make L2 design match L3 design
All VLANs terminate at L3 boundary
More flexible
Uplink key: FO forwarding odd, BE blocking even, etc.
[Diagram labels: four access switches, each with all VLANs and all subnets, UplinkFast, uplinks marked FE/BO and FO/BE; GE/GEC VLAN trunks; first L2/L3 switch is STP root for VLANs 10, 12, 14, 16 and HSRP primary for subnets 10, 12, 14, 16; second L2/L3 switch is STP root for VLANs 11, 13, 15, 17 and HSRP primary for subnets 11, 13, 15, 17; L2 path between them; Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC]

3) Large-Scale Server Farm Building Block
based on VLAN building block
aggregates traffic - high BW
Dual-NIC server example:
  Fault Tolerant Mode (FTM)
  Same IP address - seamless recovery
[Diagram labels: Access L2 with UplinkFast; GE/GEC VLAN trunks; first L2/L3 switch is STP root for EVEN VLANs and HSRP primary for EVEN subnets; second L2/L3 switch is STP root for ODD VLANs and HSRP primary for ODD subnets; L2 path between them; Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC]

4) Small-Scale Server Farm Building Block
Simplified building block with no STP loops
Use if port density permits
Use if no oversubscription (non-blocking) is a requirement
Dual-NIC server example:
  Fault Tolerant Mode (FTM)
  Same IP address - seamless recovery
[Diagram labels: first L2/L3 switch is HSRP primary for EVEN subnets; second L2/L3 switch is HSRP primary for ODD subnets; L2 path between them; Dual Path with Tracking. Legend: 10/100 BaseT; GE or GEC]

Redundant Backbone Models
all good - increasing scale
1) Collapsed L3 Backbone
2) Full Mesh
3) Partial Mesh
4) Dual-Path L2 Switched
5) Dual-Path L3 Switched

1) Collapsed L3 Backbone
large building or small campus
Scale depends on physical plant and policy more than performance
[Diagram labels: Clients; Access L2; Collapsed Backbone, Core L3; GE/GEC; Server Farm. Legend: 10/100 BaseT; GE or GEC]

2) Full Mesh Backbone
small campus - n squared limitation
2 blocks - 6 peerings
3 blocks - 15 peerings
4 blocks - 28 peerings
5 blocks - 45 peerings
Note importance of passive wiring closet interfaces in meshed designs!
[Diagram labels: Client Blocks (Access L2, Distribution L3); Server Block (Distribution L3, Access L2). Legend: E or FE Port; GE or GEC]
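The peering counts on this slide are consistent with two L3 distribution switches per block, each peering with every other switch, which gives C(2n, 2) = n(2n - 1) peerings for n blocks. That reading is an inference from the numbers, not something the slide states; the Python sketch below, with a hypothetical function name, reproduces the listed values under that assumption.

```python
from math import comb


def full_mesh_peerings(blocks: int, switches_per_block: int = 2) -> int:
    """Routing peerings when every L3 switch peers with every other one.

    Assumption: each block contributes `switches_per_block` distribution
    switches (two matches the slide's figures).
    """
    routers = blocks * switches_per_block
    return comb(routers, 2)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} blocks - {full_mesh_peerings(n)} peerings")
```

The quadratic growth is why the slide limits the full mesh to a small campus.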

3) Partial Mesh Backbone
medium campus - traffic flow to server farm
[Diagram labels: Client Blocks (Access L2, Distribution L3); predominant traffic pattern; Distribution/Core L3; Server Block (Access L2). Legend: E or FE Port; GE or GEC]

4) Dual-Path L2 Switched Backbone
no STP loops or VLAN trunks in core
[Diagram labels: Client Blocks North, West, South (Access L2, Distribution L3); Dual L2 Backbone, Core L2; red core subnet=vlan=elan; blue core subnet=vlan=elan. Legend: E or FE Port; GE or GEC]

5a) Benefits of a L3 Backbone
Multicast PIM routing control
Load balancing
No blocked links
Fast convergence EIGRP/OSPF
Greater scalability overall
Router peering reduced
IOS features in the backbone

5b) Dual-Path L3 Backbone
largest scale, intelligent multicast
All routed links, consider subnet count!
[Diagram labels: Client Block (Access L2, Distribution L3); Core L3; Server Farm Block (Distribution L3, Access L2). Legend: E or FE Port; GE or GEC]

Restore Considerations
Restoring can take longer in some cases - more complex - schedule
On power up L1 may come up before L3 builds routing table - temporary black hole for HSRP
Use preempt delay for HSRP

Campus Failover
Layer 2 Recovery & Tuning
STP
  Tune diameter on root switch
  Improves recovery time (maxage)
UplinkFast
  No tuning, 2 seconds, wiring closet only
  Only applies with forwarding & blocking link
PortFast
  Server or desktop ports only
  1 s - move directly from linkup into forwarding
BackboneFast
  Converges in 2 sec + 2 x Fwd_delay for indirect link failures
  Eliminates maxage timeout
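The BackboneFast formula can be compared with untuned spanning tree in a short Python sketch. The 2 sec + 2 x Fwd_delay expression is from the slide; the default timer values (max age 20 s, forward delay 15 s) are the standard IEEE 802.1D defaults and do not appear in the source, and the function names are hypothetical.

```python
MAX_AGE_DEFAULT = 20        # seconds, IEEE 802.1D default (assumed, not in the slides)
FORWARD_DELAY_DEFAULT = 15  # seconds, IEEE 802.1D default (assumed, not in the slides)


def stp_indirect_failure(max_age: int = MAX_AGE_DEFAULT,
                         forward_delay: int = FORWARD_DELAY_DEFAULT) -> int:
    """Plain STP: wait out max age, then listening and learning states."""
    return max_age + 2 * forward_delay


def backbonefast_indirect_failure(forward_delay: int = FORWARD_DELAY_DEFAULT) -> int:
    """Slide's formula: 2 sec + 2 x Fwd_delay, with the max age timeout eliminated."""
    return 2 + 2 * forward_delay


if __name__ == "__main__":
    print("plain STP:", stp_indirect_failure(), "s")
    print("BackboneFast:", backbonefast_indirect_failure(), "s")
```

With default timers the plain STP figure is 50 seconds, which lines up with the "up to 50 seconds" entry in the recovery times table, and BackboneFast brings it to 32 seconds.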

Campus Failover
Layer 3 Recovery & Tuning
Caution with aggressive tuning
  Good when network is stable, highly summarized
OSPF (fast LAN links)
  Tune hello timer 1 sec, dead timer 3 sec
  <4s to recognize problem, then converge
HSRP (fast LAN links)
  Tune hello timer 1 sec, dead timer 3 sec
  <4s to converge
EIGRP (fast LAN links)
  Tune hello timer 1 sec, hold timer 3 sec
  <4s to recognize problem, then converge

Keeping Networks Available!
KISS - eliminate complex L2
ASU - building blocks
Redundant backbone
Redundant L3 paths
L3 segments failure domain
