Resilient Network Design Concepts. Mark Tinka


The Janitor Pulled the Plug
Why was he allowed near the equipment?
Why was the problem noticed only afterwards?
Why did it take 6 weeks to determine the problem?
Why wasn't there redundant power?
Why wasn't there network redundancy?

Network Design and Architecture
  is of critical importance
  contributes directly to the success of the network
  contributes directly to the failure of the network
"No amount of magic knobs will save a sloppily designed network"
Paul Ferguson, Consulting Engineer, Cisco Systems

What is a Well-Designed Network?
A network that takes into consideration these important factors:
  Physical infrastructure
  Topological/protocol hierarchy
  Scaling and redundancy
  Addressing aggregation (IGP and BGP)
  Policy implementation (core/edge)
  Management/maintenance/operations
  Cost

The Three-legged Stool
Designing the network with resiliency in mind
Using technology to identify and eliminate single points of failure
Having processes in place to reduce the risk of human error
All of these elements are necessary, and all interact with each other
  One missing leg results in a stool which will not stand
(Figure: a three-legged stool whose legs are labeled Design, Technology and Process.)

New World vs. Old World
Internet/L3 networks: build the redundancy into the system
Telco voice and L2 networks: put all the redundancy into a box

New World vs. Old World
Despite the change in the customer-provider dynamic, the fundamentals of building networks have not changed
ISP Geeks can learn from Telco Bell Heads the lessons learned from 100 years of experience
Telco Bell Heads can learn from ISP Geeks the hard experience of scaling at +100% per year
(Figure: telco infrastructure alongside Internet infrastructure.)

How Do We Get There?
"In the Internet era, reliability is becoming something you have to build, not something you buy. That is hard work, and it requires intelligence, skills and budget. Reliability is not part of the basic package."
Joel Snyder, Network World Test Alliance, 1/10/2000, "Reliability: Something you build, not buy"

Redundant Network Design: Concepts and Techniques

Basic ISP Scaling Concepts
Modular/Structured Design
Functional Design
Tiered/Hierarchical Design
Discipline

Modular/Structured Design
Organize the network into separate and repeatable modules
  Backbone
  PoP
  Hosting services
  ISP services
  Support/NOC
(Figure: a network core with backbone links to other PoPs and connections to other ISPs, surrounded by modules for consumer dial access, consumer cable and xDSL access, Nx64 and NxT1/E1 customer aggregation layers fed by leased line circuit delivery over channelised T1/E1 and T3/E3 circuits, ISP services (DNS, Mail, News, FTP, WWW), hosted services, and the network operations centre.)

Modular/Structured Design
Modularity makes it easy to scale a network
  Design smaller units of the network that are then plugged into each other
  Each module can be built for a specific function in the network
  Upgrade paths are built around the modules, not the entire network

Functional Design
One box cannot do everything (no matter how hard people have tried in the past)
Each router/switch in a network has a well-defined set of functions
The various boxes interact with each other
Equipment can be selected and functionally placed in a network around its strengths
ISP networks are a systems approach to design
  Functions interlink and interact to form a network solution

Tiered/Hierarchical Design
Flat meshed topologies do not scale
Hierarchy is used in designs to scale the network
Good conceptual guideline, but the lines blur when it comes to implementation
(Figure: core, distribution layer and access layer, with the core linked to other regions.)

Multiple Levels of Redundancy
Triple layered PoP redundancy
  Lower-level failures are better
  Lower-level failures may trigger higher-level failures
  L2: Two of everything
  L3: IGP and BGP provide redundancy and load balancing
  L4: TCP re-transmissions recover during the fail-over
(Figure: PoP layers, from top to bottom: backbone, border, intra-PoP interconnect, PoP intraconnect, access.)

Multiple Levels of Redundancy
Multiple levels also mean that one must go deep, for example:
  Outside cable plant: circuits on the same bundle, backhoe failures
  Redundant power to the rack: circuit overload and technician trip
MIT (maintenance injected trouble) is one of the key causes of ISP outages

Multiple Levels of Redundancy
Objectives
  As little user visibility of a fault as possible
  Minimize the impact of any fault in any part of the network
  Network needs to handle L2, L3, L4, and router failure
(Figure: a PoP connected to the backbone, peer networks, location access and residential access.)

Multiple Levels of Redundancy
(Figure: a PoP with two core backbone routers, Core 1 and Core 2, in OSPF area 0 with iBGP, each linked to a neighboring PoP and originating an OSPF default route into the PoP. The PoP interconnect medium is two switches, SW 1 and SW 2, in OSPF area 200. Attached to the switches are the PoP service and applications hosts, a NetFlow collector and syslog server, the access routers Access 1 and Access 2 for dedicated access, and NAS 1 and NAS 2 for dial-up. Each customer runs its own IGP.)

Redundant Network Design: The Basics

The Basics: Platform
Redundant power
  Two power supplies
Redundant cooling
  What happens if one of the fans fails?
Redundant route processors
  A consideration also, but less important
  Partner router device is better
Redundant interfaces
  Redundant link to partner device is better

The Basics: Environment
Redundant power
  UPS source protects against grid failure
  "Dirty" source protects against UPS failure
Redundant cabling
  Cable break inside facility can be quickly patched by using "spare" cables
  Facility should have two diversely routed external cable paths
Redundant cooling
  Does the facility have air-conditioning backup, or some other cooling system?

Redundant Network Design: Within the DataCentre

Bad Architecture (1)
A single point of failure
Single collision domain
Single security domain
Spanning tree convergence
No backup
Central switch performance
(Figure: one central switch connecting an HSRP router pair, the dial network, the server farm and the ISP office LAN.)

Bad Architecture (2)
A central router
Simple to build
Resilience is the "vendor's problem"
More expensive
No router is resilient against bugs or restarts
You always need a bigger router
(Figure: one central router connecting the upstream ISP, the dial network, customer links, customer hosted services, the server farm and the ISP office LAN.)

Even Worse!!
Avoid highly meshed, non-deterministic large scale L2
  Where should root go?
  What happens when something breaks?
  How long to converge?
  Many blocking links
  Large failure domain!
  Broadcast flooding
  Multicast flooding
  Loops within loops
  Spanning tree convergence time
  Times 100 VLANs?
(Figure: Buildings 1 to 4 interconnected in a Layer 2 mesh.)

Typical (Better) Backbone
Still a potential for spanning tree problems, but now the problems can be approached systematically, and the failure domain is limited
(Figure: client blocks with L2 access and L3 distribution, a Layer 2 backbone (Ethernet or ATM), and a server block with L3 distribution and L2 access in front of the server farm.)

The Best Architecture
Multiple subnetworks
Highly hierarchical
Controlled broadcast and multicast
(Figure: client, L2 access, L3 distribution, L3 core, L3 distribution, L2 access, server farm.)

Benefits of Layer 3 Backbone (Technology)
Multicast PIM routing control
Load balancing
No blocked links
Fast convergence with OSPF/ISIS/EIGRP
Greater scalability overall
Router peering reduced

Redundant Network Design: Server Availability

Multi-homed Servers (Technology)
Using adaptive fault tolerant drivers and NICs
NIC has a single IP/MAC address (active on one NIC at a time)
When faulty link repaired, does not fail back, to avoid flapping
Fault-tolerant drivers available from many vendors: Intel, Compaq, HP, Sun
Many vendors also have drivers that support EtherChannel
(Figure: a dual-homed server in the server farm attached to two L2 switches, below L3 (router) distribution and L3 (router) core layers. Primary NIC recovery time: 1-2 seconds.)

HSRP: Hot Standby Router Protocol (Technology)
Transparent failover of default router
"Phantom" router created
One router is active, responds to phantom L2 and L3 addresses
Others monitor and take over phantom addresses
(Figure: a host at 10.1.1.33 with default-gw = 10.1.1.1; two routers at 10.1.1.2 (00:10:7B:04:88:CC) and 10.1.1.3 (00:10:7B:04:88:BB); the phantom router at 10.1.1.1 (00:00:0C:07:AC:01).)

HSRP, RFC 2281 (Technology)
HSR multicasts hellos every 3 sec with a default priority of 100
HSR will assume control if it has the highest priority and preempt configured, after delay (default=0) seconds
HSR will deduct 10 from its priority if the tracked interface goes down
(Figure: two routers serving Router Group #1 and Router Group #2, each primary for one group and standby for the other.)

HSRP (Technology)
Router1:
  interface ethernet 0/0
   ip address 169.223.10.1 255.255.255.0
   standby 10 ip 169.223.10.254
Router2:
  interface ethernet 0/0
   ip address 169.223.10.2 255.255.255.0
   standby 10 priority 150 preempt delay 10
   standby 10 ip 169.223.10.254
   standby 10 track serial 0 60
(Figure: Router 1 and Router 2 sit between the Internet or ISP backbone and the server systems.)
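
The slide's configuration uses the older single-line priority/preempt form. The following is a minimal sketch of the same router pair in the multi-line syntax found in later IOS releases, assuming a release that accepts `preempt delay minimum`. The `standby 10 preempt` line on Router1 is an assumption that is not in the original slide: when Serial0 fails, Router2's priority drops from 150 to 90, below Router1's default of 100, but Router1 only takes over from a Router2 that is still sending hellos if Router1 is itself allowed to preempt.

```
! Router1: default priority 100
! "standby 10 preempt" is an assumption, not part of the original slide
interface Ethernet0/0
 ip address 169.223.10.1 255.255.255.0
 standby 10 ip 169.223.10.254
 standby 10 preempt
!
! Router2: preferred active router, priority 150
! Tracking Serial0 with a decrement of 60 leaves priority 90 on failure
interface Ethernet0/0
 ip address 169.223.10.2 255.255.255.0
 standby 10 ip 169.223.10.254
 standby 10 priority 150
 standby 10 preempt delay minimum 10
 standby 10 track Serial0 60
```

Exact keywords vary between IOS releases and platforms, so check the command reference for the software in use.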

Redundant Network Design: WAN Availability

Circuit Diversity
Having backup PVCs through the same physical port accomplishes little or nothing
  Port is more likely to fail than any individual PVC
  Use separate ports
Having backup connections on the same router doesn't give router independence
  Use separate routers
Use different circuit provider (if available)
  Problems in one provider network won't mean a problem for your network

Circuit Diversity
Ensure that the facility has diverse circuit paths to the telco provider or providers
Make sure your backup path terminates into separate equipment at the service provider
Make sure that your lines are not trunked into the same paths as they traverse the network
Try to write this into your Service Level Agreement with providers

Circuit Diversity (Technology)
(Figure: three customer connections into the service provider network. "THIS is better than..." "THIS, which is better than..." "THIS". The last is captioned "Whoops. You've been trunked!")

Circuit Bundling: MUX (Technology)
Use hardware MUX
  Hardware MUXes can bundle multiple circuits, providing L1 redundancy
  Need a similar MUX on other end of link
  Router sees circuits as one link
  Failures are taken care of by the MUX
Using redundant routers helps
(Figure: two MUXes connected across the WAN.)

Circuit Bundling: MLPPP (Technology)
Multi-link PPP, with proper circuit diversity, can provide redundancy
Router based rather than dedicated hardware MUX

  interface Multilink1
   ip address 172.16.11.1 255.255.255.0
   ppp multilink
   multilink-group 1
  !
  interface Serial1/0
   no ip address
   encapsulation ppp
   ppp multilink
   multilink-group 1
  !
  interface Serial1/1
   no ip address
   encapsulation ppp
   ppp multilink
   multilink-group 1

(Figure: two routers joined by an MLPPP bundle.)

Load Sharing
Load sharing occurs when a router has two (or more) equal cost paths to the same destination
EIGRP also allows unequal-cost load sharing
Load sharing can be on a per-packet or per-destination basis (default: per-destination)
Load sharing can be a powerful redundancy technique, since it provides an alternate path should a router/path fail

Load Sharing (Technology)
OSPF will load share on equal-cost paths by default
EIGRP will load share on equal-cost paths by default, and can be configured to load share on unequal-cost paths:

  router eigrp 111
   network 10.1.1.0
   variance 2

Unequal-cost load sharing is discouraged; it can create too many obscure timing problems and retransmissions

Policy-based Routing (Technology)
If you have unequal cost paths, and you don't want to use unequal-cost load sharing (you don't!), you can use PBR to send lower priority traffic down the slower path

  ! Policy map that directs FTP-Data
  ! out the Frame Relay port. Could
  ! use set ip next-hop instead
  route-map FTP_POLICY permit 10
   match ip address 106
   set interface Serial1.1
  !
  ! Identify FTP-Data traffic (extended ACL, since it matches on TCP port)
  access-list 106 permit tcp any eq 20 any
  !
  ! Policy maps are applied against
  ! inbound interfaces
  interface ethernet 0
   ip policy route-map FTP_POLICY

(Figure: an FTP server behind a router with two WAN paths, Frame Relay at 128K and ATM at 2M.)

Convergence
The convergence time of the routing protocol chosen will affect overall availability of your WAN
Main area to examine is L2 design impact on L3 efficiency

BFD
BFD: Bidirectional Forwarding Detection
Used to QUICKLY detect local/remote link failure
  Between 50ms and 300ms
Signals upper-layer routing protocols to converge
  OSPF
  BGP
  EIGRP
  IS-IS
  HSRP
  Static routes
Especially useful on Ethernet links, where remote failure detection may not be easily identifiable
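
As an illustration of BFD signalling a routing protocol, here is a minimal sketch of BFD enabled for OSPF on Cisco IOS. The interface name, addresses and OSPF process number are hypothetical, and the timers are examples rather than recommendations: a 50 ms interval with a multiplier of 3 gives a detection time of roughly 150 ms, inside the 50ms to 300ms range quoted above. Supported minimum intervals depend on the platform.

```
! Hypothetical interface and addressing
! 50 ms transmit and receive interval, failure declared after 3 missed packets
interface GigabitEthernet0/1
 ip address 192.0.2.1 255.255.255.252
 bfd interval 50 min_rx 50 multiplier 3
!
! Register OSPF as a BFD client on all of its interfaces
router ospf 1
 bfd all-interfaces
```

BFD must be configured on both ends of the link for a session to come up.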

IETF Graceful Restart
Allows a router's control plane to restart without signaling a failure of the routing protocol to its neighbors
Forwarding continues while switchover to the backup control plane is initiated
Supports several routing protocols
  OSPF (OSPFv2 & OSPFv3)
  BGP
  IS-IS
  RIP & RIPng
  PIM-SM
  LDP
  RSVP
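
A minimal sketch of enabling graceful restart for two of the listed protocols on Cisco IOS follows. It assumes a platform with redundant route processors and NSF support; the OSPF process number and the private AS number 64512 are hypothetical. Neighbors must also support the helper role for the restart to go unnoticed.

```
! IETF graceful restart (NSF) for OSPF; process number is hypothetical
router ospf 1
 nsf ietf
!
! Graceful restart capability for BGP; AS number is hypothetical
router bgp 64512
 bgp graceful-restart
```

Command availability varies by platform and release, so treat this as a starting point rather than a tested configuration.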

NSR
NSR: Non-Stop Routing
Similar to IETF Graceful Restart, but rather than depending on neighbors to maintain routing and forwarding state during control plane switchovers, the router maintains identical copies of the routing state on both control planes
Failure of the primary control plane causes forwarding to use the routing table on the backup control plane
Switchover and recovery is independent of neighbor routers, unlike IETF Graceful Restart

VRRP
VRRP: Virtual Router Redundancy Protocol
Similar to HSRP or GLBP
But is an open standard
Can be used between multiple router vendors, e.g., between Cisco and Juniper
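
For comparison with the HSRP example earlier in the deck, here is a minimal sketch of the Cisco side of a VRRP pair in classic IOS syntax. The addresses reuse the HSRP example and the group number 10 is hypothetical. The other vendor's router would be configured in its own syntax with the same group number and virtual address.

```
! VRRP group 10 with virtual address 169.223.10.254
! Priority 150 makes this router the preferred master (default is 100)
interface Ethernet0/0
 ip address 169.223.10.1 255.255.255.0
 vrrp 10 ip 169.223.10.254
 vrrp 10 priority 150
```

Unlike HSRP, preemption is enabled by default in VRRP on IOS, so no separate preempt line is shown.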

ISSU
ISSU: In-Service Software Upgrade
Implementation may be unique to each router vendor
Basic premise is to modularly upgrade software features and/or components without having to reboot the router
Support from vendors is still growing, and it is not supported on all platforms
Initial support is on high-end platforms that support either modular or microkernel-based operating systems

MPLS-TE
MPLS Traffic Engineering
Allows for equal-cost load balancing
Allows for unequal-cost load balancing
Makes room for MPLS FRR (Fast Reroute)
  FRR provides SONET-like recovery of 50ms
  Ideal for so-called "converged" networks carrying voice, video and data

Control Plane QoS
QoS: Quality of Service (Control Plane)
Useful for control plane protection
Ensures network congestion does not cause network control traffic drops
  Keeps routing protocols up and running
  Guarantees network stability
Cisco features:
  CoPP (Control Plane Policing)
  CPPr (Control Plane Protection)
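
A minimal sketch of Control Plane Policing on Cisco IOS follows, to show the shape of the feature. The ACL, class-map and policy-map names are hypothetical, and the police rates (in bits per second) are illustrative only and not recommendations. Routing protocol traffic is classified and always transmitted, while everything else punted to the control plane is rate limited.

```
! Identify routing protocol traffic (names are hypothetical)
ip access-list extended ROUTING-PROTOCOLS
 permit ospf any any
 permit tcp any any eq bgp
 permit tcp any eq bgp any
!
class-map match-all COPP-ROUTING
 match access-group name ROUTING-PROTOCOLS
!
! Never drop routing traffic; police all other control plane traffic
policy-map COPP-POLICY
 class COPP-ROUTING
  police 1000000 conform-action transmit exceed-action transmit
 class class-default
  police 500000 conform-action transmit exceed-action drop
!
control-plane
 service-policy input COPP-POLICY
```

A production policy would normally add classes for management traffic, ICMP and ARP, and would be tuned from measured baselines before any drop action is enabled.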

Factors Determining Protocol Convergence
Network size
Hop count limitations
Peering arrangements (edge, core)
Speed of change detection
Propagation of change information
Network design: hierarchy, summarization, redundancy

OSPF Hierarchical Structure
Topology of an area is invisible from outside of the area
  LSA flooding is bounded by area
  SPF calculation is performed separately for each area
(Figure: backbone Area #0 connected through ABRs to Area #1, Area #2 and Area #3.)
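
To make the area boundary concrete, here is a minimal sketch of an ABR on Cisco IOS with interfaces in the backbone and in area 1. The process number and the 10.0.0.0/24 and 10.1.0.0/16 prefixes are hypothetical. The `area 1 range` line is an optional addition showing how the ABR can advertise area 1 into the backbone as a single summary, which ties in with the summarization factor listed on the previous slide.

```
! ABR: one foot in the backbone, one in area 1 (prefixes are hypothetical)
router ospf 1
 network 10.0.0.0 0.0.0.255 area 0
 network 10.1.0.0 0.0.255.255 area 1
 area 1 range 10.1.0.0 255.255.0.0
```

Summarization only works cleanly when the addresses inside the area come from one contiguous block.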

Factors Assisting Protocol Convergence
Keep number of routing devices in each topology area small (15-20 or so)
  Reduces convergence time required
Avoid complex meshing between devices in an area
  Two links are usually all that are necessary
Keep prefix count in interior routing protocols small
  Large numbers mean longer time to compute shortest path
Use vendor defaults for routing protocol unless you understand the impact of "twiddling the knobs"
  Knobs are there to improve performance in certain conditions only

Redundant Network Design: Internet Availability

PoP Design
One router cannot do it all
Redundancy, redundancy, redundancy
Most successful ISPs build two of everything
Two smaller devices in place of one larger device:
  Two routers for one function
  Two switches for one function
  Two links for one function

PoP Design
Two of everything does not mean complexity
Avoid complex highly meshed network designs
  Hard to run
  Hard to debug
  Hard to scale
  Usually demonstrate poor performance

PoP Design: Wrong
(Figure: a single big router carries the links to neighboring PoPs, the external BGP peering and dedicated access, with a single big switch behind it connecting a big NAS (PSTN/ISDN) and a big server (web services).)

PoP Design: Correct
(Figure: two core routers, Core 1 and Core 2, with links to neighboring PoPs and external BGP peering; a PoP interconnect medium of two switches, SW 1 and SW 2; and below them Access 1 and Access 2 for dedicated access and NAS 1 and NAS 2 for PSTN/ISDN.)

Hubs vs. Switches (Technology)
Hubs
  These are obsolete
    Switches cost little more
  Traffic on hub is visible on all ports
    It's really a replacement for coax ethernet
    Security!?
  Performance is very low
    10Mbps shared between all devices on LAN
    High traffic from one device impacts all the others
  Usually non-existent management

Hubs vs. Switches (Technology)
Switches
  Each port is masked from the other
  High performance
    10/100/1000Mbps per port
    Traffic load on one port does not impact other ports
  10/100/1000 switches are commonplace and cheap
  Choose non-blocking switches in core
    Packet doesn't have to wait for switch
  Management capability (SNMP via IP, CLI)
  Redundant power supplies are useful to have

Beware Static IP Dial
Problems
  Does NOT scale
  Customer /32 routes in IGP: IGP won't scale
  More customers, slower IGP convergence
  Support becomes expensive
Solutions
  Route static dial customers to the same RAS or RAS group behind a distribution router
  Use contiguous address block
  Make it very expensive: it costs you money to implement and support

Redundant Network Design: Operations!

Network Operations Centre (Process)
NOC is necessary for a small ISP
  It may be just a PC called NOC, on UPS, in the equipment room
  Provides last resort access to the network
  Captures log information from the network
  Has remote access from outside (dialup, SSH, ...)
Train staff to operate it
Scale up the PC and support as the business grows

Operations (Process)
A NOC is essential for all ISPs
Operational procedures are necessary
  Monitor fixed circuits, access devices, servers
  If something fails, someone has to be told
Escalation path is necessary
  Ignoring a problem won't help fix it
  Decide on time-to-fix, escalate up the reporting chain until someone can fix it

Operations (Process)
Modifications to network
  A well designed network only runs as well as those who operate it
  Decide and publish maintenance schedules
  And then STICK TO THEM
  Don't make changes outside the maintenance period, no matter how trivial they may appear

In Summary
Implementing a highly resilient IP network requires a combination of the proper process, design and technology
"and now abideth design, technology and process, these three; but the greatest of these is process"
And don't forget to KISS!
  Keep It Simple & Stupid!

Acknowledgements
The materials and illustrations are based on the Cisco Networkers presentations
Philip Smith of Cisco Systems
Brian Longwe of Inhand.Ke