Introduction to HA Technologies: SSO/NSF with GR and/or NSR. Ken Weissner / kweissne@cisco.com Systems and Technology Architecture, Cisco Systems





That's a lot of acronyms: some definitions

HA: High Availability
High-level terminology.

SSO: Stateful Switchover
An operating mode in which a dual-processor router has transferred state information to a standby processor, so that the standby can pick up the necessary router functions if the active processor fails. This mostly concerns L2 information (PPP state, FIB, etc.), with some L3 applicability. In this operating mode both processors must run identical software versions.

NSF: Non Stop Forwarding
A router's ability to start forwarding packets almost immediately after an active processor failure. The FIB (Forwarding Information Base) is transferred to the standby initially and then actively updated, so that when a failure occurs the router can forward packets while the control plane is rebuilt or refreshed.

That's a lot of acronyms (cont.)

GR: Graceful Restart
IETF-specified mechanisms for interaction between routing protocol peers that allow the peer of a failing device to continue forwarding packets to that device even though the neighbor relationship has been destroyed.

NSR: Non Stop Routing
A routing protocol operating mode in which all information needed to fully maintain the neighbor relationship, and all of its relevant routing information, is transferred ("checkpointed") to the standby processor. No additional communication or interaction with the routing protocol peer is needed in this mode.

Some implementations allow both GR and NSR to be used for the same protocol, but a single routing protocol session must use either GR or NSR.

All of the above technologies combine to allow an unplanned processor switchover to occur with very little interruption in availability.

ISSU: In Service Software Upgrade
A process that allows the complete upgrade of a dual-processor router. It uses SSO/NSF with GR/NSR, with the added functionality of a stateful switchover between different images.

Why SSO/NSF?

- Protection for devices connected to a single PE device
- Software or hardware failures induce a processor switchover to maintain router availability
- Framework and infrastructure for supporting In Service Software Upgrades (ISSU)
- Leverages the distributed nature of the router

[Figure: redundant route processors (ACTIVE and STANDBY) with four line cards]

Networks Without NSF/SSO and Graceful Restart

[Figure: four panels showing a P router and a PE router]
1) Before PE failover: P-PE adjacency established; traffic flows.
2) During PE failover: PE router restarts; adjacency fails; traffic stops.
3) After PE failover: adjacency reestablished; traffic still stopped.
4) After PE failover: traffic flow resumes.

Networks with NSF/SSO and Graceful Restart

[Figure: four panels showing a GR-aware P router and a PE router]
1) Before PE failover: P-PE adjacency established; traffic flows.
2) During PE failover: PE router restarts; traffic flow continues.
3) After PE failover: routing updates exchanged; traffic flow continues.
4) After PE failover: traffic flow uninterrupted. Nonstop forwarding!

Graceful Restart IETF status

GR is the only feature that interacts with peer network devices. All other features (SSO, NSF, NSR) are internal to the router and therefore don't require standards.

- RFC 3478: Graceful Restart Mechanism for Label Distribution Protocol
- RFC 4724: Graceful Restart Mechanism for BGP
- RFC 3847: Restart Signaling for Intermediate System to Intermediate System (IS-IS)
- RFC 3623: Graceful OSPF Restart
- draft-nguyen-ospf-restart-06

Graceful Restart Protocol Support

- RFC-based support for BGP, OSPF, IS-IS and LDP.
- Additional support for EIGRP and for OSPF GR based on draft-nguyen-ospf-restart-06.
- Two versions of OSPF GR are supported. draft-nguyen-ospf-restart-06 was implemented before RFC 3623 existed and is widely deployed in networks today.
- A router participating in GR is said to be aware or capable, with awareness being a subset of capability.

Graceful Restart Awareness and Capability

- A Graceful Restart capable device announces its ability to perform graceful restart to its routing protocol peer. It also initiates the Graceful Restart process when a route processor transition occurs, and it can act as a Graceful Restart aware device.
- A Graceful Restart aware device has the components needed to understand that a peer router is transitioning, and takes the appropriate action when it detects that the peer is performing Graceful Restart (starting timers, placing routes in holddown, etc.).
- Awareness is also referred to as running in helper mode.

Graceful Restart Awareness and Capability Configuration

- Graceful Restart capability must always be explicitly enabled, for every protocol. This is only necessary on routers with dual processors that will be performing switchovers.
- Graceful Restart awareness is on by default for the non-TCP-based interior routing protocols (OSPF, IS-IS and EIGRP). These protocols start operating in GR mode as soon as one side is configured as capable.
- TCP-based protocols must have GR enabled on both sides of the session, and the session must be reset for GR to take effect. The information enabling GR is sent in the Open message for these protocols.
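The enablement rules on this slide can be sketched as a small predicate. This is only an illustrative model, not a Cisco API: the function name, its parameters, and the grouping of BGP and LDP as the TCP-based protocols are assumptions made for the example, and it assumes the peer's default helper mode has not been switched off.

```python
# Hypothetical model of the enablement rules described above.
NON_TCP_IGP = {"OSPF", "ISIS", "EIGRP"}   # awareness on by default (per the slide)
TCP_BASED = {"BGP", "LDP"}                # assumption: the TCP-based protocols meant here


def gr_in_effect(protocol, local_capable, peer_gr_enabled=False, session_reset=False):
    """Return True if a local processor switchover would be covered by GR."""
    if not local_capable:
        # Capability is never implicit; it has to be configured on the dual-processor router.
        return False
    if protocol in NON_TCP_IGP:
        # Assumes the peer still runs its default helper (aware) mode.
        return True
    if protocol in TCP_BASED:
        # GR is negotiated in the Open message, so both sides must be configured
        # and the session must have been reset since then.
        return peer_gr_enabled and session_reset
    raise ValueError(f"unknown protocol: {protocol}")


print(gr_in_effect("OSPF", local_capable=True))
print(gr_in_effect("BGP", local_capable=True, peer_gr_enabled=True, session_reset=False))
```

The second call illustrates the common pitfall: GR configured on both BGP speakers, but the session never reset, so the capability was never exchanged.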

Graceful Restart Concerns

Voiced at the NANOG 40 peering BOF:

"With regards to BGP graceful restart, the problem we've seen with implementing it is that Cisco's implementation of graceful restart assumes you have NSF (non-stop forwarding), and then tells your peers, 'if I ever drop this BGP session, it's because I'm failing over from the primary to redundant supervisor, and will keep passing packets, so keep sending them my way.' That's all well and good if that's what actually happens. However, if the router really does go down, then the neighboring router continues sending traffic its way (blackholing it) for many minutes, rather than simply failing over to a working path."

Graceful Restart Concerns Addressed

- Distinguishing between the peer switching over and the peer going away (power off, reload) is key to deployment.
- NSF is not configurable; it is enabled by default when SSO is configured on the router. NSF is a function of checkpointing the FIB to the standby.
- Cisco's early use of the term NSF (predating the generally accepted term Graceful Restart) can cause confusion. GR and NSF are two very different functions.
- Today in IOS, OSPF or EIGRP GR is enabled with the nsf command under the routing process, while other protocols (BGP and LDP) use more appropriate variants of the term graceful restart.

Graceful Restart Concerns Addressed (cont.)

- For BGP GR, a restart timer defines the window within which GR is allowed to start. If the BGP peer does not come back up within this time, GR is aborted and all stale routes are flushed.
- The default is 120 seconds. Depending on network conditions and testing, this value can be lowered to reduce black-holing during non-switchover events.
- Other conditions abort GR as well. If the link is point-to-point POS and the peer router reloads, the interface goes down and GR aborts.

Graceful Restart BGP Operation Summary

[Figure: message sequence between a BGP GR-capable router and a BGP GR-aware peer]

Messages exchanged:
- OPEN w/ Graceful Restart Capability 64
- OPEN w/ Restart Bit Set
- OPEN w/ Capability 64
- Send BGP Hello
- Send Initial Updates, End of RIB (EoR)
- Send Updates + EoR

BGP GR-capable router:
- Session established
- Router restarts
- Sends restart notification
- Performs best path selection when EoR is received

BGP GR-aware peer:
- Acknowledges restart, marks routes stale, starts restart timer
- Stops restart timer, starts stale-path timer
- Stops stale-path timer, deletes stale prefixes, and refreshes with new ones

CONVERGED!
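The timer handling on the GR-aware peer can be sketched as follows. This is a simplified model under stated assumptions, not the IOS implementation: the function and outcome names are hypothetical, time is measured in seconds from the moment the session is lost, and the stale-path timer value is left as a required argument because the slides give a default only for the restart timer (120 seconds).

```python
from typing import Optional, Tuple


def helper_side_gr(reconnect_after: Optional[float],
                   eor_after_reconnect: Optional[float],
                   stalepath_time: float,
                   restart_time: float = 120.0) -> Tuple[str, float]:
    """Model the GR-aware peer after it marks routes stale and starts the restart timer.

    Returns (outcome, seconds after session loss at which stale routes are gone).
    """
    # Restart timer: the restarting router must re-establish the session in time.
    if reconnect_after is None or reconnect_after > restart_time:
        return ("aborted", restart_time)            # GR aborted, stale routes flushed

    # Session is back: restart timer stopped, stale-path timer started.
    if eor_after_reconnect is not None and eor_after_reconnect <= stalepath_time:
        # EoR received: remaining stale prefixes deleted, routes refreshed.
        return ("converged", reconnect_after + eor_after_reconnect)

    # EoR never arrived in time: the stale-path timer expiry removes stale routes.
    return ("stalepath-expired", reconnect_after + stalepath_time)


# A peer that really went away: traffic is black-holed until the restart timer expires.
print(helper_side_gr(None, None, stalepath_time=360.0))
print(helper_side_gr(None, None, stalepath_time=360.0, restart_time=30.0))
```

The two calls show why lowering the restart timer shortens the black-holing window when the peer has actually failed rather than switched over.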

RFC 3623 vs. draft-nguyen-ospf-restart-06 GR

RFC 3623:
- Uses a Grace LSA to signal capability
- Uses LSDB resync as defined in RFC 2328
- Terminated if a routing topology change occurs during GR (configurable)
- GR terminated if there are one or more GR-unaware peers on a broadcast domain

draft-nguyen-ospf-restart-06:
- Uses a Restart-Signaling bit in the LLS fields of hello packets
- Uses an out-of-band LSDB resynchronization
- GR process uninterrupted until complete
- GR continues even if there are unaware peers (configurable)

Two very similar ways of accomplishing the same thing.

Non-Stop Routing

- Cisco currently supports NSR for IS-IS and BGP in IOS.
- In-box solution that requires no additional communication with the routing protocol peer.
- For interior protocols, NSR acts on the entire routing process. For BGP, NSR is enabled on a per-peer basis.
- Increases the workload on the router, because routing information is checkpointed to the standby processor in addition to forwarding information.
- A hybrid solution helps with the scalability of BGP NSR deployments.

Hybrid BGP NSR

- The most compelling reason to run NSR on BGP sessions is to avoid having to worry about whether the peer has GR enabled. Perhaps the peer runs older code, or has the right code but does not have GR enabled.
- Relatively speaking, such a device will have far fewer routes than a route reflector.
- The hybrid solution recommends running GR to the route reflectors and NSR to non-GR-capable devices, to reduce the checkpointing load on the router.
- The operator is more likely to have the right software on the route reflector (or the ability to change it) to support BGP GR.
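The hybrid recommendation can be sketched as a simple planning pass over the peer list. The peer records, field names, and prefix counts below are hypothetical, and the sum of prefixes on NSR sessions is used only as a rough stand-in for checkpointing load.

```python
def plan_hybrid_bgp(peers):
    """Assign GR to peers that support it and NSR to the rest.

    Returns (plan, nsr_prefixes): a per-peer mechanism map and the number of
    prefixes that would have to be checkpointed for the NSR sessions.
    """
    plan = {}
    nsr_prefixes = 0
    for peer in peers:
        if peer["supports_gr"]:
            plan[peer["name"]] = "GR"       # typically the route reflectors
        else:
            plan[peer["name"]] = "NSR"      # peers with old code or GR disabled
            nsr_prefixes += peer["prefixes"]
    return plan, nsr_prefixes


# Hypothetical peers: one route reflector and two small edge devices.
peers = [
    {"name": "rr1", "supports_gr": True, "prefixes": 500_000},
    {"name": "ce1", "supports_gr": False, "prefixes": 200},
    {"name": "ce2", "supports_gr": False, "prefixes": 50},
]
plan, nsr_prefixes = plan_hybrid_bgp(peers)
print(plan, nsr_prefixes)
```

In this made-up example only 250 prefixes fall under NSR, whereas running NSR to every peer would also checkpoint the route reflector's full table.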

Per-Peer GR/NSR Config for BGP

- Currently, BGP GR is enabled globally for all peers in the routing process.
- Where BGP NSR is available, it is configured on a per-peer basis.
- If GR is enabled globally, an individual session is configured for NSR, and that session receives GR signaling from its peer, GR is used rather than NSR.
- Per-peer GR config is on the horizon.
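The precedence rule above can be written out as a small decision function. This is a sketch of the behavior as described on the slide; the function and argument names are hypothetical, and treating GR signaling as ignored when GR is not enabled globally is an assumption of the example.

```python
def effective_mechanism(gr_global, nsr_on_session, peer_signals_gr):
    """Return which mechanism protects one BGP session across a switchover."""
    if gr_global and peer_signals_gr:
        # GR wins over NSR when the peer negotiates it, even if NSR is configured.
        return "GR"
    if nsr_on_session:
        # No GR negotiated with this peer: fall back to the per-peer NSR config.
        return "NSR"
    return "none"


for case in [(True, True, True), (True, True, False), (True, False, False)]:
    print(case, effective_mechanism(*case))
```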

Deployment Considerations

- From a high level, you need to protect the interfaces (SSO), the forwarding plane (NSF) and the control plane (GR or NSR).
- There have been many requests in the past to turn these features on one at a time, yet for a switchover to work, all of them need to be enabled and operating properly.
- Enabling SSO also enables NSF (FIB checkpointing).
- Each routing protocol session needs to be examined to ensure both that capability has been enabled locally and that awareness is enabled on the peer.

Routing Protocol Timer Manipulation

- Lower-than-default routing protocol timers are common, for faster detection of failures.
- When a switchover occurs, packet forwarding picks up almost immediately, but it takes a bit more time for the new processor to start sending routing protocol control packets.
- This delay is independent of the use of GR or NSR.
- Typically not a problem for BGP: even with 10/30 timers we can get the first packet out well within 10 seconds.
- Using timers such as 1/5 for OSPF can be more problematic.
- Testing under your real-world conditions is essential, as the time to first packet can depend on config and platform.
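The timer concern comes down to simple arithmetic: the peer must hear a control packet before its hold or dead timer expires. A minimal sketch, assuming the worst case in which the last packet left a full interval before the switchover; the function name and the time-to-first-packet figures are hypothetical and would need to be measured on the real platform.

```python
def survives_switchover(interval, hold_time, time_to_first_packet):
    """Return True if the peer's hold/dead timer does not expire during a switchover.

    Worst case assumed: the last keepalive/hello was sent one full interval
    before the active processor failed.
    """
    worst_case_silence = interval + time_to_first_packet
    return worst_case_silence < hold_time


# BGP with 10/30 timers and a hypothetical 8 s until the first packet: 18 s < 30 s.
print(survives_switchover(10, 30, 8))
# OSPF with 1/5 timers and a hypothetical 5 s until the first packet: 6 s >= 5 s.
print(survives_switchover(1, 5, 5))
```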

Q and A