Managing Availability and Failure Avoidance

Transcription:

Innovate Integrate Transform Interaction Information Networks
Managing Availability and Failure Avoidance
Phil Edholm, President and Principal

Business Continuity is Availability
Most organizations have some level of requirement to understand and document Business Continuity
  System availability
  Disaster recovery
  Operational risk
Availability of a VoIP/UC system is a critical part of Business Continuity planning and documentation
Many organizations EXPECT high availability of their communications solutions
Availability is often defined by "five nines"
9/25/2015 Consulting 2015

Understanding Nines
Five nines is 99.999% availability
Or 0.001% unavailability (a fraction of 0.00001)
Or about 5.26 minutes of outage per year (24/365)

What the Nines Mean

Availability (%)   Number of nines   Average unavailability (minutes/year)   Average unavailability (hours/year)
99.9999            6                 0.5256                                  0.00876
99.999             5                 5.256                                   0.0876
99.99              4                 52.56                                   0.876
99.9               3                 525.6                                   8.76
99.0               2                 5,256                                   87.6
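The table values follow directly from the availability percentage. A minimal Python sketch, assuming a 365-day year of 525,600 minutes; the function names are illustrative and do not come from the slides:

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, the 24/365 basis used on the slides


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Average unavailable minutes per year for an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


def count_nines(availability_pct: float) -> float:
    """Number of nines, e.g. 99.999 -> about 5. Undefined for exactly 100%."""
    return -math.log10(1.0 - availability_pct / 100.0)


for pct in (99.9999, 99.999, 99.99, 99.9, 99.0):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}%: {minutes:.4f} min/yr, {minutes / 60:.5f} h/yr")
```

Computing the hours column from the minutes column, as the loop does, keeps the two columns consistent with each other.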

VoIP/UC is a Major Change from TDM
[Figure: the elements a call depends on in each architecture.
  TDM: TDM trunks, communications system, UPS/generator.
  VoIP/UC: SIP trunks, SBCs and gateways, TDM trunks, communications system core, servers/VMs, data center distribution, data center UPS/generator, IP access (WAN, ISP, MPLS, etc.), core data network, wiring closet distribution, wiring closet UPS/generator, devices, power.]

Achieving Availability
Availability is the sum of the parts
Multiple items can cause a failure
Redundancy avoids the impact of a failure
[Figure: matrix of combined availability, with "Communications Application and Equipment" on one axis and "Data Network" on the other.
  Axis scale: 99.999% (5 minutes), 99.99% (53 minutes), 99.9% (9 hours), 99% (about 88 hours) of outage per year.
  Legible combined values: 98% (200 hours), 99.8% (20 hours), 99.98% (2 hours), 99.9989%, 99.985%.
  Highlighted result: 4-8 hours per year.]

Achieving Availability
Even if the VoIP core system exceeds five nines, the data network will define the availability
[Figure: the same matrix, with "Communications Application and Equipment" on one axis and "Data Network" on the other.
  Axis scale: 99.999% (5 minutes), 99.99% (53 minutes), 99.9% (9 hours), 99% (about 88 hours) of outage per year.
  Legible combined values: 98% (200 hours), 99.8% (20 hours), 99.98% (2 hours), 99.99%, 99.85%.
  Highlighted result: 1-4 hours per year.]
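The point that the data network defines the result can be checked with a short calculation. This is a sketch that assumes the elements fail independently and that every element in the chain must be up for service to work; the input figures are hypothetical, not taken from the slides:

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # 8,760


def series_availability(*availabilities_pct: float) -> float:
    """Availability (%) of a chain in which every element must be up."""
    return 100.0 * prod(a / 100.0 for a in availabilities_pct)


def downtime_hours(availability_pct: float) -> float:
    """Average unavailable hours per year for an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


# Hypothetical inputs: a five-nines VoIP core running on a three-nines data network.
combined = series_availability(99.999, 99.9)
print(f"combined: {combined:.4f}%, about {downtime_hours(combined):.1f} h/yr")
```

The combined figure can never exceed the weakest element in the chain, which is why improving the core beyond five nines changes little while the network sits at three.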

Availability is Built In
[Figure: Availability results from System Design, Reliability & Redundancy, and Operations.]

Impact of Redundancy
Example of the impact of redundancy, shown with 75% availability and 25% unavailability per element for illustration.
[Figure: Element 1, Element 2, Element 3 as redundant elements.]
25% probability that Element 2 fails while Element 1 has failed
25% probability that Element 3 fails while Elements 1 and 2 have failed
The resulting unavailability is a failure probability of 25% x 25% x 25%, or 1.56%.
The resulting availability is 98.44%, just under two nines.
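The redundancy arithmetic can be reproduced in a few lines. A minimal sketch, assuming identical elements that fail independently (the function name is illustrative):

```python
def redundant_unavailability(unavailability: float, copies: int) -> float:
    """Probability that all `copies` independent, identical elements are down at once."""
    return unavailability ** copies


u = redundant_unavailability(0.25, 3)
print(f"unavailability = {u:.4%}, availability = {1 - u:.4%}")
# prints: unavailability = 1.5625%, availability = 98.4375%
```

The independence assumption matters: a shared power feed or a common software fault takes redundant elements down together, and the real result is then worse than this product suggests.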

Example of Process
Mid-sized system with 4,000 users
  3,600 in remote sites
  300 contact center agents
Dual data centers
  VM clusters
  UPS, no generators
Comms system
  Redundant Comm Mgrs in DC1, backup (BU) in DC2
  Redundant Session Manager in DC1 and DC2
  Gateways in both DCs
  Trunking is SIP with TDM backup
Data network has dual paths between sites (MPLS and fiber)
  Remote sites on MPLS with cable backup
  Single Ethernet switches and remote site routers
[Table, values not captured in the transcript. Columns: System Area, Average System Failure, Contact Center. Rows: Power, Core Comms, Data Network, Voice Trunking, Overall Availability.]

Calculating Availability in a Complex System
Step 1 - Define Elements of the System
[Figure: element hierarchy.
  Power
  Data Center: Servers/VMs, Network
  Communications System: Core, Branches, Devices, Trunking
  Data Network: Core, Edges, Building, WAN/MPLS, Internet Access]
Define all of the Elements that are required to deliver services
Define all of the redundancy Elements that are built into the architecture
Define potential inter-element interactions (for example, power fails in DC1 and a server fails in DC2)

Calculating Availability in a Complex System
Step 2 - Define Element MTBF/MTTR
Use MTBF data to define the expected failure rate for each Element class
  Vendor data
  Industry data
  Operational reports (good for power)
Add a factor for operator error to hardware and software MTBFs
  Typically in comms systems, 30% of outages are caused by operator error
Use MTTR data to define repair timing
  Maximum and average
  Operational plans and commitments
  Experience
  Industry data
  Operational reports (good for power)
Combine to generate Availability and Unavailability data for each Element
  Unavailability is "minutes per year unavailable": the average number of minutes per year the Element will be out of service

Calculating Availability in a Complex System
Step 3 - Calculating Element Unavailability and Availability
Add in the operator factor for Elements whose MTBF does not already account for operator error (typically hardware, software, or other Elements such as servers, Ethernet switches, or apps).
  Calculated MTBF = Base MTBF x (1 - Operational factor), where the operational factor is generally 30%
  Unavailable minutes per year = (Average MTTR (hours) x 60) / Calculated MTBF (years)
  Total minutes per year = 365 x 24 x 60 = 525,600
  Unavailability (%) = (Unavailable minutes per year / Total minutes per year) x 100
  Calculated Availability = 100% - Unavailability (%)
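The Step 3 formulas translate into a short calculation. This is a sketch under the slide's stated assumptions (MTBF in years, MTTR in hours, a 30% operator factor); the function names and the example inputs are hypothetical:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def calculated_mtbf_years(base_mtbf_years: float, operator_factor: float = 0.30) -> float:
    """Derate the base MTBF to allow for outages caused by operator error."""
    return base_mtbf_years * (1.0 - operator_factor)


def unavailable_minutes_per_year(mttr_hours: float, mtbf_years: float) -> float:
    """Average outage minutes per year: repair time divided by years between failures."""
    return mttr_hours * 60.0 / mtbf_years


def availability_pct(mttr_hours: float, mtbf_years: float) -> float:
    """Calculated availability: 100% minus unavailability as a percentage."""
    minutes_down = unavailable_minutes_per_year(mttr_hours, mtbf_years)
    return 100.0 - minutes_down / MINUTES_PER_YEAR * 100.0


# Hypothetical Element: vendor MTBF of 10 years, average repair time of 4 hours.
mtbf = calculated_mtbf_years(10.0)
print(f"calculated MTBF: {mtbf:.1f} years")
print(f"unavailable: {unavailable_minutes_per_year(4.0, mtbf):.2f} min/yr")
print(f"availability: {availability_pct(4.0, mtbf):.5f}%")
```

Keeping unavailable minutes per year as its own function is deliberate, since the later steps sum minutes of impact rather than percentages.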

Calculating Availability in a Complex System
Step 4 - Define Failure Sequences
For each Element area, define the potential failure sequences and calculate unavailability and availability for each

Calculating Availability in a Complex System
Step 5 - Define Impacts
For each failure type, define the impact of the failure based on the redundancy
  System Failure - RED
  Reduced Capacity - ORANGE
  No Impact - GREEN

Calculating Availability in a Complex System
Step 6 - Total Impacts for Area
Sum up the minutes of impact per year for each type of failure and generate availability for that system area

Calculating Availability in a Complex System
Step 7 - Sum the Impacts for All Areas
Sum up the minutes of impact per year across all system areas and generate availability for the overall system
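The totalling in Steps 6 and 7 can be sketched as follows. The area names come from the example system; the failure types and minute values are invented purely for illustration. Adding minutes across areas also assumes that outages in different areas do not overlap in time:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

# Hypothetical minutes of system-failure impact per year, by area and failure type.
impact_minutes = {
    "Power": {"utility outage outlasting the UPS": 40.0},
    "Core Comms": {"server failure": 6.0, "software fault": 12.0},
    "Data Network": {"single Ethernet switch failure": 90.0, "WAN outage": 30.0},
    "Voice Trunking": {"SIP trunk outage": 15.0},
}


def availability_pct(minutes_down: float) -> float:
    """Availability (%) implied by a number of outage minutes per year."""
    return 100.0 * (1.0 - minutes_down / MINUTES_PER_YEAR)


# Step 6: total the impact minutes within each system area.
area_minutes = {area: sum(failures.values()) for area, failures in impact_minutes.items()}

# Step 7: total across all areas for the overall figure.
overall_minutes = sum(area_minutes.values())

for area, minutes in area_minutes.items():
    share = minutes / overall_minutes
    print(f"{area}: {minutes:.0f} min/yr, {availability_pct(minutes):.4f}%, {share:.0%} of impact")
print(f"Overall: {overall_minutes:.0f} min/yr, {availability_pct(overall_minutes):.4f}%")
```

Printing each area's share of the total makes it easy to see which area to analyze first.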

Calculating Availability in a Complex System
Step 8 - Show Impact and Percentages

Calculating Availability in a Complex System
Step 9 - Analyze and Recommend
Evaluate key failure areas
Recommend changes
  Architectural
  Structural
  Operational
Mitigation steps for the network
  Cellular redundancy
  Wireless PC redundancy
  Wireless multiplicity
  Multiple data channels
Create a Survivability Tool

Spreadsheet Demo and Review

Survivability Tool
Enables support organizations to rapidly understand what to do in a failure situation
Shows what steps to take
  Red tag critical operational elements
  Steps to take for survivability
Easy to understand
Avoids cascading failures

Cascading Choices
Top level location: Data Center(s), Core Network, Building
System: Communications Core, Video Core, Network
Failed Element: Switch, Server, App, Gateway, ...

Structured Choices
Location: DC 1, DC 2, Net Core, Building
System: Power, Network, Video, VoIP
Element: Server, Comm App, SM App, Gateway, ...
Normal Primary
  Normal operational Element
  Current redundancy
  Element failure protection
  Protection actions (to keep running)
  Restoration actions (to get operational)
  User impact
First Redundancy
  Current redundancy
  Element failure protection
  Protection actions (to keep running)
  Restoration actions (to get back to the primary)
  User impact
Second Redundancy
  Current redundancy
  Element failure protection
  Protection actions (to keep running)
  Restoration actions (to get back to the primary)
  User impact
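One way to picture the structure behind these choices is a lookup keyed by location, element, and redundancy state. This is only a sketch of such a data structure, not the presenter's tool; the class, the field names, and the sample entry are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

STATES = ("Normal Primary", "First Redundancy", "Second Redundancy")


@dataclass
class StateEntry:
    """What support staff need to know for one element in one redundancy state."""
    current_redundancy: str
    failure_protection: str
    protection_actions: List[str] = field(default_factory=list)   # to keep running
    restoration_actions: List[str] = field(default_factory=list)  # to get back to normal
    user_impact: str = ""


# Keyed by (location, element, state); the content below is invented for illustration.
tool: Dict[Tuple[str, str, str], StateEntry] = {
    ("DC 1", "Gateway", "First Redundancy"): StateEntry(
        current_redundancy="Gateway in DC 2 is carrying the traffic",
        failure_protection="None remaining; a second failure stops trunk calls",
        protection_actions=["Red tag the DC 2 gateway", "Defer planned maintenance in DC 2"],
        restoration_actions=["Repair or replace the DC 1 gateway", "Fail back to DC 1"],
        user_impact="No impact while the DC 2 gateway stays up",
    ),
}


def lookup(location: str, element: str, state: str) -> StateEntry:
    """Return the guidance for the selected location, element, and state."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    return tool[(location, element, state)]


entry = lookup("DC 1", "Gateway", "First Redundancy")
print(entry.protection_actions)
```

Keying on the redundancy state means the guidance changes as failures accumulate, which is what helps staff avoid turning a protected failure into a cascading one.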

Tool Demo

Summary
There is a major disconnect between expectations and reality in communications system availability
Achieving five nines of availability is hard
Analyzing your customers' networks for availability will lead to design and operational choices
An availability audit will reduce the chance of blame if issues occur

Tools and Partnering
Send me an email and I will send either or both of the spreadsheets
Contact me to partner on a Business Continuity analysis for any of your clients

Thank You and Questions