WHITEPAPER: LEVERAGING AUTOMATION FOR ADVANCED NETWORK TROUBLESHOOTING


Table of Contents
1. Executive Summary
2. Why is Network Troubleshooting So Hard?
   Causes of Network Outages
   The Cost of Network Outages
   Finding a Needle in a Haystack: Troubleshooting with Limited Visibility
3. Divide & Conquer with Network Automation
   A Network Map to Define the Scope of the Problem
   Analyzing Network Performance
   Analyzing Recent Changes
   Diagnosing Network Segments in Parallel
4. Case Study: Dimension Data Accelerates Troubleshooting on Customer Networks

1. Executive Summary

When the network goes down, every minute counts. Data from a 2013 CDW survey suggests that network outages cost enterprises over $1.7B in lost revenue over the previous year. Much of this loss could have been avoided if network teams had been able to find the source of problems more quickly.

Many enterprises have already deployed network monitoring systems to help them react to incidents faster, but that is not enough. It is equally important to reduce mean time to repair (MTTR) by accelerating troubleshooting.

In this paper, we'll examine why network troubleshooting is so challenging and look at opportunities to improve incident response times with a divide-and-conquer strategy. We'll show how automation can be applied to a traditional troubleshooting methodology: isolating the problem, gathering information, and analyzing critical data.

2. Why is Network Troubleshooting So Hard?

Effective troubleshooting requires both experience and intimate knowledge of the network's design. Even when a network engineer has both, diagnosing network symptoms still involves a great deal of manual data collection and analysis.

Top Causes of Network Outages*
- 23% from router/switch failure (including DoS attacks)
- 32% from link failure (fiber cuts, network congestion)
- 36% from network change (upgrade, config change)
*Data from a 2013 Cisco study

Causes of Network Outages

There's a lot of hype and media coverage around network hacking and DDoS attacks, but far more network outages are actually caused by mistakes made by an organization's own people. A recent Gartner study estimated that, through 2015, people and process issues will cause 80% of outages impacting mission-critical services. Of those outages, more than 50% will result from a network upgrade or configuration change.
The Cost of Network Outages

Early in 2014, both Xbox LIVE and Facebook suffered well-publicized network outages, each caused by configuration errors during scheduled maintenance. For Xbox LIVE, the untimely outage crippled the launch of one of its biggest online games. For Facebook, 30 minutes of downtime cost an estimated $500,000 in lost ad revenue. Of course, the cost to a business's reputation may be far higher if customers are impacted.

Finding a Needle in a Haystack: Troubleshooting with Limited Visibility

Network visibility is increasingly sought after in the network industry, because better visualization of the network leads to better decision-making and faster problem resolution. Despite dozens of tools that claim to improve visibility, the troubleshooter's most common window into the network is still the command-line interface (CLI). Unfortunately, the CLI gives troubleshooters a narrow field of vision: they can gather information only as fast as they can issue and interpret commands, one device at a time.

When diagnosing a network problem, engineers are estimated to spend 80% of their time manually gathering data and only 20% analyzing it. This time spent mining data represents an opportunity for improvement. The figure below shows how important gathering and analyzing information is during a typical troubleshooting scenario.

Figure 1: Visibility Challenges during Troubleshooting Diagnosis

Because the CLI provides limited visibility, engineers also need access to accurate, troubleshooting-ready network diagrams. These diagrams target the problem area and omit parts of the network that aren't related to the problem. They should include design parameters such as routing protocols, access lists, and VLANs. Today, very few tools can provide these types of maps; instead, engineers typically rely on static diagrams, often created with MS Visio.

Although the CLI and network diagrams (if available) help troubleshooters gather information about topology and configuration, both are poor tools for understanding what's happening on the network. During an incident, engineers need to understand both live performance and recent changes. Even with a performance monitoring solution deployed, engineers often struggle with information overload.

The last factor we'll address in this paper is the dependence network teams have on tribal knowledge. This refers to the all-too-common scenario in which a network hero has to step in and solve a difficult problem, because only a small percentage of team members have the troubleshooting experience or intimate network knowledge required to solve complex problems. The figure below summarizes the challenges associated with visibility and how they affect an engineer's ability to answer their most critical questions.

Figure 2: Sources and Limitations of Network Visibility in an Enterprise Environment

3. Divide & Conquer with Network Automation

There's no shortage of network monitoring tools to help engineers detect network outages, but the steps to diagnose a detected alarm are almost always manual. Effective troubleshooting requires a tool that can both increase network visibility and help divide and conquer time-consuming analyses.

A Network Map to Define the Scope of the Problem

Without visual aids, the ability to understand complex networks begins to break down. Network diagrams serve as the go-to visual aid for network engineers, but troubleshooting is dramatically hindered if the diagrams aren't up to date and reliable. More than a repository of updated site diagrams, a troubleshooter needs a customized diagram that omits irrelevant parts of the network that only serve to distract. For example, if a slow application's traffic traverses three data centers, an engineer needs a single diagram of the application flow, not three diagrams, one for each data center. In other words, a tailored diagram is the best asset.

A Fresh Approach: Dynamic Network Mapping

NetBrain's unique network diagrams are dynamic, which means they are updated automatically when the network changes. NetBrain diagrams can also be created on demand, so engineers don't need to sort through dozens of diagrams during an incident. Instead, they can instantly create a custom map focused on the event.

Network engineers are frequently asked to troubleshoot poorly performing applications with little more to go on than a report of slowness. To tackle this challenge, the engineer can dynamically create a custom layer-3 or layer-2 map of the application flow by entering two IP addresses: the source IP address and the IP address of the application server. NetBrain performs a comprehensive analysis of the routing, access lists, and NAT for every hop in the path. The resulting map shows which devices are in the path of the application flow.

Figure 3: A Tailored Diagram of an Application Flow (Created On-Demand with NetBrain)
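NetBrain's own path analysis is not described in detail here. As a rough illustration of just the routing part of a hop-by-hop path analysis, the Python sketch below starts at the router directly connected to the source and follows longest-prefix-match lookups until it reaches the router directly connected to the destination. The device names, the `ROUTES` table, and the helper functions are all hypothetical, and access lists and NAT, which the text says NetBrain also analyzes, are deliberately left out.

```python
import ipaddress

# Hypothetical routing tables: device -> [(prefix, next-hop device)].
# A next hop of None means the prefix is directly connected to that device.
ROUTES = {
    "branch-rtr": [("0.0.0.0/0", "wan-rtr"), ("10.1.0.0/16", None)],
    "wan-rtr": [("10.1.0.0/16", "branch-rtr"), ("10.2.0.0/16", "dc-core")],
    "dc-core": [("10.2.10.0/24", None), ("10.1.0.0/16", "wan-rtr")],
}


def gateway_for(src):
    """Find the device with a directly connected subnet containing src."""
    addr = ipaddress.ip_address(src)
    for device, routes in ROUTES.items():
        for prefix, next_hop in routes:
            if next_hop is None and addr in ipaddress.ip_network(prefix):
                return device
    raise LookupError(f"no device is directly connected to {src}")


def lookup(device, dst):
    """Return the next hop toward dst on device using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    best_net, best_hop = None, None
    for prefix, next_hop in ROUTES[device]:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    if best_net is None:
        raise LookupError(f"{device} has no route to {dst}")
    return best_hop


def trace_path(src, dst, max_hops=16):
    """List the devices a packet from src to dst would cross, hop by hop."""
    device = gateway_for(src)
    path = [device]
    while True:
        next_hop = lookup(device, dst)
        if next_hop is None:  # dst sits on a subnet attached to this device
            return path
        if next_hop in path or len(path) >= max_hops:
            raise RuntimeError(f"routing loop or path too long: {path + [next_hop]}")
        path.append(next_hop)
        device = next_hop
```

With these made-up tables, `trace_path("10.1.5.20", "10.2.10.25")` returns `["branch-rtr", "wan-rtr", "dc-core"]`. A real tool would populate the tables from live routing data (for example, `show ip route` output) and would also evaluate ACLs and NAT translations at each hop.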

Analyzing Network Performance

It's difficult to troubleshoot performance problems without being able to see what's happening on the network. Many network teams have 24x7 network monitoring systems that generate alarms when an incident occurs. Examples of such monitoring tools include HP OpenView, IBM Tivoli, CA Spectrum, and SolarWinds NPM.

Figure 4: Example Network Monitoring and Alerting Tools

Network monitoring tools solve only half of the puzzle; after an alarm is generated, network teams still revert to manual troubleshooting methods. An effective troubleshooting tool should integrate with network monitoring and ticketing systems to improve visibility into the problem area.

Diagnostic Monitoring on a Live Map

NetBrain's monitoring function can be turned on from any map, or even launched from a third-party monitoring tool, to visualize the performance characteristics of each device and interface. When troubleshooting a slow application, engineers can quickly spot bandwidth bottlenecks on interfaces (highlighted in red) or CPU and memory over-utilization on each device. For intermittent application issues, monitoring can be left running overnight; it collects and plots each data point to highlight trends.
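Flagging a bandwidth bottleneck comes down to a simple calculation. The sketch below is an assumption about how such highlighting could work, not NetBrain's implementation: it computes interface utilization from two samples of an octet counter such as IF-MIB's `ifInOctets`, using utilization = (delta octets × 8) / (interval × ifSpeed) × 100, and reports interfaces above a threshold. The interface names, sample values, and the 80% threshold are hypothetical.

```python
COUNTER32_MODULUS = 2**32  # ifInOctets / ifOutOctets are 32-bit counters that wrap


def counter_delta(old, new, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, tolerating a single wrap."""
    return (new - old) % modulus


def utilization_pct(old_octets, new_octets, interval_s, if_speed_bps):
    """Percent utilization of one interface direction over the polling interval."""
    bits = counter_delta(old_octets, new_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


def bottlenecks(samples, threshold_pct=80.0):
    """Return {interface: utilization %} for interfaces above the threshold.

    samples maps an interface name to
    (old_octets, new_octets, interval_s, if_speed_bps).
    """
    hot = {}
    for name, (old, new, interval_s, speed) in samples.items():
        pct = utilization_pct(old, new, interval_s, speed)
        if pct > threshold_pct:
            hot[name] = round(pct, 1)
    return hot


# Hypothetical 60-second samples from two interfaces.
SAMPLES = {
    "wan-rtr Serial0/0": (1_000_000, 68_500_000, 60, 10_000_000),  # 10 Mb/s link
    "dc-core Gi1/1": (4_294_000_000, 75_000_000, 60, 1_000_000_000),  # counter wrapped
}
```

Here `bottlenecks(SAMPLES)` reports only the serial link, at 90% utilization; the gigabit interface is at roughly 1% even though its counter wrapped between polls. The modulo tolerates only a single wrap, so on fast links the 64-bit `ifHCInOctets` counters (with `modulus=2**64`) are the safer choice.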

Figure 5: Monitoring Application Performance Factors (Issues Highlighted in Red)

Analyzing Recent Changes

With over one third of network outages resulting from a network change, visibility into what has changed is critical. That means understanding not just what changed in the configuration, but also the impact of those changes on routing, topology, application traffic, and more.

Automated Change Analysis

NetBrain can be configured to benchmark the network regularly so that network teams are better equipped to understand recent changes. During every benchmark, NetBrain collects live data and looks for changes in configuration, routing, and inventory, as well as in MAC, ARP, CDP, and STP tables. NetBrain also includes comparative analysis capabilities that automatically highlight the changes side by side.

Figure 6: NetBrain's System Benchmark Properties
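At its core, comparing two benchmarks means diffing snapshots. The sketch below is a minimal illustration, not NetBrain's mechanism: it uses Python's standard `difflib` to list configuration lines added or removed between two saved configurations, and compares two routing-table snapshots stored as hypothetical `{prefix: set of next hops}` dictionaries. All configuration text, prefixes, and addresses are made up.

```python
import difflib


def config_changes(before, after):
    """Return (added, removed) configuration lines between two benchmarks."""
    diff = list(
        difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    )
    added, removed = [], []
    for line in diff[2:]:  # the first two lines are the ---/+++ file headers
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return added, removed


def route_changes(old, new):
    """Compare {prefix: set of next hops} snapshots taken at two benchmarks."""
    return {
        "added": {p: new[p] for p in new.keys() - old.keys()},
        "removed": {p: old[p] for p in old.keys() - new.keys()},
        "changed": {
            p: (old[p], new[p]) for p in old.keys() & new.keys() if old[p] != new[p]
        },
    }


# Hypothetical snapshots from last week's and today's benchmarks.
CONFIG_LAST_WEEK = """interface GigabitEthernet0/1
 description uplink to core
 speed 1000
 duplex full
!"""
CONFIG_TODAY = """interface GigabitEthernet0/1
 description uplink to core
 speed 100
 duplex half
!"""
ROUTES_LAST_WEEK = {"10.2.0.0/16": {"10.0.0.2"}, "10.3.0.0/16": {"10.0.0.6"}}
ROUTES_TODAY = {"10.2.0.0/16": {"10.0.0.2", "10.0.0.10"}, "10.4.0.0/16": {"10.0.0.6"}}
```

Storing next hops as a set lets the comparison surface a newly added equal-cost path (10.2.0.0/16 in the example) as a change, which is the kind of clue an engineer looks for when "rewinding the clock" on how traffic was routed.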

For example, when troubleshooting application slowness, an engineer can rewind the clock and see how application traffic was routed before the problem arose. Any changes could provide valuable clues to the problem.

Figure 7: Analyzing Application Traffic from Last Week

Diagnosing Network Segments in Parallel

When engineers rely on the command-line interface as their primary troubleshooting tool, they're forced to diagnose the network serially, one device at a time. That's because CLI output is often difficult to scan, and important data points are hard to find. Finding the missing pieces of information may take dozens of commands.

Figure 8: Serialized Troubleshooting with the CLI (ping and traceroute to determine the path; quick performance tests; multiple show commands in multiple CLI windows; stare and compare to find deviations and anomalies; repeat until the problem is found)

Effective troubleshooting should instead occur in parallel, meaning that commands are issued on many devices simultaneously and only the relevant data is parsed from the output. A network map is the best troubleshooting user interface because it provides a canvas on which to display the relevant data.

Figure 9: Diagnosing Interface Errors in Parallel (collisions and CRC errors labeled in red)

The image above shows what it may look like to diagnose the interfaces of multiple devices in parallel on a live network map. Troubleshooting automation can issue the appropriate commands on your behalf and extract the relevant data.

Adaptive Network Automation: A Powerful Alternative to Scripting

Writing Perl and Python scripts to automate data collection is powerful, but the vast majority of network engineers aren't programmers, and they struggle to realize the benefits. NetBrain removes the programming requirement from network automation with its quick programming environment: engineers can literally point and click to program their own NetBrain Qapps. As an example, the Check Interface Errors Qapp, which a NetBrain engineer wrote in less than 10 minutes, can be run to detect incrementing interface errors and speed/duplex mismatches.
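For comparison, here is a small version of the kind of Python script mentioned above, showing parallel collection with the standard `concurrent.futures` thread pool. `run_command` and `FAKE_OUTPUT` are hypothetical stand-ins that return canned text after a short simulated delay; in practice the function would wrap a real SSH session (for example, through a library such as Netmiko). Only the CRC counter is parsed out of each output, mirroring the idea of keeping just the relevant data.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned CLI output standing in for live devices.
FAKE_OUTPUT = {
    "branch-rtr": "     12 input errors, 12 CRC, 0 frame, 0 overrun, 0 ignored",
    "wan-rtr": "     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored",
    "dc-core": "     3 input errors, 3 CRC, 0 frame, 0 overrun, 0 ignored",
}


def run_command(device, command):
    """Hypothetical stand-in for one CLI command over SSH (ignores command)."""
    time.sleep(0.1)  # simulate session and command latency
    return FAKE_OUTPUT[device]


def crc_errors(output):
    """Keep only the relevant data point: the CRC error counter."""
    match = re.search(r"(\d+) CRC", output)
    return int(match.group(1)) if match else None


def collect_parallel(devices, command, max_workers=10):
    """Issue the same command on many devices at once and parse each result."""
    devices = list(devices)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda device: run_command(device, command), devices))
    return {device: crc_errors(out) for device, out in zip(devices, outputs)}
```

Calling `collect_parallel(FAKE_OUTPUT, "show interfaces")` yields `{"branch-rtr": 12, "wan-rtr": 0, "dc-core": 3}`. Because the threads wait on their simulated sessions at the same time, the collection takes roughly as long as the slowest device rather than the sum of all of them, which is the advantage over the serialized CLI workflow.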

Figure 10: NetBrain's Quick Programming Environment

Each new Qapp becomes a new feature, and it uses a dynamic map to display its output. For troubleshooters, every Qapp is an executable diagnosis that automatically extracts and analyzes CLI data that would otherwise be collected manually. This helps network teams troubleshoot virtually any network issue in parallel, rather than one device at a time. It also helps network teams digitize and share their troubleshooting checklists.
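To show what an executable diagnosis might look like as code, here is a hypothetical checklist item in the spirit of the Check Interface Errors Qapp. It is plain Python, not NetBrain's Qapp format: it parses Cisco IOS-style `show interfaces` text, flags error counters that grew between two samples, and compares duplex and speed with the far end of the link. The sample output is invented and real formatting varies by platform and software version; it imitates the common duplex-mismatch symptom of CRC errors on the full-duplex side and collisions on the half-duplex side.

```python
import re

# Counters worth watching on an Ethernet interface, with the regex for each one.
COUNTER_PATTERNS = {
    "input_errors": r"(\d+) input errors",
    "crc": r"(\d+) CRC",
    "collisions": r"(\d+) collisions",
}


def parse_interface(output):
    """Extract duplex, speed and error counters from 'show interfaces'-style text."""
    duplex = re.search(r"(Full|Half)-duplex", output)
    speed = re.search(r"(\d+)Mb/s", output)
    result = {
        "duplex": duplex.group(1).lower() if duplex else None,
        "speed_mbps": int(speed.group(1)) if speed else None,
    }
    for name, pattern in COUNTER_PATTERNS.items():
        match = re.search(pattern, output)
        result[name] = int(match.group(1)) if match else 0
    return result


def check_link(local_before, local_after, peer_after):
    """Hypothetical checklist item: incrementing errors and speed/duplex mismatch."""
    before, after = parse_interface(local_before), parse_interface(local_after)
    peer = parse_interface(peer_after)
    findings = []
    for name in COUNTER_PATTERNS:
        if after[name] > before[name]:
            findings.append(f"{name} incrementing (+{after[name] - before[name]})")
    if after["duplex"] != peer["duplex"]:
        findings.append(f"duplex mismatch: {after['duplex']} vs {peer['duplex']}")
    if after["speed_mbps"] != peer["speed_mbps"]:
        findings.append(
            f"speed mismatch: {after['speed_mbps']} vs {peer['speed_mbps']} Mb/s"
        )
    return findings


# Hypothetical output: two samples of the local port and one of the far end.
LOCAL_T0 = """GigabitEthernet0/1 is up, line protocol is up
  Full-duplex, 100Mb/s, media type is RJ45
     10 input errors, 10 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 1 interface resets"""
LOCAL_T1 = LOCAL_T0.replace("10 input errors, 10 CRC", "25 input errors, 25 CRC")
PEER_T1 = """FastEthernet0/0 is up, line protocol is up
  Half-duplex, 100Mb/s, media type is 10/100BaseTX
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 output errors, 412 collisions, 0 interface resets"""
```

With the invented samples, `check_link(LOCAL_T0, LOCAL_T1, PEER_T1)` reports incrementing input and CRC errors plus a full/half duplex mismatch. Keeping each check as a small function over parsed data is what makes such a checklist easy to share and to run against many links at once.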

4. Case Study: Dimension Data Accelerates Troubleshooting on Customer Networks

CUSTOMER PROFILE:
Industry: Managed Services
Company: Dimension Data

CHALLENGE: Because Dimension Data does not own the customer networks it manages, it struggles to gain and maintain the intimate knowledge of those networks that normally comes from day-to-day operations.

SOLUTION: Dimension Data uses NetBrain to automate diagram creation, visualize performance issues to expedite diagnosis, and easily share information for collaborative troubleshooting sessions.

BENEFIT: NetBrain's advanced network visualization and automation capabilities enable Dimension Data to shorten typical diagnosis and repair time by as much as 50%.

Dimension Data specializes in information technology services, with operations on every inhabited continent. Dimension Data's focus areas include network integration, security solutions, data center solutions, converged communications, and a range of professional, consulting, and managed services. A major challenge the company consistently faces is understanding its customers' networks well enough to diagnose and troubleshoot complex issues and resolve network outages effectively.

Dimension Data deployed NetBrain in its customer environments, in many cases integrating the tool with the NetCool alarm system, the Opsware configuration management solution, and Vitalnet's performance trending solution. With these integrations, an alarm reported by HP OpenView is instantly translated into a map inside NetBrain Workstation.

NetBrain continues to offer value to Dimension Data in three areas:
- On-demand network mapping removes dependencies on manual network diagrams, which are often inconsistent and error-prone.
- Network performance diagnosis via Dynamic Diagrams enables lower-level engineers to troubleshoot advanced problems.
- Engineers share information via NetBrain for collaboration.

The following are some war stories reported by this customer:

Detecting Serious Routing Issues on the Accudyne Network
NetBrain provided real-time visibility into the Accudyne network and helped identify serious routing issues. The tool was used to highlight congestion points on the map and ultimately tie the problem to equal-cost routes and MPLS design segregation.

Troubleshooting Slowness to a Server
Previously, it took Dimension Data almost two and a half hours to determine the source-to-destination path to an application server inside the Accudyne network, followed by another two hours to diagnose the problem. With NetBrain, finding the path took two minutes, and only five more minutes were needed to diagnose the issue.

Troubleshooting MS Outlook Slowness to Tokyo
The Tokyo office was experiencing slowness when sending Outlook attachments. Multiple tickets had been opened for this issue, and several engineers had already looked into it. NetBrain was then applied, and within three minutes it was determined that there was a duplex issue on the edge WAN port.

NetBrain saves time when time is critical. As a Dimension Data Network Integration Engineer reported, "It has changed the way I approach troubleshooting."

About NetBrain Technologies, Inc.

Founded in 2004, NetBrain set out to pursue a new vision: automate time-consuming tasks associated with network documentation, design, and troubleshooting. NetBrain's customers are using map-driven automation to eliminate manual network documentation, automate troubleshooting tasks, and mitigate security risks. NetBrain is headquartered in Burlington, MA, with offices in Sacramento, CA; New York; and Beijing, China.

To learn more about NetBrain's dynamic mapping solution, contact us at 781.221.7199 or download a free trial of NetBrain's Enterprise Suite from www.netbraintech.com/trial.

NetBrain Technologies, Inc.
15 Network Drive
Burlington, MA 01803
+1 800 605 7964
info@netbraintech.com
www.netbraintech.com