Reduce the risk of network and application performance issues
Your how-to guide to optimising network performance and minimising downtime


The IT network is at the heart of most enterprises, supporting business-critical applications, providing the data on which business decisions are made and facilitating communications with customers, partners, suppliers and co-workers. More than ever before, it is a strategic asset to the business, and any downtime or degradation in network or application performance will directly impact the organisation's bottom line.

To deliver the service levels agreed with the business, the challenge is two-fold: proactively improve and optimise performance to ensure that the network delivers what users require, and resolve any problems that arise as quickly as possible to minimise downtime. This White Paper looks at the methodology of solving network and application performance issues and outlines a new approach to getting to root cause more quickly.

Table of contents
Network complexity makes problem solving more difficult
The process for tackling network performance problems
Where problems can hide in the network
A new approach to problem solving
a. Monitor/alert
b. Investigate
c. Isolate
d. Root cause analysis and problem resolution
e. Network optimisation
Solutions from Fluke Networks

Supplied and supported in the UK and Ireland by Phoenix Datacom tel: 01296 397711 info@phoenixdatacom.com www.phoenixdatacom.com

Network complexity makes problem solving more difficult

Getting to the root cause of network and application issues is increasingly difficult and time-consuming in today's enterprise networks. Virtualisation is extending from the data centre to the desktop, cloud services are growing in popularity and BYOD (Bring Your Own Device) is here to stay, reflecting shifting work patterns and cultural change. Problems may result from a proliferation of WiFi devices, excessive use of bandwidth by unauthorised applications, configuration errors, poor application delivery infrastructure or many other sources. The increasing inclusion of voice and video adds more complexity and pushes bandwidth to its limits. Solving performance problems is made harder still by the challenge of establishing whose responsibility they are, particularly when all groups are reporting green KPIs.

The process for tackling network performance problems

To get to the root cause of network performance issues, network engineers typically follow a four-step troubleshooting process:

1. Monitor / Alert
2. Investigate (determine problem scope)
3. Isolate (server/application, network or client)
4. Root Cause Analysis & Problem Resolution

Figure 1: the problem solving workflow

The tools available to assist problem solving fall into two categories: Network Management Systems (NMS) and packet capture and analysis tools. The NMS primarily plays a role at the monitor/alert phase, monitoring the company's routers and servers and asking whether they are working and responding as expected. However, some NMS are so complex to set up that they are only able to manage down to Layer 3 devices, so switches are not monitored at Layer 2. Polling data is aggregated over many minutes and so is smoothed out, hiding the impact of usage spikes.
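The smoothing effect of long polling intervals can be illustrated with a small sketch. The figures are invented for illustration (a link idling at 10% utilisation with a single 10-second burst at 100%), and the `aggregate` helper is hypothetical rather than taken from any NMS product.

```python
from statistics import mean

def aggregate(samples, window):
    """Average consecutive samples in fixed-size windows,
    roughly as a coarse poller would report them."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical per-second link utilisation (%) over five minutes:
# 10% background load with a 10-second burst at 100%.
per_second = [10.0] * 300
for t in range(120, 130):
    per_second[t] = 100.0

five_minute_view = aggregate(per_second, 300)  # one value for the whole period
peak = max(per_second)                         # what the users actually experienced

print("5-minute average:", five_minute_view)
print("true peak:", peak)
```

Under these assumptions the five-minute figure stays close to the background load even though the link was saturated for ten seconds, which is why finer-grained data is needed to see short spikes.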
Because the NMS is centrally located, measurements intended to capture end-user response times are also inaccurate: the test reaches the device under investigation over a different part of the network from the one the end user's traffic takes. As a network engineer progresses through the troubleshooting process, the usefulness of the NMS decreases and it fails to provide the detailed information required to investigate performance problems fully.

A recent survey of approximately 3,000 network professionals found that 82% of respondents ranked network and application performance problems as a concern or critical issue, with 52% stating that an NMS has insufficient capabilities to get to the root cause of the problem most or all of the time. 51% of respondents said that they needed to leave their desk some or most of the time to troubleshoot the problem.

To obtain more detailed information, the engineer has to turn to freeware or commercial packet capture and analysis tools. These have a limited role in the alert stage, as they only view a single point in the network, but come into their own at the root cause analysis stage. Packet analysis tools are complex and require skilled, experienced engineers. They are also time-consuming to use, as the result can be too much data (millions of packets to wade through), displayed through different user interfaces. This makes the troubleshooting process much more difficult and time-consuming.

Where problems can hide in the network

The gap between these tools (an NMS without comprehensive information on one side, complex packet capture tools on the other) increases MTTR. Nagging, intermittent problems can hide in the network, reducing both productivity and the credibility of the IT department. To investigate and resolve performance issues quickly, the engineer needs end-to-end visibility across the network: a dedicated solution for automated network and application analysis that fills the gap between traditional NMS and packet capture.

Such a solution needs to address:

Unmanaged equipment, which may have been purchased because it is less expensive but will cost more to troubleshoot when problems occur, as there is no visibility of the health of each network segment and utilisation levels cannot be monitored. In contrast, with a managed switch a network engineer can go to any switch port and see what the errors look like, view the utilisation and see who is connected to that port.

Undocumented networks, a continual problem given that frequent changes on a network make any documentation out of date shortly after completion. Physically tracing the path would take a long time, but without accurate documentation the engineer does not know which packets are flowing where. What is required is a means of discovering the real-time path through the network.

Too much data, when the problem may lie in just a few packets. Problem solving would be much faster with an automated method of sifting through captured packets to find the bad ones: an application-centric analysis that takes a top-down approach.

Problems in the past, which only come to the engineer's attention hours after they have occurred. What's needed is a means of going back in time by capturing and analysing large amounts of granular data over an extended period, say 24 hours, to catch intermittent issues.

New technology that isn't monitored, such as 10Gb Ethernet or 802.11n Wi-Fi. Many organisations have not invested in instrumentation for such technologies because they believe the substantial increase in capacity will overcome any problems.

Wireless devices: the engineer needs a way to identify and monitor WiFi devices, including BYOD, and to use spectrum analysis to identify WiFi and non-WiFi interference from Bluetooth devices, cordless telephones, microwaves and so on.
Problems that reside outside the network, so that the engineer can identify them and hand the performance issue and supporting evidence to other IT teams or external service providers, with sufficient information to enable further investigation and a rapid solution.

A new approach to problem solving

What is needed is a holistic network and application performance solution that captures all the data in the network and provides intelligent analysis, enabling engineers to isolate root cause more quickly or to identify that the real problem lies outside the network. It needs to collect, aggregate, correlate and mediate all information, including flow, SNMP data and information gathered from other devices, with granularity down to one millisecond. Data should be displayed through a single user-configurable dashboard, so that guided workflows can be applied to isolate the root cause of the issue quickly. By taking away the need to make assumptions and enabling the user to follow a logical process until the issue is identified and resolved, MTTR is reduced and the network engineer becomes more effective. A network and application performance solution facilitates all stages of the troubleshooting process and provides the visibility needed to support network optimisation.

Monitor/alert

The first requirement when addressing and resolving network problems is a system that provides a timely alert that a problem has occurred. The worst-case scenario is to find out through a call from a user, in which case the engineer is already on the back foot. With many network management tools, alerts have to be manually configured for each network by setting the system to ping or discover all the devices in each broadcast domain. With an always-on network and application performance solution, however, automated discovery and guided workflows make it quick and easy to see immediately which devices are connected. This dramatically reduces the time required for set-up and monitoring.
Performance data is continually collected, stored in a database and displayed via a GUI on a performance dashboard, which users can configure to suit their own requirements. Performance is monitored against a user-defined baseline (e.g. the SLA) and anything outside this is immediately shown as an alarm. The user can then view the problem in varying degrees of detail as they begin the investigation stage.

MSC Home Terminal in Antwerp, Belgium, is one of Europe's busiest container terminals. It uses a central database containing information about all containers and a combination of many smaller programmes. Any network performance issues would have a serious impact on the loading and discharging of cargo from vessels, trains and trucks. By deploying a network and application performance solution from Fluke Networks, MSC's IT team has an independent view of overall performance, which has led to faster problem isolation and troubleshooting and lessened the risk of service-impacting performance issues.
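The baseline comparison described for the monitor/alert stage can be sketched in a few lines. This is only an illustration: the `Baseline` class, the metric name and the 250 ms limit are assumptions made for the example, not part of any product described here.

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    metric: str
    limit: float  # hypothetical threshold, e.g. a response time agreed in an SLA

def find_alarms(samples, baseline):
    """Return (index, value) pairs for samples that exceed the baseline limit."""
    return [(i, v) for i, v in enumerate(samples) if v > baseline.limit]

# Invented application response times in milliseconds, one per polling interval.
response_ms = [120, 135, 128, 410, 131, 390]
sla = Baseline(metric="app_response_ms", limit=250.0)

alarms = find_alarms(response_ms, sla)
for index, value in alarms:
    print(f"ALARM {sla.metric}: sample {index} = {value} ms (limit {sla.limit} ms)")
```

A real system would typically also apply hold-down timers or consecutive-breach rules to avoid alarming on a single outlier; the sketch leaves that out for brevity.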

Network and application performance systems can also be integrated with existing network management systems such as HP OpenView or Tivoli Netcool, and can pass information and alarms to service management and operational dashboard solutions.

Investigate

The network engineer now needs to investigate the scope of the problem. To facilitate rapid and accurate investigation, the solution needs to be able to collect all pertinent data (e.g. SNMP, flows, packets and end-user response time) and store it for future analysis. A network and application performance solution also provides a real-time method of discovering the path from the client to the service or application, significantly reducing the time required. The path between the two devices can then be monitored for problems across internal and external networks and the devices in the path. Results are displayed in a graphical format to facilitate understanding and rapid root cause analysis.

For optimal effectiveness the system should provide interfaces with both 1Gbps and 10Gbps connectivity, and be capable of capturing data at line speed on the wire. Some solutions can trace a path through the network from a client to a server, identifying the Layer 2 and Layer 3 devices in the path and providing the granularity needed to identify the source of the problem.

If the issue lies with a client or group of clients, the engineer needs to carry out a performance or application response test to identify whether the problem is a wired or wireless network issue. By providing wired and wireless tools integrated in the same user interface, the network and application system enables a single test to identify the source of the problem. Malware outbreaks can also be identified as part of this process, including the originating IP address, enabling the engineer to identify root causes of downtime that other tools miss.
Isolate

At this stage the problem has been isolated to a single network segment, switch, router, server or application, and the path, devices and ports in the path have been identified. Now the path needs to be analysed, requiring traffic statistics for each link to determine whether the issue is due to a faulty device, link media, noise or interference, or traffic overload.

One of the great advantages of SNMP (Simple Network Management Protocol) is its ability to help isolate fault domains. Using SNMP to query each connection point along the way will determine if a traffic bottleneck is the source of the slowdown. This is straightforward if the devices in the path are managed and the engineer has the passwords or community strings to interrogate the devices. Otherwise he or she has to connect a tool in each link, without disrupting the network, to view the packets and traffic statistics. This can be extremely time-consuming if there are a lot of links over a large geographic area, and may require multiple tools in different locations.

An automated network infrastructure health check using a network and application performance tool makes it possible to monitor all of the SNMP-supported devices, looking at application flows for those showing packet loss or high utilisation by querying the SNMP MIBs on the routers and reporting back at regular intervals. Whether there are tens or hundreds of switches in the network, the process is simple and quick.

Some problems will only be visible at the point where they have arisen. This requires a portable device with the right testing capabilities and the right interface to connect at the problem point, whether that is in front of a client or on a 10G link in a data centre. With many people working remotely, having a tool which gives this visibility is vital, and this will only increase in importance with the growth of BYOD.
A portable tool can also be shipped to a remote site to see what is happening with unmanaged equipment in the network, without the need for an accompanying engineer. Ideally it should be able to perform path analysis, measure application infrastructure health and application flows, and analyse WLAN performance, as well as reviewing roaming and retry capability and investigating any interference from outside devices.

Leicester College is one of the largest colleges in the UK, with three main campuses and a large number of outreach centres. To streamline network planning and maintenance, the college chose OptiView XG tools to help automate the analysis and troubleshooting of networks and applications. This delivers a level of insight into the network that was previously missing, and allows the IT team to respond quickly when problems arise on the wired or wireless side of the network. It can be used remotely or taken into the field to gather the data needed to troubleshoot or manage the network.

If no links are oversubscribed or show frame errors, the problem is unlikely to be the network. However, this can only be confirmed if the engineer has analysed the links within a reasonable time and the problem he or she is trying to fix still exists. Otherwise it requires the historic data captured by the network and application performance system.
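As an illustration of the SNMP-based link check in the Isolate stage, the sketch below turns two readings of an interface octet counter into a utilisation percentage. It assumes the readings come from the standard IF-MIB objects `ifInOctets` (a 32-bit counter) and `ifSpeed` (bits per second); how the values are fetched is left out, and the function names, sample readings and 80% threshold are hypothetical.

```python
def link_utilisation(octets_t0, octets_t1, interval_s, if_speed_bps, counter_bits=32):
    """Percentage utilisation between two readings of an octet counter.

    The modulo handles a single counter wrap between the two polls.
    """
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    delta_octets = (octets_t1 - octets_t0) % (2 ** counter_bits)
    return delta_octets * 8 * 100.0 / (interval_s * if_speed_bps)

def congested_links(readings, threshold_pct=80.0):
    """Return the names of links whose utilisation exceeds the threshold.

    `readings` maps a link name to (octets_t0, octets_t1, interval_s, if_speed_bps).
    """
    return [name for name, r in readings.items()
            if link_utilisation(*r) > threshold_pct]

# Invented readings for two hops in a path, polled 30 seconds apart on 100 Mbps ports.
path = {
    "access-sw1:port12": (1_000_000, 38_500_000, 30, 100_000_000),   # 10% utilised
    "core-rtr1:ge0/1":   (5_000_000, 342_500_000, 30, 100_000_000),  # 90% utilised
}
print(congested_links(path))
```

On fast links a 32-bit counter can wrap more than once between polls, in which case the result is wrong; the 64-bit `ifHCInOctets` counter (pass `counter_bits=64`) or a shorter polling interval avoids this.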

Root cause analysis and resolution

At this stage the engineer will confirm the cause of the issue, formulate and implement a fix and validate the solution. If the problem is not located in the network and is not the server response or the result of overloading resources, more detailed information is required, obtained by capturing and analysing packets. It is important to have isolated the link, or triaged the problem between the server, the network and the application, first, as packet analysis can be extremely time-consuming and requires a considerable amount of skill and experience.

To get to the root cause more quickly it is best to take a top-down approach to the analysis, beginning at the application level. For example, if the path is good but the response time is poor, the problem could be a virtualised server, an application running on multiple tiers or a bug in the application. One option is to use a packet analyser that can easily show the application level and packet ladder diagrams.

Span or mirror port connections are easy to configure but can lose packets under heavy traffic loads and will not show Layer 1 errors, as these are blocked by the Layer 2 switch providing the span. Passive taps are best, but connecting them breaks the connection, which will disrupt users of the services this link provides. If performance is already suffering, this usually does not cause a problem, but it may affect those using the link to connect to other services. A better solution is to construct the network with taps already placed in strategic positions: in front of server farms, data centres and routers to external links, and in the core of the network. This enables captures to be taken without breaking the network. If this is not possible the engineer may have to resort to span or port mirroring, bearing in mind the accompanying problems and inaccuracies.

A network and application performance solution provides an automated method of sifting through the captured packets to find the bad ones.
It uses an application-centric approach, with a GUI that shows each data flow with a visual indicator to flag problems. The engineer simply clicks on this to drill down and see exactly which packet or packets have a problem. This can be further aided by capturing packets at multiple points in the infrastructure to determine where the problem exists. It requires the ability to carry out multi-segment analysis, triggering data capture at multiple points at the same time and then merging the results to provide the whole picture.

Effective root cause analysis can be carried out at either the data centre or remote sites to see if problems are server or application related. Some tools can pull management information from physical or virtual servers to reveal performance and resource issues. By collecting and analysing historic granular data, the network and application performance system also enables the engineer to go back in time to review the symptoms that were occurring when the problem first appeared, enabling intermittent problems to be identified and resolved.

Network optimisation

A network and application performance solution provides the visibility engineers need to document and audit the health of their corporate network. It enables them to spot poor performance and identify where the paths of applications or servers are running slowly, so that the slowest and most critical paths can be addressed. The information obtained can be used to prioritise projects such as server upgrades and to make the business case for approval. It can also support installation of new equipment and applications by verifying that what has been done has worked and ensuring that it has not had a negative impact on performance elsewhere. The data can also prove (or otherwise) the impact of changes to the network such as virtualisation, WAN optimisation or data centre consolidation.
The University of Nottingham is one of the UK's foremost research institutions, with international campuses in Malaysia and China. The IT organisation supports more than 30,000 staff and students as well as 120 business systems across the University. One of its main challenges was troubleshooting application performance across the enterprise network while maintaining network services. The University uses a solution to provide visibility into the causes of performance issues, enabling the IT team to detect, isolate and resolve problems with less time and effort and reducing the time required to solve complex problems from days to hours.

About Fluke Networks

Fluke Networks is the world-leading provider of network test and monitoring solutions to speed the deployment and improve the performance of networks and applications. Leading enterprises and service providers trust its products and expertise to help solve today's toughest issues and emerging challenges in WLAN security, mobility, unified communications and data centres. Based in Everett, Washington, the company distributes products in more than 50 countries. For more information on our network and application performance solutions, visit www.flukenetworks.com/instantvisibility

Solutions from Fluke Networks

Fluke Networks offers a range of complementary network and performance management solutions: TruView and OptiView. These are dedicated custom hardware with support and interfaces that provide both 1Gbps and 10Gbps connectivity.

Visual TruView™ Appliance

TruView provides the ability to track, baseline, trend and monitor individual application performance of every end-user experience, enterprise-wide, through a highly customizable dashboard. It also provides high-volume packet archival at 10Gbps line rate and comprehensive VoIP/video monitoring and troubleshooting. More information at /truview

OptiView XG: automated network and application analysis

The OptiView XG is the first tablet specifically designed for the network engineer. It automates root cause analysis of network and application problems, allowing the user to spend less time on troubleshooting and more time on other initiatives. It is designed to support deployment of new technologies, including unified communications, virtualization, wireless and 10 Gbps Ethernet. The result is that new initiatives get up and running faster and networks stay productive even in these days of smaller teams. More information at /xg

OneTouch AT™: client-to-cloud troubleshooting in 60 seconds

The OneTouch AT Network Assistant greatly reduces troubleshooting time through a streamlined, three-step approach:

1. The unique AutoTest replaces multiple tools and an hour of troubleshooting time.
2. A powerful set of network performance measurements troubleshoots wired and Wi-Fi networks.
3. It enhances team collaboration through a simple web-remote interface and easy-to-use inline packet capture capabilities.

More information at /OneTouchAT

More information at: www.phoenixdatacom.com