Ticket Management & Best Practices April 29, 2014
Trouble Ticketing System Objectives

Assumption: Network Operations Centers (NOCs) are governed by two principles:
1. Pursuit of excellence in customer service
2. Operating the most cost-effective NOC possible

Common elements of these two principles:
- Have as few outages as possible, with the shortest achievable mean time to restore (MTTR)
- Have systems perform work for you so that you can act quickly and effectively, with regular updates to customers

In support of these principles, what do we need from a trouble ticketing system?
- Efficiency
  > Easy to use, with simplicity and automation
  > Customer service updates, interaction, and self-service
- Analytical capabilities
  > Ability to reduce trouble ticket volume through analytics
  > Repository of data for reporting on drivers of troubles and high time to restore (TTR), with a focus on fault and TTR reduction

An effective trouble ticketing system is the foundation of operational success.

Proprietary and Confidential
Trouble Ticketing System Requirements

Efficient
- Intuitive; easy to train on and use
- Automation and auto-population of data: have the system perform as much of the work as possible

Flexible & Extensible
- Easily customizable (preferably by in-house personnel)
- Extensible with add-ins and additional functionality

Capable of integration (input/output) with other systems
- Inventory systems for circuit details (single system for NCC technicians)
- Alarm systems for ticket / alarm automation
- APIs for integration with external vendors/customers

Extensive data capturing and reporting capability
- Association of each case with customer circuit ID, account, locations
- Root cause codes (4 or 5 levels), down to circuit pack part number for equipment failures
- Responsible party (customer or provider)
- MTTR (auto calculated)
- Type II vendor performance (where involved)
- Site / City / State / Region / Legacy Network (as applicable)

The trouble ticketing system should be designed to accommodate current operational needs and be extensible to support future requirements.
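The data-capture requirements on this slide could be modeled as a single ticket record. The following is a minimal sketch only: the class name `TicketRecord` and every field name are hypothetical and do not come from any particular ticketing product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class TicketRecord:
    """Hypothetical ticket record; all field names are illustrative assumptions."""
    case_id: str
    circuit_id: str
    account: str
    site: str
    city: str
    state: str
    region: str
    created_at: datetime
    restored_at: Optional[datetime] = None
    root_cause_codes: Tuple[str, ...] = ()      # 4 or 5 levels per the slide
    responsible_party: Optional[str] = None     # "customer" or "provider"
    legacy_network: Optional[str] = None        # only where applicable
    type_ii_vendor: Optional[str] = None        # only where a Type II vendor is involved

    @property
    def time_to_restore(self) -> Optional[timedelta]:
        """Auto-calculated restore time; None while the case is still open."""
        if self.restored_at is None:
            return None
        return self.restored_at - self.created_at

    def missing_for_reporting(self) -> List[str]:
        """List the fields that still block accurate reporting on this case."""
        missing = []
        if not 4 <= len(self.root_cause_codes) <= 5:
            missing.append("root_cause_codes")
        if self.responsible_party not in ("customer", "provider"):
            missing.append("responsible_party")
        if self.restored_at is None:
            missing.append("restored_at")
        return missing
```

A closure workflow could call `missing_for_reporting()` and refuse to close a case until the list is empty.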
Trouble Ticketing System Architecture (Current State)
[Architecture diagram not reproduced in this text]
Trouble Ticket Lifecycle

Best-in-class trouble ticket handling requires 5 phases:
1. Create
2. Update
3. Close
4. Analyze
5. Act

- Traditional life cycle of a trouble ticket: open, update, close
- Value-add phases of the trouble ticket lifecycle: analyze and act on the data = RESULTS

Capturing, analyzing, and acting on accurate trouble ticket data is critical to operational improvement.
Trouble Ticket Lifecycle Phase 1 (Case Creation)

1. Create Ticket

Primary objectives for creating a case:
- Associate the case to the service and validate contacts on file
- Capture all customer information in the first conversation (pursue one-call resolution)
- Establish a case framework with all information needed to successfully work and close the ticket

Populate required trouble ticket fields
- Use a standardized case subject, e.g. Customer XYZ, DS-3 HIYX/123456//ZYO Down Hard
- Associate to customer circuit ID (and thereby account)

Capture all available information
- Customer description of the issue, including circuit status (Is the circuit currently hard down or degraded? Is intrusive testing permitted?)
- Troubleshooting steps the customer has taken: have they checked CPE for power (for hard down issues)?
- Customer information, including trouble ticket number, contact name, phone number, and email address(es)
- Access process for the customer location (if applicable)

Update the customer with details on next steps towards resolution
- Provide the customer with the service provider ticket number
- Automatically send an email to the customer with ticket details and a link to the portal (if applicable)
- Suggest additional troubleshooting steps the customer might take (bounce ports, restart router, enable lasers, etc.)
- Articulate next steps in the process (e.g. dispatching a field technician to site, engaging our Tier II technicians) and set expectations for the next communication to the customer

Immediately initiate troubleshooting and repairs
- Queues cost customers: the outage may be significantly impacting the customer's business, so act like it
- Immediately route to the appropriate Tier II organization or other fix agent

The case creation stage is key to ensuring correct association to circuit and account for SLA reporting and customer updates.
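A standardized case subject is easiest to enforce when the system builds it from structured fields rather than free text. The sketch below assumes the subject pattern implied by the slide's example (customer, service type, circuit ID, condition); the function name and parameters are hypothetical.

```python
def build_case_subject(customer: str, service_type: str,
                       circuit_id: str, condition: str) -> str:
    """Build a subject of the form '<customer>, <service type> <circuit ID> <condition>'.

    The pattern is inferred from the slide's single example and is an assumption.
    """
    parts = [customer, service_type, circuit_id, condition]
    if any(not p or not p.strip() for p in parts):
        # Refuse to create a case with an incomplete subject.
        raise ValueError("all subject fields are required")
    customer, service_type, circuit_id, condition = (p.strip() for p in parts)
    return f"{customer}, {service_type} {circuit_id} {condition}"


# Reproduces the example subject from the slide.
print(build_case_subject("Customer XYZ", "DS-3", "HIYX/123456//ZYO", "Down Hard"))
```

Building the subject from fields also guarantees that the circuit ID in the subject matches the circuit the case is associated with.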
Single Customer vs. Network (Multi-customer) Cases

[Diagram: a Network Case (case comments, case status, closure codes, case closure) cascading to Customer Case #1, Customer Case #2, ... Customer Case #N]

Network Case Handling (multiple customer circuits affected)
- For issues affecting multiple customers, create a single parent / network case that reflects the overall event and a child / customer case for each service impacted
- All information entered into the parent case cascades down to the child cases
- Proactively create the cases (ideally with system automation) and email customers upon case opening; attempt to inform the customer of the event BEFORE they contact the NOC to report a service issue
- Enter public comments frequently into the parent case to send emails to each affected customer; over-communicate updates to significantly reduce call volume into the NOC (and increase customer satisfaction with event handling)
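The parent / child cascade can be sketched as follows. This is an illustrative model only: the class names `NetworkCase` and `CustomerCase`, the status strings, and the `emails_sent` list (a stand-in for a real mail integration) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CustomerCase:
    """Child case: one per impacted customer service (hypothetical model)."""
    customer: str
    circuit_id: str
    status: str = "Open"
    closure_codes: Tuple[str, ...] = ()
    comments: List[str] = field(default_factory=list)
    emails_sent: List[str] = field(default_factory=list)  # stand-in for real email delivery


@dataclass
class NetworkCase:
    """Parent case: reflects the overall network event."""
    summary: str
    status: str = "Open"
    closure_codes: Tuple[str, ...] = ()
    children: List[CustomerCase] = field(default_factory=list)

    def add_child(self, child: CustomerCase) -> None:
        # Proactive notification on case opening, before the customer calls in.
        child.status = self.status
        child.emails_sent.append(f"To {child.customer}: case opened for {self.summary}")
        self.children.append(child)

    def add_public_comment(self, text: str) -> None:
        # A public comment on the parent cascades to every child and emails each customer.
        for child in self.children:
            child.comments.append(text)
            child.emails_sent.append(f"To {child.customer}: {text}")

    def close(self, closure_codes: Tuple[str, ...]) -> None:
        # Status and closure codes entered once on the parent cascade to all children.
        self.status = "Closed"
        self.closure_codes = tuple(closure_codes)
        for child in self.children:
            child.status = "Closed"
            child.closure_codes = self.closure_codes
```

With this shape, a technician updates one case during a multi-customer event and every affected customer still receives each public update.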
Trouble Ticket Lifecycle Phase 2 (Case Updates)

2. Update Ticket

Primary objectives for working a case:
- Resolve the issue as quickly as possible (i.e. work with a great sense of urgency)
- Over-communicate with affected customers until the issue is fully resolved
- Thoroughly document the event and actions taken

Tactical Approach
- Enter thorough, detailed case comments: include names, phone numbers, IP addresses, location details, equipment alarm logs, etc. The more detail, the better.
- Document every action taken and every conversation held: if it isn't in the ticket, it didn't happen
- Case comments should have automatic timestamps for reconstruction of the event
- Case status changes should automatically drive MTTR logs
  > Case Created (starts MTTR clock)
  > Repair in Process
  > Technician dispatched
  > Technician arrived
  > Service restored (stops clock)
- Enter public comments as frequently as possible, never less than once per hour for long-duration events

Escalate as needed
- Engage higher-level resources as needed and involve Tier III, Engineering, and vendor resources as required: don't get stuck
- Update management on critical issues: don't let the management team be caught by surprise
- When required resources are not reachable (e.g. field technicians), escalate up their management chain immediately: once around and up

Outages will occur; acting with urgency and providing frequent updates to customers improves customer satisfaction and reduces attrition.
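The status-driven MTTR clock and the hourly public-update rule can be sketched as below. The status strings follow the slide; the function names, the log format (a list of timestamp and status pairs), and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

START_STATUS = "Case Created"      # starts the clock
STOP_STATUS = "Service Restored"   # stops the clock

StatusLog = List[Tuple[datetime, str]]


def time_to_restore(status_log: StatusLog) -> Optional[timedelta]:
    """Return restore time for one case, or None if the clock never started or stopped."""
    ordered = sorted(status_log, key=lambda entry: entry[0])
    start = next((ts for ts, status in ordered if status == START_STATUS), None)
    if start is None:
        return None
    stop = next((ts for ts, status in ordered
                 if status == STOP_STATUS and ts >= start), None)
    return None if stop is None else stop - start


def mean_time_to_restore(durations: List[timedelta]) -> Optional[timedelta]:
    """Average restore time across closed cases."""
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def public_update_overdue(last_public_comment: datetime, now: datetime,
                          max_gap: timedelta = timedelta(hours=1)) -> bool:
    """True when more than max_gap has passed since the last public comment."""
    return now - last_public_comment > max_gap


# Hypothetical status log for a single case.
log = [
    (datetime(2014, 4, 29, 8, 0), "Case Created"),
    (datetime(2014, 4, 29, 8, 10), "Repair in Process"),
    (datetime(2014, 4, 29, 8, 30), "Technician Dispatched"),
    (datetime(2014, 4, 29, 9, 45), "Technician Arrived"),
    (datetime(2014, 4, 29, 11, 30), "Service Restored"),
]
print(time_to_restore(log))  # 3:30:00
```

Deriving the duration from the status log, rather than from a manually entered field, keeps the clock tied to the timestamps the system recorded.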
Trouble Ticket Lifecycle Phase 3 (Case Closure)

3. Close Ticket

Primary objectives for closing a case:
- Close out communications with the customer: wrap it up
- Capture closure code details for subsequent reporting
- Summarize the event in 2 to 3 sentences for future internal and external consumption

Communicate with the customer that the case is being closed
- Summarize case details, provide a preliminary reason for outage (RFO), and let the customer know that the case is being closed or placed into monitor status

Create a succinct, descriptive closing summary
- Example: "Customer reported DS-3 down hard; dispatched technician and isolated to failed DSX-3 module; replaced DSX-3 module to restore."

Capture closure codes with accurate detail
- Level 1: Zayo owned equipment or fiber
- Level 2: Equipment Failure
- Level 3: Telect
- Level 4: DSX-3 Module
- Specific part number captured by equipment replacement request

Review MTTR logs for accuracy; correct if needed

Close the case or set it to monitor status with auto-close (i.e. try not to touch it again)

Accurate case closure codes are required for reporting on drivers of trouble volumes and high TTR; customer-consumable closure summaries reduce RFO requests and the need for customer follow-up.
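A four-level closure code can be represented as a small immutable record that supports roll-up to any depth for reporting. This is a sketch under assumptions: the `ClosureCode` type, the `rollup` method, and the `validate` helper are hypothetical names, and only the example values come from the slide.

```python
from typing import NamedTuple, Tuple


class ClosureCode(NamedTuple):
    """Four-level closure code (hypothetical structure modeled on the slide's example)."""
    level1: str
    level2: str
    level3: str
    level4: str

    def rollup(self, depth: int) -> Tuple[str, ...]:
        """Return the first `depth` levels, for reporting at a coarser grain."""
        if not 1 <= depth <= 4:
            raise ValueError("depth must be between 1 and 4")
        return tuple(self)[:depth]


def validate(code: ClosureCode) -> None:
    """Reject closure codes with any blank level, since gaps undermine later reporting."""
    for number, value in enumerate(code, start=1):
        if not value.strip():
            raise ValueError(f"closure code level {number} is empty")


example = ClosureCode("Zayo owned equipment or fiber", "Equipment Failure",
                      "Telect", "DSX-3 Module")
validate(example)
print(" > ".join(example.rollup(3)))
```

Rolling up to three levels groups cases by the third-level value, which in the slide's example is the equipment manufacturer.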
Trouble Ticket Lifecycle Phase 4 (Analyze Ticket Data)

4. Analyze Ticket Data

Primary objectives for analyzing trouble ticket data:
- Determine the most frequent causes of trouble tickets
- Determine drivers of high TTR
- Identify chronic issues (before the customer does)

Analyze trouble by closure codes
- Create Pareto charts to determine top drivers of trouble volumes
- Determine fault frequency rate of equipment issues; expect <2.5% failures per annum
- Review cases from different perspectives
  > Troubles by vendor, equipment make/model, circuit pack (part number), software load
  > Troubles by service type
  > Troubles by site
  > Troubles by legacy network

Identify and review chronic issues
- Identify repeat/recurring troubles on specific circuits
- Repeat events at a site (high temp, low temp, power loss, card failures, circuit errors, etc.) may be indicative of power/grounding/lighting/cabling issues
- Specific routes subject to failure: fiber cuts, power outages, intermittent errors (e.g. PMD identification)

Analyze root cause of faults and drivers of high TTR
- Analyze events with high MTTR to determine drivers
  > Regional, state, city (sparing, technician locations, tools, training, OSP repair processes and capabilities, local management)
  > Equipment type (NOC technician training & capabilities, OSS systems, software issues, vendor support)
  > OSP repair processes (cut isolation and repair approach, 3rd party performance, OSP restoration contractors and capabilities)

Customer responsible troubles
- Drivers of customer responsible troubles
- Specific circuits with high volumes of customer responsible issues
- Specific customers with high volumes of customer responsible issues

Weekly, monthly, quarterly, and annual analyses provide different perspectives.

Invest the time required to thoroughly analyze trouble ticket metrics to determine root cause drivers of outages and high MTTR.
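The core calculations behind this analysis are simple enough to sketch. The code below assumes that fault frequency rate means failures in a year divided by installed units, expressed as a percentage; that definition, the function names, and the repeat-trouble threshold are assumptions, not definitions taken from the slides.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def pareto_table(closure_codes: Iterable[Hashable]) -> List[Tuple[Hashable, int, float]]:
    """Rank closure codes by ticket count, with cumulative percentage of all tickets."""
    counts = Counter(closure_codes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for code, count in counts.most_common():
        cumulative += count
        rows.append((code, count, round(100 * cumulative / total, 1)))
    return rows


def annual_fault_frequency_rate(failures_in_year: int, installed_units: int) -> float:
    """Failures per installed unit per year, as a percentage (assumed definition)."""
    if installed_units <= 0:
        raise ValueError("installed_units must be positive")
    return 100.0 * failures_in_year / installed_units


def chronic_circuits(ticket_circuit_ids: Iterable[str], threshold: int = 3) -> List[str]:
    """Circuits with at least `threshold` tickets in the period (threshold is an assumption)."""
    counts = Counter(ticket_circuit_ids)
    return sorted(cid for cid, n in counts.items() if n >= threshold)


# Hypothetical data: six tickets across three closure codes.
for row in pareto_table(["Equipment", "Equipment", "Equipment", "Fiber", "Fiber", "Power"]):
    print(row)
```

Passing the first three levels of each closure code to `pareto_table` gives a view by manufacturer, and the resulting rate can be compared against the <2.5% per annum expectation above.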
Fault Analysis Example

Pareto Chart Analysis of Equipment Failures
[Pareto chart not reproduced in this text]
- The top 3 levels of closure codes provide a view down to equipment manufacturer
- In this example, the fault frequency rate of Force10 equipment was determined to be >8% across ~500 network elements (vs. several thousand Accedian and Westell devices)
- Data was used to create the business case for removal of the equipment as part of network modernization; resulted in significant improvement in trouble ticket volumes res[...]
Trouble Ticket Lifecycle Phase 5 (Improvement Activity)

5. Act on Trouble Ticket Data Analysis

Primary objectives:
- Reduce trouble ticket volumes
- Reduce mean time to restore

Identify opportunities to eliminate outages and reduce MTTR
- Determine actions that can be taken to eliminate outages (software upgrades, equipment replacements, process improvements, training, systems, power audits, etc.)
- Engage technology vendors and demand product improvement as appropriate (e.g. >2.5% annual fault frequency rate): don't accept subpar technologies
- Hold Type II providers to high standards; report to them on their performance and request corrective action plans as appropriate. Ensure that vendor performance influences buying decisions
- Identify potential to reduce MTTR (troubleshooting processes & training, field technician locations, tools & equipment, sparing, restoration and power contractors, etc.)

If it is worth doing, put it on an Action Item register and commit to completing it
- Create impactful corrective actions and assign an individual who is accountable for each action item, with a due date
- Don't create trivial corrective actions, as this diminishes the importance of urgent action items
- Have a system for following each action item to completion; for larger organizations, consider dedicating an employee just to this function: it's that important

Determine a methodology to reduce customer responsible troubles
- Inform customers of potential chronic issues on their side; suggest potential improvement initiatives, noting that some customers may not have the ability to identify chronic issues or the capability to reduce them
- Enable customer self-service where possible (DNS updates, routing updates, equipment PMs, circuit status, etc.): pursue advanced portal capabilities
- Bill customers for repetitive abuse of the system, i.e. using the service provider to troubleshoot customer equipment or isolate among multiple providers
- In rare cases, consider firing your customer

Acting on the trouble ticket data dramatically improves network performance and customer service while reducing operational costs.
Trouble Ticket Lifecycle General Commentary

Always attempt to contact the customer before they contact you
- Proactively notify of outages
- Over-communicate with frequent case updates; don't make the customer ask
- Internal escalation: if appropriate, escalate to management before the customer escalates and have upper management engage

Attempt to interact with customers in the method(s) that they prefer
- Carrier customers prefer to interact via phone
- Enterprise customers (particularly IP) generally prefer to interact via email or portal

Enable self-service; allow customers to service themselves for non-outage requests
- DNS updates
- Routing updates
- Bandwidth utilization
- Contact updates

Build a NOC model focused on continuous improvement
- Continuous reduction in fault frequency rate and trouble ticket volumes
- Improvement in MTTR until consistently meeting target objectives

Pursuit of these initiatives results in delivering on the most critical NOC objectives:
- Delivering the best in customer service
- Operating the most cost-effective NOC possible
Q & A
Greg Hadlock 303-731-6662 ghadlock@zayo.com Thank You. www.zayo.com