Network Monitoring and Management Services: Standard Operating Procedures September 17, 2013

TABLE OF CONTENTS

Purpose
Contacts
Standard Operating Procedures
  Hours of Coverage
  Contacting the Response Center
  Trouble Ticketing Format
  Priority Definitions
    Critical Alerts
    Medium Alerts
    Low Alerts
  Escalation and Communications Procedure
    Critical and Medium Priority Trouble Tickets
    Low Priority Trouble Tickets
  Out of Band Communications
  Service Level Management and Notification Frequency
  Reporting
    On-Demand Reporting
  Maintenance
    Customer Maintenance
    Monitoring System Maintenance

PURPOSE

This Standard Operating Procedures document defines the specific procedures, processes, and policies for the day-to-day management, trouble reporting, and repair of One Source Networks Inc. ("OSN") Managed Network Services ("MNS") associated with Customer's network.

CONTACTS

One Source Networks Contact Information

Account Manager (responsible for the entire lifecycle experience)
  Name:
  Phone:
  E-Mail:
  Mobile:

Project Manager (responsible for coordinating initial service delivery and service changes)
  Name:
  Phone:
  E-Mail:
  Mobile:

One Source Networks NOC Escalation Contacts

NOC Supervisor (responsible for day-to-day NOC operations)
  Name:   Chris Garrett
  Phone:  512.592.4810
  E-Mail: chris.garrett@onesourcenetworks.com
  Mobile: 512.605.9620

Service Delivery Manager (responsible for service delivery)
  Name:   Allan Mittelstaedt
  Phone:  (512) 592-4813
  E-Mail: Allan.mittelstaedt@onesourcenetworks.com
  Mobile: (512) 415-1905

Customer Contact Information

Primary Point of Contact
  Name:
  Phone:
  E-Mail:
  Mobile:

Secondary Point of Contact
  Name:
  Phone:
  E-Mail:
  Mobile:

Customer Notification Information
  Email Notification Address:
  Monitoring System Alerts (Yes/No):
  Ticketing System Alerts (Yes/No):

STANDARD OPERATING PROCEDURES

HOURS OF COVERAGE

Operational hours of coverage are 24 x 7 x 365.

CONTACTING THE RESPONSE CENTER

  Ticketing Portal:  http://portal.outer.net
  Monitoring Portal: https://monitoring.onesourcenetworks.com
  Phone Number:      (512) 592-4810
  E-Mail:            support@onesourcenetworks.com

Usernames and passwords for both Ticketing and Monitoring will be provisioned for each user or distribution group at or before the time of service commencement.

TROUBLE TICKETING FORMAT

Each Trouble Ticket carries the following fields:

- Description
- Customer
- OSN Trouble Ticket Number
- Vendor Trouble Ticket Number [If applicable]
- Site
- Covered Device
- Circuit ID [If applicable]
- Interface
- Circuit Status [If applicable]
- Time of Alert
- Status: New, Pending Technician, Pending Customer, Pending Close, or Closed
- Priority: Low, Medium, or Critical

PRIORITY DEFINITIONS

All Alerts will be generated by the Monitoring System according to the criteria below. Notifications of Alerts will be provided to the email addresses specified in the Customer Notification Information section of this document.

Trouble Tickets will be generated with the priority corresponding to the relevant Alert or incident, or at Low Priority for Customer-initiated change or administration requests.

Customer Contact is defined as speaking with or receiving an email from the Primary or Secondary Point of Contact, as defined within the Customer Contact Information above.

CRITICAL ALERTS

Critical Alerts are defined to be cases where a Covered Device or telecommunications circuit is down or otherwise non-functional as experienced by any user or reported by the Monitoring System. Critical Alerts are to be issued as follows:

- A Covered Device is considered down if it misses polling cycles for five (5) consecutive minutes.
- An interface on a Covered Device is down for five (5) consecutive minutes.
- Any location experiencing lost network connectivity: office down or datacenter down.

MEDIUM ALERTS

Medium Alerts are considered failures of some kind that do not immediately impact production processes but that may cause degraded service. Medium Alerts are to be issued as follows:

- For Covered Devices exceeding thresholds for interface packet loss or latency, where such thresholds are available, as follows:
  o Packet loss over 30% continuously for fifteen (15) minutes
  o Response time over 1000ms continuously for fifteen (15) minutes

LOW ALERTS

Low Alerts are considered failures of some kind that do not immediately impact production processes or user experiences. Low Alerts are to be issued as follows:

- For Covered Devices exceeding thresholds for CPU utilization, memory utilization, or storage utilization, where such thresholds are available, as follows:
  o CPU utilization over 90% continuously for one (1) hour
  o Memory utilization over 90% continuously for one (1) hour
  o Available storage on volume below 2GB continuously for fifteen (15) minutes
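
The alert criteria above can be read as a small decision rule. The following Python sketch is illustrative only and does not describe how the Monitoring System is actually implemented: the `DeviceSample` class, its field names, and the `classify_alert` function are hypothetical, and it assumes that a condition qualifies once it has held for at least the stated duration and that the highest applicable priority wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceSample:
    """Hypothetical measurements for one Covered Device.

    Each field is the number of minutes the named condition has held
    continuously; 0 means the condition is not currently present.
    """
    down_minutes: float = 0.0                 # missed polls or interface down
    loss_over_30pct_minutes: float = 0.0      # packet loss above 30%
    latency_over_1000ms_minutes: float = 0.0  # response time above 1000 ms
    cpu_over_90pct_minutes: float = 0.0       # CPU utilization above 90%
    memory_over_90pct_minutes: float = 0.0    # memory utilization above 90%
    storage_below_2gb_minutes: float = 0.0    # free storage below 2 GB


def classify_alert(s: DeviceSample) -> Optional[str]:
    """Return 'Critical', 'Medium', 'Low', or None (assumed: highest wins)."""
    if s.down_minutes >= 5:
        return "Critical"
    if s.loss_over_30pct_minutes >= 15 or s.latency_over_1000ms_minutes >= 15:
        return "Medium"
    if (s.cpu_over_90pct_minutes >= 60
            or s.memory_over_90pct_minutes >= 60
            or s.storage_below_2gb_minutes >= 15):
        return "Low"
    return None
```

For example, a device with ten minutes of high packet loss and ninety minutes of high CPU utilization would be classified as Low under this sketch, because the packet-loss condition has not yet lasted fifteen minutes.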

ESCALATION AND COMMUNICATIONS PROCEDURE

CRITICAL AND MEDIUM PRIORITY TROUBLE TICKETS

1. Upon receiving an Alert from the Monitoring System or a trouble call from the Customer, the NOC technician will create a Trouble Ticket in the OSN Trouble Ticketing system (Portal) within fifteen (15) minutes of receipt of the system Alert or phone call. The Trouble Ticket will be created according to the Trouble Ticketing Format above and have its user field assigned to the primary Customer site contact or distribution group, who will receive Trouble Ticket updates via email.
2. The NOC technician will immediately check the Monitoring System to determine the availability of the system that is reported down or degraded.
3. The NOC technician will attempt to diagnose and resolve the problem.
   a. [MNS Complete: Router and MNS Lite: Router Only] If it is determined that the outage or degradation is carrier-related, the NOC technician will create a Trouble Ticket with the carrier for circuit testing.
      i. The NOC technician will notify the primary Customer POC of the creation of a Trouble Ticket with the carrier via phone or email at the preference of the Customer.
      ii. If the primary Customer POC is unavailable, the NOC technician will notify the secondary Customer POC via phone.
      iii. If intrusive testing is required, the NOC technician will request a maintenance window from the primary Customer POC via phone.
      iv. If the primary Customer POC is unavailable, the NOC technician will request a maintenance window from the secondary Customer POC via phone.
   b. If it is determined that the outage or degradation is due to failed Customer equipment that is under management by OSN, the NOC technician will create a Trouble Ticket with the equipment vendor for additional testing and/or, if available, warranty equipment replacement.
      i. The NOC technician will notify the primary Customer POC via phone of the creation of a Trouble Ticket with the vendor.
      ii. If the primary Customer POC is unavailable, the NOC technician will notify the secondary Customer POC via phone.
      iii. If sparing is employed rather than third-party support, the NOC technician will ship a spare Covered Device to the affected Customer site.
         1. The NOC technician will notify the primary Customer POC via phone of the spare shipment and estimated arrival date.
         2. If the primary Customer POC is unavailable, the NOC technician will notify the secondary Customer POC via phone.
         3. The NOC technician will update the Trouble Ticket with a shipping tracking number as soon as it becomes available.
   c. If onsite work by either the carrier or equipment vendor is needed, a phone call will be made to the following people and a date and time will be scheduled:
      i. Primary POC
      ii. Secondary POC if Primary is unavailable
   d. [MNS Complete Only] If it is determined that the outage or degradation is due to a configuration problem with Customer equipment that is under management by OSN, the NOC technician will request authorization to perform a configuration rollback and/or software update to the Covered Device.
      i. The NOC technician will request authorization from the primary Customer POC via phone.
      ii. If the primary Customer POC is unavailable, the NOC technician will request authorization from the secondary Customer POC via phone.
      iii. Upon receipt of authorization, the NOC technician will escalate the Trouble Ticket for performance of the requested operation.
   e. If none of the above solution paths is appropriate, the NOC technician will escalate the Trouble Ticket to Customer IT resources via email.
4. The NOC technician will update the status of the Trouble Ticket with a frequency appropriate to its priority, except as described below. This update will result in an email being sent to the primary Customer site contact or distribution group, as in step (1).
   a. If issue resolution is delayed or blocked by Customer's action or inaction, the Trouble Ticket status will be changed to pending Customer and Trouble Ticket updates will occur daily.
   b. If the next step in the troubleshooting or remediation process has been scheduled for a time more than one (1) hour in the future, the Trouble Ticket status will be changed to pending Customer and the next Trouble Ticket update will occur within one (1) hour of the scheduled start time of said next step or within twenty-four (24) hours, whichever is sooner.
      i. After the next update, the Trouble Ticket status will be re-assigned to pending technician if hourly updates are again appropriate.
      ii. After the next update, the Trouble Ticket status will remain pending Customer if another troubleshooting or remediation step has been scheduled more than one (1) hour in the future.
5. The NOC technician will monitor the request through to resolution and closure.
   a. Upon initial resolution of the Trouble Ticket, the NOC technician will change the Trouble Ticket status to pending close. The NOC will continue to monitor the affected Covered Device or circuit for twenty-four (24) hours after activity is restored.
   b. At least twenty-four (24) hours after initial resolution of the Trouble Ticket, the NOC technician will call the Primary POC to advise that the issue has been resolved and to request verification of resolution.
      i. If the NOC technician is unable to reach the Primary POC for verification of resolution, the NOC technician will update the Trouble Ticket with details of the contact attempt and leave the Trouble Ticket status set to pending close.
      ii. The NOC technician will call the Primary POC daily until contact has been made, and only proceed to step (c) after confirming resolution with the Primary POC.
   c. The NOC technician will update the Trouble Ticket with a Reason For Outage ("RFO"), and close the Trouble Ticket. The RFO shall include:
      i. Start time and end time of Customer-impacting outage.
      ii. Simple summary of root cause and fix implemented to restore service.
6. If at any time during the Trouble Ticket lifecycle the Customer requests that the Trouble Ticket priority be downgraded, the NOC will make the requested priority change and updates will occur with the frequency appropriate to the new priority.

LOW PRIORITY TROUBLE TICKETS

1. Upon receiving an Alert from the Monitoring System or a trouble call from the Customer, the NOC technician will create a Trouble Ticket in the OSN Trouble Ticketing system (Portal) within fifteen (15) minutes of receipt of the system Alert or phone call. The Trouble Ticket will be created according to the Trouble Ticketing Format above and have its user field assigned to the primary Customer site contact or distribution group, who will receive Trouble Ticket updates via email.
2. [MNS Complete Only] The NOC technician will assign the Trouble Ticket to the Engineering Group for troubleshooting and resolution or performance of a requested change.
3. The NOC technician will escalate the Trouble Ticket to Customer IT resources via email.
4. The Engineer will monitor the Trouble Ticket through to resolution and assign the Trouble Ticket back to the NOC technician to monitor until close. Updates to the Trouble Ticket will be provided every twenty-four (24) hours until the Trouble Ticket is resolved.
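
The update timing in step 4(b) of the Critical and Medium procedure ("within one (1) hour of the scheduled start time ... or within twenty-four (24) hours, whichever is sooner") can be worked through with a short calculation. The Python sketch below is one possible reading, not an official rule: the function name `next_update_due` and its parameters are hypothetical, and it assumes the deadline is the earlier of one hour after the scheduled step begins and twenty-four hours after the status change.

```python
from datetime import datetime, timedelta


def next_update_due(status_changed_at: datetime,
                    step_scheduled_at: datetime) -> datetime:
    """Latest time for the next ticket update under step 4(b).

    Assumed reading: no later than one hour after the scheduled step
    starts, and no later than 24 hours after the status change.
    """
    by_step = step_scheduled_at + timedelta(hours=1)
    by_day = status_changed_at + timedelta(hours=24)
    return min(by_step, by_day)
```

Under this reading, a ticket set to pending Customer at 08:00 with a carrier dispatch scheduled for 12:00 the same day would be updated by 13:00, while a dispatch scheduled three days out would still produce an update by 08:00 the next day.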

OUT OF BAND COMMUNICATIONS

From time to time, it may be necessary for the NOC to send correspondence from outside the Trouble Ticketing system. In such cases, correspondence will include the associated OSN Trouble Ticket number in the subject line, and will be appended to the Trouble Ticket.

SERVICE LEVEL MANAGEMENT AND NOTIFICATION FREQUENCY

The Network Operations Center will:

- Monitor all priority Critical Trouble Tickets at one (1) hour intervals except as defined in the Escalation and Communications Procedure above:
  o Update the Trouble Ticket with new information.
  o In the absence of updated information or inability to contact the third-party vendor for an update, the NOC technician will call the Customer POC for intervention.
  o As a result of intervention, the Customer POC will call the NOC technician with updated information.
- Monitor all priority Medium Trouble Tickets at four (4) hour intervals except as defined in the Escalation and Communications Procedure above:
  o Update the Trouble Ticket with new information.
  o In the absence of updated information or inability to contact the third-party vendor for an update, the NOC technician will call the Customer POC for intervention.
  o As a result of intervention, the Customer POC will call the NOC technician with updated information.
- Monitor all priority Low Trouble Tickets at twenty-four (24) hour intervals:
  o Update the Trouble Ticket with new information.
  o In the absence of updated information or inability to contact the third-party vendor for an update, the NOC technician will call the Customer POC for intervention.
  o As a result of intervention, the Customer POC will call the NOC technician with updated information.
- Manage Trouble Tickets to ensure the following SLAs are met:

  SLA       Priority  Status              Commitment Level
  Response  Critical  New                 15 minute response
  Response  Medium    New                 15 minute response
  Response  Low       New                 15 minute response
  Update    Critical  Pending Technician  1 hour update
  Update    Critical  Pending Customer    24 hour update, or as appropriate
  Update    Critical  Pending Close       24 hour update
  Update    Medium    Pending Technician  4 hour update
  Update    Medium    Pending Customer    24 hour update, or as appropriate
  Update    Medium    Pending Close       24 hour update
  Update    Low       All Pending         24 hour update

REPORTING

Monthly reports will include the following items:

- Summary of Trouble Tickets created within the prior month, including the following information:
  o Trouble Ticket number
  o Creation Time
  o Location
  o Site Code
  o Issue and RFO
  o Carrier Trouble Ticket Number
  o Close Action and Time
- Summary of the following statistics by interface as reported by the Monitoring System:
  o Receive/Transmit Bandwidth
  o Errors and Discards
  o Packet Loss and Interface Utilization
- Summary of the following statistics by Covered Device as reported by the Monitoring System:
  o Node Availability
  o CPU and Memory Utilization

ON-DEMAND REPORTING

The Monitoring System portal at https://monitoring.onesourcenetworks.com allows additional reports to be run on demand, in addition to providing a real-time view of Covered Device status and performance where applicable. Covered Devices are grouped by Customer Name, by Covered Device vendor, and then by up/down status. Click the plus sign next to the higher-level field to expand the items at the next level. Select the Reports tab to see a list of available reports.

MAINTENANCE

CUSTOMER MAINTENANCE

For planned maintenance, please email support@onesourcenetworks.com or open a Trouble Ticket via the Trouble Ticketing portal. Alerts will be disabled on monitored Covered Devices during the planned maintenance window.

MONITORING SYSTEM MAINTENANCE

- Customer pre-approves a 15-minute maintenance window, to take place Saturdays at 2300 US Central Time, which may include downtime of the Monitoring System. OSN will send notification in advance when this window is needed.
- Customer pre-approves a monthly 1-hour maintenance window, to take place on the second Saturday of each month at 2300 US Central Time, which may include downtime of the Monitoring System.
- Customer will approve emergency maintenance upon receipt of 24 hours' notice by OSN for security-related patches and fixes.
- Periodic major maintenance may be required as new versions of the software are released. OSN will provide at least one week's notice of such maintenance, and these notifications will include expected downtime. Major maintenance will not occur more frequently than quarterly.
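
For planning purposes, the date of the monthly one-hour window (second Saturday of each month at 2300 US Central Time) can be computed ahead of time. The Python sketch below is only an illustration; the function name `monthly_window_start` is hypothetical, and the returned value is a naive datetime that is assumed to represent US Central wall-clock time.

```python
from datetime import date, datetime, time, timedelta

WINDOW_LENGTH = timedelta(hours=1)  # monthly window duration stated above


def monthly_window_start(year: int, month: int) -> datetime:
    """Start of the monthly window: second Saturday at 23:00.

    The result carries no timezone; it is assumed to be read as
    US Central wall-clock time.
    """
    first = date(year, month, 1)
    # date.weekday() numbers Monday as 0, so Saturday is 5.
    days_to_first_saturday = (5 - first.weekday()) % 7
    second_saturday = first + timedelta(days=days_to_first_saturday + 7)
    return datetime.combine(second_saturday, time(23, 0))
```

Calling `monthly_window_start(2013, 9)` gives 14 September 2013 at 23:00, and adding `WINDOW_LENGTH` gives the end of the window.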