Double Dipping Incident - Problem Management & Your SLAs www.caskllc.com
Introduction
Richard Pilgrim, ITIL Sr. ITSM Consultant
richard.pilgrim@caskllc.com
www.caskllc.com
Cask, LLC Strategy. Solutions. Success.
#57 on 2011 Inc. 500 - Fastest-Growing Private Companies
Senior ITSM Consultant
I am ITIL & Lean Six Sigma certified and have 20 years of hands-on ITSM, service delivery, and consulting experience. I have worked with clients ranging from 1,500 to 750,000 users to build Service Management programs, roadmaps, services, portfolios, and processes in the Telecommunications, Federal, and Defense industries. I am a member of the ITSMF USA San Diego LIG.
What is Double Dipping?
From Wikipedia: Double dip may refer to: To put a food item (like a vegetable or chip) into a dip, take a bite, and put it back in. Socially taboo, as it is believed to add microbes from the person's mouth into the dip.
What is Double Dipping in SLAs?
Competing Incidents
Often we experience outages that become availability incidents. These incidents hold a different service level than the incidents caused by the outage.
EXAMPLE: An email server goes down and an incident is recorded by the SysAdmin or a Data Center Admin. Depending on how well communications work in your organization, users will likely also call in connectivity incidents.
Each of the incidents described above normally carries a different SLA target. Double Dipping can occur if these incidents are not associated/mapped to each other, or if they are not categorized properly.
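One way to read the email-server example is sketched below: when user-reported tickets are not linked to the outage incident, each one is measured on its own and the same outage is counted several times. This is a minimal illustration only; the `Incident` fields, the ticket numbers, and the `count_breaches` helper are hypothetical and not taken from any particular ITSM tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    number: str                    # hypothetical ticket identifier
    ci: str                        # configuration item the ticket was logged against
    breached: bool                 # True if the ticket missed its own SLA target
    parent: Optional[str] = None   # number of the outage incident it is linked to


def count_breaches(incidents):
    """Count SLA breaches, skipping tickets that are linked to a parent incident."""
    return sum(1 for i in incidents if i.breached and i.parent is None)


outage = Incident("INC001", "email-server", breached=True)

# User calls recorded as stand-alone tickets: every one is counted.
unlinked = [
    outage,
    Incident("INC002", "email-server", breached=True),
    Incident("INC003", "email-server", breached=True),
]

# The same calls associated with the outage incident: counted once.
linked = [
    outage,
    Incident("INC002", "email-server", breached=True, parent="INC001"),
    Incident("INC003", "email-server", breached=True, parent="INC001"),
]

print(count_breaches(unlinked))  # 3
print(count_breaches(linked))    # 1
```

Whether linked child tickets should be excluded from SLA counts is a contract decision; the sketch only shows how association changes the number.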
Example of Competing SLAs
Cask Rapid Process Workshops
SLA | SLA Description | Standard | Penalty
Service Availability Core Services | The percentage of time Core Services are working such that the end-user can utilize the subscribed service/s. A system or service is defined as unavailable from the time the service provider has received the Incident ticket until the time the Incident is closed. | 99.95% | 0.5%
IM-1 Response | Time to respond to an Incident for a seat subscribed to 2-business-hour Return to Service, after ticket received | 96% | 0.25%
IM-1 RTS | Time to restore service for a seat subscribed to 2-business-hour Return to Service, after ticket received | 95% | 0.5%
IM-2 Response | Time to respond to an Incident for a seat subscribed to 8-business-hour Return to Service, after ticket received | 96% | 0.25%
IM-2 RTS | Time to restore service for a seat subscribed to 8-business-hour Return to Service, after ticket received | 95% | 0.5%
Customer Satisfaction | The percentage of returned surveys that reflect Very Good or higher satisfaction (both objective and subjective) with the quality of services provided under the Contract. | 94.00% | 1.50%
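To make the competing targets concrete, the sketch below checks a set of measured results against the Standard column and collects the Penalty for each miss. The standards and penalties are the values on this slide; the measured figures are invented for illustration, and the assumption that a penalty applies whenever a measured value falls below its standard is mine, since the slide does not say how penalties are assessed.

```python
# (standard %, penalty %) per SLA, taken from the slide above.
SLAS = {
    "Service Availability Core Services": (99.95, 0.5),
    "IM-1 Response": (96.0, 0.25),
    "IM-1 RTS": (95.0, 0.5),
    "IM-2 Response": (96.0, 0.25),
    "IM-2 RTS": (95.0, 0.5),
    "Customer Satisfaction": (94.0, 1.5),
}


def missed_slas(measured):
    """Return {sla_name: penalty} for each SLA measured below its standard.

    Assumption: a penalty is owed whenever measured < standard.
    """
    return {
        name: penalty
        for name, (standard, penalty) in SLAS.items()
        if name in measured and measured[name] < standard
    }


# Hypothetical month: one email outage lowers availability AND the
# 2-business-hour Return to Service result.
measured = {
    "Service Availability Core Services": 99.90,
    "IM-1 Response": 97.0,
    "IM-1 RTS": 94.0,
    "IM-2 Response": 98.0,
    "IM-2 RTS": 96.0,
    "Customer Satisfaction": 95.0,
}

misses = missed_slas(measured)
total_penalty = sum(misses.values())
print(misses)
print(total_penalty)  # 1.0
```

In this made-up month a single outage triggers two penalties, which is the double-dipping pattern the presentation warns about.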
Incident Management Process
How do we avoid Double Dipping?
Incident Categorization
Are your categories & subcategories set up to handle your requirements?
Assigning incident tickets to categories and subcategories can greatly improve the clarity and granularity of SLA visibility and report data. For example, without good categorization of incidents, you would never know how many network-related versus telephone-related incidents you had from week to week or month to month. The platform can also use an incident's category/subcategory to automatically assign it to a specific fulfillment group to work on.
Category | Subcategory
Inquiry / Help | Anti-Virus, Email, Internal Application
Software | Email, Operating System
Hardware | CPU, Disk, Keyboard, Memory, Monitor, Mouse
Network | DHCP, DNS, IP Address, VPN, Wireless
Database | DB2, MS SQL Server, Oracle
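Category-based assignment can be sketched as a simple lookup. The categories and subcategories below come from the slide; the group names, the `ROUTING` table, and the `assign_group` function are hypothetical, and a real platform would configure this through its own assignment rules rather than code like this.

```python
# (category, subcategory) -> fulfillment group. Group names are invented.
# A subcategory of None acts as the category-wide fallback.
ROUTING = {
    ("Network", None): "Network Operations",
    ("Database", None): "Database Administration",
    ("Software", "Email"): "Messaging Team",
    ("Hardware", None): "Field Service",
}
DEFAULT_GROUP = "Service Desk"


def assign_group(category, subcategory=None):
    """Pick the most specific matching group, falling back to the default."""
    for key in ((category, subcategory), (category, None)):
        if key in ROUTING:
            return ROUTING[key]
    return DEFAULT_GROUP


print(assign_group("Software", "Email"))        # Messaging Team
print(assign_group("Network", "DNS"))           # Network Operations
print(assign_group("Inquiry / Help", "Email"))  # Service Desk
```

The exact-match-then-fallback order means a miscategorized ticket still lands somewhere, but it lands with the wrong group and under the wrong SLA, which is why the category tree needs to match your requirements.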
What else should we be using in our ITSM tools?
» CI Relationship to SLAs
» Priority Impact Matrix
When should we associate incidents?
General Rules: Associate Incident to Related Record/s
Associate the related record to the incident if:
» An identical active incident (same error on the same CI) exists, or
» An active problem record related to the incident exists, or
» The applied workaround, known error, or resolution information resolved the incident, or
» The incident was caused by the implementation of a change.
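The four rules above can be written as a single check. This is a sketch under assumptions: the `RelatedRecord` fields and the `kind` values are hypothetical names chosen to mirror the wording of the rules, not fields from a specific tool.

```python
from dataclasses import dataclass


@dataclass
class RelatedRecord:
    kind: str                        # "incident", "problem", "workaround", "known_error", "resolution", "change"
    active: bool = False             # record is still open
    same_error: bool = False         # same error as the incident being worked
    same_ci: bool = False            # same configuration item as the incident
    related: bool = False            # problem record relates to the incident
    resolved_incident: bool = False  # applying it resolved the incident
    caused_incident: bool = False    # implementing the change caused the incident


def should_associate(rec):
    """Return True if any of the four general rules applies to this record."""
    if rec.kind == "incident":
        return rec.active and rec.same_error and rec.same_ci
    if rec.kind == "problem":
        return rec.active and rec.related
    if rec.kind in ("workaround", "known_error", "resolution"):
        return rec.resolved_incident
    if rec.kind == "change":
        return rec.caused_incident
    return False


duplicate = RelatedRecord("incident", active=True, same_error=True, same_ci=True)
other_ci = RelatedRecord("incident", active=True, same_error=True, same_ci=False)
print(should_associate(duplicate))  # True
print(should_associate(other_ci))   # False
```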
Is Restoring Enough?
Purpose
» The primary goal of the Incident Management process is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. 'Normal service operation' is defined as service operation within SLA limits.
» As we can see, returning the service to normal operations is always the main focus, for obvious reasons.
» So who is really looking at the SLA limits, and when?
Who is responsible for SLA performance?
» Service Desk
» Virtual/Remote Service Management
» Field Service
» Data Center/Systems Administrators
» Network
» Engineering
» Vendors
» Service Level Mgmt.
» Executive Leadership
SLA Metering
Ties to Incident Management
Q&A
Richard Pilgrim, ITIL Sr. ITSM Consultant richard.pilgrim@caskllc.com www.caskllc.com