Incident Management: Get Your Basics Right
Introduction
Neil Thomas
- Industry experience in IT & IT support
- ITIL Vendor Product Management
- ITIL Consulting
- Specialised in Service Catalog & CMDB
Introduction
- Fully Accredited ITIL Training
- Fully Accredited SDI Training
- ITIL Consultancy
- eLearning
- Social Media Training & Consultancy
- Industry Webinars (ITSM & SM)
- Industry/Organizational Podcasts
- SDI Partner for Social Media Courses
The Webinar Series
- Service Catalog
- Developing a CMDB
- Incident Management
- Problem Management
- Change Management
- Measuring Service Desk Performance Metrics
Topics today
- Incident Management & ITIL
- Service Desk
- Incident versus Service requests
- Other Incident Workflows
- Knowledge
- Service Level Agreements
- Incident & Problem
- Incident & Change
Service
- If Something Goes Wrong (Incident Management)
- How Quickly do we Support (Service Level Management)
- If Something Keeps Going Wrong (Problem Management)
- User Needs Something (Service Requests & Service Catalogue)
- Managing it (Service Portfolio Management & Financial Management)
- Ensuring it's there in the Future (Availability Management & Capacity Management & Service Continuity Management)
- What Delivers it (Configuration Management)
- Need to Improve or Resolve Problems (Change Management)
- Delivering Agreed Changes to Business (Release Management)
Incident Management
- Restore normal services AS QUICKLY AS POSSIBLE while minimizing the impact
- Incident definition: any event that disrupts, or could disrupt, a service
Key Elements
- Incidents: ANYTHING, including hardware and software errors
- Reported by email, phone, self-service, Twitter etc.
- Events detected within the IT infrastructure (Event Mgt V3)
- Normally recorded by the Service Desk to ensure compliance
- Data vital to improve resolution of service
Key Elements
- Incident detection & recording
- Classification & initial support
- Investigation & diagnosis
- Resolution & recovery
- Incident closure
- Ownership, monitoring, tracking, & communication (monitoring the progress of the resolution of the incident and keeping those who are affected by the incident up to date with the status)
The Incident Management Process (example flowchart)
Inputs: Event Mgmt, Web Interface, User Phone Call, Email, Technical Staff
1. Incident Identification
2. Incident Logging
3. Incident Categorization
4. Service Request? Yes: to Request Fulfilment. No: continue.
5. Incident Prioritization
6. Major Incident? Yes: Major Incident Procedure. No: continue.
7. Initial Diagnosis
8. Functional Escalation Needed? Yes: Functional Escalation to Level 2/3. No: continue.
9. Hierarchic Escalation Needed? Yes: Management Escalation. No: continue.
10. Investigation & Diagnosis
11. Resolution and Recovery
12. Incident Closure
End
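The decision points in the flowchart can be sketched in code. This is only an illustration: the `Ticket` class, its field names, and the `route` function are hypothetical, and the sketch assumes that a service request or a major incident leaves this flow at the hand-off point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    """Hypothetical ticket record; the flags mirror the flowchart's decisions."""
    summary: str
    is_service_request: bool = False
    is_major: bool = False
    needs_functional_escalation: bool = False
    needs_hierarchic_escalation: bool = False


def route(ticket: Ticket) -> List[str]:
    """Return the ordered steps a ticket passes through."""
    steps = ["Incident Identification", "Incident Logging", "Incident Categorization"]
    if ticket.is_service_request:
        # Assumption: service requests leave the incident flow here.
        return steps + ["Request Fulfilment"]
    steps.append("Incident Prioritization")
    if ticket.is_major:
        # Assumption: major incidents are handed to their own procedure.
        return steps + ["Major Incident Procedure"]
    steps.append("Initial Diagnosis")
    if ticket.needs_functional_escalation:
        steps.append("Functional Escalation (Level 2/3)")
    if ticket.needs_hierarchic_escalation:
        steps.append("Management Escalation")
    steps += ["Investigation & Diagnosis", "Resolution and Recovery", "Incident Closure"]
    return steps
```

For example, `route(Ticket("Printer offline"))` walks the plain path from identification to closure with no escalation steps.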
Record
- Normally recorded by a Service Desk
- Record all incidents
- Ensures compliance with SLAs
- Records all relevant data
- Facility for users to report incidents quickly & easily
Categorize
Effective categorization of incidents has two aspects:
- Classification to determine incident type (for example, IT Service = degraded)
- The Configuration Item (CI) that is affected
Use standardized coding criteria.
Prioritize (Severity)
- Priority/Severity Level 4: No Business Impact. No loss of service or resources
- Priority/Severity Level 3: Minor Business Impact. Minor loss of service or resources
- Priority/Severity Level 2: Serious Business Impact. Severe loss of service or resources acceptable workaround
- Priority/Severity Level 1: Critical Business Impact. Complete loss of service or resources and work cannot reasonably continue; the work is considered mission critical
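The four levels map naturally onto a small lookup. The sketch below assumes the business impact has already been assessed and recorded as one of four labels. The `Priority` enum, the label strings, and the `prioritize` function are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Priority/Severity levels from the slide; 1 is the most urgent."""
    CRITICAL = 1
    SERIOUS = 2
    MINOR = 3
    NO_IMPACT = 4


# Hypothetical impact labels mapped to the slide's levels.
IMPACT_TO_PRIORITY = {
    "critical": Priority.CRITICAL,
    "serious": Priority.SERIOUS,
    "minor": Priority.MINOR,
    "none": Priority.NO_IMPACT,
}


def prioritize(business_impact: str) -> Priority:
    """Look up the priority for an assessed business impact label."""
    return IMPACT_TO_PRIORITY[business_impact.strip().lower()]
```

Because `IntEnum` values compare as integers, a queue sorted by priority puts Level 1 incidents first.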
Escalate
- Rapidly escalate incidents according to agreed service level; allocate more support resources if necessary
- Escalation can follow two paths:
  - Horizontal escalation is required when the incident needs to be escalated to different SME groups that are better able to perform the Incident Management function.
  - Vertical escalation is where the incident needs to gain higher levels of priority.
- Rules to ensure timely escalation
- For every resolution attempt, accurate data must be attached to the incident detail to save repeating recovery procedures
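One possible escalation rule, sketched under stated assumptions: horizontal escalation is triggered when the current group cannot resolve the incident, and vertical escalation is triggered once a fraction of the agreed resolution time has elapsed. The 80% threshold, the function name, and the parameters are illustrative assumptions, not values from the slides.

```python
from datetime import timedelta
from typing import List


def escalation_paths(
    elapsed: timedelta,
    sla_target: timedelta,
    group_can_resolve: bool,
    warn_fraction: float = 0.8,  # assumed threshold, tune per agreement
) -> List[str]:
    """Return which escalation paths apply to an open incident."""
    paths = []
    if not group_can_resolve:
        # Hand over to an SME group better able to handle the incident.
        paths.append("horizontal")
    if elapsed >= sla_target * warn_fraction:
        # Raise priority before the agreed service level is breached.
        paths.append("vertical")
    return paths
```

An incident one hour into a four-hour target, owned by a group that can resolve it, needs no escalation under this rule.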
Resolve, Recover & Restore
- Check for known errors and use any workarounds
- Resolve the Incident with solutions or workarounds
- For some solutions, a Request for Change (RFC) will need to be submitted
- Service Desk confirms with the user that the error has been rectified and that the incident can be closed
- The goal of the Incident Management process is to restore service.
Key Functions
- Take ownership of an incident
- Provide a prompt recovery of the business within SLA
- Keep the focus on the incident (no blindsiding)
- Escalating incidents: functional (higher technical skill)
- Escalating incidents: hierarchical (manager decision)
- Keep the customer informed
- Facilitate communication and act as an interface
- Keep track of time & activity
Service Level Agreements
- Negotiated and AGREED level of response WITH organization
- Different SLAs for different:
  - Priorities
  - Configuration Items (assets)
  - Service User
- Appropriate to the organization's needs
- Aim to RESTORE service asap given the IMPORTANCE of the service
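SLA targets that vary by priority and by Configuration Item can be held in simple lookup tables. The sketch below is illustrative only: the target durations, the `payroll-db` CI name, and the `resolution_deadline` function are all made-up assumptions, and real targets come from what has been agreed with the organization.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed default resolution targets per priority level (1 = most urgent).
RESOLUTION_TARGETS = {
    1: timedelta(hours=4),
    2: timedelta(hours=8),
    3: timedelta(days=2),
    4: timedelta(days=5),
}

# Assumed tighter targets for specific (CI, priority) pairs.
CI_OVERRIDES = {
    ("payroll-db", 1): timedelta(hours=2),
}


def resolution_deadline(
    opened_at: datetime, priority: int, ci: Optional[str] = None
) -> datetime:
    """Return when the incident must be resolved to stay within its SLA."""
    target = CI_OVERRIDES.get((ci, priority), RESOLUTION_TARGETS[priority])
    return opened_at + target
```

A Priority 1 incident opened at 09:00 would be due at 13:00 by default, or at 11:00 if it affects the hypothetical `payroll-db` item.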
Major Incident
A Major Incident is an unplanned or temporary interruption of service with severe negative consequences.
Problem Management
A Problem is the cause (typically unknown) of one or more incidents.
Activities include:
- Analyze and identify the root cause of one or more incidents
- Validate and publish the workaround for incidents whose cause is known (known error)
- Effect the systematic removal of the root cause via RFCs
Known Errors
- Problem Management identifies the underlying causal factor
- It might take many incidents to understand the root cause
- When identified, the causal factor becomes a known error
- If a workaround exists, it is recorded with the known error
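Checking a new incident against the known errors can be as simple as matching on the affected CI and a symptom phrase. This is a minimal sketch: the `KnownError` record, its fields, and the substring match are assumptions made for illustration, and a real tool would use richer matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnownError:
    """Hypothetical known error record published by Problem Management."""
    ci: str
    symptom: str
    workaround: Optional[str] = None


def find_workaround(
    known_errors: List[KnownError], ci: str, description: str
) -> Optional[str]:
    """Return the workaround of the first known error matching the incident."""
    text = description.lower()
    for known_error in known_errors:
        if known_error.ci == ci and known_error.symptom.lower() in text:
            return known_error.workaround
    return None  # no matching known error, or none with a workaround
```

A match lets first line support apply the workaround immediately instead of repeating the diagnosis.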
Knowledge & Incidents
- Use of Knowledge in Self Service
  - Self-help (knowledge bases, FAQs etc.)
  - Script-based help
  - Record that self-help has been used
- Use of Incidents to Construct Knowledge
  - Incidents contain DESCRIPTION
  - Incidents contain RESOLUTION
  - INFORM Problem
Service Desk & Incidents
- Incident logging
- Customer satisfaction
- Prioritization
- First line support
- Request fulfillment
- Escalations
- Communication
- Operational metrics
Know when to stop!
- Beware over-analyzing
- Appropriate Management Information
  - Closed 4,000 calls
  - Received 45,000 SNMP Traps
  - SIGNIFICANCE
- Why Measure?
- What is IMPORTANT to the Organization
- Key Performance Indicators
  - Customer satisfaction
  - Time to resolution
  - Key Incident Resolution
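Time to resolution is one KPI that is straightforward to compute from the opened and closed timestamps on each incident. The sketch below assumes incidents are available as `(opened, closed)` pairs; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple


def mean_time_to_resolution(
    incidents: List[Tuple[datetime, datetime]]
) -> timedelta:
    """Average the elapsed time between opening and closing each incident."""
    durations = [(closed - opened).total_seconds() for opened, closed in incidents]
    return timedelta(seconds=mean(durations))
```

A raw count such as "closed 4,000 calls" says little on its own; an average like this one can be tracked against the agreed targets.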
Service Catalog & Requests
Is it a bird or?
When is an Incident a Service?
Alternate Incidents:
- New Hire
- Leaver
- Equipment Request
- Software Provision
- Virus Scan
No right answer
Is it a bird or?
- Define the Process
- Manage by Priority
- Set realistic SLAs
OR
- Make Service Requests
New Hire Process
HR Tasks
- Recruitment request signed, attached and filed
- Recruitment offer signed, attached and filed
- Offer letter and T&Cs sent to candidate
- Signed letter back from candidate
- Starter letter sent out to candidate
- Created new employee in external systems
- Personal details completed
- Informed payroll / reception
- Collected acknowledgement forms:
  - Employee handbook
  - H&S policy
  - IT Policy
- Induction arranged
- References requested & received
- Healthcare cover arranged
- Pension arranged
- Parking permit issued
- Business cards arranged
- End of probation letter sent
IT Tasks
- PC/Laptop
- Network ID
- Email
- Telephony: Internal, Cell
- Security card
- Application access
FM Tasks
- Seating allocation
Incident & Change
- Accurate analysis
- Identification of Configuration Items
- Good Problem analysis that touches ALL Incidents
- Link to Known Errors and Workarounds
Configuration Management
- Defines WHAT delivers a SERVICE
- Defines the RELATIONSHIPS & dependencies
- Know WHAT is important and HOW it connects
- Change Process uses IMPACT ANALYSIS
Why Incident Management?
- Knowing which Service is most important
- Incidents to be prioritized
- Defines who a user/customer contacts, what the expected fix time is, etc.
- If not, then we fight the same fires over and over again
- Building a better and more repeatable process around this firefighting will drive efficiency, effectiveness and overall greater quality
- Builds on the body of knowledge of a call
- DOCUMENTS what has happened, who did what and when
- Stops duplication of work
- Avoids difficult tasks being ignored (bounce count)
- Communication occurs (or should occur)
Q & A Time. Confidential, All Rights Reserved, ServiceSphere 2008 http://www.servicesphere.com