Problem Management. Process Guide. Document Code: Version 2.6. January 7, 2011. Robert Jackson (updates) Ashish Naphray (updates)


Problem Management Process Guide
Document Code:
Version 2.6
January 7, 2011

Prepared by: R. Bruce Specht, Robert Jackson (updates), Ashish Naphray (updates)
Contributors: Dalibor Petrovic, Karen Moses, ITSM 7 working group, ICT Team

Table of Contents

1. About this document
   1.1 Document Objective
   1.2 Intended Audience
   1.3 Document Owner
   1.4 Mandatory Document Revision Schedule
       1.4.1 Ad-Hoc Process Changes and Modifications
   1.5 General Inquiries
2. Definitions and Acronyms
   2.1 Definitions
   2.2 Acronyms
3. Overview
   3.1 Introduction
   3.2 Goal and Objectives
   3.3 Process Scope
   3.4 Out of Scope
   3.5 Benefits
   3.6 Business Justification
   3.7 Policy Statement
   3.8 Reporting and Management Information
   3.9 Best Practices
4. Process Governance
   4.1 Process Owner
   4.2 Process Manager
   4.3 Stakeholders
5. Roles and Responsibilities
6. Problem Management Process
   PRB.1: Problem Identification, Recording and Classification
   PRB.2: Problem Review
   PRB.3: Problem Investigation and Diagnosis
   PRB.4: Problem Resolution and Closure
   PRB.5: Known Error Identification and Recording
   PRB.6: Known Error Classification and Assessment
   PRB.7: Known Error Resolution and Closure
7. Key Performance Indicators
8. Procedures
9. Control Alignment Table
Appendix A Table of Figures

Revision History

Author | Version | Description | Date
R. Bruce Specht | V 1.0 | Begin Document; Draft One | June 4, 2007
R. Bruce Specht | V 1.0 | Incorporate items discussed at workshop 1 | June 7, 2007
R. Bruce Specht | 1.2 | Doc re-organize to standardize layout | June 11, 2007
R. Bruce Specht | 1.4 | Continuing with re-org | June 13, 2007
R. Bruce Specht | 1.5 | Re-org continued | June 15, 2007
R. Bruce Specht | 1.6 | Updated flow diagram/accept recommend formatting changes | June 16, 2007
R. Bruce Specht | 2.0 | Revisions as per Working Group Request | July 5, 2007
Robert Jackson | 2.1 | Reviewed and Updated | May 15, 2009
Robert Jackson | 2.2 | Reviewed and Updated from draft to released | March 10, 2010
Robert Jackson | 2.3 | Corrected Urgency and Priority definitions and added 4.3 Stakeholders user info. | March 31, 2010
Robert Jackson | 2.4 | Updates as part of the Client Management and Reporting Program | August 17, 2010
Robert Jackson | 2.5 | Added to section 6, Stage 1, additional clarification of process initiators. | November 15, 2010
Robert Jackson | 2.6 | Updated process ownership to reflect organizational changes. | January 7, 2011

1. About this document

1.1 Document Objective

The objective of this document is to define the process of managing Problems within the Government of Alberta's Information and Communications Technology (ICT) Service Coordination Initiative (SCI). This is a live document, intended to be reviewed and modified regularly as the process matures.

The document includes process definitions for:
- Problem Identification, Classification and Recording
- Problem Investigation and Diagnosis
- Problem Resolution and Closure
- Known Error Identification and Recording
- Known Error Classification and Assessment
- Known Error Resolution and Closure

The document will need to be updated to include process definitions for:
- Knowledge Identification and Recording
- Knowledge Validation and Publication

Procedures for the defined activities will be dependent on the implementation of the toolset and on the protocols of GoA ICT and other ministries.

1.2 Intended Audience

The audiences for this document are individuals and groups fulfilling any of the following Roles within the GoA:
- IT Executives
- GoA's Service Delivery Resources
- Members of appropriate GoA governance bodies identified in this document
- Internal and External Service Suppliers
- Business/Ministry Representatives acting on behalf of the Users of Information and Communication Technology (ICT) Services

1.3 Document Owner

The owner of this Document is the Owner of the GoA's ICT Problem Management Process. The Document Owner is accountable for the accuracy and completeness of the document, and for its alignment to the implemented Problem Management process.

1.4 Mandatory Document Revision Schedule

This document follows the revision schedule:
- Annual Review (Mandatory): The document shall be reviewed for completeness and accuracy every March 31st. The Document Owner is accountable for performing an annual Document Review.

- Ad-hoc Review: Ad-hoc requests for document revision should be directed to the Document Owner and submitted using the Change Management Process and supporting systems. The Document Owner is accountable for managing ad-hoc document revisions, and can delegate the responsibility for a Document Revision to a designate.

1.4.1 Ad-Hoc Process Changes and Modifications

Requests for changes to this process should be submitted using the formal Change Management process. The request will take the form of a Request for Change (RFC). The Change Manager will hierarchically escalate the RFC to the appropriate Process Manager and/or Process Owner for comments and approval or rejection.

The Process Manager does not have the authority to unilaterally change or modify the following within the Problem Management Process:
a) Process
b) Activities

Such changes and modifications must be approved by the Problem Management Process Owner, using the ICT Change Management process. Changes to Procedures and Work Instructions (a lower level of operational detail) do not require the approval of the Problem Management Process Owner, but do require the approval of the Problem Management Process Manager, and must be managed through the ICT Change Management Process.

1.5 General Inquiries

General inquiries about this document should be directed to the Document Owner.

2. Definitions and Acronyms

2.1 Definitions

For the purpose of this process, the following terms are defined as follows [1]:

Assertions: An assertion has a meaningful bearing on how a control activity satisfies a control objective. For the purposes of this document, the relevant assertions are:
- Completeness: All events that should have been recorded have been recorded.
- Accuracy: All data relating to recorded events has been recorded accurately and included in the proper tickets and documents.
- Validity: Events that have been recorded have occurred.
- Authorization: All actions requiring authorization are properly authorized.
- Safeguarding: All configuration items are adequately safeguarded.
- Presentation: Information required for analysis and decision making is provided in a clear, concise and actionable manner, in compliance with industry best practices and legal requirements.
One or all of the above assertions may be relevant, or may need to be satisfied, in order for a control activity to address a relevant control objective.

Change: The addition, modification, or removal of approved, supported or baselined hardware, network, software, environment, system, or associated documentation (Configuration Items). Changes can arise as a result of Problems, Known Errors and their resolution, or from proactively seeking business benefits.

Change Management: The process of controlling Changes to the infrastructure or any aspect of services, enabling approved Changes with minimum disruption.

Configuration Item (CI): Any component of an infrastructure that needs to be managed in order to deliver an IT service, and that is under the control of Configuration Management. Information about each Configuration Item is recorded in a configuration record within the CMDB, and is maintained throughout its lifecycle. Configuration Items typically include hardware, software, buildings, people, and formal documentation such as process documentation and SLAs; they may therefore vary widely in complexity, size, and type, from an entire system (including all hardware, software, and documentation) to a single software module or a minor hardware component. Each configuration item can be composed of other configuration items (e.g. a computer system can comprise a hard drive, monitor, etc.).

[1] These definitions are a subset of the ICT Service Coordination Terminology document, and may have been modified to match the definitions of BMC's ITSM 7 tool.

Configuration Management: The process responsible for maintaining information about the Configuration Items (CIs) required to deliver an IT Service, including their relationships. This information is managed throughout the lifecycle of the CI. The primary objective of Configuration Management is to underpin the delivery of IT Services by providing accurate data to all IT Service Management processes when and where it is needed.

Configuration Management Database (CMDB): A database used to manage configuration records throughout their lifecycle. The CMDB records the attributes of each CI, and its relationships with other CIs. A CMDB may also contain other information linked to CIs, for example Incident, Problem or Change records. The CMDB is maintained by Configuration Management and is used by all IT Service Management Processes.

Controls, Control Objectives and Control Activities: Information technology controls (or IT controls) are specific information systems designed to allow support, oversight, and monitoring of business processes. IT controls include controls over the general information technology (IT) environment, computer operations, access to programs and data, program development and program changes. Controls are those upon which the enterprise places reliance to ensure that functions such as access rights, processing, integrity of operations and reporting are sound and reliable.

Customer: The recipient of a service. Usually, the customer's management has responsibility for the cost of the service. In this environment, the ministries of the Government of Alberta are the customers.

End-user: The person who uses the services on a day-to-day basis.

Evidence: Details of the evidence that satisfies the control objective requirements. Evidence can manifest itself in several ways, including:
- Document: Identification of what to do, and how to do it. Examples include process documents, templates, forms, checklists, etc.
- Manual Record: Outputs from an activity that are generated manually, such as meeting minutes, reports, analyses, etc.
- Automatic Record: Outputs generated automatically by the ITSM product, such as system logs, system usage metrics, etc.

Impact: The effect on the business of an Incident, Problem, Service Request, or Request for Change. Typically this relates to the number and type of users affected and the service that is having the problem. It is often equal to the extent of a distortion of agreed or expected Service Levels. In the ITSM tool, Impact can be:
- Extensive/Widespread: The Change affects multiple Ministries. The Change may require the Service to be down during working hours and/or for a significant amount of time.
- Significant/Large: The Change affects a single Ministry, or several departments across multiple Ministries. The Change may require the Service to be down during working hours, but for a minimal amount of time.
- Moderate/Limited: The Change affects a moderate number of users (<100), probably limited to a single department within a Ministry. The Change will likely have little effect during working hours, but may require the Service to be down during non-working hours.
- Minor/Localized: The Change affects a very small number of users (<25), perhaps limited to a single team. The Change will likely have little or no effect on service levels during working hours.

Incident: Any event that is not part of the standard operations of a service and that causes, or may cause, an interruption to, or a reduction in, the quality of that service.

Incident Management: The process responsible for managing the lifecycle of all Incidents. The primary objective of Incident Management is to restore normal service operations as quickly as possible with minimum interruption to the business, thereby ensuring that the best achievable levels of availability and service are maintained.

Known Error: An Incident or Problem for which the Root Cause is known and for which a temporary work-around or permanent alternative has been identified. Known Errors are created by problem control and are managed throughout their lifecycle. Known Errors may also be identified by Development or Suppliers. A Known Error remains until and unless it is permanently fixed by a Change.

Priority: An indicator of the relative importance of an Incident, Problem, or Change. It is the main indicator of severity, and is used to set service level requirements. It is also used to determine the sequence in which an Incident, Problem or Change needs to be resolved, and therefore the speed at which the resolution will be approved and deployed. It is based primarily on impact and urgency, although the business risk of not implementing a change is another important criterion for determining the priority of a change.
In the ITSM tool, Priority can be:
- Critical: A Problem that both impacts a large number of users and must be implemented quickly
- High: A Problem that impacts a significant number of users, and must be implemented in the short term
- Medium: A Problem that impacts a moderate number of users, and can be implemented in the medium term
- Low: A Problem that impacts few users, and can be implemented in the long term

Priority is derived from Impact (rows) and Urgency (columns):

Impact \ Urgency | Critical | High | Medium | Low
Extensive | Critical | Critical | High | High
Significant | Critical | High | High | Medium
Moderate | High | High | Medium | Low
Minor | High | Medium | Low | Low

Problem: The unknown underlying cause of multiple Incidents. A condition identified from multiple Incidents exhibiting common symptoms, or from a single major Incident, indicative of a single error, for which the cause is unknown.

Problem Management: The process of minimizing the adverse effect on the business of Incidents and Problems caused by errors in the infrastructure, and of proactively preventing the occurrence of Incidents, Problems and errors.

RACI Matrix: A RACI diagram is used to describe the roles and responsibilities of various teams or people in delivering a project or in executing a process. It is especially useful in clarifying roles and responsibilities in cross-functional or cross-departmental projects and initiatives. The RACI diagram splits process activities into four participatory responsibility types that are then assigned to different roles in the process. These responsibility types make up the acronym RACI:
- Responsible: Those who do work to achieve the task. There can be multiple resources responsible.
- Accountable: The resource ultimately accountable for the completion of the task. There must be exactly one A specified for each task.
- Consulted: Those whose opinions are sought (two-way communication).
- Informed: Those who are kept up to date on progress (one-way communication).

Release: A collection of new and/or changed configuration items which are tested and introduced into the live environment together.

Release Management: The process responsible for planning, designing, building, testing, scheduling and controlling the movement of releases from test to production environments. The primary objective of Release Management is to ensure that the integrity of the production environment is protected and that the correct components are released. Release Management works closely with Configuration Management and Change Management.

Root Cause: The underlying or original cause of an Incident or Problem.

Service Request: Every Incident that is not a failure in the IT infrastructure (e.g. a password change). Alternatively, a request for a Standard Change (e.g. providing access to a new employee, moving a few PCs). A request for Services submitted by an Authorized User, or created and submitted on behalf of an Authorized User, using the mutually agreed upon process. A Service Request is a specific type of Change (RFC) that is pre-approved. Pre-approval means that the risk of a Service Request has been assessed and is well known, and that the appropriate levels of approval have already been secured. Service Requests are those Changes that can be completed by the Service Desk without the need to initiate a full Change Management process.

Urgency: A measure of the business criticality of an Incident, Problem, or Change based on the business needs of the Customer. The extent to which resolution of an Incident, Problem or Service Request can or cannot bear delay. In the ITSM tool, Urgency can be:
- Critical: A Problem that, if not implemented immediately, will leave the organization open to huge risk or operational failure, e.g. applying a security patch, or a fix to restore a service outage
- High: A Problem that is important for the organization and must be implemented soon
- Medium: A Problem that should be implemented to gain benefit from the changed service
- Low: A Problem that is not pressing but would be advantageous

Workaround: The reduction or elimination of the Impact of an Incident or Problem for which a full resolution is not yet available, for example by restarting a failed Configuration Item. Workarounds for Problems are documented in Known Error records; Workarounds for Incidents that do not have associated Problem records are documented in the Incident record.
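The Priority matrix in section 2.1 (Impact by Urgency) can be read as a simple lookup table. The following is a minimal sketch of that lookup in Python, not the ITSM tool's own implementation; the names Impact, Level and priority_for are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Impact(Enum):
    EXTENSIVE = "Extensive"
    SIGNIFICANT = "Significant"
    MODERATE = "Moderate"
    MINOR = "Minor"


class Level(Enum):
    """Shared scale for Urgency and Priority (both use these four values)."""
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# Column order of the matrix: Urgency from Critical down to Low.
_URGENCY_ORDER = (Level.CRITICAL, Level.HIGH, Level.MEDIUM, Level.LOW)

# One row per Impact value, transcribed from the Priority matrix in section 2.1.
_PRIORITY_MATRIX = {
    Impact.EXTENSIVE: (Level.CRITICAL, Level.CRITICAL, Level.HIGH, Level.HIGH),
    Impact.SIGNIFICANT: (Level.CRITICAL, Level.HIGH, Level.HIGH, Level.MEDIUM),
    Impact.MODERATE: (Level.HIGH, Level.HIGH, Level.MEDIUM, Level.LOW),
    Impact.MINOR: (Level.HIGH, Level.MEDIUM, Level.LOW, Level.LOW),
}


def priority_for(impact: Impact, urgency: Level) -> Level:
    """Return the Priority for a given Impact and Urgency (hypothetical helper)."""
    column = _URGENCY_ORDER.index(urgency)
    return _PRIORITY_MATRIX[impact][column]
```

For instance, priority_for(Impact.MODERATE, Level.MEDIUM) looks up the Moderate row and the Medium column of the matrix. Keeping the matrix as data rather than as nested conditionals makes it straightforward to compare against the table when the definitions are revised.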

2.2 Acronyms

AR: Action Request
CAB: Change Approval Board
CAB/EC: Change Approval Board Emergency Committee
CI(s): Configuration Item(s)
CIW: Change Initiation Workgroup
CM: Change Management
CMDB: Configuration Management DataBase
CobiT: Control Objectives for Information and related Technologies
CPI: Continuous Process Improvement
CR: Change Request
DRP: Disaster Recovery Program
EC: Emergency Change
FSC: Forward Schedule of Changes
ICT: Information and Communications Technology
GoA: Government of Alberta
ISO: International Standards Organization
ITIL: Information Technology Infrastructure Library
ITR: Information Technology Representative
ITSM: Information Technology Services Management
OLA: Operational Level Agreements
PIR: Post Implementation Review
PR: Problem Record
PSA: Projected Service Availability
RC: Root Cause
RFC: Request for Change
RFP: Request for Proposal
ROI: Return on Investment
RPO: Recovery Point Objective
RRR: Release Readiness Review
RTO: Recovery Time Objective
SCI: Service Coordination Initiative
SD: Service Desk
SIP: Service Improvement Program
SLA: Service Level Agreement
SLM: Service Level Management
TCO: Total Cost of Ownership
TQM: Total Quality Management
TSP: Technical Solution Paper
WIP: Work In Progress

3. Overview

3.1 Introduction

The ICT Problem Management Process has been identified by the GoA Stakeholders as one of the critical processes within the framework of the new ICT Service Coordination Initiative. The process is a part of the overall Process Framework for the new Service Coordination Organization, intended to provide the governance and control required for optimized service management activities within the GoA.

This process will be used by the ICT Service Organization to manage Problems derived from Incidents and from proactive monitoring of the infrastructure, according to the rules set out in the SLAs, OLAs and UCs developed as part of the Service Level Management process, in support of the business requirements for GoA Ministries.

The process will, amongst other things, provide direction to ensure that a comprehensive framework for managing ICT Problems is defined, providing a formal process for the management, resolution and reporting of Problems. The process will also point to and support service coordination events amongst other Service Management processes where necessary, such as the Service Level Management, Incident Management, Change Management, Capacity Management and Availability Management processes.

Figure 1 is a visual presentation of the service lifecycle model that identifies critical stages in the service design, and aligns them with the appropriate governance structures of the GoA and functional areas within the IT organization. For additional clarity, the ICT Problem Management Process placement within this model is identified with the red box.

Figure 1: Service Life Cycle (diagram labels: Strategic Architecture & Standards, Planning, Projects, Service Design, WG Prioritization, Change Advisory Board, Ongoing Service Review, IT HR, IT Finance & IT Operations Management, Service Delivery, SLM, CRM, Customer(s), Ministry Programs, Arch. & Planning, Service Development (Projects), Service Transition (Implementation), Service Resolution, Service Control, Service Coordination, ICT Services, Users, Suppliers (Internal & External), Supplier Relationship Management)

3.2 Goal and Objectives

The goal of Problem Management is to minimise the adverse Impact of Incidents and Problems on the business that are caused by errors within the IT Infrastructure, and to prevent the recurrence of Incidents related to these errors. In order to achieve this goal, Problem Management seeks to identify the Root Cause of Incidents and then initiate actions to improve or correct the situation.

The Problem Management process has both reactive and proactive aspects. The reactive aspect is concerned with solving Problems in response to one or more Incidents. Proactive Problem Management is concerned with identifying and solving Problems and Known Errors before Incidents occur in the first place.

3.3 Process Scope

Problem Management will operate within the context of the process frameworks of the ICT Service Coordination Organization, and is based on the ITIL Best Practices frameworks. Therefore, Problem control, error control and proactive Problem Management are all within the scope of the Problem Management process.

In terms of formal ITIL definitions, a 'Problem' is defined as an unknown underlying cause of one or more Incidents, and a 'Known Error' is a Problem that is successfully diagnosed and for which a Work-around has been identified.

Inputs to the Problem Management process are:
- Incident details from Incident Management
- configuration details from the Configuration Management Database (CMDB)
- any defined Work-arounds (from Incident Management).

The major activities of Problem Management are:
1. Problem Control
2. Error Control
3. the proactive prevention of Problems
4. identifying trends
5. obtaining management information from Problem Management data
6. the completion of major Problem reviews.

Outputs of the process are:
- Known Errors and identified work-arounds/solutions
- a Request for Change (RFC)
- an updated Problem record (including a solution and/or any available Workarounds)
- for a resolved Problem, a closed Problem record
- response from Incident matching to Problems and Known Errors
- management information.

For the purposes of clarity, the following stages are in the scope of the Problem Management process within the ICT Service Coordination Organization:
- Problem Identification, Recording and Classification
- Problem Review
- Problem Investigation and Diagnosis
- Problem Resolution and Closure

- Known Error Identification and Recording
- Known Error Classification and Assessment
- Known Error Resolution and Closure
- Knowledge Identification and Recording
- Knowledge Validation and Publication

3.4 Out of Scope

All other Service Support and Service Delivery processes are outside the scope of this document, although other processes are referenced within it. Design of the detailed templates to support the process is also outside the scope of this document.

3.5 Benefits

The benefits of taking a formal approach to Problem Management include the following:
- Improved IT service quality: Problem Management helps generate a cycle of rapidly increasing IT service quality. High-quality, reliable service is good for the business users of IT, and good for the productivity and morale of the IT service providers.
- Incident volume reduction: Problem Management is instrumental in reducing the number of Incidents that interrupt the conduct of business.
- Permanent solutions: There will be a gradual reduction in the number and Impact of Problems and Known Errors, as those that are resolved stay resolved.
- Improved organisational learning: The Problem Management process is based on the concept of learning from past experience. The process provides the historical data to identify trends, and the means of preventing failures and of reducing the Impact of failures, resulting in improved User productivity.
- Better first-time fix rate at the Service Desk: Problem Management enables a better first-time fix rate of Incidents at the Service Desk, achieved via the capture, retention and availability of Incident resolution and Workaround data within a knowledge database available to the Service Desk at call logging.

In contrast, the costs of not implementing a Problem Management process may include:
- a purely reactive support organisation, facing up to Problems only when the service to Customers has already been disrupted
- an IT User organisation, confronted with recurring Incidents, losing faith in the quality of the IT support organisation
- an ineffective support organisation, with high costs and low employee motivation, since similar Incidents have to be resolved repeatedly and structural solutions are not provided.

3.6 Business Justification
Failure to implement the ICT Problem Management process carries the following risks:
- ICT Services are not designed to meet business requirements represented through the Problem Resolution Requirements
- Lack of adequate response to changing business requirements and misalignment with GoA business strategy
- Disconnect between the business functional and control requirements, and lack of efficient and effective process support
- Dissatisfaction with ICT Services
The ICT Problem Management Process will, over time, minimize or eliminate the above listed adverse effects on the GoA's business.

3.7 Policy Statement
The Policy Statement addresses the Business Reason for implementing the Problem Management process, and addresses the Sense of Urgency for the process implementation. The GoA is implementing the Problem Management Process to satisfy the following Business objectives:
- Respond to business requirements in alignment with the business strategy
- Maintain the integrity of information and processing infrastructure
- Demonstrate compliance with legal and regulatory requirements
- Establish a defined, measurable and manageable process in line with Industry Best Practice.

3.8 Reporting and Management Information
Reports will be produced under the authority of the Process Manager, who should draw up a schedule and distribution list in collaboration with the Process Owner. The content and the presentation of the management reports will be determined during the implementation of the Problem Management Process and will include appropriate consultation with related stakeholder groups.
Distribution list for these Reports includes: TBD
Monthly SLA Reports and Monthly Quality Assurance Reports will be available to Customers and Management via a Centralized Document Repository.
3.9 Best Practices
The process documentation provided within this document is based on Best Practices combined with the particulars of the GoA organization and its chosen toolset for use by the Service Desk, Incident Management and Problem Management teams. The Best Practices references for this process are as follows:
- ITIL Service Support Book: Problem Management
- CobiT DS10 Manage Problems
- ISO/IEC 20000 Problem Management Process

The following CobiT Controls have been integrated within the Problem Management Process:
- Identification and Classification of Problems
- Problem Tracking and Resolution
- Problem Closure
- Integration of Change, Configuration and Problem Management

This document also considers and addresses, where appropriate, specifications for the ISO/IEC 20000 certification related to the Problem Management process, specifically:
- All Problems SHALL be recorded.
- Procedures SHALL be adopted to identify, minimize or avoid the Impact of Incidents and Problems. They SHALL define the recording, classification, updating, escalation, resolution and closure of all Problems.
- Preventative action SHALL be taken to reduce potential Problems, e.g. following trend analysis of Incident volumes and types.
- Changes required in order to correct the underlying cause of Problems SHALL be passed to the Change Management process.
- Problem resolution SHALL be monitored, reviewed and reported on for effectiveness.
- Problem Management SHALL be responsible for ensuring up-to-date information on Known Errors and corrected Problems is available to Incident Management.
- Actions for improvement identified during this process SHALL be recorded and input into a plan for improving the service.

4. Process Governance
The expectation is that every service provider, vendor, and operational team will support and participate in the problem management process, including resolution of user-impacting problems, known error logging and knowledge management, and prevention of recurrence, with the goal of reducing customer impact in the environment. A secondary expectation is not to create additional layers of management but to ensure that problem investigations and known errors are resolved efficiently, with the correct processes to engage the executive for guidance and to report status and progress. Service providers are responsible for correcting both the technical cause of incidents and any operational issues identified. Service providers will allocate the appropriate resources to investigate and report on the status of issues and action items assigned for review at the weekly and monthly operational meetings. Service provider problem managers are accountable for meeting established performance levels.

4.1 Process Owner
The Process Owner is the person accountable for the coordination of various functions and work activities across all levels of a process, and for achieving the overall goal of the process. This person has the authority and ability to make changes in the process as required, and manages the entire process cycle to ensure end-to-end performance effectiveness. The Owner of this Problem Management process is:
Director of ICT Processes
ICT Service Delivery and Support
Service Alberta

4.2 Process Manager
The Process Manager is the person responsible for the execution and monitoring of repeatable business processes that have been defined by a set of formal procedures. The Manager of this Problem Management process is:
Problem Process Manager
ICT Service Delivery and Support
Service Alberta

4.3 Stakeholders
Various groups have a vested interest in the smooth operation of the process described within this document. These include:
- ICT Service Coordination Initiative: The ICT Service Coordination Initiative is responsible for monitoring the ICT infrastructure across the GoA.
- Internal/External service providers: Service providers are accountable to their signed SLAs; if an Incident or failure in the infrastructure prevents them from providing their support services, they could be held liable.
- Other process owners: These owners are held accountable for the performance of their processes (e.g. Release Management, Configuration Management, Problem Management, Service Level Management, etc.). If the hand-offs between processes are inefficient or ineffective, the performance of their processes may be impacted.
- Current Customers: Specifically, the Ministries within the GoA Domain who are included in the scope of this process.
- Future Customers: Specifically, the Ministries that will be integrated into the GoA domain in the future.
- Users: Individuals and groups in the GoA that use the services and systems.

5. Roles and Responsibilities
The following are the roles involved in the Problem Management process, with a description of their responsibilities and activities. The roles are described by function, not by the individual(s) who will fulfill the role. Some roles may be fulfilled by multiple individuals.

ICT Service Alberta Problem Process Manager
The ICT Service Alberta Problem Process Manager has the responsibility for all Problem Management activities and has the following specific responsibilities:
- developing and maintaining the Problem control process
- reviewing the efficiency and effectiveness of the Problem control process
- producing management information
- managing Problem support staff
- allocating resources for the support effort
- making recommendations to individual service provider Problem Managers for Problem investigations
- monitoring the effectiveness of error control and making recommendations for improving it
- developing and maintaining Problem and error control systems
- reviewing the efficiency and effectiveness of proactive Problem Management activities.
It is recommended that the Service Desk Manager and the Problem Process Manager roles are not combined because of the conflicting interests inherent in these roles.

Service Provider Problem Manager
The Service Provider Problem Manager has the responsibility for all Problem Management activities and has the following specific responsibilities:
- developing and maintaining the Problem control process
- reviewing the efficiency and effectiveness of the Problem control process
- producing management information
- managing service provider Problem support staff
- allocating resources for the support effort
- making recommendations to other Problem Managers for Problem investigations
- monitoring the effectiveness of error control and making recommendations for improving it
- developing and maintaining Problem and error control systems
- reviewing the efficiency and effectiveness of proactive Problem Management activities.
It is recommended that the Service Desk Manager and the Problem Manager roles are not combined because of the conflicting interests inherent in these roles.

Problem Analyst
Problem support has both reactive and proactive responsibilities, as follows:
Reactive responsibilities:
- identifying Problems (by analysing Incident data, for example)
- investigating Problems, according to Impact, through to resolution or error identification
- raising RFCs to clear errors
- monitoring progress on the resolution of Known Errors
- advising Incident Management staff on the best available Workarounds for Incidents related to unresolved Problems/Known Errors
- assisting with the handling of major Incidents and identifying the Root Causes.
Proactive responsibilities:
- identifying trends and potential Problem sources (by reviewing Incident and Problem analyses)
- raising RFCs to prevent the recurrence of Problems
- preventing the replication of Problems across multiple systems.
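The proactive responsibility of identifying trends from Incident data can be illustrated with a minimal sketch. The `Incident` record, its field names, the sample data, and the threshold of three Incidents are all assumptions made for illustration; this guide does not prescribe a data model or a threshold.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical, simplified Incident record (not the toolset's schema)."""
    incident_id: str
    ci: str        # Configuration Item the Incident was logged against
    category: str  # operational category assigned at logging


def recurring_incident_groups(incidents, threshold=3):
    """Return (ci, category) groups with at least `threshold` Incidents.

    Each returned group is a candidate for a proactive Problem investigation.
    The threshold is an assumed value, not one defined by the process.
    """
    counts = Counter((i.ci, i.category) for i in incidents)
    return {group: n for group, n in counts.items() if n >= threshold}


incidents = [
    Incident("INC-1", "email-server", "outage"),
    Incident("INC-2", "email-server", "outage"),
    Incident("INC-3", "email-server", "outage"),
    Incident("INC-4", "vpn-gateway", "slow-response"),
]
candidates = recurring_incident_groups(incidents)
print(candidates)
```

In this sample, only the email server outages reach the assumed threshold, so only that group would be put forward to a Problem Manager for review.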

6. Problem Management Process
The Problem Management process can be described as having three primary interlocking sub-processes, the activities for which can be broken out into seven main Stages directly associated with Problem Management. Two additional Stages, bringing the total to nine, deal with the recording and documentation of knowledge. The ITIL Version 2 Service Support book discusses the three sub-processes:

Problem Control: The Problem control process is concerned with handling Problems in an efficient and effective way. The aim of Problem control is to identify the Root Cause, such as the CIs that are at fault, and to provide the Service Desk with information and advice on Work-arounds when available. The process of Problem control is very similar to, and highly dependent on, the quality of the Incident control process. Incident control focuses on resolving Incidents and on providing Work-arounds and temporary fixes for specific Incidents. If a Problem is identified for an Incident or a group of Incidents, available Work-arounds and temporary fixes are recorded in the Problem record by the Problem control process. Problem control also advises on the best Work-around available for the Problem. Because Problem control is concerned with preventing the recurrence of Incidents, the process should be subject to an approach that is carefully managed and planned. The degree of management and planning required is greater than that needed for Incident control, where the objective is restoration of normal service as quickly as possible. Priority should be given to the resolution of Problems that can cause serious business disruption.

Error Control: Error control covers the processes involved in progressing Known Errors until they are eliminated by the successful implementation of a Change under the control of the Change Management process. The objective of error control is to be aware of errors, to monitor them and to eliminate them when feasible and cost-justifiable.

Proactive Problem Management: Proactive Problem Management covers the activities aimed at identifying and resolving Problems before Incidents occur.

As previously mentioned, for the purposes of the ICT Service Consolidation Initiative and alignment with the chosen toolset, the Problem Management Process is broken out into nine Stages. The relationships and flow are illustrated in Figure 2.

Stage 1. Problem identification, recording, and classification
Either the Problem Manager or the assigned Problem analyst identifies, records, and classifies the Problem. Incident Management and reports from the other ITSM processes can be used to initiate Problem investigations or to facilitate proactive Problem management. In this stage, the analyst (Service Desk support technician, Incident Manager, Service Provider support analyst or subject matter expert) or a Problem Manager initiates the Problem investigation. The analyst identifies the Problem, records details, classifies it, and assigns it to the Problem Manager for review. The process starts when the analyst initiates a Problem investigation. A Problem investigation is typically initiated based on information from Incident Management, or when no known cause has yet been documented for a common Incident type. A support analyst working on an Incident can create a Problem investigation from the Incident to determine the Root Cause.

Stage 2. Review
If the Problem investigation is recorded by the Problem analyst, it is reviewed by a Problem Manager. The Problem Manager evaluates whether to proceed with the Problem investigation.

Stage 3. Problem investigation and diagnosis
The Problem analyst investigates the Problem and determines the diagnosis.

Stage 4. Problem resolution and closure
The Problem analyst or the Problem Manager resolves and closes the Problem investigation. The Problem is complete when a Known Error is identified or a solution is determined. The Knowledge Base must be updated with solutions and Workarounds. When the Problem is complete, notification is sent to Incident Management and the Service Desk Team, so that related open Incidents might be resolved.

Stage 5. Known Error identification and recording
If the Problem investigation resulted in a Known Error, the Problem analyst identifies and records the Known Error.

Stage 6. Known Error classification and assessment
Either the Problem analyst or Problem Manager classifies and assesses the Known Error.

Stage 7. Known Error resolution and closure
Either the Problem analyst or Problem Manager resolves and closes the Known Error. Resolving the Known Error involves the Change Management process. After the change is implemented, the Known Error can be closed. When the Known Error is closed, notification is sent to Incident Management and the Service Desk Team, so that related open Incidents might be resolved.

The processes used to maintain the knowledge gathered and used by the Problem Management team require further definition.

Stage 8. Knowledge identification and recording
The Problem analyst identifies and records knowledge generated from the process.

Stage 9. Knowledge validation and publication
The Problem analyst or Problem Manager validates and publishes the recorded knowledge.
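The nine Stages and the roles named for each can be captured as a small lookup, sketched below. The enum member names, the role constants, and the `stages_for` helper are illustrative assumptions; the role assignments follow the Stage descriptions above, simplified to the two Problem Management roles (Stage 1 may also be initiated by other support analysts).

```python
from enum import IntEnum


class Stage(IntEnum):
    """The nine Stages of the process; member names are illustrative."""
    PROBLEM_IDENTIFICATION = 1
    REVIEW = 2
    PROBLEM_INVESTIGATION = 3
    PROBLEM_RESOLUTION = 4
    KNOWN_ERROR_IDENTIFICATION = 5
    KNOWN_ERROR_CLASSIFICATION = 6
    KNOWN_ERROR_RESOLUTION = 7
    KNOWLEDGE_IDENTIFICATION = 8
    KNOWLEDGE_VALIDATION = 9


MANAGER = "Problem Manager"
ANALYST = "Problem Analyst"

# Roles that perform each Stage, as read from the Stage descriptions.
STAGE_ROLES = {
    Stage.PROBLEM_IDENTIFICATION: {MANAGER, ANALYST},
    Stage.REVIEW: {MANAGER},
    Stage.PROBLEM_INVESTIGATION: {ANALYST},
    Stage.PROBLEM_RESOLUTION: {MANAGER, ANALYST},
    Stage.KNOWN_ERROR_IDENTIFICATION: {ANALYST},
    Stage.KNOWN_ERROR_CLASSIFICATION: {MANAGER, ANALYST},
    Stage.KNOWN_ERROR_RESOLUTION: {MANAGER, ANALYST},
    Stage.KNOWLEDGE_IDENTIFICATION: {ANALYST},
    Stage.KNOWLEDGE_VALIDATION: {MANAGER, ANALYST},
}


def stages_for(role):
    """List the Stages, in order, in which the given role may act."""
    return [stage for stage in Stage if role in STAGE_ROLES[stage]]


print([int(s) for s in stages_for(MANAGER)])
print([int(s) for s in stages_for(ANALYST)])
```

A lookup of this kind could be used, for example, to check that a ticket update in a given Stage comes from a role the process allows.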

[Figure 2: Process Overview diagram. ITSM Problem Management Process Overview, with swimlanes for Requester, Problem Manager, Problem Analyst, and Other Service Support / Delivery Processes (Incident Management, Configuration Management, Change Management). The flow runs from "Investigation Initiated" through Stages 1 to 9: Problem Identification, Recording and Classification; Review; Problem Investigation and Diagnosis; Problem Resolution and Closure; Known Error Identification and Recording; Known Error Classification and Assessment; Known Error Resolution and Closure; Knowledge Identification and Recording; Knowledge Validation and Publication.]

PRB.1: Problem Identification, Recording and Classification
In this stage, the support technician (Incident Management or Service Desk), Problem Manager or Problem analyst initiates a Problem investigation. The Problem analyst identifies the Problem, records details, classifies it, and assigns it to the Problem Manager for review. The process starts when the Problem analyst initiates a Problem investigation. A Problem investigation is typically initiated based on information from Incident Management. A support analyst working on an Incident can create a Problem investigation from the Incident to determine the Root Cause. In the case of proactive Problem management, a Problem Manager might initiate a Problem investigation if there is a pattern of Incidents that indicates a potential Problem. If the Problem Manager initiated the investigation, then the Problem Manager will take ownership of the Problem investigation ticket.

[Figure 3: Problem Identification, Recording and Classification diagram. Swimlanes: Support Requester, Problem Manager, Problem Analyst, and Other Service Support / Delivery Processes (Incident Management, Configuration Management). Activities: PRB.1.1 Record Problem Details; PRB.1.2 Associate Relevant Incident(s); PRB.1.3 Associate Relevant CIs; PRB.1.4 Classify Problem; PRB.1.5 Establish Impact, Urgency and Priority; PRB.1.6 Assign Problem to Problem Manager; then on to PRB.2.1.]

Table 1 Activities for PRB.1: Problem Identification, Recording and Classification

PRB.1.1 Record Problem Details
Inputs: Incident Records, trend analysis, observation
Description: The Problem Analyst or Problem Manager records the details of the Problem. If the Problem Manager initiated the investigation, it might be assigned to a Problem Analyst to continue the work for this stage.
Outputs: Problem Ticket

PRB.1.2 Associate Relevant Incident(s)
Inputs: Problem Ticket, search results
Description: The Problem Analyst or Problem Manager relates relevant Incidents to the Problem investigation. If an Incident Analyst created the Problem investigation ticket from an Incident (or series of Incidents), the toolset automatically relates the investigation to the Incident. If appropriate, the Analyst can relate the investigation to multiple Incidents. If acting proactively, the Problem Manager could create the Problem investigation before any Incidents are reported.
Outputs: Updated Ticket

PRB.1.3 Associate Relevant CIs
Inputs: Problem Ticket, search results
Description: The Problem Analyst or Problem Manager relates relevant CIs to the Problem investigation. For example, if there is a Problem with email, the Problem Analyst might relate the Problem investigation to an email service CI, and possibly to the email server CI. This activity is dependent on the toolset being properly configured.
Outputs: Updated Ticket

PRB.1.4 Classify Problem
Inputs: analysis
Description: The Problem Analyst or Problem Manager classifies the Problem investigation. The analyst specifies the appropriate technology and operational categories for the Problem investigation.
Outputs: Updated Ticket

PRB.1.5 Establish Impact, Urgency and Priority
Inputs: analysis
Description: The Problem Analyst or Problem Manager records the Impact and Urgency of the investigation. The Impact and Urgency of the investigation determine the Priority.
Outputs: Updated Ticket

PRB.1.6 Assign Problem to Problem Manager
Inputs: Assignment
Description: The Problem Analyst assigns the investigation to a Problem Manager for review. The toolset notifies the assigned Problem Manager that a Problem investigation has been assigned. If the Problem Manager initiated the investigation, it might not be reassigned.
Outputs: Notification
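Activity PRB.1.5 states that Impact and Urgency determine the Priority but does not give the mapping. The sketch below shows one common way such a mapping is built. The three-level scales, the `LEVELS` table, and the additive rule are assumptions for illustration only; the actual matrix would be defined by the toolset configuration.

```python
# Assumed three-level scales; 1 is the most severe level.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact, urgency):
    """Derive a Priority from 1 (highest) to 5 (lowest).

    The additive rule is a hypothetical example of an Impact/Urgency
    matrix, not the mapping used by this process.
    """
    score = LEVELS[impact] + LEVELS[urgency]  # ranges from 2 to 6
    return score - 1                          # shift to range 1 to 5


print(priority("high", "high"))
print(priority("high", "low"))
print(priority("low", "low"))
```

Whatever mapping is chosen, deriving the Priority from the two recorded values, rather than entering it by hand, keeps the ticket consistent when Impact or Urgency is later revised.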

Table 2 RACI Matrix for PRB.1: Problem Identification, Recording and Classification
Roles: Problem Manager, Problem Analyst, Incident Analyst, Support Requestor, Authorized User
Activity
PRB.1.1 Record Problem Details I A/R C I
PRB.1.2 Associate Relevant Incident(s)
PRB.1.3 Associate Relevant CIs
PRB.1.4 Classify Problem A/R A/R A/R
PRB.1.5 Establish Impact, Urgency and Priority A R
PRB.1.6 Assign Problem to Problem Manager I A/R I I

Table 3 Risk and Control Matrix for PRB.1: Problem Identification, Recording and Classification
Assertions: Completeness, Accuracy, Validity, Authorization, Safeguarding, Presentation

Control Ref.: 20000: 8.3.10
Control Objective: Preventative action and investigation are undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, Change management, assets and configuration and historical information.
Process Activity: 1.1 Record Problem Details
Assertions: x x x
Stated Evidence: Automatic Record: system reports. Manual Records: trend analysis, process reports

Control Ref.: DS10.1, 20000: 8.3.2
Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.
Process Activity: 1.1 Record Problem Details. Assertions: x x. Stated Evidence: Automatic Record: Problem ticket updated
Process Activity: 1.4 Classify Problem. Assertions: x x. Stated Evidence: Automatic Record: Problem ticket updated
Process Activity: 1.5 Establish Impact, Urgency and Priority. Assertions: x x. Stated Evidence: Automatic Record: Problem ticket updated

Control Ref.: DS10.2, 20000: 8.3.3, 8.3.6
Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.
Process Activity: 1.2 Associate Relevant Incident(s); 1.3 Associate Relevant CIs
Assertions: x x
Stated Evidence: Automatic Record: Problem ticket updated (for each activity)

Control Ref.: DS10.3
Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.

Control Ref.: DS10.4, 20000:8.3.4
Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. This means using the processes to support Problem investigations.
Process Activity: 1.2 Associate Relevant Incident(s); 1.3 Associate Relevant CIs
Assertions: x x
Stated Evidence: Automatic Record: Problem ticket updated (for each activity)

Control Ref.: 20000:8.3.8
Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager, and other supporting process Managers.
Process Activity: 1.6 Assign Problem to Problem Manager
Assertions: x
Stated Evidence: Automatic Record: Problem ticket updated

Control Ref.: 20000:8.3.5
Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating Support Analysts on Known Error status, Workarounds and solutions.

PRB.2: Problem Review
The Problem Manager reviews the Problem entry and assigns it to the appropriate analyst(s).

[Figure 4: Problem Review diagram. Swimlanes: Requester, Problem Manager, Problem Analyst, and Other Service Support / Delivery Processes. Flow, entered from PRB.1.6: PRB.2.1 Perform Business Impact Analysis; PRB.2.2 Proceed with Investigation? (No: cancel the Problem investigation, status CANCELLED); PRB.2.3 Assign Problem for Root Cause determination, with notification to the Problem Analyst; PRB.2.4 Accept assignment? (Yes: status UNDER INVESTIGATION, continue to PRB.3.1; No: PRB.2.5 Re-classify if required and reassign, with notification to the Problem Manager).]

PRB.2: Problem Review

PRB.2.1 Perform Business Impact Analysis
Inputs: Business Impact metrics
Description: The Problem Manager analyzes the Impact of the Problem on the business using a Business Impact Analysis tool. This analysis includes the cost of allowing the Problem to continue, and the cost of investigating the Problem.
Outputs: Business Impact Analysis Statement

PRB.2.2 Proceed with Investigation?
Inputs: Business Impact Analysis Statement
Description: The Problem Manager decides whether to proceed with the investigation. If the decision is taken not to proceed with the investigation, the Problem Manager cancels the investigation.
Outputs: Decision: proceed or not

PRB.2.3 Assign Problem for Root Cause Determination
Inputs: Decision: Proceed; Staff skillsets
Description: The Problem Manager assigns the Problem investigation to a Problem Analyst. The Problem Analyst is responsible for determining the Root Cause of the Problem. The toolset notifies the Problem Analyst of the assigned Problem investigation. The Problem Manager will also track and verify the number of times the Problem ticket has been reassigned.
Outputs: Assignment; Notification to Analyst

PRB.2.4 Accept Assignment?
Inputs: Notification
Description: The Problem Analyst determines whether to accept the assignment. The decision will normally be based on skillset. The Problem Manager should be aware of utilization constraints. If the analyst accepts the assignment, the investigation moves to the Problem investigation and diagnosis stage.
Outputs: Decision: accept or reject

PRB.2.5 Reclassify, if Required, and Reassign
Inputs: Decision: Rejection
Description: If the analyst does not accept the assignment, the analyst performs the following steps. If required, the analyst reclassifies the Problem investigation to better reflect the skill set required for the investigation. The analyst reassigns the investigation back to the Problem Manager. The toolset notifies the Problem Manager that the investigation has been reassigned.
Outputs: Updated Ticket; Notification to Problem Manager
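The review decision (PRB.2.1 and PRB.2.2) and the reassignment tracking mentioned in PRB.2.3 can be sketched as follows. The `ProblemTicket` class, its fields, the initial status value, and the simple cost comparison are assumptions for illustration; the guide says only that the analysis includes both costs, not how the Problem Manager weighs them. The status values UNDER INVESTIGATION and CANCELLED are taken from the Problem Review diagram.

```python
from dataclasses import dataclass


@dataclass
class ProblemTicket:
    """Hypothetical, simplified Problem investigation ticket."""
    ticket_id: str
    status: str = "NEW"     # assumed initial status, not from the guide
    reassignments: int = 0  # tracked by the Problem Manager (PRB.2.3)


def review(ticket, cost_of_continuing, cost_of_investigating):
    """PRB.2.2 sketch: proceed only if the Problem costs more than the investigation.

    The comparison rule is an assumption; in practice this is a judgment
    made by the Problem Manager from the Business Impact Analysis Statement.
    """
    if cost_of_continuing > cost_of_investigating:
        ticket.status = "UNDER INVESTIGATION"
    else:
        ticket.status = "CANCELLED"
    return ticket.status


def reject_assignment(ticket):
    """PRB.2.5 sketch: the analyst hands the ticket back; count the reassignment."""
    ticket.reassignments += 1
    return ticket.reassignments


ticket = ProblemTicket("PRB-0001")
print(review(ticket, cost_of_continuing=5000, cost_of_investigating=1200))
print(reject_assignment(ticket))
```

A rising reassignment count on a single ticket is a signal that the classification or the available skill sets need attention, which is why the Problem Manager tracks it.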

PRB.2: Problem Review RACI Matrix
Roles: Problem Manager, Problem Analyst, Incident Analyst, Support Requestor, Authorized User
Activity
PRB.2.1 Perform Business Impact Analysis
PRB.2.2 Proceed with Investigation?
PRB.2.3 Assign Problem for Root Cause Determination A/R A/R A/R C I
PRB.2.4 Accept Assignment? A R
PRB.2.5 Reclassify, if Required, and Reassign A/C R

PRB.2: Problem Review Risk and Control Matrix
Assertions: Completeness, Accuracy, Validity, Authorization, Safeguarding, Presentation

Control Ref.: 20000:8.3.10
Control Objective: Preventative action and investigation are undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, change management, assets and configuration and historical information.

Control Ref.: DS10.1, 20000:8.3.2
Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.
Process Activity: PRB.2.5 Reclassify, if Required, and Reassign
Stated Evidence: Automated Record: updated ticket

Control Ref.: DS10.2, 20000:8.3.3, 8.3.6
Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.
Process Activity: PRB.2.3 Assign Problem for Root Cause Determination
Stated Evidence: Automated Record: updated ticket

Control Ref.: DS10.3
Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.

Control Ref.: DS10.4, 20000:8.3.4
Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. This means using the processes to support Problem investigations.

Control Ref.: 20000:8.3.8
Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager, and other supporting process Managers.
Process Activity: PRB.2.1 Business Impact Analysis. Stated Evidence: Manual Record: review of existing docs
Process Activity: PRB.2.2 Proceed with Investigation? Stated Evidence: Automated Record: updated ticket
Process Activity: PRB.2.5 Reclassify, if Required, and Reassign. Stated Evidence: Automated Record: updated ticket

Control Ref.: 20000:8.3.5
Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating support analysts on Known Error status, Workarounds and solutions.

PRB.3: Problem Investigation and Diagnosis

[Figure 5: Problem Investigation and Diagnosis diagram (Part 1). Swimlanes: Requester, Problem Manager, Problem Analyst, and Other Service Support / Delivery Processes (Configuration Management, Change Management). Entered from PRB.2.4 with status UNDER INVESTIGATION. Activities: PRB.3.1 Initial Investigation (includes internal and external data sources, web, etc.); PRB.3.2 Assistance Needed?; PRB.3.3 Generate Appropriate Tasks; PRB.3.4 Close Task Activities; PRB.3.5 Determine if Change is Cause; PRB.3.6 Relate Change to Problem; PRB.3.7 Match to Existing Known Errors; PRB.3.8 Associate Known Error to Problem; PRB.3.9 Root Cause Identified?; PRB.3.10 Escalate to Problem Manager; PRB.3.11 Continue Investigation?; with exits to PRB.3.12, PRB.2.3 and PRB.4.1 (COMPLETED).]

[Figure 6: Problem Investigation and Diagnosis diagram, continued (Part 2). Swimlanes: Requester, Problem Manager, Problem Analyst, and Service Support, Service Delivery and Other Related Processes (Incident Management, Knowledge Management). Entered from PRB.3.9. Activities: PRB.3.12 Document Root Cause; PRB.3.13 Has a solution or workaround been identified? (No: PRB.3.10); PRB.3.14 Document Solution or Workaround; PRB.3.15 Complete Problem Investigation (status COMPLETED, with notification to Analysts of open associated Incidents); PRB.3.16 Known Error Created? (Yes: PRB.5.1); PRB.3.17 Knowledge Entry Required? (No: PRB.4.1); PRB.3.18 Create Knowledge Entry.]

Table 4 Activities for PRB.3: Problem Investigation and Diagnosis

PRB.3.1 Initial Investigation
Inputs: Incident tickets, worklogs
Description: The Problem Analyst starts to investigate the Problem. The analyst looks at both internal and external sources of data, such as the Web. The analyst also looks at CIs in the CMDB.
Outputs: Investigation parameters

PRB.3.2 Assistance Needed?
Inputs: Investigation parameters
Description: The Problem Analyst determines whether assistance is needed.
Outputs: Decision: Help Needed or Not

PRB.3.3 Generate Appropriate Tasks
Inputs: Investigation parameters
Description: The Problem Analyst generates tasks, each with a description of what needs to be accomplished, and assigns them as appropriate.
Outputs: Tasks

PRB.3.4 Close Task Activities
Inputs: Task Assignment
Description: The task analysts complete their tasks.
Outputs: Closed Task

PRB.3.5 Determine if Change is Cause
Inputs: Worklogs/results from all closed tasks; FSC; Change Records
Description: If there have been recent Changes to the infrastructure or technology being investigated, the Problem Analyst determines whether a Change caused the Problem.
Outputs: Change Identified

PRB.3.6 Relate Change to the Problem
Inputs: Identified Change
Description: If a Change caused the Problem, the Problem Analyst relates the Change to the Problem. The Problem Analyst has found the Root Cause.
Outputs: Updated ticket reflecting relationship

PRB.3.7 Match to Existing Known Errors
Inputs: Investigation Results
Description: The Problem Analyst checks whether the Problem matches any existing Known Errors.
Outputs: Search Results

PRB.3.8 Relate to Known Error
Inputs: Identified Matches
Description: If the Problem matches a Known Error, the Problem Analyst relates the Known Error to the Problem investigation. The Known Error indicates the Root Cause, and should also have a Workaround/solution.
Outputs: Updated ticket reflecting relationship

PRB.3.9 Root Cause Identified
Inputs: Investigation results; Matching activities results
Description: The Problem Analyst determines whether the Root Cause has been identified, either by the investigation procedures used in activities PRB.3.1 through PRB.3.4, or through the matching activities.
Outputs: Root Cause Identification

PRB.3.10 Escalate
Inputs: Root Cause Identification Status
Description: If the Root Cause has not been identified, the Problem Analyst escalates the investigation to a Problem Manager.
Outputs: Notification of escalation; Updated Ticket

PRB.3.11 Continue with Investigation?
Inputs: Worklogs associated with the ticket
Description: The Problem Manager determines whether to continue the investigation. If the investigation is not being continued, the Problem Manager cancels or closes the investigation. Details of the reason for the cancellation and all related documents are added to the Problem investigation record.
Outputs: Updated Ticket; Closed Investigation

PRB.3.12 Document Root Cause
Inputs: Investigation Results
Description: The Problem Analyst documents the Root Cause of the Problem.
Outputs: Updated ticket; Updated Known Error database

PRB.3.13 Workaround Identified?
Inputs: Investigation Results
Description: The Problem Analyst determines whether a Workaround has been identified. If a Workaround has not been identified, the process returns the investigation results to the Problem Manager to evaluate.
Outputs: Evaluation

PRB.3.14 Document Workaround
Inputs: Investigation Results
Description: The Problem Analyst documents the Workaround.
Outputs: Updated ticket; Updated Known Error database

PRB.3.15 Complete Problem Investigation
Inputs: Updated ticket
Description: The Problem Analyst completes the Problem investigation. The toolset notifies Incident analysts assigned to related open Incidents that the Problem investigation is complete. This is intended to help the Incident analysts understand which Stage the investigation has reached, and to provide enough information to resolve the Incidents without requiring excessive intervention on the part of the Problem Analyst.
Outputs: Updated ticket; Notifications to other analysts as required

PRB.3.16 Known Error Created?
Inputs: Investigation Results
Description: The Problem Analyst decides whether or not a Known Error has been created.
Outputs: Decision

PRB.3.17 Knowledge Entry Required?
Inputs: Investigation Results
Description: The Problem Analyst determines whether a Knowledge entry is required.
Outputs: Decision

PRB.3.18 Create Knowledge Entry
Inputs: Investigation Results
Description: If the Knowledge Entry is required, the Problem Analyst creates it.
Outputs: Knowledge Entry

The Problem has now become a Known Error as defined by ITIL (Root Cause identified and a Workaround/solution available).
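The decision points in PRB.3.9 through PRB.3.13 can be read as a small piece of branching logic. The following is a minimal sketch only, not part of the toolset: the class name, field names and the returned labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class InvestigationResult:
    """Hypothetical summary of one Problem investigation."""
    root_cause_identified: bool          # decision in PRB.3.9
    workaround_identified: bool = False  # decision in PRB.3.13
    manager_continues: bool = True       # decision in PRB.3.11, used only after escalation


def investigation_outcome(result: InvestigationResult) -> str:
    """Return an illustrative label for where the investigation goes next."""
    if not result.root_cause_identified:
        # PRB.3.10: escalate to the Problem Manager, who decides in PRB.3.11.
        if result.manager_continues:
            return "continue investigation"
        return "investigation cancelled or closed"
    if not result.workaround_identified:
        # PRB.3.13: results are returned to the Problem Manager to evaluate.
        return "evaluation by Problem Manager"
    # Root Cause identified and a Workaround/solution available.
    return "known error"
```

Only the final branch corresponds to a Known Error in the ITIL sense described above; every other branch leaves the record as an open or cancelled Problem investigation.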

Table 5 RACI Matrix for PRB.3: Problem Investigation and Diagnosis
Roles (columns): Problem Manager, Problem Analyst, Incident Analyst, Support Requestor, Authorized User

PRB.3.1 Initial Investigation: A/R
PRB.3.2 Assistance Needed?: A, R
PRB.3.3 Generate Appropriate Tasks: A/R
PRB.3.4 Close Task Activities: A/R
PRB.3.5 Determine if Change is Cause: A/R
PRB.3.6 Relate Change to the Problem: A/R
PRB.3.7 Match to Existing Known Errors: A/R
PRB.3.8 Relate to Known Error: A/R
PRB.3.9 Root Cause Identified: A/R
PRB.3.10 Escalate: C/I, A/R, I
PRB.3.11 Continue with Investigation?: A/R, C/I
PRB.3.12 Document Root Cause: A/R
PRB.3.13 Workaround Identified?: A/R
PRB.3.14 Document Workaround: A/R
PRB.3.15 Complete Problem Investigation: A, R, I, I
PRB.3.16 Known Error Created?: A/R
PRB.3.17 Knowledge Entry Required?: A/R
PRB.3.18 Create Knowledge Entry: A/R

Table 6 Risk and Control Matrix for PRB.3: Problem Investigation and Diagnosis
Assertions (columns): Completeness, Accuracy, Validity, Authorization, Safeguarding, Presentation

Control Ref.: 20000:8.3.10
Control Objective: Preventative action and investigation are undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, change management, assets and configuration, and historical information.

Control Ref.: DS10.1, 20000:8.3.2
Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.

Control Ref.: DS10.2, 20000:8.3.3, 8.3.6
Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.
Process Activity: PRB.3.12
Stated Evidence: Automated Records: update worklog in ticket. Manual Records: updated databases external to tool.

Control Ref.: DS10.3
Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.

Control Ref.: DS10.4, 20000:8.3.4
Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. This means using these processes to support Problem investigations.
Process Activity: PRB.3.5 Determine if Change is Cause
Stated Evidence: Automated Records: Change records. Manual Records: Forward Schedule of Change.

Control Ref.: 20000:8.3.8
Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager and other supporting process Managers.

Control Ref.: 20000:8.3.5
Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating support analysts on Known Error status, Workarounds and solutions.
Process Activity: PRB.3.12 Document Root Cause (assertion mark: X)
Stated Evidence: Automated Records: update worklog in ticket. Manual Records: updated databases external to tool.
Process Activity: PRB.3.14 Document Workaround
Stated Evidence: Automated Records: update worklog in ticket. Manual Records: updated databases external to tool.
Process Activity: PRB.3.15 Complete Problem Investigation
Stated Evidence: Automated Records: Updated Ticket.

PRB.4: Problem Resolution and Closure

In the Problem resolution and closure stage, the Problem Manager reviews the Problem investigation and closes it. This stage occurs if a Problem Investigation is marked as completed, but no Known Error with Workaround or solution is discovered. In effect, the Problem Manager has stopped further work on the Problem, or cancelled further work. If a Known Error is created from the Problem investigation, this stage occurs when the Known Error is closed.

Figure 7: Problem Resolution and Closure diagram (PRB.4: ITSM Problem Resolution and Closure)
Swimlanes: Problem Manager, Problem Analyst, Requester; other Service Support / Delivery processes (Incident Management).
Steps shown: entry from PRB.3.14; PRB.4.1 Review and validate details; PRB.4.2 Complete Problem Control Investigation; Close Problem Investigation (status CLOSED); Notify Analyst(s) of Related Incidents.

Table 7 Activities for PRB.4: Problem Resolution and Closure

PRB.4.1 Review and Validate Details
Inputs: Problem Tickets; Additional Documentation supplied by the Analyst
Description: The Problem Manager reviews and validates the details of the Problem investigation.
Outputs: Validation

PRB.4.2 Complete Problem Control Investigation
Inputs: Validation and Review
Description: The Problem Manager then closes the Problem investigation. The toolset notifies Incident and Problem Analysts assigned to related open Incidents. The analysts might be able to resolve the Incidents.
Outputs: Closed Ticket; Notifications to Analysts

Table 8 RACI Matrix for PRB.4: Problem Resolution and Closure
Roles (columns): Problem Manager, Problem Analyst, Incident Analyst, Support Requestor, Authorized User

PRB.4.1 Review and Validate Details: A/R, C
PRB.4.2 Complete Problem Control Investigation: A/R, C, I, I

Table 9 Risk and Control Matrix for PRB.4: Problem Resolution and Closure
Assertions (columns): Completeness, Accuracy, Validity, Authorization, Safeguarding, Presentation

Control Ref.: 20000:8.3.10
Control Objective: Preventative action and investigation are undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, change management, assets and configuration, and historical information.

Control Ref.: DS10.1, 20000:8.3.2
Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.

Control Ref.: DS10.2, 20000:8.3.3, 8.3.6
Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.

Control Ref.: DS10.3
Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.
Process Activity: PRB.4.2 Close the Investigation
Stated Evidence: Automatic Record: updated ticket.

Control Ref.: DS10.4, 20000:8.3.4
Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. This means using these processes to support Problem investigations.

Control Ref.: 20000:8.3.8
Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager and other supporting process Managers.
Process Activity: PRB.4.1 Review and Validate Details
Stated Evidence: Automated Record: updated ticket.

Control Ref.: 20000:8.3.5
Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating support analysts on Known Error status, Workarounds and solutions.
Process Activity: PRB.4.2 Complete Problem Control Investigation
Stated Evidence: Automated Record: updated ticket and notification to Analysts.

PRB.5: Known Error Identification and Recording

In this stage, the Problem Analyst records the details about the Known Error. A Known Error can be initiated from a Problem Investigation. In addition, other parts of the IT organization can request that a Known Error be created without a full Problem Investigation. For instance, when the Root Cause of a Problem has already been determined by the Requestor who is asking for help with the final solution, that Requestor can submit all of the pertinent information to Problem Management for verification and subsequent inclusion in the Known Error database.

The Release Management process can identify Known Errors in software/hardware CIs about to be introduced into the infrastructure. This information is then passed to the Problem Management team for verification and subsequent inclusion in the Known Error database.

A big part of Problem Management is ensuring that the information generated by investigation is available to the rest of the ITSM and support teams through the processes that guide them. Problem Management therefore places a strong emphasis on recordkeeping and on establishing relationships among Incidents, Problems, Workarounds, Solutions and CIs.

Figure 8: Known Error Identification and Recording diagram (PRB.5: ITSM Known Error Identification and Recording)
Swimlanes: Support Requester, Problem Manager, Problem Analyst; other Service Support / Delivery processes (Known Error Database, Incident Mgt., CMDB).
Entry points: Known Error Initiated by a Support Requester; PRB.3.15. A Known Error can be created without a Problem Investigation when the Root Cause has already been determined (e.g. errors discovered in Development).
Steps shown: PRB.5.1 Record Known Error Details; PRB.5.2 Associate Problem Investigation to Known Error; PRB.5.3 Associate Related Incident(s) to Known Error; PRB.5.4 Associate Related CIs to Known Error; exit to PRB.6.1.

Table 10 Activities for PRB.5: Known Error Identification and Recording

PRB.5.1 Record Known Error Details
Inputs: Results of Investigations; Data from Requestors
Description: The Problem Analyst records the Known Error details. In the event that the Known Error is submitted by another team working in the IT environment, the Problem Management team will verify and then record the details.
Outputs: Updated Known Error database record

PRB.5.2 Relate Problem Investigation to Known Error
Inputs: Known Error data; Problem investigation ticket
Description: If the Known Error was initiated from a Problem investigation, the Problem Analyst relates the investigation to the Known Error.
Outputs: Relationship specified

PRB.5.3 Relate Relevant Incidents to Known Error
Inputs: Known Error data; Problem investigation ticket
Description: If there are relevant Incidents, the Problem Analyst relates them to the Known Error.
Outputs: Relationships specified

PRB.5.4 Relate Relevant CIs to Known Error
Inputs: Known Error data; Problem investigation ticket
Description: The Problem Analyst relates the relevant CIs to the Known Error.
Outputs: Relationships specified
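The four recording activities amount to building one record and attaching up to three kinds of relationship to it. The sketch below illustrates that shape under stated assumptions: the class, the field names and the identifier format are hypothetical and do not describe the actual Known Error database schema.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class KnownError:
    """Hypothetical Known Error record; field names are illustrative only."""
    error_id: str
    details: str                                            # PRB.5.1
    problem_id: Optional[str] = None                        # PRB.5.2
    incident_ids: List[str] = field(default_factory=list)   # PRB.5.3
    ci_ids: List[str] = field(default_factory=list)         # PRB.5.4


def record_known_error(
    error_id: str,
    details: str,
    problem_id: Optional[str] = None,
    incident_ids: Iterable[str] = (),
    ci_ids: Iterable[str] = (),
) -> KnownError:
    """Record the details, then relate the investigation, Incidents and CIs."""
    known_error = KnownError(error_id, details)
    # PRB.5.2 applies only when the Known Error came from a Problem investigation.
    if problem_id is not None:
        known_error.problem_id = problem_id
    known_error.incident_ids.extend(incident_ids)
    known_error.ci_ids.extend(ci_ids)
    return known_error
```

A Known Error submitted by a Requestor without a full investigation would simply leave `problem_id` unset.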

Table 11 RACI Matrix for PRB.5: Known Error Identification and Recording
Roles (columns): Problem Manager, Problem Analyst, Incident Analyst, Support Requestor, Authorized User

PRB.5.1 Record Known Error Details: A/R, I
PRB.5.2 Relate Problem Investigation to Known Error: A/R
PRB.5.3 Relate Relevant Incidents to Known Error: A/R, I
PRB.5.4 Relate Relevant CIs to Known Error: A/R, I, I

Table 12 Risk and Control Matrix for PRB.5: Known Error Identification and Recording
Assertions (columns): Completeness, Accuracy, Validity, Authorization, Safeguarding, Presentation

Control Ref.: 20000:8.3.10
Control Objective: Preventative action and investigation are undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, change management, assets and configuration, and historical information.

Control Ref.: DS10.1, 20000:8.3.2
Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.

Control Ref.: DS10.2, 20000:8.3.3, 8.3.6
Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.

Control Ref.: DS10.3
Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.

Control Ref.: DS10.4, 20000:8.3.4
Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. This means using these processes to support Problem investigations.

Control Ref.: 20000:8.3.8
Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager and other supporting process Managers.

Control Ref.: 20000:8.3.5
Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating support analysts on Known Error status, Workarounds and solutions.
Process Activity: PRB.5.2 Relate Problem Investigation to Known Error (assertion mark: X)
Stated Evidence: Automatic Record: updated ticket.
Process Activity: PRB.5.3 Relate Relevant Incidents to Known Error (assertion mark: X)
Stated Evidence: Automatic Record: updated ticket.
Process Activity: PRB.5.4 Relate Relevant CIs to Known Error (assertion mark: X)
Stated Evidence: Automatic Record: updated ticket.

PRB.6: Known Error Classification and Assessment

In this stage, the Problem Analyst classifies the Known Error, the Problem Manager reviews and assigns the Known Error, and the Problem Analyst assesses how to correct the error. The ultimate goal at this stage is to move the Known Error through to a complete correction state. According to ITIL, this is now considered to be part of Error Control.

Figure 9: Known Error Classification and Assessment (PRB.6: ITSM Known Error Classification and Assessment)
Swimlanes: Problem Manager, Problem Analyst, Requester; other Service Support / Delivery processes.
Steps shown: entry from PRB.5.4; PRB.6.1 Classify Known Error; PRB.6.2 Establish Impact, Urgency and Priority; PRB.6.3 Assign Known Error to Problem Manager; PRB.6.4 Review Known Error Details; PRB.6.5 Assign Known Error for determination of Corrective Action; PRB.6.6 Accept Assignment?; PRB.6.7 Re-classify if required and assign back; PRB.6.8 Has the Known Error been Reassigned more than once?; PRB.6.9 Management Intervention on Reassignment; PRB.6.10 Assess means of correcting Known Error; exit to PRB.7.1.

Table 13 Activities for PRB.6: Known Error Classification and Assessment

PRB.6.1 Classify Known Error
Inputs: Results of Investigations
Description: The Problem Analyst classifies the Known Error. The Problem Analyst selects the appropriate technology and operational categorization. The classification of the Known Error may be different from that of the original Problem.
Outputs: Updated Ticket

PRB.6.2 Establish Impact, Urgency and Priority
Inputs: Results of Investigations
Description: The Problem Analyst establishes the Impact and Urgency of the Known Error. The Impact and Urgency of the error determine the Priority.
Outputs: Updated Ticket

PRB.6.3 Assign Known Error to Problem Manager
Inputs: Updated Ticket
Description: The Problem Analyst assigns the Known Error to a Problem Manager.
Outputs: Notification; Updated Ticket

PRB.6.4 Review Known Error Details
Inputs: Results of Investigations
Description: The Problem Manager reviews the details of the Known Error.
Outputs: Validation

PRB.6.5 Assign Known Error for Determination of Corrective Actions
Inputs: Validation
Description: The Problem Manager assigns the Known Error to a Problem Analyst to determine the appropriate corrective action. This Problem Analyst might not be the same Problem Analyst who recorded the details about the Known Error.
Outputs: Notification; Updated Ticket

PRB.6.6 Accept the Assignment?
Inputs: Notification
Description: The Problem Analyst determines whether to accept the assignment. This determination should be based on skill sets. The Problem Manager should be aware of assignments and workloads before making an assignment.
Outputs: Decision: Acceptance or Rejection; Updated Ticket

PRB.6.7 Reclassify, If Required, And Assign Back
Inputs: Decision; Results of Investigations
Description: If not accepting the assignment, the analyst performs the following steps:
a. If required, reclassify the Known Error.
b. Assign the Known Error back to the Problem Manager.
Outputs: Updated Ticket; Notification

PRB.6.8 Has the Known Error been Reassigned more than once?
Inputs: Rejected/Reassigned Problem/Known Error Ticket
Description: If yes, then intervention by the Problem Manager is required.

PRB.6.9 Management Intervention on Reassignment
Description: In the event that the Known Error has been reassigned more than once, the Problem Manager must be involved.

PRB.6.10 Assess Means of Correcting Known Error
Inputs: Results of Investigations
Description: The Problem Analyst assesses how to correct the Known Error. The process moves to the Known Error resolution and closure stage.
Outputs: Assessment

Table 14 RACI Matrix for PRB.6: Known Error Classification and Assessment
Roles (columns): Problem Manager, Problem Analyst, Incident Analyst, Support Requestor, Authorized User

PRB.6.1 Classify Known Error: A/R
PRB.6.2 Establish Impact, Urgency and Priority: A/R
PRB.6.3 Assign Known Error to Problem Manager: A/I, R
PRB.6.4 Review Known Error Details: A/R
PRB.6.5 Assign Known Error for Determination of Corrective Actions: A/R, C/I
PRB.6.6 Accept the Assignment?: C/I, A/R
PRB.6.7 Reclassify, If Required, And Assign Back: C/I, A/R
PRB.6.8 Has the Known Error been Reassigned more than once?: A/R
PRB.6.9 Management Intervention on Reassignment: A/R
PRB.6.10 Assess Means of Correcting Known Error: A/R
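The reassignment check in PRB.6.6 through PRB.6.9 is essentially a counter with a threshold. The following sketch shows one way to express it; the class and method names are hypothetical, and counting each rejection as one reassignment is an assumption, since the guide does not define how reassignments are tallied.

```python
from dataclasses import dataclass


@dataclass
class KnownErrorAssignment:
    """Hypothetical assignment state for one Known Error."""
    assignee: str
    reassignments: int = 0  # assumption: each rejection counts as one reassignment

    def reject(self, problem_manager: str) -> bool:
        """PRB.6.7: assign the Known Error back to the Problem Manager.

        Returns True when the Known Error has now been reassigned more than
        once (PRB.6.8), meaning management intervention (PRB.6.9) is required.
        """
        self.assignee = problem_manager
        self.reassignments += 1
        return self.reassignments > 1
```

With this rule the first rejection returns the Known Error to the Problem Manager for a normal reassignment, and the second rejection triggers intervention.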

Table 15 Risk and Control Matrix for PRB.6: Known Error Classification and Assessment
Assertions (columns): Completeness, Accuracy, Validity, Authorization, Safeguarding, Presentation

Control Ref.: 20000:8.3.10
Control Objective: Preventative action and investigation are undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, change management, assets and configuration, and historical information.

Control Ref.: DS10.1, 20000:8.3.2
Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.
Process Activity: PRB.6.1 Classify Known Error
Stated Evidence: Automated Record: updated ticket.
Process Activity: PRB.6.2 Establish Impact, Urgency and Priority
Stated Evidence: Automated Record: updated ticket.
Process Activity: PRB.6.7 Reclassify, If Required, And Assign Back
Stated Evidence: Automated Record: updated ticket.

Control Ref.: DS10.2, 20000:8.3.3, 8.3.6
Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.

Control Ref.: DS10.3
Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.

Control Ref.: DS10.4, 20000:8.3.4
Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. This means using these processes to support Problem investigations.

Control Ref.: 20000:8.3.8
Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager and other supporting process Managers.
Process Activity: PRB.6.4 Review Known Error Details
Stated Evidence: Automated Record: updated ticket.

Control Ref.: 20000:8.3.5
Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating support analysts on Known Error status, Workarounds and solutions.

PRB.7: Known Error Resolution and Closure

In this stage, the Problem Analyst resolves the Known Error, and the Problem Manager closes the Known Error.

Figure 10: Known Error Resolution and Closure (PRB.7: ITSM Known Error Resolution and Closure)
Swimlanes: Problem Manager, Problem Analyst, Requester; other Service Support / Delivery processes (Problem Management, Incident Management, Change Management).
Steps shown: entry from PRB.6.8; PRB.7.1 Third Party Action required?; PRB.7.2 Assign to Vendor and Monitor Known Error; PRB.7.3 Internal Assistance Required?; PRB.7.4 Generate Appropriate Tasks; PRB.7.5 Close Task Activities; PRB.7.6 Permanent Correction available?; PRB.7.7 Document Permanent Corrective Action; PRB.7.8 Change Required? (Change Management, continuing after PIR approval); PRB.7.9 Complete Error Control; Close Known Error (status CLOSED); Notify Analysts of related Problems; Notify Analysts of related Incidents.

Table 16 Activities for PRB.7: Known Error Resolution and Closure

PRB.7.1 Third Party Action Required?
Inputs: Results of Investigations
Description: The Problem Analyst determines whether third-party action is required, for example if a vendor must repair equipment.
Outputs: Decision

PRB.7.2 Assign to Vendor and Monitor Known Error
Inputs: Decision; Results of Investigations
Description: If third-party action is required, the Problem Analyst assigns the Known Error to the vendor and monitors the Known Error. Within ITSM, assigning a Known Error to a vendor is for informational purposes. The Problem Analyst must communicate with the vendor about the Known Error. The Problem Analyst will create a Task as part of the Vendor assignment. The Task will be self-assigned to the Problem Analyst, and the status of the ticket set to Pending Vendor Response.
Outputs: Updated ticket; Communication with Vendor

PRB.7.3 Internal Assistance Required?
Inputs: Results of Investigations
Description: The Problem Analyst determines whether the Analyst (or Vendor staff) needs assistance from someone within the organization.
Outputs: Decision

PRB.7.4 Generate Appropriate Tasks
Inputs: Results of Investigations; Decision
Description: The Problem Analyst generates tasks, each with a description of what needs to be accomplished, and assigns them as appropriate.
Outputs: Task Creation; Task Assignment

PRB.7.5 Close Task Activities
Inputs: Task Assignment; Results of Investigations
Description: The task analysts complete their tasks.
Outputs: Closed Tasks; Results of Investigations

PRB.7.6 Permanent Correction Available?
Inputs: Results of Investigations
Description: The Problem Analyst determines whether a permanent correction is available.
Outputs: Assessment of Results of Investigations

PRB.7.7 Document Permanent Corrective Action
Inputs: Assessment of Results of Investigations
Description: If the Known Error can be corrected, the Problem Analyst documents the permanent corrective action.
Outputs: Updated ticket; Updated database

PRB.7.8 Change Required?
Inputs: Results of Investigations
Description: The Problem Analyst determines whether a Change is required to correct the Known Error. If required, the Change request goes through the Change Management process. After the Change is complete and the Post Implementation Review (PIR) approves the successful implementation, the Known Error process continues.
Outputs: RFC

PRB.7.9 Complete Error Control
Inputs: Completed RFC
Description: The Problem Manager completes error control. The Problem Manager closes the Known Error. The toolset notifies Problem Analysts assigned to investigate related Problems, and it might be possible to close these Problems. A Problem remains complete, but not closed, until the Known Error is closed. For the related Problems, the process continues with the resolution and closure stage.
Outputs: Closed Ticket
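The closure rule in PRB.7.9 (a related Problem stays complete, but not closed, until the Known Error is closed) can be sketched as a simple filter. The function name and the status strings below are assumptions for illustration; the actual toolset statuses may differ.

```python
from typing import Dict, List


def problems_ready_for_closure(
    known_error_closed: bool, problem_statuses: Dict[str, str]
) -> List[str]:
    """Return IDs of related Problems that may move to resolution and closure.

    problem_statuses maps a hypothetical Problem ID to an illustrative status
    string such as "completed" or "in progress".
    """
    if not known_error_closed:
        # Related Problems remain complete, but not closed.
        return []
    return sorted(
        problem_id
        for problem_id, status in problem_statuses.items()
        if status == "completed"
    )
```

Problems that are still under investigation are left untouched either way; only completed Problems become candidates once the Known Error closes.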

Table 17 RACI Matrix for PRB.7: Known Error Resolution and Closure
Roles (columns): Problem Manager, Problem Analyst, Incident Analyst, Support Requestor, Authorized User

PRB.7.1 Third Party Action Required?: C, A/R
PRB.7.2 Assign to Vendor and Monitor Known Error: A/C, R
PRB.7.3 Internal Assistance Required?: A/C, R
PRB.7.4 Generate Appropriate Tasks: A/R
PRB.7.5 Close Task Activities: A/R
PRB.7.6 Permanent Correction Available?: C, A/R
PRB.7.7 Document Permanent Corrective Action: A/R, I
PRB.7.8 Change Required?: A/C, R
PRB.7.9 Complete Error Control: A/R, C

Table 18 Risk and Control Matrix for PRB.7: Known Error Resolution and Closure
Assertions (columns): Completeness, Accuracy, Validity, Authorization, Safeguarding, Presentation

Control Ref.: 20000:8.3.10
Control Objective: Preventative action and investigation are undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, change management, assets and configuration, and historical information.

Control Ref.: DS10.1, 20000:8.3.2
Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.

Control Ref.: DS10.2, 20000:8.3.3, 8.3.6
Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.

Control Ref.: DS10.3
Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.
Process Activity: PRB.7.9 Complete Error Control
Stated Evidence: Automated Record: updated/closed ticket.

Control Ref.: DS10.4, 20000:8.3.4
Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. This means using these processes to support Problem investigations.
Process Activity: PRB.7.8 Change Required?
Stated Evidence: Manual Record: RFC. Automated Record: Change Ticket.
Process Activity: PRB.7.9 Complete Error Control
Stated Evidence: Automated Record: updated/closed ticket.

Control Ref.: 20000:8.3.8
Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager and other supporting process Managers.
Process Activity: PRB.7.9 Complete Error Control
Stated Evidence: Automated Record: updated/closed ticket.

Control Ref.: 20000:8.3.5
Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating support analysts on Known Error status, Workarounds and solutions.
Process Activity: PRB.7.7 Document Permanent Corrective Action
Stated Evidence: Manual Record: Updated database record. Automated Record: Updated Ticket.

7. Key Performance Indicators

Key Performance Indicators (KPIs) can be used to monitor the performance of the process and to identify trends within the business that may require adaptation. The Problem Manager will regularly review, monitor, and report on the KPIs.

Note: The KPIs to be used, monitored, and reported on will be determined at a later date, during the implementation stage.

The following are sample KPIs taken from the ITIL Service Support book (v2) that may be included in the regular reports:

- the number of RFCs raised and the Impact of those RFCs on the availability and reliability of the services covered
- the amount of time worked on investigations and diagnoses per organisational unit or supplier, split by Problem types
- the number and Impact of Incidents occurring before the root Problem is closed or a Known Error is confirmed
- the ratio of immediate (reactive) support effort to planned support effort in Problem Management
- the plans for resolution of open Problems with regard to resources:
  - people
  - other used resources
  - costs (against budget)
- a short description of actions to be undertaken.

Information about weak components in the IT Infrastructure, and about breaches of agreed service levels with the business and by suppliers, is of concern to Availability Management. The frequency and duration of Problems is a measurement of performance against agreed service levels. Information required will include:

- the number of Problems and errors split by:
  - status
  - service
  - Impact
  - category
  - User group
- the total elapsed time on closed Problems
- the elapsed time to date on outstanding Problems
- the mean and maximum elapsed time to close Problems or confirm a Known Error, from the time of raising the Problem record, by Impact code and by support group (including vendors)
- any temporary resolution actions
- the expected resolution time for outstanding Problems
- the total elapsed time for closed Problems.
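The elapsed-time KPIs listed above could be derived from exported Problem records along the following lines. This is a minimal sketch only: the guide does not define a record layout or a reporting tool, so the `ProblemRecord` fields and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class ProblemRecord:
    # Hypothetical record layout; the guide does not prescribe one.
    problem_id: str
    impact_code: str
    support_group: str
    raised_at: datetime
    closed_at: Optional[datetime] = None  # None while the Problem is outstanding


def elapsed_days(record, now):
    """Elapsed days from raising the record to closure, or to `now` if still open."""
    end = record.closed_at or now
    return (end - record.raised_at).total_seconds() / 86400


def closure_time_kpis(records, now):
    """Count, mean and maximum days to close, keyed by (Impact code, support group)."""
    buckets = defaultdict(list)
    for r in records:
        if r.closed_at is not None:
            buckets[(r.impact_code, r.support_group)].append(elapsed_days(r, now))
    return {
        key: {"count": len(days), "mean": mean(days), "max": max(days)}
        for key, days in buckets.items()
    }


def outstanding_elapsed(records, now):
    """Elapsed days to date for each outstanding (not yet closed) Problem."""
    return {r.problem_id: elapsed_days(r, now) for r in records if r.closed_at is None}
```

Keying the results by Impact code and support group mirrors the split named in the KPI list; other splits (status, service, category, User group) would follow the same grouping pattern.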

8. Procedures

Procedures are a pre-determined method of performing an activity or task. Within the Problem Management process, there are several procedures, as identified in the table below. Some procedures depend on the tool and others do not; only the tool-independent procedures are listed below.

Note: In most cases these procedures need to be developed in detail; they are identified here for information purposes only.

Table 19 Problem Management Procedures

Columns: Activity; Procedure; Description

Problem Identification, Recording and Classification

Problem Review

Activity: PRB.2.1 Perform Business Impact Analysis
Procedure: 1 Business Impact Analysis
Description: Within the parameters of Problem Management, this procedure will outline the tools and methodologies to be used for an impact analysis.

Problem Investigation and Diagnosis

Activity: PRB.3.1 Initial Investigation
Procedure: 2 Standard Investigation Procedures
Description: This procedure lists how the Problem Analyst should proceed with an investigation to ensure that a standardized methodology is employed.

Activity: PRB.3.12 Document Root Cause
Procedure: 3 Documenting the Root Cause of a Problem
Description: This procedure will detail the steps to go through to document a Root Cause, where to post it and whom to notify.

Activity: PRB.3.14 Document WorkAround
Procedure: 4 Documenting the Workaround for a Known Error
Description: This procedure will detail the steps to go through to document the Workaround associated with a Known Error, where to post it and whom to notify.

Activity: PRB.3.18 Create Knowledge Entry
Procedure: 5 Creation of a Knowledge Entry
Description: This procedure will detail the steps to go through to create a Knowledge Entry, where to post it and whom to notify.

Problem Resolution and Closure

Known Error Identification and Recording

Activity: PRB.5.1 Record Known Error
Procedure: 6 Record the details associated with a Known Error
Description: This procedure will detail the steps to go through to document a Known Error, where to post it and whom to notify.

Known Error Classification and Assessment

Activity: PRB.6.9 Management Intervention on Reassignment
Procedure: 7 How to deal with too many Reassignments
Description: This procedure will help Problem Managers to break the cycle of reassignment. Guidance for evaluating the reasons for reassignment, plus parameters for final assignment, should be included.

Known Error Resolution and Closure

Activity: PRB.7.2 Assign to Vendor and Monitor Known Error
Procedure: 8 Monitoring Vendor Work on a Problem Assignment
Description: This procedure will present the steps and checkpoints for working with a Vendor who has been assigned a Problem Investigation.

Activity: PRB.7.7 Document Permanent Corrective Action
Procedure: 9 Documenting Permanent Corrective Action
Description: This procedure will detail the steps to go through to document a Permanent Corrective Action, where to post it and whom to notify.

Activity: PRB.7.8 Change Required?
Procedure: 10 Building a Request for Change
Description: This procedure will step the Problem Analyst through the method for creating and validating a Request for Change that will provide a permanent correction.
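As an illustration of procedure 7, a tool-independent check for excessive reassignment might look like the sketch below. The guide does not state how many reassignments are "too many", so the `MAX_REASSIGNMENTS` threshold, the `KnownErrorRecord` class and the function name are all assumptions, not part of the process definition.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed threshold; the guide leaves the actual parameter to the detailed procedure.
MAX_REASSIGNMENTS = 3


@dataclass
class KnownErrorRecord:
    # Hypothetical record: support groups the Known Error was assigned to, in order.
    record_id: str
    assignment_history: List[str] = field(default_factory=list)

    def assign(self, group):
        self.assignment_history.append(group)

    @property
    def reassignment_count(self):
        # The first assignment is not counted as a reassignment.
        return max(len(self.assignment_history) - 1, 0)


def needs_management_intervention(record, limit=MAX_REASSIGNMENTS):
    """True once the record has been reassigned `limit` times or more."""
    return record.reassignment_count >= limit
```

A Problem Manager could run such a check over open Known Errors to find candidates for PRB.6.9 Management Intervention on Reassignment; the reasons for each reassignment would still need to be evaluated by hand.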

9. Control Alignment Table

Table 20 Summary Controls

Columns: Control Objectives; CobiT; 20000; Evidence (Documentation, Manual Record, Automatic Record, each with its supporting Activity numbers)

Control Objective: Preventative action and investigation is undertaken to reduce potential Problems through the use of information generated by trend analysis, review of published material, change management, assets and configuration and historical information.
20000: 8.3.10
Evidence: X 1.1; X 1.1

Control Objective: Processes have been implemented to report and classify Problems as they are identified. Steps taken determine category, Impact, Urgency and Priority. These groupings are the basis for allocating Problems to support staff and cataloguing within investigation databases.
CobiT: DS10.1
20000: 8.3.2
Evidence: X 1.1 1.4 1.5 2.5 6.1

Control Objective: The process provides for adequate audit trail facilities that allow tracking, analysing and determining the Root Cause of all reported Problems considering associated CIs, outstanding Problems and Incidents, known and suspected errors.
CobiT: DS10.2
20000: 8.3.3, 8.3.6
Evidence: 6.2 6.7; X 1.2 1.3 2.3

Control Objective: A procedure exists to close the Problem record either after the confirmation of the successful elimination of the Known Error, or after agreement with the business on how to alternatively handle the Problem.
CobiT: DS10.3
20000: 8.3.4
Evidence: X 4.2 7.9

Control Objective: Demonstrable integration of the Change, Configuration, and Incident Management processes with Problem Management. That is, these processes are used to support Problem investigations.
CobiT: DS10.4
20000: 8.3.4
Evidence: X 3.5 7.8; X 1.2 1.3 3.5 7.8 7.9

Control Objective: The process has internal reviews built in. Results provided by investigations are reviewed by the Problem Manager and other supporting process Managers.
20000: 8.3.8
Evidence: X 2.1; X 1.6 2.2 2.5

Control Objective: With a bias to Incident Management and the Service Desk, the process has a defined mechanism for updating support analysts on Known Error status, Workarounds and solutions.
20000: 8.3.5
Evidence: X 3.12 3.14 7.7 4.1 6.4 7.9; X 3.12 3.14 3.15 4.2 5.2 5.3 5.4 7.7

Appendix A

Table of Figures

Figure 1: Service Life Cycle
Figure 2: Process Overview diagram
Figure 3: Problem Identification, Recording and Classification diagram
Figure 4: Problem Review diagram
Figure 5: Problem Investigation and Diagnosis diagram
Figure 6: Problem Investigation and Diagnosis diagram continued
Figure 7: Problem Resolution and Closure diagram
Figure 8: Known Error Identification and Recording diagram
Figure 9: Known Error Classification and Assessment
Figure 10: Known Error Resolution and Closure

Table of Tables

Table 1 Activities for PRB.1: Problem Identification, Recording and Classification
Table 2 RACI Matrix for PRB.1: Problem Identification, Recording and Classification
Table 3 Risk and Control Matrix for PRB.1: Problem Identification, Recording and Classification
Table 4 Activities for PRB.3: Problem Investigation and Diagnosis
Table 5 RACI Matrix for PRB.3: Problem Investigation and Diagnosis
Table 6 Risk and Control Matrix for PRB.3: Problem Investigation and Diagnosis
Table 7 Activities for PRB.4: Problem Resolution and Closure
Table 8 RACI Matrix for PRB.4: Problem Resolution and Closure
Table 9 Risk and Control Matrix for PRB.4: Problem Resolution and Closure
Table 10 Activities for PRB.5: Known Error Identification and Recording
Table 11 RACI Matrix for PRB.5: Known Error Identification and Recording
Table 12 Risk and Control for PRB.5: Known Error Identification and Recording
Table 13 Activities for PRB.6: Known Error Classification and Assessment
Table 14 RACI Matrix for PRB.6: Known Error Classification and Assessment
Table 15 Risk and Control Matrix for PRB.6: Known Error Classification and Assessment
Table 16 Activities for PRB.7: Known Error Resolution and Closure
Table 17 RACI Matrix for PRB.7: Known Error Resolution and Closure
Table 18 Risk and Control Matrix for PRB.7: Known Error Resolution and Closure
Table 19 Problem Management Procedures
Table 20 Summary Controls