INS Problem Management Manual




INS Problem Management Manual Process, Policies & Procedures Version 0.4 Draft Ver 0.4 INS Problem Management Manual.doc Page 1 of 47

Document Control

Document Versions

Date        Version  Person Changing                  Brief Summary of Change
08/01/2010  0.1      Gary Bullard, IT Consultant      First Draft for Review by Process Owner
14/01/2010  0.2      Gary Bullard, IT Consultant      Updated with changes from Process Owner: Bruce Scott. For review by reference group on 18 Jan 2010
11/06/2013  0.3      Anna Pegg, CTI Project Manager   Updated with changes from Project Manager SDT Phase 2: Anna Pegg. Based on updates required to reflect approach taken when implementing Problem Management.
17/07/2013  0.4      Anna Pegg, CTI Project Manager   Updated Document approval in order to allow document to be published online.

Document Location

The document is located at: \\staff.ad.griffith.edu.au\groups\ins\its\its-projects\ins Projects_Closed\INS Projects_Closed 2011\INS473 Service Desk replacement (Servicenow.com)\ITIL Processes\ITIL Problem Management\Current

Document Approval*

*The Problem Management Roles will be reviewed after the 2013 New INS changes have come into effect. The detailed Problem Process is still current; however reporting structures can be confirmed with your manager.

Authorising Officer   Name               Signature/s   Date
Process Owner         Bruce Scott
Problem Manager       To be determined
Document Owner        Bruce Scott

Table of Contents

Document Control
Table of Contents
Problem Management Process
    Overview
    Benefits of Problem Management
    Related Processes
    Ownership & Responsibility
    Document Location:
    Contact Person
Inputs & Outputs of Problem Management
Problem Management Business Rules
Problem Prioritisation
Problem Categorisation
Problem Status Codes
Problem Resolution Codes
Problem Management Procedures
    High Level Process Flow
    Defined Problem Process Flow
    2.0 Problem Prioritisation and Categorisation Procedure
    3.0 Problem Investigation & Diagnosis Procedure
    4.0 Workarounds & Known Errors
    5.0 Problem Resolution
    6.0 Problem Closure
    7.0 Communication
    8.0 Proactive Problem Analysis
Problem Management Roles
    Problem Governance Team
    Process Owner
    Problem Manager
    Service Desk (Service Desk and EIS Assist)
    IT Support Specialist
    Client
Process Metrics
Problem Management Reporting
    Reports required by IT Support Team Leads
    Reports required by Manager Planning (Account Management role)
    Reports required by Problem Manager

Problem Management Process

Overview

Problem Management is the process responsible for managing the lifecycle of all problems. Its primary objectives are to prevent problems and resultant incidents from happening, to eliminate recurring incidents and to minimise the impact of incidents that cannot be prevented.

Prevention of problems requires INS to take a proactive stance, which in turn requires direction and leadership within INS support teams. This is covered more fully in the Procedures section of this Manual (refer 8.0 Proactive Problem Analysis).

By implementing Problem Management in Information Services (INS) we are seeking to achieve best practice processes using the IT Infrastructure Library (ITIL) process framework as a guide.

A problem is defined within ITIL as the unknown cause of one or more incidents; however, this definition can be extended to include matters which have been identified as having the potential to trigger incidents, such as matters which have not been fully resolved at system implementation, and matters which have been identified through proactive monitoring of the IT infrastructure and incident trends.

Problem Management focuses on analysing problems with a view to identifying the root cause and then ensuring that satisfactory resolution of the problem is achieved. Problem resolution will often involve changes being made to one or more IT infrastructure items (e.g. programs, hardware etc.), and the process disciplines in supporting processes such as Change Management and Release Management will be relied on to ensure problems are successfully resolved.

Problem Management will also maintain information about problems and the associated workarounds and/or permanent fixes that are implemented. This will be achieved using the Knowledge Management functionality of the support system and more particularly data repositories such as the Knowledge Base. The availability and proper use of this information will greatly assist in incident management and contribute to more efficient problem resolution, ensuring that client service availability is maximised.

This document covers both the strategic and operational Problem Management practices within INS and details the process flow, business rules, procedures and metrics. It also identifies reporting requirements and the roles and responsibilities of process participants.

It is critical that Problem Management links effectively with other INS ITIL oriented processes such as Incident, Change and Knowledge Base Management. These linkages are reflected in the High Level Process Flow which is detailed in the Problem Management Procedures section of this Manual.

Benefits of Problem Management

Disciplined management of problems will deliver INS and business clients some real and measurable benefits over time; in particular:

- Higher availability of IT services
- Higher productivity of business and IT staff
- Reduced expenditure on workarounds or fixes that do not work
- Reduction in cost of effort related to fire-fighting and/or resolving repeat incidents
- Better quality management information on the profile and status of problems

Related Processes

For more detail on interfacing processes refer to the following links:

- https://intranet.secure.griffith.edu.au/computing/service-desk/about-itil
- Incident Management Process Handbook Version 1.5
- Change Management Process Manual Version 4.0
- Knowledge Management Process document

Ownership & Responsibility

- The Manager of the Office of the Director ICTS is the Process Owner.
- The *** To be Determined **** is the Problem Manager. In lieu of a Problem Manager, the Service Owner related to the problem will take on the Problem Manager responsibilities.
- *Service Desk staff & IT Support Specialists are responsible for complying with the documented Problem Management processes and procedures.
- The Manager of the Office of the Director ICTS is the Document Owner.

*In this document the term Service Desk Analyst is used to represent the first level support staff in the Library & IT Help and EIS Assist teams, and the term Support Specialist is used to represent staff providing higher levels of support (2nd level, 3rd level etc., for instance the Server Support Services and the Messaging & Collaboration teams).

Document Location:

The latest version of this document can be accessed on the INS intranet. Refer to the following link:

\\staff.ad.griffith.edu.au\groups\ins\its\its-projects\ins Projects_Active\INS473 Service Desk replacement (Service-now.com)\ITIL Processes\ITIL Problem Management\Current\Draft Ver 0.3 INS Problem Management Manual.doc

Contact Person

Any queries or suggestions for improvement to the scope and operation of the Problem Management procedures should be discussed first with the team leader and, as necessary, directed through to the document owner for discussion. Document owner contact details are as below:

Phone: 3735 7368
Email: b.scott@griffith.edu.au

Inputs & Outputs of Problem Management

Scope

This section lists all the relevant inputs and outputs to the process.

Inputs

Process Inputs
- Unresolved Incidents from Incident Management
- Problems identified from formal reviews of Priority 1 incidents
- Problems identified from trend analysis and other proactive problem management initiatives
- Problems identified from the review of status messages and alerts generated from systems, network and application management platforms
- Problems identified through system testing, e.g. shortfalls in meeting business requirements which require root cause analysis and problem resolution
- Problems identified post system implementation, e.g. failed changes
- Notification of CI faults direct from manufacturer

Outputs

Output: Objective / Description

Resolved Problems: The removal of the underlying cause of problems through reactive and proactive PM activities.

Known Errors: Once the underlying cause of the problem is identified a known error will ensue.

Requests for Change (RFC): Underlying causes of problems may require an infrastructure change in order to remove them. An RFC must be raised to support the change in line with the established Change Management Process.

Management Information: Regular reports on the performance of the Problem Management process. Formal review reports on major outages and other Priority 1 rated incidents.

Knowledge Articles: A known error with a workaround becomes a knowledge article. INS will use the support system Knowledge Base functionality to store Known Errors and any associated workarounds and root cause fixes. These Known Error records will be able to be searched to facilitate resolution of open incidents.

Problem Management Business Rules

This section describes all the process rules that are fundamental to the effective and efficient operation of the INS Problem Management Process. The detailed procedures which are detailed later in this document embody these rules.

Business Rules Problem Management

1. Strategy and Communications

Business Rule

1.1 The focus of all problem management effort is on detecting and correcting the root cause of problems.
1.2 A single Problem Management process is to be utilised by all INS teams as defined within the Problem Management Manual.
1.3 Support Team Leaders will balance their problem related work efforts between reactive problem management (i.e. acting on advised incidents) and proactive problem management (i.e. identifying trends and targeting preventative actions before problems occur).
1.4 All changes required to resolve problem root causes must be channelled through, and follow the prescribed procedures within, the INS Change Management and Release Management processes.
1.5 Every problem must have a designated Problem Owner who is responsible for the lifecycle of the problem.
1.6 The IT Support Team Leader must ensure that any open problems which are owned by a person who is leaving INS are moved to other team members to action and resolve.
1.7 Every action in relation to a problem is to be documented within the problem record, and the problem status is to be maintained to reflect the true disposition of the problem at a point in time.
1.8 Problem records may only be viewed by INS personnel. Clients will have no access to Problem Records and must direct any problem related queries to either Service Desk or the Problem Owner.
1.9 All client queries and communications in regard to problems under management by INS are to be directed through either Service Desk or the nominated problem owner. Activity history will not be available to clients by direct access to the support system.
1.10 Problems are to be managed by team members with the best levels of skills and knowledge and may be reassigned to the most appropriate group during the lifecycle of a problem.
1.11 Process metrics will be developed and regularly reported at both team and senior management levels.

Benefit

- Resolution of problem root cause prevents future related incidents from occurring and improves productivity of support teams.
- Standard and consistent approach which ensures that all problems have visibility and are managed effectively and efficiently.
- Increased integration between IT service areas.
- Reduced service outages and improved SLA performances on system availability.
- Reduction in the number of incidents.
- Changes implemented have higher likelihood of success following comprehensive testing and approvals.
- Establishes clear accountability for actions associated with a particular problem.
- Facilitates reporting by support areas.
- Ensures that problems are kept under notice and progress to a resolution as soon as practicable.
- Assists with communication with clients and other support staff.
- Assists with resolution of future incidents and problems.
- Facilitates accurate reporting.
- Clients receive the best quality information on the status and nature of the problem.
- Support staff productivity is improved with fewer distractions to the core activities associated with problem root cause analysis and resolution.
- Consistent client communications and feedback.
- Reduced risk of activity information being misinterpreted.
- Problems are resolved quicker and more robust client solutions are delivered.
- Support team knowledge is shared and overall team collaboration is improved.
- Process improvement areas can be more readily identified and actioned.

2. Recording and Classification

Business Rule

2.1 Every problem identified from all channels, including advices from external providers, must be accurately recorded, categorised and prioritised.
2.2 All logged problems are matched against all other current problems.

Benefit

- Facilitates reporting of new problems per channel.
- Allows for incident trends to be monitored.
- Work effort can be better directed to addressing problems which are rated as more significant.
- High urgency/impact problems can be more readily determined and reported.
- Problem trends can be detected.
- Problem management staff can work together resolving problem causes.

3. Problem Investigation and Diagnosis

Business Rule

3.1 Problem queues and statuses must be regularly reviewed by Support teams.
3.2 The Problem Manager, INS senior management and the relevant Service Manager are automatically advised when Priority 1 incidents occur.
3.3 Problems where the cause is known should have their status changed to known errors.
3.4 All workarounds must be discussed and agreed with the client and recorded in the Knowledge Base.

Benefit

- Any undue delays or bottlenecks in the problem management cycle are identified as early as possible.
- Provides early advice on any critical service outages so that analysis and recovery efforts can be expedited to reduce system downtime.
- Ensures the solution is promptly available for support staff.
- Improved quality of service for clients.
- Minimises business downtime.

4. Major Problem Review (Priority 1 incidents)

Business Rule

4.1 The Problem Manager and the relevant Service Line Manager are to be notified of a Priority 1 incident.
4.2 A formal review is to be convened and completed by the Problem Manager for all Priority 1 incidents and a problem record raised to incorporate the report. Any additional problems identified for root cause analysis must be linked back to the parent problem.
4.3 Priority 1 Incident review reports must be tabled for discussion at each fortnightly ICTS Management Team meeting.

Benefit

- Communicates critical outages to relevant areas.
- A well structured and comprehensive review will identify additional problems and reduce the risk of the same or similar high impact service failures.
- Review document is readily available to all staff and shares learnings, outcomes and improvement strategies and initiatives.
- Review findings and any associated recommendations can be reviewed and confirmed by a management group independent of the process operation.

5. Problem Resolution and Closure

Business Rule

5.1 The root cause of a problem must be confirmed with the relevant Service Line Managers before the problem status is updated to a Known Error.
5.2 A record is kept of all closed but unresolved problems with documented reasons for not resolving.
5.3 When a problem status is changed to known error, all incident managers with incidents related to the problem are to be informed and the Knowledge Base will be updated.
5.4 Problem resolutions are to be applied only after any changes have been approved and scheduled for release through the INS Change/Release Management process.
5.5 The Knowledge Base is updated when a solution is determined.
5.6 Problems can only be closed by the Problem Manager.

Benefit

- The Service Managers are better informed as to any open problems relating to the service/s under their management.
- Known errors are accurately defined for future reference.
- Allows for trend analysis.
- Future review can lead to known errors being found.
- Service to clients can be restored.
- Provides communication details for the client.
- Knowledge can be shared.
- Reduces the risk of changes failing on implementation.
- Minimises rework of changes and increases productivity of support teams.
- Ensures other support staff have the information they need to resolve problems.
- Provides independent assurance to management that the process has been properly followed and the information recorded is of the highest quality (including resolution codes and known error records).
- Enhances the integrity of reporting and provides a more accurate base of information for management decision making on forward problem related strategies and initiatives.

Problem Prioritisation

All problems logged within the support system will be given a suitable priority. Like incidents, problems will be prioritised using a combination of impact and urgency to the business.

The priority of a problem is determined by:

1. Impact: Impact of the problem on the business. The number of clients or importance of system affected. The hierarchical position of the client is included in this variable.
2. Urgency: How severely the client's work process is affected.

The Impact/Urgency matrix, shown below, determines the priority of the problem.

                      Impact
Urgency     Low     Medium     High
Low          5         4         3
Medium       4         3         2
High         3         2         1

The assessment methodology for impact and urgency is explained in more detail below.

Note: SLA requirements and policies for Problem Management are not a part of ITIL requirements; as such, there are no SLA escalations associated with Problem prioritisation.

Impact

Problems will be placed into High, Medium and Low impact categories. The key factor in measuring impact is the impact the problem has on the business, and the following criteria guide the impact assessment.

Impact: Description

High: Whole organisation affected; Site or multiple sites affected; Multiple groups of clients affected; Critical business process interrupted; or System-wide outages to Learning@Griffith, Staff portal, or Email.

Medium: Group of clients, a Pro Vice Chancellor (PVC), or a member of the Vice Chancellor's (VC's) Office staff affected; Non-critical business process interrupted.

Low: One client affected (other than VC's Office or PVCs).
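As an illustration only, the Impact/Urgency matrix in the Problem Prioritisation section can be expressed as a small lookup. This is a minimal sketch and not part of the INS support system: the names `Level`, `PRIORITY_MATRIX` and `problem_priority` are hypothetical, and only the priority values themselves come from the matrix in this Manual.

```python
from enum import IntEnum


class Level(IntEnum):
    """Low/Medium/High scale used for both impact and urgency (hypothetical name)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Outer key is urgency (matrix row), inner key is impact (matrix column).
# Values are copied from the Impact/Urgency matrix in this Manual.
PRIORITY_MATRIX = {
    Level.LOW: {Level.LOW: 5, Level.MEDIUM: 4, Level.HIGH: 3},
    Level.MEDIUM: {Level.LOW: 4, Level.MEDIUM: 3, Level.HIGH: 2},
    Level.HIGH: {Level.LOW: 3, Level.MEDIUM: 2, Level.HIGH: 1},
}


def problem_priority(impact: Level, urgency: Level) -> int:
    """Return the problem priority, where 1 is the highest and 5 the lowest."""
    return PRIORITY_MATRIX[urgency][impact]
```

For example, a problem with High impact and Medium urgency looks up the Medium row and the High column, giving Priority 2. Because the matrix is symmetric, swapping impact and urgency gives the same priority.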

Urgency

Problems will be placed into High, Medium and Low urgency categories. The key factor in measuring urgency is how severely the client's work process is affected, and the following criteria guide the urgency assessment.

Urgency: Description

High: Process stopped; client(s) cannot work.

Medium: Process affected; client(s) cannot use certain functions.

Low: Process not affected; change request, new/extra/optimised function.

Problem Categorisation

All problems opened in the Service Desk Tool will be categorised by Service Line and associated Service. This categorisation information is presented in drop down lists and will be maintained to align with the INS Service Catalogue information that is incorporated in the INS to Griffith University Service Level Agreement 2010/2011. The Service Line and Service fields are mandatory when creating a Problem record.

Categorising problems in this way enables the Service Level Manager to profile open problems by Service Lines and Services and better positions INS for communications with business on service related matters.

The following link displays the current INS Service Catalogue incorporating Service Lines, associated Services and detailed service descriptions.

**** Insert hyperlink to INS Services Catalogue *****
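The mandatory-field rule for categorisation could be checked along the following lines. This is a hedged sketch under stated assumptions: `ProblemRecord`, its field names and `missing_mandatory_fields` are hypothetical and do not describe how the Service Desk Tool actually stores or validates records.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    """Hypothetical problem record; field names are illustrative only."""
    summary: str
    service_line: str = ""
    service: str = ""


def missing_mandatory_fields(record: ProblemRecord) -> list[str]:
    """List the mandatory categorisation fields that are still empty."""
    missing = []
    if not record.service_line.strip():
        missing.append("Service Line")
    if not record.service.strip():
        missing.append("Service")
    return missing
```

A record would only be accepted for creation when the returned list is empty.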

Problem Status Codes

Scope

The status of a problem reflects the current position in its lifecycle, sometimes known as its 'workflow position'. All problems logged in the support system will have the relevant status applied at each stage of their progression toward closure. The Status is a mandatory field.

Status Codes

The following status codes will be used for problems logged by INS:

Problem Status Code: Explanation

Open: This is the system default when a problem record is first created and indicates that a problem is logged but no action has yet been taken.

Work in Progress: Indicates that problem investigation and diagnosis has commenced and/or actions are underway to develop the solution or workaround.

Known Error: Indicates that INS has raised a known error record and communicated the scope and nature of the problem out to all stakeholders. INS has determined that widespread knowledge of the problem would be beneficial.

Awaiting External Response: A response from an external vendor is required to inform the problem resolution.

Awaiting Change Request Completion: Indicates that a Request For Change (RFC) has been raised to resolve the problem and work on the RFC is still in progress. Note: The problem record will be linked to the RFC item (if the RFC is a standard change) or referenced to the RFC item (if the RFC is a nonstandard change within the SQISS system).

Resolved: This status indicates that EITHER a permanent solution to the problem has been found, applied and tested successfully to meet business and/or INS requirements, OR it has been determined, in agreement with business, that a permanent solution will not be implemented due to technical, financial or other business reasons.

Closed No QA issues: Indicates formal closure of the problem and also that the quality assurance review performed by the Problem Manager detected no issues requiring discussion and resolution prior to closure.

Closed QA Issues: Indicates formal closure of the problem and also that the quality assurance review performed by the Problem Manager detected some issues requiring discussion and resolution prior to closure.

Work Around: A work around indicates that a client agreed work around has been implemented. Work is still proceeding to establish a permanent fix to the problem and as such the Work in Progress status code should be used. Workarounds can be improved and updated at any stage of the process; refer to section 7.1.2 Advise Workarounds.
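The status codes above could be modelled as an enumeration. The sketch below is illustrative only: `ProblemStatus`, `DEFAULT_STATUS`, `closure_status` and `is_closed` are hypothetical names, and only the status labels and the rule that the closing code depends on the Problem Manager's quality assurance review are taken from this Manual.

```python
from enum import Enum


class ProblemStatus(Enum):
    """Status labels as listed in the Problem Status Codes table."""
    OPEN = "Open"
    WORK_IN_PROGRESS = "Work in Progress"
    KNOWN_ERROR = "Known Error"
    AWAITING_EXTERNAL_RESPONSE = "Awaiting External Response"
    AWAITING_CHANGE_REQUEST_COMPLETION = "Awaiting Change Request Completion"
    RESOLVED = "Resolved"
    CLOSED_NO_QA_ISSUES = "Closed No QA issues"
    CLOSED_QA_ISSUES = "Closed QA Issues"


# "Open" is the system default when a problem record is first created.
DEFAULT_STATUS = ProblemStatus.OPEN


def closure_status(qa_issues_found: bool) -> ProblemStatus:
    """Pick the closing status from the outcome of the QA review."""
    if qa_issues_found:
        return ProblemStatus.CLOSED_QA_ISSUES
    return ProblemStatus.CLOSED_NO_QA_ISSUES


def is_closed(status: ProblemStatus) -> bool:
    """True for either of the two formal closure statuses."""
    return status in (
        ProblemStatus.CLOSED_NO_QA_ISSUES,
        ProblemStatus.CLOSED_QA_ISSUES,
    )
```

Note that a problem with an agreed workaround in place has no status of its own in this scheme; per the Work Around note, it stays at Work in Progress until a permanent fix is established.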

Problem Resolution Codes

Scope

In addition to the categorisation of problems, which can in itself be a valuable source of information as to the types of problems being experienced, resolution codes should be used to provide additional information about how the problem was resolved. A resolution code should be assigned to the problem, indicating the type of resolution action that was eventually taken. All problems logged in the support system will have the relevant resolution code applied.

Resolution Codes

The following resolution codes will be used by INS.

Problem Resolution Code: Explanation

Fix with change/s required: Used when raising a Change is necessary to solve the root cause of the problem.

Fix Provided: Used when an explanation of how to fix the problem is supplied to the client and the fix does not require a change to any configuration item.

No Fix Available: Used when there is no fix available to a particular problem and a decision has been made by INS to live with the problem and associated incidents.

Workaround Provided: Used where an effective workaround to a problem has been developed and a permanent fix has been decided against.

NFA Unknown: Used when the problem cause goes away with no action having been taken.

Cancelled: This status is applied when it has become apparent that the record raised is not appropriate (e.g. duplicate record or as a result of a team leader review).

Problem Management Procedures

The following section documents the workflow and specific procedures to be followed during the lifecycle of a problem under INS management.

High Level Process Flow

[Figure: INS Problem Management: High Level Process Flow. Swimlanes: Client; Service Desk; Problem Manager (Governance); INS Support Team (Technical & Applications). Sources: Receive Third Party Problem Advices; Perform Proactive Problem Monitoring; Identify Problem/s at System Implementation; Service Desk and Incident Management Process. Steps: 1.0 Problem Identification and Recording; 2.0 Categorise and Prioritise Problem; 3.0 Investigate & Diagnose Problem; 4.0 Workarounds & Known Errors Assessment; 5.0 Resolve Problem; 6.0 Close Problem; 7.0 Communication. Decision points: Investigate Further?; Workaround Solution?; Root Cause Known?; Fix Identified?; Change Needed?; Change Successful?. Linked items: Known Error Knowledge Database; Change Management Process.]

Defined Problem Process Flow

[Figure: INS Problem Management: Defined Process Flow. Problem Manager lane: Governance; Perform Proactive Problem Monitoring; Conduct Review; Additional Problems Identified?; Review Problems/Reports; Issue Report; 6.0 Close Problem; End. INS Support Team (Technical & Applications) lane: Start; Perform Proactive Problem Monitoring; Receive Third Party Problem Advices; Identify Problem/s at System Implementation; 1.1 Problem Exists?; 1.2 Raise Problem Record; Known Error Knowledge Database; 2.0 Categorise and Prioritise Problem; Update/Post Knowledge; 3.0 Investigate & Diagnose Problem; 4.0 Workarounds & Known Errors Assessment; Investigate Further?; 4.1.1 Workaround?; Root Cause Known?; 7.1.2 Communicate Workaround/Fix; 4.1.2 Record Known Error; 5.1.2 Workaround solution?; Fix Identified?; Change Needed?; Document Fix; Raise RFC; Change Management Process; Change Successful?; 5.0 Resolve Problem. Client and Service Desk lanes: Incident Management Process; 1.1 Problem Identified?; Problem Record exists?; 1.2.3 Link Incident; Escalate Incident; Priority 1 Incident?; Manage Major Incident; 7.2.3 Communicate Problem Status; Query Problem Status; Receive Advice of Workaround/Fix. Off-page connectors: A, B, C, E.]

1.0 Problem Identification and Recording

Scope

These procedures deal with the initial identification and recording of problems. Within INS this will normally be performed by IT Systems Support. The Problem Manager will raise a problem record for every Priority 1 incident that occurs. These incidents demand a stringent review due to the criticality of their impact. All problems will be recorded in the support system so that they can be tracked, monitored, and updated throughout their life cycle. This information can then be utilised for proactive problem management, reporting, process optimisation, and planning purposes.

Activity 1.1 Identification

In order for problems to be managed, they first have to be identified. There are a number of triggers for a problem record to be raised, i.e. when:

- matching the reported incident to existing problems and known errors is not successful during the initial incident support and classification stage, and a solution is not immediately evident to the support specialist
- analysis of incident data reveals recurrent incidents
- analysis of incident data reveals incidents that are not yet matched to existing problems or known errors
- analysis of the IT infrastructure indicates a problem that could potentially lead to incidents
- notification is received from a supplier that a problem exists that needs to be resolved
- testing associated with systems approved for production implementation discloses issues where user requirements are not being fully met
- a change fails and an infrastructure related problem is identified
- a review of a Priority 1 incident discloses issue/s which require structural solution/s to be found

Once a problem has been identified go to 1.2 Recording.

Activity 1.2 Recording

Sub Activity 1.2.1 Basic Facts

All identified problems will be formally recorded and categorised within the support system so that problem statuses can be readily determined and reported on as necessary.
A problem record will be raised with an initial status of Open and will automatically be assigned a problem record number. A problem record can be raised from within a Service Request record or, alternatively, by selecting Create Problem Record from the main menu. Throughout the problem life cycle, the problem will pass through a number of different states before finally being closed. The status field within the support system is used to quickly identify a problem's current state and facilitate any communications to clients. These status values are explained in more detail in the section headed Problem Status Codes. The status field will default to Open on initial recording of the problem. It is important that the status field is kept up to date, so that all IT staff can easily determine the current state of the problem.

Each problem record will be given a unique reference ID automatically by the support system. All users that have an incident appended to the problem will receive an automatic email through the support system to inform them that a problem exists related to their incident and to provide them with the problem ID number. This ID will be used to locate the correct record if the user contacts Service Desk again or uses the web portal to update information or check progress.

Go to 1.2.2.

Sub Activity 1.2.2 Is the problem related to an incident escalated from Library and IT Help?

If yes go to 1.2.3, otherwise go to 1.2.4.

Sub Activity 1.2.3 Recording problems associated with an escalated incident

When the problem is identified from diagnosis of a referred incident, create a new problem record using the Create Problem Record tab from within the incident record. The user information will auto populate the problem record from the service request (incident) record, and each problem record that is raised from within a service request record will be automatically linked to that service request record.

Descriptions should be succinct but thorough, containing all relevant details such as:

- WHAT is the failure?
- WHERE is the failure located?
- WHEN did the failure occur?
- WHY did the failure occur?
- HOW did the failure occur?

The standard is that any other support specialist should be able to read the description, understand it and progress the problem at any time during its life cycle. It is also important that any Service Desk analyst can understand the description so they can keep the user informed.

Service Desk is responsible for initially obtaining all the required information from the user when recording the now appended incident. However, there may be instances when some clarification or additional information is required. In these instances, the support specialist should contact the user to obtain the information.

Go to 2.0 Problem Prioritisation and Categorisation Procedure.

Sub Activity 1.2.4 Recording problems not associated with a referred incident

The system menu item Create Problem Record should be used to create the problem record when the problem has been identified from an event such as a failed change or as a result of a detailed review of a Priority 1 incident. Refer to 1.2.3 for Problem Logging Form and data field descriptions.

If available, the Configuration Item (CI) identifier must be recorded in the problem record. System drop down lists will facilitate identifying CIs. Recording this information assists future analysis and reporting of where problems are occurring in the IT infrastructure. (Note: As at January 2010 the INS Configuration Management Data Base has only been partly developed.)

Where the initial incident is reported because of an event or an auto alert from a monitoring system, the event or alert reference number should be included within the problem record. This allows support specialists investigating the problem to identify and view the original event or alert. The IT Support Analyst must append the incident/s auto generated by the alert that initiated the problem investigation. Any further incidents reported relating to the CI problem will be appended by Service Desk.

The support specialist needs to be aware of the number of incidents being appended. If the number becomes too high the problem priority may need to change.

Go to 2.0 Problem Prioritisation and Categorisation Procedure.
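The recording rules in section 1.2 (automatic ID, default status of Open, optional CI and alert references, appended incidents) can be pictured with a minimal sketch. Everything named here is hypothetical: the real support system assigns IDs and holds these fields itself, and the manual does not state what number of appended incidents counts as "too high", so the threshold of 5 is purely an assumed example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

# Hypothetical ID sequence standing in for the support system's own numbering.
_next_id = count(1)


@dataclass
class ProblemRecord:
    description: str
    ci_identifier: Optional[str] = None    # recorded "if available" (1.2.4)
    alert_reference: Optional[str] = None  # event or alert reference, if any
    status: str = "Open"                   # default on initial recording
    problem_id: int = field(default_factory=lambda: next(_next_id))
    incident_ids: List[str] = field(default_factory=list)

    def append_incident(self, incident_id: str) -> None:
        """Link an incident to this problem, ignoring duplicates."""
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)

    def needs_priority_review(self, threshold: int = 5) -> bool:
        """Flag the record once the appended incident count reaches an
        assumed threshold (not a value taken from the manual)."""
        return len(self.incident_ids) >= threshold
```

A support specialist could check `needs_priority_review()` after each appended incident as a prompt to reconsider the problem priority.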

2.0 Problem Prioritisation and Categorisation Procedure

Scope

Problems must be appropriately prioritised and categorised so they can be handled as effectively as possible. This procedure covers:

- Defining the priority of the problem.
- Specifying the related business service.
- Specifying the primary Configuration Item (CI) affected.
- Identifying the appropriate 3rd party vendor for escalation (if appropriate).

Activity 2.1 Problem Classification

Sub Activity 2.1.1 Complete Priority Field

This is a mandatory field and is to be completed for every problem record created. There is no support system default for this field. Each support specialist is to complete the field according to the INS approved Priority Matrix. To facilitate completion, refer to the section of this document titled Problem Prioritisation. The support system provides a drop down list to select from. Once the impact and urgency fields have been completed within the support system, the priority field will auto populate with the relevant priority.

Go to 2.1.2.

Sub Activity 2.1.2 Complete Category Fields

Before any further action can be taken the relevant category fields must be completed. The Business Service fields are to be completed using the approved list of INS Service Lines and associated Services. Refer to the section of this document titled Problem Categorisation.

Go to 3.0 Problem Investigation & Diagnosis.

3.0 Problem Investigation & Diagnosis Procedure

Scope

These procedures deal with the investigation of the problem and the diagnosis of the root cause. This data can then be used to help the support group assess the resources and skills required to resolve the cause of the problem. Investigation activities should include the provision of workarounds as soon as possible for the incidents related to the problem so the end user can continue with business priorities.
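The auto population of the priority field described in 2.1.1 amounts to a lookup on the impact and urgency values. The sketch below shows the mechanism only. The level names and every value in the matrix are placeholders assumed for illustration; the INS approved Priority Matrix is defined in the Problem Prioritisation section and should be substituted.

```python
# Placeholder matrix: (impact, urgency) -> priority. These values are
# assumptions for illustration, NOT the INS approved Priority Matrix.
PRIORITY_MATRIX = {
    ("High", "High"): 1,
    ("High", "Medium"): 2,
    ("High", "Low"): 3,
    ("Medium", "High"): 2,
    ("Medium", "Medium"): 3,
    ("Medium", "Low"): 4,
    ("Low", "High"): 3,
    ("Low", "Medium"): 4,
    ("Low", "Low"): 5,
}


def derive_priority(impact: str, urgency: str) -> int:
    """Return the priority for an impact/urgency pair.

    Both fields are mandatory, so a missing or unrecognised value is
    rejected rather than defaulted.
    """
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(
            f"No priority defined for impact={impact!r}, urgency={urgency!r}"
        ) from None
```

Raising an error instead of falling back to a default reflects the statement that there is no support system default for the priority field.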

Activity 3.0

Sub Activity 3.0.1 Is the problem a Priority 1?

If yes go to 3.1.1; if no go to 3.1.2.

Activity 3.1 Investigation & Diagnosis

Sub Activity 3.1.1 Major Incident Procedure

Major Incidents are those for which the degree of impact on the user community is significant; at INS they are typically rated as Priority 1. The Service Desk makes the initial incident rating, which in turn is confirmed by the Problem Manager, who is required to raise a new problem record associated with the Parent Priority 1 incident record.

The relevant Support Group Team Leader will be responsible for coordinating recovery action and keeping the Service Desk Team Leader informed of progress. Regular communication both within IT and with the affected users is paramount under these circumstances. The Service Desk Team Leader will be responsible for communication and liaison with affected users or business units.

The normal investigation and diagnosis action will commence immediately, with the major problem given priority over all other problems being investigated.

Go to 3.1.2.

Sub Activity 3.1.2 Investigate & Diagnose

Investigation will commence to try to diagnose the root cause of the problem. Once investigation commences, change the problem status to Work in Progress. The support specialist will then undertake a more in-depth search of the knowledge base, known error database and any other technical resource material to analyse the problem further and attempt to find a solution.

If the search is unsuccessful there are a variety of sources that can be used to assist in the investigation of problems. The following are often valuable resources for this data:

- External vendors. Manufacturers can provide valuable information about components and infrastructure items that they are producing. They commonly produce information on known errors that either they or other organisations using the products have experienced. Suppliers can provide information about upgraded products that can be used to resolve known errors.
- Users of the service. Users of the service can provide additional information about how the service is affected, or important background information concerning the operational requirements of the services that are affected.
- Forums and user groups. User groups can be an excellent source of information for solutions to problems and can save valuable and costly investigation and diagnostic time.
- Log files. Event logs can be analysed to provide a history of events and activities taking place at the point of failure.
- Development cycle. Development staff are aware of known errors identified during other development projects, and this information should be made available for inclusion in the knowledge base linked to the known error database.
- Internet. The Internet is a valuable repository of information and contains many useful sites, user communities, supplier information pages, FAQs, and a host of other sources of information, which can be used for analysis.

If the root cause cannot be determined by using the above resources, the next step should be to investigate which similar parts in a similar environment are functioning properly. This answers the question of which parts could be showing the same problem but are not. It should then be possible to search effectively for relevant differences between the two situations. Furthermore, past changes which could be the cause of these differences should be identified.

The list of differences and changes thus generated will most likely contain the cause of the problem. Attempts should be made to extract the possible causes from this list. Each possible cause should be assessed to determine whether it could be the cause of the problem's symptoms. In this way, some of the possible causes can be eliminated. The remaining possible causes should be checked to see whether they are the source of the problem. Support specialists should first address the possible causes that can be verified quickly and simply.

During problem investigation, it can be beneficial to sort the collected evidence into some sort of order or timeline. What at first might seem to be a collection of unconnected events or error messages may reveal a pattern when placed in timeline order. Similarly, a group of apparently random failures occurring over a number of servers may, once sorted by server address, reveal that the problem is limited to servers on a particular network segment. The use of sorting to reveal patterns within problem evidence can be an invaluable technique during problem analysis.

Progress should be regularly reviewed. In instances where sufficient progress is not occurring, the support specialist should initiate management escalation by notifying the team leader. The team leader should discuss the problem and confirm the priority, resources, and time that should be allocated to it.

If at any stage the support specialist feels that the problem should now be designated a "major problem", they should contact the team leader, who then becomes responsible for making this decision.

If the root cause cannot be found go to 3.1.3. If the root cause is found and the problem is diagnosed go to 4.0.

Sub Activity 3.1.3 3rd Party Support

Where the root cause of a problem cannot be diagnosed and the problem needs to be referred to a third party for further investigation, change the status to Awaiting External Response and monitor against the Underpinning Contract with the third party supplier.

Once a resolution or workaround is provided by the third party supplier go to 4.0 Workarounds & Known Errors.

Should the third party supplier be unable to assist with resolution, a decision on the disposition of the problem needs to be discussed with the Problem Owner. Go to 5.0 Problem Resolution.
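The sorting technique described in 3.1.2 (placing evidence in timeline order and grouping failures by server address to expose a common network segment) can be sketched as follows. The event field names (`time`, `server`, `message`) and the /24 segment size are assumptions chosen for the example; real log formats and network layouts will differ.

```python
from collections import defaultdict
from ipaddress import ip_network


def group_by_segment(events, prefix_length=24):
    """Group failure events by network segment, each group in time order.

    `events` is a list of dicts with hypothetical keys "time" (ISO 8601
    string, so it sorts chronologically as text) and "server" (IPv4 address).
    """
    groups = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        # strict=False lets a host address stand in for its containing network.
        segment = ip_network(f"{event['server']}/{prefix_length}", strict=False)
        groups[str(segment)].append(event)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"time": "2010-01-12T09:15", "server": "10.0.1.15", "message": "timeout"},
        {"time": "2010-01-12T09:02", "server": "10.0.1.7", "message": "timeout"},
        {"time": "2010-01-12T09:30", "server": "10.0.2.4", "message": "disk full"},
    ]
    for segment, grouped in group_by_segment(sample).items():
        print(segment, [e["server"] for e in grouped])
```

If most failures fall under one segment, that segment's shared components (switch, router, cabling) become early candidates for the quick checks recommended above.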

4.0 Workarounds & Known Errors

Scope

Finding workarounds and creating known errors are the elements within problem management that allow the problem to progress to final resolution. This section covers the procedures involved in developing workarounds and creating known errors. The objective is to change IT components, systems or procedures to remove problems affecting the IT infrastructure and thus prevent any recurrence of incidents. Workarounds and known errors directly interface with and operate alongside the change management, release and deployment management, and knowledge management processes.

Activity 4.1 Assessment

Sub Activity 4.1.1 Workaround

Once problem investigation and diagnosis has commenced, the IT Support Specialist will change the problem status to Work in Progress and begin the task of finding a workaround.

If a workaround is possible and subsequently developed, test the workaround to ensure suitability for use (do not just assume it works!). Even when a workaround has been found, it is still important that work on a permanent resolution continues (where this is justified) to ensure the incidents do not continue to occur. In some instances a permanent fix may be identified that can be implemented just as quickly as developing a workaround. Go to 4.1.2.

If, after assessment, a workaround cannot be developed, update the problem record with the relevant details and go to 4.2 Request For Change.

Sub Activity 4.1.2 Known Error

A known error is a problem for which the root cause is known and, ideally, a temporary workaround or a permanent fix has been identified. At this stage the problem will move to a Known Error status and a record will be created in the Knowledge Base using the Post to Knowledge tab that is available within the problem record. The problem record will be linked automatically to the known error record.
The Known Error record should be raised as soon as it is deemed useful to do so, and certainly must be raised when diagnosis is complete and a workaround has been found (even though it may not yet be a permanent resolution).

Update the known error record with a detailed description of the workaround or the permanent fix. The support system will link this to the knowledge base and automatically inform Service Desk so they can inform the affected users.

In cases where a workaround is found, it is important that the problem record remains open and that details of the workaround are always documented within the problem record. A workaround should contain enough detail to ensure that it can be implemented successfully. It is only necessary to complete this tab field in the case where a workaround has been identified; otherwise this field should remain blank.

Go to 4.2.

5.0 Problem Resolution

Scope

These procedures cover the steps required to resolve problems. Problem resolution details are to be fully recorded in the support system. It is vital to save data on the configuration items (CIs), symptoms, and resolution or circumvention actions relating to all problems so as to build up the knowledge base. This data is then available for incident matching, for providing guidance during further investigations on resolving and circumventing incidents, and for providing management information.

Activity 5.1 Solution

Sub Activity 5.1.1 Does the problem have a permanent solution?

If no, go to 5.1.2. If yes, go to 5.1.3.

Sub Activity 5.1.2 No Permanent Solution

No Fix Available: If there is no workaround available and a decision has been made to permanently live with the problem, then the problem can be resolved and the support specialist should complete the following steps:

- Select Resolved from the problem status code drop down list. (Note: Library & IT Help will be notified automatically by the support system of the resolved status.)
- Select No Fix Available from the Resolution Code drop down list and type in text that further explains how the resolution was reached.
- Ring the client/s and personally explain the situation, giving the reason that the problem cannot proceed further.
- Save the resolution and exit the problem record.

Workaround No Fix Available: If a workaround has been developed and is considered to provide the best permanent solution to the problem without any configuration changes required, the support specialist should complete the following steps:

- Ensure all related knowledge articles, including the workaround details, are completed and then posted to the Knowledge Base using the Post to Knowledge functionality.
- Select Resolve Related Service Requests from the Problem form drop down action list. This will update the status of the related Service Requests to Resolved and send the automated resolution email to the client. Any knowledge articles will be automatically attached.
- Select Resolved from the problem status code drop down list. (Note: Library & IT Help will be notified automatically by the support system of the resolved status.)
- Select Workaround NFA from the Resolution Code drop down list and type in text that further explains the resolution that was reached.
- Communicate the workaround to the client using the Communicate Workaround button. This automatically notifies all clients associated with incidents which are linked to the problem record.
- Ring the client/s and personally explain the situation, giving the reason that the problem cannot proceed further.
- Save the resolution and exit the problem record.

Refer to 7.0 Communications. Go to 5.3 Perform pre close check.

Sub Activity 5.1.3 Is a change required?

If no, go to 5.1.4. If yes, go to 5.1.5.

Sub Activity 5.1.4 Permanent Solution - No changes required

Fix Provided: A problem may be able to be permanently solved by implementing a fix which does not require a change to any configuration items. In these cases the support specialist should take the following steps:

- Select Resolved from the problem status code drop down list. (Note: Library & IT Help will be notified automatically by the support system of the resolved status.)
- Select Fix Provided from the Resolution Code drop down list and type in text that further explains how the resolution was reached.
- Document the fix provided in the Work Notes section of the problem record, making sure that the explanation of the fix is clear and understandable.
- Click on the Post Knowledge tab and a Knowledge article will be submitted for inclusion in the Knowledge Base.
- Save the resolution and exit the problem record.

Refer to 7.0 Communications. Go to 5.3 Perform pre close check.

Sub Activity 5.1.5 Raise RFC

In the most likely case, where a non standard change is required, an RFC should be raised in the INS Change Management system (i.e. outside of the support system). The problem status must then be updated to Awaiting Change Request Completion.

If the change is approved by the relevant Product Service Manager go to 5.1.6. If the change is not approved go to 5.1.8.

Sub Activity 5.1.6 Build Change & Release

Once the change is approved it is built and released according to the relevant procedures outlined in the INS Change Management Process.

If the problem is resolved go to 5.1.7. If the problem is not resolved after implementing the change go to 3.1.2.

Sub Activity 5.1.7 Permanent Solution - Changes required

Fix with change/s required: In most cases it will be necessary to raise a Change to manage the implementation of a permanent solution. If this is the case then the support specialist should take the following steps:

- Raise a Request For Change (RFC) and forward the RFC through to the Change Management process for review and approval to proceed. (Note: the RFCs will typically be non-standard changes, which are not in scope for the first implementation of the support system.)
- Once the change has been tested and proven, select Resolved from the problem status code drop down list. (Note: Library & IT Help will be notified automatically by the support system of the resolved status.)
- Select Fix with change/s required from the Resolution Code drop down list and type in text that further explains how the resolution was reached.
- Manually reference the change control documentation in RFC Link in the Problem Linking area of the problem record so that it can be linked back to details of the change/s, testing etc.
- Click on the Post Knowledge tab and a Knowledge article will be submitted for inclusion in the Knowledge Base.
- Save the resolution and exit the problem record.

Refer to 7.0 Communications. Go to 5.3 Perform pre close check.
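The decision points in 5.1.1 to 5.1.7 reduce to a small decision tree that selects the resolution code. The sketch below restates that tree for illustration. The function name and parameters are assumptions, the returned labels are the drop down values quoted in the steps above, and the sketch deliberately ignores the approval and testing of the change, which the Change Management process governs.

```python
def select_resolution_code(
    permanent_solution: bool,
    change_required: bool = False,
    workaround_available: bool = False,
) -> str:
    """Pick a resolution code following the 5.1.1 and 5.1.3 decisions.

    Hypothetical helper: it assumes any required change has already been
    approved, built, released and proven successful.
    """
    if permanent_solution:
        # 5.1.3: is a change required?
        if change_required:
            return "Fix with change/s required"  # 5.1.7
        return "Fix Provided"                    # 5.1.4
    # 5.1.2: no permanent solution
    if workaround_available:
        return "Workaround NFA"
    return "No Fix Available"


if __name__ == "__main__":
    print(select_resolution_code(permanent_solution=True, change_required=True))
    print(select_resolution_code(permanent_solution=False, workaround_available=True))
```

Keeping the tree in one function makes it easy to see that every path through 5.1 ends in exactly one resolution code.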