ITD BACKUP MANAGEMENT PROCEDURE

PURPOSE

The purpose of this document is to ensure that effective and efficient processes are in place to identify and safeguard the data required to run and support applications in the IT portfolio. The duties and responsibilities of all parties involved in the maintenance of UL backup routines are identified and defined.

The procedure is broken down into three key activities:

1. Adding or removing a server to/from the backup schedule
2. Adding or removing content from a backup
3. Operational backup checks

The steps and responsibilities for these activities are as follows.

1. ADDING OR REMOVING A SERVER TO/FROM THE BACKUP SCHEDULE

Step    Details    Responsible

1. If a server needs to be added to or removed from the backup routine, the SERVER BACKUP INCLUSION/EXCLUSION FORM (APPENDIX 1) must be completed and submitted to the Backup Manager by raising a ticket in RMS.

2. Where a server is being removed from the backup schedule permanently, approval for the removal must be granted by the Deputy Director ITD.

3. The change to the backup routine should be completed within five working days of receiving the completed form from the Application Manager. For additions to the backup schedule, the new backup must be closely monitored for continued success for a period of no less than 3 days.

4. Once actioned, the INCLUSION/EXCLUSION form will be uploaded to SharePoint. A communication must be sent to the Application Manager indicating that the server has been successfully added to or removed from the backup schedule.

5. The change to the backup schedule must be reflected in all regular reports (see section 3 below) that are distributed to Application Managers.

6. The regular backup reports must be checked to ensure that the required change to the backup schedule has been made and, if it is an addition, that the backup is completing successfully.

Page 1 of 10 Rev. 5
7. The Backup Manager should be contacted if failures are occurring in any backups, so that these failures can be investigated and rectified. This should take the form of a ticket in RMS.

8. The Backup Manager will assist the Application Manager to troubleshoot and resolve the issue as quickly as possible. (Responsible: Backup Manager & Application Manager)

2. ADDING OR REMOVING CONTENT FROM A BACKUP

Step    Details    Responsible

1. The form APPENDIX 3: BACKUP REVIEW FORM must be used to add or remove any backup locations from the application server to keep backups current. The form must be sent to the Backup Manager by raising a ticket in RMS.

2. The change to the backup routine should be completed within five working days of receiving the completed form from the Application Manager. The data components backed up on each server will be adjusted according to the forms submitted.

3. A communication must be sent to the Application Manager indicating that the change to the backup of the server has been completed successfully.

4. The regular backup reports must be checked to ensure that the change to the backup of the server is included in the backup schedule and that the backup is completing successfully.
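The five-working-day target used in sections 1 and 2 can be illustrated with a short calculation. This is a minimal sketch only: it assumes that working days are Monday to Friday and it does not model public holidays or University closure days. The function name add_working_days and the example date are hypothetical and are used for illustration only.

```python
from datetime import date, timedelta


def add_working_days(start: date, working_days: int) -> date:
    """Return the date that falls `working_days` working days after `start`.

    Assumption: working days are Monday to Friday. Public holidays and
    University closure days are not taken into account.
    """
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


# Hypothetical example: completed form received on Thursday 1 October 2015.
form_received = date(2015, 10, 1)
change_due = add_working_days(form_received, 5)
print(change_due.isoformat())  # 2015-10-08
```

A form received on a Friday or at a weekend is counted from the next working day, so the target date moves into the following week.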
3. OPERATIONAL BACKUP CHECKS

Step    Details    Responsible

1. Reports must be distributed to the Application Managers as follows:
   • Daily Job Summary Report
   • Monthly Content Report

2. Application Managers should ensure that the server hosting the application they manage is being backed up successfully by checking the Daily Report.

3. The Monthly Content Report must be checked every month to ensure that what is being backed up on the server they manage is correct.

4. Application Managers must also ensure that what is being backed up on the server is sufficient to implement a full restore of the application to operating capacity if a complete hardware failure and data loss occurred.

5. All backup equipment must be monitored to ensure it is working effectively, including the loading, rotating, removal and storage of tape media in line with the University data retention policy and ITD's Server Management Procedure.

6. The backup process must be checked on a daily basis for any global failures or system faults (e.g. space running out, mount path issues, a large number of backup jobs failing).

7. An email distribution list of all Application Managers will be maintained. In the case of any errors, the Application Managers will be sent a communication via email stating that an issue has occurred and is being investigated. When the issue is resolved, the Backup Manager will again email the distribution list to explain the nature of the issue and to confirm that it has been resolved.

8. The Backup Manager should be contacted, by raising a ticket in RMS, if failures are occurring in any backups, so that these failures can be investigated and rectified.

9. On receipt of verbal or written notification from an Application Manager relating to a backup failure or issue they are unable to resolve, the Backup Manager will assist the Application Manager to troubleshoot and resolve the issue within one working day.

10. If backup issues and failures cannot be resolved, Application Managers should escalate the issue to their Section Head so that there is awareness that a system is at risk.
11. Section Heads should raise the issue with the Head of Technology Solutions. If the issue still cannot be resolved, it should be further escalated to the Deputy Director and raised as a Problem Call at the ITD Management Meeting. (Responsible: Section Heads)

12. The Backup Manager will also assist Application Managers to carry out test or production data restores on request. Production data restores will most likely need to be carried out on an urgent basis (especially for P1 systems). Test data restores should be arranged with a minimum of 4 days' notice (enforceable at the discretion of the Backup Manager). A test restore can be carried out at the time of a real production restore, but a period of no more than 12 months should elapse between restores. The test restore should be documented using APPENDIX 2: APPLICATION TEST RESTORE FORM and uploaded to the relevant SharePoint site indicated on the form. (Responsible: Backup Manager & Application Manager)

13. The Backup Manager will maintain an archive in their email client of the daily backup reports sent from the CommVault application.
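The restore timings in step 12 can be checked with simple date arithmetic. This is a minimal sketch only: it assumes that the 4 days' notice is counted in calendar days (the procedure does not say whether calendar or working days are meant) and that "12 months" means the same day of the month one year later. The function names and example dates are hypothetical and are used for illustration only.

```python
import calendar
from datetime import date, timedelta

MAX_MONTHS_BETWEEN_RESTORES = 12  # from step 12
MIN_NOTICE_DAYS = 4               # from step 12; assumed to be calendar days


def add_months(d: date, months: int) -> date:
    """Return the same day of the month `months` later, clamped to month end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def is_test_restore_overdue(last_restore: date, today: date) -> bool:
    """True if more than 12 months have elapsed since the last restore."""
    return today > add_months(last_restore, MAX_MONTHS_BETWEEN_RESTORES)


def earliest_test_restore_date(request_date: date) -> date:
    """Earliest date a test restore can be scheduled, given the notice period."""
    return request_date + timedelta(days=MIN_NOTICE_DAYS)


# Hypothetical example: last restore on 30 September 2015.
print(is_test_restore_overdue(date(2015, 9, 30), date(2016, 10, 1)))  # True
print(earliest_test_restore_date(date(2015, 10, 1)).isoformat())      # 2015-10-05
```

Because a production restore also counts as a restore under step 12, the date of the most recent restore of either type would be passed as last_restore.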
DEFINITIONS

For the purposes of this document, the terms Application Manager, Backup Manager and Server are defined below.

Application Manager

An Application Manager is defined as the person responsible for the running of an application or for maintaining data on a server. Their responsibilities under this procedure are outlined in the sections above.

Note: Where ITD facilitates the backup of an application whose management is outside of ITD (e.g. RIS, Kinetics etc.), either:
a) A person within ITD will be assigned to check the success or failure of backups on a daily basis and to liaise as appropriate to resolve any issues; or
b) The manager in the external division will be given access to the daily backup reports to allow them to take on the role of Application Manager.
The adopted arrangement must be clearly defined in an SLA with the external department.

Backup Manager

The Backup Manager is responsible for the effective operation of the institution's backup processes as outlined above and is also responsible for:
• Setting and monitoring appropriate tape retention periods as per the University data retention policy
• Managing the physical hardware and software configurations, including tape loading and replacement as required to maintain effective backup routines, and tape drive cleaning and maintenance
• Ensuring that adequate backup software licences and support contracts are in place in good time to maintain services
• Tape storage and retrieval in line with data retention guidelines
• Monitoring new and emerging trends to ensure UL's backup strategy is up to date and effective; this may include suggesting product enhancements or changes related to the current or new systems

Server

A "server" is defined as either a physical or virtual computer system that is connected to the UL campus network and provides services to multiple individuals at the same time.
RECORDS

• Records are held in CommVault and can be made available on request
• Copies of emailed reports are maintained by the Backup Manager
• Copies of the completed forms used in this procedure are held in SharePoint at the links below:
  APPENDIX 1: SERVER BACKUP INCLUSION/EXCLUSION FORM
  APPENDIX 2: APPLICATION TEST RESTORE FORM
  APPENDIX 3: BACKUP REVIEW FORM

PROCESS VERIFICATION

Evaluation of process effectiveness is carried out using Internal/External Quality Audits, and using the Corrective Action process where the procedure itself is found to be the source of the problem under investigation.
APPENDIX 1: SERVER BACKUP INCLUSION/EXCLUSION FORM

SECTION:
DATE COMPLETED:

Server Name    (s) on server    Name of person responsible for    ADD or REMOVE from backup

SERVER REMOVAL REQUIRES APPROVAL BY DEPUTY DIRECTOR ITD

DEPUTY DIRECTOR ITD:
DATE:
APPENDIX 2: APPLICATION TEST RESTORE FORM

Name of Server:
Your Name:
Name of :
Date:

Type of Restore: Live Production Restore / Test Restore
Restored to: Test/Dev Server / Production Server

Question    Details

Location of data restored (i.e. path name from production server)    Drive:\..\..\
Was the restore from tape or disk?    Tape/Disk
Approximate size of data restored in Megabytes?    MB
Was the restore successful?    YES/NO
Was the data/tape easy to locate? If not, please give details    YES/NO (details)
Any other comments
Any lessons learnt or things that would be done differently next time?
APPENDIX 3: BACKUP REVIEW FORM

Name of Server:
Your Name:
Name of :

General requests for changes to backup routine (e.g. frequency, backup type etc.)    Implemented (Y/N)

List of locations to ADD to the backup process    Notes/Comments    Implemented (Y/N)

List of locations to REMOVE from the backup process    Notes/Comments    Implemented (Y/N)

Other Comments

GREY SECTIONS TO BE COMPLETED BY BACKUP MANAGER
Revision No.    Date    Approved by    Details of Change

1    10 April 2015    Ian McKenzie    New Document
2    01 July 2015    Ian McKenzie    External application management added in Definition
3    20 August 2015    Brendan Dore
4    16 September 2015    Brendan Dore
5    30 September 2015    Brendan Dore

Details of Change (revisions 3 to 5):
• Restructured the document based on key activities. Included timelines for completion of key steps. Amended the definitions of application manager & backup manager
• Incorporated feedback received from the ITD Management team:
  • Removal of server backups requires signoff by Deputy Director ITD
  • Addition of a server backup should be closely monitored for at least 3 days
• Reordered purpose statement to put first focus on protection of data
• Included RMS as a means of raising issues to the Backup Manager