To: LASERS Audit Committee; Cindy Rougeou, Executive Director
Cc: Maris LeBlanc, Deputy Director; Lance Armstrong, IT Director; Dan Bowden, IT Deputy Director
From: Ryan Babin, Audit Director; Blake Lee, Auditor
Date: April 9, 2013
Subject: IT Process Review - Disaster Recovery Testing Process

Audit Services' purpose in this engagement was to provide consultation on, and a review of, LASERS Information Technology's changes to the disaster recovery testing process. IT's self-assessment of this process included evaluating the current risks, controls, and weaknesses. IT also outlined ways to strengthen the identified weaknesses and to develop an accountability structure within the process.

Project Background and Summary

The review consisted of IT performing a self-assessment of the disaster recovery testing process. Key elements of this process review included:

- Identifying the current procedures for the process.
- Evaluating the current process for weaknesses and risks.
- Identifying methods of mitigating the identified risks.
- Evaluating those methods and choosing the best available control option(s).
- Integrating the new methods into the current process.
- Developing a report containing key conclusions.

This was the first process review completed by IT in conjunction with Audit Services. Audit's role was to provide guidance on performing a self-assessment by consulting and providing feedback during the process, as well as on the final product.

IT recognized the value of maintaining current procedure documentation, which reduces the difficulty and workload associated with performing a process review. It is difficult to accurately assess improvements to a process when the current process is not fully documented. Documented procedures also help with succession planning and the passing of knowledge to other IT staff.

During this review, IT developed methods to increase its accountability, maintain procedure documentation, and improve the way self-assessments are performed by IT staff.

In our opinion, IT's understanding of how to perform a self-assessment increased. As IT continues to evaluate its processes, that understanding will grow and efficiencies will be realized. The methodology to
perform self-assessments is a new area for IT, and Audit Services will continue to work closely with IT to help improve the performance of future process reviews.

Conclusion

For a first review, various improvements were made to the disaster recovery testing process. Audit Services agrees with the observations from the IT Division's self-assessment of the disaster recovery testing process, which are included in the attached Disaster Recovery Testing Process Review report. Audit Services will be able to evaluate the value of these changes during future disaster recovery testing exercises. As this process is used and re-assessed, items that were not previously identified may surface. At that point, the current procedure manual will serve as a starting point for IT to initiate another self-assessment and make recommendations addressing those items.
Disaster Recovery Testing Process Review

A. Description and Scope

The current disaster recovery process was analyzed by following the entire process, from start to completion, during the current test. During the analysis, risks were evaluated by the testers and the project lead. Suggested changes were discussed with IT Management, which decided what would be implemented. The process and documentation were analyzed for the recovery of SOLARIS, check printing, the virtual private network, and limited functionality of Microsoft Exchange.

B. Observations and Responses

Observation #1: While working through the current disaster recovery solution, it was determined that ad hoc changes had been added to the documentation and that the documented processes were deviated from. Also, there was no formal reporting mechanism for communicating to the DR Manager when the disaster recovery process was last performed.

Response: A series of checklists was created to ensure that the process is followed the same way each time and that the necessary communication to the manager is completed in a timely and appropriate manner before moving on to the next phase of recovery (a brief sketch of this phase-gate idea follows Observation #2). See Appendix A, Section B.

Observation #2: A data comparison performed after the test found that the restored information was more current than the sample data that had been taken for comparison. This was a positive outcome: more current data was restored than expected, which would be the best scenario in a real disaster. However, the restored data did differ from the comparison data because it was more current.

Response: The checklists will address this issue by ensuring that the standard of the test is met. An issues tracker was also created for the recovery. The issues tracker records items such as the need to take the sample data from the latest backup date, ensuring a correct comparison and an accurate determination of the recovery's success. See Appendix A, Section B.
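To make the phase-gate idea behind these checklists concrete, here is a minimal sketch in Python. It is illustrative only: the phase names, checklist items, and the notify_dr_manager stub are assumptions, not the contents of the actual checklists in Appendix A, Section B.

```python
# Hypothetical phase-gated DR checklist: each phase must be fully signed off,
# and the DR Manager notified, before the test moves to the next phase.

PHASES = {
    "preparation": ["confirm tape manifest", "reserve recovery hardware"],
    "restore": ["restore SOLARIS", "verify check printing"],
    "validation": ["compare restored data against latest-backup sample"],
}

def notify_dr_manager(phase):
    # Stand-in for whatever channel the DR Manager is actually notified through.
    print(f"DR Manager notified: phase '{phase}' complete")

def run_test(signed_off):
    """Advance phase by phase, halting if any checklist item lacks sign-off."""
    for phase, items in PHASES.items():
        missing = [item for item in items if item not in signed_off.get(phase, set())]
        if missing:
            raise RuntimeError(f"cannot leave phase '{phase}'; open items: {missing}")
        notify_dr_manager(phase)  # required communication before the next phase

# Example run: the validation item is unfinished, so the test halts there.
run_test({
    "preparation": {"confirm tape manifest", "reserve recovery hardware"},
    "restore": {"restore SOLARIS", "verify check printing"},
    "validation": set(),
})
```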
Observation #3: The tapes that were shipped were not all correctly logged, which could have caused issues with the recovery.

Response: The tape logging process was updated to ensure that it is accurate and that all tapes are clearly logged, both locally and on the Iron Mountain website, before any containers are shipped out.

Observation #4: The documentation for recovery was updated, but there was no formal check of this process to ensure that the documentation was sufficient for the recovery. If issues arose during the test, there was no way of formally documenting them for the next test so that, if encountered again, they could be corrected quickly or avoided before the test.

Response: Checklists and flowcharts of the process were created to ensure that the documentation is updated and correct, and an update process is in place to ensure that the information gathered from testing goes into the DR recovery document. These also better coordinate the test itself and ensure that all necessary items have been met before a test begins. See Appendix A, Section A.

Observation #5: The test recovery time objective is not adequate; in its current form, the test took too long to be completed within the required 24 hours.

Response: New snapshot technology, which takes an image of the servers involved, was implemented to dramatically reduce the time needed to restore the environment. Continued improvement is being made to the snapshot capability to further reduce the recovery time.

Observation #6: The recovery time objective for this test is unrealistic for a real-world disaster. The 24-hour restore time for SOLARIS could not legitimately be met, given the need to reserve the DR site and to fly personnel to it.

Response: Remote connectivity options were tested in the last DR test and worked well. This option should be implemented in future tests to ensure that the restore time matches what is recommended. However, even in this scenario, tapes would have to be shipped out before recovery is possible, and shipping alone would take over 24 hours (a back-of-the-envelope illustration follows). Options have been researched for moving toward a hot-site scenario in Bossier City, which would allow remote replication and local storage of data for a much faster recovery time, as well as for performing multiple recovery tests throughout the year.
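The arithmetic behind this observation can be illustrated with a short sketch. Only the 24-hour objective comes from the report; every duration below is a hypothetical placeholder, chosen merely to be consistent with the report's note that tape shipment alone exceeds 24 hours.

```python
# Hypothetical RTO feasibility check for a tape-based recovery. The 24-hour
# objective is from the report; all step durations are illustrative guesses.

RTO_HOURS = 24

step_hours = {
    "ship tapes to the DR site": 26,    # placeholder; report says "over 24 hours"
    "fly personnel to the DR site": 5,  # placeholder
    "restore SOLARIS from tape": 12,    # placeholder
}

total = sum(step_hours.values())
print(f"Estimated recovery time: {total}h against a {RTO_HOURS}h objective")
print("RTO met" if total <= RTO_HOURS else "RTO missed")
```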
C. Conclusion

After reviewing the current DR process, a new process document was created that outlines each area and sets a formal path for all reporting and updating to ensure consistency. A series of checklists was also created to ensure that all necessary criteria have been met by the testers, and an issues tracker document was added to record the problems and solutions associated with the testing (a minimal sketch of such an entry follows). All of these forms and processes are now referenced in the Disaster Recovery Testing Manual. As IT continues to perform annual disaster recovery testing, the documents that were created will be used and adjusted to continue the transfer of knowledge and to increase the efficiency of the DR process.
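As an illustration of the kind of record the issues tracker might hold, the sketch below defines one entry. The field names are assumptions made for illustration, not the layout of the actual Issues Tracker document in Appendix A, Section B.

```python
# Hypothetical issues-tracker entry; field names are assumed, not taken from
# the actual Issues Tracker document.
from dataclasses import dataclass

@dataclass
class Issue:
    description: str            # what went wrong or differed from expectations
    resolution: str             # how it was corrected or will be avoided
    carry_forward: bool = True  # review at the start of the next annual test

issues = [
    Issue(
        description="comparison sample was older than the restored data",
        resolution="take the sample from the latest backup date before testing",
    ),
]

for issue in issues:
    print(issue)
```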
Appendix A

Section A - Disaster Recovery Manual
Section B - Disaster Recovery Checklists, Risk Assessments and Issues Tracker
Section C - Disaster Recovery Testing Procedure Manual