TxDOT Internal Audit Report
Disaster Recovery - IT

Objective
Determine whether adequate plans and the ability exist to ensure critical TxDOT operations are not impacted by business interruptions to IT infrastructure. Determine whether testing, debriefs, and remediation plans have been developed and implemented.

Opinion
Based on the audit scope areas reviewed, control mechanisms require improvement and only partially address risk factors and exposures considered significant relative to impacting operational execution and regulatory compliance. The organization's system of internal controls requires improvement in order to provide reasonable assurance that key goals and objectives will be achieved. Significant improvements are required to correct control gaps and mitigate residual risk that may result in potentially significant negative impacts to the organization, including the achievement of the organization's business/control objectives.

Overall Engagement Assessment: Needs Improvement

Findings
Finding 1
Title: Disaster Recovery Plan (April 1, 2013) does not Include Sufficient Recovery Instructions for all IT Systems
Rating: Needs Improvement
Finding 2
Title: Outdated Technical Recovery Instructions
Rating: Needs Improvement
Control Design x Operating Effectiveness x

Management concurs with the above findings and prepared management action plans to address deficiencies.

Internal Environment
Since July 2012, services in the Texas Data Center Services (DCS) program, including disaster recovery, have been delivered through a multi-source integrated contract. The takeover of service provision from the previous service provider was completed on December 31, 2012. TxDOT IT staff were heavily involved during the transition of services. In addition, the recent focus of the TxDOT IT function has been on updating and aligning internal business processes. Current management is aware of the need to re-assess IT System recovery priorities, and plans for a comprehensive evaluation have been discussed.
Current management is also aware that the existing Disaster Recovery Plan (DRP) does not include sufficient recovery instructions for all IT Systems and is working on a solution.
Summary Results

Finding 1
Scope Area: Disaster Recovery Planning
Evidence: Audit work identified 318 of 397 (80%) division managed IT systems without sufficient recovery instructions in the current DRP [52 of 397 (13%) are mission critical; 345 (87%) are non-critical].
Critical IT Systems: 30 of 52 (58%) systems do not include sufficient recovery instructions in the existing DRP documentation.
Non-critical IT Systems: 287 of 345 (83%) non-critical IT systems do not include sufficient recovery instructions in the existing DRP documentation.

Finding 2
Scope Area: Disaster Recovery Plan Execution and Testing Disaster Recovery Activities
Evidence: 4 of 4 (100%) of the Run Book updates associated with action items identified in the 2012 DR Test Exercise remain incomplete.

Audit Scope
The audit coverage included disaster recovery planning, testing, and sustaining activities for TxDOT IT production systems both in and out of scope of the statewide data center services (DCS) contracts. Limited testing was performed for systems administered by third party vendors.

The audit was performed by Patti Drummer, Dennis Frazier, Justan Lopez (Co-Lead) and Karin Faltynek (Engagement Lead). The audit was conducted during the period from April 22, 2013 to July 19, 2013.

Methodology
The methodology(s) used to complete the objectives of this audit included the following:

Multiple sources of documented information for TxDOT production servers and applications provided by the client were analyzed and compared to existing Disaster Recovery Plan documentation. The Data Application Inventory System (DAIS) was used as a primary source. Additional information was obtained through interviews with knowledgeable internal and service provider staff.

2 of 12 August 28, 2013
Records of the two most recent disaster recovery tests were reviewed, and the status of identified action items was determined through the review of applicable documentation. Additional information was obtained through interviews with knowledgeable internal staff.

Data center and remote site walk-throughs and observation of on-going activities were followed up with documentation review and interviews with knowledgeable staff.

These procedures were applied as necessary to perform the audit fieldwork.

Background
This report is prepared for the Transportation Commission, TxDOT Administration, and Management. The report presents the results of the Disaster Recovery IT Audit, which was conducted as part of the Fiscal Year 2013 Audit Plan.

Disaster recovery is a sub-set of business continuity. It comprises the processes, policies, and procedures related to pre-disaster planning, and it is essential for the recovery and continuation of technology infrastructure that is vital to an organization after a natural or human-induced disaster.

Established key metrics for business data recovery point objectives (RPO) and data recovery time objectives (RTO) are essential elements in disaster recovery planning. The RTOs and RPOs are generally found in the business continuity plan. Incomplete RTOs and RPOs can quickly derail a disaster recovery plan, leading to significant problems that can extend the disaster's impact.

Once the recovery point and time are known, the underlying IT systems (applications and the infrastructure supporting those systems) are identified and prioritized for recovery. Technical information related to the infrastructure and application interdependencies is recorded in Run Books. IT system metrics are documented in a Disaster Recovery Plan (DRP). The DRP is periodically updated and validated through DRP test exercises. DRP test exercise results are recorded in a disaster recovery test exercise issue log.
Technical documentation related to issues discovered is updated to correct the deficiencies found during testing. Technical documentation is also updated on an on-going basis as a result of infrastructure changes or other related technical updates.

As required by the Texas Government Code, TxDOT participates in the Texas Data Center Services (DCS) program. In 2006, TxDOT executed a 10-year interagency contract with DIR for the majority of existing IT Systems. TxDOT received permission to exclude some IT systems from DCS services. Those IT systems are referred to as out-of-scope. The data in two of the out-of-scope IT systems is managed by third party service providers; the remaining systems are managed by TxDOT.

The DCS and other third party service providers manage the IT Systems, including disaster recovery planning, based on information provided by TxDOT. This information must include data such as RPO, RTO, and IT System interdependencies. While this information is generally based on comprehensive business analysis, current TxDOT IT System classification is primarily based on input from the IT System OPR.
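The prioritization step described in the Background (once RTO and RPO are known, systems are ranked for recovery) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not TxDOT's or the service provider's actual method: the `SystemRecord` type, the `recovery_order` function, the system names, and all hour values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SystemRecord:
    """Hypothetical inventory entry; field names are assumptions for illustration."""
    name: str
    rto_hours: Optional[float]  # recovery time objective; None if not yet established
    rpo_hours: Optional[float]  # recovery point objective; None if not yet established


def recovery_order(
    systems: List[SystemRecord],
) -> Tuple[List[SystemRecord], List[SystemRecord]]:
    """Split systems into those that can be prioritized and those that cannot.

    Systems with both metrics are sorted by shortest RTO, then shortest RPO.
    Systems missing either metric are returned separately, since they cannot
    be placed in a recovery sequence until the business analysis is done.
    """
    complete = [s for s in systems if s.rto_hours is not None and s.rpo_hours is not None]
    incomplete = [s for s in systems if s.rto_hours is None or s.rpo_hours is None]
    ordered = sorted(complete, key=lambda s: (s.rto_hours, s.rpo_hours))
    return ordered, incomplete


# Hypothetical example data (not taken from the audit).
inventory = [
    SystemRecord("System A", rto_hours=24, rpo_hours=4),
    SystemRecord("System B", rto_hours=4, rpo_hours=1),
    SystemRecord("System C", rto_hours=None, rpo_hours=None),
]

ordered, incomplete = recovery_order(inventory)
print([s.name for s in ordered])
print([s.name for s in incomplete])
```

Listing the incomplete systems separately mirrors the point made above: a system without an established RTO or RPO cannot be assigned a recovery priority.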
We conducted this performance audit in accordance with Generally Accepted Government Auditing Standards and in conformance with the International Standards for the Professional Practice of Internal Auditing. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objectives.

A defined set of control objectives was utilized to focus on operational and regulatory goals for the identified scope areas. Our audit opinion is an assessment of the health of the overall control environment based on (1) the effectiveness of the enterprise risk management activities throughout the audit period and (2) the degree to which the defined control objectives were being met. Our audit opinion is not a guarantee against operational sub-optimization or regulatory non-compliance, particularly in areas not included in the scope of this audit.
Detailed Findings and Management Action Plans (MAP)

Finding No. 1: Disaster Recovery Plan (April 1, 2013) does not Include Sufficient Recovery Instructions for all IT Systems

Condition
318 of 397 (80%) of TxDOT's division Office of Primary Responsibility (OPR) managed IT systems do not have sufficient recovery instructions in the current disaster recovery plan.

Critical IT Systems
The disaster recovery plan does not include sufficient recovery instructions for 30 of 52 (58%) IT systems previously identified as agency/mission critical by the IT system's OPR. While the Data Center Services (DCS) contract includes recovery priorities for servers, the interdependencies for specific IT Systems located on those servers are not included in the Disaster Recovery Plan.
Technical recovery documentation for IT systems associated with the existing DCS disaster recovery plan includes Run Books with date/time stamps 1 year or older.
Third party data center service providers were able to provide disaster recovery documentation for the Toll Operation Management and Electronic Bidding Systems. However, the disaster recovery plan for the Toll Operation Management IT System was outdated due to recent infrastructure updates.

Non-critical IT Systems
The existing DCS disaster recovery plan does not include 287 of 345 (83%) IT systems identified as non-critical by the IT system's OPR.
Technical recovery documentation for IT systems includes Run Books with date/time stamps 1 year or older.

Effect/Potential Impact
TxDOT operations would be impacted by business interruptions to IT infrastructure. After a disaster, the agency would not be able to continue its essential operations.
Criteria & Cause
Exhibit 16 of the Data Center Services Multi-sourcing Service Integrator Master Services Agreement, IT Service Continuity Management, states: Service Provider shall develop, maintain and implement a comprehensive Disaster Recovery Plan (DRP) for Services provided to DIR Customers and in relation to any DIR Customer-specific DRPs in each case subject to the DIR Customer's prior review and approval.

Texas Administrative Code 202, Title 1, Part 10, Subchapter B, Rule 202.24 states: State agencies shall maintain written Business Continuity Plans that address information resources so that the effects of a disaster will be minimized, and the state agency will be able either to maintain or quickly resume mission-critical functions.
Disaster Recovery Plans should include information that reflects IT system interdependencies, business priorities, recovery time objectives (RTO) and recovery point objectives (RPO). This information is used by the service provider to assign appropriate server service tiers, including disaster recovery priority.

A process for the development and continuous update of a comprehensive disaster recovery plan is not in place. Although the DCS service provider has been provided information for non-critical systems in the past, TxDOT has not validated that this information has been included in the Disaster Recovery Plan in accordance with the contract.

Efforts to create a critical systems list have been made by TxDOT staff, but a business analysis to establish IT System RTO has not yet been performed. Establishing RTO is a critical task in developing and documenting a disaster recovery plan and for transformation of servers to a consolidated data center environment.

Evidence
Not all existing IT systems are documented in the existing disaster recovery plan. The evidence obtained in the review included:

Critical IT Systems:
Review of the Data Application Inventory System (DAIS) identified 397 production systems managed by division OPRs. Fifty-two (13%) of those systems are classified by the IT system's OPR as critical.
The existing disaster recovery plan only provides information for 21 of 52 critical systems.
30 critical IT systems are not included in the existing DRP.
1 of the 52 critical IT systems, Toll Operations Management, is excluded from DCS and managed by a third party service provider. The review of the disaster recovery plan for the Toll Operations Management IT system indicates that the technical recovery documentation is out-of-date.
Separate documented disaster recovery guidance for 30 critical systems does not exist. See Appendix A for a list of the 30 mission/agency critical IT systems at risk that were reviewed.
Date/time stamps on existing technical recovery documentation for critical IT Systems are more than 1 year old.
A process for on-going validation of existing technical recovery documentation for critical IT systems was not found.
In addition, the July 2013 update of the disaster recovery plan indicates that the recovery period for 5 critical applications was downgraded because TxDOT provided insufficient recovery instructions and descriptions of application dependencies.

Non-critical IT Systems:
Review of the Data Application Inventory System (DAIS) identified 397 IT systems managed by division OPRs. Three hundred forty-five (87%) of those IT systems are classified by the IT system's OPR as non-critical.
The current disaster recovery plan covers only 58 (17%) of the non-critical IT systems.
Separate documented disaster recovery guidance for 287 non-critical IT systems does not exist.
Date/time stamps on existing technical recovery documentation for non-critical IT systems are more than 1 year old.
A process for on-going validation of existing technical recovery documentation for non-critical IT systems was not found.
In addition, the July 2013 update of the Disaster Recovery Plan indicates that the recovery period for 20 non-critical IT applications was downgraded due to insufficient recovery instructions and descriptions of application dependencies.

Management Action Plans (MAPs):
MAP Owners: Margaret Dixon, Risk & Security Strategy Manager; Jamie Hahn, Risk Analyst

The following MAP activities will address the deficiencies by ensuring disaster recovery guidance, processes, and documentation are created and maintained for TxDOT's IT systems, and included in the disaster recovery plan document.

MAP 1.1 - IT has two transformation projects scheduled which will provide:
o Business evaluation of applications and systems
o Performance of application rationalization of the list of systems
These two projects will provide necessary input to determine current system criticality. Expected outcomes of these projects include:
o An updated list of critical applications. The service provider, NTT DATA, was provided a preliminary list of 46 critical applications
o Recovery time objectives (RTO) for critical applications
o Priority tiers for applications
Completion Date: December 15, 2013

MAP 1.2 - TxDOT will implement an on-going process to establish a quarterly review of critical Run Books:
A quarterly review process of TxDOT's DR plan is currently in place. This review is conducted by Capgemini/Xerox. TxDOT will direct NTT DATA to inform Capgemini/Xerox. TxDOT will be using the same updating cycle to update the Run Books on a quarterly basis.
TxDOT will review the list of critical applications upon completion of the above transformation project.
TxDOT will then develop a process to update or create outstanding critical Run Books on a quarterly schedule. TxDOT will give the quarterly list to NTT, who will then direct Capgemini/Xerox to update the portion of the applications list to be updated or created.
At the end of the quarter, TxDOT will review the portal on the TxDOT Department of Information Resources website to ensure the critical applications' Run Books have been updated or created.
The contract between Capgemini and Xerox has a schedule for the creation and updating of Run Books based on Tier Service Groups listed in the Capgemini/Xerox DR Program Overview, page 22. TxDOT will conform to the contract agreement.
Completion Date: June 15, 2014

MAP 1.3 - TxDOT will create and implement a process to recover non-critical applications.
Completion Date: March 15, 2014
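The percentages cited in Finding No. 1 can be recomputed from the reported counts. The short Python sketch below is only an arithmetic check; the helper name `pct` and the dictionary labels are assumptions introduced for illustration, while the counts are the ones stated in the finding.

```python
def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Counts as reported in Finding No. 1; the labels are illustrative only.
reported = {
    "systems without sufficient recovery instructions": (318, 397),
    "systems classified as critical": (52, 397),
    "systems classified as non-critical": (345, 397),
    "critical systems without sufficient instructions": (30, 52),
    "non-critical systems without sufficient instructions": (287, 345),
    "non-critical systems covered by the DRP": (58, 345),
}

for label, (part, whole) in reported.items():
    print(f"{label}: {part} of {whole} ({pct(part, whole)}%)")
```

Each computed value matches the rounded percentage given in the report (80%, 13%, 87%, 58%, 83%, and 17%).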
Finding No. 2: Outdated Technical Recovery Instructions

Condition
Run Books are out of date and do not reflect current disaster recovery operations.

Effect/Potential Impact
Continuation of business processes that rely on functional IT system components would be delayed, or recovery of the targeted IT systems would be unsuccessful.

Criteria & Cause
Exhibit 16 of the Data Center Services Multi-sourcing Service Integrator Master Services Agreement, Disaster Recovery Testing, states: Service Provider will implement and track corrective actions until resolved.
An on-going process to validate Run Books is not in place.

Evidence
4 of 4 (100%) required updates to associated Run Books were not completed. A review of Run Books for the mainframe applications tested during the October 2012 Disaster Recovery (DR) exercise indicates that the Run Books have not been updated to address issues identified during this test.

Management Action Plan (MAP):
MAP Owners: Margaret Dixon, Risk & Security Strategy Manager; Jamie Hahn, Risk Analyst

MAP 2.1 - The MAP owners agree the Run Books need to be updated and kept current. Creating and maintaining the Run Books is performed by Capgemini/Xerox with TxDOT's input.
Run Books for four applications require updating: TPX, ADABAS, Enterprise Extender and CTC Adaptors. The fifth application, Websphere, is a Dept. of Motor Vehicle issue, and is not the responsibility of TxDOT as noted in the Issue column of the document.
o TxDOT will direct NTT DATA to contact Capgemini/Xerox to effect the necessary updates identified during the 2012 DR test. The updates will be reflected in the datacenter portal documentation.
o TxDOT will request version control and the name or title be added to the Run Book documentation.
o TxDOT will notify TxDMV of their potential risk regarding Websphere.
Completion Date: November 15, 2013
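Both findings rely on the age of Run Book date/time stamps ("1 year or older"). A minimal Python sketch of such an age check follows. It is an assumption-laden illustration, not the audit procedure itself: the `is_stale` function, the 365-day threshold as a reading of "1 year", the Run Book names, and the last-updated dates are all hypothetical. Only the reference date, the last day of the audit period (July 19, 2013), comes from the report.

```python
from datetime import date


def is_stale(last_updated: date, as_of: date, max_age_days: int = 365) -> bool:
    """Return True if the document is max_age_days old or older on the as_of date."""
    return (as_of - last_updated).days >= max_age_days


# Reference date: end of the audit period stated in the report.
as_of = date(2013, 7, 19)

# Hypothetical Run Books and last-updated stamps, for illustration only.
run_books = {
    "Run Book A": date(2012, 4, 1),
    "Run Book B": date(2013, 1, 15),
}

stale = sorted(name for name, stamp in run_books.items() if is_stale(stamp, as_of))
print(stale)
```

A check of this kind could be run on the same quarterly cycle as the Run Book review described in MAP 1.2, so that stale documents are listed before each review rather than discovered during a test exercise.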
Summary Results Based on Enterprise Risk Management Framework

Closing Comments
The results of this audit were discussed with Information Technology Division management and staff. We appreciate the assistance and cooperation received from the TxDOT IT Organization contacted during this audit.
Appendix

Table 1: Non-Mainframe Agency/ Systems as of May 2013

System Name: Active Directory
System Description: An implementation of LDAP directory services by Microsoft for use in Windows environments.
Assigned Priority: Agency

System Name: Advanced Traffic Management System
System Description: Provides the ability to manage traffic through the use of cameras and automated signs.

System Name: BAMS - Decision Support System
System Description: Used for the analysis of transportation construction project data. BAMS-DSS BAMS client-server

System Name: Central Authorization and Authentication System
System Description: (CAAS) is a front-end system that manages access to TxDOT applications.
Assigned Priority: Agency

System Name: Comprehensive Occupational Safety Management Optimized System
System Description: Tracks claims, produces reports, letters, payment vouchers, contracts, releases, and spreadsheets.

System Name: Crash Records Information System
System Description: Collects and disseminates crash information for the Department of Public Safety (DPS) and the Texas Department of Transportation (TxDOT).

System Name: Crash Reporting and Analysis for Safer Highways
System Description: Used to transfer motor vehicle crash data from law enforcement agencies to the Crash Records Information System (CRIS).

System Name: Document Tracking System
System Description: Internal and External TxDOT Document/Email/Phone Request Tracking System from any source, used daily by DDOR's

System Name: Electronic Bidding System
System Description: The Electronic Bidding System (EBS) permits electronic submission of digitally signed bids by qualified vendors.

System Name: Electronic Grants
System Description: Processes and stores all transactions related to processing and accounting for federal/state grants available through TxDOT.

System Name: HR Online
System Description: (HR Online) is an application that uses PeopleSoft software to manage TxDOT employee information.

System Name: Intelligent Transportation System
System Description: Used to monitor traffic flows on major freeways.
Assigned Priority: Agency

System Name: LoadRunner
System Description: Used for examining system behavior and performance.

System Name: Lonestar
System Description: Statewide Advanced Traffic Management System (ATMS)

System Name: Memorial Sign Project
System Description: Application for crash survivors to purchase memorial signs placed by districts. Texas Register Required. Production since 2/18/2004.

System Name: MicroStrategy Intelligence Server
System Description: Provides the core analytical processing and job management for all reporting, analysis and monitoring applications.
System Name: Novell eDirectory
System Description: Centrally manages access to resources on multiple servers and computers within a given network.

System Name: PONTEX
System Description: Stores complete bridge inventory and inspection data

System Name: Rail & Bridge Funding Prioritization
System Description: Used for prioritizing federal, state, and private fund allocation for bridge construction and highway-rail crossing construction including safety controls.
Assigned Priority: Agency

System Name: Rail Hotline
System Description: Used for real time tracking/documentation and on-site action by federal rail inspectors in RRD.

System Name: SiteManager
System Description: The application includes the two subsystems Site Manager Financial Interface (SMFI) and Site Manager Interface Controller (SMIC).
Assigned Priority: Agency

System Name: SPEEDZONE
System Description: Used for speed zone detail production.

System Name: State HazMat Call Log
System Description: Used for tracking and recording all HazMat calls from across the state and how the call was handled.

System Name: Taskmaster
System Description: Used to support crash report scan activities for Crash Records Information System (CRIS).

System Name: Texas Maintenance Assessment Program
System Description: A computer application used by TxDOT to satisfy the requirements of the Government Accounting Standards Board (GASB) Statement 34.

System Name: Texas Rail Information Management System
System Description: Manage all railroad-related projects and project information including crossing upgrade projects and construction projects that involve the railroad.

System Name: Texas Traffic Operations Assessment Program
System Description: Assessment of traffic control devices in each district for the purpose of evaluating and enhancing the safety of highways.

System Name: Toxicology
System Description: Stores Medical Examiner/Coroners records, death certificates, cause of death event sequence hierarchy, and integration with state and federal systems.

System Name: TRF Enterprise Document Management System
System Description: Tracks documents related to Traffic Operations Division business operations, such as consultant contract and administrative documents.