The Role of Internal Controls to Achieve Highly Available Systems
Donna M. Manley, MBA
IT Senior Director, Computer Operations
University of Pennsylvania
Data Center Certifications: ISO 9001:2008
ISO 9001 certification demonstrates an organization's compliance with ISO 9001:2008, a set of guidelines developed by the International Organization for Standardization. These standards outline a philosophy of quality management. Obtaining certification validates that the operational practices in place, when applied correctly, will yield error-free services and result in high levels of customer satisfaction. We are currently certified in these operational areas: Command Center Operations; Monitoring of Systems, Devices, Hardware and Applications; Customer Service; Special Services (Asset, Incident, Problem and Change Management); and Data Center Facilities Management. We are the only Ivy League Data Center to hold such a certification.
Compliance: FISMA NIST 800-53 Standard
Issued by the National Institute of Standards and Technology (NIST) and the US Department of Commerce, this standard covers the steps in the Risk Management Framework that address security control selection for federal information systems (in our case, the Veterans Administration systems). The security rules cover 17 areas including access control, incident response, business continuity, and disaster recoverability. The standard is based on worst-case impact analysis, baseline security controls, and supplemental security controls tied to the assessment of risk.
HIPAA
HIPAA compliance rules can be interpreted in a number of ways, and apply to Business Associates (any company that comes in contact with electronic protected health information [ePHI]). According to the U.S. Department of Health and Human Services, individuals, organizations and agencies that meet the definition of a Covered Entity under HIPAA must comply with the HIPAA security requirements to protect the privacy and security of health information.
PCI
Although there are currently no managed services in the Data Center that require PCI compliance, we have ensured that our physical Data Center security meets the PCI standard. The PCI Data Security Standards consist of 12 significant requirements and directives against which businesses may measure their own payment card security policies, procedures and guidelines.
Highly Available Systems Call for Strong Internal Controls
Methodology: ITIL, COBIT, Six Sigma, LEAN
Internal Audit
Quality Measurements
Basic and Advanced Automation
Centralized, Secured Documentation
Legacy skills to highly skilled
P.S. It doesn't happen overnight!
Here's what it looked like in 2006
December 2004: Assessment Begins
June 2005: Organizational Restructuring Continue maturing Incident/Problem Mgt program
April 2005: Begin Preparation
August 2005: Incident Management in Production Final organization restructuring
January 2006: Change Planning Begins Continue maturing Change Program
November 2006: Performance Mgr
December 2006: Multiple Change Processes
June 2007: Data Center Lock Down Mainframe Ph 2 Auto
March 2006: 1st Change Process in Production
Aug/Sept 2006: Staff attains ITIL Foundations Cert Mainframe Phase 1 Automation in place
March/April 2007: Event Management Automation in Production Matured remote console solution in place
The Program continues to mature through 2007
December 2004: Assessment Begins
June 2005: Organizational Restructuring Continue maturing Incident/Problem Mgt program
April 2005: Begin Preparation
August 2005: Incident Management in Production Final organization restructuring
January 2006: Change Planning Begins Continue maturing Change Program
November 2006: Performance Mgr
December 2006: Multiple Change Processes
June 2007: Remedy Integration/EM Data Center Lock Down
March 2006: 1st Change Process in Production
Aug/Sept 2006: Staff attains ITIL Foundations Cert Mainframe Phase 1 Automation in place
March/April 2007: Event Management Automation in Production Matured remote console solution in place Change Advisory Board (CAB) Asset Management Tool Evaluations New Asset Management Tool in Production
July 2007: Remedy Integration/Steady State EM Steady State Re-assess EM Architecture CCO QA Processes in place
August 2007: ISO Steering Committee in place Additional KMs into production New CCO job descriptions (Development)
September 2007: Additional KMs into production
December 2007 (approx): (Planned) 1st Pre-ISO Audit Post Audit Remediation complete Additional KMs into production ITIL Cert for additional staff
Certification is achieved in 2008
Continue maturing Incident/Problem Mgt and Change Management program
August 2007: ISO Steering Committee in place Additional KMs into production New CCO job descriptions (Development)
March/April 2007: Event Management Automation in Production Matured remote console solution in place Change Advisory Board (CAB) Asset Management Tool Evaluations New Asset Management Tool in Production
June 2007: Remedy Integration/EM Data Center Lock Down
July 2007: Remedy Integration/Steady State EM Steady State Re-assess EM Architecture CCO QA Processes in place
December 2007: Post Audit Remediation complete Plan ITIL Cert for additional staff Enterprise Monitoring Mainview Development Inception of Quality Management (QMS) Pgm QMS Process Identification QMS Manual Development Begins
January 2008: Staff Auditor Certification Complete QMS Documentation and Process Gap Analysis Enterprise Monitoring Portal Upgrade Remedy 7 Upgrade Preparation Begins
February 2008: Remediate Gap Analysis Findings Enterprise Monitoring TSM KM in development TPC Agent Upgrades
March 2008: 1st Internal QMS Audit Remediate Internal QMS Audit Findings SAN (TPC) events through Pennscope Web App monitoring via Nagios/Pennscope
April 2008: TSM KM in production Enterprise Monitoring BEM 5.1 out of support Disaster Recovery Exercise
May/June 2008: Fiscal Year End processing Continuous Improvement Initiatives Formal certification recommended 7/08
Sustaining the Certification Effort
2009 2010
Continue maturing Incident, Problem, Asset and Change Management program Continue maturing Incident, Problem, Asset and Change Management program Remedy rollout to FM Clients Backup Quality Manager identified Pennscope Virtualized Full Data Center Shutdown SOMIS clients move into Data Center Data Center Modernization analysis Eliminate paper requests between CCO and AIT Quality Council Training Database Assessment Major Data Center Power outages July and September Remedy 7 Upgrade PWC Audit Replacement of V2X with DS6800/mirror CMDB Installation begins Virtual Tape Implementation Begins Implement Defect Tracking Implement Solutions DB (post u/g) SSL Certificate Renewal via Remedy PWC Audit Data Center Modernization Analysis/Biz Case
2011
Continue maturing Incident, Problem, Asset and Change Management program FISMA compliance (NIST800-53) granted Risk Assessment methodology applied across projects ADABAS, TSM, Sharepoint Support moves to Operations Online reporting initiative to eliminate print VTL installation continues Formal Service Catalog SOMIS/3440 Relocation Data Center Modernization Biz Case/Exec presentation Repurpose Command Center/Relo Command Center Mainframe CICS Automation RFID Technologies (Start)
Sustaining the Certification Effort
2012 2013 2014
Continue maturing Incident, Problem, Asset, Configuration, Change Management program FISMA compliance (NIST800-53) - Maintain Project Management following PMI Methodology Disaster Recovery moves to Operations (9/2012) Online Reporting - Phase II VTL - Phase II Zena replaces Zeke TSM Upgrade/Support moves to Operations CMDB Trial Install (under Pennscope) Increased SNMP Automation/Predictive Analytics Storage Management Automation (Hitachi Monitor) PWC Audit Continue maturing Incident, Problem, Asset, Configuration, Change Management program Remedy 7 Upgrade (End) CMDB in Production (Dependent on R7 Upgrade) Data Center Modernization Execution (cont'd) Data Center Automation (DCIM) FISMA compliance (NIST800-53) Maintain ISO 9001:2008 Maintain/Recertification Continue maturing Incident, Problem, Asset, Configuration, Change Management program Data Center Modernization Execution (End/Maintain) ISO 20000 Certification ISO 9001:2008 Maintain/Recertification FISMA compliance (NIST800-53) Maintain Data Center Modernization Trustee Approval/Execution Remedy 7 upgrade (Start) Implement Discovery Tool (TADDM) RFID Technologies (End/Maintain) PWC Audit ISO 9001:2008 Maintain/Surveillance SSL/Certificate Management moves to Operations Sprint Mobile Wireless Management moves to Operations OLAs with Facilities and other 3rd party providers Scanning Service sunset ISO 20000 Preparation Increased SNMP Automation/Predictive Analytics Virtual Storage Initiative Service Impact Manager (Dep upon R7 Upgrade/CMDB) PWC Audit Virtual Command Center Additional technology initiatives TBD PWC Audit
Traditional Structure Can No Longer Sustain an Organization
The New Tradition
Certified Lead Auditor
Linux, Windows Certified
Certified Lead Auditor
Certified Project Resources for Internal Initiatives
100% of the staff has ITIL Foundations education; 75% of the staff has achieved ITIL Foundations Certification
Traditional Staff Roles Can No Longer Sustain an Organization
Maintain the administrative mainframe and related servers in the secure environment of the administrative computer room. Have a working knowledge of fire, water detection systems, networking, and other systems housed in the computer room. Be attentive to customer requests, participate in Business Continuity drills and other training. Be a punctual, dependable member of the operations team.
The New Tradition
The Command Center Analyst is responsible for observing, controlling and analyzing the computer systems and peripheral equipment under the Command Center domain for the purpose of uninterrupted data processing, operating runs, and batch program jobs. This includes monitoring system tools for errors, failures, network malfunctions, data center security and environmental disruptions. The Command Center Analyst is also responsible for diagnosing problems based on his or her findings, and applying proven analytical and problem-solving skills to help identify and resolve malfunctions in support of system or network recovery. The Analyst must have the ability to work in conjunction with fellow Analysts in a team environment, and work with clients to meet or exceed expectations.
Traditional Staff Skill Sets Can No Longer Sustain an Organization
Certifications
Multi-platform
Business Understanding
Strong Analysis
Culture of acceptance (Automation)
Extended Peer Network
Creating a New Tradition
Breaking down Silos
Multi-discipline
Automate commodity services
Maximizing resource utilization
Discovering all asset types
Contain costs while delivering similar or better service levels
Exploit alternative education methods
Process Re-engineering
Workflow Redesign
Metric Capture and Analysis
Quality Standards
Focus on Security, Compliance, Risk Mitigation
Understanding Interdependencies
Validating Customer Expectations
1.5 FTE of time recovered!
Tools and Process Must Work in Concert to Maximize Effectiveness
Disaster Recovery Test Preparation and Set Up
DR Test Preparation: Vital Records
DR Test Preparation: Customer Service
DR Test Preparation: Bank Notifications
DR Test Preparation: DR Team Leader Test Preparation
Post DR Exercise Activity
When people, automation, a culture of change, and simplicity successfully converge, Internal Controls Naturally Emerge
Let's Face It. Customers Have Always Expected Something More from IT
Thank you for allowing me to share my thoughts with you today!
Donna M. Manley, MBA
IT Sr. Director, Computer Operations
ITIL V3 Foundations Certified
University of Pennsylvania
manleydm@isc.upenn.edu