Testing Procedures & Recovery Plans for CIP Compliance, December 16, 2009
Presenters: Bart Thielbar, CISA, Senior Research Analyst, Sierra Energy Group, a Division of Energy Central (Primer on CIP-007 R1, CIP-008 & CIP-009); Kim Morris, Director, Architecture and Information Security, PNM Resources (Testing Procedures & Recovery Plans).
Testing Procedures & Recovery Plans: Primer on CIP-007 R1, CIP-008 & CIP-009. Bart Thielbar, CISA, Senior Research Analyst
Disclaimer: The information from this webcast is provided for informational purposes only. An entity's adherence to the examples contained within this presentation does not constitute compliance with the NERC Compliance Monitoring and Enforcement Program ("CMEP") requirements, the NERC Critical Infrastructure Protection ("CIP") Reliability Standards, or any other NERC Reliability Standards or rules. While the information included in this material may provide some of the methodology that NERC has elected to use to assess compliance with the requirements of the Reliability Standard, this material should not be treated as a substitute for the Reliability Standard or viewed as additional Reliability Standard requirements. In all cases, the entity should rely on the language contained in the Reliability Standard itself, and not on the language contained in this presentation, to determine compliance with the CIP Reliability Standards.
Agenda: Purpose and applicable CIP standards; testing procedures; incident reporting; recovery plans; audit trail; recommendations.
Why are testing procedures and recovery plans important? Managerial visibility and organizational control and confidence; risk management; proper testing; incident preparation; practicing recovery; part of a good control framework. NERC's interests = the best interests of reliability.
Applicable CIP Standards
CIP-007 R1, Test Procedures: the Responsible Entity shall ensure that new Cyber Assets, and modifications to existing Cyber Assets, do not adversely affect cyber security controls.
  R1.1 Test procedures
  R1.2 Document that testing is performed in a manner that reflects the production environment
  R1.3 Document test results
CIP-008, Incident Reporting and Response Planning
  R1 Develop and maintain a Cyber Security Incident response plan:
    R1.1 Procedures to classify events as reportable Cyber Security Incidents
    R1.2 Response actions
    R1.3 Process for reporting to the Electricity Sector Information Sharing and Analysis Center (ES-ISAC)
    R1.4 Updating the plan within 90 days of changes
    R1.5 Process for reviewing the plan at least annually
    R1.6 Process for testing the plan at least annually
  R2 Cyber Security Incident documentation
CIP-009, Recovery Plans for Critical Cyber Assets
  R1 Recovery plans: create and annually review recovery plans for CCAs
    R1.1 Specify actions in response to events
    R1.2 Define roles and responsibilities of responders
  R2 Exercises: exercise the recovery plan at least annually
  R3 Change control: update the recovery plan to reflect changes or lessons learned
  R4 Backup and restore: include processes and procedures for backup and storage of the information required to restore CCAs
  R5 Testing backup media: test the essential information stored on backup media to ensure it is available
Schedule for Table 3 Entities
Requirement   Begin Work   Substantially Compliant   Compliant   Auditably Compliant
CIP-007 R1    12/31/06     12/31/08                  12/31/09    12/31/10
CIP-008       12/31/06     12/31/08                  12/31/09    12/31/10
CIP-009       12/31/06     12/31/08                  12/31/09    12/31/10
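Because all three requirements share the same dates, checking where an entity stands on a given day is a simple lookup. The Python sketch below is illustrative only: TABLE3_MILESTONES and milestone_due are hypothetical names, and it assumes a milestone takes effect on its listed date.

```python
from datetime import date
from typing import Optional

# Table 3 milestone dates as listed on the slide; the slide gives the same
# dates for CIP-007 R1, CIP-008 and CIP-009.
TABLE3_MILESTONES = [
    ("Begin Work", date(2006, 12, 31)),
    ("Substantially Compliant", date(2008, 12, 31)),
    ("Compliant", date(2009, 12, 31)),
    ("Auditably Compliant", date(2010, 12, 31)),
]


def milestone_due(as_of: date) -> Optional[str]:
    """Return the latest milestone whose date is on or before as_of.

    Assumption: a milestone applies from its listed date onward.
    Returns None before the first milestone.
    """
    due = None
    for name, deadline in TABLE3_MILESTONES:
        if deadline <= as_of:
            due = name
    return due
```

On the date of this webcast (December 16, 2009) the lookup returns "Substantially Compliant", with "Compliant" due two weeks later.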
Testing. Why: ensure that new Cyber Assets, or changes to Cyber Assets (not just CCAs), do not compromise the CCAs serving the Bulk Electric System. Significant change: security patches, service packs, and vendor releases or upgrades (including operating systems, applications, databases, etc.). Environment: the test environment is very important; it must reflect the production environment. Documentation: test results must be documented, whether good, bad or neutral. All of this should be consistent with generally accepted best practices for testing procedures.
Why incident reporting and response planning matters: reliability of the Bulk Electric System; potential blackouts; criminal elements and malicious behavior; relationships with other events that may occur; preserving order and avoiding chaos; evidence preservation; asset preservation; guidance and direction for the industry and employees. NERC's interests = the best interests of reliability.
Incident reporting and response planning: procedures to classify events as Cyber Security Incidents; response actions, including roles and responsibilities; a process for reporting to the Electricity Sector Information Sharing and Analysis Center (ES-ISAC); a process for updating the plan within 90 days of any changes; a process for ensuring the Cyber Security Incident response plan is reviewed at least annually; a process for ensuring the plan is tested at least annually. Possible tests: a paper drill, an operational exercise, or an actual incident.
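The review, test and update processes reduce to date arithmetic that a compliance team could check routinely. This is a minimal Python sketch, not an official method: plan_findings is a hypothetical helper, and it assumes "at least annually" means no more than 365 days between reviews or tests.

```python
from datetime import date, timedelta
from typing import List, Optional

# Assumption: "at least annually" is read as "no more than 365 days apart".
ANNUAL = timedelta(days=365)
# Window for updating the plan after a change, as presented for CIP-008 R1.4.
UPDATE_WINDOW = timedelta(days=90)


def plan_findings(today: date, last_review: date, last_test: date,
                  last_change: Optional[date] = None,
                  last_update: Optional[date] = None) -> List[str]:
    """Return gaps for one incident response plan (an empty list means none found)."""
    findings = []
    if today - last_review > ANNUAL:
        findings.append("annual review overdue")
    if today - last_test > ANNUAL:
        findings.append("annual test overdue")
    if last_change is not None:
        # An update only counts if it happened on or after the change.
        updated = last_update is not None and last_update >= last_change
        if not updated and today - last_change > UPDATE_WINDOW:
            findings.append("update after change overdue")
        elif updated and last_update - last_change > UPDATE_WINDOW:
            findings.append("update after change was late")
    return findings
```

A late update is still reported after the fact, because the audit question is whether the 90-day window was met, not only whether the plan is current today.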
Incident reporting documentation: documentation relevant to Cyber Security Incidents must be kept for three calendar years, including processes, reports and evidence (logs).
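A records system can compute the retention date for each incident file. The Python sketch below uses hypothetical names (retain_until, may_dispose) and assumes the three years run from the incident date; a stricter reading of "calendar years" (for example, retaining through the end of the third calendar year) would change the calculation, so the rule should be confirmed with your compliance staff.

```python
from datetime import date

RETENTION_YEARS = 3  # from the slide: keep documentation three calendar years


def retain_until(incident_date: date, years: int = RETENTION_YEARS) -> date:
    """Date on which the retention period ends.

    Assumption: retention runs from the incident date. A February 29
    incident rolls to February 28 in a non-leap target year.
    """
    try:
        return incident_date.replace(year=incident_date.year + years)
    except ValueError:  # Feb 29 does not exist in the target year
        return incident_date.replace(year=incident_date.year + years, day=28)


def may_dispose(incident_date: date, today: date) -> bool:
    """True once the (assumed) retention period has fully elapsed."""
    return today >= retain_until(incident_date)
```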
Recovery plans: actions in response to events; roles and responsibilities; exercise and test the plan at least annually; document changes to the plan and communicate them to the appropriate people within 90 days; include the backup and restore processes and procedures necessary to restore CCAs; test the backup media at least annually.
Roles and responsibilities (considerations): management support; documentation; empowerment; actions specified for each event; event duration and severity.
Testing and maintaining the plan: exercise it at least annually (paper drill, operational exercise or actual event); update it to reflect lessons learned; communicate updates to appropriate personnel within 90 days.
Backup and restore: processes, procedures and instructions for restoring CCAs; test the restoration, not just the backup; store tapes and other media offsite, protected from disaster; keep the necessary equipment and software available (this can be completed offsite); be aware of firmware and operating-system impacts.
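"Test the restoration, not just the backup" can be made concrete by restoring an archive into a separate location and comparing file hashes against the source. This Python sketch is a hypothetical illustration (backup, restore_and_verify and the other names are not from the presentation) and assumes file-level backups in a gzip-compressed tar archive; real CCA restores often involve disk images, firmware and vendor tools that this does not cover.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large database dumps need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_hashes(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def backup(source: Path, archive: Path) -> None:
    """Write every file under source into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for p in sorted(source.rglob("*")):
            if p.is_file():
                tar.add(p, arcname=p.relative_to(source).as_posix())


def restore_and_verify(source: Path, archive: Path, restore_dir: Path) -> List[str]:
    """Restore the archive into restore_dir and return files that are missing or differ."""
    restore_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # safer extraction filter on newer Pythons
            tar.extractall(restore_dir, filter="data")
        else:
            tar.extractall(restore_dir)
    expected = tree_hashes(source)
    restored = tree_hashes(restore_dir)
    return sorted(name for name in expected.keys() | restored.keys()
                  if expected.get(name) != restored.get(name))
```

An empty list means every file came back byte-for-byte; any name returned is evidence worth keeping with the test record. In practice the restore target would be the recover-to hardware rather than a folder on the backup server.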
Possible penalties and sanctions: up to $1 million per day, per violation. Penalties depend on violation severity (level of non-compliance) and violation risk factors. Mitigating factors may reduce penalties and sanctions: quality of the compliance program, self-reporting, voluntary corrective actions, etc. Aggravating factors may increase them: repeat violations, evasion, inaction, unwarranted intentional violations based on economic choice, etc. Violations may also affect reputation, rate cases, etc.
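"Per day, per violation" means the ceiling multiplies quickly. The Python sketch below (penalty_ceiling is a hypothetical name) computes only that theoretical maximum; actual penalties are set through the severity, risk-factor, mitigating and aggravating considerations listed above and are normally far lower.

```python
from typing import Iterable

MAX_PER_VIOLATION_PER_DAY = 1_000_000  # USD, the ceiling quoted on the slide


def penalty_ceiling(violation_days: Iterable[int]) -> int:
    """Theoretical maximum: each violation can accrue up to $1M for every day it persists."""
    total = 0
    for days in violation_days:
        if days < 0:
            raise ValueError("duration cannot be negative")
        total += days * MAX_PER_VIOLATION_PER_DAY
    return total
```

For example, two violations lasting 30 and 45 days have a combined ceiling of $75 million.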
Audit trail considerations: the measures associated with each requirement measure the documentation. Think like a security person, then like an auditor. What would you want to know if you personally had to manage the restoration? Documentation, documentation, documentation.
Final thoughts: testing procedures and recovery plans = good security, governance and risk management; asset integrity; helpful to personnel; promote the best interests of reliability. NERC's interests = the best interests of reliability.
Testing Procedures & Recovery Plans. Kim Morris, Director, Architecture and Information Security
Agenda: Introductions; applicable CIP standards; incident response; disaster recovery; testing; media; training; questions.
Based in Albuquerque, N.M., PNM Resources is an energy holding company with 2008 consolidated operating revenues from continuing and discontinued operations of $2.5 billion. Through its utilities, PNM and TNMP, and its energy subsidiary, First Choice Power, PNM Resources serves electricity to 859,000 homes and businesses in New Mexico and Texas. Current capacity: 2,717 MW.
CIP Standards Applicability
General Terms & Common Definitions
Event: an action that has occurred on a system.
Incident: an event that escalates to the need to report or recover.
Recovery: restoring a system to full functionality.
Cyber Security Incident*: any malicious act or suspicious event that (a) compromises, or was an attempt to compromise, the Electronic Security Perimeter or Physical Security Perimeter of a Critical Cyber Asset, or (b) disrupts, or was an attempt to disrupt, the operation of a Critical Cyber Asset.
Malicious: intending to cause harm.
Suspicious event: an event whose cause is suspected to be of malicious origin.
* NERC Glossary term
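The glossary definition has a clear logical shape: an event must be malicious or suspicious, and it must also touch a CCA's perimeter or operation. The Python sketch below encodes only that structure; Event and classify are hypothetical names, and the boolean flags stand for judgments a responder records under the entity's own CIP-008 classification procedure.

```python
from dataclasses import dataclass


@dataclass
class Event:
    description: str
    malicious_or_suspicious: bool  # malicious act, or cause suspected to be malicious
    targets_perimeter: bool        # compromised, or attempted to compromise, a CCA's ESP or PSP
    disrupts_cca: bool             # disrupted, or attempted to disrupt, a CCA's operation


def is_cyber_security_incident(event: Event) -> bool:
    """Apply the structure of the NERC Glossary definition quoted on the slide."""
    return event.malicious_or_suspicious and (event.targets_perimeter or event.disrupts_cca)


def classify(event: Event) -> str:
    if is_cyber_security_incident(event):
        return "Cyber Security Incident"
    if event.malicious_or_suspicious:
        return "Suspicious event (not a Cyber Security Incident)"
    return "Event"
```

Note that an HMI crash caused by a failed disk disrupts a CCA but is not malicious, so it classifies as an ordinary event; it may still trigger the CIP-009 recovery plan.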
Incident Response Elements
Cyber Security Incident Response Plan: Why do I need one? How do I respond and recover? BPS reliability affected; potential blackouts; chaos and mayhem.
[Diagram: the Internet, a router, a firewall, access points, switches, network segments, a monitoring system and Critical Cyber Assets within the Electronic Security Perimeter (ESP), with a malicious insider on a network segment.]
Incident reporting: NERC offers three ways to report to the ES-ISAC: the Critical Infrastructure Protection Information Systems (CIPIS); the Reliability Coordinator Information System (RCIS); or telephone, fax or email. One of the following forms should be used: the OE-417 form; the disturbance reporting form found in NERC Reliability Standard EOP-004; the Security Guideline Threat and Incident Reporting Form; or a free-form report or your own reporting form, sent as an email to esisac@nerc.com. What does a report look like? Refer to the OE-417 instructions: http://www.eia.doe.gov/cneaf/electricity/forms/instfor417.doc
OE-417 Form Guidance: When to Report
1. Actual physical attack (file within 1 hour): if it causes a major interruption or major negative impact on critical infrastructure facilities or on operations.
2. Actual cyber or communications attack (file within 1 hour): if it causes major interruptions of electrical system operations.
3. Complete operational failure of electrical system (file within 1 hour): if isolated or interconnected electrical systems (transmission or distribution) suffer electrical system collapse.
4. Electrical system separation (islanding) (file within 1 hour): if part or parts of a power grid remain operational in an otherwise blacked-out area or within the partial failure of an integrated electrical system.
5. Uncontrolled loss of firm system load (file within 1 hour): if 300 MW or more for greater than 15 minutes from a single incident.
6. Load shedding (file within 1 hour): if 100 MW or more is implemented under emergency operational policy.
7. Voltage reductions (file within 1 hour): if 3 percent or more is applied system-wide.
8. Public appeal to reduce use of electricity (file within 1 hour): if made in an emergency condition only, to reduce demand.
9. Suspected physical impairment which targets any security system or impacts electric power system reliability (file within 6 hours): if any component of any physical security system is vandalized, damaged by an attack, or suspected to have been altered.
10. Suspected cyber computer or communications system impairment (file within 6 hours): if the attempt is believed to have happened or did happen.
11. Loss of electric service (file within 6 hours): if greater than 50,000 customers for 1 hour or more.
12. Fuel supply emergencies (file within 6 hours): if fuel inventories or hydro project water storage levels are at 50 percent or less of normal, with a projected continued downward trend; or if emergency generation requires abnormal use of a particular fuel.
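The numeric criteria in this table (items 5, 6, 7 and 11) lend themselves to a mechanical check. The Python sketch below is a hypothetical helper (Disturbance and report_deadline_hours are illustrative names, not part of any DOE tool) that returns the shortest filing window those four criteria trigger, using the thresholds as they appear on this 2009 slide. The qualitative items still need human judgment, and current OE-417 thresholds should be checked before relying on anything like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disturbance:
    firm_load_lost_mw: float = 0.0
    firm_load_loss_minutes: float = 0.0
    load_shed_mw: float = 0.0
    load_shed_under_emergency_policy: bool = False
    systemwide_voltage_reduction_pct: float = 0.0
    customers_out: int = 0
    outage_hours: float = 0.0


def report_deadline_hours(d: Disturbance) -> Optional[int]:
    """Shortest filing window (hours) triggered by criteria 5, 6, 7 and 11, else None."""
    deadlines = []
    if d.firm_load_lost_mw >= 300 and d.firm_load_loss_minutes > 15:
        deadlines.append(1)  # 5. uncontrolled loss of firm system load
    if d.load_shed_mw >= 100 and d.load_shed_under_emergency_policy:
        deadlines.append(1)  # 6. load shedding
    if d.systemwide_voltage_reduction_pct >= 3:
        deadlines.append(1)  # 7. voltage reductions
    if d.customers_out > 50_000 and d.outage_hours >= 1:
        deadlines.append(6)  # 11. loss of electric service
    return min(deadlines) if deadlines else None
```

When several criteria apply, the earliest deadline governs, which is why the function returns the minimum.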
Incident Response Summary: Processes must be in place to direct an entity in what to do if a Cyber Security Incident occurs. The plan should identify the various required elements, make sense, and proceed logically through the course of an incident. Keep the relevant documentation for any reportable incident: logs and investigative notes.
Disaster Recovery. What is a disaster*? An unanticipated incident or event, including natural catastrophes, technological accidents, or human-caused events, causing widespread destruction, loss, or distress to an organization that may result in significant property damage, multiple injuries, or deaths. What is disaster recovery*? Immediate intervention taken by an organization to minimize further losses brought on by a disaster and to begin the process of recovery, including activities and programs designed to restore critical business functions and return the organization to an acceptable condition. (* From NERC Security Guidelines for Electric Sector: Continuity of Operations.)
Why disaster recovery planning? Identify and measure company risks; greater probability of recovery; identify roles and responsibilities; communications plans; compliance; resources (people and technology).
Types of events: loss of an EMS/SCADA component (server, HMI (human-machine interface), database); loss of field equipment (relay, automation system, RTU); loss of facilities.
Recovery is different from prevention: prevention attempts to eliminate the need for recovery, while recovery means the preventive measures have failed. You need to respond in order to rapidly recover failed systems and components, report following the reporting guidelines, and preserve evidence for post-event analysis and as required by the standards.
Summary. Contents of recovery plans for the loss of Critical Cyber Assets: parts availability; assembly directions; data and program restore; minimum tests required after recovery. Evidence of drills and exercises. Evidence of updates following system or asset changes: re-made media, instructions reflecting updated hardware, etc.
Backup Storage for Restoration. Examples of elements in a good backup storage plan for restoration: store backup information away from the failed system, so that backup tapes are not destroyed in a catastrophic failure; include instructions and directions; include installation media, license keys, etc., because system experts may not be available during restoration.
Media testing is more than backup tape testing: it includes the tape recovery system and firmware PROMs. Make sure the recover-to system can read the media.
Change Management Testing. Content of a good testing process: linkages to change management processes; test plans; test plan results (signed and dated) for unit, integration and regression tests; reports where tests failed; cycling back through to the changed code and re-testing.
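One way to tie test results to change management is a record that cannot be closed until every test stage has passed on its latest run and the plan is signed and dated, while failed runs stay in the record as evidence. The Python sketch below is hypothetical (TestRun, ChangeTestRecord and the stage names are illustrative) and assumes runs are appended in the order they were executed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

REQUIRED_STAGES = ("unit", "integration", "regression")


@dataclass
class TestRun:
    stage: str      # "unit", "integration" or "regression"
    passed: bool
    run_on: date
    notes: str = ""


@dataclass
class ChangeTestRecord:
    change_id: str
    runs: List[TestRun] = field(default_factory=list)  # failed runs are kept, never deleted
    signed_off_by: Optional[str] = None
    signed_off_on: Optional[date] = None

    def latest(self, stage: str) -> Optional[TestRun]:
        """Most recently recorded run for a stage (assumes runs are appended in order)."""
        matching = [r for r in self.runs if r.stage == stage]
        return matching[-1] if matching else None

    def ready_to_close(self) -> bool:
        """Every stage passed on its latest run, and the plan is signed and dated."""
        for stage in REQUIRED_STAGES:
            run = self.latest(stage)
            if run is None or not run.passed:
                return False
        return self.signed_off_by is not None and self.signed_off_on is not None
```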
What to test: the expected functionality resulting from the change; boundary conditions not likely to be found often in real life; interaction of the changed component with the rest of the system; regression testing (make sure the fix didn't break something else).
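The first, second and fourth categories map naturally onto unit tests. The Python sketch below uses the standard unittest module against a hypothetical changed function, in_deadband, with one test per category; the "earlier bug" in the regression test is invented for illustration. Interaction with the rest of the system needs the integration environment and is not shown.

```python
import unittest


def in_deadband(measured: float, setpoint: float, band: float) -> bool:
    """Hypothetical function under change: True if measured is within +/- band of setpoint."""
    if band < 0:
        raise ValueError("band must be non-negative")
    return abs(measured - setpoint) <= band


class InDeadbandTests(unittest.TestCase):
    def test_expected_functionality(self):
        self.assertTrue(in_deadband(60.01, 60.0, 0.05))
        self.assertFalse(in_deadband(60.2, 60.0, 0.05))

    def test_boundary_conditions(self):
        # Exactly on the edge, a zero band, and invalid input rarely occur in service.
        self.assertTrue(in_deadband(60.5, 60.0, 0.5))
        self.assertTrue(in_deadband(60.0, 60.0, 0.0))
        with self.assertRaises(ValueError):
            in_deadband(60.0, 60.0, -1.0)

    def test_regression_below_setpoint(self):
        # Hypothetical earlier bug: readings below the setpoint were rejected.
        self.assertTrue(in_deadband(59.98, 60.0, 0.05))


if __name__ == "__main__":
    unittest.main()
```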
Disaster Recovery Plan Testing: test or exercise the plan annually (recovery from an actual incident counts as an exercise); perform a lessons-learned analysis after each test or exercise; follow through on any issues encountered; update plans and procedures; include DR testing in change management. Practice, practice, practice.
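Following through on lessons learned is easier when each finding is tracked until the plan or procedure is actually revised. This Python sketch is hypothetical (Finding and overdue_findings are illustrative names) and assumes the 90-day communication window cited earlier in the deck is counted from the exercise date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: plan updates are due within 90 days of the exercise that found the issue.
WINDOW = timedelta(days=90)


@dataclass
class Finding:
    exercise: str                            # drill, operational exercise or actual incident
    exercise_on: date
    issue: str
    owner: str
    plan_updated_on: Optional[date] = None   # None until the plan or procedure is revised


def overdue_findings(findings: List[Finding], today: date) -> List[Finding]:
    """Findings whose plan update is missing past the window, or came after it."""
    result = []
    for f in findings:
        deadline = f.exercise_on + WINDOW
        done = f.plan_updated_on
        if (done is None and today > deadline) or (done is not None and done > deadline):
            result.append(f)
    return result
```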
Test Environments: should mimic real environments and be reasonable copies of the real system; should be able to support integration testing; allow canned testing scripts covering varying conditions; can be used for more than development testing, such as patch testing, vulnerability assessment, recovery exercises and configuration change testing.
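Whether a test environment is a "reasonable copy" of production can be checked by comparing configuration inventories and documenting the differences, which also supports the CIP-007 R1.2 point that testing reflects production. The Python sketch below is hypothetical: config_drift is an illustrative name, and it assumes each environment's settings have already been collected into a flat dictionary (how to collect them depends on your systems).

```python
from typing import Any, Dict, Tuple


def config_drift(production: Dict[str, Any], test: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (production_value, test_value)} for every setting that differs.

    A setting missing on one side shows as None on that side.
    """
    keys = production.keys() | test.keys()
    return {k: (production.get(k), test.get(k))
            for k in sorted(keys) if production.get(k) != test.get(k)}
```

An empty result supports the claim that the test environment matches production; a non-empty result is either fixed or documented as a known, accepted difference.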
Testing Methods: document failed test reports and the subsequent fixes to be re-tested; document results, with sign-off in the test plan; maintain records of output results. Use a documented test plan that is agreed to ahead of time (this forces testing against the required functionality, not against what was implemented) and is repeatable at will. Methods include unit tests, integrated tests, exception testing and automated test scripts.
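Automated test scripts help with "repeatable at will" and "maintain records of output results" at the same time. The Python sketch below is a hypothetical runner (run_and_record and save_record are illustrative names) that executes each check in an agreed plan and returns a dated record; it assumes each check is a callable returning True or False, and it records exceptions as errors rather than hiding them.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict


def run_and_record(test_plan: Dict[str, Callable[[], bool]], tester: str) -> dict:
    """Run every check in the agreed test plan and return a dated record of each outcome."""
    results = []
    for name, check in test_plan.items():
        try:
            outcome = "pass" if check() else "fail"
            detail = ""
        except Exception as exc:  # a crash is recorded as an error, not skipped
            outcome, detail = "error", repr(exc)
        results.append({"test": name, "outcome": outcome, "detail": detail})
    return {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "tester": tester,
        "results": results,
        "all_passed": all(r["outcome"] == "pass" for r in results),
    }


def save_record(record: dict, path: str) -> None:
    """Write the record as JSON so it can be attached to the test plan as evidence."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)
```

Failed and errored checks stay in the record, which gives the "reports where tests failed" trail that auditors look for before the re-test.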
Training: people; tools; recovery procedures; incident response; communications; reporting.
Summary: plan for unexpected events; test, test, test; effective communications; training; practice, practice, practice; documentation, documentation, documentation.
Questions & Answers. Contact information: webcastquestions@energycentral.com
CIP Compliance Series Webcasts: for comprehensive preparation for the implementation, compliance and auditing phases of the CIP standards program, attend all six. Upgrade and save 10% by applying your single-event purchase to the cost of the entire series; call 800-459-2233 or email orders@energycentral.com for information.
Date      Topic
9/23/09   Identifying Critical Assets (on demand)
10/6/09   Program Governance Issues (on demand)
10/21/09  Change Management Systems (on demand)
11/11/09  Personnel Issues & Training (on demand)
12/2/09   Physical & Electronic Access Controls (on demand)
12/16/09  Testing Procedures & Recovery Plans (on demand)
Thank You for Joining Us. For the latest news, articles and blogs, please visit www.energycentral.com.