DISCUSSION PAPER: Peak Reliability Performance Metrics




Executive Summary

Performance metrics are critical for any organization in order to encourage improvement, effectiveness and efficiency; to assess risk; and to determine appropriate levels of internal controls. Performance metrics become especially important when they concern an entity, Peak Reliability [1], charged with maintaining reliability of the grid in the Western Interconnection. Currently, there is no comprehensive set of performance metrics published for Peak Reliability.

This discussion paper assumes that performance metrics should be developed and implemented for Peak Reliability. Metrics are needed in order to demonstrate the following: (1) whether the Reliability Coordinator (RC) function performed by Peak Reliability is meeting the overall goal of improving system reliability; (2) whether the RC is adequately executing tasks that contribute to that goal; and (3) whether Balancing Authorities (BAs) and Transmission Operators (TOPs), which are the most critical entities for grid reliability, are adequately executing tasks that enable the RC to meet its grid reliability goal. This discussion paper organizes illustrative metrics into broad topical categories.

The development of critical performance metrics should be a priority for Peak Reliability. In developing these metrics, Peak Reliability should consider the potential implications if Peak, BAs or TOPs concentrate only on improving their scores on the metrics at the expense of their primary mission of securing grid reliability. Grid reliability, not scores, should be the utmost concern and focus. However, given that it is a common industry practice to tie strong metrics directly to reliability goals (to give system operations personnel a guiding roadmap to follow in achieving reliability), a balance must be struck.
Overall system reliability metric: The overall metric for the success of Peak is the reliability of the power system. This needs to be measured in terms of outages and near misses. The overall metric is a result of the combined performance of the industry and Peak. Five illustrative metrics are proposed:

• Number of NERC reportable disturbances per month.
• Serious deviations from system frequency.
• Number and duration of power outages.
• Exceedances of system operating limits (SOLs) and interconnection reliability operating limits (IROLs).
• Energy Emergency Alerts.

Internal Peak performance metric: Measuring Peak's contribution to achieving the overall reliability metric can be organized into five groups of metrics:

• Achieving performance expected by NERC standards.
• Frequency and accuracy of system studies.
• Performance of Peak's information technology (IT) systems and applications.
• Training of Peak employees.
• Training of BA and TOP employees.

BA and TOP performance metric: The performance of BAs and TOPs is also a major factor in grid reliability. Peak has the capability to populate metrics for BAs and TOPs that are particularly relevant to reliable grid operations in the areas of:

• Comparing the accuracy of generation information provided to the RC Network Model [2] with actual operation;
• Identifying the percentage of scheduled outages not reported in the RC Coordinated Outage System (COS) and the number of outages taken without RC approval;
• Comparing the net interchange among BAs in the Network Model versus actual interchange.

[1] The new company Peak Reliability (Peak) will begin operations on January 1, 2014. It will execute the Reliability Coordination function presently done by the WECC Reliability Center (RC).
[2] The Network Model is a representation of the transmission network found in the field, including elements (switches, breakers, transformers and line segments) and details such as line impedances, losses in transformers, capacitor status, etc. The Network Model is the foundation used by analytic tools.

Discussion

The remainder of this paper discusses the illustrative metrics in more depth.

(1) Overall system reliability metric: The overall metric for the success of the RC is the reliability of the power system. This needs to be measured in terms of outages and near misses. The overall metric is a result of the combined performance of both industry and the RC. Five illustrative metrics are proposed to appropriately measure reliability of the Western Interconnection:

• Number of NERC reportable disturbances per month
  o NERC defines a reportable disturbance as any event that causes an ACE change greater than or equal to 80% of a Balancing Authority's or Reserve Sharing Group's most severe contingency. [3] This metric demonstrates whether the RC footprint is improving or degrading in performance based on metrics established by the NERC community, including WECC members. Reportable disturbances are divided into five categories of events that account for their severity in terms of system impact, with Category 5 being the most severe. WECC has nearly half of the NERC reportable disturbances each year.

• Serious deviations from system frequency
  o This metric is used to track and monitor interconnection frequency response. Frequency response is a measure of an interconnection's ability to stabilize frequency immediately following the sudden loss of generation or load. Large and frequent deviations from system frequency demonstrate an inability to control load and generation. This can be caused by improper scheduling or by system events.

  EXAMPLE: Yearly Number of Frequency Events

                               2009   2010   2011   2012
    Western Interconnection      25     29     25     12

• Number and duration of power outages
  o This metric establishes a pattern of system behavior. Outage causes may be categorized so that a pattern emerges, such as human error, weather, system protection misoperations, or facility failure.

• Exceedances of System Operating Limits (SOLs) and Interconnection Reliability Operating Limits (IROLs)
  o This metric measures the number of times that a defined SOL or IROL was exceeded and the duration of these events. [4] The objective of SOLs and IROLs is to distinguish acceptable system performance from unacceptable system performance.
  o An SOL is defined by NERC as follows: "The value (such as MW, MVar, Amperes, Frequency or Volts) that satisfies the most limiting of the prescribed operating criteria for a specified system configuration to ensure operation within acceptable reliability criteria." [5] SOLs are based upon certain operating criteria, including the following: (1) Facility Ratings; (2) Transient Stability Ratings; (3) Voltage Stability Ratings; and (4) System Voltage Limits. TOPs are required by NERC Standard TOP-002-2.1b to develop SOLs used in the operations horizon by performing seasonal, next-day, and current day studies, and are also required to communicate these SOLs to the WECC RC.
  o IROLs are a subset of SOLs, pursuant to NERC Standard FAC-0110-2 R1.3. An IROL is defined by NERC as follows: "A System Operating Limit that, if violated, could lead to instability, uncontrolled separation, or Cascading Outages that adversely impact the reliability of the Bulk Electric System." [6]
  o SOL and IROL information is reported to NERC. The trending of overall SOL and IROL exceedances provides a valuable snapshot of whether there is a widespread problem or merely an isolated event. For example, a transmission line may exceed its limit more frequently in a given month because there are multiple forced outages due to fires. However, if a line is routinely exceeding its limit, and there is no immediately reasonable explanation, a deeper investigation must occur.

  EXAMPLE: Exceedances of SOLs and IROLs [7]

                      2011                       2012
                      Number   Mean Duration     Number   Mean Duration
    Line segment a
    Line segment b
    Line segment c

• Energy Emergency Alerts (EEAs)
  o This is the staged process used by entities to manage capacity emergencies. Pursuant to NERC Reliability Standard EOP-002-2, EEAs are directed at BAs and TOPs to ensure that they are prepared for both capacity and energy emergencies, although only RCs can issue EEAs. There are three severity levels of EEAs. [8] Each level is designed to assist the entity in mitigating its emergency, with Level 3 being declared when load shed is occurring or is going to occur to mitigate the capacity emergency. However, a sharp increase in even Level 1 EEAs would indicate that entities' operations are wandering into unsecure territory and that the root cause should be determined.

  EXAMPLE: Energy Emergency Alert Levels (number of events)

                               2006   2007   2008   2009   2010   2011   2012
    Western Interconnection       2      1      5      2      1      5      1
    Other regions
      FRCC                        0      0      0      0      0      0      0
      MRO                         0      0      0      0      0      0      0
      NPCC                        0      0      1      1      0      0      0
      RFC                         0      3      1      0      2      0      1
      SERC                        4     14      2      3      4      2     11
      SPP RE                      1      5      3     35      4     15      6
      TRE                         0      0      0      0      0      1      0

[3] NERC Glossary of Terms Used in Reliability Standards at 56. ACE (i.e., Area Control Error) is a measurement of BA performance. NERC defines ACE as the instantaneous difference between net actual and scheduled interchange, taking into account the effects of Frequency Bias including correction for meter error. Id. at 5.
[4] Until recently, there were no pre-defined IROLs. However, one of the findings from NERC and FERC in their joint investigation of the September 8, 2011 Pacific Southwest outage was that the Western Interconnection needed to establish IROLs.
[5] Id. at 68.
[6] Id. at 37.
[7] Because the West had no pre-defined IROLs during the reporting time frame beginning in 2011, a list was developed by the RC, in conjunction with WECC planning studies, that identified the 10 most congested paths in the Western Interconnection. The WECC RC reports data on those paths for the ALR3-5 report. Those paths are: Path 8; Path 14; Path 19; Path 20; Path 31; Path 32; Path 36; Path 49; Path 61; and Path 66.
[8] Level 1 ("all available resources in use") indicates that the entity is experiencing conditions where all available resources are committed to meet firm load, but non-firm wholesale energy sales have been curtailed. Level 2 ("load management procedures in effect") indicates that the entity is no longer able to provide its customers' expected energy requirements and, as a result, has had to implement procedures (short of firm load curtailment), including (but not limited to): public appeals to reduce demand, voltage reduction, interruption of non-firm loads, demand-side management, and utility load conservation measures. Level 3 ("firm load interruption imminent or in progress") indicates that the entity foresees or has already implemented firm load obligation interruption.

(2) Internal RC performance metric: Measuring the RC's contribution to achieving the overall reliability of the Western Interconnection is essential and can be organized into five separate groups of metrics:

• Achieving performance expected by NERC reliability standards.
  o Although compliance with the standards does not guarantee grid security, compliance does provide a solid foundation for measuring performance. At a minimum, the RC must comply with NERC standards or be subjected to fines or decertification as an RC. The RC must also comply with FERC settlement agreements. The RC is currently audited on a three-year schedule, with the Compliance Monitor entity reserving the right to spot check or audit at any time if it so chooses. Below are suggested additional metrics and associated rationale:
    - Number of confirmed violations. For U.S. entities, these are violations of NERC standards that have been filed with FERC. These violations are public information.
    - Notices of possible violations (NPVs) of standards by the RC. NPVs are typically found during an on-site compliance audit by WECC auditors. The reported metric should separately report CIP and Order 693 violations. The RC metric should show the number of violations accepted, the number being contested, mitigation plans that have been filed and the status of enforcement actions. (NERC rules require that every violation be accompanied by a plan by the Registered Entity to mitigate the problem highlighted by the violation.) NPVs demonstrate a disconnect between what internal RC compliance staff believe the state of compliance is and what the WECC auditors believe the state of RC compliance is. This may be due to a number of reasons, including fundamental differences in regional operating philosophy.
    - RC self-reports. These are reports that the RC, as a Registered Entity, is required to make to the compliance function at WECC. These are not public information. Violations of reliability standards by Peak identified in self-reports should be segregated between CIP and Order 693 (non-CIP) violations, reported annually, and compared with 2009 as the baseline year. Self-reporting indicates a positive internal culture of compliance. Self-reports can draw attention to a problem area, such as a lack of adequate staffing, employees who lack the appropriate knowledge to operate in a compliant manner, or missing procedural measures needed to sufficiently self-audit RC compliance. Trending this metric will help draw attention to such problem areas.
    - Execution of requirements in settlement agreements with FERC. The RC is presently subject to requirements resulting from a 2011 settlement agreement mitigation plan with FERC. (The settlement agreement stems from shortcomings in RC performance during the February 14, 2008 PacifiCorp outage. This was before the current RC was in place.) The plan requires the RC to take specified actions. This metric would track the status of the RC's implementation of Required Remedial Action Measures. [9] This metric would demonstrate that the RC satisfied all the Action Measures specified by FERC in the 2011 settlement agreement. Any outstanding mitigation plans that have not been completed need to be identified and a reason/cause published so a plan of action can be put in place to remedy them.
Unfulfilled requirements not only put the RC at 9 In semi-annual reports to FERC, WECC (on 1/1/14 Peak Reliability) is required to demonstrate the use of three procedures: the Communications Protocol; the Monitoring of Disturbance Control Performance procedure (DCS Procedure); and the Emergency Operations Procedure. For the Communication procedure, FERC mandated that the RC issue specific directives to mitigate emergencies. This includes how much load to shed and where to shed and/or what generation to move and how much to move. WECC purchased and installed a tool called NetSens that provides the RC with a list of possible generators that may be moved to mitigate an SOL or IROL to assist in issuing the specific directives. FERC s purpose in having WECC provide directive and DCS data/information is to demonstrate how WECC has incorporated the procedures into its operations. The DCS procedure requires the RC to issue the BA a directive to comply with DCS within 15 minutes, regardless if a reserve sharing power pool is whole or not. FERC stated the BA is on the hook for the requirement. The Emergency Operations Procedure was revised to include a section describing the protocol for addressing multiple emergencies, particularly those that go way beyond N-2 contingencies. At the end of the reporting schedule FERC will determine if further reporting is required by Peak. 6

risk for a compliance violation, but strongly suggests the Bulk Electric System is at heightened risk. Quality and timeliness of directives issued by the RC: The RC is the last line of defense against cascading outages and the last tool available to the RC is the issuance of directives to BAs, TOPs and Generator Operators to take specific actions to prevent a cascading outage that results from the exceedance of system operating limits (SOLs) and interconnection reliability operating limits (IROLs). It is critical that both the RC issuing the directive and the entity receiving the directive understand and correctly execute the directive to preserve the integrity of the interconnection. Two metrics are proposed related to directives: (1) number of directives issued 10 ; and (2) percentage of directives successfully executed by the receiving entity. Frequency and accuracy of system studies o Frequent and high quality studies of the state of the grid, potential contingencies and ways to mitigate contingencies are necessary if Peak Reliability is to successfully fulfill its RC obligations. Critical to accurate system studies is an accurate model of the interconnection-wide Network Model (i.e., transmission liens, generation), and Energy Management System (EMS), and analysis tools such as State Estimator and Real-Time Contingency Analysis (RTCA). The figure below shows the flow of information from the Network Model to the State Estimator to the real-time analytic tools: o Errors in the input data for the Network Model or the operation of the EMS can lead to the failure of the State Estimator to solve, or the RTCA to 10 Generally, a decline in the number of directives would indicate reliability is improving, provided that RC is maintaining the same proactive posture in addressing system problem. Lax performance by the RC could also result in a decline in directives. 7

generate accurate contingency analysis results. 11 Failure to perform frequent and timely analyses will limit the ability to use study results to resolve problems. The following metrics are proposed to measure the frequency and accuracy of system studies: o For the State Estimator The percentage of time the State Estimator fails to solve. A failure to solve could indicate poor data inputs or massive cascading outages. The percentage of time the State Estimate generates results that differ from actual grid operation. If the results from the State Estimator do not accurately represent actual grid operations, the real-time analysis tools will not generate meaningful results. o For Real-Time Contingency Analysis (RTCA) The frequency that the RTCA results matched actual grid performance. The percentage and number of RTCA of modeling cases where the RTCA did not produce credible results or complete its calculations because there are so many cascading events occurring that it cannot predict a result. 12 This is presently tracked by the RC in the following form: EXAMPLE: Real-Time Contingency Analysis (August 2013) RCTA results Current Year to date month Total number of RTCA runs 5,967 25,766 Percent of runs that were valid 99.90% 99.41% Percent of unsolved RTCA results.10%.58% The percentage of Real Time Contingency Analysis (RTCA) results reviewed and acknowledged by the Peak staff within 5 minutes. 13 This report should trend each hour. 11 The State Estimator is the back bone for the RC analysis tools, such as the RTCA, which provides situational awareness. Valid analysis solutions from the tools provide actionable results for the Peak staff. With that said, it is important that the parameters that allow the State Estimator to solve. However, the parameters should not be set so broadly and loosely that regardless of how poor the data quality is that SE will solve, but not provide a meaningful result. 
Setting a metric for State Estimator parameter settings and how often that State Estimator solves is a fundamental metric to ensure reliability. 12 In cases where the RTCA doesn t solve, the RC shift engineer contacts entities to see if the data inputs are wrong or it there is a cascading outage because of the violation of an Interconnection Reliability Operating Limit (IROL). 13 By the requirements of the IRO standards and per RC procedures, the RC uses RTCA to monitor expected postcontingency conditions based on the loss of N-1 and credible N-2 contingencies. When a harmful contingency is identified, the RC is required to contact the applicable BA or TOP and either determine a mitigation plan or review a previously determined mitigation plan to correct the predicted harmful contingency. The inability of the RC to 8

The number and percentage of Remedial Action Schemes (RAS) modeled in RTCA. Correct modeling of RAS makes the RTCA results more intuitive and actionable by the Reliability Coordinator. 14 The number of pre-defined IROL s identified per year or number of suspected IROL s identified in real time and addressed by a mitigation plan. 15 IROLs have the potential to cause cascading outages resulting in widespread load loss. IROLs by definition have the potential to cause instability, uncontrolled separation, or cascading outages that adversely impact the reliability of the Bulk Electric System. The RCs have a procedure to identify and address IROL s that occur in real time. This metric can capture that statistic and alert the interconnection to the vulnerability in this area. In addition, work continues on developing a method for pre-identifying conditions that may result in an IROL and having coordinated processes and plans in place ahead of time to take mitigating actions to prevent or alleviate the IROL. Additionally, this was a weakness noted by FERC in their September 8, 2011 recommendations. o For Next Day Studies The number of real time studies run per day run by Peak. The next day study is typically run for only the peak load time frame and includes only the information and data provided to the study engineer the day prior. 16 If a day-ahead study is not completed, then the RC staff is obligated to address on-the-fly unforeseen events that occur in real time, such as forced generator or transmission outages, unanticipated load increases review and acknowledge RTCA results is but a symptom that may be caused by a number of things, including inadequate RC staffing levels or a deluge of RTCA results far too large for any human to handle, which in turn may indicate a large scale event occurred or may require EMS trouble shooting to rectify. 
14 Current practice requires manual effort on part of the RC, by consulting the RAS manual in hardcopy or online, when seeing a contingency marked with an asterisk (indicating an unmolded RAS is attached to the contingency in RTCA) to determine if the RAS will mitigate the contingency. With over 200 RAS Peak staff is required to be aware of pursuant to the IRO 005 standard, and with roughly only 25% of those currently modeled, this metric is important to determine if positive progress is being made in this area. If the metric is showing a lack of progress modeling RAS schemes, then Peak needs to determine the reasons why and develop corresponding actions plans to correct. A base year of 2009 should be established, with statics for each subsequent year. The goal is to have 100% of RAS modeled. 15 The Peak Reliability s Energy Management System (EMS) shift engineer may have available a pre-established mitigation plan for the specified contingency. If not, the Peak s staff immediately contacts the BA system operator to review their mitigation plan and then approve the plan if the plan appears to be viable. 16 Entities (e.g., BAs, TOPs) supply all the data specified in the RC data request for next day studies. The RC next day study engineer then applies this outage data, load data, unit commitment data to the model and runs analysis to see what the predicted results for the next day. The study (due to employee shortages) is only done for the peak load period, typically 10am 2pm. This means that the other 20 hours per day are not studied in the next day time frame. 9

or decreases and hypothetical situations such as fires that are in the vicinity of transmission facilities. This metric would indicate how often RC staff must deal with issues on-the-fly. Performance of the RC s information technology (IT) systems and applications o The RC s suite of tools runs on IT and applications platforms consisting of hardware and software. IT and applications are both fluid environments. If upgrades are not identified, budgeted for and installed, the system lacks vendor support should it be compromised or fail, putting the tools at risk. Any time these systems are degraded or non-functioning, a heightened risk to the interconnection exists because RC situational awareness may be compromised. 17 o Two metrics are proposed to measure the RC s performance in the area of information technologies and applications: 1. The percentage of uptime for key RC function systems (phones, networks, applications, situational awareness and data sharing systems (99.999% type statistics quarterly). For example: EXAMPLE: IT and applications metric (August 2013) Primary phone system (target 99.99%) Communications circuits (target 99.99%) EMS system (target 99.99%) Current month 99.99% 99.98% 100% 97% Year to date 99.99% 99.99% 99.99% 96% WECCRC.org (target 95.99%) 2. The number of actions taken to improve practices, such as the number of IT system vulnerabilities identified and fixed. Training of RC employees o Ensuring that the RC s real-time employees are qualified to work in the critical positions of real-time operator and study desk operator is crucial. Namely, this type of work is unsupervised and these employees are permitted to make decisions regarding grid security (in many cases in an extremely short timeframe) without active input from others. o Continuous training (and testing) of personnel is necessary to ensure that RC personnel have the skills and knowledge to execute their tasks which have become increasingly complex as technology advances. 
Training is also necessary to ensure that there is a steady flow of qualified workers for tasks at the RC. Two metrics are proposed to measure RC staff training: 17 Evaluation and integration of patches is a specific process mandated by the standards and assures that the systems used for situational awareness have the most up-to-date and effective protections. Vendors only support specific versions of their product. 10

1. Number of Reliability Coordinators (RCs) in training to become on-desk RCs per month (include RC levels 1, 2 and 3). [18] This metric is important because it provides a quick overview of the experience-level portfolio of the RCs who are in the pipeline to work in real time. At the startup of the WECC RC in 2009, most staff had well over 25 years of industry experience. Today, that number is dwindling, which in itself is not a negative, but it does provide a snapshot of the RC experience level. Training is the only method to compensate for a lack of experience and close the gap between a person with 25 years' experience and a person with 5 years' experience.

2. Number of RCs requiring re-training or re-testing (due to being involved in an issue that wasn't handled appropriately). This creates transparency into the skill and knowledge gaps that exist within the RC. High numbers of individuals requiring re-training may indicate a lack of adequate training materials, ineffective training delivery methods, not enough training time spent on the topic or poor reinforcement of procedures and policies by RC management.

Training of BAs, TOPs and Generator Operators (GOPs)

o An expansive view of the scope of the RC to promote system reliability includes the RC providing training to BAs, TOPs and GOPs to improve their reliability performance. One metric is proposed to measure the RC's performance in providing training to BAs, TOPs and GOPs:

1. Number of RCs, BAs, TOPs and GOPs completing required restoration training. The RC offering training at no direct cost to the trainee capitalizes on economies of scale (as opposed to each company establishing its own training program), allows the use of remote training via VPN, allows for the cross-pollination of thoughts and ideas, and gives smaller organizations that do not have the benefit of an on-site simulator the opportunity to receive hands-on training using one of the best tools available. 2009 would be used as a base year for metrics.
A benchmark for good performance would need to be set. It should be noted that EOP-005-3 requires TOPs and GOPs to participate in the RC restoration training.

(3) BA and TOP performance metric: The success of the RC in achieving its mission to improve reliability in the Western Interconnection will be determined by the performance of Registered Entities, particularly BAs and TOPs, as well as the performance of the RC itself.

[18] RC1s are the employees hired to complete a 24-36 month training program to ready them for promotion to an RC2. They typically lack experience, but do have prerequisite technical knowledge or skill that would allow them to become successful. RC2s have a shorter training time frame and normally have some system operator experience. RC3s are those individuals who have extensive system operator experience and normally take anywhere from 4 to 6 months to successfully complete their training and be qualified to work independently in real time.

Metrics for the performance of each BA and TOP should be posted on WECCRC.org; aggregated statistics for all BAs and TOPs should be posted on a public RC website. Peak Reliability has the capability to populate metrics for BAs and TOPs that are particularly relevant to reliable grid operations, as set forth in more detail below:

Percentage of a BA's load forecast within 10% of actual.

Percentage of a BA's predicted generation within 10% of actual.

Percentage of BAs and TOPs using the RTCA tool.

Percentage of the time that BA and TOP real-time tools are not available.

o BAs and TOPs are required to notify the RC when their tools are not available, although the WECC survey of company practices indicates that many of these companies don't have a procedure to do this. Until BA and TOP procedures are in place for reporting when their tools are not operating, it will be difficult for Peak Reliability to establish a meaningful metric.

The quality of seasonal studies.

o There is presently no evaluation of the quality of seasonal studies by BAs and TOPs. Under current procedures, the RC does not review seasonal studies unless they cross boundaries between BAs or TOPs. Procedures for evaluating the quality of seasonal studies would be needed before this metric could be executed.

Percentage of time TOPs post relay settings on WECCRC.org.

o Relay setting information is required of TOPs in the mandatory RC data request of TOPs and BAs. However, a procedure for posting this information on WECCRC.org needs to be established.

Percentage of scheduled outages not reported in the Coordinated Outage System (COS) operated by the RC.

o BAs and TOPs are required by NERC standard IRO-010 to provide the RC with accurate and timely information on whether specific transmission and generation equipment is in service or out of service. This outage information must be accurate to ensure that the results of analysis tools (State Estimator, RTCA) are meaningful.
Further, outages must be coordinated, and when an outage is not properly entered into the outage system it is not part of the next-day study results. This may or may not have a direct adverse impact on the Bulk Electric System. This metric will indicate whether the procedure for evaluating the impacts of outages on the system is operating smoothly or whether stopgap last-minute studies are being relied on.

Number of outages, per month, by BA that were taken without approval of the RC.

o NERC standards TOP-003, IRO-005 and IRO-010 all address outages submitted to the RC so the RC may run next-day studies to determine and take action to mitigate any adverse impacts to the BES in the next-day time frame. Failure to supply the RC with all outage reports means the next-day study results are flawed and an SOL or IROL contingency may exist that was not flagged because not all the outages were input for the next-day study. Registered entities that fail to provide the RC with their outages need to take immediate action to remedy this situation. This may include re-training of personnel, revising procedures or stronger administrative oversight.

Percentage of accurate outage restoration plans approved by the RC.

o The impacts from an outage can be mitigated by the rapid execution of successful outage restoration plans. TOP outage restoration plans must be reviewed and approved by the RC. This metric measures both the RC's and the TOPs' compliance with the mandatory standards. TOPs are required to submit their restoration plans to the RC for review and approval. The RC has criteria, taken directly from the standards, that it uses to ensure the plan has been properly coordinated and contains all the necessary information needed by a system operator should a restoration event occur.

Generation and net interchange by BA.

o Magnitude and frequency of differences between the generation and interchange entered into the Network Model and actual generation and interchange. This provides an indication of whether the quality of the information provided to the RC for conducting the next-day studies is improving or degrading. It also identifies chronic offenders who continue to provide bad data and information.

In summary, the aforementioned metrics are illustrative of the metrics that will provide valuable insight and transparency into the reliability performance of the Western Interconnection overall, the performance of the RC specifically, and the performance of BAs and TOPs. These metrics should provide the needed focus for industry leaders to develop short- and long-range strategies to secure the stability of the Western Bulk Electric System.

Comparing the accuracy of generation information provided to the Peak Network Model with actual operation.
o The information and data provided to the RC are critical in ensuring that study results and real-time application results are as accurate as possible so that the RC's tools can predict problems before they occur.

Identifying the percentage of scheduled outages not reported in the RC's Coordinated Outage System (COS) and the number of outages taken without RC approval.

o The purpose of the RC's COS is to maintain coordinated system operations by notifying entities of scheduled and/or forced outages of key transmission and generation facilities. There should be zero scheduled outages not reported in the RC's COS. There should also be zero outages taken without RC approval. If that is not the case, then the entire system is at risk because the RC may have missed an IROL or SOL that it should have predicted.

Comparing the net interchange among BAs in the Network Model versus actual interchange.

o This information and data are important to accurately run a next-day study. Faulty data, especially on a continual basis, generates inaccurate results that are worthless in predicting system behavior, forcing the real-time system operators and the RC to continuously handle system issues on the fly.
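
As an illustration only, several of the BA and TOP metrics proposed above reduce to simple arithmetic once the underlying records are available. The following Python sketch shows one way the "within 10% of actual" percentage, the percentage of scheduled outages not reported in COS, and the count of outages taken without RC approval might be computed. The record layout (`Outage`), the function names and the sample values are hypothetical and are not drawn from any Peak Reliability system; the treatment of intervals with an actual value of zero is likewise an assumption.

```python
from dataclasses import dataclass


def pct_within_tolerance(forecast, actual, tolerance=0.10):
    """Percentage of intervals whose forecast is within `tolerance` of actual.

    Intervals with an actual value of zero are skipped (an assumption made
    here, because a relative error is undefined for them).
    """
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    pairs = [(f, a) for f, a in zip(forecast, actual) if a != 0]
    if not pairs:
        return 0.0
    hits = sum(1 for f, a in pairs if abs(f - a) <= tolerance * abs(a))
    return 100.0 * hits / len(pairs)


@dataclass
class Outage:
    """Hypothetical outage record; field names are illustrative only."""
    ba: str                 # Balancing Authority responsible for the outage
    scheduled: bool         # True for scheduled (not forced) outages
    reported_in_cos: bool   # True if entered in the Coordinated Outage System
    rc_approved: bool       # True if the RC approved the outage


def pct_scheduled_not_reported(outages):
    """Percentage of scheduled outages that were never entered in COS."""
    scheduled = [o for o in outages if o.scheduled]
    if not scheduled:
        return 0.0
    missing = sum(1 for o in scheduled if not o.reported_in_cos)
    return 100.0 * missing / len(scheduled)


def unapproved_by_ba(outages):
    """Count of outages taken without RC approval, keyed by BA."""
    counts = {}
    for o in outages:
        if not o.rc_approved:
            counts[o.ba] = counts.get(o.ba, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up hourly load values (MW) for a single BA.
    forecast = [100.0, 105.0, 120.0, 95.0]
    actual = [100.0, 100.0, 100.0, 100.0]
    print(pct_within_tolerance(forecast, actual))

    # Made-up outage records for one month.
    outages = [
        Outage("BA1", True, True, True),
        Outage("BA1", True, False, False),
        Outage("BA2", True, True, True),
        Outage("BA2", True, True, True),
    ]
    print(pct_scheduled_not_reported(outages))
    print(unapproved_by_ba(outages))
```

Under the paper's stated expectation, the second and third figures should both be zero for every BA; any non-zero value would flag outages missing from the next-day study.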