Data Analytics for a Secure Smart Grid Dr. Silvio La Porta Senior Research Scientist EMC Research Europe Ireland COE.
Agenda APT modus operandi Data Analysis and Security SPARKS Data Analytics Module
Anatomy of an Attack Anatomy of a Response
APT Kill Chain: Advanced Persistent Threat (APT)
Phishing and Zero Day Attack: A handful of users are targeted by two phishing attacks. One user opens the zero-day payload (CVE-02011-ZZZX).
Backdoor: The user's machine is accessed remotely by a RAT such as PlugX.
Lateral Movement: The attacker elevates access to important user, service and admin accounts, as well as specific systems.
Data Gathering: Data is acquired from target servers and staged for exfiltration.
Exfiltrate: Data is exfiltrated as encrypted files over FTP to an external, compromised machine at a hosting provider.
Traditional Security Is Not Working 97% of breaches led to compromise within days or less with 72% leading to data exfiltration in the same time Source: Verizon 2013 Data Breach Investigations Report 78% of breaches took weeks or more to discover 66% took months or more
Big Data on Security More sophisticated adversaries and sophisticated methods. Limited human capacity combined with massive amounts of events 40% of all survey respondents are overwhelmed with the security data they already collect 35% have insufficient time or expertise to analyse what they collect Security tools, tactics and defences becoming outdated: Content is static and not as dynamic as the threat landscape Segregated by too many point products, tool interfaces, disparate data sets 1 EMA, The Rise of Data-Driven Security, Crawford, Aug 2012 Survey Sample Size = 200
Evolution of Data Analytics in Security BI and Compliance driven Investigation Driven Behavior metrics driven Data-science driven Data goes in, hard to extract value Fast queries over large data Single source metrics, single correlation, rule based, high false positive Leverage full contextual info, multi-source, automatic, for low false positives
Data Science: The Next Security Frontier Beyond signatures Beyond simple metrics for thresholding Beyond manual engineering of rules Monitor each and every entity in its environmental context with 360 view over long time window with advanced mathematics
Today's Security Requirements
Big Data Infrastructure: Need a fast and scalable infrastructure to conduct real-time and long-term analysis.
High Powered Analytics: Give me the speed and smarts to detect, investigate and prioritize potential threats.
Comprehensive Visibility: See everything happening in my environment and normalize it.
Integrated Intelligence: Help me understand what to look for and what others have discovered.
Applying Intelligent Driven Security Analytics Big Data Analytics Governance Data Alert & Report Compliance Apps Systems Store Investigate & Analyze Visualize Incident Management Network Respond Remediation Public & Private Threat Intelligence
SPARKS Security analytics test Env. SCADA controller SCADA NEODYNE Enable New (?) attacks LIVE Install Security Analytics solutions In UTRC Middleware NESCOR attack trees Formulation Power Measurements SCADA BMS DB (e.g WSN) Log Files <new sources> Demo site attack down-selection Use Security Analytics as inputs for designing resilient control algorithms Final demonstration Pattern Generation Example of AMI.29: Unauthorized Device Acquires HAN Access and Steals Private Information
SPARKS Sec. Info. Analytics Component The module will be composed of two main components: the Static Rules Validator and the Auto-Detector. [Diagram labels: SCADA Controller; SPARKS Sec. Info. Analytics Component; G.U.I.; Static Rules Validator; Auto-Detector; Resilient Control System]
Static Rules Validator The component will search for violations of system assertions.
Rules List: contains the assertions to verify.
Adapter: translates the rules into a common language.
Parser: reads the rules and searches for negative or positive outliers.
[Diagram labels: Static Rules Validator; Rules list; Adapter; Parser]
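The Rules List, Adapter and Parser roles can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the SPARKS implementation: the declarative spec format (`equals`, `sum_of`, `tolerance`), the `Rule` class, the field names such as `P_total`, the 2% tolerance, and the assumption that the "Total Active Power Error" rule compares the reported total against the sum of the three phase powers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Reading = Dict[str, float]

@dataclass
class Rule:
    """Common internal form of one assertion (hypothetical)."""
    name: str
    lhs: Callable[[Reading], float]
    rhs: Callable[[Reading], float]
    tolerance: float  # maximum allowed relative error

# Rules list: declarative assertions (format is an assumption).
RULES_LIST = [
    {"name": "Total Active Power Error", "equals": "P_total",
     "sum_of": ["P_A", "P_B", "P_C"], "tolerance": 0.02},
]

def adapt(spec: dict) -> Rule:
    """Adapter: translate a declarative spec into the common Rule form."""
    target = spec["equals"]
    terms = list(spec["sum_of"])
    return Rule(
        name=spec["name"],
        lhs=lambda r: r[target],
        rhs=lambda r: sum(r[t] for t in terms),
        tolerance=spec["tolerance"],
    )

def find_violations(reading: Reading, rules: List[Rule]) -> List[str]:
    """Parser: evaluate each rule and return the names of violated ones."""
    violated = []
    for rule in rules:
        lhs, rhs = rule.lhs(reading), rule.rhs(reading)
        relative_error = abs(lhs - rhs) / max(abs(rhs), 1e-9)
        if relative_error > rule.tolerance:
            violated.append(rule.name)
    return violated

rules = [adapt(spec) for spec in RULES_LIST]
consistent = {"P_A": 1000.0, "P_B": 1100.0, "P_C": 900.0, "P_total": 3000.0}
tampered = dict(consistent, P_total=2500.0)
print(find_violations(consistent, rules))  # no rule violated
print(find_violations(tampered, rules))    # the total no longer matches the phases
```

Keeping the rules declarative means new assertions can be added to the list without touching the evaluation code.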
Intra-Meter Security Analytics
28 Measured Variables: V_A, V_B, V_C, I_A, I_B, I_C, I_N, P_A, P_B, P_C
18 Calculated Variables: V_AB, V_BC, V_CA; Q_A, S_A; Q_B, S_B; Q_C, S_C; P_Total, Q_Total, S_Total; Power Factor; E_Active+, E_Active-; E_Reactive+, E_Reactive-; E_Apparent
28 + 18 = 46 Cross-Checking Values
~2 months of data, 14.5 million observations
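Intra-Meter Security Analytics: 18 Cross-Checking Equations
Variables: V_A, V_B, V_C, I_A, I_B, I_C, I_N, V_AB, V_BC, V_CA
Example (voltage phase check):
\[
\cos^{-1}\!\left(\frac{V_A^2 + V_B^2 - V_{AB}^2}{2 V_A V_B}\right)
+ \cos^{-1}\!\left(\frac{V_B^2 + V_C^2 - V_{BC}^2}{2 V_B V_C}\right)
+ \cos^{-1}\!\left(\frac{V_C^2 + V_A^2 - V_{CA}^2}{2 V_C V_A}\right)
= 360^\circ
\]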
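As a worked illustration of the voltage phase check, the following Python sketch recovers each inter-phase angle from the law of cosines and sums the three angles. The function names and the 230 V test values are assumptions chosen for the example; in a balanced system each angle is 120 degrees, so the sum is 360 degrees.

```python
import math

def phase_angle_deg(v1, v2, v12):
    """Angle between two phase voltages, from the law of cosines."""
    cos_theta = (v1 ** 2 + v2 ** 2 - v12 ** 2) / (2 * v1 * v2)
    # Clamp to [-1, 1] so rounding noise cannot break acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

def voltage_angle_sum_deg(va, vb, vc, vab, vbc, vca):
    """Left-hand side of the voltage phase cross-check."""
    return (phase_angle_deg(va, vb, vab)
            + phase_angle_deg(vb, vc, vbc)
            + phase_angle_deg(vc, va, vca))

# Balanced three-phase system: line voltage = sqrt(3) * phase voltage.
line = 230.0 * math.sqrt(3)
balanced = voltage_angle_sum_deg(230.0, 230.0, 230.0, line, line, line)

# Hypothetical manipulation of one reported line voltage.
tampered = voltage_angle_sum_deg(230.0, 230.0, 230.0, line, line, 0.9 * line)

print(round(balanced, 6), round(tampered, 2))
```

A reading whose angle sum departs noticeably from 360 degrees is internally inconsistent, which is what makes the equation usable as a cross-check.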
Real Time Analysis
Equations need not hold exactly, e.g., because of unsynchronised sampling.
We let each rule hold approximately: for each equation, the difference or ratio of LHS and RHS is calculated. For the voltage phase check (angles in radians):
\[
\text{Error} = \frac{1}{2\pi}\left[
\cos^{-1}\!\left(\frac{V_A^2 + V_B^2 - V_{AB}^2}{2 V_A V_B}\right)
+ \cos^{-1}\!\left(\frac{V_B^2 + V_C^2 - V_{BC}^2}{2 V_B V_C}\right)
+ \cos^{-1}\!\left(\frac{V_C^2 + V_A^2 - V_{CA}^2}{2 V_C V_A}\right)
\right]
\]
We calculate and store histograms of all errors in normal operation.
In real time, we evaluate the current error and compute its probability.
If the probability is too low, we flag the equation and display the total number of equations violated.
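The histogram-based flagging described on this slide could look roughly like the Python sketch below. The bin range, bin count, probability threshold and the synthetic Gaussian training errors are all assumptions made for illustration; the slides do not state the values actually used.

```python
import random

class ErrorHistogram:
    """Fixed-width histogram of one equation's error in normal operation."""

    def __init__(self, low, high, bins):
        self.low, self.high, self.bins = low, high, bins
        self.counts = [0] * bins
        self.total = 0

    def _bin(self, error):
        if not (self.low <= error < self.high):
            return None  # outside the range covered by the histogram
        index = int((error - self.low) / (self.high - self.low) * self.bins)
        return min(index, self.bins - 1)  # guard against float rounding

    def add(self, error):
        self.total += 1
        index = self._bin(error)
        if index is not None:
            self.counts[index] += 1

    def probability(self, error):
        """Empirical probability of the bin that contains this error."""
        index = self._bin(error)
        if index is None or self.total == 0:
            return 0.0
        return self.counts[index] / self.total

def violated_equations(errors, histograms, min_probability):
    """Names of equations whose current error is too improbable."""
    return [name for name, error in errors.items()
            if histograms[name].probability(error) < min_probability]

# Train on synthetic "normal operation" ratio errors centred on 1.0.
random.seed(0)
hist = ErrorHistogram(low=0.9, high=1.1, bins=200)
for _ in range(10_000):
    hist.add(random.gauss(1.0, 0.002))
histograms = {"Voltage Phase Error": hist}

normal = violated_equations({"Voltage Phase Error": 1.0}, histograms, 1e-3)
suspect = violated_equations({"Voltage Phase Error": 1.05}, histograms, 1e-3)
print(len(normal), len(suspect))  # number of equations violated in each case
```

One histogram is kept per equation, so the count of flagged equations for a reading is simply the length of the returned list.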
Daily Analysis
At the end of the day, we compute the histogram of the day's errors.
We use the Kullback-Leibler distance of this histogram from the historical distribution to check whether a deviation exists.
If the deviation is too high, we generate an alarm indicating that an attack may have been present throughout the day.
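A minimal Python sketch of the daily check follows, assuming the day's histogram and the historical histogram share the same bins. The smoothing constant `eps`, the alarm threshold and the example counts are hypothetical; the slides do not give the values or the smoothing method used.

```python
import math

def kl_divergence(day_counts, historical_counts, eps=1e-9):
    """Kullback-Leibler divergence of the day's histogram from history.

    Both inputs are bin counts over identical bins. A small eps is added
    to every bin (an assumption) so empty bins do not cause log(0) or
    division by zero.
    """
    if len(day_counts) != len(historical_counts):
        raise ValueError("histograms must use the same bins")
    n = len(day_counts)
    p_total = sum(day_counts) + eps * n
    q_total = sum(historical_counts) + eps * n
    divergence = 0.0
    for p_count, q_count in zip(day_counts, historical_counts):
        p = (p_count + eps) / p_total
        q = (q_count + eps) / q_total
        divergence += p * math.log(p / q)
    return divergence

ALARM_THRESHOLD = 0.1  # hypothetical value

historical = [5, 20, 50, 20, 5]
ordinary_day = [10, 40, 100, 40, 10]  # same shape as history
shifted_day = [50, 20, 5, 20, 5]      # errors piled into one tail

ordinary_alarm = kl_divergence(ordinary_day, historical) > ALARM_THRESHOLD
shifted_alarm = kl_divergence(shifted_day, historical) > ALARM_THRESHOLD
print(ordinary_alarm, shifted_alarm)
```

Because the comparison is between normalized distributions, a day with more or fewer observations than usual does not raise an alarm by itself; only a change in the shape of the error distribution does.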
Nominal Value Check
Rules list example "Phase A Active Power Error", "Phase B Active Power Error", "Phase C Active Power Error", "Phase A Reactive Power Error", "Phase B Reactive Power Error", "Phase C Reactive Power Error", "Phase A Apparent Power Error 1", "Phase B Apparent Power Error 1", "Phase C Apparent Power Error 1", "Phase A Apparent Power Error 2", "Phase B Apparent Power Error 2", "Phase C Apparent Power Error 2", "Total Active Power Error", "Total Apparent Power Error", "Power Factor Error 1", "Power Factor Error 2", "Voltage Phase Error", "Neutral Current Error"
Nimbus Meters
Global View Number of observations in a day = 1935360 Meter E01 last 24H detections
Detail of a Meter (EM10) Number of Rules that generated outliers Number of Nominal Value Outliers Time
Disconnection Threshold to trigger the alarm Connected Back
Distribution Distance The system checks the current day's distribution against the historical data distribution using the Kullback-Leibler distance.
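For reference, the standard definition of the Kullback-Leibler divergence for discrete distributions is given below. Here P is taken to be the current day's normalized error histogram and Q the historical one, with i indexing the histogram bins; this assignment of roles is an assumption, since the measure is not symmetric and the slide does not state the direction used.

```latex
\[
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \, \log \frac{P(i)}{Q(i)}
\]
```

The value is zero when the two distributions are identical and grows as the day's distribution departs from the historical one.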
Auto-Detector The component will use machine learning techniques to evaluate the entire system state.
Rules Extractor: gets data from the last readings.
Historical KB: compares the new features with the system history.
Evaluator: uses a tolerance to reduce false positives and noise.
[Diagram labels: Auto-Detector; Rules Extractor; Historical KB; Evaluator]
Work in progress
Data analytics algorithm, basic features: pattern detection and pattern violation (example: battery is charged every day between 7am-12am and discharged between 6pm-10pm).
Inter-meter checks.
Dynamic rules and checks in the interface.
Interactive interface to zoom in on time frames.
Thank You for your attention Questions