Detection and Mitigation of Cyber-Attacks Using Game Theory and Learning


João P. Hespanha, Kyriakos G. Vamvoudakis

Cyber Situation Awareness Framework

[Figure: block diagram of the framework. Recoverable labels:]
- Simulation/live security exercises
- Observations: Netflow, probing, time analysis
- Sensor alerts
- Mission model
- Cyber-assets model
- Analysis to get an up-to-date view of cyber-assets
- Analysis to determine dependencies between assets and missions
- Analyze and characterize attackers
- Predict future actions
- Create a semantically rich view of cyber-mission status
- Impact analysis
- Correlation engine
- COAs

Outline
1. Large matrix games (summary of results)
2. Observability of dynamical systems under attacks to sensors
3. Multi-agent learning under cyber-attack using Q-learning
4. Integration of online optimization for real-time attack prediction and visualization

Large matrix games (summary of results)
With Bopardikar (UTRC) and Prandini (Politecnico di Milano)

Network Security Games

[Figure: attack graph from [J. Wing 2007], with an intrusion detection system, chat software, a web proxy, and the intruder's target. The highlighted path is a sequence of intruder actions that compromise the database server and are not detected by the IDS.]

Even trivially small network security games can lead to games with very large decision trees.

Problem statistics of iCTF 2010:
- over 7800 distinct mission states (defender observations)
- over 2500 distinct observations available to the attacker
- the defender can choose among about 10^2527 distinct policies
- the attacker can choose among 10^756 to 10^2616 distinct policies, depending on the attacker's level of expertise

Results:
- Developed a sample-based approach to solving zero-sum games
- The approach provides probabilistic guarantees on the performance of the policies (in terms of security levels)
- Results are applicable to very general classes of games that can include stochastic actions, partial information, etc.

Application to iCTF 2010

We were able to:
- Provide estimates of mission success
- Take into account the effect of attacks & countermeasures
- Make the response a function of attacker sophistication
- Play what-if scenarios (vulnerabilities, information, etc.)

Baseline: Litya receives on average 314 units for completion of all 4 missions.

[Figure: mission graphs showing which services (S0 to S9) each mission step relies on.]

Units received by Litya for 1 round of missions, by increasing level of attacker sophistication:

| Level of attacker sophistication | Option I, no bribes | Option I, with bribes | Option II, with bribes |
|---|---|---|---|
| no service vulnerable (baseline) | 314 | 314 | 314 |
| S2 (vulnerable to 38 teams) | 240 | 240 | 138 |
| S2, S6, S9 (vulnerable to at least 6 teams) | 79 | 79 | 43 |
| S0, S2, S4, S6, S7, S8, S9 (vulnerable to at least 1 team) | 11 | -738 | -1327 |
| all services vulnerable | 11 | -848 | -1917 |

Observability of dynamical systems under attacks to sensors
With Sinopoli (CMU) and Y. Mo (Caltech)

Detection in Adversarial Environments

How to interpret & assess the reliability of sensors that have been manipulated?

Sensors relevant to cyber missions:
- Measurement sensors (e.g., SCADA systems)
- Computational sensors (e.g., weather forecasting simulation engines)
- Data retrieval sensors (e.g., database queries)
- Cyber-security sensors (e.g., IDSs)

Domains:
- Deterministic sensors: with n sensors, one can get the correct answer as long as m < n/2 sensors have been manipulated
- Stochastic sensors without manipulation: solution given by hypothesis testing/estimation
- Stochastic sensors with potential manipulation: open problem?
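The deterministic-sensor case above can be illustrated with a plain majority vote. This is a minimal sketch, not code from the talk: the function name `majority_vote` and the example readings are hypothetical, and it assumes noise-free sensors that all observe the same value.

```python
from collections import Counter

def majority_vote(reports):
    """Return the value reported by a strict majority of sensors.

    Assumption (not from the slides): sensors are noise-free, so every
    honest sensor reports the true value. If fewer than half of the
    sensors are manipulated, the honest ones form a strict majority.
    """
    value, count = Counter(reports).most_common(1)[0]
    if 2 * count <= len(reports):
        raise ValueError("no strict majority: too many sensors may be manipulated")
    return value

# n = 5 sensors, true value 1, m = 2 manipulated sensors (m < n/2)
reports = [1, 0, 1, 0, 1]
estimate = majority_vote(reports)
```

With m = 2 of n = 5 readings flipped, the three honest sensors still outvote the manipulated ones. The stochastic case treated on the following slides is harder because honest sensors can also be wrong.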

Problem formulation

- X: binary random variable to be estimated (binary for simplicity; the paper treats the general case)
- Y_1, Y_2, ..., Y_n: noisy measurements of X produced by n sensors
- p_err: per-sensor error probability (not necessarily very small)
- Z_1, Z_2, ..., Z_n: measurements actually reported by the n sensors; at most m sensors attacked
- p_attack: probability that we are under attack (very hard to know!)

The interpretation of sensor data should be mostly independent of p_attack.

Result for a small number of sensors (n < 2/p_err)

Setting: X is the binary random variable to be estimated; Y_1, Y_2, ..., Y_n are the noisy measurements of X produced by the n sensors; Z_1, Z_2, ..., Z_n are the measurements actually reported by the n sensors, with at most m sensors attacked; p_attack is the probability that we are under attack (very hard to know!).

Theorem: The optimal estimator is either
- go with the majority of the (potentially manipulated) sensor readings, or
- go with the majority, EXCEPT if there is consensus.

The optimal estimator is largely independent of p_attack (hard to know).

This year's work: Can we extend this to the estimation of time-varying variables, e.g., the state of a mission?

Estimation in Adversarial Environments

How to interpret & assess the reliability of sensors that have been manipulated?

Previously: X is a constant binary random variable to be estimated.

Now: X(t) is a time-varying state variable to be estimated (e.g., the state of a cyber mission), based on
1. sensor measurements that may have been manipulated
2. system dynamics

Problem formulation

[Equations not recoverable from the transcript. Their annotations were:]
- dynamical evolution of the system's state, driven by control signals
- N measurements produced by the sensors
- N measurements reported by the sensors
- at most M sensors can be manipulated by the attackers

The dynamics can also be formulated as a discrete-event system using the Ramadge-Wonham supervisory control framework.

Under what conditions can one reconstruct the state from (potentially corrupted) sensor measurements?

Theorem: Exact state reconstruction is possible if and only if the system is observable through every subset of N - 2M measurements.
- In the absence of attacks, the state could be reconstructed through only N - 2M measurements.
- A potential attack at M sensors effectively disables 2M sensors.
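The condition in the theorem can be checked by brute force for a small example. The sketch below assumes a discrete-time linear system x(k+1) = A x(k) with one measurement row of C per sensor; the slides do not state the model class, so the linear model, the function names, and the double-integrator example are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(rows):
    """Rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def is_observable(A, C):
    """Kalman rank test: stack C, CA, ..., CA^(n-1) and check for rank n."""
    n = len(A)
    block, stacked = C, []
    for _ in range(n):
        stacked += block
        block = matmul(block, A)
    return rank(stacked) == n

def is_attack_observable(A, C, M):
    """True if (A, C) stays observable through every subset of N - 2M sensors."""
    N = len(C)
    if N - 2 * M < 1:
        return False
    return all(
        is_observable(A, [C[i] for i in subset])
        for subset in combinations(range(N), N - 2 * M)
    )

# Hypothetical example: discrete-time double integrator (position, velocity)
A = [[1, 1], [0, 1]]
three_position_sensors = [[1, 0], [1, 0], [1, 0]]
mixed_sensors = [[1, 0], [1, 0], [0, 1]]  # the last sensor measures velocity only
```

Here `is_attack_observable(A, three_position_sensors, 1)` holds because any single position sensor makes the double integrator observable, while `mixed_sensors` fails for M = 1 because the velocity-only sensor alone does not. The enumeration is combinatorial in N and is meant only to make the statement concrete.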

Estimation algorithms

Gramian-based estimator:
- batch, finite-time estimation
- inversion of the observability matrix at each time step

Observer-based estimator:
- asymptotic estimation
- recursive, low-computation algorithm
- provably robust with respect to noise on all sensors (including non-attacked ones)

Algorithm outline:
1. Build an estimate by ignoring a set S of M sensors.
2. Build additional estimates by removing, in addition, all combinations of M additional sensors.
3. If all attacked sensors were in set S, then the estimates in steps 1 and 2 will be consistent (modulo noise).

(All estimates can be constructed without combinatorial complexity, by using finite dimensionality.)
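The three-step outline can be made concrete in a deliberately simplified setting. The sketch below assumes a constant scalar state that every honest sensor reports exactly (no dynamics, no noise) and uses the mean as the estimator; these simplifications, and the names `estimate` and `resilient_estimate`, are assumptions for illustration. Unlike the estimators on the slide, it enumerates sensor subsets by brute force.

```python
from itertools import combinations

def estimate(readings, keep):
    """Estimate the state from the kept sensors (here: their mean)."""
    return sum(readings[i] for i in keep) / len(keep)

def resilient_estimate(readings, M, tol=1e-9):
    """Search for a set S of M sensors whose removal gives consistent estimates.

    Requires N > 2M so that every sub-estimate uses at least one sensor.
    Returns (estimate, S), or None if no consistent set exists.
    """
    sensors = range(len(readings))
    for S in combinations(sensors, M):
        kept = [i for i in sensors if i not in S]
        base = estimate(readings, kept)  # step 1: ignore the set S
        consistent = all(                # step 2: drop M more sensors each time
            abs(estimate(readings, [i for i in kept if i not in extra]) - base) <= tol
            for extra in combinations(kept, M)
        )
        if consistent:                   # step 3: consistent, so S covers the attack
            return base, S
    return None

# Hypothetical readings: true state 5.0, sensor 2 manipulated, M = 1
readings = [5.0, 5.0, 9.0, 5.0, 5.0]
result = resilient_estimate(readings, M=1)
```

In this toy case the estimates only agree once the manipulated sensor is in S, because removing any further sensor from an all-honest set leaves the mean unchanged.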

Multi-agent learning under cyber-attack using Q-learning

Resilient Cyber-Mission Architectures

In complex cyber missions:
- human operators define policies and rules
- computing elements automate processes of distributed resource allocation, scheduling, inventory management, etc.
  - self-configuration: automatic configuration of components
  - self-healing: automatic discovery and correction of faults
  - self-optimization: automatic allocation of resources for optimal operation

What is the impact of attacks on this type of automated/optimization process?

Can we devise algorithms with built-in attack prediction/awareness capabilities?

Focus: Distributed Consensus/Agreement

Classical problem in distributed computing: a group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value). The decision is made iteratively and in a distributed fashion using peer-to-peer communication.

2nd-order adjustment rule [equation not recoverable from the transcript; its annotated terms were]:
- value at processor i, iteration k
- adjustment on x_i by processor i, at iteration k
- correct update on the adjustment
- update on the adjustment by the attacker
- peers of agent i (self-included)

Goal: minimize errors between the values of agents and their neighbors.
Attacker: maximize errors using stealth attacks (small v_i).

Nash equilibrium formulation [cost function not recoverable; its annotated terms were]:
- error: minimized by us, maximized by the attacker
- our updates (small means smooth): minimized by us, maximized by the attacker
- attacker updates (small means stealth): maximized by us, minimized by the attacker

Optimal Solution

[Equations not recoverable from the transcript: the Bellman equation and the optimal control and attacker policies, which depend on the number of peers.]

Under appropriate regularity assumptions (smoothness):
- u_i* is optimal (minimal) for us
- v_i* is optimal (maximal) for the attacker

Moreover:
- Consensus will be reached asymptotically
- All variables will remain bounded through the transient (in fact, Lyapunov stability)

Theoretical results were derived for a continuous-time approximation of the algorithms, which is more suitable for the asymptotic analysis.

But the Bellman equation is difficult to solve (curse of dimensionality).

Last year: machine-learning-based approach to solve this distributed consensus problem, with three limitations:
- Restricted to second-order updates (double integrator)
- Global knowledge of the communication graph was required
- Global knowledge of the update rules used by each agent was required

This year's work overcomes these 3 limitations.

Focus: Distributed Consensus/Agreement (general update rule)

The same distributed consensus problem: a group of computing elements must agree on a common scalar value x, deciding iteratively and in a distributed fashion using peer-to-peer communication.

General update rule [equation not recoverable from the transcript; its annotated terms were]:
- value at processor i, iteration k+1
- correct update
- malicious update by the attacker
- peers of agent i (self-included)

Goal: minimize errors between the values of agents and their neighbors.
Attacker: maximize errors using stealth attacks (small v_i).

Nash equilibrium formulation [cost function not recoverable; its annotated terms were]:
- error: minimized by us, maximized by the attacker
- our updates (small means smooth): minimized by us, maximized by the attacker
- attacker updates (small means stealth): maximized by us, minimized by the attacker

Challenges:
- Each agent does not necessarily know the update algorithm used by the other agents (A_i, b_i, d_i)
- Each agent does not necessarily know the global graph (just its set of neighbors N_i)
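To give a feel for how a small, persistent attacker input disturbs agreement, here is a minimal sketch of a consensus iteration. It is not the update rule from the slides, whose equations did not survive in the transcript: it uses a textbook first-order rule in which each agent moves toward its neighbors, with an additive attacker term. The step size, the ring graph, and all names are assumptions for illustration.

```python
def consensus_step(x, neighbors, step=0.3, attack=None):
    """One iteration of x_i <- x_i + u_i + v_i for every agent i.

    u_i = -step * sum_j (x_i - x_j) over the peers of agent i
    v_i = attack[i], the (hypothetical) input injected by the attacker
    """
    attack = attack or [0.0] * len(x)
    new_x = []
    for i, xi in enumerate(x):
        u_i = -step * sum(xi - x[j] for j in neighbors[i])
        new_x.append(xi + u_i + attack[i])
    return new_x

def run(x, neighbors, steps, attack=None):
    """Iterate the update and return the final values."""
    for _ in range(steps):
        x = consensus_step(x, neighbors, attack=attack)
    return x

# Ring of 4 agents; each peer list includes the agent itself
neighbors = {0: [0, 1, 3], 1: [1, 0, 2], 2: [2, 1, 3], 3: [3, 2, 0]}
x0 = [0.0, 4.0, 8.0, 12.0]

clean = run(x0, neighbors, steps=200)
attacked = run(x0, neighbors, steps=200, attack=[0.01, 0.0, 0.0, 0.0])

spread_clean = max(clean) - min(clean)           # shrinks toward zero
spread_attacked = max(attacked) - min(attacked)  # stays bounded away from zero
```

Without the attack the agents settle on the average of their initial values. With a constant input of 0.01 at agent 0, a small disagreement persists and the common value drifts, which is the kind of stealthy effect the game formulation is meant to counter.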

Q-learning

We shall use Q-learning (Watkins, 1989), a popular method from machine learning.

[Figure: generator/tester loop. Labels: states or situations of each agent; generator; actions; tester; evaluations.]

Q-learning is a model-free machine learning method that learns an action-utility function, or Q-function, giving the expected utility of taking a given action in a given state.

The learned Q-function directly approximates the optimal action-value function, independent of the policy being followed.

Watkins' algorithm motivates our approach.
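For readers unfamiliar with the method, the sketch below shows standard tabular Q-learning on a toy problem. It is not the neural-network, multi-agent scheme developed in the talk: the chain environment, the reward, and the hyperparameters are hypothetical and chosen only to make the update rule visible.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a chain 0..n_states-1 with a goal at the right end.

    Actions: 0 = move left, 1 = move right. Reaching the goal pays reward 1
    and ends the episode. The behavior policy is uniformly random, which
    illustrates that the learned Q-function does not depend on the policy
    being followed.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(2)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: bootstrap from the best action in the next state
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q

def greedy_policy(Q):
    """Best action in every non-goal state."""
    return [max((0, 1), key=lambda a: row[a]) for row in Q[:-1]]

Q = q_learning()
policy = greedy_policy(Q)
```

Even though the agent acts randomly while learning, the greedy policy read off the table moves right in every state. A table like `Q` grows with the number of states and actions, which is the complexity problem that motivates replacing it with function approximation on the next slide.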

Q-function

Instead of the Q-tables used by Watkins (1989), which suffer from complexity problems, we shall use appropriate neural networks. But encoding the states and actions (Q-function) properly will be challenging.

[Equations not recoverable from the transcript. Their annotations were:]
- Q-function
- Optimal Q-function
- Each Q-function is quadratic
- Unknown matrix to be found
- "... are the unique symmetric positive definite matrices that solve the game"

Actor/Critic Learning Approach

- Explicit representation of policy and value function
- Minimal computation to select actions
- Can learn an explicit policy
- Can put constraints on policies
- Appealing as psychological and neural models

Critic = model-free (distributed) algorithm to evaluate the current algorithm & estimate attacker actions

Actor = model-free (distributed) algorithm to enact optimal decisions (based on the critic's findings)

Actor & Critic are based on Approximate Dynamic Programming: neural network approximation of the Q-function (action dependent) & optimal control laws.

Learning Structure

Critic: neural networks to approximate the costs
- unknown weights to be learned
- basis sets, with local state and control information

Actor: neural networks to approximate the control and adversarial inputs

Bellman equations in integral form (over a sampling interval); compare to Watkins' Q-function.

Learning under attacks: key result

Theorem: Assuming
- the signals are persistently exciting, and
- the graph is strongly connected,
then
- the equilibrium points of the closed-loop signals are asymptotically stable, and
- the policies converge to a Nash equilibrium.

[Figure: the proposed algorithm operating with unknown second-order dynamics and unknown graph structure, cost weights, and leader information.]

Synchronization of all the agents is achieved even under attacks and with unknown network and agent information.

Integration of online optimization for real-time attack prediction and visualization

Cyber Missions Complexity

Challenges to real-time cyber-mission protection:
- cyber assets are shared among missions
- cyber asset requirements change over time
- missions can use different configurations of resources
- complex network of cyber-asset dependencies

[Figure: service-dependency graphs for two missions. Mission 1 steps use services S0, S1; S5, S6; S9; S0, S1; S5, S6, S7; S6. Mission 2 steps use services S0, S2; S2; S8.]

An attack on service S0 can result in multiple mission failures, but the damage is only realized if the missions follow particular paths.

Cyber Awareness Questions:
- When & where is an attacker most likely to strike?
- When & where is an attacker most damaging to mission completion?
- How will the answer depend on attacker resources? attacker skills? attacker knowledge? (real-time what-if analysis)

Mapping Service Attacks to Mission Damage

- A mission requires multiple services
- Mission reliance on services varies with time

[Equations not recoverable from the transcript. Their annotations were:]
- Damage equation (for service s at time t): potential damage, attack resources
- Uncertainty equation (for service s at time t): probability of realizing damage, attack resources
- Equation parameters vary with time as the mission progresses (learned from data in iCTF exercises)

Optimal attacks: maximize the total damage to the mission, constrained by the total attack resources at time t.

In this period: developed an optimization engine that can address thousands of variables/constraints in a few milliseconds.

Multi-Resolution Visualization

Multi-resolution attack analysis:
1. High-level attack predictions based on online optimization
2. Potential damage & uncertainty associated with attacks on different services
3. Parameters that determine damage and uncertainty

AlloSphere integration:
- High-level predictions permit fast action
- Low-level parameters permit investigating the rationale for predictions

Summary of Accomplishments (Y5)

A new notion of observability for systems under attacks
- A necessary and sufficient condition for a dynamical system to be M-attack observable
- Two estimation algorithms:
  - Gramian-based estimator (finite-time convergence)
  - Observer-based estimator (asymptotic convergence)

Shielding complex networks from adversarial attacks using Q-learning
- Developed a model-free, dynamic-programming-based solution to obtain resilient algorithms with attack estimates
- Used an online Q-learning approach to overcome complexity issues in complex networks
- There is no need for the physical models or the network interactions
- Applications to critical infrastructures, e.g., power systems

Online optimization for real-time attack prediction
- Developed numerical algorithms for fast (real-time) attack prediction
- Integration with the visualization tools developed (more in Tobias Hollerer's presentation)

Focus for future work
- Develop new self-configuring, event-triggered algorithms for decision and control in complex networks in the presence of jamming network attacks
- Transition to practice, including mixed human, manned and unmanned teams and large networks of off-grid power systems

Publications

Published last year or in press
- K. Vamvoudakis, J. Hespanha, B. Sinopoli, Y. Mo. Detection in Adversarial Environments. IEEE Trans. on Automatic Control, Special Issue on the Control of Cyber-Physical Systems, February 2014. To appear.
- K. G. Vamvoudakis, M. F. Miranda, J. P. Hespanha. Asymptotically-Stable Optimal Adaptive Control Algorithm with Saturating Actuators and Relaxed Persistence of Excitation. IEEE Transactions on Neural Networks and Learning Systems, July 2014. Conditionally accepted.
- Kenji Hirata, J. P. Hespanha, Kenko Uchida. Real-time Pricing Leading to Optimal Operation under Distributed Decision Makings. In Proc. of the 2014 Amer. Contr. Conf., June 2014.
- Daniel Silvestre, Paulo Rosa, J. Hespanha, C. Silvestre. Finite-time Average Consensus in a Byzantine Environment Using Set-Valued Observers. In Proc. of the 2014 Amer. Contr. Conf., June 2014.
- K. G. Vamvoudakis, J. P. Hespanha. Online Optimal Switching of Single Phase DC/AC Inverters Using Partial Information. In Proc. American Control Conference, pp. 2624-2630, Portland, OR, 2014.

Submitted
- K. Vamvoudakis, J. Hespanha. Game-Theory based Consensus Learning of Double-Integrator Agents in the Presence of Attackers. Sep 2014. Submitted to journal publication.
- K. G. Vamvoudakis, P. J. Antsaklis, W. E. Dixon, J. P. Hespanha, F. L. Lewis, H. Modares, B. Kiumarsi. Autonomy and Machine Intelligence in Complex Systems: A Tutorial. Submitted to American Control Conference, Chicago, IL, 2015. (Invited paper)
- M. Chong, M. Wakaiki, J. Hespanha. Observability of Linear Systems under Attacks. September 2014. Submitted to ACC 2015.

Working papers
- K. G. Vamvoudakis, J. P. Hespanha. Cooperative Q-learning for Rejection of Persistent Adversarial Inputs in Complex Networks. In preparation for Journal of Machine Learning Research, 2014.