Detection and Mitigation of Cyber-Attacks Using Game Theory and Learning. João P. Hespanha, Kyriakos G. Vamvoudakis
[Figure: Cyber Situation Awareness Framework. Inputs: sensor alerts, mission cyber-assets, and observations from simulation/live security exercises (Netflow, probing, time analysis). Analyses: get an up-to-date view of cyber-assets (Cyber-Assets Model); analyze and characterize attackers and predict their future actions; determine dependencies between assets and missions (Mission Model); impact analysis. A correlation engine creates a semantically rich view of cyber-mission status and courses of action (COAs).]
Outline: large matrix games (summary of results); observability of dynamical systems under attacks to sensors; multi-agent learning under cyber-attack using Q-learning; integration of online optimization for real-time attack prediction and visualization.
Outline: large matrix games (summary of results). Collaborators: Bopardikar (UTRC), Prandini (Politecnico di Milano).
Network Security Games. [Figure: attack graph from [J. Wing 2007] with an intruder, an intrusion detection system (IDS), chat software, a web proxy, and the intruder's target: a sequence of intruder actions compromises the database server without being detected by the IDS.] Even trivially small network security games can lead to games with very large decision trees. Problem statistics of iCTF 2010: over 7800 distinct mission states (defender observations); over 2500 distinct observations available to the attacker; the defender can choose among about 10^2527 distinct policies; the attacker can choose among 10^756 to 10^2616 distinct policies, depending on the attacker's level of expertise. We developed a sample-based approach to solving zero-sum games. The approach provides probabilistic guarantees on the performance of the policies (in terms of security levels), and the results apply to very general classes of games, including games with stochastic actions, partial information, etc.
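The "sample, then solve" idea can be illustrated with a minimal sketch: draw random subsets of defender and attacker policies, solve the reduced zero-sum matrix game, and check the resulting defender strategy against fresh, unseen attacker policies. This is not the authors' algorithm. The function `cost` is a hypothetical stand-in for a game simulator, policies are represented by integer IDs, and the reduced game is solved approximately by fictitious play (Brown-Robinson) instead of linear programming. The fraction of unseen attackers exceeding the computed security level is only an empirical check, not the formal probabilistic guarantee.

```python
import random


def fictitious_play(A, iters=5000):
    """Approximate mixed strategies of a zero-sum matrix game by fictitious play.

    The row player (defender) minimizes A[i][j]; the column player (attacker)
    maximizes it. Returns the empirical frequencies of each player's plays.
    """
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_cum = [0.0] * m  # row_cum[r]: total cost of row r against the attacker's history
    col_cum = [0.0] * n  # col_cum[c]: total cost of column c against the defender's history
    i, j = 0, 0
    for _ in range(iters):
        row_counts[i] += 1
        col_counts[j] += 1
        for r in range(m):
            row_cum[r] += A[r][j]
        for c in range(n):
            col_cum[c] += A[i][c]
        # Each player best-responds to the opponent's empirical play so far.
        i = min(range(m), key=lambda r: row_cum[r])
        j = max(range(n), key=lambda c: col_cum[c])
    return [k / iters for k in row_counts], [k / iters for k in col_counts]


def cost(defender_policy, attacker_policy):
    """Hypothetical game outcome in [0, 1): a deterministic pseudo-random stand-in."""
    return random.Random(defender_policy * 1_000_003 + attacker_policy).random()


def sampled_security_level(n_def, n_att, n_check, seed=0):
    """Solve a game restricted to sampled policies, then test on unseen attackers."""
    rng = random.Random(seed)
    defenders = [rng.randrange(10**9) for _ in range(n_def)]
    attackers = [rng.randrange(10**9) for _ in range(n_att)]
    A = [[cost(d, a) for a in attackers] for d in defenders]
    x, _ = fictitious_play(A)

    def expected_cost(a):
        # Expected cost of the defender's mixed strategy x against attacker policy a.
        return sum(p * cost(d, a) for p, d in zip(x, defenders))

    level = max(expected_cost(a) for a in attackers)  # security level on the sample
    fresh = [rng.randrange(10**9) for _ in range(n_check)]
    violations = sum(expected_cost(a) > level for a in fresh)
    return x, level, violations / n_check


x, level, frac = sampled_security_level(n_def=10, n_att=50, n_check=1000)
print(f"security level {level:.3f}; unseen attackers above it: {frac:.3f}")
```

Increasing the number of sampled attacker policies typically shrinks the fraction of unseen attackers that beat the computed level, which is the intuition behind guarantees of this kind.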
Application to iCTF 2010. We were able to: provide cyber-security estimates of mission success (the Litya office receives an average of 314 units for completion of all 4 missions); take into account the effect of attacks and countermeasures; make the response a function of attacker sophistication; play what-if scenarios (vulnerabilities, information, etc.).
[Figure: sets of services attacked (among services S0 to S9) as the level of attacker sophistication increases.]
Number of units received by Litya for 1 round of missions:
Vulnerable services | Option I, no bribes | Option I, with bribes | Option II, with bribes
None (baseline) | 314 | 314 | 314
S2 (vulnerable to 38 teams) | 240 | 240 | 138
S2, S6, S9 (vulnerable to at least 6 teams) | 79 | 79 | 43
S0, S2, S4, S6, S7, S8, S9 (vulnerable to at least 1 team) | 11 | -738 | -1327
All services | 11 | -848 | -1917
Outline: observability of dynamical systems under attacks to sensors. Collaborators: B. Sinopoli (CMU), Y. Mo (Caltech).
Detection in Adversarial Environments. How can we interpret and assess the reliability of sensors that may have been manipulated? Sensors relevant to cyber missions: measurement sensors (e.g., SCADA systems); computational sensors (e.g., weather-forecasting simulation engines); data-retrieval sensors (e.g., database queries); cyber-security sensors (e.g., IDSs). Domains: deterministic sensors: with n sensors, one gets the correct answer as long as m < n/2 sensors have been manipulated; stochastic sensors without manipulation: the solution is given by hypothesis testing/estimation; stochastic sensors with potential manipulation: an open problem?
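For deterministic sensors, the m < n/2 condition is just majority voting: every unmanipulated sensor reports the truth, so the truth wins whenever the manipulated sensors are a strict minority. A minimal sketch, assuming binary readings; the helper names are hypothetical, and lying readings are listed first so that ties are resolved in the attacker's favor (the worst case).

```python
from collections import Counter


def majority(readings):
    """Most common reading; ties go to the value encountered first."""
    return Counter(readings).most_common(1)[0][0]


def worst_case_correct(n, m, truth=1):
    """True if majority voting recovers `truth` when m of n deterministic sensors lie.

    Lying readings are listed first, so any tie is resolved in the attacker's favor.
    """
    readings = [1 - truth] * m + [truth] * (n - m)
    return majority(readings) == truth


for n in range(1, 8):
    tolerated = max(m for m in range(n + 1) if worst_case_correct(n, m))
    print(f"n={n}: up to {tolerated} manipulated sensors tolerated")
```

The largest tolerated m is (n - 1) // 2, i.e., exactly the m < n/2 bound.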
Problem formulation. X: binary random variable to be estimated (for simplicity; the paper treats the general case). Y_1, Y_2, ..., Y_n: noisy measurements of X produced by n sensors, each with per-sensor error probability p_err (not necessarily very small). Z_1, Z_2, ..., Z_n: measurements actually reported by the n sensors, of which at most m are attacked. p_attack: probability that we are under attack (very hard to know!). The interpretation of sensor data should therefore be mostly independent of p_attack.
Result for a small number of sensors (n < 2/p_err). Setting: X, Y_1, ..., Y_n, and Z_1, ..., Z_n as above, with at most m sensors attacked and an unknown probability p_attack of being under attack. Theorem: the optimal estimator takes one of two forms: go with the majority of the (potentially manipulated) sensor readings, or go with the majority EXCEPT if there is consensus. The optimal estimator is largely independent of p_attack (which is hard to know).
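The stochastic setting can be explored numerically. This sketch simulates the model of the problem formulation under our own assumptions: X is a fair coin, each Y_i is flipped independently with probability p_err, and, with probability p_attack, a hypothetical worst-case attacker overwrites m reports Z_i with the wrong value. It evaluates only the plain majority rule; the consensus exception in the theorem depends on conditions not reproduced here, so it is not implemented.

```python
import random


def simulate_majority(n, m, p_err, p_attack, trials=20000, seed=1):
    """Monte Carlo error rate of majority voting on possibly attacked binary sensors.

    Assumed model: X ~ Bernoulli(1/2); Y_i = X flipped w.p. p_err (independently);
    with probability p_attack the attacker overwrites m reports Z_i with 1 - X.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x = rng.randint(0, 1)
        z = [x ^ (rng.random() < p_err) for _ in range(n)]  # noisy readings Y_i
        if rng.random() < p_attack:
            for i in rng.sample(range(n), m):
                z[i] = 1 - x  # manipulated reports
        # With odd n the majority vote never ties.
        estimate = int(sum(z) > n / 2)
        errors += estimate != x
    return errors / trials


for p_attack in (0.0, 0.1, 0.5):
    print(p_attack, simulate_majority(n=5, m=2, p_err=0.1, p_attack=p_attack))
```

The error rate of plain majority voting grows with p_attack, which is why an estimator that does not rely on knowing p_attack is attractive.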
This year's work: can we extend this to the estimation of time-varying variables, e.g., the state of a mission?
Estimation in Adversarial Environments. How can we interpret and assess the reliability of sensors that may have been manipulated? Sensors relevant to cyber missions: measurement sensors (e.g., SCADA systems); computational sensors (e.g., weather-forecasting simulation engines); data-retrieval sensors (e.g., database queries); cyber-security sensors (e.g., IDSs). Previously: X was a constant binary random variable to be estimated. Now: X(t) is a time-varying state variable (e.g., the state of a cyber mission) to be estimated based on (1) sensor measurements that may have been manipulated and (2) the system dynamics.
Problem formulation. The system's state evolves dynamically, driven by control signals. N sensors produce measurements, at most M of which can be manipulated by the attackers, so the N measurements reported by the sensors may differ from those produced. The dynamics can also be formulated as a discrete-event system using the Ramadge-Wonham supervisory control framework. Question: under what conditions can one reconstruct the state from (potentially corrupted) sensor measurements?
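One way to write this setup, assuming a linear time-invariant continuous-time model (an assumption made here for illustration; the slide also allows discrete-event dynamics), is:

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad
y(t) = C\,x(t) \in \mathbb{R}^{N}, \qquad
z(t) = y(t) + e(t), \qquad
\bigl|\{\, i : e_i \not\equiv 0 \,\}\bigr| \le M .
```

Here x is the state, u the control signals, y the N measurements produced by the sensors, z the N measurements reported, and e an attack signal that is nonzero on at most M sensors. The symbols A, B, C, and e are notation introduced for this illustration.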
Problem formulation: the system's state, driven by control signals, is measured by N sensors, of which at most M can be manipulated by the attackers. Theorem: exact state reconstruction is possible if and only if the system is observable through every subset of N - 2M measurements, that is, the state could be reconstructed from only N - 2M measurements in the absence of attacks. Intuition: a potential attack on M sensors effectively disables 2M sensors, because two different states could each be made consistent with the reported data by corrupting a different set of M sensors, and only the remaining N - 2M sensors can tell them apart.
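For a linear model, the condition in the theorem can be checked mechanically. This sketch assumes an LTI system given by matrices (A, C) with one scalar measurement per sensor (one row of C each), and applies the standard Kalman rank test to every subset of N - 2M sensors using exact rational arithmetic. The function names are hypothetical, and the paper's precise notion of M-attack observability may include details not captured here.

```python
from fractions import Fraction
from itertools import combinations


def mat_mul(P, Q):
    """Product of two matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    cols = len(R[0]) if R else 0
    r_count = 0
    for c in range(cols):
        pivot = next((r for r in range(r_count, len(R)) if R[r][c] != 0), None)
        if pivot is None:
            continue
        R[r_count], R[pivot] = R[pivot], R[r_count]
        for r in range(len(R)):
            if r != r_count and R[r][c] != 0:
                f = R[r][c] / R[r_count][c]
                R[r] = [a - f * b for a, b in zip(R[r], R[r_count])]
        r_count += 1
    return r_count


def observable(A, C_rows):
    """Kalman rank test: rank [C; CA; ...; CA^(n-1)] == n for the selected sensors."""
    n = len(A)
    blocks, CA = [], [list(row) for row in C_rows]
    for _ in range(n):
        blocks.extend(CA)
        CA = mat_mul(CA, A)
    return rank(blocks) == n


def m_attack_observable(A, C, M):
    """Condition from the theorem: observable through every subset of N - 2M sensors."""
    k = len(C) - 2 * M
    if k < 0:
        return False
    return all(observable(A, [C[i] for i in S]) for S in combinations(range(len(C)), k))


A = [[1, 1], [0, 1]]  # discrete-time double integrator (position, velocity)
print(m_attack_observable(A, [[1, 0]] * 3, M=1))               # True: any position sensor suffices
print(m_attack_observable(A, [[1, 0], [0, 1], [0, 1]], M=1))   # False: a velocity sensor alone does not
```

The check enumerates subsets explicitly, so it is only practical for small N; it is meant to make the condition concrete, not to be efficient.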
Estimation algorithms. Gramian-based estimator: batch, finite-time estimation; inverts the observability matrix at each time step. Observer-based estimator: asymptotic estimation; recursive, low-computation algorithm; provably robust with respect to noise on all sensors (including non-attacked ones). Algorithm outline: 1. Build an estimate by ignoring a set S of M sensors. 2. Build additional estimates by removing, in addition, all combinations of M additional sensors. 3. If all attacked sensors were in set S, then the estimates in steps 1 and 2 will be consistent (modulo noise). (All estimates can be constructed without combinatorial complexity, by exploiting finite dimensionality.)
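The algorithm outline can be sketched for the noise-free, discrete-time batch case. Assumptions (ours, for illustration): x(k+1) = A x(k); sensor i reports C_i A^k x(0) plus an attack term; and x(0) is estimated by least squares over T samples, i.e., by solving normal equations whose matrix O^T O is the finite-window observability Gramian. Unlike the estimators on the slide, this sketch enumerates sensor subsets directly (so it is combinatorial) and compares estimates exactly, which is only valid without noise. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve(Mat, b):
    """Solve a square, nonsingular linear system exactly (Gauss-Jordan)."""
    n = len(Mat)
    R = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(Mat, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if R[r][c] != 0)
        R[c], R[p] = R[p], R[c]
        piv = R[c][c]
        R[c] = [v / piv for v in R[c]]
        for r in range(n):
            if r != c and R[r][c] != 0:
                f = R[r][c]
                R[r] = [a - f * bc for a, bc in zip(R[r], R[c])]
    return [R[r][n] for r in range(n)]


def estimate_x0(A, C, Z, sensors, T):
    """Least-squares estimate of x(0) from samples 0..T-1 of the given sensors."""
    n = len(A)
    rows, meas = [], []
    Ak = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    for k in range(T):
        for i in sensors:
            rows.append([sum(C[i][l] * Ak[l][j] for l in range(n)) for j in range(n)])  # C_i A^k
            meas.append(Z[k][i])
        Ak = [[sum(Ak[r][l] * A[l][j] for l in range(n)) for j in range(n)] for r in range(n)]
    # Normal equations (O^T O) x0 = O^T z, where O^T O is the window's observability Gramian.
    OtO = [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]
    Otz = [sum(r[a] * z for r, z in zip(rows, meas)) for a in range(n)]
    return solve(OtO, Otz)


def secure_estimate(A, C, Z, M, T):
    """Find a set S of M sensors whose removal yields mutually consistent estimates."""
    N = len(C)
    for S in combinations(range(N), M):
        rest = [i for i in range(N) if i not in S]
        x_hat = estimate_x0(A, C, Z, rest, T)  # step 1
        # Step 2: drop M more sensors in every possible way; step 3: check consistency.
        if all(estimate_x0(A, C, Z, [i for i in rest if i not in extra], T) == x_hat
               for extra in combinations(rest, M)):
            return S, x_hat
    return None


def simulate(A, C, x0, T, attacked, bias):
    """Reported measurements Z[k][i]; attacked sensors add a constant bias."""
    n, Z, x = len(A), [], list(x0)
    for _ in range(T):
        Z.append([sum(C[i][j] * x[j] for j in range(n)) + (bias if i in attacked else 0)
                  for i in range(len(C))])
        x = [sum(A[r][j] * x[j] for j in range(n)) for r in range(n)]
    return Z


A = [[1, 1], [0, 1]]
C = [[1, 0]] * 5  # five position sensors, N = 5
Z = simulate(A, C, x0=[2, -1], T=2, attacked={1, 3}, bias=7)
print(secure_estimate(A, C, Z, M=2, T=2))  # flags sensors (1, 3) and recovers x(0) = [2, -1]
```

With N = 5 and M = 2, every single sensor keeps the system observable, which is exactly the N - 2M condition that makes each least-squares solve well posed.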
Outline: multi-agent learning under cyber-attack using Q-learning.
Resilient Cyber-Mission Architectures. In complex cyber missions, human operators define policies and rules, and computing elements automate processes of distributed resource allocation, scheduling, inventory management, etc.: self-configuration (automatic configuration of components); self-healing (automatic discovery and correction of faults); self-optimization (automatic allocation of resources for optimal operation). What is the impact of attacks on this type of automated/optimization process? Can we devise algorithms with built-in attack prediction/awareness capabilities?
Focus: Distributed Consensus/Agreement. Classical problem in distributed computing: a group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value). The decision is made iteratively and in a distributed fashion, using peer-to-peer communication. 2nd-order adjustment rule: the value x_i at processor i, iteration k, is changed by an adjustment on x_i made by processor i at iteration k; this adjustment is itself driven by the correct update u_i and by an update v_i injected by the attacker. Goal: minimize the errors between the values of agents and their neighbors (the peers of agent i, self-included). Attacker: maximize these errors using stealth attacks (small v_i).
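The 2nd-order adjustment rule can be explored with a small simulation. The exact rule is not reproduced on the slide, so this sketch assumes a double-integrator form in which p_i is the adjustment on x_i, driven by a correct update u_i and an attacker update v_i, with step size h. The feedback u_i is a standard hand-tuned second-order consensus law (disagreement in values plus c times disagreement in adjustments), not the game-theoretic policy u_i*; the ring graph, gains, and uniform attack noise are hypothetical.

```python
import random


def simulate_consensus(neighbors, x0, steps=500, h=0.1, c=1.5, attack=0.0, seed=0):
    """Second-order (double-integrator) consensus with an additive attacker input.

    Assumed update (illustrative, not necessarily the slide's exact rule):
        x_i(k+1) = x_i(k) + h * p_i(k)             # value
        p_i(k+1) = p_i(k) + h * (u_i(k) + v_i(k))  # adjustment
    u_i: local consensus feedback from peers; v_i: attacker input in [-attack, attack].
    """
    rng = random.Random(seed)
    x, p = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        u = [-sum((x[i] - x[j]) + c * (p[i] - p[j]) for j in neighbors[i])
             for i in range(len(x))]
        v = [rng.uniform(-attack, attack) for _ in x]
        x = [xi + h * pi for xi, pi in zip(x, p)]
        p = [pi + h * (ui + vi) for pi, ui, vi in zip(p, u, v)]
    return max(x) - min(x)  # disagreement (spread) among the agents' values


# Ring of 4 agents; peers are self-included, as on the slide (the self term adds 0).
ring = [[0, 1, 3], [1, 0, 2], [2, 1, 3], [3, 2, 0]]
print("spread without attack:", simulate_consensus(ring, [0.0, 1.0, 2.0, 3.0]))
print("spread under attack:  ", simulate_consensus(ring, [0.0, 1.0, 2.0, 3.0], attack=1.0))
```

Without attack the spread decays to essentially zero; a small persistent attacker input keeps the agents from agreeing, which motivates designing u_i with the attacker explicitly in mind.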
Focus: Distributed Consensus/Agreement (Nash formulation). With the same 2nd-order adjustment rule, the game cost is minimized by us and maximized by the attacker, and has three terms: the error between neighbors (min. by us, max. by attacker); our updates, weighted so that small means smooth (min. by us, max. by attacker); and the attacker updates, weighted so that small means stealth (max. by us, min. by attacker).
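A quadratic stage cost is one common way to encode these three terms. In this sketch the weights are our assumption (unit weights on the error and on u_i, and gamma^2 on v_i, as in H-infinity-style games); the function name and the value of `gamma` are hypothetical.

```python
def stage_cost(i, x, u, v, neighbors, gamma=2.0):
    """Hypothetical zero-sum stage cost for agent i (min. by us, max. by the attacker).

    - consensus error: squared differences to peers (self-included, which adds 0)
    - our effort u_i^2: small means smooth
    - attacker effort, entering as -gamma^2 * v_i^2: small means stealth
    """
    error = sum((x[i] - x[j]) ** 2 for j in neighbors[i])
    return error + u[i] ** 2 - gamma ** 2 * v[i] ** 2


path = [[0, 1], [1, 0, 2], [2, 1]]  # 3 agents on a line, peers self-included
print(stage_cost(1, x=[0.0, 1.0, 3.0], u=[0.0, 0.5, 0.0], v=[0.0, 0.1, 0.0], neighbors=path))
```

The negative sign on the attacker term is what makes large, conspicuous attacks costly for the attacker even though it maximizes the overall cost.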
Optimal Solution. The Bellman equation yields the optimal control and attacker policies u_i* and v_i* (which involve the number of peers of agent i). Under appropriate regularity assumptions (smoothness), u_i* is optimal (minimal) for us and v_i* is optimal (maximal) for the attacker. Moreover, consensus will be reached asymptotically, and all variables will remain bounded through the transient (in fact, Lyapunov stability holds). The theoretical results were derived for a continuous-time approximation of the algorithms, which is more suitable for asymptotic analysis.
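For reference, the generic continuous-time form of a zero-sum Bellman equation (the Hamilton-Jacobi-Isaacs equation) is shown below. It assumes dynamics \dot x = f(x,u,v) and a running cost r(x,u,v) on an infinite horizon; it is not the slide's agent-specific equation. For control-affine dynamics f = F(x) + G(x)u + K(x)v and cost r = q(x) + u^T R u - \gamma^2 v^T v (notation introduced here), the saddle-point policies follow by setting the gradients of the bracket with respect to u and v to zero:

```latex
0 = \min_{u}\,\max_{v}\,\Bigl[\, r(x,u,v) + \nabla V^{*}(x)^{\top} f(x,u,v) \,\Bigr],
\qquad
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} G(x)^{\top} \nabla V^{*}(x),
\qquad
v^{*}(x) = \tfrac{1}{2\gamma^{2}}\, K(x)^{\top} \nabla V^{*}(x).
```

Solving this partial differential equation for V* is what suffers from the curse of dimensionality.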
But the Bellman equation is difficult to solve (curse of dimensionality). Last year: a machine-learning-based approach to solve this distributed consensus problem, with three limitations: it was restricted to second-order updates (double integrator); it required global knowledge of the communication graph; and it required global knowledge of the update rules used by each agent. This year's work overcomes these 3 limitations.
Focus: Distributed Consensus/Agreement (general update rule). A group of computing elements must agree on a common scalar value x, iteratively and in a distributed fashion, using peer-to-peer communication. General update rule: the value at processor i at iteration k+1 results from a correct update and a malicious update by the attacker. Goal: minimize the errors between the values of agents and their neighbors (the peers of agent i, self-included). Attacker: maximize these errors using stealth attacks (small v_i).
Nash equilibrium formulation for the general update rule: the cost has three terms: the error between neighbors (min. by us, max. by attacker); our updates (small means smooth; min. by us, max. by attacker); and the attacker updates (small means stealth; max. by us, min. by attacker).
Challenges for the general update rule: each agent does not necessarily know the update algorithm (A_i, b_i, d_i) used by the other agents, and each agent does not necessarily know the global graph (just its set of neighbors N_i).
Q-learning. We use Q-learning (Watkins, 1989), a popular method from machine learning. [Diagram: a generator proposes actions from the states or situations of each agent; a tester returns evaluations.] Q-learning is a model-free machine learning method that learns an action-utility function, or Q-function, giving the expected utility of taking a given action in a given state. The learned Q-function directly approximates the optimal action-value function, independent of the policy being followed. Watkins' algorithm motivates our approach.
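The off-policy property stated above can be seen in standard tabular Q-learning (Watkins' update). In this sketch the environment is a hypothetical five-state chain with reward 1 for reaching the last state, and the behavior policy is uniformly random; the greedy policy extracted from the learned Q-table is nevertheless optimal. The learning rate, discount, and episode count are our choices.

```python
import random

N_STATES, GOAL = 5, 4  # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)       # 0 = move left, 1 = move right


def step(s, a):
    """Deterministic chain dynamics; reward 1 only on reaching the goal."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)  # behavior policy: uniformly random
            s_next, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])  # Watkins' update
            s = s_next
    return Q


Q = q_learning()
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print("greedy policy (1 = right):", greedy)
```

The table grows with the number of states and actions, which is the complexity problem that motivates replacing it with a function approximator.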
Q-function. Instead of the Q-tables used by Watkins (1989), which suffer from complexity problems, we use appropriate neural networks, although encoding the states and actions (the Q-function) properly is challenging. [Equations: the Q-function and the optimal Q-function, expressed with the unique symmetric positive definite matrices that solve the game.] Each Q-function is quadratic, and the matrix of the quadratic form is unknown and must be found.
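Because each Q-function is quadratic, Q(x, u, v) = z^T H z with z = (x, u, v), the equilibrium policies follow from setting the partial derivatives with respect to u and v to zero, so learning H is enough to recover both policies. This sketch treats the scalar case; the numerical H is hypothetical, and we assume H_uu > 0 and H_vv < 0 so that the stationary point is a saddle (minimum over u, maximum over v).

```python
def saddle_gains(H):
    """Gains (k_u, k_v) with u* = k_u * x and v* = k_v * x for Q = z^T H z, z = (x, u, v).

    H: symmetric 3x3 nested list ordered (x, u, v). Assumes H[1][1] > 0 (convex in u)
    and H[2][2] < 0 (concave in v), so the stationary point is a saddle.
    """
    h_xu, h_xv = H[0][1], H[0][2]
    h_uu, h_uv, h_vv = H[1][1], H[1][2], H[2][2]
    det = h_uu * h_vv - h_uv ** 2  # negative, hence nonzero, under the assumptions above
    # Solve [h_uu h_uv; h_uv h_vv] [u; v] = -[h_xu; h_xv] x by Cramer's rule.
    k_u = -(h_vv * h_xu - h_uv * h_xv) / det
    k_v = -(h_uu * h_xv - h_uv * h_xu) / det
    return k_u, k_v


def q_value(H, x, u, v):
    """Evaluate the quadratic Q-function z^T H z."""
    z = (x, u, v)
    return sum(z[i] * H[i][j] * z[j] for i in range(3) for j in range(3))


H = [[3.0, 0.5, 0.2], [0.5, 2.0, 0.1], [0.2, 0.1, -1.0]]
k_u, k_v = saddle_gains(H)
print(f"u* = {k_u:.4f} x,  v* = {k_v:.4f} x")
```

Moving u away from u* increases Q and moving v away from v* decreases it, which is the saddle-point (Nash) property the game requires.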
Actor/Critic Learning Approach. Advantages: explicit representation of policy and value function; minimal computation to select actions; can learn an explicit policy; can put constraints on policies; appealing as psychological and neural models. Critic: a model-free (distributed) algorithm that evaluates the current algorithm and estimates attacker actions. Actor: a model-free (distributed) algorithm that enacts optimal decisions (based on the critic's findings). Both actor and critic are based on approximate dynamic programming, with neural-network approximations of the (action-dependent) Q-function and of the optimal control laws.
Learning Structure. Critic: neural networks approximate the costs, with unknown weights to be learned and basis sets built from local state and control information. Actor: neural networks approximate the control and adversarial inputs. Learning uses Bellman equations in integral form, evaluated over a sampling interval (compare to Watkins' Q-function).
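The critic step can be illustrated with a discrete-time analogue. Instead of the integral-form Bellman equation over a sampling interval, this sketch fits weights W of a quadratic basis phi so that W . phi(z_k) - disc * W . phi(z'_{k+1}) = r_k holds in least squares for a fixed pair of linear policies (an LSTD-Q-style policy evaluation, swapped in for illustration). The scalar system, policy gains, cost weights, and discount factor are all hypothetical; exploration noise on u and v plays the role of the persistent-excitation condition.

```python
import random


def solve_linear(Mat, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    R = [row[:] + [bi] for row, bi in zip(Mat, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(R[r][c]))
        R[c], R[p] = R[p], R[c]
        for r in range(c + 1, n):
            f = R[r][c] / R[c][c]
            R[r] = [a - f * bc for a, bc in zip(R[r], R[c])]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (R[r][n] - sum(R[r][j] * w[j] for j in range(r + 1, n))) / R[r][r]
    return w


def phi(x, u, v):
    """Quadratic basis for a scalar state and scalar inputs."""
    return [x * x, u * u, v * v, x * u, x * v, u * v]


def evaluate_critic(a=0.9, b=1.0, d=0.5, k_u=0.4, k_v=0.1, q=1.0, rho_u=1.0, rho_v=4.0,
                    disc=0.9, samples=200, seed=0):
    """Fit Q-function weights for policies u = -k_u x, v = k_v x on x' = a x + b u + d v."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(samples):
        x = rng.uniform(-1, 1)
        u = -k_u * x + rng.uniform(-0.5, 0.5)  # exploration noise (persistent excitation)
        v = k_v * x + rng.uniform(-0.5, 0.5)
        r = q * x * x + rho_u * u * u - rho_v * v * v  # zero-sum stage cost
        x_next = a * x + b * u + d * v
        f_now, f_next = phi(x, u, v), phi(x_next, -k_u * x_next, k_v * x_next)
        rows.append([fn - disc * fx for fn, fx in zip(f_now, f_next)])
        targets.append(r)
    # Least squares via the normal equations (Phi^T Phi) W = Phi^T r.
    n = len(rows[0])
    AtA = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(AtA, Atb)


print("critic weights for [x^2, u^2, v^2, xu, xv, uv]:", evaluate_critic())
```

In this linear-quadratic setting the quadratic basis represents the policy's Q-function exactly, so the fitted weights match it up to rounding; with richer dynamics a neural-network basis would only approximate it.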
Learning under attacks: key result. Theorem: assume that the signals are persistently exciting and that the graph is strongly connected. Then the equilibrium points of the closed-loop signals are asymptotically stable, and the policies converge to a Nash equilibrium. [Simulation: unknown second-order dynamics and unknown graph structure, cost weights, and leader information, under the proposed algorithm.] Synchronization of all the agents is achieved even under attacks and with unknown network and agent information.
Outline: integration of online optimization for real-time attack prediction and visualization.
Cyber Missions Complexity. Challenges to real-time cyber-mission protection: cyber assets are shared among missions; cyber-asset requirements change over time; missions can use different configurations of resources; there is a complex network of cyber-asset dependencies. [Figure: dependency paths for Mission 1 (services S0, S1, S5, S6, S7, S9) and Mission 2 (services S0, S2, S8).] An attack on service S0 can result in the failure of multiple missions, but the damage is realized only if the missions follow particular paths. Cyber-awareness questions: When and where is an attacker most likely to strike? When and where is an attacker most damaging to mission completion? How will the answers depend on attacker resources, skills, and knowledge? (real-time what-if analysis)
Mapping Service Attacks to Mission Damage. A mission requires multiple services, and its reliance on each service varies with time. Damage equation (for service s at time t): potential damage as a function of attack resources. Uncertainty equation (for service s at time t): probability of realizing the damage as a function of attack resources. The equation parameters vary with time as the mission progresses (learned from data in iCTF exercises). Optimal attacks maximize the total damage to the mission, constrained by the total attack resources at time t. In this period, we developed an optimization engine that can address thousands of variables/constraints in a few milliseconds.
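The constrained maximization can be sketched under a hypothetical model, since the slide's damage and uncertainty equations are not reproduced. We assume the expected damage from service s is its potential damage D_s times a probability of realizing it, 1 - exp(-r / c_s), which grows with the attack resources r. Each term is then concave in r, and for separable concave objectives, greedily assigning small resource increments to the service with the largest marginal gain yields an optimal allocation of the discretized budget. The service parameters are made up, and this is not the engine described on the slide.

```python
import heapq
import math


def expected_damage(D, c, r):
    """Hypothetical model: potential damage D times probability 1 - exp(-r / c)."""
    return D * (1.0 - math.exp(-r / c))


def allocate(services, budget, step=0.1):
    """Greedily split `budget` attack resources across services in increments of `step`.

    services: dict mapping a service name to hypothetical time-t parameters (D, c).
    """
    alloc = {s: 0.0 for s in services}

    def gain(s):
        # Marginal expected damage of one more increment on service s.
        D, c = services[s]
        return expected_damage(D, c, alloc[s] + step) - expected_damage(D, c, alloc[s])

    heap = [(-gain(s), s) for s in services]  # max-heap via negated gains
    heapq.heapify(heap)
    for _ in range(int(round(budget / step))):
        _, s = heapq.heappop(heap)
        alloc[s] += step
        heapq.heappush(heap, (-gain(s), s))  # only s's marginal gain changed
    total = sum(expected_damage(*services[s], alloc[s]) for s in services)
    return alloc, total


services = {"S0": (10.0, 1.0), "S2": (4.0, 1.0), "S8": (1.0, 1.0)}
alloc, total = allocate(services, budget=3.0)
print({s: round(r, 2) for s, r in alloc.items()}, round(total, 3))
```

Each increment costs one heap operation, which hints at why allocations of this kind can be recomputed quickly as mission parameters change over time.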
Multi-Resolution Visualization. Multi-resolution attack analysis: 1. high-level attack predictions based on online optimization; 2. potential damage and uncertainty associated with attacks on different services; 3. parameters that determine damage and uncertainty. AlloSphere integration. High-level predictions permit fast action; low-level parameters permit investigating the rationale for predictions.
Summary of Accomplishments (Y5). Observability under attacks: a new notion of observability for systems under attacks; a necessary and sufficient condition for a dynamical system to be M-attack observable; two estimation algorithms: a Gramian-based estimator (finite-time convergence) and an observer-based estimator (asymptotic convergence). Shielding complex networks from adversarial attacks using Q-learning: developed a model-free, dynamic-programming-based solution to obtain resilient algorithms with attack estimates; used an online Q-learning approach to overcome complexity issues in complex networks; no need for physical models or for the network interactions; applications to critical infrastructures, e.g., power systems. Online optimization for real-time attack prediction: developed numerical algorithms for fast (real-time) attack prediction; integrated them with the visualization tools developed (more in Tobias Höllerer's presentation). Focus for future work: develop new self-configuring, event-triggered algorithms for decision and control in complex networks in the presence of jamming network attacks; transition to practice, including mixed human, manned, and unmanned teams and large networks of off-grid power systems.
Publications
Published last year or in press:
K. G. Vamvoudakis, J. P. Hespanha, B. Sinopoli, Y. Mo. Detection in Adversarial Environments. IEEE Transactions on Automatic Control, Special Issue on the Control of Cyber-Physical Systems, February 2014. To appear.
K. G. Vamvoudakis, M. F. Miranda, J. P. Hespanha. Asymptotically-Stable Optimal Adaptive Control Algorithm with Saturating Actuators and Relaxed Persistence of Excitation. IEEE Transactions on Neural Networks and Learning Systems, July 2014. Conditionally accepted.
K. Hirata, J. P. Hespanha, K. Uchida. Real-time Pricing Leading to Optimal Operation under Distributed Decision Makings. In Proc. of the 2014 American Control Conference, June 2014.
D. Silvestre, P. Rosa, J. P. Hespanha, C. Silvestre. Finite-time Average Consensus in a Byzantine Environment Using Set-Valued Observers. In Proc. of the 2014 American Control Conference, June 2014.
K. G. Vamvoudakis, J. P. Hespanha. Online Optimal Switching of Single Phase DC/AC Inverters Using Partial Information. In Proc. of the 2014 American Control Conference, pp. 2624-2630, Portland, OR, 2014.
Submitted:
K. G. Vamvoudakis, J. P. Hespanha. Game-Theory based Consensus Learning of Double-Integrator Agents in the Presence of Attackers. September 2014. Submitted for journal publication.
K. G. Vamvoudakis, P. J. Antsaklis, W. E. Dixon, J. P. Hespanha, F. L. Lewis, H. Modares, B. Kiumarsi. Autonomy and Machine Intelligence in Complex Systems: A Tutorial. Submitted to the 2015 American Control Conference, Chicago, IL (invited paper).
M. Chong, M. Wakaiki, J. P. Hespanha. Observability of Linear Systems under Attacks. September 2014. Submitted to the 2015 American Control Conference.
Working papers:
K. G. Vamvoudakis, J. P. Hespanha. Cooperative Q-learning for Rejection of Persistent Adversarial Inputs in Complex Networks. In preparation for the Journal of Machine Learning Research, 2014.