Cyber Epidemic Models with Dependences




Maochao Xu (1), Gaofeng Da (2) and Shouhuai Xu (3)
(1) Department of Mathematics, Illinois State University, mxu2@ilstu.edu
(2) Institute for Cyber Security, University of Texas at San Antonio, dagfvc@gmail.com
(3) Department of Computer Science, University of Texas at San Antonio, sxu@cs.utsa.edu (corresponding author)

Abstract

Studying models of cyber epidemics over arbitrary complex networks can deepen our understanding of cyber security from a whole-system perspective. In this paper, we initiate the investigation of cyber epidemic models that accommodate the dependences between the cyber attack events. Due to the notorious difficulty in dealing with such dependences, essentially all existing cyber epidemic models have assumed them away. Specifically, we introduce the idea of Copulas into cyber epidemic models for accommodating the dependences between the cyber attack events. We investigate the epidemic equilibrium thresholds as well as the bounds for both equilibrium and non-equilibrium infection probabilities. We further characterize the side-effects of assuming away the due dependences between the cyber attack events, by showing that the results thereof are unnecessarily restrictive or even incorrect.

Keywords: Copula, Cyber epidemics, dependence, epidemic threshold, spectral radius, infection probability

1 Introduction

Cyberspace (or the Internet) is perhaps the most complex man-made system. While cyberspace has become an indispensable part of society, the economy and national security, cyber attacks have also become an increasingly devastating problem. Despite the studies and progress of the past decades, our understanding of cyber security from a whole-system perspective, rather than from a component or building-block perspective, is still in its infancy. This is caused by many factors, including the dearth of powerful mathematical models that can capture and reason about the interactions between the cyber attacks and the cyber defenses.
Recently, researchers have started pursuing the cyber-security value of mathematical models that resemble biological epidemic models. While conceptually attractive, biological epidemic models cannot be directly used to describe cyber security because there are many cyber-specific issues. One particular issue, whose study we initiate in the present paper, is the dependences between the cyber attack events. To the best of our knowledge, these dependences have been explicitly assumed away in essentially all existing cyber epidemic models, perhaps because they are notoriously difficult to cope with. Indeed, accommodating the dependences introduces yet another dimension of difficulty to cyber epidemic models that incorporate arbitrary complex network structures. However, the dependences are inherent because, for example, the events that computers get infected are not independent of each other, and a malware may first infect some computers because the users visit some malicious websites and then spread over the network. Moreover, cyber attacks may be well coordinated by intelligent malwares, and the coordination causes positive dependences between the attack events.

1.1 Our contributions

In this paper, we initiate the systematic study of a new sub-field in cyber epidemic models, namely understanding and characterizing the importance of the dependences between the attack events in cyber epidemic models that accommodate arbitrary complex network structures. This is demonstrated through a non-trivial generalization of the

powerful push- and pull-based cyber epidemic model that was recently investigated in [19]. Specifically, we capture the dependences between the cyber attack events by incorporating the idea of Copulas into cyber epidemic models. To the best of our knowledge, this is the first systematic study of cyber epidemic models that accommodate dependences, rather than assuming them away. We make two contributions.

First, we derive epidemic equilibrium thresholds, namely sufficient conditions under which the epidemic spreading enters a non-negative equilibrium (the spreading never dies out when there are pull-based attacks, meaning that only a positive equilibrium is relevant under this circumstance). Some of the sufficient conditions are less restrictive but require hard-to-obtain information (i.e., these conditions are theoretically more interesting), and the others are more restrictive but require easy-to-obtain information (i.e., these conditions are practically more useful). We also derive bounds for the equilibrium infection probabilities and discuss their tightness. The bounds are easy to obtain/compute, and are useful especially when it is infeasible to obtain the equilibrium infection probabilities numerically (let alone analytically). For example, the upper bounds can be treated as the worst-case scenarios when provisioning defense resources. For Erdős-Rényi (ER) and power-law networks, we further propose to approximate the equilibrium infection probabilities by taking advantage of the bounds. The approximation results are smaller than the upper bounds and would not underestimate the number of infected nodes, meaning that the approximation results can lead to more cost-effective defense. We further present bounds for non-equilibrium infection probabilities, no matter whether the spreading converges to equilibrium or not. All the results are obtained by explicitly accommodating the dependence structures between the cyber attack events.
Second, we characterize the side-effects of assuming away the due dependences on the bounds for equilibrium infection probabilities, on the epidemic equilibrium thresholds, and on the non-equilibrium infection probabilities. We show that assuming away the due dependences can make the results thereof unnecessarily restrictive or even incorrect. We further discuss the cyber security implications of the side-effects.

It is worth mentioning that, as a first step towards ultimately tackling the dependence problem in cyber epidemic models, the Copulas technique used in the present paper is appealing for two reasons. On one hand, it leads to tractable models while being capable of coping with high-dimensional dependence (i.e., dependence between a large vector of random variables). On the other hand, there are families of copula structures that have been extensively investigated in the literature of Applied Probability Theory and Risk Management, and various methods have been developed for estimating the types and parameters of copula structures in practice. Of course, much research remains to be done before we can answer questions such as: What approach is the most appropriate for accommodating dependence in cyber epidemic models, and under what circumstances?

1.2 Related work

Biological epidemic models can be traced back to McKendrick and Kermack [13, 10]. Such homogeneous biological epidemic models were introduced to computer science for characterizing the spreading of computer viruses in [9]. Heterogeneous epidemic models, especially the ones that accommodate arbitrary network structures, were not studied until recently [17, 6, 1]. These studies led to the full-fledged push- and pull-based cyber epidemic model [19], which is the starting point of the present paper. To the best of our knowledge, all existing cyber epidemic models that aim to accommodate arbitrary network structures (including other recent studies such as [16, 11, 20] and the references therein) assumed that the attacks are independent of each other.
This is plausible because accommodating arbitrary network structures in cyber epidemic models already makes the resulting models difficult to analyze, and accommodating dependences introduces, as we show in the present paper, another dimension of difficulty to the models. The only exception is our recent study [18], which is based on a different approach to modeling cyber epidemics [11]. The main contribution of [18] is to get rid of the exponential distribution assumptions for certain random variables. Moreover, the model in [18] can only accommodate the specific Marshall-Olkin dependence structure between the attack events. In contrast, we here accommodate arbitrary dependence structures between the attack events, while investigating the epidemic equilibrium thresholds, the equilibrium and non-equilibrium infection probabilities, and the side-effects of assuming away the due dependences. Many of these issues are not studied in the context of [18] because its focus is different. This explains why the present paper is the first systematic treatment of dependences in

cyber epidemic models. The dependence modeled in the present paper is static (i.e., time-invariant). This study inspired [5], which takes a further step towards modeling dynamic dependence between cyber attacks, but uses a different modeling approach.

The rest of the paper is organized as follows. In Section 2 we briefly review some facts about Copulas. In Section 3 we investigate the generalized cyber epidemic model that accommodates the dependences between the cyber attack events. In Section 4 we characterize the side-effects caused by assuming away the due dependences. In Section 5 we conclude the paper with future research problems.

The following table summarizes the main notations used in the paper.

$G = (V, E)$: the graph/network in which cyber epidemics occur, where $V$ is the node set and $E$ is the edge set
$\deg(v)$: the degree of node $v$ in graph $G = (V, E)$, which can be represented by adjacency matrix $A$
$I_v(t), I_{v,j}(t)$: the state of node $v$ at time $t$: $I_v(t) = 1$ means infected and $0$ means secure; $I_{v,j}(t)$ is the state of the $j$th neighbor of node $v$ ($1$ means infected and $0$ means secure), where $1 \le j \le \deg(v)$
$i_v(t)$: the probability that node $v \in V$ is infected at time $t$
$i_{v,j}(t)$: the conditional probability that the $j$th neighbor of node $v$ is infected at time $t$, given that node $v \in V$ is secure at time $t$
$i_v^-, i_v^+$: lower and upper bounds for the non-equilibrium infection probability $\lim_{t\to\infty} i_v(t)$, where the system does not converge to any equilibrium
$i_v^*, i_{v,j}^*$: the equilibrium infection probabilities that node $v$ and its $j$th neighbor are infected, respectively
$i^{*-}, i_v^{*+}$: $i^{*-}$ is the lower bound for the equilibrium infection probability $i_v^*$ for every $v$; $i_v^{*+}$ is the upper bound for the equilibrium infection probability $i_v^*$ of node $v$
$\mathbf{i}^*, \mathbf{i}_v^*$: $\mathbf{i}^* = (i_1^*, \ldots, i_N^*)$ where $N = |V|$; $\mathbf{i}_v^* = (i_{v,1}^*, \ldots, i_{v,\deg(v)}^*)$
$\alpha$: the probability that a secure node $v \in V$ is infected by pull-based cyber attacks at a time step
$\beta$: the probability that an infected node $v \in V$ becomes secure at a time step
$\gamma$: the probability that an infected node $u$ successfully attacks node $v$ over $(u, v) \in E$ at a time step
$\rho(M)$: spectral radius of matrix $M$
$C_v$: $\deg(v)$-copula describing the dependence between the push-based cyber attacks against node $v$
$C$: 2-copula describing the dependence between the pull-based attacks and the push-based attacks against a node
$\delta_C$: the diagonal section of copula $C$, i.e., $\delta_C(u) = C(u, \ldots, u)$

2 Preliminaries

Copulas can model dependences by relating the individual marginal distributions to their multivariate joint distribution. In this paper we will use the $n$-copulas [8, 15]. Specifically, a function $C : [0, 1]^n \to [0, 1]$ is called an $n$-copula if:

$C(u_1, \ldots, u_n)$ is increasing in each component $u_z$, $z \in \{1, \ldots, n\}$.
$C(u_1, \ldots, u_{z-1}, 0, u_{z+1}, \ldots, u_n) = 0$ for all $u_j \in [0, 1]$, $j = 1, \ldots, n$, $j \neq z$.
$C(1, \ldots, 1, u_z, 1, \ldots, 1) = u_z$ for all $u_z \in [0, 1]$, $z = 1, \ldots, n$.
$C$ is $n$-increasing, i.e., for all $(u_{1,1}, \ldots, u_{1,n})$ and $(u_{2,1}, \ldots, u_{2,n})$ in $[0, 1]^n$ with $u_{1,j} \le u_{2,j}$ for all $j = 1, \ldots, n$, it holds that
$$\sum_{z_1=1}^{2} \cdots \sum_{z_n=1}^{2} (-1)^{\sum_{j=1}^{n} z_j}\, C(u_{z_1,1}, \ldots, u_{z_n,n}) \ge 0.$$

Let $R_1, \ldots, R_n$ be random variables with distribution functions $F_1, \ldots, F_n$, respectively. The joint distribution function is $F(r_1, \ldots, r_n) = P(R_1 \le r_1, \ldots, R_n \le r_n)$. The well-known Sklar's theorem states that there exists an $n$-copula $C$ such that
$$F(r_1, \ldots, r_n) = C(F_1(r_1), \ldots, F_n(r_n)).$$

There are many families of copulas [8, 15]. One example is the Gaussian copula with
$$C(u_1, \ldots, u_n) = \Phi_\Sigma\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\right),$$
where $\Phi^{-1}$ is the inverse cumulative distribution function of the standard normal distribution and $\Phi_\Sigma$ is the joint cumulative distribution function of a multivariate normal distribution with mean vector zero and covariance matrix equal to the correlation matrix $\Sigma$. For simplicity, we assume that the correlation matrix has the form
$$\Sigma = \begin{pmatrix} 1 & \sigma & \cdots & \sigma \\ \sigma & 1 & \cdots & \sigma \\ \vdots & \vdots & \ddots & \vdots \\ \sigma & \sigma & \cdots & 1 \end{pmatrix},$$
where $\sigma$ measures the correlation between two random variables. Therefore, the Gaussian copula can be rewritten as
$$C(u_1, \ldots, u_n) = \Phi_\sigma\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\right).$$

Another example is the Archimedean family with
$$C(u_1, \ldots, u_n) = \phi^{-1}\left(\phi(u_1) + \cdots + \phi(u_n)\right),$$
where the function $\phi$ is called a generator of $C$ and satisfies certain properties (see [14] for details). The Archimedean family contains many well-known copula functions such as the Clayton and Frank copulas [15, 3]. The generator of the Clayton copula is $\phi_\theta(u) = u^{-\theta} - 1$, and we have
$$C(u_1, \ldots, u_n) = \left[\sum_{j=1}^{n} u_j^{-\theta} - n + 1\right]^{-1/\theta}, \quad \theta > 0.$$
The generator of the Frank copula is $\psi_\xi(u) = -\log\frac{e^{-\xi u} - 1}{e^{-\xi} - 1}$, and we have
$$C(u_1, \ldots, u_n) = -\frac{1}{\xi} \log\left\{1 + \frac{\prod_{j=1}^{n} \left(e^{-\xi u_j} - 1\right)}{\left(e^{-\xi} - 1\right)^{n-1}}\right\}, \quad \xi > 0.$$
For illustration purposes, we will use the Gaussian, Clayton and Frank copulas as examples.

In order to compare the effects of dependences, we need to compare the degrees of dependences. For this purpose, we use the concordance order [15, 8]. Let $C_1$ and $C_2$ be two copulas. We say $C_1$ is less than $C_2$ in concordance order if $C_1(u_1, \ldots, u_n) \le C_2(u_1, \ldots, u_n)$ for all $0 \le u_i \le 1$, $i = 1, \ldots, n$. In particular, Gaussian copulas and Clayton copulas are increasing in $\sigma$ and $\theta$ in concordance order, respectively.

The following lemmas will be used in the paper.

Lemma 1 ([15]) Let $C$ be any $n$-copula, then
$$\max\left\{\sum_{j=1}^{n} u_j - n + 1,\; 0\right\} \le C(u_1, \ldots, u_n) \le \min\{u_1, \ldots, u_n\}.$$

Lemma 2 ([15]) Let $C$ be an $n$-copula, then
$$|C(u_1, \ldots, u_n) - C(v_1, \ldots, v_n)| \le \sum_{j=1}^{n} |u_j - v_j|.$$
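The Clayton and Frank formulas and the two lemmas can be checked numerically. The following is a minimal sketch in plain Python; the function names (`clayton`, `frank`, `lemma1_holds`, `lemma2_holds`) are illustrative choices and do not come from the paper.

```python
import math

def clayton(u, theta):
    """n-dimensional Clayton copula with parameter theta > 0."""
    if any(x <= 0.0 for x in u):
        return 0.0  # a copula vanishes when any argument is 0
    s = sum(x ** (-theta) for x in u) - len(u) + 1
    return s ** (-1.0 / theta)

def frank(u, xi):
    """n-dimensional Frank copula with parameter xi > 0."""
    prod = 1.0
    for x in u:
        prod *= math.exp(-xi * x) - 1.0
    denom = (math.exp(-xi) - 1.0) ** (len(u) - 1)
    return -math.log(1.0 + prod / denom) / xi

def frechet_lower(u):
    """Lower bound of Lemma 1."""
    return max(sum(u) - len(u) + 1, 0.0)

def frechet_upper(u):
    """Upper bound of Lemma 1."""
    return min(u)

def lemma1_holds(copula, u, tol=1e-12):
    """Check max{sum(u) - n + 1, 0} <= C(u) <= min(u) at one point."""
    c = copula(u)
    return frechet_lower(u) - tol <= c <= frechet_upper(u) + tol

def lemma2_holds(copula, u, v, tol=1e-12):
    """Check |C(u) - C(v)| <= sum_j |u_j - v_j| at one pair of points."""
    gap = sum(abs(a - b) for a, b in zip(u, v))
    return abs(copula(u) - copula(v)) <= gap + tol
```

A spot check at a few points does not prove the lemmas, but it is a quick guard against sign errors when the copula formulas are transcribed.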

3 Cyber Epidemic Model With Arbitrary Dependences

Now we present and investigate the cyber epidemic model that accommodates the dependences between the cyber attack events. This is the first systematic treatment of dependences in cyber epidemic models.

3.1 The Model

As in [19], we consider an undirected finite network graph $G = (V, E)$, where $V = \{1, 2, \ldots, N\}$ is the set of $N = |V|$ nodes (vertices) that can abstract computers or software components at an appropriate resolution, and $E = \{(u, v) : u, v \in V\}$ is the set of edges. Note that $G$ abstracts the network structure according to which the push-based cyber attacks take place (e.g., malware spreading), where $(u, v) \in E$ abstracts that node $u$ can attack node $v$. In both principle and practice, $G$ can range from a complete graph (i.e., any $u \in V$ can directly attack any $v \in V$) to any specific graph structure (i.e., node $u$ may not be able to attack node $v$ directly because, for example, the traffic from node $u$ is filtered or $u$ is blacklisted by $v$), which explains why we should pursue general results without restricting the network/graph structures. Denote by $A = (a_{vu})$ the adjacency matrix of $G$, where $a_{vu} = 1$ if and only if $(u, v) \in E$, and $a_{vu} = 0$ otherwise. Note that the problem setting naturally implies $a_{vv} = 0$. Denote by $\deg(v)$ the degree of node $v$.

In a discrete-time model, node $v \in V$ is either secure (but vulnerable to attacks) or infected (and can attack other nodes) at any time $t = 0, 1, \ldots$. At each time step, an infected node $v$ becomes secure with probability $\beta$, which abstracts the defense power. The model accommodates two large classes of cyber attacks: a secure node $v$ can become infected because of (i) pull-based cyber attacks with probability $\alpha$, which include drive-by-download attacks (i.e., node $v$ getting infected because its user visits a malicious website) and insider attacks (i.e., the user intentionally runs a malware on node $v$), or (ii) push-based cyber attacks launched by $v$'s infected neighbor $u$ over edge $(u, v) \in E$ with probability $\gamma$.
Our extension to the above model is to accommodate the dependences between the push-based attacks as well as the dependences between the push-based attacks and the pull-based attacks. These attacks are not independent because the events that the nodes get infected are not independent of each other, and because the push-based attacks are not independent of the pull-based attacks (e.g., a malware could first infect some nodes via the pull-based cyber attacks and then launch the push-based cyber attacks from the infected nodes). Moreover, the dependences between the push-based attacks can model that intelligent malwares launch coordinated attacks against the secure nodes.

Specifically, let $I_v(t)$ denote the state of node $v$ at time $t$, where $I_v(t) = 1$ means $v$ is infected and $0$ means $v$ is secure. Let $(I_{v,1}(t), \ldots, I_{v,\deg(v)}(t))$ denote the state vector of node $v$'s neighbors at time $t$, where
$$I_{v,j}(t) = \begin{cases} 1, & \text{the } j\text{th neighbor of node } v \text{ is infected at time } t, \\ 0, & \text{otherwise.} \end{cases}$$
Define $i_v(t) = P(I_v(t) = 1)$ and $i_{v,j}(t) = P(I_{v,j}(t) = 1 \mid I_v(t) = 0)$, where $j = 1, \ldots, \deg(v)$. Let $X_v(t+1) = 1$ denote the event that node $v$ is infected at time $t+1$ because of the push-based cyber attacks, and $X_v(t+1) = 0$ otherwise. Let $X_{v,j}(t+1) = 1$ denote the event that node $v$ is infected at time $t+1$ by its $j$th neighbor, and $X_{v,j}(t+1) = 0$ otherwise. Note that
$$P(X_{v,j}(t+1) = 1 \mid I_v(t) = 0) = \gamma\, i_{v,j}(t).$$
Since any dependence structure between $X_{v,1}(t+1), \ldots, X_{v,\deg(v)}(t+1)$ can always be accommodated by some copula function $C_v$, we have
$$\begin{aligned} P(X_v(t+1) = 0 \mid I_v(t) = 0) &= C_v\left(1 - P(X_{v,1}(t+1) = 1 \mid I_v(t) = 0), \ldots, 1 - P(X_{v,\deg(v)}(t+1) = 1 \mid I_v(t) = 0)\right) \\ &= C_v\left(1 - \gamma i_{v,1}(t), \ldots, 1 - \gamma i_{v,\deg(v)}(t)\right). \end{aligned} \tag{1}$$
Similarly, let $Y_v(t+1) = 1$ denote the event that node $v$ is infected at time $t+1$ because of the pull-based cyber attacks. Then, we have $P(Y_v(t+1) = 1 \mid I_v(t) = 0) = \alpha$. By further accommodating the dependence structure between the push-based attacks and the pull-based attacks via some copula function $C$, we have
$$\begin{aligned} P(I_v(t+1) = 1 \mid I_v(t) = 0) &= 1 - P(X_v(t+1) = 0,\, Y_v(t+1) = 0 \mid I_v(t) = 0) \\ &= 1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}(t), \ldots, 1 - \gamma i_{v,\deg(v)}(t)\right)\right). \end{aligned} \tag{2}$$

Note that
$$P(I_v(t+1) = 1 \mid I_v(t) = 1) = 1 - \beta. \tag{3}$$
From Eqs. (1), (2) and (3), we obtain the probability that node $v \in V$ is infected at time $t+1$ as:
$$\begin{aligned} i_v(t+1) &= P(I_v(t+1) = 1) \\ &= P(I_v(t+1) = 1 \mid I_v(t) = 1)\, P(I_v(t) = 1) + P(I_v(t+1) = 1 \mid I_v(t) = 0)\, P(I_v(t) = 0) \\ &= (1 - \beta)\, i_v(t) + P(I_v(t+1) = 1 \mid I_v(t) = 0)\, (1 - i_v(t)) \\ &= (1 - \beta)\, i_v(t) + \left[1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}(t), \ldots, 1 - \gamma i_{v,\deg(v)}(t)\right)\right)\right] (1 - i_v(t)). \end{aligned} \tag{4}$$
We will analyze Eq. (4) for $v \in V$ to characterize the effects of the dependence structures $C$ and $C_v$ and the side-effects of assuming them away. Note that for the special case where the $X_{v,j}$'s are independent of each other and the push-based attacks and the pull-based attacks are also independent of each other, Eq. (4) degenerates to the model in [19]. Note also that in order to characterize the side-effects of assuming away the dependences, we need to accommodate the dependences at a higher level of abstraction than the model parameters $\alpha$ and $\gamma$. This is because the parameters are indeed relatively easier to obtain in experiments/practice (e.g., by considering a single compromised neighbor that is launching the push-based attacks, and by considering the pull-based attacks in the absence of the push-based attacks).

3.2 Epidemic Equilibrium Threshold and Bounds for Equilibrium Infection Probabilities

The concept of epidemic equilibrium threshold [19] naturally extends the well-known concept of epidemic threshold in that the former describes the condition under which the epidemic spreading converges to a non-negative equilibrium, whereas the latter traditionally describes the condition under which the epidemic spreading converges to 0 (i.e., the spreading dies out). Note that $\alpha > 0$ implies that the spreading will never die out and that $\alpha = 0$ is necessary for the spreading to die out. Denote by $i_v^*$ the equilibrium infection probability for node $v \in V$. In the equilibrium, Eq. (4) becomes:
$$i_v^* = (1 - \beta)\, i_v^* + \left[1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right)\right)\right] (1 - i_v^*), \quad v \in V. \tag{5}$$
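To make the recursion in Eq. (4) concrete, the sketch below iterates it on a small graph. It rests on assumptions that are not part of the model above: both $C_v$ and $C$ are taken to be Clayton copulas, and the conditional probability $i_{v,j}(t)$ is approximated by the neighbor's marginal infection probability. The names `step`, `theta_push` and `theta_pull` are hypothetical.

```python
def clayton(u, theta):
    """n-dimensional Clayton copula with parameter theta > 0."""
    if any(x <= 0.0 for x in u):
        return 0.0
    s = sum(x ** (-theta) for x in u) - len(u) + 1
    return s ** (-1.0 / theta)

def step(i, neighbors, alpha, beta, gamma, theta_push, theta_pull):
    """One application of Eq. (4) to every node.

    i[v] is the current infection probability of node v and neighbors[v]
    lists the neighbors of v.
    ASSUMPTION: i_{v,j}(t) is approximated by the neighbor's marginal i[u].
    ASSUMPTION: C_v and C are Clayton copulas.
    """
    nxt = []
    for v, nbrs in enumerate(neighbors):
        # C_v(1 - gamma*i_{v,1}(t), ...): no push-based attack succeeds
        no_push = clayton([1.0 - gamma * i[u] for u in nbrs], theta_push)
        # C(1 - alpha, C_v(...)): neither pull- nor push-based attack succeeds
        no_attack = clayton([1.0 - alpha, no_push], theta_pull)
        nxt.append((1.0 - beta) * i[v] + (1.0 - no_attack) * (1.0 - i[v]))
    return nxt

# Usage: a star with hub 0 and four leaves, starting from an all-secure state.
neighbors = [[1, 2, 3, 4], [0], [0], [0], [0]]
i = [0.0] * 5
for _ in range(500):
    i = step(i, neighbors, 0.5, 0.1, 0.1, 0.1, 0.1)
```

After enough steps the vector `i` stops changing, and the limit is a solution of Eq. (5) under the same assumptions.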
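In what follows, Theorem 1 gives a general epidemic equilibrium threshold (i.e., a sufficient condition under which the spreading enters the equilibrium), and Theorem 2 gives a more succinct but more restrictive sufficient condition.

Lemma 3 Let $A$ be the adjacency matrix of $G$. If
$$\rho(A) < \frac{(\beta + \alpha)^2}{\gamma\beta}, \tag{6}$$
then system (4) has a unique equilibrium $(i_1^*, \ldots, i_N^*) \in [0, 1]^N$.

Proof For any $v \in V$, define $f_v : [0, 1]^N \to [0, 1]$ as
$$f_v(\mathbf{x}) = \frac{1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma x_{v,1}, \ldots, 1 - \gamma x_{v,\deg(v)}\right)\right)}{\beta + 1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma x_{v,1}, \ldots, 1 - \gamma x_{v,\deg(v)}\right)\right)}, \quad v = 1, \ldots, N,$$
where $\mathbf{x} = (x_1, \ldots, x_N) \in [0, 1]^N$ and $x_{v,k}$ is the component of $\mathbf{x}$ that corresponds to the $k$th neighbor of node $v$. Define $f : [0, 1]^N \to [0, 1]^N$, where $f(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_N(\mathbf{x}))$. We need to show that $f(\mathbf{x}) = \mathbf{x}$ has a unique solution $\mathbf{i}^*$. According to the Banach fixed-point theorem [7], it is sufficient to prove that $f$ is a contraction mapping. Let $\mathbf{x}, \mathbf{y} \in [0, 1]^N$. Consider the distance between their images in the Euclidean norm,
$$\|f(\mathbf{x}) - f(\mathbf{y})\| = \sqrt{\sum_{v=1}^{N} \left(f_v(\mathbf{x}) - f_v(\mathbf{y})\right)^2} = \sqrt{\sum_{v=1}^{N} \left(\frac{\beta\, \Gamma_v}{\Lambda_v}\right)^2},$$
where
$$\Gamma_v = \left| C\left(1 - \alpha,\; C_v\left(1 - \gamma x_{v,1}, \ldots, 1 - \gamma x_{v,\deg(v)}\right)\right) - C\left(1 - \alpha,\; C_v\left(1 - \gamma y_{v,1}, \ldots, 1 - \gamma y_{v,\deg(v)}\right)\right) \right|,$$
$$\Lambda_v = \left[\beta + 1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma x_{v,1}, \ldots, 1 - \gamma x_{v,\deg(v)}\right)\right)\right] \left[\beta + 1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma y_{v,1}, \ldots, 1 - \gamma y_{v,\deg(v)}\right)\right)\right].$$
By Lemmas 1 and 2, it follows that
$$\Gamma_v \le \gamma \sum_{k=1}^{\deg(v)} |x_{v,k} - y_{v,k}| \quad \text{and} \quad \Lambda_v \ge (\beta + \alpha)^2.$$
Therefore, we have
$$\|f(\mathbf{x}) - f(\mathbf{y})\| \le \frac{\beta\gamma}{(\beta + \alpha)^2} \sqrt{\sum_{v=1}^{N} \left(\sum_{k=1}^{\deg(v)} |x_{v,k} - y_{v,k}|\right)^2}.$$
Moreover,
$$\begin{aligned} \sum_{v=1}^{N} \left(\sum_{k=1}^{\deg(v)} |x_{v,k} - y_{v,k}|\right)^2 &= \left(|x_1 - y_1|, \ldots, |x_N - y_N|\right) A^2 \left(|x_1 - y_1|, \ldots, |x_N - y_N|\right)^T \\ &\le \left\|\left(|x_1 - y_1|, \ldots, |x_N - y_N|\right)\right\|^2 \|A\|^2 = \|\mathbf{x} - \mathbf{y}\|^2 \|A\|^2, \end{aligned}$$
where $\|A\|$ denotes the operator norm of $A$. Since $A$ is a symmetric matrix, we have $\|A\| = \rho(A)$. From condition (6), it follows that
$$\|f(\mathbf{x}) - f(\mathbf{y})\| \le \frac{\beta\gamma\rho(A)}{(\beta + \alpha)^2} \|\mathbf{x} - \mathbf{y}\| < \|\mathbf{x} - \mathbf{y}\|,$$
which means that $f$ is a contraction mapping.

Theorem 1 (general epidemic equilibrium threshold) Let $A$ be the adjacency matrix of $G$ and $D$ be the diagonal matrix with the $v$th ($1 \le v \le N$) diagonal element equal to
$$\Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) = \left| C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right)\right) - \beta \right|,$$
where $i_v^*$ is the equilibrium infection probability that satisfies Eq. (5). Let $W = D + \gamma A$. If condition (6) holds, namely that system (4) has a unique equilibrium, and the spectral radius $\rho(W) < 1$, then $\lim_{t\to\infty} i_v(t) = i_v^*$ exponentially for all $v \in V$.

Proof According to Lemma 3, there is a unique solution for $i_v^*$ under condition (6). Denote by $r_v(t) = i_v(t) - i_v^*$. We want to identify a sufficient condition under which $\lim_{t\to\infty} |r_v(t)| = 0$ for all $v \in V$. Note that
$$\begin{aligned} r_v(t+1) = {} & r_v(t) \left[ C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right)\right) - \beta \right] \\ & + (1 - i_v(t)) \left[ C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right)\right) - C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}(t), \ldots, 1 - \gamma i_{v,\deg(v)}(t)\right)\right) \right]. \end{aligned}$$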

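By Lemma 2, we have
$$\begin{aligned} |r_v(t+1)| &\le |r_v(t)|\, \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) + (1 - i_v(t)) \left| C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right) - C_v\left(1 - \gamma i_{v,1}(t), \ldots, 1 - \gamma i_{v,\deg(v)}(t)\right) \right| \\ &\le |r_v(t)|\, \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) + \gamma\, (1 - i_v(t)) \sum_{j=1}^{\deg(v)} |i_{v,j}^* - i_{v,j}(t)| \\ &\le |r_v(t)|\, \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) + \gamma \sum_{j=1}^{\deg(v)} |r_{v,j}(t)|, \end{aligned}$$
where $\Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) = \left| C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right)\right) - \beta \right|$. Define
$$z_v(t+1) = z_v(t)\, \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) + \gamma \sum_{j=1}^{\deg(v)} z_{v,j}(t),$$
with $z_v(0) \ge |r_v(0)|$ and $z_{v,j}(0) \ge |r_{v,j}(0)|$ for $j = 1, \ldots, \deg(v)$. We see $|r_v(t)| \le z_v(t)$ for any $t$. Let $\mathbf{z}(t) = (z_1(t), \ldots, z_N(t))^T$. Then, we have the following matrix form
$$\mathbf{z}(t+1) = W \mathbf{z}(t) = W^{t+1} \mathbf{z}(0), \tag{7}$$
where $W = D + \gamma A$, $D$ is the diagonal matrix with diagonal element $\Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*)$, and $A$ is the adjacency matrix of $G$. Since matrix $W$ is nonnegative and symmetric, the Spectral Theorem [12] says that $\rho(W)$ is real. By using the well-known Gelfand formula, if $\rho(W) < 1$, then $\lim_{t\to\infty} W^t = 0$ and therefore $\lim_{t\to\infty} \mathbf{z}(t) = 0$. Since $\rho(W) = \lim_{t\to\infty} \|W^t\|^{1/t}$ and $\|W^t\| \ge [\rho(W)]^t$ for all $t$, where $\|\cdot\|$ is the norm in real space $\mathbb{R}^N$, we conclude that $W^t$ converges to 0 exponentially when $\rho(W) < 1$. This means that the convergence rate of $\lim_{t\to\infty} \mathbf{i}(t) = \mathbf{i}^*$ is at least exponential.

Using the sufficient condition given by Theorem 1 requires knowing $\mathbf{i}^*$ (i.e., $i_v^*$ for all $v$), which is difficult to obtain analytically. It is therefore important to weaken this requirement. Now we present a sufficient condition that only requires the equilibrium infection probability $i_v^*$ for some $v$ rather than for all $v \in V$. According to [4], we have
$$\rho(W) \le \max_{v \in V} \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) + \gamma \rho(A).$$
Therefore, a more restrictive (than the one given by Theorem 1) sufficient condition is to require
$$\max_{v \in V} \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) + \gamma \rho(A) < 1, \quad \text{namely} \quad \rho(A) < \frac{1 - \max_{v \in V} \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*)}{\gamma}.$$
Eq. (5) can be rearranged as $\beta i_v^* = \left[1 - C(1 - \alpha, C_v(\cdot))\right](1 - i_v^*)$, so that $C(1 - \alpha, C_v(\cdot)) = 1 - \beta i_v^* / (1 - i_v^*)$ and hence
$$\Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) = \left| 1 - \frac{\beta}{1 - i_v^*} \right|.$$
Therefore, we obtain the following more restrictive, but more succinct, sufficient condition:

Corollary 1 $\lim_{t\to\infty} i_v(t) = i_v^*$ exponentially for all $v \in V$, if
$$\rho(A) < \min\left\{ \frac{1 - \max_{v \in V} \left|1 - \beta / (1 - i_v^*)\right|}{\gamma},\; \frac{(\beta + \alpha)^2}{\gamma\beta} \right\}. \tag{8}$$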
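The conditions of Theorem 1 and Corollary 1 can be evaluated mechanically once estimates of the $i_v^*$'s are available. The sketch below is one way to do so in plain Python. It assumes the adjacency matrix is given as a list of lists, and it computes the spectral radius with a shifted power iteration that is valid only for symmetric nonnegative matrices. All function names are illustrative.

```python
import math

def spectral_radius(M, iters=500):
    """Spectral radius of a symmetric nonnegative matrix (list of lists).

    Power iteration is run on M + I so that bipartite graphs, whose
    eigenvalues come in +/- pairs, do not cause oscillation. For such
    matrices the largest eigenvalue equals the spectral radius.
    """
    n = len(M)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(M[r][c] * x[c] for c in range(n)) + x[r] for r in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    Mx = [sum(M[r][c] * x[c] for c in range(n)) for r in range(n)]
    return sum(a * b for a, b in zip(x, Mx))  # Rayleigh quotient, |x| = 1

def delta(beta, i_star_v):
    """|1 - beta / (1 - i_v^*)|, the diagonal entry of D."""
    return abs(1.0 - beta / (1.0 - i_star_v))

def theorem1_holds(A, alpha, beta, gamma, i_star):
    """Condition (6) together with rho(W) < 1, where W = D + gamma * A."""
    n = len(A)
    W = [[gamma * A[r][c] + (delta(beta, i_star[r]) if r == c else 0.0)
          for c in range(n)] for r in range(n)]
    cond6 = spectral_radius(A) < (beta + alpha) ** 2 / (gamma * beta)
    return cond6 and spectral_radius(W) < 1.0

def corollary1_holds(A, alpha, beta, gamma, i_star):
    """Condition (8)."""
    bound1 = (1.0 - max(delta(beta, x) for x in i_star)) / gamma
    bound2 = (beta + alpha) ** 2 / (gamma * beta)
    return spectral_radius(A) < min(bound1, bound2)
```

Because Corollary 1 is more restrictive than Theorem 1, `corollary1_holds` returning `True` should imply `theorem1_holds` returning `True` for the same inputs; the converse need not hold.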

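Applying the above sufficient condition still requires knowing the minimal and maximal $i_v^*$'s, which are hard to obtain analytically. Although it is always possible to obtain them numerically, we would want to have some more general results that do not rely on such numerical computation. In what follows we present such a sufficient condition (Theorem 2), which requires the following Proposition 1 that presents bounds for the equilibrium infection probability. The bounds are certainly of independent value.

Proposition 1 (bounds for equilibrium infection probabilities) For any dependence structures $C$ and $C_v$, which may be unknown, the equilibrium infection probability $i_v^*$ for $v \in V$ satisfies $i^{*-} \le i_v^* \le i_v^{*+}$, where
$$i^{*-} = \frac{\gamma - \beta}{\gamma}\, I\{\gamma > \alpha + \beta\} + \frac{\alpha}{\beta + \alpha}\, I\{\gamma \le \alpha + \beta\}, \qquad i_v^{*+} = \frac{\min\left\{\alpha + \frac{\gamma \deg(v)}{\beta + 1},\, 1\right\}}{\beta + \min\left\{\alpha + \frac{\gamma \deg(v)}{\beta + 1},\, 1\right\}}.$$

Proof Rewrite Eq. (5) as
$$i_v^* = \frac{1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right)\right)}{\beta + 1 - C\left(1 - \alpha,\; C_v\left(1 - \gamma i_{v,1}^*, \ldots, 1 - \gamma i_{v,\deg(v)}^*\right)\right)}. \tag{9}$$
By noticing the monotonicity in (9) and applying Lemma 1, we obtain
$$\frac{\max\left\{\gamma i_{v,1}^*, \ldots, \gamma i_{v,\deg(v)}^*, \alpha\right\}}{\beta + \max\left\{\gamma i_{v,1}^*, \ldots, \gamma i_{v,\deg(v)}^*, \alpha\right\}} \le i_v^* \le \frac{\min\left\{\alpha + \gamma \sum_{j=1}^{\deg(v)} i_{v,j}^*,\, 1\right\}}{\beta + \min\left\{\alpha + \gamma \sum_{j=1}^{\deg(v)} i_{v,j}^*,\, 1\right\}}. \tag{10}$$
Let us first consider the lower bound. Note that for each $v \in V$,
$$i_v^* \ge x_1 \overset{\text{def}}{=} \frac{\alpha}{\beta + \alpha}.$$
By substituting $x_1$ for $i_{v,j}^*$ in Ineq. (10), we have
$$i_v^* \ge x_2 = \frac{\max\{\gamma x_1, \alpha\}}{\beta + \max\{\gamma x_1, \alpha\}}.$$
By substituting $x_2$ for $i_{v,j}^*$ in Ineq. (10), we obtain $x_3$. By repeating the substitution, we obtain a sequence $\{x_n, n \ge 1\}$ with
$$x_n = \frac{\max\{\gamma x_{n-1}, \alpha\}}{\beta + \max\{\gamma x_{n-1}, \alpha\}}, \quad x_0 = 0.$$
Since $\{x_n, n \ge 1\}$ is increasing and bounded, we can get its limit, namely $i^{*-}$, by solving the following equation
$$\frac{\max\{\gamma x, \alpha\}}{\beta + \max\{\gamma x, \alpha\}} = x.$$
For the upper bound, note that $i_v^* \le \frac{1}{\beta + 1}$ for $v \in V$. By substituting $1/(\beta + 1)$ for $i_{v,j}^*$ in Ineq. (10), we get $i_v^{*+}$.

It is useful to know when the bounds in Proposition 1 are tight. For this purpose, we observe that if $\frac{\gamma \deg(v)}{\beta + 1} \approx 0$, meaning that $\gamma \deg(v) \ll 1$ and that the attack-power is not strong, we have $i_v^{*+} \approx i^{*-} = \frac{\alpha}{\beta + \alpha}$. This means that the bounds are tight when the attack-power is not strong.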
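The bounds in Proposition 1 are closed-form and cheap to evaluate. The sketch below computes them and also reproduces the lower bound as the limit of the sequence $x_n$ used in the proof. It assumes $\beta > 0$ so that no denominator vanishes; the function names are illustrative.

```python
def lower_bound(alpha, beta, gamma):
    """i^{*-} from Proposition 1 (assumes beta > 0)."""
    if gamma > alpha + beta:
        return (gamma - beta) / gamma
    return alpha / (beta + alpha)

def upper_bound(alpha, beta, gamma, deg):
    """i_v^{*+} from Proposition 1 for a node of degree deg."""
    m = min(alpha + gamma * deg / (beta + 1.0), 1.0)
    return m / (beta + m)

def lower_bound_by_iteration(alpha, beta, gamma, n=1000):
    """Limit of x_n = max(gamma*x_{n-1}, alpha) / (beta + max(...)), x_0 = 0."""
    x = 0.0
    for _ in range(n):
        m = max(gamma * x, alpha)
        x = m / (beta + m)
    return x

# Usage: bounds for the hub of a 5-node star with (alpha, beta, gamma) = (0.5, 0.1, 0.1).
lo = lower_bound(0.5, 0.1, 0.1)
hi = upper_bound(0.5, 0.1, 0.1, deg=4)
```

Comparing `lower_bound` with `lower_bound_by_iteration` is a convenient consistency check on the two cases of the indicator function in $i^{*-}$.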
On the other hand, Proposition 1 allows us to derive the following more succinct, but more restrictive (than Corollary 1 and therefore Theorem 1), sufficient condition under which the epidemic spreading converges to the equilibrium (i.e., epidemic equilibrium threshold). The new sufficient condition involves the bounds $i^{*-}$ and $i_v^{*+}$ only (i.e., none of the equilibrium probabilities that are hard to obtain analytically).

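Theorem 2 (succinct epidemic equilibrium threshold) The spreading enters the unique equilibrium if
$$\rho(A) < \frac{1 - \max_{v \in V} \max\left\{ \left|1 - \frac{\beta}{1 - i^{*-}}\right|,\; \left|1 - \frac{\beta}{1 - i_v^{*+}}\right| \right\}}{\gamma},$$
where $i^{*-}$ and $i_v^{*+}$ are defined in Proposition 1.

Proof Note that for any $v \in V$, we have
$$\max_{v \in V} \Delta(\alpha, \beta, \gamma, \mathbf{i}_v^*) = \max_{v \in V} \left|1 - \frac{\beta}{1 - i_v^*}\right| \le \max_{v \in V} \max\left\{ \left|1 - \frac{\beta}{1 - i^{*-}}\right|,\; \left|1 - \frac{\beta}{1 - i_v^{*+}}\right| \right\}.$$
Note that
$$\max_{v \in V} \max\left\{ \left|1 - \frac{\beta}{1 - i^{*-}}\right|,\; \left|1 - \frac{\beta}{1 - i_v^{*+}}\right| \right\} \ge 1 - \frac{(\beta + \alpha)^2}{\beta},$$
which implies
$$1 - \max_{v \in V} \max\left\{ \left|1 - \frac{\beta}{1 - i^{*-}}\right|,\; \left|1 - \frac{\beta}{1 - i_v^{*+}}\right| \right\} \le \frac{(\beta + \alpha)^2}{\beta}.$$
Therefore,
$$1 - \max_{v \in V} \max\left\{ \left|1 - \frac{\beta}{1 - i^{*-}}\right|,\; \left|1 - \frac{\beta}{1 - i_v^{*+}}\right| \right\} \le \min\left\{ 1 - \max_{v \in V} \left|1 - \beta / (1 - i_v^*)\right|,\; \frac{(\beta + \alpha)^2}{\beta} \right\}.$$
According to Corollary 1, we obtain the desired result.

3.3 Tighter Bounds for Equilibrium Infection Probabilities in Star and Regular Networks

Star networks. A star-shaped network consists of a hub and $N - 1$ leaves that are connected only to the hub. Hence, the adjacency matrix $A$ can be represented as
$$A = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix}.$$
The spectral radius is $\rho(A) = \sqrt{N - 1}$. In this case, Eq. (5) becomes:
$$i_h^* = \frac{1 - C\left(1 - \alpha,\; \delta_{C_h}(1 - \gamma i_l^*)\right)}{\beta + 1 - C\left(1 - \alpha,\; \delta_{C_h}(1 - \gamma i_l^*)\right)}, \tag{11}$$
$$i_l^* = \frac{1 - C\left(1 - \alpha,\; 1 - \gamma i_h^*\right)}{1 + \beta - C\left(1 - \alpha,\; 1 - \gamma i_h^*\right)}, \tag{12}$$
where $i_h^*$ and $i_l^*$ are the equilibrium probabilities that the hub and the leaves are infected, respectively. Note that the effect of the copula $C_h$ on the equilibrium probabilities only depends on its diagonal section $\delta_{C_h}$.

In what follows we present two results about the equilibrium infection probabilities, which are not implied by the above general results that apply to arbitrary network structures. First, we can prove $i_h^* \ge i_l^*$.

Proposition 2 For the star networks, it holds that $i_h^* \ge i_l^*$.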
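Eqs. (11) and (12) form a two-dimensional fixed-point problem that can be solved by iterating the map $f$ from the proof of Lemma 3. The sketch below does this under the assumption that both $C_h$ and $C$ are Clayton copulas; the paper's examples use other combinations, so this is an illustration rather than a reproduction of its numerical results. The name `solve_star` and its parameters are hypothetical.

```python
def clayton(u, theta):
    """n-dimensional Clayton copula with parameter theta > 0."""
    if any(x <= 0.0 for x in u):
        return 0.0
    s = sum(x ** (-theta) for x in u) - len(u) + 1
    return s ** (-1.0 / theta)

def solve_star(N, alpha, beta, gamma, theta_push, theta_pull, iters=200):
    """Fixed-point iteration for Eqs. (11)-(12) on a star with N nodes.

    ASSUMPTION: C_h and C are Clayton copulas with parameters
    theta_push and theta_pull, respectively.
    Returns (i_h, i_l), the hub and leaf equilibrium probabilities.
    """
    i_h = i_l = 0.5
    for _ in range(iters):
        # delta_{C_h}(1 - gamma*i_l): diagonal section over the N-1 leaves
        diag = clayton([1.0 - gamma * i_l] * (N - 1), theta_push)
        c_h = clayton([1.0 - alpha, diag], theta_pull)
        c_l = clayton([1.0 - alpha, 1.0 - gamma * i_h], theta_pull)
        i_h, i_l = (1.0 - c_h) / (beta + 1.0 - c_h), (1.0 - c_l) / (1.0 + beta - c_l)
    return i_h, i_l

# Usage: a 10-node star with (alpha, beta, gamma) = (0.5, 0.1, 0.1).
i_h, i_l = solve_star(10, 0.5, 0.1, 0.1, 0.1, 0.1)
```

When condition (6) holds the map is a contraction, so the iteration converges from any starting point in $[0, 1]^2$, and the resulting pair can be used to check the ordering stated in Proposition 2 for a given parameter setting.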

Proof Denote by
$$f(x) = \frac{1 - C\big(\delta_{C_h}(1-\gamma x)\big)}{\beta + 1 - C\big(\delta_{C_h}(1-\gamma x)\big)} \quad \text{and} \quad g(x) = \frac{1 - C(1-\gamma x)}{1 + \beta - C(1-\gamma x)},$$
where $x \in [0, 1]$. Since $\delta_{C_h}(x) \le x$, we have $f(x) \ge g(x)$. Suppose $i_h < i_l$ and $(i_h, i_l)$ is a solution to Eqs. (11) and (12). Then, $i_h = f(i_l) = g^{-1}(i_l) < i_l$. Since $g(x)$ is increasing in $x$ and so is $g^{-1}$, we have $i_l \le g(i_l)$ and $f(i_l) < i_l \le g(i_l)$, which contradicts $f(x) \ge g(x)$ for $x \in [0, 1]$.

Second, we present refined bounds for equilibrium infection probabilities $i_h$ and $i_l$. The bounds are useful because even in the case of star networks, it is hard to derive analytic expressions for, and infeasible to numerically compute (especially for complex dependence structures), $i_h$ and $i_l$.

Proposition 3 (tighter upper bounds for the equilibrium infection probabilities in star networks) For star networks and regardless of the dependence structures (which can be unknown), we have $i^- \le i_h \le i_h^+$ and $i^- \le i_l \le i_l^+$, where $i^-$ is defined in Proposition 1,
$$i_h^+ = \frac{1}{\beta+1}\, I\left\{\frac{1}{\beta+1} \ge \frac{1-\alpha}{(N-1)\gamma}\right\} + \frac{(N-1)\gamma - \alpha - \beta + \sqrt{\big((N-1)\gamma - \alpha - \beta\big)^2 + 4(N-1)\gamma\alpha}}{2(N-1)\gamma}\, I\left\{\frac{1}{\beta+1} < \frac{1-\alpha}{(N-1)\gamma}\right\}$$
and
$$i_l^+ = \frac{1}{\beta+1}\, I\left\{\frac{1}{\beta+1} \ge \frac{1-\alpha}{\gamma}\right\} + \frac{\gamma - \alpha - \beta + \sqrt{(\gamma - \alpha - \beta)^2 + 4\gamma\alpha}}{2\gamma}\, I\left\{\frac{1}{\beta+1} < \frac{1-\alpha}{\gamma}\right\}.$$

Proof The lower bound $i^-$ is the same as in Proposition 1. Let us focus on $i_h^+$. From Ineq. (10), we have
$$i_h \le \frac{\min\{\alpha + (N-1)\gamma i_l,\, 1\}}{\beta + \min\{\alpha + (N-1)\gamma i_l,\, 1\}} = f(i_l).$$
Since the right-hand side of the above inequality increases in $i_l$, by Proposition 2 we have $i_h \le f(i_h)$, and therefore $i_h \le i_h^+$, where $i_h^+$ is the solution to the equation
$$x = f(x). \qquad (13)$$
For the upper bound $i_l^+$, we can similarly obtain the desired result by solving the equation
$$x = f\left(\frac{x}{N-1}\right). \qquad (14)$$

Now we explain why the upper bounds $i_h^+$ and $i_l^+$ given by Proposition 3 are smaller (i.e., tighter) than the general upper bounds that can be derived from Proposition 1 by instantiating $G = (V, E)$ as star networks. To see this, we note that $i_h^+$ is the solution to Eq. (13) and $i_h^+ \le \frac{1}{\beta+1}$, meaning that
$$i_h^+ \le \frac{\min\left\{\alpha + \frac{(N-1)\gamma}{\beta+1},\, 1\right\}}{\beta + \min\left\{\alpha + \frac{(N-1)\gamma}{\beta+1},\, 1\right\}},$$
where the right-hand side of the inequality is exactly the upper bound that can be derived from Proposition 1 by substituting $\deg(v)$ with the degree of the hub. This means that $i_h^+$ is smaller than the upper bound given by Proposition 1. Similarly, we can show that $i_l^+$ is smaller than the upper bound given by Proposition 1. Moreover, by comparing (13) and (14), we see that $i_h^+ \ge i_l^+$. Since the lower bound $i^-$ is the same as the lower bound given by Proposition 1, we conclude that the bounds given by Proposition 3 are tighter than the bounds given by Proposition 1.

To see the tightness of the bounds given by Proposition 3, we consider two combinations of dependence structures: $(C, C_v)$ = (Gaussian, Frank) and $(C, C_v)$ = (Gaussian, Clayton) with parameters $\sigma = \theta = \xi = 0.1$ as reviewed in Section 2. Figure 1 plots $i_h$, $i_l$, $i^-$, $i_h^+$, and $i_l^+$ for $N = 3, \ldots, 81$ with $(\alpha, \beta, \gamma) = (0.5, 0.1, 0.1)$; all these parameter settings satisfy condition (8). We observe that the upper bound $i_h^+$ becomes flat for $N \ge 5$, because it causes $i_h^+ = \frac{1}{\beta+1}$ (i.e., independent of $N$); whereas the upper bound $i_l^+$ is flat because it is always independent of $N$. We observe that the upper bound for the hub node, $i_h^+$, becomes extremely tight for dense star networks with $N > 40$. However, the upper bound for leaf nodes almost always exhibits $i_l^+ - i_l \approx 0.011$ (i.e., the upper bound overestimates about 0.88 infected nodes for a star network of $N = 80$ nodes). In any case, the upper bounds only somewhat overestimate the $i_v$'s and thus can be used for decision-making purposes when the $i_v$'s are infeasible to compute.

[Figure 1 panels: (a) Hub: (Gaussian, Frank); (b) Leaves: (Gaussian, Frank); (c) Hub: (Gaussian, Clayton); (d) Leaves: (Gaussian, Clayton). Horizontal axis: number of leaves (0 to 80).]

Figure 1: Star networks: upper bound ($i_h^+$ for hub, $i_l^+$ for leaves) vs. equilibrium infection probability ($i_h$ for hub, $i_l$ for leaves) vs. lower bound ($i^-$ for both hub and leaves) with respect to $(\alpha, \beta, \gamma) = (0.5, 0.1, 0.1)$ and $(C, C_v)$.

Regular networks. For regular networks, each node $v \in V$ has degree $d$ for some $d \in [1, N-1]$ and $\rho(A) = d$. According to Proposition 1, we have
$$i_v = \frac{1 - C\big(C_v(1-\gamma i_{v,1}, \ldots, 1-\gamma i_{v,d})\big)}{\beta + 1 - C\big(C_v(1-\gamma i_{v,1}, \ldots, 1-\gamma i_{v,d})\big)}, \quad v \in V.$$
Now we want to present refined bounds for the equilibrium infection probability $i_v$.
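Proposition 4 (tighter upper bound for the equilibrium infection probability in regular networks) For a regular network $G = (V, E)$ and regardless of the dependence structures (which can be unknown), we have $i^- \le i_v \le i^+$ for any $v \in V$, where $i^-$ is defined in Proposition 1 and
$$i^+ = \frac{1}{\beta+1}\, I\left\{\frac{1}{\beta+1} \ge \frac{1-\alpha}{\gamma d}\right\} + \frac{\gamma d - \alpha - \beta + \sqrt{(\gamma d - \alpha - \beta)^2 + 4\gamma\alpha d}}{2\gamma d}\, I\left\{\frac{1}{\beta+1} < \frac{1-\alpha}{\gamma d}\right\}.$$

Proof Define the function
$$f(x) = \frac{\min\{\alpha + \gamma d x,\, 1\}}{\beta + \min\{\alpha + \gamma d x,\, 1\}}$$
and a sequence $\{x_n, n \ge 0\}$ with $x_n = f(x_{n-1})$, $x_0 = 1/(\beta+1)$. Observe that for all $v \in V$, we have $i_v \le x_0$ and hence from Ineq. (10), it follows that $i_v \le x_1$ for all $v \in V$. By repeating this process, we have $i_v \le x_n$ for all $n$. Since $f(x)$ is increasing and $x_1 \le x_0$, $x_n$ is decreasing in $n$. Thus, we have $i_v \le i^+$, which is the solution of the equation $x = f(x)$. If $\frac{1}{\beta+1} \ge \frac{1-\alpha}{\gamma d}$, then $i^+ = \frac{1}{\beta+1}$; otherwise, $i^+$ is the positive solution to the equation $\gamma d x^2 + (\alpha + \beta - \gamma d) x - \alpha = 0$. Thus, we obtain the desired result.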
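The fixed-point argument in the proof of Proposition 4 can be checked numerically. The sketch below is a hedged illustration with hypothetical function names: it evaluates the closed form for $i^+$ and compares it with the decreasing sequence $x_n = f(x_{n-1})$ started at $x_0 = 1/(\beta+1)$. The parameter `k` plays the role of $d$ for a regular network; setting `k = N - 1` gives the hub bound $i_h^+$ of Proposition 3 under the same formula.

```python
import math


def f(x, alpha, beta, gamma, k):
    """f(x) = min{alpha + gamma*k*x, 1} / (beta + min{alpha + gamma*k*x, 1})."""
    m = min(alpha + gamma * k * x, 1.0)
    return m / (beta + m)


def upper_bound_closed_form(alpha, beta, gamma, k):
    """Closed form of the fixed point x = f(x), as stated in Proposition 4."""
    x0 = 1.0 / (beta + 1.0)
    if x0 >= (1.0 - alpha) / (gamma * k):
        return x0
    b = gamma * k - alpha - beta
    return (b + math.sqrt(b * b + 4.0 * gamma * k * alpha)) / (2.0 * gamma * k)


def upper_bound_by_iteration(alpha, beta, gamma, k, n_iter=500):
    """Iterate x_n = f(x_{n-1}) from x_0 = 1/(beta+1), as in the proof."""
    x = 1.0 / (beta + 1.0)
    for _ in range(n_iter):
        x = f(x, alpha, beta, gamma, k)
    return x
```

For example, with $(\alpha, \beta, \gamma) = (0.2, 0.5, 0.05)$ and `k = 10` (the hub of an 11-node star), both routines should return roughly 0.463, which is below $1 - \beta = 0.5$.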

Note that the upper bound $i^+$ given by Proposition 4 is smaller than the upper bound $i_v^+$ obtained by instantiating $\deg(v) = d$ in Proposition 1, because $i_v^+$ is exactly the $x_1$ defined in the proof of Proposition 4. To see the tightness of the bounds $i^-$ and $i^+$ given by Proposition 4, we consider $(C, C_v)$ = (Gaussian, Frank) and $(C, C_v)$ = (Gaussian, Clayton) with parameters $\sigma = \theta = \xi = 0.1$ as reviewed in Section 2. Figure 2 plots numerical $i_v$, $i^-$ and $i^+$ with respect to node degree $d = 2, \ldots, 80$ with $(\alpha, \beta, \gamma) = (0.5, 0.1, 0.01)$; all these parameter settings satisfy condition (8). We observe that $i_v^+$ becomes flat for sufficiently dense regular networks. This is because $i_v^+ = \frac{1}{\beta+1}$ when $d \ge \frac{(1-\alpha)(\beta+1)}{\gamma}$. For $(C, C_v)$ = (Gaussian, Frank), we further observe that the upper bound $i_v^+$ is reasonably tight especially for relatively sparse regular networks, with $i_v^+ - i_v < 0.021$ for $d < 20$ (i.e., for a sparse regular network of $N = 1000$ nodes, the upper bound only overestimates at most 21 infected nodes). Even for dense regular networks with $d > 20$, we have $i_v^+ - i_v \le 0.038$ (i.e., for a dense regular network of $N = 1000$ nodes, the upper bound only overestimates at most 38 infected nodes), where equality holds for $d = 54$. For $(C, C_v)$ = (Gaussian, Clayton), we also observe that the upper bound $i_v^+$ is tight especially for relatively sparse regular networks with $d < 20$ and $i_v^+ - i_v < 0.021$ (i.e., for a sparse regular network of $N = 1000$ nodes, the upper bound only overestimates at most 21 infected nodes). Even for dense regular networks with $d > 20$, we have $i_v^+ - i_v \le 0.039$, where equality holds for $d = 54$. This means that for decision-making purposes, the defender can use the upper bound $i_v^+$ instead of $i_v$, especially when $i_v$ is infeasible to compute.

[Figure 2 panels: (a) (Gaussian, Frank, 0.5, 0.1, 0.01); (b) (Gaussian, Clayton, 0.5, 0.1, 0.01). Horizontal axis: node degree (0 to 80).]

Figure 2: Regular networks: upper bound $i_v^+$ vs. equilibrium infection probability $i_v$ vs. lower bound $i^-$ with respect to $(C, C_v, \alpha, \beta, \gamma)$.

3.4 Approximating Equilibrium Infection Probabilities in ER and Power-law Networks

For star and regular networks, we have derived tighter bounds for equilibrium infection probabilities than the general bounds given by Proposition 1. Unfortunately, we do not know how to derive tighter bounds for ER and power-law networks. As an alternative, we propose to approximate equilibrium infection probabilities by taking advantage of the upper and lower bounds. The approximation is useful because it is often smaller than the upper bound, which never underestimates, but may substantially overestimate, the threats in terms of equilibrium infection probabilities. That is, the approximation method can lead to more cost-effective defense than the upper bound.

The approximation method is the following: We first compute the equilibrium infection probabilities, their upper bounds, and their lower bounds for a feasible number of instances of $(G, C, C_v, \alpha, \beta, \gamma)$, based on given computer resources. We then use the resulting data to derive (via statistical methods) some function of the lower and upper bounds. For even larger $G$ of the same type as well as $(C, C_v)$ of the same kind, the resulting function would be smaller than the upper bound and would not underestimate the equilibrium infection probabilities. The key insight is that we can compute, for networks of any size, the upper and lower bounds according to Proposition 1. This means that we can approximate the equilibrium infection probabilities for arbitrarily large networks, for which it is often infeasible to numerically (let alone analytically) compute the equilibrium infection probabilities.

To illustrate the approximation method, we also consider $(C, C_v)$ = (Gaussian, Frank) and $(C, C_v)$ = (Gaussian, Clayton) with parameters $\sigma = \theta = \xi = 0.1$ as reviewed in Section 2. We use the erdos.renyi.game generator of the igraph package in the R system to generate a random ER network of $N = 1000$ nodes and edge probability 0.01; the resulting network instance has spectral radius 11.38045. We use the static.power.law.game generator of the igraph package in the R system to generate a random power-law network of $N = 1000$ nodes, 5000 edges, and power-law exponent 2.1 (note that 2.1 is the power-law exponent of the Internet AS-level network [6]); the resulting network instance has spectral radius 22.97582. We consider combinations of $(\alpha, \beta, \gamma)$ that satisfy condition (8), where
$\alpha \in \{0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5\}$,
$\beta \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$,
$\gamma \in \{0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1\}$.
It turns out that for $(C, C_v)$ = (Gaussian, Frank), the ER network has 307 combinations of $(\alpha, \beta, \gamma)$ that satisfy condition (8); the power-law network has 125 combinations of $(\alpha, \beta, \gamma)$ that satisfy condition (8), because the spectral radius is larger. For $(C, C_v)$ = (Gaussian, Clayton), the ER network has 307 combinations of $(\alpha, \beta, \gamma)$ that satisfy condition (8); the power-law network has 126 combinations of $(\alpha, \beta, \gamma)$ that satisfy condition (8).

We compute the equilibrium infection probability $i_v$ numerically by solving Eqs. (5) for $v \in V$ via the BB package in the R system. We compute the lower and upper bounds, namely $i^-$ and $i_v^+$, according to Proposition 1. Since it is infeasible to numerically compute $i_v$ for large networks, we propose to approximate $i_v$ for node $v \in V$ via
$$\hat{i}_v = \frac{1}{2}\left(\tilde{i}_v + i_v^+\right),$$
where $\tilde{i}_v = f_{C, C_v}(i^-, i_v^+, \deg(v)) = k_0 + k_1 i^- + k_2 i_v^+ + k_3 \deg(v)$ can be statistically derived from the data. Note that the heuristic function $\hat{i}_v$ could be refined via more extensive numerical studies. We define the approximation error for network $G$ as
$$\mathrm{err}_G = \sum_{v \in V} \left(\hat{i}_v - i_v\right),$$
because $\sum_{v \in V} i_v$ is an important factor for cyber defense decision-making. For practical use, it is desired that $\mathrm{err}_G \ge 0$, meaning that the defender never underestimates the threats, and at the same time $\mathrm{err}_G \approx 0$, meaning that the defender does not overestimate the threats (i.e., does not overprovision defense resources) too much.

ER networks. For the ER network, we obtain the following formulas: For $(C, C_v)$ = (Gaussian, Frank), we have $\hat{i}_v = -0.01759 + 0.3142\, i^- + 0.7294\, i_v^+ - 0.0002575 \deg(v)$.
For $(C, C_v)$ = (Gaussian, Clayton), we have $\hat{i}_v = -0.0174076 + 0.3150585\, i^- + 0.7281992\, i_v^+ - 0.0002596 \deg(v)$.

For $(C, C_v)$ = (Gaussian, Frank), the average of the $\mathrm{err}_G$'s over the 307 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 46, meaning that the approximation method only overestimates 46 infected nodes in an ER network of 1000 nodes. In comparison, the average of the $\sum_{v \in V} (i_v^+ - i_v)$'s over the 307 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 93, meaning that the upper bound overestimates 93 infected nodes (i.e., the approximation method is indeed better); the average of the $\sum_{v \in V} (i^- - i_v)$'s is $-52.7$, meaning that the lower bound underestimates 52.7 infected nodes in an ER network of 1000 nodes. Finally, we note that among the 307 combinations of $(C, C_v, \alpha, \beta, \gamma)$, the maximum $\mathrm{err}_G$ is 165.2, which is elaborated in Figure 3a and will be discussed further, and the minimum $\mathrm{err}_G$ is 4.1, which is elaborated in Figure 3b and will be discussed further as well.

For $(C, C_v)$ = (Gaussian, Clayton), the average of the $\mathrm{err}_G$'s over the 307 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 46.5, meaning that the approximation method only overestimates 46.5 infected nodes in an ER network of 1000 nodes. In comparison, the average of the $\sum_{v \in V} (i_v^+ - i_v)$'s over the 307 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 93, meaning that the upper bound overestimates 93 infected nodes in an ER network of 1000 nodes; the average of the 307 $\sum_{v \in V} (i^- - i_v)$'s is $-52.5$, meaning that the lower bound underestimates 52.5 infected nodes in an ER network of 1000 nodes. Among the 307 instances, the maximum $\mathrm{err}_G$ is 165.0, which is elaborated in Figure 3d and will be discussed further, and the minimum $\mathrm{err}_G$ is 4.2, which is elaborated in Figure 3e and will be discussed further.

In summary, cyber defense decision-making can be based on the approximation method, which takes advantage of the upper and lower bounds and would be better (smaller) than the upper bound.

As a side-product, we would like to highlight the phenomenon that the equilibrium infection probability $i_v$ increases with node degree $\deg(v)$. This phenomenon was observed in [19] in the absence of dependence, and persists in the presence of dependence as we elaborate below. We consider $i_v^+$, $i_v$, $\hat{i}_v$ and $i^-$ with respect to distinct node degrees, by taking the average over the nodes of the same degree when needed. For $(C, C_v)$ = (Gaussian, Frank), Figures 3a-3b plot the infection probabilities corresponding to the $(\alpha, \beta, \gamma)$ that leads to the maximum and minimum $\mathrm{err}_G$, respectively; Figure 3c plots the infection probabilities averaged over the 307 combinations of $(\alpha, \beta, \gamma)$ that satisfy condition (8). For $(C, C_v)$ = (Gaussian, Clayton), Figures 3d-3e plot the infection probabilities corresponding to the $(\alpha, \beta, \gamma)$ that leads to the maximum and minimum $\mathrm{err}_G$, respectively; Figure 3f plots the infection probabilities averaged over the 307 combinations of $(\alpha, \beta, \gamma)$ that satisfy condition (8).

[Figure 3 panels: (a) (Gaussian, Frank), (0.01, 0.4, 0.03); (b) (Gaussian, Frank), (0.3, 0.9, 0.01); (c) (Gaussian, Frank); (d) (Gaussian, Clayton), (0.01, 0.4, 0.03); (e) (Gaussian, Clayton), (0.3, 0.9, 0.01); (f) (Gaussian, Clayton). Horizontal axis: node degree (5 to 20).]

Figure 3: ER networks: upper bound vs. approximation vs. equilibrium infection probability vs. lower bound with respect to $(C, C_v, \alpha, \beta, \gamma)$ or $(C, C_v)$.

We observe that the approximation $\hat{i}_v$ can slightly underestimate the infection probability $i_v$ for node $v$ of degree $\deg(v) \le 5$, but the overall estimation $\sum_{v \in V} \hat{i}_v$ is still above the actual threats $\sum_{v \in V} i_v$ as mentioned above. More importantly, we observe that $i_v$ (solid curves) increases with $\deg(v)$. This hints that there might be some universal scaling laws, in the presence or absence of dependence. It is an interesting future work to identify the possible scaling law.

Power-law networks. For power-law networks, we obtain the following formulas in a similar fashion: For $(C, C_v)$ = (Gaussian, Frank), we have $\hat{i}_v = -0.007395 + 0.34705\, i^- + 0.67205\, i_v^+ + 0.00013505 \deg(v)$. For $(C, C_v)$ = (Gaussian, Clayton), we have $\hat{i}_v = -0.007365 + 0.34765\, i^- + 0.6714\, i_v^+ + 0.00013525 \deg(v)$.

For $(C, C_v)$ = (Gaussian, Frank), the average of the $\mathrm{err}_G$'s over the 125 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 25, meaning that the approximation only overestimates 25 infected nodes in a power-law network of 1000 nodes. In comparison, the average of the $\sum_{v \in V} (i_v^+ - i_v)$'s over the 125 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 50.8, meaning that the upper bound overestimates 50.8 infected nodes (i.e., the approximation method is better); the average of the $\sum_{v \in V} (i^- - i_v)$'s is $-26$, meaning that the lower bound underestimates 26 infected nodes.
Among the 125 combinations of $(C, C_v, \alpha, \beta, \gamma)$, the maximum $\mathrm{err}_G$ is 84.5, which is elaborated in Figure 4a and will be discussed further, and the minimum $\mathrm{err}_G$ is 7.1, which is elaborated in Figure 4b.

For $(C, C_v)$ = (Gaussian, Clayton), the average of the $\mathrm{err}_G$'s over the 126 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 25.4, meaning that the approximation only overestimates 25.4 infected nodes in a power-law network of 1000 nodes. In comparison, the average of the $\sum_{v \in V} (i_v^+ - i_v)$'s over the 126 combinations of $(C, C_v, \alpha, \beta, \gamma)$ is 50.8, meaning that the upper bound overestimates 50.8 infected nodes; the average of the 126 $\sum_{v \in V} (i^- - i_v)$'s is $-26$, meaning that the lower bound underestimates 26 infected nodes. Among the 126 instances, the maximum $\mathrm{err}_G$ is 84.5, which is elaborated in Figure 4d, and the minimum $\mathrm{err}_G$ is 7.2, which is elaborated in Figure 4e. In summary, cyber defense decision-making can use the approximation method, which takes advantage of the upper and lower bounds.

[Figure 4 panels: (a) (Gaussian, Frank), (0.01, 0.5, 0.02); (b) (Gaussian, Frank), (0.2, 0.1, 0.01); (c) (Gaussian, Frank); (d) (Gaussian, Clayton), (0.01, 0.5, 0.02); (e) (Gaussian, Clayton), (0.2, 0.1, 0.01); (f) (Gaussian, Clayton). Horizontal axis: node degree (0 to 80).]

Figure 4: Power-law networks: upper bound vs. approximation vs. equilibrium infection probability vs. lower bound with respect to $(C, C_v, \alpha, \beta, \gamma)$ or $(C, C_v)$. In Figures 4b and 4e, the approximation result matches the equilibrium infection probability almost perfectly.

We also would like to highlight the phenomenon that the equilibrium infection probability $i_v$ increases with node degree $\deg(v)$ in power-law networks. Similarly, for $(C, C_v)$ = (Gaussian, Frank), Figures 4a-4b plot respectively the infection probabilities corresponding to the $(\alpha, \beta, \gamma)$ that leads to the maximum and minimum $\mathrm{err}_G$, and Figure 4c plots the infection probabilities averaged over the 125 combinations of $(C, C_v, \alpha, \beta, \gamma)$ that satisfy condition (8). For $(C, C_v)$ = (Gaussian, Clayton), Figures 4d-4e plot respectively the infection probabilities corresponding to the $(\alpha, \beta, \gamma)$ that leads to the maximum and minimum $\mathrm{err}_G$, and Figure 4f plots the infection probabilities averaged over the 126 combinations of $(C, C_v, \alpha, \beta, \gamma)$ that satisfy condition (8). We observe that the approximation $\hat{i}_v$ never underestimates the infection probability $i_v$ for any node $v$. We also observe that $i_v$ (solid curves) increases with $\deg(v)$, but exhibits a higher nonlinearity when compared with the ER networks.

3.5 Bounds for Non-Equilibrium Infection Probabilities

It is important to characterize the behavior of $i_v(t)$ even if it never enters any equilibrium.
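For this purpose, we seek bounds for $i_v(t)$ that hold no matter whether the system converges to an equilibrium or not. Such characterization is useful because, for example, the upper bound can be used for worst-case scenario decision-making. It is worth mentioning that non-equilibrium states/behaviors are always hard to characterize.

Proposition 5 (bounds for non-equilibrium probabilities) Let $\overline{\lim}_{t\to\infty}\, i_v(t)$ and $\underline{\lim}_{t\to\infty}\, i_v(t)$ denote the upper and lower limits of $i_v(t)$, $v \in V$. Then,
$$i_v^- \le \underline{\lim}_{t\to\infty}\, i_v(t) \le \overline{\lim}_{t\to\infty}\, i_v(t) \le i_v^+,$$
where
$$i_v^- = \begin{cases} \dfrac{1 - C\big(\delta_{C_v}(1-\gamma\nu)\big)}{\beta + 1 - C\big(\delta_{C_v}(1-\gamma\nu)\big)}, & C\big(\delta_{C_v}(1-\gamma\nu)\big) \ge \beta, \\[2ex] \big[\beta - C\big(\delta_{C_v}(1-\gamma\nu)\big)\big](1 - \mu_v) + 1 - \beta, & \text{otherwise}, \end{cases}$$
and
$$i_v^+ = \begin{cases} \dfrac{1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)}{\beta + 1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)}, & C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) > \beta, \\[2ex] \big[\beta - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)\big](1 - i_v^-) + 1 - \beta, & \text{otherwise}, \end{cases}$$
with $\delta_{C_v}(1-\gamma\nu) = C_v(1-\gamma\nu, \ldots, 1-\gamma\nu)$, $\mu_v = \max\{1-\beta, \min\{\gamma \deg(v) + \alpha, 1\}\}$ and $\nu = \min\{1-\beta, \alpha\}$.

Proof By observing the monotonicity in Eq. (4), we note that $i_v(t) \ge \nu$ for all $v \in V$. Replacing $i_{v,j}(t)$ with $\nu$ in Eq. (4) yields
$$i_v(t+1) \ge (1-\beta)\, i_v(t) + \big[1 - C\big(\delta_{C_v}(1-\gamma\nu)\big)\big]\big(1 - i_v(t)\big) = \big[C\big(\delta_{C_v}(1-\gamma\nu)\big) - \beta\big]\, i_v(t) + 1 - C\big(\delta_{C_v}(1-\gamma\nu)\big).$$
If $C\big(\delta_{C_v}(1-\gamma\nu)\big) > \beta$, by taking the lower limit on both sides we obtain
$$\underline{\lim}_{t\to\infty}\, i_v(t+1) \ge \frac{1 - C\big(\delta_{C_v}(1-\gamma\nu)\big)}{\beta + 1 - C\big(\delta_{C_v}(1-\gamma\nu)\big)}.$$
If $C\big(\delta_{C_v}(1-\gamma\nu)\big) \le \beta$, we have
$$i_v(t+1) \ge \big[C\big(\delta_{C_v}(1-\gamma\nu)\big) - \beta\big]\mu_v + 1 - C\big(\delta_{C_v}(1-\gamma\nu)\big) = \big[\beta - C\big(\delta_{C_v}(1-\gamma\nu)\big)\big](1 - \mu_v) + 1 - \beta.$$
Hence, $\underline{\lim}_{t\to\infty}\, i_v(t) \ge i_v^-$.

For the upper bound, by applying Lemma 1 to Eq. (4) we have
$$\begin{aligned} i_v(t+1) &\le (1-\beta)\, i_v(t) + \big[1 - \max\{(2 - \gamma\deg(v) - \alpha) - 1,\ 0\}\big]\big(1 - i_v(t)\big) \\ &= (1-\beta)\, i_v(t) + \min\{\gamma\deg(v) + \alpha,\ 1\}\big(1 - i_v(t)\big) \\ &\le \max\{1-\beta,\ \min\{\gamma\deg(v) + \alpha,\ 1\}\} = \mu_v. \qquad (15) \end{aligned}$$
Replacing $i_{v,j}$ with $\mu_{v,j}$ in Eq. (4) yields
$$\begin{aligned} i_v(t+1) &\le (1-\beta)\, i_v(t) + \big[1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)\big]\big(1 - i_v(t)\big) \\ &= \big[C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) - \beta\big]\, i_v(t) + 1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big). \end{aligned}$$
If $C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) > \beta$, then
$$\overline{\lim}_{t\to\infty}\, i_v(t+1) \le \frac{1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)}{\beta + 1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)}.$$
If $C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) \le \beta$, then we have
$$\begin{aligned} \overline{\lim}_{t\to\infty}\, i_v(t+1) &\le \overline{\lim}_{t\to\infty} \Big\{ \big[C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) - \beta\big]\, i_v(t) + 1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) \Big\} \\ &\le \big[C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) - \beta\big]\, \underline{\lim}_{t\to\infty}\, i_v(t) + 1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) \\ &\le \big[C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) - \beta\big]\, i_v^- + 1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big). \end{aligned}$$
Hence, we have $\overline{\lim}_{t\to\infty}\, i_v(t+1) \le i_v^+$.

When are the bounds tight? It is important to know when the bounds are tight because the defender can use the upper bound $i_v^+$ for decision-making, especially when the spreading never enters any equilibrium. Note that when $\gamma \ll 1$, it holds that
$$C\big(\delta_{C_v}(1-\gamma\nu)\big) \approx C(1) = 1 - \alpha \quad \text{and} \quad C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big) \approx C\big(C_v(1, \ldots, 1)\big) = 1 - \alpha.$$
Therefore, in the case $\gamma \ll 1$ and $\alpha + \beta < 1$, we have
$$i_v^- = \frac{1 - C\big(\delta_{C_v}(1-\gamma\nu)\big)}{\beta + 1 - C\big(\delta_{C_v}(1-\gamma\nu)\big)} \approx \frac{\alpha}{\beta+\alpha}, \qquad i_v^+ = \frac{1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)}{\beta + 1 - C\big(C_v(1-\gamma\mu_{v,1}, \ldots, 1-\gamma\mu_{v,\deg(v)})\big)} \approx \frac{\alpha}{\beta+\alpha}.$$
This means that the bounds are tight when the attack-power is not strong. In the case $\gamma \deg(v) \ll 1$ and $\alpha + \beta \ge 1$, we can similarly have
$$i_v^- \approx \alpha(2 - \alpha - \beta), \qquad i_v^+ \approx (\beta + \alpha - 1)\big[1 - \alpha(2 - \alpha - \beta)\big] + 1 - \beta.$$
Therefore, the difference between the upper bound and the lower bound is $i_v^+ - i_v^- \approx \alpha(\alpha + \beta - 1)^2$, and the bounds are tight when $\alpha + \beta$ is not far from 1 or $\alpha$ is close to zero.

Are the equilibrium bounds always tighter than the non-equilibrium bounds? We observe the following: under the condition $\gamma \deg(v) \ll 1$, we have $i^- \approx i_v^+ \approx \alpha/(\alpha+\beta)$ for the equilibrium bounds; under the condition $\gamma \deg(v) \ll 1$ and the additional condition $\alpha + \beta < 1$, we have $i_v^- \approx i_v^+ \approx \alpha/(\alpha+\beta)$ for the non-equilibrium bounds. This means that the equilibrium bounds are tight under a weaker condition, and are therefore more widely applicable, than the non-equilibrium bounds; in this sense the equilibrium bounds are tighter than the non-equilibrium bounds.

4 Side-Effects of Assuming Away the Dependences

In the above we have characterized epidemic equilibrium thresholds, equilibrium infection probabilities, and non-equilibrium infection probabilities while accommodating arbitrary dependences. In order to characterize the side-effects of assuming away the dependences, we consider the degree of dependences as captured by the concordance order between copulas (reviewed in Section 2). In order to draw cyber security insights at a higher level of abstraction, we also consider three kinds of qualitative dependences: positive dependence, independence and negative dependence, whose degrees of dependence are in decreasing order.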
Specifically, positive (negative) dependence between the push-based attacks means
$$1 - C_v\big(1-\gamma i_{v,1}, \ldots, 1-\gamma i_{v,\deg(v)}\big) \ge (\le)\ 1 - \prod_{j=1}^{\deg(v)} (1 - \gamma i_{v,j}),$$
and positive (negative) dependence between the push-based attacks and the pull-based attacks means
$$1 - C\big(C_v(1-\gamma i_{v,1}(t), \ldots, 1-\gamma i_{v,\deg(v)}(t))\big) \ge (\le)\ 1 - (1-\alpha)\, C_v\big(1-\gamma i_{v,1}(t), \ldots, 1-\gamma i_{v,\deg(v)}(t)\big),$$
where equality means independence. To simplify the notation, let pd stand for positive dependence, ind stand for independence, and nd stand for negative dependence. Let $x \in \{\mathrm{pd}, \mathrm{ind}, \mathrm{nd}\}$ denote the dependence structure between the push-based attacks and the pull-based attacks, as captured by copula $C$. Let $y \in \{\mathrm{pd}, \mathrm{ind}, \mathrm{nd}\}$ denote the dependence structure between the push-based attacks, as captured by copula $C_v$. Therefore, the dependence structures can be represented by a pair $(x, y)$.

4.1 Side-Effects on Equilibrium Infection Probabilities and Thresholds

For fixed $G = (V, E)$ and $(\alpha, \beta, \gamma)$, we compare the effects of two groups of dependences (i.e., copulas) $\{C, C_v, v \in V\}$ and $\{C', C'_v, v \in V\}$. Corresponding to the two groups of copulas, we denote by $i_v(t)$ and $i'_v(t)$ the respective infection probabilities of node $v \in V$ at time $t \ge 0$. Let $i_{v,x,y}$ denote the equilibrium infection probability of node $v$, namely $i_v$, under dependence structure $(x, y)$.

Side-effects on the equilibrium infection probabilities. We present a result about the impact of the dependence structures on the equilibrium infection probabilities. This result will allow us to derive the side-effects of assuming away the dependences.

Proposition 6 (comparison between the effects of different dependence structures on equilibrium infection probabilities) Suppose the condition underlying Lemma 3 holds, namely $\rho(A) < \frac{(\beta+\alpha)^2}{\gamma\beta}$, so that system (4) has a unique equilibrium. If for all $v \in V$, we have
$$C\big(C_v(u_1, \ldots, u_{\deg(v)}), u_0\big) \le C'\big(C'_v(u_1, \ldots, u_{\deg(v)}), u_0\big), \qquad (16)$$
where $0 \le u_j \le 1$ for $j = 0, \ldots, \deg(v)$, then we have $i \ge i'$.

Proof Note that $i$ and $i'$ are respectively the unique positive solutions of $f(i) = 0$ and $g(i) = 0$, where $f = (f_1, \ldots, f_N)$ and $g = (g_1, \ldots, g_N)$ with
$$f_v(i) = \big[1 - C\big(C_v(1-\gamma i_{v,1}, \ldots, 1-\gamma i_{v,\deg(v)})\big)\big](1 - i_v) - \beta i_v, \quad v \in V,$$
$$g_v(i) = \big[1 - C'\big(C'_v(1-\gamma i_{v,1}, \ldots, 1-\gamma i_{v,\deg(v)})\big)\big](1 - i_v) - \beta i_v, \quad v \in V.$$
Since $f(0) = g(0) = \alpha > 0$ and $f \ge g$, we have $g(i) \le f(i) = 0$. Since both $f$ and $g$ are continuous, we have $i \ge i'$.

The cyber security insight/implication of Proposition 6 is: The stronger the negative (positive) dependences between the attack events, the lower (higher) the equilibrium infection probabilities. More specifically, we have
$i_{v,\mathrm{pd},y} \ge i_{v,\mathrm{ind},y} \ge i_{v,\mathrm{nd},y}$ for any $y \in \{\mathrm{pd}, \mathrm{ind}, \mathrm{nd}\}$ and
$i_{v,x,\mathrm{pd}} \ge i_{v,x,\mathrm{ind}} \ge i_{v,x,\mathrm{nd}}$ for any $x \in \{\mathrm{pd}, \mathrm{ind}, \mathrm{nd}\}$.
Therefore, the side-effects of assuming away the dependences between attack events are: If the positive (negative) dependence is assumed away, the resulting equilibrium infection probability underestimates (overestimates) the actual equilibrium infection probability.
This means the following: when the positive dependence between attack events is assumed away, the cyber defense decisions based on $i_{v,\mathrm{ind},\mathrm{ind}} < i_{v,\mathrm{pd},\mathrm{pd}}$ can render the deployed defense useless; when the negative dependence between attack events is assumed away, the cyber defense decisions based on $i_{v,\mathrm{ind},\mathrm{ind}} > i_{v,\mathrm{nd},\mathrm{nd}}$ can waste defense resources. We will use numerical examples below to confirm these insights. Another important insight is: if the defender can impose negative dependence on the cyber attacks, the cyber defense effect is better off. We believe that this insight will shed light on research into future cyber defense mechanisms, and highlights the value of theoretical studies in terms of their practical guidance.

Side-effects on the epidemic equilibrium threshold. Corollary 1 gives a sufficient condition under which the epidemic spreading enters the equilibrium. Here we define
$$\tau \stackrel{\mathrm{def}}{=} \min\left\{ \frac{1 - \max_{v \in V} \left|1 - \beta/(1 - i_v)\right|}{\gamma},\ \frac{(\beta+\alpha)^2}{\gamma\beta} \right\}, \qquad (17)$$
with respect to a group of copulas $\{C, C_v, v \in V\}$. According to Eq. (8), $\rho(A) \le \tau$ means that the epidemic spreading converges to the equilibrium. Similarly, we can define $\tau'$ with respect to another group of copulas $\{C', C'_v, v \in V\}$. We want to compare $\tau$ and $\tau'$ with respect to the relation between $\{C, C_v, v \in V\}$ and $\{C', C'_v, v \in V\}$.

Proposition 7 Under the conditions of Proposition 6, namely $\rho(A) < \frac{(\beta+\alpha)^2}{\gamma\beta}$ so that system (4) has a unique equilibrium and $C\big(C_v(u_1, \ldots, u_{\deg(v)}), u_0\big) \le C'\big(C'_v(u_1, \ldots, u_{\deg(v)}), u_0\big)$ for all $v \in V$, we have
(i) if $1 - \beta \le i^-$, then $\tau \le \tau'$;
(ii) if $1 - \beta \ge i^+$, then $\tau \ge \tau'$,
where $i^+ \stackrel{\mathrm{def}}{=} \max_{v \in V} i_v^+$, and $i^-$ and $i_v^+$ are defined in Proposition 1.

Proof According to Proposition 1, we know that $i^- \le i_v \le i^+$, which implies $\frac{\beta}{1-i^-} \le \frac{\beta}{1-i_v} \le \frac{\beta}{1-i^+}$. According to Eq. (17), $\tau$ is decreasing in $\max_{v \in V} \left|1 - \frac{\beta}{1-i_v}\right|$. Therefore, $\tau$ is decreasing in $i_v$ when $1 - \beta \le i^-$, and increasing in $i_v$ when $1 - \beta \ge i^+$. By Proposition 6, we get the desired results.

In order to draw insights while simplifying the discussion, let $\tau_{x,y}$ denote the $\tau$ as defined in Eq. (17) with respect to dependence structures $(x, y)$. The cyber security implications of Proposition 7 are as follows.

First, under some circumstances, the stronger the dependences between the cyber attacks, the more restrictive the epidemic equilibrium threshold. More specifically, under the condition $1 - \beta \le i^-$, we have for all $v \in V$: $\tau_{\mathrm{nd},y} \ge \tau_{\mathrm{ind},y} \ge \tau_{\mathrm{pd},y}$ and $\tau_{x,\mathrm{nd}} \ge \tau_{x,\mathrm{ind}} \ge \tau_{x,\mathrm{pd}}$. This means that under the above circumstances, assuming away the positive dependences between the attacks will lead to an incorrect epidemic equilibrium threshold, and assuming away the negative dependences between the attacks will make the epidemic equilibrium threshold unnecessarily restrictive. This further highlights the value for the defender to render the dependences negative, provided that $1 - \beta \le i^-$.

Second, under certain other circumstances, the stronger the dependences, the less restrictive the epidemic equilibrium threshold. More specifically, under the condition $1 - \beta \ge i^+$, we have $\tau_{\mathrm{nd},y} \le \tau_{\mathrm{ind},y} \le \tau_{\mathrm{pd},y}$ and $\tau_{x,\mathrm{nd}} \le \tau_{x,\mathrm{ind}} \le \tau_{x,\mathrm{pd}}$. This means that assuming away the negative dependences between the attacks will lead to an incorrect epidemic equilibrium threshold, and assuming away the positive dependences will make the epidemic equilibrium threshold unnecessarily restrictive.
Moreover, while rendering the dependences negative can lead to smaller equilibrium infection probabilities, it imposes a very restrictive epidemic equilibrium threshold when $1 - \beta \ge i^+$. This means that when applying the above insights to guide practice, the defender must be aware of the parameter regions corresponding to the cyber security posture.

Numerical examples. In order to illustrate the above analytic results, we consider the example of a star network with $N = 11$ nodes. We assume that the dependence between the push-based and the pull-based attacks can be captured by the Gaussian copula $C$ with parameter $\sigma$ and the dependence between the push-based attacks launched from the leaves against the hub can be captured by copula $C_h$, which is the Clayton copula with parameter $\theta$. These two copulas are reviewed in Section 2. We consider two sets of parameters $(\alpha, \beta, \gamma) = (0.2, 0.5, 0.05)$ and $(\alpha, \beta, \gamma) = (0.4, 0.7, 0.05)$. From Eqs. (11) and (12), we can compute the equilibrium infection probabilities $i_h$ for the hub and $i_l$ for the leaves, and the threshold $\tau$ as defined in (17). Note that the copulas are increasing in their parameters in the concordance order. By Proposition 6, both $i_h$ and $i_l$ are decreasing in $\theta$ ($\sigma$) given $\sigma$ ($\theta$), as confirmed by Tables 1-2. Note that for star networks, the condition $1 - \beta \ge i^+$ in Proposition 7 can be relaxed as $1 - \beta \ge i_h^+$, where $i_h^+$ is defined in Proposition 3. When $(\alpha, \beta, \gamma) = (0.2, 0.5, 0.05)$, it is easy to verify $1 - \beta \ge i_h^+$, meaning that $\tau$ is decreasing in $\theta$ ($\sigma$) for fixed $\sigma$ ($\theta$). This is confirmed in Table 1. When $(\alpha, \beta, \gamma) = (0.4, 0.7, 0.05)$, the condition $1 - \beta \le i^-$ in Proposition 7 is satisfied, meaning that $\tau$ is increasing in $\theta$ ($\sigma$) for fixed $\sigma$ ($\theta$). This is confirmed in Table 2. These examples also confirm the conclusion $i_h \ge i_l$ given by Proposition 2.

         σ = 0.5 (nd)         σ = 0 (ind)          σ = -0.5 (pd)
θ      i_h   i_l   τ        i_h   i_l   τ        i_h   i_l   τ
1.0    .35   .29   14.11    .38   .30   14.31    .40   .31   14.40
1.5    .35   .29   14.11    .38   .30   14.30    .40   .31   14.39
2.0    .35   .29   14.11    .38   .30   14.30    .39   .31   14.39
2.5    .34   .29   14.11    .38   .30   14.30    .39   .31   14.39
3.0    .34   .29   14.11    .37   .30   14.30    .39   .30   14.39
3.5    .34   .29   14.11    .37   .30   14.30    .39   .30   14.38
4.0    .34   .29   14.11    .37   .30   14.30    .39   .30   14.38
4.5    .34   .29   14.11    .37   .30   14.29    .38   .30   14.38
5.0    .34   .29   14.11    .37   .30   14.29    .38   .30   14.38
5.5    .34   .29   14.11    .37   .30   14.29    .38   .30   14.38
6.0    .33   .29   14.11    .36   .30   14.29    .38   .30   14.38

Table 1: (α, β, γ) = (0.2, 0.5, 0.05)

         σ = 0.5 (nd)         σ = 0 (ind)          σ = -0.5 (pd)
θ      i_h   i_l   τ        i_h   i_l   τ        i_h   i_l   τ
1.0    .39   .37   17.11    .41   .37   16.09    .44   .38   15.20
1.5    .39   .37   17.15    .41   .37   16.16    .43   .38   15.29
2.0    .39   .37   17.18    .41   .37   16.21    .43   .38   15.36
2.5    .39   .37   17.21    .41   .37   16.26    .43   .38   15.43
3.0    .38   .37   17.24    .41   .37   16.31    .43   .38   15.50
3.5    .38   .37   17.27    .41   .37   16.35    .43   .38   15.56
4.0    .38   .37   17.30    .41   .37   16.39    .43   .38   15.62
4.5    .38   .37   17.31    .41   .37   16.43    .42   .38   15.67
5.0    .38   .37   17.33    .41   .37   16.47    .42   .38   15.72
5.5    .38   .37   17.35    .40   .37   16.50    .42   .38   15.77
6.0    .38   .37   17.37    .40   .37   16.53    .42   .38   15.81

Table 2: (α, β, γ) = (0.4, 0.7, 0.05)
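As a rough cross-check of Table 1, the independent-pull column ($\sigma = 0$) can be reproduced with a few lines of Python. This is a sketch under explicit assumptions, not the authors' code (they report using R): it assumes that $C(\cdot)$ in Eqs. (11) and (12) stands for the bivariate copula evaluated with $1-\alpha$ as its other argument, so that for $\sigma = 0$ it reduces to $(1-\alpha)u$; it takes $C_h$ to be an $(N-1)$-dimensional Clayton copula; and it uses plain fixed-point iteration. All function names are hypothetical.

```python
def clayton_diagonal(u, n, theta):
    """Diagonal section C(u, ..., u) of an n-dimensional Clayton copula."""
    return (n * u ** (-theta) - n + 1) ** (-1.0 / theta)


def star_equilibrium_independent_pull(alpha, beta, gamma, n_nodes, theta, n_iter=500):
    """Fixed-point iteration on Eqs. (11)-(12), assuming C(u) = (1 - alpha) * u."""
    n_leaves = n_nodes - 1
    i_h = i_l = 0.5
    for _ in range(n_iter):
        c_hub = (1 - alpha) * clayton_diagonal(1 - gamma * i_l, n_leaves, theta)
        i_h = (1 - c_hub) / (beta + 1 - c_hub)
        c_leaf = (1 - alpha) * (1 - gamma * i_h)
        i_l = (1 - c_leaf) / (beta + 1 - c_leaf)
    return i_h, i_l


def threshold_tau(i_eq, alpha, beta, gamma):
    """Threshold of Eq. (17) for a collection of equilibrium probabilities."""
    worst = max(abs(1 - beta / (1 - i)) for i in i_eq)
    return min((1 - worst) / gamma, (beta + alpha) ** 2 / (gamma * beta))


i_h, i_l = star_equilibrium_independent_pull(0.2, 0.5, 0.05, n_nodes=11, theta=1.0)
tau = threshold_tau([i_h, i_l], 0.2, 0.5, 0.05)
print(round(i_h, 2), round(i_l, 2), round(tau, 2))
```

Under these assumptions the result should land close to the $\theta = 1.0$, $\sigma = 0$ entry of Table 1 ($i_h \approx .38$, $i_l \approx .30$, $\tau \approx 14.3$); small differences in the last digit of $\tau$ are expected from solver tolerance and rounding.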

4.2 Side-Effects on the Non-Equilibrium Infection Probabilities

We now investigate the side-effects on the non-equilibrium infection probabilities $i(t) = (i_1(t), \ldots, i_N(t))$, no matter whether the epidemic spreading converges to equilibrium or not.

Proposition 8 (side-effects on the non-equilibrium infection probabilities) Consider two vectors of infection probabilities $i(t_0) \ge i'(t_0)$ at some time $t_0 \ge 0$. Let $\mu = \max_{v \in V} \mu_v = \max\{1-\beta, \min\{\alpha + \gamma\, \mathrm{Deg}, 1\}\}$, where $\mathrm{Deg} = \max_{v \in V} \deg(v)$. If condition (16) holds and
$$\min_{v \in V} \big\{ C\big(\delta_{C_v}(1 - \gamma\mu)\big) \big\} \ge \beta, \qquad (18)$$
then $i(t) \ge i'(t)$ for all $t \ge t_0$.

Proof We need to show that $i(t+1) \ge i'(t+1)$ when $i(t) \ge i'(t)$ is given. Note that
$$i_v(t+1) = \big[C\big(C_v(1-\gamma i_{v,1}(t), \ldots, 1-\gamma i_{v,\deg(v)}(t))\big) - \beta\big]\big(i_v(t) - 1\big) + 1 - \beta,$$
$$i'_v(t+1) = \big[C'\big(C'_v(1-\gamma i'_{v,1}(t), \ldots, 1-\gamma i'_{v,\deg(v)}(t))\big) - \beta\big]\big(i'_v(t) - 1\big) + 1 - \beta.$$
According to Ineq. (15) in the proof of Proposition 5, we have $i_v(t) \le \mu$ for all $v \in V$. Then, conditions (16) and (18) imply
$$C\big(C_v(1-\gamma i_{v,1}(t), \ldots, 1-\gamma i_{v,\deg(v)}(t))\big) - \beta \ge 0, \qquad C'\big(C'_v(1-\gamma i_{v,1}(t), \ldots, 1-\gamma i_{v,\deg(v)}(t))\big) - \beta \ge 0.$$
Since $i(t) \ge i'(t)$, we have
$$\begin{aligned} i_v(t+1) &\ge \big[C\big(C_v(1-\gamma i_{v,1}(t), \ldots, 1-\gamma i_{v,\deg(v)}(t))\big) - \beta\big]\big(i'_v(t) - 1\big) + 1 - \beta \\ &\ge \big[C'\big(C'_v(1-\gamma i_{v,1}(t), \ldots, 1-\gamma i_{v,\deg(v)}(t))\big) - \beta\big]\big(i'_v(t) - 1\big) + 1 - \beta \\ &\ge \big[C'\big(C'_v(1-\gamma i'_{v,1}(t), \ldots, 1-\gamma i'_{v,\deg(v)}(t))\big) - \beta\big]\big(i'_v(t) - 1\big) + 1 - \beta = i'_v(t+1). \end{aligned}$$
Since the above holds for all $v \in V$, we obtain the desired result.

          t = 6               t = 7               t = 8
v       i_v(t)  i'_v(t)     i_v(t)  i'_v(t)     i_v(t)  i'_v(t)
1       0.61    0.60        0.42    0.42        0.57    0.56
2       0.64    0.63        0.39    0.40        0.60    0.58
3       0.56    0.57        0.46    0.45        0.54    0.54
4       0.57    0.57        0.45    0.45        0.55    0.54
5       0.46    0.47        0.54    0.53        0.47    0.48
6       0.60    0.60        0.42    0.42        0.56    0.56

Figure 5: Clayton copulas with $(\theta, \eta) = (1, 1.5)$, $(\theta', \eta') = (10, 15)$, $(\alpha, \beta, \gamma) = (0.9, 0.9, 0.8)$.

One may wonder if a more succinct result than Proposition 8 could be obtained by, for example, eliminating condition (18). Here we use an example to show that if we eliminate condition (18), then Proposition 8 may not hold. Specifically, consider the network with six nodes illustrated in Figure 5.
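Suppose $C$ and the $C_v$'s for $v \in V$ are Clayton copulas with positive parameters $\theta$ and $\eta$, namely

\[ C(u_1, u_2) = \left[ u_1^{-\theta} + u_2^{-\theta} - 1 \right]^{-1/\theta} \]

and

\[ C_v(u_1, \ldots, u_{\deg(v)}) = \left[ \sum_{i=1}^{\deg(v)} u_i^{-\eta} - \deg(v) + 1 \right]^{-1/\eta}. \]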
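The Clayton copulas above, the one-step update used in the proof of Proposition 8, and a check of condition (18) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`clayton`, `step`, `condition_18`) are ours; the update is taken to be $i_v(t+1) = [C(1-\alpha, C_v(\cdot)) - \beta](i_v(t) - 1) + 1 - \beta$; $\delta_{C_v}(u)$ is assumed to denote the diagonal $C_v(u, \ldots, u)$; and the 6-node ring `ring` is a hypothetical topology, since the edge set of the network in Figure 5 is not given in the text. The numbers it prints are therefore not expected to reproduce the table in Figure 5.

```python
from typing import List, Sequence


def clayton(us: Sequence[float], param: float) -> float:
    """Clayton copula [sum_i u_i^(-param) - n + 1]^(-1/param), for param > 0."""
    if min(us) <= 0.0:
        return 0.0  # a Clayton copula vanishes when any argument is 0
    total = sum(u ** (-param) for u in us) - len(us) + 1
    return total ** (-1.0 / param)


def step(i: Sequence[float], adj: List[List[int]], alpha: float, beta: float,
         gamma: float, theta: float, eta: float) -> List[float]:
    """One update i(t) -> i(t+1); adj[v] lists the neighbors of node v."""
    nxt = []
    for v, nbrs in enumerate(adj):
        # C_v(1 - gamma*i_{v,1}(t), ..., 1 - gamma*i_{v,deg(v)}(t))
        push = clayton([1 - gamma * i[u] for u in nbrs], eta)
        # C(1 - alpha, C_v(...)): probability that no attack on v succeeds
        c = clayton([1 - alpha, push], theta)
        nxt.append((c - beta) * (i[v] - 1) + 1 - beta)
    return nxt


def condition_18(adj: List[List[int]], alpha: float, beta: float,
                 gamma: float, theta: float, eta: float) -> bool:
    """min_v C(1 - alpha, delta_{C_v}(1 - gamma*mu)) >= beta, with delta
    taken to be the diagonal C_v(u, ..., u) (an assumption of this sketch)."""
    deg_max = max(len(nbrs) for nbrs in adj)
    mu = max(1 - beta, min(alpha + gamma * deg_max, 1.0))
    worst = min(
        clayton([1 - alpha, clayton([1 - gamma * mu] * len(nbrs), eta)], theta)
        for nbrs in adj
    )
    return worst >= beta


# Hypothetical 6-node ring; NOT the (unspecified) network of Figure 5.
ring = [[(v - 1) % 6, (v + 1) % 6] for v in range(6)]
i0 = [0.2, 0.1, 0.3, 0.3, 0.6, 0.2]

i, i_prime = i0[:], i0[:]
for t in range(8):
    i = step(i, ring, 0.9, 0.9, 0.8, theta=1.0, eta=1.5)
    i_prime = step(i_prime, ring, 0.9, 0.9, 0.8, theta=10.0, eta=15.0)

print("condition (18) holds:", condition_18(ring, 0.9, 0.9, 0.8, 1.0, 1.5))
print("i(8) - i'(8):", [round(a - b, 3) for a, b in zip(i, i_prime)])
```

With (α, β, γ) = (0.9, 0.9, 0.8), condition (18) fails on any topology under these assumptions, because a copula never exceeds its smallest argument, so $C(1-\alpha, \cdot) \le 1 - \alpha = 0.1 < \beta$. Mixed signs in the printed differences would indicate that the componentwise ordering is lost on the chosen graph.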

Consider two groups of Clayton copulas with parameters $(\theta, \eta) = (1, 1.5)$ and $(\theta', \eta') = (10, 15)$, respectively. Denote the corresponding infection probabilities by $i_v(t)$ and $i'_v(t)$, respectively. Set $(\alpha, \beta, \gamma) = (0.9, 0.9, 0.8)$; in this case condition (18) is not satisfied. Set the initial infection probabilities as $i(0) = i'(0) = (0.2, 0.1, 0.3, 0.3, 0.6, 0.2)$. The table in Figure 5 shows $i(t)$ and $i'(t)$ for $t = 6, 7, 8$, from which we observe that $i(t) \ge i'(t)$ does not hold; for example, $i_3(6) = 0.56 < 0.57 = i'_3(6)$. This means that we cannot eliminate condition (18) in Proposition 8.

5 Conclusions

We have presented the first systematic investigation of cyber epidemic models with dependences. We have derived epidemic equilibrium thresholds, bounds for equilibrium infection probabilities, and bounds for non-equilibrium infection probabilities, while accommodating arbitrary dependences between the push-based attacks and the pull-based attacks as well as the dependences between the push-based attacks. In particular, we showed that assuming away the due dependences can render the results thereof unnecessarily restrictive or even incorrect.

Our study brings up a range of interesting research problems for further work. First, our characterization study assumes that the dependence (or copula) structures are given. It is important to know which dependence structures are more relevant than others in practice. Second, it would be ideal to obtain closed-form results on the equilibrium infection probabilities and the non-equilibrium infection probabilities. Third, if we cannot derive closed-form results for the non-equilibrium infection probabilities, it is important to seek bounds for these probabilities and systematically analyze their tightness.

Acknowledgement. This work was supported in part by ARO Grants # W911NF-12-1-0286 and # W911NF-13-1-0141, and by AFOSR Grant # FA9550-09-1-0165.