Inference on Phase-type Models via MCMC with application to networks of repairable redundant systems
Louis J. M. Aslett and Simon P. Wilson
Trinity College Dublin
28th June 2012
Toy Example: Redundant Repairable Components

State | Meaning
1 | both C1 and C2 work
2 | C1 failed, C2 working
3 | C1 working, C2 failed
4 | system failed

a general stochastic process, e.g. Y(t)
[figure: a sample path of Y(t) moving through states 1–3 until it hits state 4, system failed]
Continuous-time Markov Chain Model

State | Meaning
1 | both C1 and C2 work
2 | C1 failed, C2 working
3 | C1 working, C2 failed
4 | system failed

[diagram: transitions between the four states, with component failure rate λ_f and repair rate λ_r]

⟹ π = (1, 0, 0, 0),   T =
( −2λ_f        λ_f            λ_f          0   )
(  λ_r     −(λ_r + λ_f)        0          λ_f  )
(  λ_r          0         −(λ_r + λ_f)    λ_f  )
(   0           0              0           0   )
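The generator above is easy to work with numerically. A minimal sketch in Python/NumPy, assuming the state ordering on this slide (`toy_intensity` and `simulate_failure_time` are illustrative names, not from the talk), which builds T and draws one system failure time by jump-chain simulation:

```python
import numpy as np

def toy_intensity(lam_f, lam_r):
    """4-state intensity matrix: states are (1) both up, (2) C1 down,
    (3) C2 down, (4) system failed (absorbing), indexed from 0."""
    return np.array([
        [-2 * lam_f,            lam_f,            lam_f,   0.0],
        [     lam_r, -(lam_r + lam_f),              0.0, lam_f],
        [     lam_r,              0.0, -(lam_r + lam_f), lam_f],
        [       0.0,              0.0,              0.0,   0.0],
    ])

def simulate_failure_time(T, start=0, absorbing=3, rng=None):
    """Jump-chain simulation: hold an Exp(-T_ii) time in each state,
    then jump proportionally to the off-diagonal rates, until absorbed."""
    rng = np.random.default_rng() if rng is None else rng
    state, t = start, 0.0
    while state != absorbing:
        rate = -T[state, state]
        t += rng.exponential(1.0 / rate)
        state = rng.choice(len(T), p=T[state].clip(min=0.0) / rate)
    return t
```

e.g. `simulate_failure_time(toy_intensity(1.8, 9.5))` draws one system failure time from the toy model.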
Inferential Setting

Cano & Rios (2006) provide conjugate posterior calculations in the context of analysing repairable systems when the stochastic process leading to absorption is fully observed.

Data: for each system failure time, one has:
- starting state
- length of time in each state
- number of transitions between each pair of states
- ultimate system failure time

Reduced information scenario ⟹ Bladt et al. (2003) provide a Bayesian MCMC algorithm, or Asmussen et al. (1996) provide a frequentist EM algorithm.
Definition of Phase-type Distributions

An absorbing continuous-time Markov chain is one in which there is a state that, once entered, is never left. That is, the (n + 1)-state intensity matrix can be written:

T = ( S  s )
    ( 0  0 )

where S is n × n, s is n × 1 and 0 is 1 × n, with s = −Se.

Then, a Phase-type distribution (PHT) is defined to be the distribution of the time to entering the absorbing state:

Y ~ PHT(π, S)  ⟹  F_Y(y) = 1 − π^T exp{yS} e,   f_Y(y) = π^T exp{yS} s
Relating to the Toy Example

State | Meaning
1 | both C1 and C2 work
2 | C1 failed, C2 working
3 | C1 working, C2 failed
4 | system failed

⟹  S = ( −2λ_f        λ_f            λ_f       )        s = ( 0, λ_f, λ_f )^T
        (  λ_r     −(λ_r + λ_f)        0        )
        (  λ_r          0         −(λ_r + λ_f)  )

f_Y(y) = π^T exp{yS} s,   F_Y(y) = 1 − π^T exp{yS} e
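With S and s in hand, the density and distribution function on this slide reduce to matrix exponentials. A hedged sketch using SciPy's `expm` (the function names `pht_pdf`/`pht_cdf` are mine):

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

def pht_pdf(y, pi, S):
    """f_Y(y) = pi' exp(yS) s, with exit-rate vector s = -S·1."""
    s = -S.sum(axis=1)
    return float(pi @ expm(y * S) @ s)

def pht_cdf(y, pi, S):
    """F_Y(y) = 1 - pi' exp(yS) e, where e is a vector of ones."""
    return float(1.0 - pi @ expm(y * S) @ np.ones(len(S)))
```

With the toy S (λ_f = 1.8, λ_r = 9.5) and π = (1, 0, 0), `pht_cdf(y, pi, S)` gives the probability the system has failed by time y.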
Bladt et al: Gibbs Sampling from Posterior

Strategy is a Gibbs MCMC algorithm which achieves the goal of simulating from p(π, S | y) by sampling from p(π, S, paths | y) through the iterative process:

p(π, S | paths, y)
p(paths | π, S, y)
Bladt et al: Metropolis-Hastings Simulation of Process

In summary:
- Can simulate the chain from p(path | Y_i ≥ y_i) trivially by rejection sampling.
- A Metropolis-Hastings acceptance ratio (a ratio of exit rates) exists such that truncating the chain at time y_i (at which point it absorbs) yields a draw from p(path | Y_i = y_i).

Metropolis-Hastings:  p(path | π, S, Y = y)
Rejection Sampling:   p(path | π, S, Y ≥ y)
CTMC Sampling:        p(path | π, S)
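The rejection-sampling layer above can be sketched directly: simulate full paths from (π, S) and keep the first whose absorption time is at least y_i. A minimal illustration (the names `sample_path` and `rejection_sample_path` are mine, not from the paper):

```python
import numpy as np

def sample_path(pi, S, rng):
    """Simulate one path of the transient chain (pi, S) until absorption.
    Returns the visited states, their entry times, and the absorption time."""
    n = len(S)
    exit_rates = -S.sum(axis=1)          # rates into the absorbing state
    state = rng.choice(n, p=pi)
    states, times, t = [state], [0.0], 0.0
    while True:
        rate = -S[state, state]
        t += rng.exponential(1.0 / rate)
        # jump distribution: transient destinations plus absorption
        probs = np.append(S[state].clip(min=0.0), exit_rates[state]) / rate
        nxt = rng.choice(n + 1, p=probs)
        if nxt == n:                     # absorbed, so Y = t
            return states, times, t
        state = nxt
        states.append(state)
        times.append(t)

def rejection_sample_path(pi, S, y, rng=None, max_tries=100_000):
    """Draw from p(path | pi, S, Y >= y): resimulate until the absorption
    time is at least y.  This is exactly the step that stalls when
    P(Y >= y) is tiny."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        states, times, absorb_t = sample_path(pi, S, rng)
        if absorb_t >= y:
            return states, times, absorb_t
    raise RuntimeError("rejection sampling failed to accept a path")
```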
Motivation for Modifications

1. Certain state transitions make no physical sense (e.g. 2 → 3 in the earlier example).
2. When part of a larger system, it is highly likely there will be censored observations.
3. Where there is no reason to believe parameters differ in distribution, they should (in an idealised modelling sense) be constrained to be equal. This is as much to assist with reducing parameter dimensionality.
4. Computation time!
Toy Example Results

100 uncensored observations simulated from PHT with

S = ( −3.6    1.8     1.8  )
    (  9.5   −11.3     0   )        ⟹ λ_f = 1.8, λ_r = 9.5
    (  9.5     0     −11.3 )

[figure: posterior densities of S_{12}, S_{13}, s_2 and s_3, each concentrated around the true λ_f = 1.8]
Toy Example Results

[figure: posterior densities of S_{21} and S_{31}, concentrated around the true λ_r = 9.5]

Reliability less sensitive to λ_r (Daneshkhah & Bedford 2008)
Toy Example Results 2

[figure: posterior densities of s_1, S_{23} and S_{32}, concentrated near the true value 0]
The Big Issue

Intractable computation time for many applications!

1. Longer chains, and MCMC jumps to states for which observations are far in the tails, can stall the rejection sampling step of the MH algorithm. For example, when P(Y_i ≥ y_i | π, S) = 10⁻⁶:
   E(RS iterations) = 10⁶,  95% CI = [25317, 3688877]

2. From states from which absorption is impossible, it is wasteful to resample the whole chain just because the state at time y_i is unsuitable for truncation.
   [diagram: a simulated path whose state at time y_i makes truncation invalid]

   State | Meaning | P(state)
   1 | both PSs working | 0.9986
   2 | PS1 failed, PS2 working | 0.0007
   3 | PS1 working, PS2 failed | 0.0007

   ⟹ E(MH iterations) = 1429,  95% CI = [36, 5267]

3. The time for the MH algorithm to reach stationarity can grow rapidly.
   [figure: total variation distance against iterations, still decaying after several hundred MH iterations]
Solution: Exact Conditional Sampling

Metropolis-Hastings:  p(path | π, S, Y = y)
Rejection Sampling:   p(path | π, S, Y ≥ y)
CTMC Sampling:        p(path | π, S)
Solution: Exact Conditional Sampling

e.g. CTMC sampling of p(path | π, S):
i) Starting state ~ discrete: Y(0) ~ π
ii) Advance time ~ Exponential: δ ~ Exp(−S_{Y(t),Y(t)})
iii) Select next state ~ discrete: P(Y(t + δ) = j) = S_{Y(t),j} / (−S_{Y(t),Y(t)})
iv) If not absorbed, set t = t + δ and loop to ii)
Solution: Exact Conditional Sampling

e.g. the starting-state mass function changes with conditioning:

P(Y(0) = i | π, S) = π_i

P(Y(0) = i | π, S, Y = y) = π_i e_i^T exp{Sy} s / (π^T exp{Sy} s)
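The conditional starting-state mass function above is cheap to evaluate. A small sketch, assuming SciPy's `expm` (`conditional_start_probs` is an illustrative name):

```python
import numpy as np
from scipy.linalg import expm

def conditional_start_probs(pi, S, y):
    """P(Y(0) = i | pi, S, Y = y), proportional to pi_i * e_i' exp(Sy) s."""
    s = -S.sum(axis=1)                 # exit-rate vector s = -S·1
    w = pi * (expm(S * y) @ s)         # elementwise pi_i times e_i' exp(Sy) s
    return w / w.sum()                 # normalise by pi' exp(Sy) s
```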
Tail Depth Performance Improvement

[figure: log time (secs) against upper tail probability 10^(−x), x = 1, ..., 6, for the MH and ECS methods; MH time grows by orders of magnitude as the tail deepens while ECS remains essentially flat]
Overall Performance Improvement

This shows the new method keeping pace in nice problems and significantly outperforming otherwise.

[table: two test generators; a benign one exhibiting none of problems i–iii, and a hard one mixing rates of order 0.01 with rates of order 300, exhibiting all of problems i–iii]

No problems i–iii:   MH 1.6 µs    ECS 7.2 µs
                     MH 104 µs    ECS 19 µs
All problems i–iii:  MH 0.2 hours ECS 0.06 secs
                     MH 9.4 hours ECS 0.05 secs

≈2,300,000× faster on average in the hard problem.
Networks/Systems of Components

Quick overview now of the current research focus: inference for networks/systems comprising nodes/components which may be of Phase-type.

[slide build: a sequence of network diagrams in which nodes 1, 2 and 3 are added and connected one step at a time]

Again, a reduced information setting: only the overall network failure time T = t is observed, so-called masked system lifetime data.
Direct Masked System Lifetime Inference

Even a very simple setting is quite hard to tackle directly.

[diagram: X_1 and X_2 in series, together in parallel with X_3]

X_i ~ iid Weibull(scale = α, shape = β), system lifetimes t = {t_1, ..., t_n}

F_T(t) = (1 − F̄_{X_1}(t) F̄_{X_2}(t)) F_{X_3}(t)

⟹ L(α, β; t) = ∏_{i=1}^{n} t_i^{−1} β (t_i/α)^β exp{−3(t_i/α)^β} [2 exp{(t_i/α)^β} + exp{2(t_i/α)^β} − 3]

p(α, β | t) ∝ L(α, β; t) p(α, β) ... awkward.
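The likelihood above, awkward as it is, is at least straightforward to evaluate numerically. A sketch for this particular three-component structure, T = max(min(X_1, X_2), X_3), with z = (t/α)^β (function names are mine):

```python
import numpy as np

def system_density(t, alpha, beta):
    """f_T(t) for T = max(min(X1, X2), X3), X_i iid Weibull(alpha, beta):
    the closed form on this slide, written with z = (t/alpha)**beta."""
    z = (t / alpha) ** beta
    return (beta / t) * z * np.exp(-3.0 * z) * (2.0 * np.exp(z) + np.exp(2.0 * z) - 3.0)

def log_likelihood(alpha, beta, t):
    """log L(alpha, beta; t) over observed system lifetimes t_1, ..., t_n."""
    t = np.asarray(t, dtype=float)
    return float(np.sum(np.log(system_density(t, alpha, beta))))
```

An unnormalised log-posterior is then `log_likelihood(alpha, beta, t)` plus the log-prior, ready for a generic MCMC sampler.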
MCMC Solution (Independent Case)

Proposed solution in the tradition of Tanner & Wong (1987), since inference is easy in the presence of (augmented) component lifetimes. Thus, for X_i ~ iid F_X(·; ψ), sample from the natural completion of the posterior distribution:

p(ψ, x_1, ..., x_n | t)

by blocked Gibbs sampling using the conditional distributions:

p(x_1, ..., x_n | ψ, t)
p(ψ | x_1, ..., x_n, t)

where x_i = {x_i1, ..., x_im} are the m component failure times for the i-th of n systems (x_ij = t_i for some j).
p(ψ x,..., x n, t) is now simple Bayesian inference for system lifetime distribution well understood and in Phase-type case, above algorithm slots in here. Problem shifted to sampling p(x,..., x n ψ, t)
p(ψ x,..., x n, t) is now simple Bayesian inference for system lifetime distribution well understood and in Phase-type case, above algorithm slots in here. Problem shifted to sampling p(x,..., x n ψ, t) Propose using Samaniego s system signature s j = P(T = X j:n ) e.g. p(x i,..., x im ψ, t) = m {p(x i,..., x im ψ, X j:n = t i ) j= P(T = X j:n ψ, t i )} where ( ) m P(T = X j:n ψ, t i ) s j F X (t i ) j FX (t i ) m j j
Algorithm to sample p(x_1, ..., x_n | ψ, t)

For each system i = 1, ..., n:
1. Sample j ∈ {1, ..., m} from the discrete probability distribution defined by the conditioned system signature, P(T = X_{j:m} | ψ, t_i). This samples the order statistic indicating that the j-th failure caused system failure.
2. Sample:
   - j − 1 values, x_i1, ..., x_{i(j−1)}, from F_{X|X<t_i}(·; ψ), the distribution of the component lifetime conditional on failure before t_i;
   - m − j values, x_{i(j+1)}, ..., x_{im}, from F_{X|X>t_i}(·; ψ), the distribution of the component lifetime conditional on failure after t_i;
   and set x_ij = t_i.

Each iteration provides x_i.
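The two steps above can be sketched concretely. This illustration assumes iid Exponential(rate) components, so both conditional draws have closed forms (before t_i by inverse-CDF on the truncated exponential, after t_i by memorylessness); the signature is supplied by the caller, and `sample_components` is an illustrative name:

```python
import numpy as np
from math import comb

def sample_components(t_i, rate, signature, rng=None):
    """One draw of (x_i1, ..., x_im) given system lifetime t_i, for iid
    Exponential(rate) components with the given system signature s_1..s_m."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(signature)
    F = 1.0 - np.exp(-rate * t_i)      # component CDF at t_i
    # Step 1: conditioned signature P(T = X_{j:m} | psi, t_i)
    w = np.array([signature[j - 1] * comb(m, j) * F ** (j - 1) * (1.0 - F) ** (m - j)
                  for j in range(1, m + 1)])
    j = rng.choice(np.arange(1, m + 1), p=w / w.sum())
    # Step 2: j-1 failures before t_i (inverse-CDF on the truncated
    # exponential), the j-th exactly at t_i, m-j after t_i (memorylessness)
    x = np.empty(m)
    x[:j - 1] = -np.log(1.0 - rng.random(j - 1) * F) / rate
    x[j - 1] = t_i
    x[j:] = t_i + rng.exponential(1.0 / rate, size=m - j)
    return x
```

For the three-component structure used earlier (a series pair in parallel with a third component), the signature is (0, 2/3, 1/3).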
All Systems of 4 Components, Exponential (λ = 0.4)

[figure: posterior summaries of the component failure rate for each of the 20 four-component systems, clustered around the true λ = 0.4]
Exchangeable Failure Rate Parameters

It is straightforward to allow the more general setting of exchangeable failure rate parameters between networks:

[graphical model: ψ_i → X_ij → T_i, with components j = 1, ..., m and networks i = 1, ..., n]

However, exchangeability within a network would break the signature-based sampling of node failure times, so this is probably as general as this particular approach can go.
All Systems of 4 Components, Ψ ~ Gamma(α = 9, β = 2)

[figure: posterior summaries of the population mean of the rate distribution for each of the 20 networks, clustered around the true mean α/β = 4.5]
All Systems of 4 Components, Ψ ~ Gamma(α = 9, β = 2)

[figure: posterior summaries of the population variance of the rate distribution for each of the 20 networks]
Extend to Topological Inference

It is now possible to add topology detection to the inferential process.

[diagram: a network of unknown topology for which only the overall failure time T = t is observed]

It is simple to sample from p(m | ψ, t) as an additional Gibbs update, moving between network topologies. With care, reversible jump between component numbers is also feasible.
True Topology with Signature No. 2 (40 observations)

[figure: posterior probability of each candidate network topology, with the posterior mass concentrated on the true topology]
Asmussen, S., Nerman, O. & Olsson, M. (1996), 'Fitting phase-type distributions via the EM algorithm', Scand. J. Statist. 23(4), 419-441.
Bladt, M., Gonzalez, A. & Lauritzen, S. L. (2003), 'The estimation of phase-type related functionals using Markov chain Monte Carlo methods', Scand. Actuar. J. 2003(4), 280-300.
Cano, J. & Rios, D. (2006), 'Reliability forecasting in complex hardware/software systems', Proceedings of the First International Conference on Availability, Reliability and Security (ARES '06).
Daneshkhah, A. & Bedford, T. (2008), 'Sensitivity analysis of a reliability system using Gaussian processes', in T. Bedford, ed., Advances in Mathematical Modeling for Reliability, IOS Press, pp. 46-62.
Tanner, M. A. & Wong, W. H. (1987), 'The calculation of posterior distributions by data augmentation', Journal of the American Statistical Association 82(398), 528-540.