Dealing with uncertainty in Gaussian Bayesian networks from a regression perspective


Miguel A. Gómez-Villegas^a, Paloma Main^a, Hilario Navarro^b and Rosario Susi^a

^a Universidad Complutense de Madrid, Spain
^b Universidad Nacional de Educación a Distancia, Spain

Email addresses: ma.gv@mat.ucm.es (M.A. Gómez-Villegas), pmain@mat.ucm.es (P. Main), hnavarro@ccia.uned.es (H. Navarro), rsusi@estad.ucm.es (R. Susi)

Abstract

Some sensitivity analyses have been developed to evaluate the impact of uncertainty about the mean vector and the covariance matrix that specify the joint distribution of the variables in the nodes of a Gaussian Bayesian network (GBN). Nevertheless, uncertainty about the alternative conditional specification of GBNs, based on the regression coefficients of each variable given its parents in the directed acyclic graph (DAG), has received little attention in the literature. Along this line, we focus on evaluating the effect of regression-coefficient misspecification by means of the Kullback-Leibler (KL) divergence.

1 Introduction

GBNs are defined as Bayesian networks (BNs) where the joint probability density of $X = (X_1, X_2, \ldots, X_p)$ is a multivariate normal distribution $N_p(\mu, \Sigma)$, with $\mu$ the $p$-dimensional mean vector and $\Sigma$ the $p \times p$ positive definite covariance matrix, using a directed acyclic graph (DAG) to represent the dependence structure of the variables. As in BNs, the joint density can be factorized using the conditional probability densities of $X_i$ ($i = 1, \ldots, p$) given its parents in the DAG, $pa(x_i) \subseteq \{X_1, \ldots, X_{i-1}\}$. These are univariate normal distributions with densities

$$f(x_i \mid pa(x_i)) \sim N\Big(x_i \;\Big|\; \mu_i + \sum_{j=1}^{i-1} b_{ji}(x_j - \mu_j),\; v_i\Big),$$

being $\mu_i$ the marginal mean of $X_i$, $b_{ji}$ the regression coefficient of $X_i$ on $X_j \in pa(x_i)$, and $v_i$ the conditional variance of $X_i$ given its parents in the DAG. Note that if $b_{ji} = 0$ then $X_j$ is not a parent of $X_i$.

The parameters of the joint distribution can be obtained from the previous conditional specification. More concretely, the means $\{\mu_i\}$ are obviously the elements of the $p$-dimensional mean vector $\mu$, and

$$\Sigma = [(I_p - B)^{-1}]'\, D\, (I_p - B)^{-1}$$

(Shachter and Kenley, 1989), where $D$ is a diagonal matrix $D = \mathrm{diag}(v)$ with the conditional variances $v = (v_1, \ldots, v_p)'$ and $B$ a strictly upper triangular matrix with the regression coefficients $b_{ji}$, $j \in \{1, \ldots, i-1\}$.

The specification based on $(\mu, B, D)$ is more manageable for experts because they only have to describe univariate distributions. Moreover, the DAG can be enriched by attaching the numerical values of the regression coefficients and conditional variances to the corresponding arcs and nodes, respectively. Nevertheless, there can still be considerable uncertainty about the parameters.

Sensitivity analysis is an important phase of any modelling procedure. Castillo and Kjærulff (2003) performed a one-way sensitivity analysis investigating the impact of small changes in the network parameters $\mu$ and $\Sigma$. Alternatively, Gómez-Villegas, Main and Susi (2007) proposed a one-way global sensitivity analysis over the network's output, instead of considering local aspects such as location and dispersion. Also, in Gómez-Villegas, Main and Susi (2008) an $n$-way sensitivity analysis is presented as a generalization of

the previous one, both using the KL divergence to evaluate the impact of perturbations.

It is well known that the KL divergence is a non-symmetric measure that evaluates the amount of information available to discriminate between two probability distributions. We have chosen it because we want to compare the global behaviour of two probability distributions. The directed KL divergence between the probability densities $f'(w)$ and $f(w)$, defined over the same domain, is

$$D_{KL}(f' \mid f) = \int f(w)\, \ln \frac{f(w)}{f'(w)}\, dw.$$

The expression for multivariate normal distributions is given by

$$D_{KL}(f' \mid f) = \frac{1}{2}\left[\ln\frac{|\Sigma'|}{|\Sigma|} + \mathrm{tr}(\Sigma'^{-1}\Sigma) - p + (\mu' - \mu)'\,\Sigma'^{-1}\,(\mu' - \mu)\right],$$

where $f$ and $f'$ are the densities of the normal distributions $N_p(\mu, \Sigma)$ and $N_p(\mu', \Sigma')$, respectively.

In general, if $X = (X_1, X_2, \ldots, X_p)$ is a random vector normally distributed with parameters $(\mu, \Sigma)$, where the covariance matrix is inaccurately specified, the effect of a $p \times p$ perturbation $\Delta$, measured in terms of a directed Kullback-Leibler divergence, can be expressed as follows:

$$D_{KL}(f \mid f_\delta) = \frac{1}{2}\left[\ln\frac{|\Sigma|}{|\Sigma + \Delta|} + \mathrm{tr}\big(\Sigma^{-1}(\Sigma + \Delta)\big) - p\right] = \frac{1}{2}\left[\mathrm{tr}(\Sigma^{-1}\Delta) - \ln|I_p + \Sigma^{-1}\Delta|\right] \approx \frac{1}{4}\,\mathrm{tr}\big((\Sigma^{-1}\Delta)'(\Sigma^{-1}\Delta)\big) = \frac{1}{4}\,\|\Sigma^{-1}\Delta\|_F^2 = \frac{1}{4}\sum_{i=1}^{p} \delta_i'\,(\Sigma^{-1})'\Sigma^{-1}\,\delta_i,$$

being $f_\delta$ the density function with the perturbed covariance matrix $\Sigma + \Delta$, $\delta_i$ ($i = 1, \ldots, p$) the columns of $\Delta$, with the necessary restrictions to get symmetric and positive definite matrices $\Sigma + \Delta$ and $\Sigma$, $\|\cdot\|_F$ the Frobenius matrix norm and $\mathrm{tr}(\cdot)$ the trace function.

However, as the directed KL divergence $D_{KL}(f' \mid f)$ can be interpreted as the information lost when $f'$ is used to approximate $f$, in the following sensitivity analyses $f$ has to be the original model and $f'$ the perturbed one, opposite to the previously used divergence. Herein we focus on the repercussion of a misspecified $p \times p$ matrix $B$, while the rest of the conditional parameters are assumed known. To evaluate perturbation effects, the proper directed KL divergence is used in all the studied cases.

The paper is organized as follows. In Section 2 the problem is stated and analyzed for constant errors.
In Section 3 random perturbations are considered; some examples illustrate the behavior of the proposed measure for both local and global uncertainty.

2 Fixed misspecification of B

Let $f$ be the density of a multivariate normal distribution $N_p(\mu, \Sigma)$ with conditional parameters $\mu$, $B$ and $D$. Denoting by $\Delta$ the matrix with additive perturbations on $B$, and by $f_\delta$ the corresponding perturbed density $N_p(\mu, \Sigma_\delta)$, where $\Sigma_\delta = [(I_p - B - \Delta)^{-1}]'\, D\, (I_p - B - \Delta)^{-1}$ (see Susi, Navarro, Main and Gómez-Villegas, 2009), the KL divergence comparing two covariance matrices when the means are equal is

$$D_{KL}(f_\delta \mid f) = \frac{1}{2}\left[\mathrm{trace}(\Sigma_\delta^{-1}\Sigma) - p\right]. \qquad (1)$$

It should be noted that, as $I_p - B$ and $I_p - B - \Delta$ are upper triangular matrices with diagonal entries equal to one, $\ln(|\Sigma_\delta|/|\Sigma|) = 0$. Now, given that $\Sigma_\delta^{-1} = (I_p - B - \Delta)\, D^{-1}\, (I_p - B - \Delta)'$ and $\Sigma^{-1} = (I_p - B)\, D^{-1}\, (I_p - B)'$, expanding $\mathrm{trace}(\Sigma_\delta^{-1}\Sigma)$ and using

$\mathrm{trace}\big(\Delta(I_p - B)^{-1}\big) = 0$, which holds because $\Delta(I_p - B)^{-1}$ is strictly upper triangular, we obtain $\mathrm{trace}(\Sigma_\delta^{-1}\Sigma) = p + \mathrm{trace}(\Delta D^{-1}\Delta'\Sigma)$, so the divergence in (1) can be restated as

$$D_{KL}(f_\delta \mid f) = \frac{1}{2}\left[p + \mathrm{trace}(\Delta'\Sigma\Delta D^{-1}) - p\right] = \frac{1}{2}\,\mathrm{tr}(\Delta'\Sigma\Delta D^{-1}). \qquad (2)$$

Note that $(I_p - B)^{-1}$ is an upper triangular matrix with diagonal entries equal to one, so $\Delta(I_p - B)^{-1}$ is strictly upper triangular with diagonal entries equal to zero.

Under the same assumptions, if $\delta_{(i)}$ denotes an $(i-1)$-dimensional vector of local errors in node $i$, produced by an erroneous estimation or elicitation of the relationships of node $i$ with its parents, the perturbation matrix on $B$ with only this error source is $\Delta_{\delta,i}$, the matrix whose $i$-th column contains $\delta_{(i)}$ in its first $i-1$ rows and which is zero elsewhere. Thus, denoting by $f_{\delta,i}$ the density with coefficient matrix $B + \Delta_{\delta,i}$, the effect on the joint distribution can be expressed by

$$D_{KL}(f_{\delta,i} \mid f) = \frac{1}{2}\,\mathrm{tr}\big(\Delta_{\delta,i}'\,\Sigma\,\Delta_{\delta,i}\, D^{-1}\big) = \frac{1}{2v_i}\sum_{k=1}^{i-1}\delta_{(i)k}\,\Sigma_{(i-1)k}'\,\delta_{(i)}, \qquad (3)$$

$i = 2, \ldots, p$, being $\Sigma_{(i-1)k}$ the $k$-th column of the submatrix $\Sigma_{(i-1)}$ built with the first $i-1$ rows and columns of $\Sigma$. The local perturbation divergence in (3) may also be written in the form

$$D_{KL}(f_{\delta,i} \mid f) = \frac{1}{2v_i}\,\delta_{(i)}'\,\Sigma_{(i-1)}\,\delta_{(i)} = \frac{1}{2v_i}\,\big\|D_{(i-1)}^{1/2}\, U_{(i-1)}\,\delta_{(i)}\big\|^2,$$

with $U = (I_p - B)^{-1}$ and the submatrices $U_{(i-1)}$, $D_{(i-1)}$ determined by the first $i-1$ rows and columns of $U$ and $D$.

Returning to the measure of interest, $D_{KL}(f_\delta \mid f)$ in (2), it is immediately obtained that

$$D_{KL}(f_\delta \mid f) = \sum_{i=2}^{p} D_{KL}(f_{\delta,i} \mid f) = \sum_{i=2}^{p} \frac{1}{2v_i}\,\delta_{(i)}'\,\Sigma_{(i-1)}\,\delta_{(i)}. \qquad (4)$$

Then the total effect can be expressed as the sum of the individual effects and, consequently, the global sensitivity analysis can be performed through local analyses of the nodes. Some direct results may be useful in applications:

- If all the components in $\delta_{(i)}$ are equal to $\delta$, it follows that $D_{KL}(f_{\delta,i} \mid f) = \frac{\delta^2}{2v_i}\sum_{k=1}^{i-1}\sum_{t=1}^{i-1}\sigma_{tk}$.

- The possibly erroneous deletion of the arc from node $j$ to node $i$, having $b_{ji} \neq 0$, would yield $D_{KL}(f_{\delta,i} \mid f) = \frac{b_{ji}^2\,\sigma_{jj}}{2v_i}$.

- The effect of including an arc from node $j$ to node $i$, introducing $b_{ji}$, is also $D_{KL}(f_{\delta,i} \mid f) = \frac{b_{ji}^2\,\sigma_{jj}}{2v_i}$.

These last two cases describe the impact of the conditional and marginal variances under uncertain knowledge of the exact model, giving the distance between DAGs obtained by adding or removing arcs.
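The closed form (2) and the additive decomposition (4) lend themselves to a direct numerical check. The following sketch (not part of the original paper; the 3-node network and its parameter values are illustrative, assuming NumPy) builds $\Sigma$ from the conditional specification $(B, D)$, perturbs $B$, and compares the exact divergence in (1) with the closed form, including the arc-deletion case:

```python
import numpy as np

def sigma_from(B, D):
    """Joint covariance from the conditional specification:
    Sigma = [(I - B)^{-1}]' D (I - B)^{-1}."""
    p = B.shape[0]
    M = np.linalg.inv(np.eye(p) - B)
    return M.T @ D @ M

def kl_exact(B, D, Delta):
    """Eq. (1): D_KL(f_delta | f) = (1/2)[tr(Sigma_delta^{-1} Sigma) - p]."""
    p = B.shape[0]
    Sigma = sigma_from(B, D)
    Sigma_d = sigma_from(B + Delta, D)
    return 0.5 * (np.trace(np.linalg.solve(Sigma_d, Sigma)) - p)

def kl_closed(B, D, Delta):
    """Eq. (2): D_KL(f_delta | f) = (1/2) tr(Delta' Sigma Delta D^{-1})."""
    Sigma = sigma_from(B, D)
    return 0.5 * np.trace(Delta.T @ Sigma @ Delta @ np.linalg.inv(D))

# Illustrative 3-node chain X1 -> X2 -> X3 with b_12 = 1, b_23 = 2
B = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
D = np.diag([1.0, 1.0, 2.0])

# Strictly upper triangular perturbation: column i holds delta_(i)
Delta = np.array([[0.0, 0.3, 0.0],
                  [0.0, 0.0, -0.5],
                  [0.0, 0.0, 0.0]])

print(np.isclose(kl_exact(B, D, Delta), kl_closed(B, D, Delta)))   # True

# Additivity (4): the global divergence is the sum of single-node effects
per_node = sum(kl_closed(B, D, np.outer(Delta[:, i], np.eye(3)[i]))
               for i in range(3))
print(np.isclose(kl_exact(B, D, Delta), per_node))                  # True

# Arc deletion X2 -> X3: divergence equals b_23^2 * sigma_22 / (2 v_3)
Sigma = sigma_from(B, D)
Del = np.zeros((3, 3)); Del[1, 2] = -B[1, 2]
print(np.isclose(kl_exact(B, D, Del),
                 B[1, 2]**2 * Sigma[1, 1] / (2 * D[2, 2])))         # True
```

Because $\Delta(I_3 - B)^{-1}$ is strictly upper triangular, the agreement between `kl_exact` and `kl_closed` is exact rather than a second-order approximation.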

Figure 1: DAG with regression coefficients on the arcs

Example 1. Let us consider the GBN in Figure 1, with mean vector $\mu$, conditional variances $D = \mathrm{diag}(v_1, \ldots, v_7)$ and the regression coefficients shown on the arcs of the DAG. Using $\Sigma = [(I_7 - B)^{-1}]'\, D\, (I_7 - B)^{-1}$, the covariance matrix $\Sigma$ is obtained.

Figure 2: Measure of deviation for each arc removal

The divergences reflecting the effect of each arc removal are shown in Figure 2. It is observed that the divergence increases as the depth of the node grows, with a maximum for the arc from $X_6$ to $X_7$. Therefore, it gives us information about the difference between the original GBN and the particular networks obtained by means of the cancellation of some regression coefficients.

3 The random case

The expression (4) easily relates the global effect to the local error effects, due to the absence of interaction. Now we replace the hypothesis that $\Delta$ is known by the assumption that it is a random matrix. The main aim is to use relation (4) to evaluate the impact of uncertainty in $B$ for a GBN. Both this measure and its value at different nodes can be useful to point out the most sensitive nodes, as well as to compare structures with respect to uncertainty sensitivity.

If we suppose each vector $\delta_{(i)}$ to be distributed independently of the remaining errors as $N_{i-1}(0, E_i)$, $i = 2, \ldots, p$, while the common value $v_1 = v_2 = \cdots = v_p = v$ is known, each random variable $\delta_{(i)}'\,\Sigma_{(i-1)}\,\delta_{(i)}$ is a quadratic form in normal variables with a chi-squared distribution with $i-1$ degrees of freedom, $\chi^2(i-1)$, if and only if $E_i\,\Sigma_{(i-1)}$ is a symmetric idempotent matrix of rank $i-1$ (Rencher, 2000). Also, if $Y$ is a random vector with mean vector $\mu$, covariance matrix $\Sigma$ and $M$ is a $p \times p$ nonrandom matrix, then (Rencher, 2000)

$$E(Y'MY) = \mu'M\mu + \mathrm{tr}(M\Sigma)$$

$$\mathrm{Var}(Y'MY) = 4\mu'M\Sigma M\mu + 2\,\mathrm{tr}\big((M\Sigma)^2\big). \qquad (5)$$

Thus, it results that the mean of the random variable $D_{KL}(f_\delta \mid f)$ is

$$E\big[D_{KL}(f_\delta \mid f)\big] = \frac{1}{2v}\sum_{i=2}^{p} \mathrm{tr}\big(\Sigma_{(i-1)}\, E_i\big).$$

If we express the covariance matrix $E_i$ in terms of its difference with $\Sigma_{(i-1)}^{-1}$, that is $E_i = \Sigma_{(i-1)}^{-1} + A_i$, then

$$\mathrm{tr}\big(\Sigma_{(i-1)}\, E_i\big) = \mathrm{tr}\big(I_{(i-1)} + \Sigma_{(i-1)}\, A_i\big) = (i-1) + \mathrm{tr}\big(\Sigma_{(i-1)}\, A_i\big).$$

It follows immediately that, for a positive semidefinite matrix $A_i$ (which could be denoted by "$E_i$ larger than $\Sigma_{(i-1)}^{-1}$"), the minimum mean impact is obtained when $A_i = 0$, being

$$k = \frac{p(p-1)}{4v}$$

a lower bound for the mean impact over all the matrices $E_i$ larger than $\Sigma_{(i-1)}^{-1}$. Moreover, imposing $E_i = \Sigma_{(i-1)}^{-1}$ we can assure that each summand is distributed as a $\frac{1}{2v}\chi^2(i-1)$, because $E_i\,\Sigma_{(i-1)} = I_{i-1}$ is a symmetric idempotent matrix of rank $i-1$. The lower bound $k$ can be considered to define the mean relative sensitivity by

$$\frac{E\big[D_{KL}(f_\delta \mid f)\big]}{k} = \frac{\sum_{i=2}^{p} \mathrm{tr}\big(\Sigma_{(i-1)}\, E_i\big)}{p(p-1)/2},$$

whenever "$E_i$ larger than $\Sigma_{(i-1)}^{-1}$" can be assumed.

3.1 Independent errors

When the hypothesis of independent errors with common variance $\sigma^2$ can be accepted, that is $E_i = \sigma^2 I_{(i-1)}$, $i = 2, \ldots, p$, using (5) the quadratic forms of each component have first and second order moments given by

$$E\big[\delta_{(i)}'\,\Sigma_{(i-1)}\,\delta_{(i)}\big] = \sigma^2\,\mathrm{tr}\big(\Sigma_{(i-1)}\big), \qquad \mathrm{Var}\big[\delta_{(i)}'\,\Sigma_{(i-1)}\,\delta_{(i)}\big] = 2\sigma^4\,\mathrm{tr}\big[(\Sigma_{(i-1)})^2\big].$$

Figure 3: Node divergence from node 2 (red) to node 7 (yellow) and global divergence (black)

Consequently, under the stated conditions we obtain an increasing average effect according to node depth in the network, independently of the network specification. Figure 3 illustrates the random behavior of the Kullback-Leibler divergences, displaying the empirical cumulative distribution functions (ECDFs) of samples from $\frac{1}{2v}\,\delta_{(i)}'\,\Sigma_{(i-1)}\,\delta_{(i)}$, $i = 2, \ldots, 7$, as well as the random global effect, for the example discussed above; a simulated sample was used for each case, with fixed values of $v$ and $\sigma^2$.

In this setting, a reasonable procedure to evaluate sensitivity to uncertainty in $B$ is to analyze the normalized ratio $S_\sigma(f) = D_{KL}(f_\delta \mid f)/\sigma^2$, which can be interpreted as the distribution variation in terms of the uncertainty variation.
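The invariance of the moments of $S_\sigma(f)$ with respect to $\sigma^2$ can be checked by simulation. The sketch below (illustrative, not from the paper; it assumes NumPy, a small 3-node chain and a common conditional variance $v$) draws independent errors $\delta_{(i)} \sim N(0, \sigma^2 I_{i-1})$ and compares the Monte Carlo mean of $S_\sigma(f)$ with $\frac{1}{2v}\sum_{i=1}^{p-1}\sigma_{ii}(p-i)$ for two values of $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma_from(B, D):
    """Sigma = [(I - B)^{-1}]' D (I - B)^{-1}."""
    p = B.shape[0]
    M = np.linalg.inv(np.eye(p) - B)
    return M.T @ D @ M

# Illustrative 3-node chain with common conditional variance v
v, p = 1.0, 3
B = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
Sigma = sigma_from(B, np.diag([v] * p))

def sample_S(sig2, n):
    """n draws of S_sigma(f) = D_KL(f_delta | f) / sigma^2 under
    independent errors delta_(i) ~ N(0, sigma^2 I_{i-1})."""
    total = np.zeros(n)
    for i in range(1, p):                    # nodes 2..p (0-based index i)
        Sub = Sigma[:i, :i]                  # Sigma_{(i-1)}
        d = rng.normal(scale=np.sqrt(sig2), size=(n, i))
        total += np.einsum('nj,jk,nk->n', d, Sub, d) / (2.0 * v)
    return total / sig2

# E[S] = (1/(2v)) * sum_{i=1}^{p-1} sigma_ii * (p - i), free of sigma^2
mean_theory = sum(Sigma[i, i] * (p - 1 - i) for i in range(p - 1)) / (2.0 * v)
for sig2 in (0.5, 5.0):
    print(abs(sample_S(sig2, 20000).mean() - mean_theory) < 0.1)
```

Because the factor $\sigma^2$ cancels in the normalized ratio, the empirical means agree with the theoretical value for every error variance; only the spread of the unnormalized divergence changes with $\sigma^2$.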
Obviously, the random divergence $D_{KL}(f_\delta \mid f)$ changes with $\sigma^2$; Figure 4 shows the behavior of the ECDFs for some $\sigma^2$ values in Example 1, exhibiting an apparent dominance relation. Nevertheless, the mean as well as the variance of $S_\sigma(f)$ do not depend on $\sigma^2$; more concretely,

$$E\big[S_\sigma(f)\big] = \frac{1}{2v}\sum_{i=2}^{p} \mathrm{tr}\big(\Sigma_{(i-1)}\big), \qquad \mathrm{Var}\big[S_\sigma(f)\big] = \frac{1}{2v^2}\sum_{i=2}^{p} \mathrm{tr}\big[(\Sigma_{(i-1)})^2\big].$$

Figure 4: Empirical cumulative distribution functions of Kullback-Leibler divergences for the GBN in Example 1 with different independent error variances: $\sigma^2 = .5$ (black), (red), (green), $5$ (dark blue), (light blue)

Relying on this invariance of the moments, we propose to evaluate the GBN sensitivity to uncertainty in $B$ by

$$E\big[S_\sigma(f)\big] = \frac{1}{2v}\sum_{i=2}^{p} \mathrm{tr}\big(\Sigma_{(i-1)}\big) = \frac{1}{2v}\sum_{i=1}^{p-1}\sigma_{ii}\,(p-i),$$

where $\sigma_{ii}$ denotes the variance of $X_i$. Then, the relative contribution of each node to the total sensitivity measure is given by

$$\frac{\mathrm{tr}\big(\Sigma_{(i-1)}\big)}{\sum_{i=1}^{p-1}\sigma_{ii}\,(p-i)}. \qquad (6)$$

This result enables us to classify nodes according to their contribution to the global sensitivity.

Example 2. For the GBN in Example 1 it is obtained $E[S_\sigma(f)] = 63$. Using (6), the numerical results by nodes are (2) .8, (3) .6, (4) .4, (5) .7, (6) ., (7) .76. Here the most important values are the contributions of the nodes to the global mean normalized divergence, resulting in a significantly large influence of node (7) compared with the rest of the nodes in the DAG. Therefore, independent random errors in the regression coefficients of nodes (2) to (5) do not describe very different joint models; however, that is not the case for node (7), and some effort has to be made to bring in additional information to assess the correct regression coefficient values.

4 Conclusions

The factorization of the joint distribution in GBNs leads us to an additive decomposition of the Kullback-Leibler divergence. Then, for misspecified regression coefficients, the weight each node has in the global deviation from the initial structure can be determined. Modelling uncertainty with independent random errors provides a highly simplified analysis, yielding an uncertainty sensitivity measure that can be easily handled.

Acknowledgments

This research has been supported by the Spanish Ministerio de Ciencia e Innovación, Grant MM 8.38, and partially by GR58/8-A, Métodos Bayesianos, by the BSCH-UCM from Spain.

References

E. Castillo and U. Kjærulff. 2003.
Sensitivity analysis in Gaussian Bayesian networks using a symbolic-numerical technique. Reliability Engineering and System Safety, 79.

M.A. Gómez-Villegas, P. Main and R. Susi. 2007. Sensitivity Analysis in Gaussian Bayesian Networks

Using a Divergence Measure. Communications in Statistics: Theory and Methods, 36(3).

M.A. Gómez-Villegas, P. Main and R. Susi. 2008. Sensitivity of Gaussian Bayesian Networks to Inaccuracies in Their Parameters. Proceedings of the 4th European Workshop on Probabilistic Graphical Models.

A. Rencher. 2000. Linear Models in Statistics. New York: Wiley.

R. Shachter and C. Kenley. 1989. Gaussian influence diagrams. Management Science, 35.

R. Susi, H. Navarro, P. Main and M.A. Gómez-Villegas. 2009. Perturbing the structure in Gaussian Bayesian networks. Cuadernos de Trabajo de la E.U. de Estadística, CT/9.


More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Forecast covariances in the linear multiregression dynamic model.

Forecast covariances in the linear multiregression dynamic model. Forecast covariances in the linear multiregression dynamic model. Catriona M Queen, Ben J Wright and Casper J Albers The Open University, Milton Keynes, MK7 6AA, UK February 28, 2007 Abstract The linear

More information

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions. Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Department of Economics

Department of Economics Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

More information

How To Solve A Minimum Set Covering Problem (Mcp)

How To Solve A Minimum Set Covering Problem (Mcp) Measuring Rationality with the Minimum Cost of Revealed Preference Violations Mark Dean and Daniel Martin Online Appendices - Not for Publication 1 1 Algorithm for Solving the MASP In this online appendix

More information

DEGREES OF FREEDOM - SIMPLIFIED

DEGREES OF FREEDOM - SIMPLIFIED 1 Aust. J. Geod. Photogram. Surv. Nos 46 & 47 December, 1987. pp 57-68 In 009 I retyped this paper and changed symbols eg ˆo σ to VF for my students benefit DEGREES OF FREEDOM - SIMPLIFIED Bruce R. Harvey

More information

c 2008 Je rey A. Miron We have described the constraints that a consumer faces, i.e., discussed the budget constraint.

c 2008 Je rey A. Miron We have described the constraints that a consumer faces, i.e., discussed the budget constraint. Lecture 2b: Utility c 2008 Je rey A. Miron Outline: 1. Introduction 2. Utility: A De nition 3. Monotonic Transformations 4. Cardinal Utility 5. Constructing a Utility Function 6. Examples of Utility Functions

More information

The Optimality of Naive Bayes

The Optimality of Naive Bayes The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most

More information

Panel Data Econometrics

Panel Data Econometrics Panel Data Econometrics Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans University of Orléans January 2010 De nition A longitudinal, or panel, data set is

More information

Lecture 5: Singular Value Decomposition SVD (1)

Lecture 5: Singular Value Decomposition SVD (1) EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system

More information

The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

More information

Goodness of fit assessment of item response theory models

Goodness of fit assessment of item response theory models Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing

More information

Advanced Microeconomics

Advanced Microeconomics Advanced Microeconomics Ordinal preference theory Harald Wiese University of Leipzig Harald Wiese (University of Leipzig) Advanced Microeconomics 1 / 68 Part A. Basic decision and preference theory 1 Decisions

More information

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

More information

Introduction to Multivariate Analysis

Introduction to Multivariate Analysis Introduction to Multivariate Analysis Lecture 1 August 24, 2005 Multivariate Analysis Lecture #1-8/24/2005 Slide 1 of 30 Today s Lecture Today s Lecture Syllabus and course overview Chapter 1 (a brief

More information

How to report the percentage of explained common variance in exploratory factor analysis

How to report the percentage of explained common variance in exploratory factor analysis UNIVERSITAT ROVIRA I VIRGILI How to report the percentage of explained common variance in exploratory factor analysis Tarragona 2013 Please reference this document as: Lorenzo-Seva, U. (2013). How to report

More information

Solution to Homework 2

Solution to Homework 2 Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Chapter 4: Statistical Hypothesis Testing

Chapter 4: Statistical Hypothesis Testing Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin

More information

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005 Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005 Philip J. Ramsey, Ph.D., Mia L. Stephens, MS, Marie Gaudard, Ph.D. North Haven Group, http://www.northhavengroup.com/

More information

Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

More information

Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

More information

160 CHAPTER 4. VECTOR SPACES

160 CHAPTER 4. VECTOR SPACES 160 CHAPTER 4. VECTOR SPACES 4. Rank and Nullity In this section, we look at relationships between the row space, column space, null space of a matrix and its transpose. We will derive fundamental results

More information

The p-norm generalization of the LMS algorithm for adaptive filtering

The p-norm generalization of the LMS algorithm for adaptive filtering The p-norm generalization of the LMS algorithm for adaptive filtering Jyrki Kivinen University of Helsinki Manfred Warmuth University of California, Santa Cruz Babak Hassibi California Institute of Technology

More information

General Framework for an Iterative Solution of Ax b. Jacobi s Method

General Framework for an Iterative Solution of Ax b. Jacobi s Method 2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,

More information

Discussion Papers Department of Economics University of Copenhagen

Discussion Papers Department of Economics University of Copenhagen Discussion Papers Department of Economics University of Copenhagen No. 8-34 esting for Co-integration in Vector Autoregressions with Non-Stationary Volatility Giuseppe Cavaliere, Anders Rahbek, and A.

More information

A statistical test for forecast evaluation under a discrete loss function

A statistical test for forecast evaluation under a discrete loss function A statistical test for forecast evaluation under a discrete loss function Francisco J. Eransus, Alfonso Novales Departamento de Economia Cuantitativa Universidad Complutense December 2010 Abstract We propose

More information

Time Series Analysis

Time Series Analysis Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos

More information

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

More information

A MULTIVARIATE OUTLIER DETECTION METHOD

A MULTIVARIATE OUTLIER DETECTION METHOD A MULTIVARIATE OUTLIER DETECTION METHOD P. Filzmoser Department of Statistics and Probability Theory Vienna, AUSTRIA e-mail: P.Filzmoser@tuwien.ac.at Abstract A method for the detection of multivariate

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Chapter 1 Introduction. 1.1 Introduction

Chapter 1 Introduction. 1.1 Introduction Chapter 1 Introduction 1.1 Introduction 1 1.2 What Is a Monte Carlo Study? 2 1.2.1 Simulating the Rolling of Two Dice 2 1.3 Why Is Monte Carlo Simulation Often Necessary? 4 1.4 What Are Some Typical Situations

More information

Chapter 5: The Cointegrated VAR model

Chapter 5: The Cointegrated VAR model Chapter 5: The Cointegrated VAR model Katarina Juselius July 1, 2012 Katarina Juselius () Chapter 5: The Cointegrated VAR model July 1, 2012 1 / 41 An intuitive interpretation of the Pi matrix Consider

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Multivariate Analysis of Variance (MANOVA) Aaron French, Marcelo Macedo, John Poulsen, Tyler Waterson and Angela Yu Keywords: MANCOVA, special cases, assumptions, further reading, computations Introduction

More information

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239 STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent

More information

7.6 Approximation Errors and Simpson's Rule

7.6 Approximation Errors and Simpson's Rule WileyPLUS: Home Help Contact us Logout Hughes-Hallett, Calculus: Single and Multivariable, 4/e Calculus I, II, and Vector Calculus Reading content Integration 7.1. Integration by Substitution 7.2. Integration

More information

Applications to Data Smoothing and Image Processing I

Applications to Data Smoothing and Image Processing I Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is

More information

SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS SOLVING LINEAR SYSTEMS Linear systems Ax = b occur widely in applied mathematics They occur as direct formulations of real world problems; but more often, they occur as a part of the numerical analysis

More information

A Comparison of Option Pricing Models

A Comparison of Option Pricing Models A Comparison of Option Pricing Models Ekrem Kilic 11.01.2005 Abstract Modeling a nonlinear pay o generating instrument is a challenging work. The models that are commonly used for pricing derivative might

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information