# Variational Approximations between Mean Field Theory and the Junction Tree Algorithm

## Transcription

626 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000

Wim Wiegerinck
RWCP, Theoretical Foundation SNN, University of Nijmegen, Geert Grooteplein 21, 6525 EZ, Nijmegen, The Netherlands

### Abstract

Recently, variational approximations such as the mean field approximation have received much interest. We extend the standard mean field method by using an approximating distribution that factorises into cluster potentials. This includes undirected graphs, directed acyclic graphs and junction trees. We derive generalised mean field equations to optimise the cluster potentials. We show that the method bridges the gap between the standard mean field approximation and the exact junction tree algorithm. In addition, we address the problem of how to choose the structure and the free parameters of the approximating distribution. From the generalised mean field equations we derive rules to simplify the approximation in advance without affecting the potential accuracy of the model class. We also show how the method fits into some other variational approximations that are currently popular.

### 1 INTRODUCTION

Graphical models, such as Bayesian networks, Markov fields, and Boltzmann machines, provide a rich framework for probabilistic modelling and reasoning (Pearl, 1988; Lauritzen and Spiegelhalter, 1988; Jensen, 1996; Castillo et al., 1997; Hertz et al., 1991). Their graphical structure provides an intuitively appealing modularity and is well suited to the incorporation of prior knowledge. The invention of algorithms for exact inference during the last decades has led to a rapid increase in the popularity of graphical models in modern AI. However, exact inference is NP-hard (Cooper, 1990). This means that large, densely connected networks are intractable for exact computation, and approximations are necessary.
In this context, variational methods are gaining increasing interest (Saul et al., 1996; Jaakkola and Jordan, 1999; Jordan et al., 1999; Murphy, 1999). An advantage of these methods is that they provide bounds on the approximation error and fit well into a generalised EM framework for learning (Saul et al., 1996; Neal and Hinton, 1998; Jordan et al., 1999). This is in contrast to stochastic sampling methods (Castillo et al., 1997; Jordan, 1998), which may yield unreliable results due to finite sampling times. Until now, however, variational approximations have been less widely applied than Monte Carlo methods, arguably because their use is less straightforward.

One of the simplest and most prominent variational approximations is the so-called mean field approximation, which has its origin in statistical physics (Parisi, 1988). In the mean field approximation, the intractable distribution $P$ is approximated by a completely factorised distribution $Q$ by minimisation of the Kullback-Leibler (KL) divergence between $P$ and $Q$. Optimisation of $Q$ leads to the so-called mean field equations, which can be solved efficiently by iteration. A drawback of the standard mean field approximation is its limited accuracy due to the restricted distribution class. For this reason, extensions of the mean field approximation have been devised by allowing the approximating distributions $Q$ to have a richer, but still tractable, structure (Saul and Jordan, 1996; Jaakkola and Jordan, 1998; Ghahramani and Jordan, 1997; Wiegerinck and Barber, 1998; Barber and Wiegerinck, 1999; Haft et al., 1999; Wiegerinck and Kappen, 2000).

In this paper, we further develop this direction. In section 2 we present a general variational framework for approximate inference in an (intractable) target distribution using a (tractable) approximating distribution that factorises into overlapping cluster potentials.
Generalised mean field equations are derived, which are used in an iterative algorithm to optimise the cluster potentials of the approximating distribution. This procedure is guaranteed to lead to a local minimum of the KL-divergence. In section 3 we show the link between this procedure and standard exact inference methods. In section 4 we give conditions under which the complexity of the approximating model class can be reduced in advance without affecting its potential accuracy. In sections 5 and 6 we consider approximating directed graphs and we construct approximating junction trees. In section 7, we consider the approximation of target distributions for which the standard approach of KL minimisation is intractable.

### 2 VARIATIONAL FRAMEWORK

#### 2.1 TARGET DISTRIBUTIONS

Our starting point is a probability distribution $P(x)$ on a set of discrete variables $x = x_1, \ldots, x_n$ in a finite domain, $x_i \in \{1, \ldots, n_i\}$. Our goal is to find its marginals $P(x_i)$ on single variables or $P(x_i, \ldots, x_k)$ on small subsets of variables. We assume that $P$ can be written in the following factorisation

$$P(x) = \frac{1}{Z_P} \prod_\alpha \Psi_\alpha(d_\alpha) = \exp\Big(\sum_\alpha \psi_\alpha(d_\alpha) - z_P\Big), \tag{1}$$

in which $\Psi_\alpha$ are potential functions that depend on a small number of variables, denoted by the clusters $d_\alpha$. $Z_P$ is a normalisation factor that might be unknown. Note that the potential representation is not unique. When it is convenient, we will use the logarithmic form of the potentials, $\psi_\alpha = \log \Psi_\alpha$, $z_P = \log Z_P$.

An example is a Boltzmann machine with binary units (Hertz et al., 1991),

$$P(x) = \frac{1}{Z_P} \exp\Big(\sum_{i<j} w_{ij} x_i x_j + \sum_k h_k x_k\Big), \tag{2}$$

that fits in our form (1) with $d_{ij} = (x_i, x_j)$, $i < j$, $d_k = x_k$ and potentials $\psi_{ij}(x_i, x_j) = w_{ij} x_i x_j$, $\psi_k(x_k) = h_k x_k$. Another example of a distribution that fits in our framework is a Bayesian network given evidence $e$, which can be expressed in terms of the potentials $\Psi_j(d_j) = P(x_j|\pi_j)$, with $d_j = (x_j, \pi_j)$ and the normalisation $Z_P = P(e)$.
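
To make the factorised form (1) concrete, the following minimal sketch enumerates a three-unit Boltzmann machine of the form (2) and reads off single-variable marginals by brute force. The weights, biases, the choice of units in {0, 1}, and the function names `boltzmann_joint` and `marginal` are all illustrative assumptions, not values or routines from the paper. Enumeration is only feasible for a handful of units, which is the motivation for the approximations that follow.

```python
import itertools
import math


def boltzmann_joint(weights, biases):
    """Enumerate P(x) proportional to exp(sum_{i<j} w_ij x_i x_j + sum_k h_k x_k).

    Units are assumed to take values in {0, 1}. `weights` maps pairs (i, j)
    with i < j to w_ij; missing pairs count as zero.
    """
    n = len(biases)
    states = list(itertools.product([0, 1], repeat=n))
    unnorm = []
    for x in states:
        log_potential = sum(
            weights.get((i, j), 0.0) * x[i] * x[j]
            for i in range(n)
            for j in range(i + 1, n)
        )
        log_potential += sum(h * xi for h, xi in zip(biases, x))
        unnorm.append(math.exp(log_potential))
    z_p = sum(unnorm)  # the normalisation factor Z_P
    return {x: u / z_p for x, u in zip(states, unnorm)}, z_p


def marginal(joint, i):
    """P(x_i = 1), obtained by summing the joint over all other variables."""
    return sum(p for x, p in joint.items() if x[i] == 1)


# Hypothetical parameters, chosen only for illustration.
weights = {(0, 1): 1.0, (0, 2): -0.5, (1, 2): 0.8}
biases = [0.1, -0.2, 0.3]

joint, z_p = boltzmann_joint(weights, biases)
marginals = [marginal(joint, i) for i in range(len(biases))]
print(marginals)
```

The cost of the enumeration grows as $2^n$ in the number of units, even though every cluster contains at most two variables.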
The Bayesian network example shows that our inference problem includes the computation of conditionals given evidence, since conditioning can be included by absorbing the evidence into the model definition via $P_e(x) = P(x, e)/P(e)$. The complexity of computing marginals in $P$ depends on the underlying graphical structure of the model, and is exponential in the maximal clique size of the triangulated moralised graph (Lauritzen and Spiegelhalter, 1988; Jensen, 1996; Castillo et al., 1997). This may lead to intractable models, even if the clusters $d_\alpha$ are small. An example is a fully connected Boltzmann machine: the clusters contain at most two variables, while the model has one clique that contains all the variables in the model.

#### 2.2 APPROXIMATING DISTRIBUTIONS

In the variational method the intractable probability distribution $P(x)$ is approximated by a tractable distribution $Q(x)$. This distribution can be used to compute probabilities of interest. In the standard (mean field) approach, $Q$ is a completely factorised distribution, $Q(x) = \prod_i Q(x_i)$. We take the more general approach with $Q$ being a tractable distribution that factorises according to a given structure. By tractable we mean that marginals over small subsets of variables are computationally feasible. To construct $Q$ we first define its structure,

$$Q(x) = \frac{1}{Z_Q} \prod_\gamma \Phi_\gamma(c_\gamma) = \exp\Big(\sum_\gamma \varphi_\gamma(c_\gamma) - z_Q\Big), \tag{3}$$

in which $c_\gamma$ are predefined clusters whose union contains all variables, and $\varphi_\gamma = \log \Phi_\gamma$, $z_Q = \log Z_Q$. $\Phi_\gamma(c_\gamma)$ are nonnegative potentials of the variables in the clusters. The only restriction on the potentials is the global normalisation

$$\sum_{\{x\}} \prod_\gamma \Phi_\gamma(c_\gamma) = Z_Q. \tag{4}$$

#### 2.3 VARIATIONAL OPTIMISATION

The approximation $Q$ is optimised such that the Kullback-Leibler (KL) divergence between $Q$ and $P$,

$$D(Q, P) = \sum_{\{x\}} Q(x) \log \frac{Q(x)}{P(x)} = \Big\langle \log \frac{Q(x)}{P(x)} \Big\rangle,$$

is minimised. In this paper, $\langle \ldots \rangle$ denotes the average with respect to $Q$. The KL-divergence is related to the difference of the probabilities of $Q$ and $P$,

$$\max_A |P(A) - Q(A)| \le \sqrt{D(Q, P)},$$

where $A$ ranges over the events in the sample space (see (Whittaker, 1990)).
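
The KL-divergence and its relation to differences in event probabilities can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions: the two-variable target `p` is made up, and `q` is simply the product of the marginals of `p` (a convenient factorised distribution, not the KL-optimal mean field solution). It uses the fact that the largest gap $|P(A) - Q(A)|$ over all events equals half the $L_1$ distance between the two distributions.

```python
import math


def kl_divergence(q, p):
    """D(Q, P) = sum_x Q(x) log(Q(x) / P(x)), with 0 log 0 taken as 0."""
    return sum(qx * math.log(qx / p[x]) for x, qx in q.items() if qx > 0.0)


def max_event_gap(q, p):
    """max over events A of |P(A) - Q(A)|, i.e. half the L1 distance."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)


# Hypothetical correlated target over two binary variables.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Factorised Q built from the marginals of P (both marginals are 0.5 here).
p_first = {a: sum(v for (u, _), v in p.items() if u == a) for a in (0, 1)}
p_second = {b: sum(v for (_, w), v in p.items() if w == b) for b in (0, 1)}
q = {(a, b): p_first[a] * p_second[b] for a in (0, 1) for b in (0, 1)}

d = kl_divergence(q, p)
gap = max_event_gap(q, p)
print(d, gap, math.sqrt(d))
```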
In the logarithmic potential representations of $P$ and $Q$ (with $\varphi_\gamma = \log \Phi_\gamma$), the KL-divergence is

$$D(Q, P) = \Big\langle \sum_\gamma \varphi_\gamma(c_\gamma) - \sum_\alpha \psi_\alpha(d_\alpha) \Big\rangle + \text{constant},$$

which shows that $D(Q, P)$ is tractable (up to a constant) when $Q$ is tractable and the clusters in $P$ and $Q$ are small.

To optimise $Q$ under the normalisation constraint (4), we do a constrained optimisation of the KL-divergence with respect to $\varphi_\gamma$ using Lagrange multipliers. In this optimisation, the other potentials $\varphi_\beta$, $\beta \neq \gamma$, remain fixed. This leads to the solution $\varphi^*_\gamma(c_\gamma)$, given by the generalised mean field equations

$$\varphi^*_\gamma(c_\gamma) = \Big\langle \sum_{\alpha \in D_\gamma} \psi_\alpha(d_\alpha) - \sum_{\beta \in C_\gamma} \varphi_\beta(c_\beta) \Big\rangle_{c_\gamma} - z. \tag{5}$$

The average $\langle \ldots \rangle_{c_\gamma}$ is taken with respect to the conditional distribution $Q(x|c_\gamma)$. In (5), $D_\gamma$ (resp. $C_\gamma$) are the sets of clusters $\alpha$ in $P$ (resp. $\beta \neq \gamma$ in $Q$) that depend on $c_\gamma$. In other words, $\alpha \notin D_\gamma$ implies $Q(d_\alpha|c_\gamma) = Q(d_\alpha)$, etc. Finally, $z$ is a constant that can be inferred from the normalisation (4), i.e.

$$z = \log \sum_{\{x\}} \exp\Big[\sum_{\beta \neq \gamma} \varphi_\beta(c_\beta) + \Big\langle \sum_{\alpha \in D_\gamma} \psi_\alpha(d_\alpha) - \sum_{\beta \in C_\gamma} \varphi_\beta(c_\beta) \Big\rangle_{c_\gamma}\Big] - z_Q. \tag{6}$$

Since $Q(x|c_\gamma)$ is independent of the potential $\varphi_\gamma$, $z$ in (6) is independent of $\varphi_\gamma$. Consequently, the right hand side of (5) is independent of $\varphi_\gamma$ as well. So (5) provides a unique solution $\varphi^*_\gamma$ to the optimisation of the potential of cluster $\gamma$. This solution corresponds to the global minimum of $D(Q, P)$ given that the potentials of the other clusters $\beta \neq \gamma$ are fixed. This means that in a sequence where at each step different potentials are selected and updated, the KL-divergence decreases at each step. Since $D(Q, P) \ge 0$, we conclude that this iteration over all clusters of variational potentials leads to a local minimum of $D(Q, P)$.

In the mean field equations (5), the constant $z$ plays only a minor role and can be set to zero if desired. This can be achieved by simultaneously shifting $z_Q$ and $\varphi_\gamma$ by the same amount before we optimise $\varphi_\gamma$. (This shift does not affect $Q$.) The generalised mean field equations (5) straightforwardly generalise the standard mean field equations for fully factorised approximations (see e.g. (Haft et al., 1999)).
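
The monotone decrease of the KL-divergence under cluster-by-cluster updates can be observed in the simplest special case, the fully factorised approximation, where every cluster of $Q$ is a single variable. The sketch below is a minimal illustration under assumptions that are not from the paper: a three-unit Boltzmann machine with units in {0, 1}, made-up weights and biases, and hypothetical function names. For this case the coordinate update reduces to the familiar form $m_i = \sigma(h_i + \sum_j w_{ij} m_j)$, with $m_i = Q(x_i = 1)$; the exact KL-divergence is tracked by enumeration purely for checking.

```python
import itertools
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def log_p_unnorm(x, w, h):
    """sum_{i<j} w_ij x_i x_j + sum_i h_i x_i, with w stored upper-triangular."""
    n = len(h)
    pair_terms = sum(
        w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n)
    )
    return pair_terms + sum(h[i] * x[i] for i in range(n))


def kl_q_p(m, w, h):
    """Exact D(Q, P) for a factorised Q with means m (enumeration, small n only)."""
    n = len(h)
    states = list(itertools.product([0, 1], repeat=n))
    log_z = math.log(sum(math.exp(log_p_unnorm(x, w, h)) for x in states))
    total = 0.0
    for x in states:
        q = math.prod(m[i] if x[i] else 1.0 - m[i] for i in range(n))
        if q > 0.0:
            total += q * (math.log(q) - log_p_unnorm(x, w, h) + log_z)
    return total


def mean_field_sweeps(w, h, sweeps=20):
    """Update one single-variable potential at a time, recording D(Q, P)."""
    n = len(h)
    m = [0.5] * n
    history = [kl_q_p(m, w, h)]
    for _ in range(sweeps):
        for i in range(n):
            field = h[i] + sum(
                w[min(i, j)][max(i, j)] * m[j] for j in range(n) if j != i
            )
            m[i] = sigmoid(field)  # exact minimiser with the other means fixed
            history.append(kl_q_p(m, w, h))
    return m, history


# Hypothetical parameters, chosen only for illustration.
w = [[0.0, 1.0, -0.5], [0.0, 0.0, 0.8], [0.0, 0.0, 0.0]]
h = [0.1, -0.2, 0.3]
m, history = mean_field_sweeps(w, h)
print(m, history[0], history[-1])
```

Because each update is the exact minimiser for one potential with the others held fixed, the recorded KL values never increase; the final value reflects a local minimum, which need not be the global one.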
The main difference between the generalised equations (5) and the standard mean field equations is that the contribution of the other potentials $\varphi_\beta$, $\beta \in C_\gamma$, vanishes in the fully factorised approximation. In figure 1, a simple example is given.

Figure 1: Chest clinic model (ASIA), from (Lauritzen and Spiegelhalter, 1988). (a): Exact distribution $P$ with marginal probabilities. (b-c): Approximating distributions with approximated marginal probabilities. In (b), $Q$ is fully factorised (KL = 0.43). In (c), $Q$ is a tree (KL = 0.03). KL is the KL-divergence $D(Q, P)$ between the approximating distribution $Q$ and the target distribution $P$.

### 3 GLOBAL CONVERGENCE

In this section, we link the mean field approximation with exact computation by showing global convergence for approximations $Q$ that satisfy the following two conditions: (1) each cluster $d_\alpha$ of $P$ is contained in at least one of the clusters $c_\gamma$ in $Q$; (2) $Q$ satisfies the so-called running intersection property.

For a definition of the running intersection property, we follow (Castillo et al., 1997): $Q$ satisfies the running intersection property if there is an ordering of the clusters of $Q$, $(c_1, \ldots, c_m)$, such that $s_\gamma = c_\gamma \cap (c_1 \cup \ldots \cup c_{\gamma-1})$ is contained in at least one of the clusters $(c_1, \ldots, c_{\gamma-1})$. If a cluster $c_\gamma$ intersects with the separator $s_\delta$ of a successor $c_\delta$, there are three possibilities: $s_\delta$ is contained in another successor $c_\eta$ ($\delta > \eta > \gamma$), or $s_\delta$ is contained in $c_\gamma$ itself, or $s_\delta$ intersects only with the separator $s_\gamma$ (since $s_\delta$ is contained in a predecessor of $c_\gamma$, which is separated by $s_\gamma$). We denote $\Delta_\gamma = \{s_\delta \subset c_\gamma \mid s_\delta \not\subset c_\eta,\ \delta > \eta > \gamma\}$. So each separator is contained in exactly one $\Delta_\gamma$. Finally, we define $A_\gamma = \{d_\alpha \subset c_\gamma \mid d_\alpha \not\subset c_\eta,\ \eta > \gamma\}$. Each cluster of $P$ is contained in exactly one $A_\gamma$.

With these preliminaries, we consider the mean field equations (5) applied to the potentials of $Q$. We con-

clusters $c_\gamma$ of $Q$, adaptive biases for the nodes that are connected with weights in $P$ which are not copied into $Q$, and fixed copies of biases for the remaining nodes.

Figure 2: Example of redundant structure. (a): Graph of exact distribution $P(A)P(B|A)P(C|A)$. (b): Optimisation of an approximating distribution with structure $Q(A)Q(B, C)$ leads to a distribution with simpler structure $Q(A)Q(B)Q(C)$. The variables $B$ and $C$ become independent in $Q$, although they are marginally dependent in $P$ (via $A$).

Note that optimal weights in an approximation of a Boltzmann machine are not always just copies from the target distribution. An illustrative counter example is the target distribution $P(x_1, x_2, x_3) \propto \exp(w_{12} x_1 x_2 + w_{13} x_1 x_3 + w_{23} x_2 x_3)$ with $w_{12} = \infty$, so $x_1$ and $x_2$ are hard coupled ($x_i = \pm 1$). The optimal approximation of the form $Q(x_1, x_2, x_3) \propto \Phi(x_1, x_2)\Phi(x_2, x_3)$ is given by $\Phi(x_1, x_2) \propto \exp(w_{12} x_1 x_2)$ and $\Phi(x_2, x_3) \propto \exp([w_{13} + w_{23}] x_2 x_3)$. The approximation in which the weight between $x_2$ and $x_3$ in $Q$ is copied from $P$ (i.e. $w_{23}$ instead of $w_{13} + w_{23}$) is suboptimal.

The convergence times of approximate models with and without copied potentials may differ, even if their potential accuracies are the same. As an example, consider the target $P(x_1, x_2) \propto \exp(w_{12} x_1 x_2)$. The approximation $Q(x_1, x_2) \propto \Phi(x_1, x_2)$ converges in one step. On the other hand, in $Q(x_1, x_2) \propto \exp(w_{12} x_1 x_2 + \phi_1(x_1) + \phi_2(x_2))$, the potentials $\phi_i$ decay only exponentially.

The lemma generalises the result on graph partitioning in Boltzmann machines as presented in (Barber and Wiegerinck, 1999). It shows and clarifies in which cases the copied potentials of tractable substructures, as originally proposed in (Saul and Jordan, 1996), are optimal. A nice example in which the copied potentials are optimal is the application to Factorial Hidden Markov Models in (Ghahramani and Jordan, 1997; Jordan et al., 1999).
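
The hard-coupling counter example can be checked numerically. The sketch below is an illustration under assumptions: a large finite weight $w_{12} = 10$ stands in for $w_{12} = \infty$, and the values of $w_{13}$ and $w_{23}$ are made up. It compares only two candidate chain approximations, one with the copied weight $w_{23}$ and one with $w_{13} + w_{23}$, so it demonstrates that the copied weight is worse, not that the summed weight is optimal over all potentials.

```python
import itertools
import math


def joint(log_weight):
    """Normalise exp(log_weight(x)) over x in {-1, +1}^3."""
    states = list(itertools.product([-1, 1], repeat=3))
    unnorm = [math.exp(log_weight(x)) for x in states]
    total = sum(unnorm)
    return {x: u / total for x, u in zip(states, unnorm)}


def kl_divergence(q, p):
    """D(Q, P) = sum_x Q(x) log(Q(x) / P(x))."""
    return sum(qx * math.log(qx / p[x]) for x, qx in q.items())


# Hypothetical weights; W12 = 10 is a finite stand-in for infinity.
W12, W13, W23 = 10.0, 0.7, 0.4

# Target: fully connected three-unit Boltzmann machine.
p = joint(lambda x: W12 * x[0] * x[1] + W13 * x[0] * x[2] + W23 * x[1] * x[2])


def chain_q(v):
    """Chain approximation with coupling W12 on (x1, x2) and v on (x2, x3)."""
    return joint(lambda x: W12 * x[0] * x[1] + v * x[1] * x[2])


kl_copied = kl_divergence(chain_q(W23), p)        # weight copied from P
kl_summed = kl_divergence(chain_q(W13 + W23), p)  # weight w13 + w23
print(kl_copied, kl_summed)
```

With $x_1$ and $x_2$ effectively tied together, the term $w_{13} x_1 x_3$ acts like an extra coupling between $x_2$ and $x_3$, which is why the summed weight reproduces the target almost exactly.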
The lemma provides a basis for the intuition that adding structure to $Q$ that is not present in $P$ might be redundant. (The lemma is still valid if $A_\kappa$ is empty.) In fig. 2 a simple example is given. A similar result for approximations using directed graphs (cf. section 5) is obtained in (Wiegerinck and Kappen, 2000). Finally, we note that the lemma only provides sufficient conditions for simplification.

### 5 DIRECTED APPROXIMATIONS

A slightly different class of approximating distributions are the 'directed' factorisations. These have been considered previously in (Wiegerinck and Barber, 1998; Barber and Wiegerinck, 1999; Wiegerinck and Kappen, 2000), but they fit well in the more general framework of this paper. Directed factorisations can be written in the same form (3), but the clusters need to have an ordering $c_1, c_2, c_3, \ldots$. We define separator sets $s_\gamma = c_\gamma \cap \{c_1 \cup \ldots \cup c_{\gamma-1}\}$ and residual sets $r_\gamma = c_\gamma \setminus s_\gamma$. We restrict the potentials $\Phi_\gamma(c_\gamma) = \Phi_\gamma(r_\gamma, s_\gamma)$ to satisfy the local normalisation

$$\sum_{\{r_\gamma\}} \Phi_\gamma(r_\gamma, s_\gamma) = 1. \tag{9}$$

We can identify $\Phi_\gamma(r_\gamma, s_\gamma) = Q(r_\gamma|s_\gamma)$, and (3) can be written in the familiar directed notation $Q(x) = \prod_\gamma Q(r_\gamma|s_\gamma)$.

To optimise the potentials $\varphi_\gamma(r_\gamma, s_\gamma)$ ($= \log Q(r_\gamma|s_\gamma)$), we again do a constrained optimisation, now with the constraints (9). This leads to generalised mean field equations for directed distributions

$$\varphi^*_\gamma(r_\gamma, s_\gamma) = \Big\langle \sum_{\alpha \in D_\gamma} \psi_\alpha(d_\alpha) - \sum_{\beta \in C_\gamma} \varphi_\beta(c_\beta) \Big\rangle_{c_\gamma} - z(s_\gamma),$$

in which $D_\gamma$ (resp. $C_\gamma$) is the set of clusters $\alpha$ in $P$ (resp. $\beta \neq \gamma$ in $Q$) that depend on $r_\gamma$. $z(s_\gamma)$ is a local normalisation factor that can be inferred from (9), i.e.

$$z(s_\gamma) = \log \sum_{\{r_\gamma\}} \exp \Big\langle \sum_{\alpha \in D_\gamma} \psi_\alpha(d_\alpha) - \sum_{\beta \in C_\gamma} \varphi_\beta(c_\beta) \Big\rangle_{c_\gamma}.$$

### 6 JUNCTION TREES

For the definition of junction trees, we follow (Jensen, 1996): a cluster tree is a tree of clusters of variables which are linked via separators. These consist of the variables common to the adjacent clusters. A cluster tree is a junction tree if for each pair of clusters $c_\gamma$, $c_\delta$, all nodes in the path between $c_\gamma$ and $c_\delta$ contain the intersection.
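
The junction tree condition just stated can be tested mechanically on small examples. The following is a minimal sketch, assuming a cluster tree given as a list of variable sets plus a list of undirected edges between cluster indices; the representation and the function names `tree_path` and `is_junction_tree` are illustrative choices, not part of the paper or of (Jensen, 1996).

```python
from itertools import combinations


def tree_path(edges, start, goal):
    """Return the unique path between two nodes of a tree as a list of node ids."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for neighbour in adjacency.get(node, []):
            if neighbour not in path:
                stack.append((neighbour, path + [neighbour]))
    raise ValueError("nodes are not connected")


def is_junction_tree(clusters, edges):
    """Check that every cluster on the path between two clusters
    contains their intersection."""
    for a, b in combinations(range(len(clusters)), 2):
        shared = clusters[a] & clusters[b]
        for k in tree_path(edges, a, b):
            if not shared <= clusters[k]:
                return False
    return True


# A chain of clusters over variables A, B, C, D.
good = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
# Same clusters, but the middle cluster no longer carries the shared variable B.
bad = [{"A", "B"}, {"C", "D"}, {"B", "C"}]
chain_edges = [(0, 1), (1, 2)]

print(is_junction_tree(good, chain_edges))
print(is_junction_tree(bad, chain_edges))
```

In the second arrangement the clusters {A, B} and {B, C} share B, but the cluster between them on the path does not contain B, so the condition fails.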
In a consistent junction tree, the potentials $\Phi_\gamma$ and $\Phi_\delta$ of the nodes $c_\gamma$, $c_\delta$ with intersection $I$ satisfy

$$\sum_{c_\gamma \setminus I} \Phi_\gamma(c_\gamma) = \sum_{c_\delta \setminus I} \Phi_\delta(c_\delta).$$

We consider consistent junction tree representations of $Q$ of the form

$$Q(x) = \frac{\prod_\gamma \Phi_\gamma(c_\gamma)}{\prod_{(\gamma,\delta)} \Phi_{\gamma\delta}(s_{\gamma\delta})},$$

in which the product in the denominator is taken over the separators. The separator potentials are defined by the cluster potentials,

$$\Phi_{\gamma\delta}(s_{\gamma\delta}) = \sum_{c_\gamma \setminus s_{\gamma\delta}} \Phi_\gamma(c_\gamma).$$

The junction tree representation is convenient, because the cluster probabilities can be read directly from the cluster potentials: $Q(c_\gamma) = \Phi_\gamma(c_\gamma)$. For a more detailed treatment of junction trees we refer to (Jensen, 1996).

In the following, we show how approximations can be optimised while maintaining the junction tree representation. Taking one of the clusters, $c_\kappa$, separately, we write $Q$ as the potential $\Phi_\kappa$ times $Q(x|c_\kappa)$,

$$Q(x) = \Phi_\kappa(c_\kappa) \times \frac{\prod_{\gamma \neq \kappa} \Phi_\gamma(c_\gamma)}{\prod_{(\gamma,\delta)} \Phi_{\gamma\delta}(s_{\gamma\delta})}.$$

Subsequently, we update $\Phi_\kappa$ according to the mean field equations (5),

$$\Phi^*_\kappa(c_\kappa) = \frac{1}{Z} \exp\Big\langle \sum_{d_\alpha \in D_\kappa} \log \Psi_\alpha(d_\alpha) - \sum_{\gamma \in C_\kappa} \log \Phi_\gamma(c_\gamma) + \sum_{(\gamma,\delta) \in S_\kappa} \log \Phi_{\gamma\delta}(s_{\gamma\delta}) \Big\rangle_{c_\kappa}, \tag{10}$$

where $S_\kappa$ is the set of separators that depend on $c_\kappa$. $Z$ makes sure that $\Phi^*_\kappa$ is properly normalised.

Now, however, the junction tree is not consistent anymore. We can fix this by applying the standard DistributeEvidence($c_\kappa$) operation to the junction tree (see (Jensen, 1996)). In this routine, $c_\kappa$ sends a message to all its neighbours $c_\gamma$ via

$$\Phi^*_\gamma(c_\gamma) = \Phi_\gamma(c_\gamma) \frac{\Phi^*_{\gamma\kappa}(s_{\gamma\kappa})}{\Phi_{\gamma\kappa}(s_{\gamma\kappa})}$$

and

$$\Phi^*_{\gamma\kappa}(s_{\gamma\kappa}) = \sum_{c_\kappa \setminus s_{\gamma\kappa}} \Phi^*_\kappa(c_\kappa).$$

Recursively, the neighbours $c_\gamma$ send messages to all their neighbours except the one from which the message came. After this procedure, the junction tree is consistent again, and another potential can be updated by (10). Since the DistributeEvidence routine does not change the distribution $Q$ (it only makes it consistent), the global convergence result (section 3) applies if the structure of $Q$ is a junction tree of $P$. This links the mean field theory with the exact junction tree algorithm.
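
The effect of a single DistributeEvidence message can be illustrated on the smallest junction tree: two clusters $(a, b)$ and $(b, c)$ with separator $b$. The sketch below makes several assumptions: the tables are made up, `phi_ab_new` is an arbitrary normalised table standing in for the result of the update (10) (which is not implemented here), and the message uses the standard ratio of new to old separator potential. It checks that the tree is consistent again after the message and that the conditional $Q(c|b)$ is left unchanged.

```python
def marginalise(table, keep):
    """Sum a two-variable table {(u, v): value} down to the axis `keep` (0 or 1)."""
    out = {}
    for key, value in table.items():
        out[key[keep]] = out.get(key[keep], 0.0) + value
    return out


def distribute_evidence(phi_new, phi_neighbour, sep_old):
    """Send one message from the updated cluster (a, b) to its neighbour (b, c)."""
    sep_new = marginalise(phi_new, keep=1)  # sum over a, leaving the separator b
    phi_neighbour_new = {
        (b, c): value * sep_new[b] / sep_old[b]
        for (b, c), value in phi_neighbour.items()
    }
    return phi_neighbour_new, sep_new


# A consistent two-cluster junction tree (hypothetical numbers).
phi_ab = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
phi_bc = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.45, (1, 1): 0.15}
sep_b = marginalise(phi_ab, keep=1)

# Stand-in for an updated, normalised cluster potential.
phi_ab_new = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

phi_bc_new, sep_b_new = distribute_evidence(phi_ab_new, phi_bc, sep_b)
print(phi_bc_new, sep_b_new)
```

After the message, summing the new neighbour potential over $c$ reproduces the new separator potential, so both clusters again agree on the marginal of $b$.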
### 7 APPROXIMATED MINIMISATION

The complexity of the variational method is at least proportional to the number of states in the clusters $d_\alpha$ of the target distribution $P$, since it requires the computation of averages of the form $\langle \psi(d_\alpha) \rangle$. In other words, the method presented in this paper can only be computationally tractable if the number of states in $d_\alpha$ is reasonably small. If the cluster potentials are explicitly tabulated, the required storage space is also proportional to the number of possible cluster states. In practice, potentials with a large number of cluster states are parameterised. In these cases, one can try to exploit the parametrisation and approximate $\langle \psi(d_\alpha) \rangle$ by a tractable quantity. Examples are target distributions $P$ with conditional probabilities $P(x_i|\pi_i)$ that are modelled as noisy-or gates (Pearl, 1988; Jensen, 1996) or as weighted sigmoid functions (Neal, 1992). For these parametrisations $\langle \log P(x_i|\pi_i) \rangle$ can be approximated by a tractable quantity $\mathcal{E}_i(Q, \xi)$ (which may be defined using additional variational parameters $\xi$).

As an example, consider tables parametrised as sigmoid functions,

$$P(x_i = 1|\pi_i) = \sigma(z_i) = \frac{1}{1 + e^{-z_i}},$$

where $z_i$ is the weighted input of the node, $z_i = \sum_k w_{ik} x_k + h_i$. In this case, the averaged log probability is intractable for large parent sets. To proceed we can use the approximation proposed in (Saul et al., 1996),

$$\langle \log(1 + e^{z_i}) \rangle \le \xi_i \langle z_i \rangle + \log \big\langle e^{-\xi_i z_i} + e^{(1 - \xi_i) z_i} \big\rangle.$$

The right-hand side, from which $\mathcal{E}_i(Q, \xi)$ is constructed, is tractable if $Q$ is tractable (Wiegerinck and Barber, 1998; Barber and Wiegerinck, 1999). Numerical optimisation of $\mathcal{L}(Q, \xi) = \langle \log Q \rangle - \mathcal{E}(Q, \xi)$ with respect to $Q$ and $\xi$ leads to a local minimum of an upper bound of the KL-divergence. Note, however, that iteration of fixed point equations derived from $\mathcal{L}(Q, \xi)$ does not necessarily lead to convergence, due to the nonlinearity of $\mathcal{E}$ with respect to $Q$.
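
The bound of (Saul et al., 1996) on the averaged log-partition term of a sigmoid node can be verified numerically for a small parent set. The sketch below rests on illustrative assumptions: three binary parents in {0, 1} that are independent under $Q$, made-up weights, bias and means, and hypothetical function names. The exact average is obtained by enumeration, which is exactly what becomes infeasible for large parent sets.

```python
import itertools
import math


def input_distribution(weights, bias, means):
    """Distribution of z = sum_k w_k x_k + h under a factorised Q over binary parents.

    Returns a list of (probability, z) pairs, one per parent configuration.
    """
    dist = []
    for x in itertools.product([0, 1], repeat=len(weights)):
        prob = math.prod(m if xi else 1.0 - m for m, xi in zip(means, x))
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        dist.append((prob, z))
    return dist


def exact_average(dist):
    """<log(1 + exp(z))> by direct enumeration."""
    return sum(p * math.log1p(math.exp(z)) for p, z in dist)


def saul_bound(dist, xi):
    """xi <z> + log <exp(-xi z) + exp((1 - xi) z)>."""
    mean_z = sum(p * z for p, z in dist)
    inner = sum(
        p * (math.exp(-xi * z) + math.exp((1.0 - xi) * z)) for p, z in dist
    )
    return xi * mean_z + math.log(inner)


# Hypothetical node parameters and parent means under Q.
dist = input_distribution(weights=[1.5, -2.0, 0.7], bias=0.3, means=[0.2, 0.6, 0.9])
exact = exact_average(dist)
bounds = {xi: saul_bound(dist, xi) for xi in (0.0, 0.25, 0.5, 0.75, 1.0)}
print(exact, bounds)
```

For a factorised $Q$ the averages of the exponentials factorise over the parents, so in practice the bound can be evaluated without enumerating parent configurations; the enumeration here only serves as a check.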
In (Wiegerinck and Kappen, 2000) numerical simulations are performed on artificial target distributions $P$ that had tractable substructures as well as sigmoidal nodes with large parent sets. Target distributions with varying system size were approximated by fully factorised distributions as well as distributions with structure. The results showed that an approximation using structure can significantly improve the accuracy of the approximation within feasible computer time. This seemed independent of the problem size.

Another example is a hybrid Bayesian network (which has continuous and discrete variables). In the remainder of this section we closely follow (Murphy, 1999). For expositional clarity we consider the distribution $P(r|x)P(x|t)P(t)$, in which $r$ and $t$ are binary variables and $x$ is a continuous variable. The conditional distribution $P(r|x)$ is parametrised by a sigmoid, $P(r = 1|x) = \sigma(wx + b)$ with parameters $w$ and $b$. The conditional distribution $P(x|t)$ is a conditional Gaussian, $P(x|t) = \exp(g_t + x h_t + x^2 K_t/2)$, in which $(h_t, K_t)$ are parameters depending on $t$, and $P(t)$ is a simple table with two entries. As (Murphy, 1999) showed, computation of the conditional distributions of $x$ and $t$ given observation of $r$ is difficult. In (Murphy, 1999), it is proposed to approximate the KL-divergence by using the quadratic lower bound of the sigmoid function (Jaakkola, 1997; Murphy, 1999),

$$\log \sigma(x) \ge x/2 + \lambda(\xi) x^2 + \log \sigma(\xi) - \xi/2 - \xi^2 \lambda(\xi),$$

with $\lambda(\xi) = -\frac{1}{4\xi} \tanh(\xi/2)$. By fixing $\xi$, this bound leads to a tractable upper bound of the KL-divergence

$$C(Q, \xi) = \Big\langle \log Q(t) + \log Q(x|t) - \log P(t) - \big(g_t + g(\xi) + x(h_t + h(\xi)) + x^2(K_t + K(\xi))/2\big) \Big\rangle,$$

in which

$$g(\xi) = \log \sigma(\xi) + \tfrac{1}{2}(2r - 1)b - \tfrac{1}{2}\xi + \lambda(\xi)(b^2 - \xi^2),$$

$$h(\xi) = \tfrac{1}{2}(2r - 1)w + 2\lambda(\xi) b w,$$

$$K(\xi) = 2\lambda(\xi) w^2.$$

For given $\xi$, the optimal distribution $Q(x, t)$ is simply given by

$$Q(x, t) \propto P(t) \exp\big(g_t + g(\xi) + x(h_t + h(\xi)) + x^2(K_t + K(\xi))/2\big).$$

In other words, $Q$ is the product of a conditional Gaussian and a table. Since the parameters of the conditional Gaussian $Q(x|t)$ depend on $t$, an obvious extension of this scheme is to make the contribution that depends on $\xi$ also depend on $t$.
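
The quadratic lower bound on $\log \sigma$ and the resulting coefficients of the constant, linear and quadratic terms in $x$ can be checked numerically. The sketch below is a hedged illustration: it uses $w = -1$, $b = 5$ as in the paper's hybrid example, but the value of $\xi$, the evaluation grid and the function names are arbitrary choices. The coefficients in `coefficients` are obtained by substituting the sigmoid argument $(2r - 1)(wx + b)$ into the bound and collecting powers of $x$.

```python
import math


def log_sigmoid(a):
    """Numerically stable log(1 / (1 + exp(-a)))."""
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))


def lam(xi):
    """lambda(xi) = -tanh(xi / 2) / (4 xi), defined for xi != 0."""
    return -math.tanh(xi / 2.0) / (4.0 * xi)


def quadratic_bound(y, xi):
    """Lower bound on log sigma(y) for variational parameter xi."""
    return (y / 2.0 + lam(xi) * y * y
            + log_sigmoid(xi) - xi / 2.0 - xi * xi * lam(xi))


def coefficients(r, w, b, xi):
    """Constant, linear and quadratic coefficients (g, h, K) in x."""
    sign = 2 * r - 1
    g = log_sigmoid(xi) + 0.5 * sign * b - 0.5 * xi + lam(xi) * (b * b - xi * xi)
    h = 0.5 * sign * w + 2.0 * lam(xi) * b * w
    k = 2.0 * lam(xi) * w * w
    return g, h, k


XI = 2.5           # hypothetical variational parameter
W, B = -1.0, 5.0   # sigmoid parameters from the hybrid network example
grid = [k / 4.0 for k in range(-40, 41)]

# Largest amount by which the bound exceeds log sigma on the grid (should be <= 0).
worst_violation = max(quadratic_bound(y, XI) - log_sigmoid(y) for y in grid)
print(worst_violation)
```

The bound touches $\log \sigma$ at $\pm \xi$, which is why the choice of $\xi$ determines where the approximation is most accurate.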
Concretely, the extension replaces the single parameter $\xi$ by two parameters $\xi_t$, $t = 0, 1$, and bounds the KL-divergence by

$$C(Q, \xi_t) = \Big\langle \log Q(t) + \log Q(x|t) - \log P(t) - \big(g_t + g(\xi_t) + x(h_t + h(\xi_t)) + x^2(K_t + K(\xi_t))/2\big) \Big\rangle.$$

Then it follows that for given $\xi_t$, the optimal distribution $Q(x, t)$ is given by

$$Q(x, t) \propto P(t) \exp\big(g_t + g(\xi_t) + x(h_t + h(\xi_t)) + x^2(K_t + K(\xi_t))/2\big).$$

To optimise $\xi_t$, we find in analogy with (Murphy, 1999)

$$\xi_t^2 = \langle (wx + b)^2 \rangle_t.$$

Figure 3: The effect of using approximations with structure in hybrid networks. We show results for a network $P(t)P(x|t)P(r|x)$, in which $t$ and $r$ are binary (0/1) and $x$ is continuous. $P(t = 1) = 0.3$, $P(x|t)$ is a conditional Gaussian with $\mu_{0,1} = (10, 20)$, $\sigma_{0,1} = 1$, and $P(r|x)$ is defined using a sigmoid with $w = -1$, $b = 5$. (This example is based on the crop network, where $t$ is 'subsidy', $x$ is 'price' and $r$ is 'buy'. 'Crop' is assumed to be observed in its mean value; see (Murphy, 1999) for details.) In (a) we plot $\sigma(-(wx - b))$ as a function of $x$ (solid), as well as the variational lower bound using one optimised unconditional parameter $\xi$ (dotted) and the two bounds for the optimised conditional variational parameters $\xi_t$ (dashed). In (b) we plot $P(r = 0, x)$ as a function of $x$ using the exact probability (solid), the approximation using one unconditional parameter $\xi$ (dotted) and two conditional parameters $\xi_t$ (dashed; this graph coincides with the exact graph).

The largest improvements of this extension can be expected when the posterior distribution (given observation of $r$) is multi-modal. In figure 3, an example is given.

### 8 DISCUSSION

Finding accurate approximations of graphical models such as Bayesian networks is crucial if their application to large scale problems is to be realised. We have presented a general scheme to use a (simpler) approximating distribution that factorises according to a given structure.
The scheme includes approximations using undirected graphs, directed acyclic graphs and junction trees. The approximating distribution is tuned by minimisation of the Kullback-Leibler divergence. We have shown that the method bridges the gap between standard mean field theory and exact computation. We have contributed to answering the question of how to select the structure of the approximating distribution, and when potentials of the target distribution can be exploited. Parametrised distributions with large parent sets can be dealt with by minimising an approximation of the KL-divergence. In the context of hybrid networks, we showed that it can be worthwhile to endow possible additional variational parameters with structure as well.

An open question is still how to find good and efficient structures for the variational quantities. Nevertheless, one of the conclusions of this paper is that using (more) structure in variational quantities is worth trying if an increase in accuracy is needed. It is often compatible with the variational method in use (EM-learning, applications to hybrid networks, etc.), and is often possible without too much computational overhead. As such, variational methods provide a flexible tool for approximation in which accuracy and efficiency can be tuned to the needs and the computational resources of the application.

### Acknowledgements

I would like to thank David Barber, Ali Taylan Cemgil, Tom Heskes and Bert Kappen for useful discussions. I thank Ali Taylan Cemgil for sharing his Matlab routines.

### References

Barber, D. and Wiegerinck, W. (1999). Tractable variational structures for approximating graphical models. In Kearns, M., Solla, S., and Cohn, D., editors, Advances in Neural Information Processing Systems 11. MIT Press.

Castillo, E., Gutierrez, J. M., and Hadi, A. S. (1997). Expert Systems and Probabilistic Network Models. Springer.

Cooper, G. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42.

Ghahramani, Z. and Jordan, M. (1997). Factorial hidden Markov models. Machine Learning, 29.

Haft, M., Hofmann, R., and Tresp, V. (1999). Model-independent mean field theory as a local method for approximate propagation of information. Network: Computation in Neural Systems, 10.

Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City.

Jaakkola, T. (1997). Variational Methods for Inference and Estimation in Graphical Models. PhD thesis, Massachusetts Institute of Technology.

Jaakkola, T. and Jordan, M. (1998). Improving the mean field approximation via the use of mixture distributions. In Jordan, M., editor, Learning in Graphical Models, volume 89 of NATO ASI, Series D: Behavioural and Social Sciences. Kluwer.

Jaakkola, T. and Jordan, M. (1999). Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10.

Jensen, F. (1996). An Introduction to Bayesian Networks. UCL Press.

Jordan, M., Ghahramani, Z., Jaakkola, T., and Saul, L. (1999). An introduction to variational methods for graphical models. Machine Learning, 37(2).

Jordan, M. I., editor (1998). Learning in Graphical Models, volume 89 of NATO ASI, Series D: Behavioural and Social Sciences. Kluwer.

Lauritzen, S. and Spiegelhalter, D. (1988). Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Statistical Society B, 50.

Murphy, K. (1999). A variational approximation for Bayesian networks with discrete and continuous latent variables. In Blackmond-Laskey, K. and Prade, H., editors, UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers, Inc.

Neal, R. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56.

Neal, R. and Hinton, G. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In Jordan, M., editor, Learning in Graphical Models, volume 89 of NATO ASI, Series D: Behavioural and Social Sciences. Kluwer.

Parisi, G. (1988). Statistical Field Theory. Addison-Wesley, Redwood City, CA.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc.

Saul, L., Jaakkola, T., and Jordan, M. (1996). Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4.

Saul, L. K. and Jordan, M. I. (1996). Exploiting tractable substructures in intractable networks. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems 8. MIT Press.

Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, Chichester.

Wiegerinck, W. and Barber, D. (1998). Mean field theory based on belief networks for approximate inference. In Niklasson, L., Boden, M., and Ziemke, T., editors, ICANN'98: Proceedings of the 8th International Conference on Artificial Neural Networks. Springer.

Wiegerinck, W. and Kappen, H. (2000). Approximations of Bayesian networks through KL minimisation. New Generation Computing, 18.

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab

Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Finding the M Most Probable Configurations Using Loopy Belief Propagation

Finding the M Most Probable Configurations Using Loopy Belief Propagation Chen Yanover and Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem 91904 Jerusalem, Israel

### Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

### Discovering Structure in Continuous Variables Using Bayesian Networks

In: "Advances in Neural Information Processing Systems 8," MIT Press, Cambridge MA, 1996. Discovering Structure in Continuous Variables Using Bayesian Networks Reimar Hofmann and Volker Tresp Siemens AG,

### Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

### Square Root Propagation

Square Root Propagation Andrew G. Howard Department of Computer Science Columbia University New York, NY 10027 ahoward@cs.columbia.edu Tony Jebara Department of Computer Science Columbia University New

### Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

### Approximating the Partition Function by Deleting and then Correcting for Model Edges

Approximating the Partition Function by Deleting and then Correcting for Model Edges Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles Los Angeles, CA 995

### BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION

BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION Ş. İlker Birbil Sabancı University Ali Taylan Cemgil 1, Hazal Koptagel 1, Figen Öztoprak 2, Umut Şimşekli

### Learning Vector Quantization: generalization ability and dynamics of competing prototypes

Learning Vector Quantization: generalization ability and dynamics of competing prototypes Aree Witoelar 1, Michael Biehl 1, and Barbara Hammer 2 1 University of Groningen, Mathematics and Computing Science

### Randomization Approaches for Network Revenue Management with Customer Choice Behavior

Randomization Approaches for Network Revenue Management with Customer Choice Behavior Sumit Kunnumkal Indian School of Business, Gachibowli, Hyderabad, 500032, India sumit kunnumkal@isb.edu March 9, 2011

### Incorporating Evidence in Bayesian networks with the Select Operator

Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {butz, fang11fa}@cs.uregina.ca

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer: Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### Dirichlet Processes A gentle tutorial

Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.

### Neural Networks for Machine Learning. Lecture 13a The ups and downs of backpropagation

Neural Networks for Machine Learning Lecture 13a The ups and downs of backpropagation Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed A brief history of backpropagation

### ELICITING a representative probability distribution from

146 IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, VOL. 53, NO. 1, FEBRUARY 2006 Entropy Methods for Joint Distributions in Decision Analysis Ali E. Abbas, Member, IEEE Abstract A fundamental step in decision

### Dynamic programming. Doctoral course Optimization on graphs - Lecture 4.1. Giovanni Righini. January 17 th, 2013

Dynamic programming Doctoral course Optimization on graphs - Lecture 4.1 Giovanni Righini January 17th, 2013 Implicit enumeration Combinatorial optimization problems are in general NP-hard and we usually

### Examples and Proofs of Inference in Junction Trees

Examples and Proofs of Inference in Junction Trees Peter Lucas LIAC, Leiden University February 4, 2016 1 Representation and notation Let P(V) be a joint probability distribution, where V stands for a

### libdai: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

Journal of Machine Learning Research 11 (2010) 2169-2173 Submitted 2/10; Revised 8/10; Published 8/10 libdai: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models Joris

### Agent-based decentralised coordination for sensor networks using the max-sum algorithm

DOI 10.1007/s10458-013-9225-1 Agent-based decentralised coordination for sensor networks using the max-sum algorithm A. Farinelli A. Rogers N. R. Jennings The Author(s) 2013 Abstract In this paper, we

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### An Introduction to Variational Methods for Graphical Models

Machine Learning, 37, 183-233 (1999) © 1999 Kluwer Academic Publishers. Manufactured in The Netherlands. An Introduction to Variational Methods for Graphical Models MICHAEL I. JORDAN jordan@cs.berkeley.edu

### Compact Representations and Approximations for Computation in Games

Compact Representations and Approximations for Computation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

### Chapter 14 Managing Operational Risks with Bayesian Networks

Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian

### Variational Mean Field for Graphical Models

Variational Mean Field for Graphical Models CS/CNS/EE 155 Baback Moghaddam Machine Learning Group baback @ jpl.nasa.gov Approximate Inference Consider general UGs (i.e., not tree-structured) All basic

### Study Manual. Probabilistic Reasoning. Silja Renooij. August 2015

Study Manual Probabilistic Reasoning 2015 2016 Silja Renooij August 2015 General information This study manual was designed to help guide your self studies. As such, it does not include material that is

### SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline

### Compression algorithm for Bayesian network modeling of binary systems

Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the

### Topic models for Sentiment analysis: A Literature Survey

Topic models for Sentiment analysis: A Literature Survey Nikhilkumar Jadhav 123050033 June 26, 2014 In this report, we present the work done so far in the field of sentiment analysis using topic models.

### Probabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm

Probabilistic Graphical Models 10-708 Homework 1: Due January 29, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 1-3. You must complete all four problems to

### Offline sorting buffers on Line

Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

### Optimal Stopping in Software Testing

Optimal Stopping in Software Testing Nilgun Morali, 1 Refik Soyer 2 1 Department of Statistics, Dokuz Eylül Üniversitesi, Turkey 2 Department of Management Science, The George Washington University, 2115

### Probabilistic Visualisation of High-dimensional Binary Data

Probabilistic Visualisation of High-dimensional Binary Data Michael E. Tipping Microsoft Research, St George House, 1 Guildhall Street, Cambridge CB2 3NH, U.K. mtipping@microsoft.com Abstract We present

### Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

### Computing with Finite and Infinite Networks

Computing with Finite and Infinite Networks Ole Winther Theoretical Physics, Lund University Sölvegatan 14 A, S-223 62 Lund, Sweden winther@nimis.thep.lu.se Abstract Using statistical mechanics results,

### Regression Using Support Vector Machines: Basic Foundations

Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering

### Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics rsalakhu@utstat.toronto.edu http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Ex. 2.1 (Davide Basilio Bartolini)

ECE 54: Elements of Information Theory, Fall 00 Homework Solutions Ex. 2.1 (Davide Basilio Bartolini) Text Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips

### Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce

Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce Erik B. Reed Carnegie Mellon University Silicon Valley Campus NASA Research Park Moffett Field, CA 94035 erikreed@cmu.edu

### Clustering and scheduling maintenance tasks over time

Clustering and scheduling maintenance tasks over time Per Kreuger 2008-04-29 SICS Technical Report T2008:09 Abstract We report results on a maintenance scheduling problem. The problem consists of allocating

### max cx s.t. Ax c where the matrix A, cost vector c and right hand side b are given and x is a vector of variables. For this example we have x

Linear Programming Linear programming refers to problems stated as maximization or minimization of a linear function subject to constraints that are linear equalities and inequalities. Although the study

### Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

### Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124

### A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases

A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases Gerardo Canfora Bice Cavallo University of Sannio, Benevento, Italy, {gerardo.canfora,bice.cavallo}@unisannio.it ABSTRACT In

### Lecture 6: The Bayesian Approach

Lecture 6: The Bayesian Approach What Did We Do Up to Now? We are given a model Log-linear model, Markov network, Bayesian network, etc. This model induces a distribution P(X) Learning: estimate a set

### The Optimality of Naive Bayes

The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unb.ca E3B 5A3 Abstract Naive Bayes is one of the most

### Computing Near Optimal Strategies for Stochastic Investment Planning Problems

Computing Near Optimal Strategies for Stochastic Investment Planning Problems Milos Hauskrecht 1, Gopal Pandurangan 1,2 and Eli Upfal 1,2 Computer Science Department, Box 1910 Brown University Providence,

### Real Time Traffic Monitoring With Bayesian Belief Networks

Real Time Traffic Monitoring With Bayesian Belief Networks Sicco Pier van Gosliga TNO Defence, Security and Safety, P.O.Box 96864, 2509 JG The Hague, The Netherlands +31 70 374 02 30, sicco_pier.vangosliga@tno.nl

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Gaussian Processes in Machine Learning

Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/~carl

### Artificial Intelligence Mar 27, Bayesian Networks 1 P(T|D)P(D) + P(T|¬D)P(¬D) =

Artificial Intelligence 15-381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information

### Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

### Tutorial on Markov Chain Monte Carlo

Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,

### Analysis of Algorithms I: Optimal Binary Search Trees

Analysis of Algorithms I: Optimal Binary Search Trees Xi Chen Columbia University Given a set of n keys K = {k_1, ..., k_n} in sorted order: k_1 < k_2 < ... < k_n we wish to build an optimal binary search

### Using Markov Decision Processes to Solve a Portfolio Allocation Problem

Using Markov Decision Processes to Solve a Portfolio Allocation Problem Daniel Bookstaber April 26, 2005 Contents 1 Introduction 2 Defining the Model 2.1 The Stochastic Model for a Single Asset

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Christfried Webers. Canberra February June 2015

Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning") Part VIII Linear Classification 2 Logistic

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class does the object belong

### Agent-Based Decentralised Coordination for Sensor Networks using the Max-Sum Algorithm

Agent-Based Decentralised Coordination for Sensor Networks using the Max-Sum Algorithm A. Farinelli A. Rogers N. R. Jennings

### Methods for Constructing Balanced Elimination Trees and Other Recursive Decompositions

Methods for Constructing Balanced Elimination Trees and Other Recursive Decompositions Kevin Grant and Michael C. Horsch Dept. of Computer Science University of Saskatchewan Saskatoon, Saskatchewan, Canada

### HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Solving Hybrid Markov Decision Processes

Solving Hybrid Markov Decision Processes Alberto Reyes 1, L. Enrique Sucar +, Eduardo F. Morales + and Pablo H. Ibargüengoytia Instituto de Investigaciones Eléctricas Av. Reforma 113, Palmira Cuernavaca,

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Structured Learning and Prediction in Computer Vision. Contents

Foundations and Trends® in Computer Graphics and Vision Vol. 6, Nos. 3-4 (2010) 185-365 © 2011 S. Nowozin and C. H. Lampert DOI: 10.1561/0600000033 Structured Learning and Prediction in Computer Vision

### Integrating Benders decomposition within Constraint Programming

Integrating Benders decomposition within Constraint Programming Hadrien Cambazard, Narendra Jussien email: {hcambaza,jussien}@emn.fr École des Mines de Nantes, LINA CNRS FRE 2729 4 rue Alfred Kastler BP

### The Application of Bayesian Optimization and Classifier Systems in Nurse Scheduling

The Application of Bayesian Optimization and Classifier Systems in Nurse Scheduling Proceedings of the 8th International Conference on Parallel Problem Solving from Nature (PPSN VIII), LNCS 3242, pp 581-590,

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data, also known as lattice data, are data Y(s), s ∈ D, where D is a discrete index set. This usually corresponds to data

### Deterministic Sampling-based Switching Kalman Filtering for Vehicle Tracking

Proceedings of the IEEE ITSC 2006 2006 IEEE Intelligent Transportation Systems Conference Toronto, Canada, September 17-20, 2006 WA4.1 Deterministic Sampling-based Switching Kalman Filtering for Vehicle

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### 1 Approximating Set Cover

CS 05: Algorithms (Grad) Feb 2-24, 2005 Approximating Set Cover. Definition An Instance (X, F) of the set-covering problem consists of a finite set X and a family F of subsets of X, such that every element

### The Expectation Maximization Algorithm A short tutorial

The Expectation Maximization Algorithm A short tutorial Sean Borman Comments and corrections to: em-tut at seanborman dot com July 8 2004 Last updated January 09, 2009 Revision history 2009-01-09 Corrected

### The Exponential Family

The Exponential Family David M. Blei Columbia University November 3, 2015 Definition A probability density in the exponential family has this form $p(x \mid \eta) = h(x) \exp\{\eta^\top t(x) - a(\eta)\}$, (1) where $\eta$ is the natural

### 3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials

3. Interpolation Closing the Gaps of Discretization... Beyond Polynomials, December 19, 2012 3.3. Polynomial Splines Idea of Polynomial Splines

### Unsupervised Learning

Unsupervised Learning Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK zoubin@gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~zoubin September 16, 2004 Abstract We give

### Bargaining Solutions in a Social Network

Bargaining Solutions in a Social Network Tanmoy Chakraborty and Michael Kearns Department of Computer and Information Science University of Pennsylvania Abstract. We study the concept of bargaining solutions,

### The Chinese Restaurant Process

COS 597C: Bayesian nonparametrics Lecturer: David Blei Lecture # 1 Scribes: Peter Frazier, Indraneel Mukherjee September 21, 2007 In this first lecture, we begin by introducing the Chinese Restaurant Process.

### 6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi

Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi Lecture - 9 Basic Scheduling with A-O-A Networks Today we are going to be talking

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

### Distributed computing of failure probabilities for structures in civil engineering

Distributed computing of failure probabilities for structures in civil engineering Andrés Wellmann Jelic, University of Bochum (andres.wellmann@rub.de) Matthias Baitsch, University of Bochum (matthias.baitsch@rub.de)

### Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Neil D. Lawrence Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield,

### Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Data Visualization and Feature Selection: New Algorithms for Nongaussian Data Howard Hua Yang and John Moody Oregon Graduate Institute of Science and Technology NW, Walker Rd., Beaverton, OR976, USA hyang@ece.ogi.edu,

### Lecture 9. 1 Introduction. 2 Random Walks in Graphs. 1.1 How To Explore a Graph? CS-621 Theory Gems October 17, 2012

CS-621 Theory Gems October 17, 2012 Lecture 9 Lecturer: Aleksander Mądry Scribes: Dorina Thanou, Xiaowen Dong 1 Introduction Over the next couple of lectures, our focus will be on graphs. Graphs are one of

### The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm 1.1.1 The problem 1.1. The different steps

### Fleet Assignment Using Collective Intelligence

Fleet Assignment Using Collective Intelligence Nicolas E Antoine, Stefan R Bieniawski, and Ilan M Kroo Stanford University, Stanford, CA 94305 David H Wolpert NASA Ames Research Center, Moffett Field,

### Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

### Lecture 7: Approximation via Randomized Rounding

Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, 1}^n, is in [0, 1]^n instead. There is a generic way of obtaining

### ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE

ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE YUAN TIAN This synopsis is designed merely to keep a record of the materials covered in lectures. Please refer to your own lecture notes for all proofs.

### Lecture 5: Conic Optimization: Overview

EE 227A: Convex Optimization and Applications January 31, 2012 Lecture 5: Conic Optimization: Overview Lecturer: Laurent El Ghaoui Reading assignment: Chapter 4 of BV. Sections 3.1-3.6 of WTB. 5.1 Linear

### Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don't have to worry about

### Testing LTL Formula Translation into Büchi Automata

Testing LTL Formula Translation into Büchi Automata Heikki Tauriainen and Keijo Heljanko Helsinki University of Technology, Laboratory for Theoretical Computer Science, P. O. Box 5400, FIN-02015 HUT, Finland