# Compression algorithm for Bayesian network modeling of binary systems

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the reliability of systems. The BN framework is limited, however, by the size and complexity of the system that can be tractably modeled. Each node in a BN graph is associated with a conditional probability table (CPT), the size of which grows exponentially with the number of connected nodes in the graph, presenting a memory storage challenge in constructing and analyzing the BN. In this paper, we look at binary systems, where components of the system are in either one of two states, survival or failure, and the component states deterministically define the system state. This analysis is particularly useful for studying the reliability of infrastructure systems, where, e.g., the states of individual gas pipelines or roads directly impact the state of the overall natural gas or transportation system. We present a compression algorithm for the CPTs of such systems so that they may be modeled on a larger scale as BNs. We apply our algorithm to an example system and evaluate its performance compared to an existing algorithm. 1 INTRODUCTION A Bayesian network (BN) is a probabilistic framework well suited for modeling systems and analyzing their reliability. A BN is a directed acyclic graph comprised of nodes and links. In the BN system model, the components of the system are represented as the nodes of the graph, and the directed links between the nodes represent the dependencies between the components. Since the state of the system depends on the states of its components, the node representing the system is defined as a child of the nodes representing the components, i.e., links are directed from component nodes to the system node. A BN model enables analysis of the contributions of individual components to the overall system reliability and the identification of critical components within the system. As information in the model is handled probabilistically, the BN framework is particularly useful for decision-making under uncertainty for infrastructure systems, where component demands and capacities are uncertain. In addition, the BN framework allows for updating of the network as new information, or evidence, becomes available. When evidence on one or more variables is entered into the BN, the information propagates through the network to yield updated probabilities of the system performance in light of the new information. This enables decision making about the system based on the most up-todate information. The BN framework is limited, however, by the size and complexity of the system that can be tractably modeled. Each node in the BN is associated with a conditional probability table (CPT), which defines the conditional probability mass function of the node, given each mutually exclusive combination of the states of its parent nodes. Consequently, the size of the CPT grows exponentially with the number of parents of the node. For example, if every node has! states, then the CPT has!!!! entries, where! denotes the number of parents. Thus, in a binary system with 100 binary components (i.e., each component having two states, e.g., survival or failure), the CPT for the system node consists of 2!"! = 2.5E30 individual terms. This poses a significant memory storage challenge in constructing and analyzing the BN. For this reason, BN modeling of systems has been limited to small systems, typically of no more than 20 two-state components. In this paper, we present a novel compression algorithm to enable the modeling of large, complex systems as BNs. The algorithm encodes the system CPT in compressed form, both in the initial construction of the BN and in the subsequent calculations for inference, to achieve significant savings in memory storage. The algorithm, however, necessitates longer computation time. Thus, a trade-off is made between the demand for memory storage,

2 which has a hard limit, and the demand for computation time. Several features are added to the compression algorithm to reduce the computation time. An example application demonstrates the memory and computational demands of the proposed algorithm versus the most widely used existing algorithm. systems, such as gas, power, or transportation systems, where the states of individual gas pipelines, electrical transmission lines, or roadways directly determine the state of the infrastructure system. The BN of a system with! components!!,,!! is shown in Figure 1, where!"! denotes the system node. 2 RELATED WORK Examples in the existing literature of the use of BNs for modeling system performance are limited (Bensi et al. 2011). Previous studies have focused on generating BNs from conventional system modeling methods, such as reliability block diagrams (Torres- Toledano and Succar 1998) and fault trees (Bobbio et al. 2001). However, these studies have been limited to small systems comprised of no more than ten components. More recently, BNs have been used to model the reliability of slightly larger systems, including a system of 16 components in Boudali and Dugan (2005). However, the authors state that this large number of components makes it practically impossible to solve the network without resorting to simplifying assumptions or approximations. In addition, it is clear that even a network of 16 components is not enough to create a full model of many real-world systems. A method utilizing Reduced Ordered Binary Decision Diagrams (ROBDDs) to efficiently perform inference in BNs representing large systems with binary nodes is proposed in Nielsen et al. (2000). However, a troubleshooting model is considered, which includes a major assumption of single-fault failures. It is this assumption that bounds the size of the ROBDD and enables modeling of larger systems. For more general systems, this single-fault assumption cannot be guaranteed, and the gains from using the ROBDD may not be applicable. Finally, a topology optimization algorithm is proposed in Bensi et al. (2013) to develop a chain-like BN model of the system with minimal clique sizes. The optimization program, however, must consider the permutation of all component indices and, therefore, may itself become intractably large for large systems. 3 THE PROPOSED METHOD 3.1 Binary systems Consider a system consisting of binary components, i.e., where each component is in one of two possible states, survival or failure. We assume the component states deterministically define the system state and that the system is also binary, i.e., for any combination of the component states, the system is in either a survival or a failure state. This model is particularly useful for studying the reliability of infrastructure Figure 1. BN of a system with n components. As indicated by the dotted arrows in Figure 1, the BN model accommodates parent nodes to the component nodes, which may represent demands on the components resulting from a hazard. Here, the compression algorithm developed focuses on the system description part of the BN enclosed in the dashed box, which models the system performance. In addition, while the algorithm is developed for binary systems, it can be extended to multi-state flow systems, e.g., where the component states are discretized values of a flow capacity, e.g., 0%, 50%, and 100% of maximum capacity. In such a system, the flow capacity of individual components determines the flow capacity of the overall system. When the component states deterministically define the system state, the CPT associated with the system node has a special property. Since for each distinct combination of component states the system state is known with certainty, the system CPT is comprised solely of 0s and 1s. Table 1 shows an example of such a CPT for a binary system with binary components. As seen in the rightmost column of the table, the vector representing the system state given each distinct combination of component states is comprised solely of 0s and 1s and has the size 2! 1. The proposed compression algorithm takes advantage of this property. Table 1. Example CPT for a binary system with n components. C 1 C n-1 C n sys

3 3.2 Compression algorithm The developed algorithm utilizes compression techniques, including run-length encoding and the classical Lempel-Ziv algorithm. A run is defined as consecutive bits of the same value. Run-length encoding stores runs in a dataset as a data value and count, making it well suited for data with many repeated values. However, mixed values are stored literally, which results in little gain for mixed data. Algorithms based on the classical Lempel-Ziv algorithm (Ziv and Lempel 1977) find patterns in the data, construct a dictionary of phrases, and encode based on repeated instances of phrases in the dictionary. The proposed compression algorithm uses both these ideas to compress the system CPT of the BN. A binary system can be defined in terms of its minimum-cut sets (MCSs) or minimum-link sets (MLSs). A MCS is a minimum set of components whose joint failure constitutes failure of the system; if any one MCS fails, the system fails. Similarly, a MLS is a minimum set of components whose joint survival constitutes survival of the system; if any one MLS survives, the system survives. Each combination of component states in the system CPT (see Table 1) is checked against MCSs or MLSs to determine the system state. For a given combination of component states, this is the rightmost value in the row, the last column in Table 1. This value is then encoded in a compressed form, with repeated values in the column encoded as runs, and patterns in the column encoded as phrases with an accompanying dictionary of phrases. Note that we only need to compress the vector of system states; there is no need to compress the entire CPT. This is because the component states in any row can be determined in terms of the row number due to the specific pattern used in defining the component states in the table. Specifically, the state of component!,!!, in row! of the CPT for a system of! total components is determined according to the rule!! = 0!if#!"#\$ 1!if#!"#\$!!!is#odd#!is#even (1) where!"#\$(!) is the value of! rounded up to the nearest integer. The result from employing the above algorithm is that the full system CPT becomes encoded as a compressed combination of runs and phrases. Typically, the size of this data is orders of magnitude smaller than the size of the CPT. 3.3 Inference Once the system CPT for the BN has been constructed in a compressed form, inference is required to draw conclusions about the system. There are both exact and approximate methods for inference. Approximate methods are generally sampling based. However, for a reliable system where failure events are rare, these methods often yield poor convergence due to the unlikelihood of the events that are being modeled, and the number of simulations required to achieve a sufficiently large sample is prohibitive. Therefore, exact methods of inference are preferred and are considered here. There are two major algorithms used for exact inference: the junction tree (JT) algorithm (Spiegelhalter et al. 1993) and the variable elimination (VE) algorithm (Dechter 1999). The advantage of the JT algorithm comes from breaking a network down into smaller structures, or cliques. However, for the network we have in Figure 1, the BN is comprised of only one clique of size! + 1. As the size of the system increases, computation of the potential over this clique during the initialization of the JT increases exponentially and becomes intractable. For this reason, here the VE algorithm is used for inference. As its name suggests, the VE algorithm works by eliminating the nodes in the network, one by one, to arrive at the query node, the node for which the posterior (updated) probability distribution is of interest. Elimination of each node corresponds to summing of the joint distribution over all states of the node to be eliminated, resulting in an intermediate factor that is used during the next step of elimination. For the system given in Figure 1, suppose we are interested in the posterior distribution of the state of component 1 given a particular state of the system. The VE calculation for this query is!!!!"! =!!!!!!!! =!!!!!!!!!!!!!"#!"!!!!!!!! =!!! =!!!!! (2) where!"!!"! is the compressed system CPT and!! is the intermediate factor created after elimination of node (component)!. Once the initial!"!!"! has been compressed, the key to successfully performing inference for a large system is that each subsequent!! must also be stored using the same compression algorithm as used for!"!!"!. The algorithm for inference that we have developed employs the VE method and is able to handle the compressed!"!!"! and!! without decompressing or recompressing. In this way, the memory requirements for storing both the initial system CPT and the intermediate factors

4 during inference are significantly reduced, enabling BN modeling of large systems. 3.4 Example system We start with the 8-component example system shown in Figure 2(a), which is adopted from Bensi et al. (2013). The system consists of a series subsystem {C4, C5, C6}, and a parallel subsystem {C1, C2, C3}. Because the objective is to see how the proposed algorithm scales with increasing system size, we increase the number of components in these subsystems and analyze the performance of the algorithm as the number of components increases. We note that, as pointed out by Der Kiureghian and Song (2008), Song and Ok (2010) and Bensi et al. (2013), the system in Figure 2(a) can be more efficiently represented as a system of three supercomponents, each super-component representing a series or parallel subsystem. However, here we disregard this advantage in order to investigate the system size effect. C1 C2 C3 source C4 C5 C7 C8 sink C4 C5 sink Cn-1 Cn sink C6 (a) C1 C2 C3 source C6 Cn (b) C1 Cn-5 source Cn-4 Cn-3 Cn-2 (c) Figure 2. Example systems: (a) basic case, (b) with increased components in series subsystem, and (c) with increased components in parallel subsystem. We increase the number of components in the series subsystem so that the total number of components in the system is!, as in Figure 2(b). We also increase the number of components in the parallel subsystem so that the total number of components in the system is!, as in Figure 2(c). The resulting analysis of these two expanded systems, Figure 2(b) and Figure 2(c), with variable! shows how the proposed compression algorithm performs compared to existing algorithms to analyze systems of increasing size. In order to determine the state of the system for each distinct combination of the states of the components, we use the MCS representation of the system. A similar MLS formulation of the system can also be used. The MCSs of the system in Figure 2(a) are {(!!,!!,!!,!! ), (!!,!!,!!,!! ), (!!,!!,!!,!! ), (!! ), (!! )}. The MCSs of the system in Figure 2(b) with the expanded series subsystem are {(!!,!!,!!,!! ),, (!!,!!,!!,!! ), (!! ), (!! )}. And the MCSs of the system in Figure 2(c) with the expanded parallel subsystem are {!!,!,,,!!,,,,!!,,,,,!! } 4 RESULTS 4.1 Inference The BN is initialized with a prior probability of failure for the parallel components,!!,!!,!! in Figure 2(b), and!!,!, in Figure 2(c), of 0.2, and a prior probability of failure for the components in series,!!,,!! in Figure 2(b), and,,!! in Figure 2(c), of We are interested in updating the probabilities of failure of the system and components given new information, or evidence. The proposed new algorithm consists of first implementing the developed compression algorithm for the full system CPT to construct the BN, and then implementing the developed inference algorithm to perform VE on the compressed matrices. For illustration purposes, in the following we conduct inference using component 1. Figure 3(a) shows the updated probabilities of system failure given the evidence that component 1 has failed, i.e.,!! = 0. Figure 3(b) shows the updated probabilities of failure of component 1, given the evidence that the system has failed. These updated probabilities are plotted as a function of increasing system size, as indicated by the increasing number of total components in the system,!. The sequence connected by a solid line indicates the results from increasing the number of components in the series subsystem (Figure 2b), and the sequence connected by the dashed line indicates the results from increasing the number of components in the parallel subsystem (Figure 2c).The results in Figure 3 show that the proposed algorithm successfully performs both forward and backward inference. In Figure 3(a), we see that the probability that the system fails increases as the number of components in the series subsystem increases, as there are more MCSs that can fail to lead to system failure. In contrast, as the number of components in the parallel subsystem increases, for a system of 11 components or more, the updated

5 probability of system failure remains essentially constant. This is because the probability of failure of MCSs involving the increasing number of parallel components becomes small, and the system failure probability becomes dominated by the failure probabilities of the two single-component minimum cut sets {!!!! } and {!! }. P(sys=0 C1=0) P(C1=0 sys=0) (a) (b) increase series increase series Figure 3. Updated probabilities of (a) system state given component state, and (b) component state given system state with increasing system size BN of a system with n components. In Figure 3(b), we see that as the number of components in the series subsystem increases, the conditional probability that!! has failed given that the system has failed increases. With an increased number of components in the series subsystem, there are an increased number of MCSs that involve component 1, i.e., {!!,!!,!!,!! },! = 6,,!. Since failure of any of these MCSs leads to system failure, an increase in the system size gives a higher probability that failure of component!! was necessary for the system to fail. In contrast, as the number of components in the parallel subsystem increases, the probability of failure for component!!!given system failure again converges for a system of 11 components or more. In this case, the evidence on the system state is not informative for the component, and the updated probability of failure converges to the prior probability of failure of component!!. 4.2 Memory storage To analyze the performance of the new algorithm compared to existing algorithms, we first look at memory storage. Figure 4 shows the maximum number of values that must be stored in memory during the running of the algorithms, which is used as a proxy to assess the memory storage requirements of each algorithm. The algorithms are run in Matlab v7.10 on a 32 Gb RAM computer. The circle marks indicate the maximum number of values stored when running the new algorithm, including the values both in the compressed CPT and in the accompanying phrase dictionary. The squares indicate the maximum number of elements stored during the construction of the BN using the existing JT algorithm, as implemented in the Bayes Net Toolbox by Murphy (2001). The X mark indicates the maximum size of the system after which the existing algorithm can no longer be used because the memory demand exceeds the available memory storage capacity. These results are the same for both the system obtained from increasing the number of components in the series subsystem (Figure 2b), and from increasing the number of components in the parallel subsystem (Figure 2c). maximum number of elements stored New Existing out of memory Figure 4. Maximum number of elements that must be stored using the new vs. existing algorithm as a function of system size. Figure 4 shows that the proposed new algorithm achieves significant gains in memory storage demand compared to the existing algorithm. For the existing JT algorithm, the memory storage demand, as measured by the maximum number of values that must be stored, increases exponentially with the number of components in the system. In fact, the al-

6 gorithm runs out of memory on our 32Gb RAM computer for a system comprised of more than 24 components. The memory storage demand using the proposed new algorithm, however, remains constant, even as the number of components in the system increases. 4.3 Computation time Figure 5 shows the computation times required to run the algorithms with increasing system size. In Figure 5(a), the computation times are broken into the various functions for each algorithm. The bars labeled New - compression indicate the time required to compress the system CPT using the proposed compression algorithm. The next two solid bars indicate the time required to perform inference on the system given {!! = 0} and the time to perform inference on the component!! given {!"! = 0}, respectively, using the proposed compression algorithm based on the VE method. The bars labeled Existing - initialization indicate the time required to initialize the BN using the existing JT algorithm. The next two bars with diagonal hatching indicate the time required to perform inference on the system given information on the component!!, and to perform inference on the component!! given information on the system, respectively, using JT. The computation times are recorded for systems of increasing size, as indicated by the total number of components in the system. The results in Figure 5(a) are for the case where the size of the system increases by increasing the number of components in the series subsystem. Figure 5(b) compares the computation times for the proposed algorithm for the two systems in Figures 3(a) and 3(b). Again the computation times are broken down into the component for compressing the system CPT, to perform inference on the system given the state of component!!, and to perform inference on component!! given the state of the system. The solid bars are for the case where the size of the system is increased by increasing the number of components in the series subsystem, and the bars with diagonal hatching are for the case where the size of the system is increased by increasing the number of components in the parallel subsystem. Taking Figures 4 and 5 together, we see the classic storage space-computation time trade-off as described in Dechter (1999). In Figure 5(a), we see that the new algorithm requires longer computation times compared to the existing JT algorithm. However, the time to perform inference for both the new and existing algorithms is exponentially increasing with the system size. It is important to note that the natures of the memory and time constraints are fundamentally different. Memory storage is a hard constraint. If the maximum size of a CPT exceeds the storage capacity of a program or machine, no analysis can be performed. In contrast, computation time is more flexible. Indeed, various recourses are available to address the computational time, such as parallel computing. While reliability analyses will take longer using the new algorithm, some problems simply cannot be solved using existing algorithms. computation time [s] computation time [s] New - compression P(sys C1) inference P(C1 sys) inference Existing - initialization P(sys C1) inference P(C1 sys) inference (a) New - compression - increase series P(sys C1) inference - increase series P(C1 sys) inference - increase series (b) Figure 5. Computation times as function of system size: (a) proposed algorithm vs. existing JT algorithm, (b) proposed algorithm when increasing the number of components in the series vs. parallel subsystems. In Figure 5(b), by comparing the results of increasing the size of the series subsystem vs. the parallel subsystem, we see that the algorithm performs slightly better in the initial compression of the system CPT and in both inference scenarios for the systems with an increased number of parallel components. The algorithm, therefore, is better suited to systems that can be formulated as few MCSs comprised of many components each, compared to systems formulated as many MCSs of few components each. In the latter case, it is preferable to use an MLS formulation of the system.

7 5 CONCLUSION We have developed a compression algorithm that significantly reduces the memory storage requirements during the construction of a BN for system reliability analysis. In addition, we have developed a variable elimination (VE) algorithm to perform inference using compressed matrices. It is shown that by implementing the proposed algorithm, the memory storage demand during construction of the BN and during inference not only does not exponentially increase as the number of components in the system increases, but essentially remains constant. Together, these algorithms enable large systems to be modeled as BNs for system reliability analysis. Spiegelhalter, D. J., Dawid, A. P., Lauritzen, S. L., and Cowell, R. G., Bayesian Analysis in Expert Systems, Statistical Science, Vol. 8, No. 3, pp , Torres-Toledano, J. G., and Succar, L. E., Bayesian networks for reliability analysis of complex systems, Lecture Notes in Artificial Intelligence 1484, pp , Ziv, J., and Lempel, A., A Universal Algorithm for Sequential Data Compression, IEEE Transactions on Information Theory, Vol. 23, No. 3, pp , May ACKNOWLEDGEMENTS The first author gratefully acknowledges support from the National Science Foundation Graduate Research Fellowship. Additional support is provided by the National Science Foundation Grant No. CMMI , which is also gratefully acknowledged. REFERENCES Bensi, M. T., Der Kiureghian, A., and Straub, D., A Bayesian Network Methodology for Infrastructure Seismic Risk Assessment and Decision Support, Pacific Earthquake Engineering Research Center (PEER) Report No. 2011/02, University of California, Berkeley, March Bensi, M., A. Der Kiureghian and D. Straub (2013). Efficient Bayesian network modeling of systems. Reliability Engineering and System Safety, 112: Bobbio, A., Portinale, L, Minichino, M., and Ciancamerla, E., Improving the analysis of dependable systems by mapping fault trees into Bayesian networks, Reliability Engineering and System Safety, Vol. 71, No. 3, pp , Boudali, H., and Dugan, J. B., A discrete-time Bayesian network reliability modeling and analysis framework, Reliability Engineering and System Safety, Vol. 87, pp , Dechter, R., Bucket Elimination: a Unifying Framework for Reasoning, Artificial Intelligence, Vol. 113, pp , Der Kiureghian, A., and Song, J., Multi-scale reliability analysis and updating of complex systems by use of linear programming, Reliability Engineering and System Safety, Vol. 93, pp , Murphy, K. P., The Bayes Net Toolbox for Matlab, Computing Science and Statistics: Proceedings of the Interface, Vol. 33, October Nielsen, T. D., Wuillemin, P. H., and Jensen, F. V., Using ROBDDs for inference in Bayesian networks with troubleshooting as an example, Proceedings of the 16 th Conference in Uncertainty in Artificial Intelligence, Stanford University, Stanford, CA, pp , June 30 July 3, Song, J., and Ok, S. Y., Multi-scale system reliability analysis of lifeline networks under earthquake hazards, Earthquake Engineering and Structural Dynamics, Vol. 39, pp , 2010.

### Artificial Intelligence Mar 27, Bayesian Networks 1 P (T D)P (D) + P (T D)P ( D) =

Artificial Intelligence 15-381 Mar 27, 2007 Bayesian Networks 1 Recap of last lecture Probability: precise representation of uncertainty Probability theory: optimal updating of knowledge based on new information

### Finding the M Most Probable Configurations Using Loopy Belief Propagation

Finding the M Most Probable Configurations Using Loopy Belief Propagation Chen Yanover and Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem 91904 Jerusalem, Israel

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Informatics 2D Reasoning and Agents Semester 2, 2015-16

Informatics 2D Reasoning and Agents Semester 2, 2015-16 Alex Lascarides alex@inf.ed.ac.uk Lecture 29 Decision Making Under Uncertainty 24th March 2016 Informatics UoE Informatics 2D 1 Where are we? Last

### Bayesian Networks. Mausam (Slides by UW-AI faculty)

Bayesian Networks Mausam (Slides by UW-AI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide

### Causal Independence for Probability Assessment and Inference Using Bayesian Networks

Causal Independence for Probability Assessment and Inference Using Bayesian Networks David Heckerman and John S. Breese Copyright c 1993 IEEE. Reprinted from D. Heckerman, J. Breese. Causal Independence

### SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline

### Chapter 14 Managing Operational Risks with Bayesian Networks

Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian

### Incorporating Evidence in Bayesian networks with the Select Operator

Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada SAS 0A2 {butz, fang11fa}@cs.uregina.ca

### Invited Applications Paper

Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK THOREG@MICROSOFT.COM JOAQUINC@MICROSOFT.COM

### Tracking Groups of Pedestrians in Video Sequences

Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal

### Chapter 28. Bayesian Networks

Chapter 28. Bayesian Networks The Quest for Artificial Intelligence, Nilsson, N. J., 2009. Lecture Notes on Artificial Intelligence, Spring 2012 Summarized by Kim, Byoung-Hee and Lim, Byoung-Kwon Biointelligence

### MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

### MONITORING AND DIAGNOSIS OF A MULTI-STAGE MANUFACTURING PROCESS USING BAYESIAN NETWORKS

MONITORING AND DIAGNOSIS OF A MULTI-STAGE MANUFACTURING PROCESS USING BAYESIAN NETWORKS Eric Wolbrecht Bruce D Ambrosio Bob Paasch Oregon State University, Corvallis, OR Doug Kirby Hewlett Packard, Corvallis,

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Image Compression through DCT and Huffman Coding Technique

International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

### The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

### Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: kc.gomathi@gmail.com M. Moorthi

### Compact Representations and Approximations for Compuation in Games

Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

### Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### Decision trees DECISION NODES. Kluwer Academic Publishers 2001 10.1007/1-4020-0611-X_220. Stuart Eriksen 1, Candice Huynh 1, and L.

Kluwer Academic Publishers 21 1.17/1-42-611-X_22 Decision trees Stuart Eriksen 1, Candice Huynh 1, and L. Robin Keller 1 (1) University of California, Irvine, USA Without Abstract A decision tree is a

### 1. Nondeterministically guess a solution (called a certificate) 2. Check whether the solution solves the problem (called verification)

Some N P problems Computer scientists have studied many N P problems, that is, problems that can be solved nondeterministically in polynomial time. Traditionally complexity question are studied as languages:

### Decision-Theoretic Troubleshooting: A Framework for Repair and Experiment

Decision-Theoretic Troubleshooting: A Framework for Repair and Experiment John S. Breese David Heckerman Abstract We develop and extend existing decisiontheoretic methods for troubleshooting a nonfunctioning

### Chapter 11 Monte Carlo Simulation

Chapter 11 Monte Carlo Simulation 11.1 Introduction The basic idea of simulation is to build an experimental device, or simulator, that will act like (simulate) the system of interest in certain important

### MapReduce for Bayesian Network Parameter Learning using the EM Algorithm

apreduce for Bayesian Network Parameter Learning using the E Algorithm Aniruddha Basak Carnegie ellon University Silicon Valley Campus NASA Research Park, offett Field, CA 94035 abasak@cmu.edu Irina Brinster

### An Empirical Evaluation of Bayesian Networks Derived from Fault Trees

An Empirical Evaluation of Bayesian Networks Derived from Fault Trees Shane Strasser and John Sheppard Department of Computer Science Montana State University Bozeman, MT 59717 {shane.strasser, john.sheppard}@cs.montana.edu

### 13.3 Inference Using Full Joint Distribution

191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent

### Bayesian Networks Chapter 14. Mausam (Slides by UW-AI faculty & David Page)

Bayesian Networks Chapter 14 Mausam (Slides by UW-AI faculty & David Page) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation

### SOLVING LINEAR SYSTEMS

SOLVING LINEAR SYSTEMS Linear systems Ax = b occur widely in applied mathematics They occur as direct formulations of real world problems; but more often, they occur as a part of the numerical analysis

### A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases

A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases Gerardo Canfora Bice Cavallo University of Sannio, Benevento, Italy, {gerardo.canfora,bice.cavallo}@unisannio.it ABSTRACT In

### December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

### Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams

Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams André Ciré University of Toronto John Hooker Carnegie Mellon University INFORMS 2014 Home Health Care Home health care delivery

### Real Time Traffic Monitoring With Bayesian Belief Networks

Real Time Traffic Monitoring With Bayesian Belief Networks Sicco Pier van Gosliga TNO Defence, Security and Safety, P.O.Box 96864, 2509 JG The Hague, The Netherlands +31 70 374 02 30, sicco_pier.vangosliga@tno.nl

### Measuring the Performance of an Agent

25 Measuring the Performance of an Agent The rational agent that we are aiming at should be successful in the task it is performing To assess the success we need to have a performance measure What is rational

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### Scheduling. Open shop, job shop, flow shop scheduling. Related problems. Open shop, job shop, flow shop scheduling

Scheduling Basic scheduling problems: open shop, job shop, flow job The disjunctive graph representation Algorithms for solving the job shop problem Computational complexity of the job shop problem Open

### Predicting Flight Delays

Predicting Flight Delays Dieterich Lawson jdlawson@stanford.edu William Castillo will.castillo@stanford.edu Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing

### DETERMINING THE CONDITIONAL PROBABILITIES IN BAYESIAN NETWORKS

Hacettepe Journal of Mathematics and Statistics Volume 33 (2004), 69 76 DETERMINING THE CONDITIONAL PROBABILITIES IN BAYESIAN NETWORKS Hülya Olmuş and S. Oral Erbaş Received 22 : 07 : 2003 : Accepted 04

### Load Balancing and Switch Scheduling

EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

### JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004

Scientiae Mathematicae Japonicae Online, Vol. 10, (2004), 431 437 431 JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS Ondřej Čepeka and Shao Chin Sung b Received December May 12, 2003; revised February

### PRACTICE BOOK COMPUTER SCIENCE TEST. Graduate Record Examinations. This practice book contains. Become familiar with. Visit GRE Online at www.gre.

This book is provided FREE with test registration by the Graduate Record Examinations Board. Graduate Record Examinations This practice book contains one actual full-length GRE Computer Science Test test-taking

### Evaluation of Machine Learning Techniques for Green Energy Prediction

arxiv:1406.3726v1 [cs.lg] 14 Jun 2014 Evaluation of Machine Learning Techniques for Green Energy Prediction 1 Objective Ankur Sahai University of Mainz, Germany We evaluate Machine Learning techniques

M A C H I N E L E A R N I N G P R O J E C T F I N A L R E P O R T F A L L 2 7 C S 6 8 9 CLASSIFICATION OF TRADING STRATEGIES IN ADAPTIVE MARKETS MARK GRUMAN MANJUNATH NARAYANA Abstract In the CAT Tournament,

### Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

### Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

### UNCERTAINTIES OF MATHEMATICAL MODELING

Proceedings of the 12 th Symposium of Mathematics and its Applications "Politehnica" University of Timisoara November, 5-7, 2009 UNCERTAINTIES OF MATHEMATICAL MODELING László POKORÁDI University of Debrecen

### RN-Codings: New Insights and Some Applications

RN-Codings: New Insights and Some Applications Abstract During any composite computation there is a constant need for rounding intermediate results before they can participate in further processing. Recently

### CS 188: Artificial Intelligence. Probability recap

CS 188: Artificial Intelligence Bayes Nets Representation and Independence Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Conditional probability

### RN-coding of Numbers: New Insights and Some Applications

RN-coding of Numbers: New Insights and Some Applications Peter Kornerup Dept. of Mathematics and Computer Science SDU, Odense, Denmark & Jean-Michel Muller LIP/Arénaire (CRNS-ENS Lyon-INRIA-UCBL) Lyon,

### Analysis of Compression Algorithms for Program Data

Analysis of Compression Algorithms for Program Data Matthew Simpson, Clemson University with Dr. Rajeev Barua and Surupa Biswas, University of Maryland 12 August 3 Abstract Insufficient available memory

### Modeling Prospect Dependencies with Bayesian Networks*

Modeling Prospect Dependencies with Bayesian Networks* 1 Search and Discovery Article #40553 (2010) Posted June, 2010 *Adapted from oral presentation at AAPG Annual Convention and Exhibition, New Orleans,

### COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

Binary numbers The reason humans represent numbers using decimal (the ten digits from 0,1,... 9) is that we have ten fingers. There is no other reason than that. There is nothing special otherwise about

### Chapter 13: Binary and Mixed-Integer Programming

Chapter 3: Binary and Mixed-Integer Programming The general branch and bound approach described in the previous chapter can be customized for special situations. This chapter addresses two special situations:

### International Journal of Software and Web Sciences (IJSWS) www.iasir.net

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

### INTEGER PROGRAMMING. Integer Programming. Prototype example. BIP model. BIP models

Integer Programming INTEGER PROGRAMMING In many problems the decision variables must have integer values. Example: assign people, machines, and vehicles to activities in integer quantities. If this is

### Chapter 10: Network Flow Programming

Chapter 10: Network Flow Programming Linear programming, that amazingly useful technique, is about to resurface: many network problems are actually just special forms of linear programs! This includes,

### 14.10.2014. Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

Overview Kyrre Glette kyrrehg@ifi INF3490 Swarm Intelligence Particle Swarm Optimization Introduction to swarm intelligence principles Particle Swarm Optimization (PSO) 3 Swarms in nature Fish, birds,

### Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

### Continuous Time Bayesian Networks for Inferring Users Presence and Activities with Extensions for Modeling and Evaluation

Continuous Time Bayesian Networks for Inferring Users Presence and Activities with Extensions for Modeling and Evaluation Uri Nodelman 1 Eric Horvitz Microsoft Research One Microsoft Way Redmond, WA 98052

### Report Paper: MatLab/Database Connectivity

Report Paper: MatLab/Database Connectivity Samuel Moyle March 2003 Experiment Introduction This experiment was run following a visit to the University of Queensland, where a simulation engine has been

### Cluster Analysis for Evaluating Trading Strategies 1

CONTRIBUTORS Jeff Bacidore Managing Director, Head of Algorithmic Trading, ITG, Inc. Jeff.Bacidore@itg.com +1.212.588.4327 Kathryn Berkow Quantitative Analyst, Algorithmic Trading, ITG, Inc. Kathryn.Berkow@itg.com

### GameTime: A Toolkit for Timing Analysis of Software

GameTime: A Toolkit for Timing Analysis of Software Sanjit A. Seshia and Jonathan Kotker EECS Department, UC Berkeley {sseshia,jamhoot}@eecs.berkeley.edu Abstract. Timing analysis is a key step in the

### Betting Best-Of Series

Betting Best-Of Series John Mount May 7, 8 Introduction We use the United States Major League Baseball World Series to demonstrate some of the arbitrage arguments used in mathematical finance. This problem

### METHODOLOGICAL CONSIDERATIONS OF DRIVE SYSTEM SIMULATION, WHEN COUPLING FINITE ELEMENT MACHINE MODELS WITH THE CIRCUIT SIMULATOR MODELS OF CONVERTERS.

SEDM 24 June 16th - 18th, CPRI (Italy) METHODOLOGICL CONSIDERTIONS OF DRIVE SYSTEM SIMULTION, WHEN COUPLING FINITE ELEMENT MCHINE MODELS WITH THE CIRCUIT SIMULTOR MODELS OF CONVERTERS. Áron Szûcs BB Electrical

### Agenda. Interface Agents. Interface Agents

Agenda Marcelo G. Armentano Problem Overview Interface Agents Probabilistic approach Monitoring user actions Model of the application Model of user intentions Example Summary ISISTAN Research Institute

### Artificial Intelligence. Conditional probability. Inference by enumeration. Independence. Lesson 11 (From Russell & Norvig)

Artificial Intelligence Conditional probability Conditional or posterior probabilities e.g., cavity toothache) = 0.8 i.e., given that toothache is all I know tation for conditional distributions: Cavity

### A Statistical Framework for Operational Infrasound Monitoring

A Statistical Framework for Operational Infrasound Monitoring Stephen J. Arrowsmith Rod W. Whitaker LA-UR 11-03040 The views expressed here do not necessarily reflect the views of the United States Government,

### Big Data from a Database Theory Perspective

Big Data from a Database Theory Perspective Martin Grohe Lehrstuhl Informatik 7 - Logic and the Theory of Discrete Systems A CS View on Data Science Applications Data System Users 2 Us Data HUGE heterogeneous

### A Logical Approach to Factoring Belief Networks

A Logical Approach to Factoring Belief Networks Adnan Darwiche Computer Science Department University of California, Los Angeles Los Angeles, CA 90095 darwiche@cs.ucla.edu Abstract We have recently proposed

### Learning in Abstract Memory Schemes for Dynamic Optimization

Fourth International Conference on Natural Computation Learning in Abstract Memory Schemes for Dynamic Optimization Hendrik Richter HTWK Leipzig, Fachbereich Elektrotechnik und Informationstechnik, Institut

### Hybrid Lossless Compression Method For Binary Images

M.F. TALU AND İ. TÜRKOĞLU/ IU-JEEE Vol. 11(2), (2011), 1399-1405 Hybrid Lossless Compression Method For Binary Images M. Fatih TALU, İbrahim TÜRKOĞLU Inonu University, Dept. of Computer Engineering, Engineering

### Decentralized Method for Traffic Monitoring

Decentralized Method for Traffic Monitoring Guillaume Sartoretti 1,2, Jean-Luc Falcone 1, Bastien Chopard 1, and Martin Gander 2 1 Computer Science Department 2 Department of Mathematics, University of

### Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

### An On-Line Algorithm for Checkpoint Placement

An On-Line Algorithm for Checkpoint Placement Avi Ziv IBM Israel, Science and Technology Center MATAM - Advanced Technology Center Haifa 3905, Israel avi@haifa.vnat.ibm.com Jehoshua Bruck California Institute

### I I I I I I I I I I I I I I I I I I I

A ABSTRACT Method for Using Belief Networks as nfluence Diagrams Gregory F. Cooper Medical Computer Science Group Medical School Office Building Stanford University Stanford, CA 94305 This paper demonstrates

### An Interactive Visualization Tool for the Analysis of Multi-Objective Embedded Systems Design Space Exploration

An Interactive Visualization Tool for the Analysis of Multi-Objective Embedded Systems Design Space Exploration Toktam Taghavi, Andy D. Pimentel Computer Systems Architecture Group, Informatics Institute

### COS 116 The Computational Universe Laboratory 9: Virus and Worm Propagation in Networks

COS 116 The Computational Universe Laboratory 9: Virus and Worm Propagation in Networks You learned in lecture about computer viruses and worms. In this lab you will study virus propagation at the quantitative

### Decision Trees and Networks

Lecture 21: Uncertainty 6 Today s Lecture Victor R. Lesser CMPSCI 683 Fall 2010 Decision Trees and Networks Decision Trees A decision tree is an explicit representation of all the possible scenarios from

### Chapter 2: Systems of Linear Equations and Matrices:

At the end of the lesson, you should be able to: Chapter 2: Systems of Linear Equations and Matrices: 2.1: Solutions of Linear Systems by the Echelon Method Define linear systems, unique solution, inconsistent,

### Discuss the size of the instance for the minimum spanning tree problem.

3.1 Algorithm complexity The algorithms A, B are given. The former has complexity O(n 2 ), the latter O(2 n ), where n is the size of the instance. Let n A 0 be the size of the largest instance that can

### Writing Use Case Scenarios for Model Driven Development

Writing Use Case Scenarios for Model Driven Development This guide outlines how to use Enterprise Architect to rapidly build Use Cases and increase your productivity through Model Driven Development. Use

### Model-based Synthesis. Tony O Hagan

Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

### A Brief Study of the Nurse Scheduling Problem (NSP)

A Brief Study of the Nurse Scheduling Problem (NSP) Lizzy Augustine, Morgan Faer, Andreas Kavountzis, Reema Patel Submitted Tuesday December 15, 2009 0. Introduction and Background Our interest in the

### Why? A central concept in Computer Science. Algorithms are ubiquitous.

Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

### Chapter 4 DECISION ANALYSIS

ASW/QMB-Ch.04 3/8/01 10:35 AM Page 96 Chapter 4 DECISION ANALYSIS CONTENTS 4.1 PROBLEM FORMULATION Influence Diagrams Payoff Tables Decision Trees 4.2 DECISION MAKING WITHOUT PROBABILITIES Optimistic Approach

### Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

### Arithmetic Coding: Introduction

Data Compression Arithmetic coding Arithmetic Coding: Introduction Allows using fractional parts of bits!! Used in PPM, JPEG/MPEG (as option), Bzip More time costly than Huffman, but integer implementation

### A Fresh Look at Cost Estimation, Process Models and Risk Analysis

A Fresh Look at Cost Estimation, Process Models and Risk Analysis Frank Padberg Fakultät für Informatik Universität Karlsruhe, Germany padberg@ira.uka.de Abstract Reliable cost estimation is indispensable

### DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

### Object Oriented Programming. Risk Management

Section V: Object Oriented Programming Risk Management In theory, there is no difference between theory and practice. But, in practice, there is. - Jan van de Snepscheut 427 Chapter 21: Unified Modeling

### princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora Scribe: One of the running themes in this course is the notion of

### Bayesian Networks and Classifiers in Project Management

Bayesian Networks and Classifiers in Project Management Daniel Rodríguez 1, Javier Dolado 2 and Manoranjan Satpathy 1 1 Dept. of Computer Science The University of Reading Reading, RG6 6AY, UK drg@ieee.org,

### Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects