UNIVERSITY OF LYON
DOCTORAL SCHOOL OF COMPUTER SCIENCES AND MATHEMATICS

PHD THESIS
Specialty: Computer Science

Author: Sérgio Rodrigues de Morais, November 16, 2009

Bayesian Network Structure Learning with Applications in Feature Selection

Jury:
Reviewers:
   Pr. Philippe Leray - University of Nantes
   Pr. Florence d'Alché-Buc - University of Evry-Val d'Essonne
Examiners:
   PhD. David Garcia - Pôle Européen de Plasturgie
   PhD. Emmanuel Mazer - GRAVIR Laboratory
Thesis Advisors:
   Pr. Alexandre Aussem - UCBL, University of Lyon
   Pr. Joël Favrel - INSA, University of Lyon

I would like to dedicate this thesis to my loving parents, my brother and my sister...

Acknowledgements

I will forever be thankful to my PhD advisor, Professor Alexandre Aussem. His scientific advice and insightful discussions were essential for this work. Alexandre has been supportive and has given me the freedom to pursue my own proposals without objection. Most important of all, he has believed in me and given me opportunities that nobody else would have... Thanks Alex!

I also thank my collaborators, especially David Garcia and Philippe Le Bot. Their enthusiasm and professionalism are contagious. Their questions and advice were of great significance for my work. Thank you for your patience and help.

I am also very grateful to Joël Favrel, who was one of my two PhD advisors. He was a role model as a scientist, mentor, and teacher. Thank you for your advice and kindness.

I would like to acknowledge the Ligue contre le Cancer, Comité du Rhône, France, which supported the work of chapter 6. The dataset used in this chapter was kindly supplied by the International Agency for Research on Cancer (Lyon - France). I would also like to acknowledge Sophie Rome and the Institut des Sciences Complexes (Lyon - France), who supported and helped in the work presented in chapter 7. Finally, I would like to acknowledge André Tchernof and the Centre de recherche en endocrinologie moléculaire et oncologique et génomique humaine (Québec - Canada), who supported and gave great assistance to the work presented in chapter 8.

Abstract

The study developed in this thesis focuses on constraint-based methods for identifying the Bayesian network structure from data. Novel algorithms and approaches are proposed with the aim of improving Bayesian network structure learning, with applications to feature subset selection, probabilistic classification in the presence of missing values, and detection of the mechanism of missing data. Extensive empirical experiments were carried out on synthetic and real-world datasets in order to compare the methods proposed in this thesis with other state-of-the-art methods. The applications presented include extracting the relevant risk factors that are statistically associated with nasopharyngeal carcinoma, a robust analysis of type 2 diabetes from a dataset consisting of 22,283 genes and only 143 samples, and a graphical representation of the statistical dependencies between 34 clinical variables among 150 obese women with various degrees of obesity, in order to better understand the pathophysiology of visceral obesity and provide guidance for its clinical management.

Keywords: Bayesian networks, feature subset selection, missing data mechanism, classification, pattern recognition.

Contents

1 Introduction
   An Overview
   Author's Contributions
   Applications
   Outline

2 Background about Bayesian networks and structure learning
   Introduction
   Some principles of Bayesian networks
      Markov condition and d-separation
      Markov equivalence
      Embedded Faithfulness
      Markov blankets and boundaries
   Constraint-based structure learning
      Soundness of constraint-based algorithms
      G likelihood-ratio conditional independence test
      Fisher's Z test
      Existence of a perfect map
   Conditional independence models
      Graphical independence models
      Algebraic independence models
      Graph-Isomorph

I STRUCTURE LEARNING

3 Local Bayesian network structure search
   Introduction
   Preliminaries
   Conditional Independence Test
   Pitfalls and related work
   HPC: the Hybrid Parents and Children algorithm
   HPC correctness under faithfulness condition
   Experimental validation
      Accuracy
      Scalability
   MBOR: an extension of HPC for feature selection
   Discussion and conclusions

4 Conservative feature selection with missing data
   Introduction
   Preliminaries
   Dealing with missing values
      Deletion process
      A conservative Markov blanket
      A conservative independence test
      Extension to conditional G-tests
   Experimental evaluation
      Limits of the conservative test
      Ramoni and Sebastiani's benchmark
      Procedure used to remove data
      Results of the empirical experiments
   Discussion and Conclusions

5 Exploiting data missingness through Bayesian network modeling
   Introduction
   Related work
   Detecting the missing data mechanism
   Including the missing mechanism to classification models
   Empirical experiments
      Czech car factory dataset
      Congressional voting dataset
   Discussion and conclusions

II APPLICATIONS

6 Analysis of nasopharyngeal carcinoma risk factors
   Introduction
   Graph construction with inclusion of domain knowledge
   Graph-based analysis and related work
   Predictive performance
   Model calibration
   Detection of the missing mechanisms
   Discussion and conclusions

7 Robust gene selection from microarray data
   Introduction
   Robust feature subset selection
   Ensemble FSS by consensus ranking
   Experiments
      Robustness versus classification accuracy
      Ensemble FSS technique on Diabetes data
   Discussion and conclusions

8 Analysis of lifestyle and metabolic predictors of visceral obesity with Bayesian networks
   Introduction
   Simulation experiments with HPC
   Results on biological data
   Discussion and conclusions

9 Conclusions and Future Work
   Summary
   Future Work

List of Figures

2.1 Toy example of causal network presented in the WCCI2008 Causation and Prediction Challenge
2.2 Three Markov equivalent DAGs. There are no other DAGs Markov equivalent to them
2.3 The marginal distribution of V, S, L and F cannot satisfy the faithfulness condition with any DAG
3.1 Toy problem about PC learning: Z ∈ PC(T), so that I_G(X, T | Z)
3.2 Divide-and-conquer algorithms can be less data-efficient than incremental algorithms
3.3 HPC empirical evaluation in terms of scalability
3.4 HPC empirical evaluation in terms of Euclidean distance from perfect precision and recall
3.5 HPC empirical evaluation in terms of the number of false positives
3.6 HPC empirical evaluation in terms of the number of false negatives
4.1 GreedyGmax's p-value as a function of the ratio of missing data
4.2 Subgraph taken from the benchmark ALARM displaying the MB of the variable SHUNT
4.3 GreedyGmax's p-value as a function of the ratio of missing data when testing on variables of the benchmark ALARM
4.4 Original BN benchmark used by [Ramoni & Sebastiani (2001)]
4.5 MCAR made from the original BN benchmark used by [Ramoni & Sebastiani (2001)]
4.6 MAR made from the original BN benchmark used by [Ramoni & Sebastiani (2001)]
4.7 NMAR (IM) made from the original BN benchmark used by [Ramoni & Sebastiani (2001)]
4.8 Toy examples of missing completely at random (MCAR)
4.9 Toy examples of missing at random (MAR)
4.10 Toy examples of not missing at random (NMAR/IM)
4.11 Probability tables used to vary the missing data ratio of the DAG shown in Figure 4.10
4.12 Average accuracy in detecting the mechanism NMAR (IM) of the toy problem shown in Figure 4.10
5.1 Graphical representation of the MCAR, MAR and NMAR (IM) used for empirical experiments
5.2 Bayesian network used for generating data from the congressional voting records dataset
5.3 Empirical evaluation of GMB on a congressional voting records dataset
5.4 Empirical evaluation of GMB for MCAR, MAR and IM (NMAR)
6.1 Local BN graph skeleton around variable NPC
6.2 Local PDAG of Figure 6.1
6.3 The ROC curves obtained by 10-fold cross-validation with a Naive Bayes classifier
6.4 Model calibration. Top: Markov boundary. Bottom: all variables
6.5 NPC graph with dummy missingness variables shown in dotted line
7.1 Robustness vs MB size for the benchmarks Genes and Pigs
7.2 Comparative accuracy for the benchmarks Genes and Pigs
7.3 MBOR outputs for microarray data
8.1 Bootstrap-based validation for the algorithm HPC on datasets from the benchmark INSULIN
8.2 BN learned from 34 risk factors related to lifestyle, adiposity, body fat distribution, blood lipid profile and adipocyte sizes

Chapter 1
Introduction

1.1 An Overview

A Bayesian network (BN) is a graphical structure for representing the probabilistic relationships among a large number of features (or variables [1]) and for doing probabilistic inference with those features. The graphical nature of Bayesian networks gives a very intuitive grasp of the relationships among the features. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Bayesian networks are used for modeling knowledge in computational biology and bioinformatics (gene regulatory networks, protein structure, gene expression analysis), medicine, document classification, information retrieval, image processing, data fusion, decision support systems, engineering, gaming and law.

The term Bayesian networks was coined by Judea Pearl to emphasize three aspects [Pearl (1986)]:

1. The often subjective nature of the input information.

2. The reliance on Bayes's conditioning as the basis for updating information.

3. The distinction between causal and evidential modes of reasoning, which underscores Thomas Bayes' posthumously published paper of 1763 [Bayes (1763)].

[1] The two terms feature and variable are used without distinction in this thesis.

There are numerous representations available for data analysis, including rule bases, decision trees, and artificial neural networks; and there are many techniques for data analysis such as density estimation, classification, regression, and clustering. So what do Bayesian networks have to offer? There are at least three answers:

1. Bayesian networks can readily handle incomplete data sets. For example, consider a classification or regression problem where two of the explanatory or input variables are strongly anti-correlated. This correlation is not a problem for standard supervised learning techniques, provided all inputs are measured in every case. When one of the inputs is not observed, however, most models will produce an inaccurate prediction, because they do not encode the dependencies between the input variables. Bayesian networks offer a natural way to encode such dependencies.

2. Bayesian networks help in the process of learning about causal relationships. Learning about causal relationships is important for at least two reasons. The process is useful when we are trying to gain understanding about a problem domain, for example, during exploratory data analysis. In addition, knowledge of causal relationships allows us to make predictions in the presence of interventions. For example, a marketing analyst may want to know whether or not it is worthwhile to increase exposure of a particular advertisement in order to increase the sales of a product. To answer this question, the analyst would like to determine whether or not the advertisement is a cause of increased sales, and to what degree.

3. Bayesian networks in conjunction with Bayesian statistical techniques facilitate the combination of domain knowledge and data. Anyone who has performed a real-world analysis knows the importance of prior or domain knowledge, especially when data is scarce or expensive. The fact that some commercial systems (i.e., expert systems) can be built from prior knowledge alone is a testament to the power of prior knowledge. Bayesian networks have a causal semantics that makes the encoding of causal prior knowledge particularly straightforward. In addition, Bayesian networks encode

the strength of causal relationships with probabilities. Consequently, prior knowledge and data can be combined with well-studied techniques from Bayesian statistics.

Learning a Bayesian network from data requires identifying both the model structure G and the corresponding set of model parameter values. However, the study developed in this thesis focuses only on methods for identifying the Bayesian network structure from data. The problem of learning the most probable a posteriori BN from data is worst-case NP-hard [Chickering (2002); Chickering et al. (2004)], and the recent explosion of high-dimensional datasets poses a serious challenge to existing BN structure learning algorithms. Two types of BN structure learning methods have been proposed so far: constraint-based (CB) and score-and-search methods. While score-and-search methods are efficient for learning the whole BN structure, the ability to scale up to hundreds of thousands of variables is a key advantage of CB methods over score-and-search methods. All the proposals of this thesis focus on improving constraint-based methods.

Several CB algorithms have been proposed recently for local BN structure learning [Fu et al. (2008); Nilsson et al. (2007a); Peña (2008); Peña et al. (2007); Tsamardinos & Brown (2008); Tsamardinos et al. (2006)]. They search for conditional independence relationships among the variables of a dataset and construct a local structure around the target node without having to construct the whole BN first, hence their scalability. These algorithms are appropriate for situations where the sample size is large enough with respect to the network degree, that is, where the number of parents and children (PC set) of each node in the network is relatively small with respect to the number of instances in the dataset. However, they are plagued with a severe problem: the number of false negatives increases swiftly as the size of the PC set increases. This well-known problem is common to all CB methods and has led several authors to reduce, as much as possible, the size of the conditioning sets with a view to enhancing the data-efficiency of their methods [Fu et al. (2008); Peña et al. (2007); Tsamardinos et al. (2006)].

1.2 Author's Contributions

The main contributions to the field of constraint-based Bayesian network structure learning made by the author include:

1. A novel structure learning algorithm called Hybrid Parents and Children (HPC) [Aussem et al. (2009b)]. HPC was proven to be correct under the faithfulness condition. Extensive empirical experiments were provided on public synthetic and real-world datasets of various sample sizes to assess HPC's accuracy and scalability, and significant improvements were obtained. In addition, the number of calls to the independence test (and hence the effective complexity) is only O(n^1.9) in practice on the eight BN benchmarks that we considered and O(n^1.21) on a real drug design dataset characterized by almost 140,000 features.

2. An extension of HPC designed for the specific aim of feature selection for probabilistic classification. This extension is called MBOR and was already applied in [Aussem et al. (2009c); de Morais & Aussem (2008a,b)] with very promising results after extensive empirical experiments on synthetic and real-world datasets. MBOR searches the Markov boundary of a target as a solution to the problem of feature selection and was shown to scale up to hundreds of thousands of variables. Like HPC, MBOR was also proven to be correct under the faithfulness condition.

3. A novel conservative feature selection method for handling incomplete datasets [Aussem & de Morais (2008)]. The method is conservative in the sense that it selects the minimal subset of features that renders the rest of the features independent of the target (the class variable) without making any assumption about the mechanism of missing data. The idea is that, when no information about the pattern of missing data is available, an incomplete dataset contains the set of all possible estimates. This conservative test addresses the main shortcoming of CB methods with missing data: the difficulty of performing an independence test when some entries are missing without making any assumption about the missing data mechanism.

4. A new graphical approach for exploiting data missingness in Bayesian network modeling [de Morais & Aussem (2009a)]. The novel approach makes use of Bayesian networks for explicitly representing the information about the absence of data. This work focused on two different, but not independent, aims: first, to help detect the missing data mechanisms, and second, to improve accuracy in classification when working with missing data. The missingness information is taken into account in the structure of the Bayesian network that represents the joint probability distribution of all the variables, including new dummy variables that were artificially created for representing missingness.

1.3 Applications

The main applications of the methods presented in this thesis to real-world problems made by the author include:

1. Application of the algorithm HPC for extracting the relevant risk factors that are statistically associated with nasopharyngeal carcinoma (NPC) [Aussem et al. (2009a)]. Experiments for detecting the missing data mechanisms present in this dataset were also carried out. The dataset was obtained from a case-control epidemiologic study performed by the International Agency for Research on Cancer in the Maghreb (north Africa). It consists of 1289 subjects (664 cases of NPC and 625 controls) and 150 nominal variables. In this study, special emphasis is placed on integrating domain knowledge and statistical data analysis. Once the graph skeleton is constructed from data, it is directed by the domain expert according to his causal interpretation, and additional latent variables are added to the graph for the sake of clarity, coherence and conciseness. The graphical representation provides a statistical profile of the recruited population and meanwhile helps identify the important risk factors involved in NPC.

2. Application of the algorithm MBOR on a microarray dataset in order to provide a robust analysis of type 2 diabetes [Aussem et al. (2009c)]. The dataset used in this study consists of 22,283 genes and only 143 samples. It

was obtained in collaboration with the INSERM U870/INRA 1235 laboratory and represents a compilation of different microarray data published during the last five years on the skeletal muscle of patients suffering from type 2 diabetes or obesity, and of healthy subjects. Multiple runs of MBOR on re-samples of the microarray data are combined, using ensemble techniques, to yield more robust results. Genes were aggregated into a consensus gene ranking and the top-ranked features were analyzed by biologists. The findings presented in this study are in good agreement with the genes that were associated with an increased risk of diabetes in the recent medical literature.

3. The algorithm HPC was applied for representing the statistical dependencies between 34 clinical variables among 150 obese women with various degrees of obesity. Features affecting obesity are of high current interest. Clinical data, such as patient history, lifestyle parameters and basic or even more elaborate laboratory analyses (e.g., adiposity, body fat distribution, blood lipid profile and adipocyte sizes), form a complex set of inter-related variables that may help better understand the pathophysiology of visceral obesity and provide guidance for its clinical management. In the work presented in this chapter a bootstrap method was used to generate more robust network structures. The statistical significance of edge strengths is evaluated using this approach: if an edge has a confidence above the threshold, it is included in the consensus network. This study made thorough use of physiological expertise integrated into the graph structure.

1.4 Outline

This thesis is divided into 9 chapters. A great effort was made in order to provide self-contained chapters. For this reason, some redundant information can be seen from one chapter to another. However, the brief background provided in chapter 2 is necessary for everyone who is not familiar with Bayesian networks. Chapter 2 provides the important background about the principal concepts of Bayesian networks and constraint-based learning. In chapter 3, the algorithms HPC and

MBOR are introduced. Furthermore, this chapter also presents the parallel approach for both algorithms HPC and MBOR and a thorough discussion about the main problems that plague CB Bayesian network structure learning, including the problem of almost-deterministic relationships among variables. Chapter 4 introduces a novel conservative feature selection method for handling incomplete datasets. A different approach is presented in chapter 5, which exploits data missingness in Bayesian network modeling. The last chapters of this thesis contain several applications to real-world problems. In chapter 6 the algorithm HPC is applied for extracting the relevant risk factors that are statistically associated with nasopharyngeal carcinoma. In chapter 7 the algorithm MBOR (chapter 3) is applied on a microarray dataset in order to provide a robust analysis of type 2 diabetes. A graphical representation helping identify the most important predictors of visceral obesity is presented in chapter 8, obtained by applying the algorithm HPC on a dataset containing 34 clinical variables among 150 obese women with various degrees of obesity. Finally, chapter 9 presents a summary and discusses future work.

Chapter 2
Background about Bayesian networks and structure learning

2.1 Introduction

Bayesian networks (BNs) are probabilistic graphical models that offer a coherent and intuitive representation of uncertain domain knowledge. Formally, BNs are directed acyclic graphs (DAGs) modeling probabilistic conditional independences among variables. The graphical part of a BN reflects the structure of a problem, while local interactions among neighboring variables are quantified by conditional probability distributions. One of the main advantages of BNs over other artificial intelligence (AI) schemes for reasoning under uncertainty is that they readily combine expert judgment with knowledge extracted from the data within the probabilistic framework. Another advantage is that they represent graphically the (possibly causal) independence relationships that may exist, in a very parsimonious manner [Brown & Tsamardinos (2008)].

Formally, a BN is a tuple <G, P>, where G = <U, E> is a directed acyclic graph with nodes representing the random variables U and P a joint probability distribution on U. In addition, G and P must satisfy the Markov condition: every variable X_i ∈ U is independent of any subset of its non-descendant variables conditioned on the set of its parents, denoted by Pa_i^G.

The analysis of the Bayesian network structure can give very important information for understanding a problem at hand. For instance, let us consider

Figure 2.1: Toy example of causal network presented in the WCCI2008 Causation and Prediction Challenge.

the causal network presented in figure 2.1. This network was presented as a toy example of a causal network in the WCCI2008 Causation and Prediction Challenge [Guyon et al. (2008)]. When data is generated from a causal network, such a causal network very often coincides with the structure of a Bayesian network that represents the joint probability distribution of the variables in the problem. Clearly, the causal network of figure 2.1 is acyclic, therefore it is called a causal DAG. However, such a DAG must satisfy the Markov condition in order to be a Bayesian network. The concept of causality is rather controversial, but when one considers that an effect is a future consequence of a past cause, the Markov condition is observed from a causal DAG. It means that when the empirical data is generated from a causal DAG G by a stochastic process, then G and P satisfy the Markov condition. In other words, if the value of each variable X_i is chosen at random with some probability P(X_i | Pa_i^G), based solely on the values of Pa_i^G, then the overall distribution P of the generated instances x_1, x_2, ..., x_n and the DAG G will satisfy the Markov condition [Pearl (2000)].
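To make the data-generation process concrete, the following minimal sketch performs exactly this ancestral sampling on a three-node fragment inspired by figure 2.1. The fragment and all probability values below are invented here for illustration; they are not taken from the challenge network or from the thesis.

import random

# Toy causal DAG: Smoking -> Lung Cancer -> Coughing (hypothetical fragment;
# the conditional probabilities are made up for illustration).
parents = {"Smoking": [], "LungCancer": ["Smoking"], "Coughing": ["LungCancer"]}
order = ["Smoking", "LungCancer", "Coughing"]   # a topological order of the DAG

# cpt[v][pa_values] = P(v = 1 | parents = pa_values); variables are binary (0/1).
cpt = {
    "Smoking":    {(): 0.30},
    "LungCancer": {(0,): 0.01, (1,): 0.10},
    "Coughing":   {(0,): 0.20, (1,): 0.90},
}

def sample_instance():
    """Ancestral sampling: draw each X_i from P(X_i | Pa_i^G) in topological order."""
    x = {}
    for v in order:
        p_one = cpt[v][tuple(x[u] for u in parents[v])]
        x[v] = 1 if random.random() < p_one else 0
    return x

data = [sample_instance() for _ in range(5000)]

A dataset generated this way is, by construction, a sample from a distribution that satisfies the Markov condition with the DAG, which is exactly the setting assumed by the structure learning algorithms discussed later.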

A lot of information can be taken from the Bayesian network that coincides with the causal DAG presented in figure 2.1. For instance, it is clear that Smoking is directly associated with Lung Cancer. One can also see that even if Yellow Fingers is associated with Lung Cancer, it is not a direct association: it passes through Smoking. It is clear that Born an Even Day has nothing to do with Lung Cancer. Interestingly, Car Accident could be even more predictive of Lung Cancer than Smoking, because there are three information paths between Lung Cancer and Car Accident. Nonetheless, for a physician it would be much more important to discover that Smoking has a direct impact on developing Lung Cancer than that Car Accident is predictive. One can see that Allergy is independent of Lung Cancer when there is no information about the values of the other variables; but when it is known that a patient is frequently Coughing, then the fact of knowing that the same patient has no Allergy can increase the probability of this patient having Lung Cancer... However, the structure of such a Bayesian network is not known beforehand when a dataset containing observational data is the only available piece of information. The Bayesian network structure search is the main aim of what is presented in this thesis.

This chapter recalls some concepts of Bayesian networks and structure learning that are important for the comprehension of what is discussed in the sequel of this thesis. More information about Bayesian networks can be found, for instance, in [Neapolitan (2004); Pearl (2000)]. The contents of the next two sections were mostly taken from [Neapolitan (2004)]. A thorough discussion on Bayesian networks can also be found in [François (2006); Naïm et al. (2004)].

2.2 Some principles of Bayesian networks

As stated in the previous section, a BN is a tuple <G, P>, where G = <U, E> is a directed acyclic graph (DAG) with nodes representing the random variables U, arcs E the connections between the random variables, and P a joint probability distribution on U. In addition, G and P must satisfy the Markov condition: every variable X_i ∈ U is independent of any subset of its non-descendant variables conditioned on the set of its parents, denoted by Pa_i^G. From the Markov condition, it is easy to prove [Neapolitan (2004)] that the joint probability distribution P on the variables in U can be factored as follows:

P(U) = P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i | Pa_i^G)     (2.1)
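As a small numerical companion to equation 2.1 (again an illustrative sketch with invented numbers, reusing the toy fragment from section 2.1), the joint probability of a complete assignment is the product of one local term P(x_i | Pa_i^G) per variable:

from itertools import product

parents = {"Smoking": [], "LungCancer": ["Smoking"], "Coughing": ["LungCancer"]}
order = ["Smoking", "LungCancer", "Coughing"]
cpt = {
    "Smoking":    {(): 0.30},
    "LungCancer": {(0,): 0.01, (1,): 0.10},
    "Coughing":   {(0,): 0.20, (1,): 0.90},
}

def joint(assignment):
    """Equation 2.1: P(U) = prod_i P(X_i | Pa_i^G), for one complete assignment."""
    p = 1.0
    for v in order:
        p_one = cpt[v][tuple(assignment[u] for u in parents[v])]
        p *= p_one if assignment[v] == 1 else 1.0 - p_one
    return p

# Sanity check: the factored probabilities of all 2^3 assignments sum to 1.
total = sum(joint(dict(zip(order, values))) for values in product([0, 1], repeat=3))
assert abs(total - 1.0) < 1e-12

Note how the factorization requires storing only 5 probability values here, instead of the 7 independent values a full joint table over three binary variables would need; the saving grows exponentially with the number of variables.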

Equation 2.1 allows a parsimonious decomposition of the joint distribution P. It enables us to reduce the problem of determining a huge number of probability values to that of determining relatively few. Such a decomposition is possible because a BN structure G entails a set of conditional independence assumptions. They can all be identified by the d-separation criterion [Pearl (2000)]. We discuss this important concept next, but first we need to review some graph theory.

Suppose we have a DAG G = <U, E>. We call a chain between two nodes (X, Y) ∈ U a set of connections that create a path in G between the two nodes X and Y. For example, [Yellow Fingers, Smoking, Lung Cancer, Coughing, Allergy] and [Allergy, Coughing, Lung Cancer, Smoking, Yellow Fingers] represent the same chain between Yellow Fingers and Allergy in the DAG of figure 2.1. We often denote chains by showing undirected lines between the nodes in the chain; if we want to show the direction of the edges, we use arrows. A chain containing two nodes is called a link. Given the directed edge X → Y, we say the tail of the edge is X and the head of the edge is Y. We also say the following:

- A chain X → W → Y is a head-to-tail meeting, the edges meet head-to-tail at W, and W is a head-to-tail node on the chain.
- A chain X ← W → Y is a tail-to-tail meeting, the edges meet tail-to-tail at W, and W is a tail-to-tail node on the chain.
- A chain X → W ← Y is a head-to-head meeting, the edges meet head-to-head at W, and W is a head-to-head node on the chain.
- A chain X − W − Y, such that X and Y are not adjacent, is an uncoupled meeting.

2.2.1 Markov condition and d-separation

Consider three disjoint sets of variables, X, Y and Z, which are represented as nodes in a directed acyclic graph G. To test whether X is independent of Y given Z in any distribution compatible with G, we need to test whether the nodes corresponding to variables Z block (d-separate) all chains between nodes in X

and nodes in Y. Blocking is to be interpreted as stopping the flow of information (or of dependence) between the variables that are connected by such chains. Next we develop the concept of d-separation, and show the following: (1) the Markov condition entails that all d-separations are conditional independences, and (2) every conditional independence entailed by the Markov condition is identified by d-separation. That is, if <G, P> satisfies the Markov condition, every d-separation in G is a conditional independence in P. Furthermore, every conditional independence which is common to all probability distributions satisfying the Markov condition with G is identified by d-separation.

Definition 1 Let G = <U, E> be a DAG, A ⊆ U, X and Y be distinct nodes in (U \ A), and ρ be a chain between X and Y. Then ρ is blocked by A if one of the following holds:

- There is a node Z ∈ A on the chain ρ, and the edges incident to Z on ρ meet head-to-tail at Z.
- There is a node Z ∈ A on the chain ρ, and the edges incident to Z on ρ meet tail-to-tail at Z.
- There is a node Z on the chain ρ such that Z and all of Z's descendants are not in A, and the edges incident to Z on ρ meet head-to-head at Z.

We say the chain is blocked at any node in A where one of the above meetings takes place. There may be more than one such node. The chain is called active given A if it is not blocked by A.

Definition 2 Let G = <U, E> be a DAG, A ⊆ U, and X and Y be distinct nodes in (U \ A). We say X and Y are d-separated by A in G if every chain between X and Y is blocked by A. It is not hard to see that every chain between X and Y is blocked by A if and only if every simple chain between X and Y is blocked by A.

Definition 3 Let G = <U, E> be a DAG, and A, B and C be mutually disjoint subsets of U. We say A and B are d-separated by C in G if for every X ∈ A and Y ∈ B, X and Y are d-separated by C. We write I_G(A, B | C). If C = ∅, we write only I_G(A, B).
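Definitions 1-3 translate directly into a decision procedure. The sketch below is not taken from the thesis; it implements the classical moral-graph criterion, which is equivalent to chain blocking: restrict the DAG to the ancestral set of A ∪ B ∪ C, moralize it (connect each pair of co-parents and drop edge directions), remove C, and declare I_G(A, B | C) exactly when A and B end up disconnected.

from itertools import combinations

def d_separated(parents, A, B, C):
    """True iff I_G(A, B | C) in the DAG given by `parents` (node -> list of parents)."""
    # 1. Keep only the ancestral set of A, B and C.
    anc, stack = set(A | B | C), list(A | B | C)
    while stack:
        for p in parents[stack.pop()]:
            if p not in anc:
                anc.add(p)
                stack.append(p)
    # 2. Moralize: connect each node to its parents, and its parents to each other.
    adj = {v: set() for v in anc}
    for v in anc:
        for p in parents[v]:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(parents[v], 2):
            adj[p].add(q); adj[q].add(p)
    # 3. Remove C and test whether A can reach B in the undirected remainder.
    seen, stack = set(), [a for a in A if a not in C]
    while stack:
        v = stack.pop()
        if v in B:
            return False
        if v in seen:
            continue
        seen.add(v)
        stack.extend(w for w in adj[v] if w not in C and w not in seen)
    return True

# The uncoupled head-to-head meeting X -> Z <- Y: X and Y are d-separated by the
# empty set but NOT by {Z} (conditioning on a head-to-head node activates the chain).
g = {"X": [], "Y": [], "Z": ["X", "Y"]}
assert d_separated(g, {"X"}, {"Y"}, set())
assert not d_separated(g, {"X"}, {"Y"}, {"Z"})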

We write I_G(A, B | C) because d-separation identifies all and only those conditional independences entailed by the Markov condition for G. The three following lemmas are needed to prove this.

Lemma 1 Let P be a probability distribution of the variables in U and G = <U, E> be a DAG. Then <G, P> satisfies the Markov condition if and only if, for every three mutually disjoint subsets A, B, C ⊆ U, whenever A and B are d-separated by C, A and B are conditionally independent in P given C. That is, <G, P> satisfies the Markov condition if and only if I_G(A, B | C) ⇒ I_P(A, B | C).

Proof. The proof can be found in [Neapolitan (2004)].

According to Lemma 1, if A and B are d-separated by C in G, the Markov condition entails I_P(A, B | C). For this reason, if <G, P> satisfies the Markov condition, we say G is an independence map (I-map) of P. The question that arises now is whether the converse of what is stated by Lemma 1 is also true. The next two lemmas address this. First we have a definition.

Definition 4 Let U be a set of random variables, and A_1, B_1, C_1, A_2, B_2, and C_2 be subsets of U. We say conditional independence I_P(A_1, B_1 | C_1) is equivalent to conditional independence I_P(A_2, B_2 | C_2) if for every probability distribution P of U, I_P(A_1, B_1 | C_1) holds if and only if I_P(A_2, B_2 | C_2) holds.

Lemma 2 Any conditional independence entailed by a DAG, based on the Markov condition, is equivalent to a conditional independence among disjoint sets of random variables.

Proof. The proof can be found in [Neapolitan (2004)].

Due to the preceding lemma, we need only discuss disjoint sets of random variables when investigating conditional independences entailed by the Markov condition. The next lemma states that the only such conditional independences are those that correspond to d-separations:

Lemma 3 Let G = <U, E> be a DAG, and 𝒫 be the set of all probability distributions P such that <G, P> satisfies the Markov condition. Then for every three mutually disjoint subsets A, B, C ⊆ U,

I_P(A, B | C) for all P ∈ 𝒫 ⟺ I_G(A, B | C).

Proof. The proof can be found in [Geiger & Pearl (1990)].

Definition 5 We say conditional independence I_P(A, B | C) is identified by d-separation in G if one of the following holds:

- A, B and C are mutually disjoint and I_G(A, B | C).
- A, B and C are not mutually disjoint, I_P(A, B | C) and I_P(A′, B′ | C′) are equivalent for some mutually disjoint A′, B′ and C′, and we have I_G(A′, B′ | C′).

Theorem 1 Based on the Markov condition, a DAG G entails all and only those conditional independences that are identified by d-separation in G.

Proof. The proof follows immediately from Lemmas 1, 2 and 3.

One must be careful to interpret Theorem 1 correctly. A particular distribution P that satisfies the Markov condition with G may have conditional independences that are not identified by d-separation. The next definition is about the situation when the converse of Theorem 1 is also true.

Definition 6 Suppose we have a joint probability distribution P of the random variables in some set U and a DAG G = <U, E>. We say that <G, P> satisfies the faithfulness condition if, based on the Markov condition, G entails all and only the conditional independences in P. That is, the following two conditions hold:

- <G, P> satisfies the Markov condition. This means G entails only conditional independences in P.
- All conditional independences in P are entailed by G, based on the Markov condition.

When <G, P> satisfies the faithfulness condition, we say P and G are faithful to each other, and we say that G is a perfect map (P-map) of P. When <G, P> does not satisfy the faithfulness condition, we say they are unfaithful to each other.
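Lemma 1 can be verified numerically on a toy network. The following sketch (invented probability values, not from the thesis) enumerates the joint distribution of the chain X → Z → Y via equation 2.1 and confirms that the d-separation I_G(X, Y | Z) is accompanied by the conditional independence I_P(X, Y | Z), while X and Y remain marginally dependent:

from itertools import product

parents = {"X": [], "Z": ["X"], "Y": ["Z"]}
order = ["X", "Z", "Y"]
cpt = {"X": {(): 0.3}, "Z": {(0,): 0.2, (1,): 0.9}, "Y": {(0,): 0.1, (1,): 0.7}}

def joint(a):
    """Factored joint probability of one complete binary assignment (equation 2.1)."""
    p = 1.0
    for v in order:
        p1 = cpt[v][tuple(a[u] for u in parents[v])]
        p *= p1 if a[v] == 1 else 1.0 - p1
    return p

def marginal(fixed):
    """P(fixed), summing the factored joint over all unconstrained variables."""
    free = [v for v in order if v not in fixed]
    return sum(joint({**fixed, **dict(zip(free, vals))})
               for vals in product([0, 1], repeat=len(free)))

def indep(A, B, C, tol=1e-9):
    """I_P(A, B | C) checked as P(a, b, c) * P(c) == P(a, c) * P(b, c) for all values."""
    for av, bv, cv in product(*(list(product([0, 1], repeat=len(S))) for S in (A, B, C))):
        a, b, c = dict(zip(A, av)), dict(zip(B, bv)), dict(zip(C, cv))
        if abs(marginal({**a, **b, **c}) * marginal(c)
               - marginal({**a, **c}) * marginal({**b, **c})) > tol:
            return False
    return True

assert indep(["X"], ["Y"], ["Z"])     # matches the d-separation I_G(X, Y | Z)
assert not indep(["X"], ["Y"], [])    # X and Y are marginally dependent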

Figure 2.2: Three Markov equivalent DAGs. There are no other DAGs Markov equivalent to them.

2.2.2 Markov equivalence

Many DAGs are equivalent in the sense that they have the same d-separations. For example, each of the DAGs in figure 2.2 has the d-separations I_G(Y, Z | X) and I_G(X, W | {Y, Z}), and these are the only d-separations each has. After stating a formal definition of this equivalence, we give a theorem showing how it relates to probability distributions. Finally, we establish a criterion for recognizing this equivalence.

Definition 7 Let G_1 = <U, E_1> and G_2 = <U, E_2> be two DAGs containing the same set of nodes U. Then G_1 and G_2 are called Markov equivalent if for every three mutually disjoint subsets A, B, C ⊆ U, A and B are d-separated by C in G_1 if and only if A and B are d-separated by C in G_2. That is, I_{G_1}(A, B | C) ⟺ I_{G_2}(A, B | C).

Although the previous definition relates only to graph properties, its application is in probability, due to the following theorem:

Theorem 2 Two DAGs are Markov equivalent if and only if, based on the Markov condition, they entail the same conditional independences.

Proof. The proof follows immediately from Theorem 1.

Corollary 1 Let G_1 = <U, E_1> and G_2 = <U, E_2> be two DAGs containing the same random variables U. Then G_1 and G_2 are Markov equivalent if and

only if for every probability distribution P of U, <G_1, P> satisfies the Markov condition if and only if <G_2, P> satisfies the Markov condition.

Proof. The proof follows immediately from Theorems 1 and 2.

Next we present a theorem that shows how to identify Markov equivalence. Its proof requires the following three lemmas:

Lemma 4 Let G = <U, E> be a DAG, and X, Y ∈ U. Then X and Y are adjacent in G if and only if they are not d-separated by any set V ⊆ (U \ {X, Y}).

Proof. The proof can be found in [Neapolitan (2004)].

Lemma 5 Suppose we have a DAG G = <U, E> and an uncoupled meeting X − Z − Y. Then the following are equivalent:

- X → Z ← Y is a head-to-head meeting.
- There exists a set not containing Z that d-separates X and Y.
- All sets containing Z fail to d-separate X and Y.

Proof. The proof can be found in [Neapolitan (2004)].

Lemma 6 If G_1 and G_2 are Markov equivalent, then X and Y are adjacent in G_1 if and only if they are adjacent in G_2. That is, Markov equivalent DAGs have the same links (edges without regard for direction).

Proof. The proof can be found in [Neapolitan (2004)].

We now give the theorem that identifies Markov equivalence. This theorem was first stated in [Pearl et al. (1989)].

Theorem 3 Two DAGs G_1 and G_2 are Markov equivalent if and only if they have the same links (edges without regard for direction) and the same set of uncoupled head-to-head meetings.

Proof. The proof can be found in [Neapolitan (2004)].
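Theorem 3 yields an immediate computational check, sketched below (a standard construction, not the thesis's code): two DAGs are Markov equivalent exactly when they share the same skeleton and the same set of uncoupled head-to-head meetings.

from itertools import combinations

def skeleton(parents):
    """The links of the DAG: edges without regard for direction."""
    return {frozenset((p, c)) for c in parents for p in parents[c]}

def v_structures(parents):
    """Uncoupled head-to-head meetings X -> Z <- Y with X and Y non-adjacent."""
    links = skeleton(parents)
    return {(frozenset((x, y)), z)
            for z in parents
            for x, y in combinations(sorted(parents[z]), 2)
            if frozenset((x, y)) not in links}

def markov_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# X -> Z -> Y and X <- Z -> Y entail the same d-separations; X -> Z <- Y does not.
chain    = {"X": [], "Z": ["X"], "Y": ["Z"]}
fork     = {"X": ["Z"], "Z": [], "Y": ["Z"]}
collider = {"X": [], "Y": [], "Z": ["X", "Y"]}
assert markov_equivalent(chain, fork)
assert not markov_equivalent(chain, collider)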

Theorem 3 gives us a simple way to represent a Markov equivalence class with a single graph. That is, we can represent a Markov equivalence class with a graph that has the same links and the same uncoupled head-to-head meetings as the DAGs in the class. Any assignment of directions to the undirected edges in this graph that does not create a new uncoupled head-to-head meeting or a directed cycle yields a member of the equivalence class. Often there are edges other than uncoupled head-to-head meetings which must be oriented the same way in all Markov equivalent DAGs. For example, if an uncoupled meeting X → Y − Z is not head-to-head, then all the DAGs in the equivalence class must have Y − Z oriented as Y → Z. So we define a DAG pattern for a Markov equivalence class to be the graph that has the same links as the DAGs in the equivalence class and has oriented all and only the edges common to all of the DAGs in the equivalence class. The directed links in a DAG pattern are called compelled edges.

2.2.3 Embedded Faithfulness

The distribution P(v, s, l, f) in figure 2.3 (b) does not admit a faithful DAG representation. However, it is the marginal of a distribution, namely P(v, s, c, l, f), which does. This is an example of embedded faithfulness, which is defined as follows:

Definition 8 Let P be a joint probability distribution of the variables in V with V ⊆ U, and let G = <U, E> be a DAG. We say <G, P> satisfies the embedded faithfulness condition if the following two conditions hold:

- Based on the Markov condition, G entails only conditional independences in P for subsets including only elements of V.
- All conditional independences in P are entailed by G, based on the Markov condition.

When <G, P> satisfies the embedded faithfulness condition, we say P is embedded faithfully in G. Notice that faithfulness is a special case of embedded faithfulness in which U = V.

Theorem 4 Let P be a joint probability distribution of the variables in U with V ⊆ U, and G = <U, E>. If <G, P> satisfies the faithfulness condition, and P′ is the marginal distribution of V, then <G, P′> satisfies the embedded faithfulness condition.

Proof. The proof follows directly from Definition 8.

Figure 2.3: The marginal distribution of V, S, L and F cannot satisfy the faithfulness condition with any DAG.

Theorem 5 Let P be a joint probability distribution of the variables in V with V ⊆ U, and G = <U, E>. Then <G, P> satisfies the embedded faithfulness condition if and only if all and only the conditional independences in P are identified by d-separation in G restricted to elements of V.

Proof. The proof can be found in [Neapolitan (2004)].

2.2.4 Markov blankets and boundaries

A Bayesian network can have a large number of nodes, and the probability of a given node can be affected by instantiating a distant node. However, it turns out that the instantiation of a set of close nodes can shield a node from the effect of all other nodes. The following definition and theorem show this:

Definition 9 Let U be a set of random variables, P be their joint probability distribution, and X ∈ U. Then a Markov blanket M_X of X is any set of variables such that X is conditionally independent of all the other variables given M_X. That is, I_P(X, U \ (M_X ∪ {X}) | M_X).

Theorem 6 Suppose <G, P> satisfies the Markov condition. Then, for each variable X, the set of all parents of X, children of X and spouses of X (i.e., the other parents of X's children) is a Markov blanket of X.

Proof. The proof can be found in [Neapolitan (2004)].
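Theorem 6 makes the Markov blanket directly computable when the DAG is known. A minimal sketch, assuming the same parents-dictionary representation used in the earlier sketches:

def markov_blanket(parents, x):
    """Parents of x, children of x, and spouses of x (other parents of x's children)."""
    children = {v for v in parents if x in parents[v]}
    spouses = {p for c in children for p in parents[c] if p != x}
    return set(parents[x]) | children | spouses

# In X -> Z <- Y, Z -> W: the blanket of X is {Z} (child) plus {Y} (spouse via Z).
g = {"X": [], "Y": [], "Z": ["X", "Y"], "W": ["Z"]}
assert markov_blanket(g, "X") == {"Z", "Y"}

Under the faithfulness condition, Theorem 7 below states that this set is the unique Markov boundary of X — precisely the feature subset targeted by the algorithms of Part I.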

Definition 10 Let U be a set of random variables, P be their joint probability distribution, and X ∈ U. Then a Markov boundary MB_X of X is any Markov blanket of X such that none of its proper subsets is a Markov blanket of X.

Theorem 7 Suppose <G, P> satisfies the faithfulness condition. Then, for each variable X, the set of all parents of X, children of X and spouses of X is the unique Markov boundary of X.

Proof. The proof can be found in [Neapolitan (2004)].

Theorem 7 holds for all probability distributions, including ones that are not strictly positive. When a probability distribution is not strictly positive, there is not necessarily a unique Markov boundary. The final theorem presented in this section holds for strictly positive distributions.

Theorem 8 Suppose P is a strictly positive probability distribution of the variables in U. Then for each X ∈ U there is a unique Markov boundary of X.

Proof. The proof can be found in [Pearl (1988)].

2.3 Constraint-based structure learning

The problem of learning the most probable a posteriori BN from data is worst-case NP-hard [Chickering (2002); Chickering et al. (2004)], and the recent explosion of high-dimensional datasets poses a serious challenge to existing BN structure learning algorithms. Two types of BN structure learning methods have been proposed so far: score-and-search methods (a basic example is shown in algorithm 1) and constraint-based methods (a basic example is shown in algorithm 2). While score-and-search methods are efficient for learning the full BN structure, the ability to scale up to hundreds of thousands of variables is a key advantage of constraint-based methods over score-and-search methods. The study presented in this thesis is focused on constraint-based approaches.

Algorithm 1 DAG pattern search: a basic score-and-search algorithm
Require: D: dataset; U: set of random variables.
Ensure: A DAG pattern gp that approximately maximizes score(D, gp).
1: E ← ∅
2: gp ← (U, E)
3: repeat
4:   if [any DAG pattern in the neighborhood of the current DAG pattern increases score(D, gp)] then
5:     modify E according to the one that increases score(D, gp) the most
6:   end if
7: until [score(D, gp) is not increased anymore]

2.3.1 Soundness of constraint-based algorithms

Constraint-based (CB for short) learning methods systematically check the data for conditional independence relationships.

Typically, the algorithms run a χ² independence test when the dataset is discrete and a Fisher's Z test when it is continuous, in order to decide on dependence or independence, that is, upon the rejection or acceptance of the null hypothesis of conditional independence. A structure learning algorithm from data is said to be correct (or sound) if it returns the correct DAG pattern (or a DAG in the correct equivalence class) under the assumptions that the independence tests are reliable and that the learning dataset is a sample from a distribution P faithful to a DAG G. The (ideal) assumption that the independence tests are reliable means that they decide (in)dependence if and only if the (in)dependence holds in P. Based on what was just stated, we next prove the soundness of algorithm 2.

Lemma 7 If the set of all conditional independences in U admits a faithful DAG representation, algorithm 2 creates a link between X and Y if and only if there is a link between X and Y in the DAG pattern gp containing the d-separations in this set.

Proof. Algorithm 2 produces a link if and only if X and Y are not d-separated by any subset of U, which, owing to Lemma 4, is the case if and only if X and Y are adjacent in gp.

Lemma 8 If the set of all conditional independences in U admits a faithful DAG representation, then any directed edge created by algorithm 2 is a directed edge in the DAG pattern containing the d-separations in this set.

Algorithm 2 DAG pattern search: a basic constraint-based algorithm
Require: D: dataset; U: set of random variables.
Ensure: DAG pattern gp such that I_G(X, Y | Z) ⟺ I_P(X, Y | Z).
Step 1:
1: for all [pairs of nodes X, Y ∈ U] do
2:   search for a subset S_XY ⊆ U such that I(X, Y | S_XY);
3:   if [no such set can be found] then
4:     create the link X − Y in gp;
5:   end if
6: end for
Step 2:
7: for all [uncoupled meetings X − Z − Y] do
8:   if [Z ∉ S_XY] then
9:     orient X − Z − Y as X → Z ← Y;
10:  end if
11: end for
Step 3:
12: for all [uncoupled meetings X → Z − Y] do
13:   orient Z − Y as Z → Y;
14: end for
Step 4:
15: for all [links X − Y such that there is a directed path from X to Y] do
16:   orient X − Y as X → Y;
17: end for
Step 5:
18: for all [uncoupled meetings X − Z − Y such that X → W, Y → W and Z − W] do
19:   orient Z − W as Z → W;
20: end for
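Before returning to the proof of Lemma 8, here is a runnable sketch of Steps 1 and 2 of algorithm 2 (a simplified illustration, not the thesis's implementation). It assumes a reliable conditional independence oracle indep(x, y, S) — in practice a statistical test such as the G test of section 2.3.2 — builds the skeleton together with the separating sets S_XY, and orients the uncoupled head-to-head meetings; the orientation rules of Steps 3-5 are omitted for brevity.

from itertools import combinations

def basic_cb_search(variables, indep):
    """Steps 1-2 of algorithm 2: skeleton + head-to-head orientation.
    `indep(x, y, S)` must behave like a reliable test of I(X, Y | S)."""
    links, sepset = set(), {}
    # Step 1: link X - Y iff no subset of the remaining variables separates them.
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        found = next((set(s) for k in range(len(others) + 1)
                      for s in combinations(others, k) if indep(x, y, set(s))), None)
        if found is None:
            links.add(frozenset((x, y)))
        else:
            sepset[frozenset((x, y))] = found
    # Step 2: orient X - Z - Y as X -> Z <- Y when Z lies outside S_XY.
    arrows = set()   # (u, v) stands for the directed edge u -> v
    for z in variables:
        nbrs = [v for v in variables if frozenset((v, z)) in links]
        for x, y in combinations(nbrs, 2):
            if frozenset((x, y)) not in links and z not in sepset[frozenset((x, y))]:
                arrows.add((x, z)); arrows.add((y, z))
    return links, arrows

# Using the d_separated sketch from section 2.2.1 as a perfect oracle on X -> Z <- Y:
g = {"X": [], "Y": [], "Z": ["X", "Y"]}
oracle = lambda x, y, S: d_separated(g, {x}, {y}, set(S))
links, arrows = basic_cb_search(["X", "Y", "Z"], oracle)
assert arrows == {("X", "Z"), ("Y", "Z")}

Note that the exhaustive search over conditioning sets in Step 1 is exponential; practical CB algorithms such as HPC restrict the candidate conditioning sets precisely to gain scalability and data-efficiency.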

Proof. We consider the directed edges created in each step in turn:

Step 2: The fact that such edges must be directed follows from Lemma 5.

Step 3: If the uncoupled meeting X → Z − Y were X → Z ← Y, Z would not be in any set that d-separates X and Y, due to Lemma 5, which means we would have created the orientation X → Z ← Y in Step 2. Therefore, X → Z − Y must be X → Z → Y.

Step 4: If X − Y were X ← Y, we would have a directed cycle. Therefore, it must be X → Y.

Step 5: If Z − W were Z ← W, then X − Z − Y would have to be X → Z ← Y, because otherwise we would have a directed cycle. But if this were the case, we would have created the orientation X → Z ← Y in Step 2. So Z − W must be Z → W.

Lemma 9 If the set of all conditional independences in U admits a faithful DAG representation, all the directed edges in the DAG pattern containing the d-separations in this set are directed by algorithm 2.

Proof. The proof can be found in [Meek (1995)].

Theorem 9 If the set of all conditional independences in U admits a faithful DAG representation, algorithm 2 creates the DAG pattern containing the d-separations in this set.

Proof. The proof follows from Lemmas 7, 8 and 9.

2.3.2 G likelihood-ratio conditional independence test

Statistical tests are needed in order to verify the conditional independence I(X, Y | Z) from data. One of the most used statistical tests of conditional independence between two categorical variables is the G likelihood-ratio conditional independence test. In this thesis it is used to determine I_P(X, Y | Z) from data [Spirtes et al. (2000)]. The general formula for G is presented in equation 2.2.
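For discrete data, the G statistic has the standard likelihood-ratio form G = 2 Σ_{x,y,z} N_xyz ln(N_xyz N_z / (N_xz N_yz)), where N denotes observed counts; under the null hypothesis of conditional independence it is asymptotically χ²-distributed [Spirtes et al. (2000)]. The sketch below computes it from a list of samples (an illustration of that standard form, not the thesis's implementation; the degrees of freedom use the nominal count, whereas practical implementations adjust for sparse or empty cells):

import math
from collections import Counter
from scipy.stats import chi2   # chi-square survival function for the p-value

def g_test(data, x, y, z):
    """Likelihood-ratio test of I(X, Y | Z) on discrete data.
    data: list of dicts (one per sample); z: list of conditioning variables.
    Returns (G, degrees of freedom, p-value)."""
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for r in data:
        zv = tuple(r[v] for v in z)
        n_xyz[(r[x], r[y], zv)] += 1
        n_xz[(r[x], zv)] += 1
        n_yz[(r[y], zv)] += 1
        n_z[zv] += 1
    g = 2.0 * sum(n * math.log(n * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
                  for (xv, yv, zv), n in n_xyz.items())
    levels = lambda v: len({r[v] for r in data})
    # Nominal degrees of freedom: (|X| - 1)(|Y| - 1) * prod |Z_i|.
    dof = (levels(x) - 1) * (levels(y) - 1) * math.prod(levels(v) for v in z)
    return g, dof, chi2.sf(g, dof)

For example, g_test(data, "Smoking", "Coughing", ["LungCancer"]) on data generated by the sampling sketch of section 2.1 should yield a large p-value, since in that toy chain the pair is conditionally independent given Lung Cancer.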


More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Center for Causal Discovery (CCD) of Biomedical Knowledge from Big Data University of Pittsburgh Carnegie Mellon University Pittsburgh Supercomputing

Center for Causal Discovery (CCD) of Biomedical Knowledge from Big Data University of Pittsburgh Carnegie Mellon University Pittsburgh Supercomputing Center for Causal Discovery (CCD) of Biomedical Knowledge from Big Data University of Pittsburgh Carnegie Mellon University Pittsburgh Supercomputing Center Yale University PIs: Ivet Bahar, Jeremy Berg,

More information

3. The Junction Tree Algorithms

3. The Junction Tree Algorithms A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

How To Find Influence Between Two Concepts In A Network

How To Find Influence Between Two Concepts In A Network 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

More information

Data Mining On Diabetics

Data Mining On Diabetics Data Mining On Diabetics Janani Sankari.M 1,Saravana priya.m 2 Assistant Professor 1,2 Department of Information Technology 1,Computer Engineering 2 Jeppiaar Engineering College,Chennai 1, D.Y.Patil College

More information

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

More information

A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka Onisko, M.S., 1,2 Marek J. Druzdzel, Ph.D., 1 and Hanna Wasyluk, M.D.,Ph.D.

A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka Onisko, M.S., 1,2 Marek J. Druzdzel, Ph.D., 1 and Hanna Wasyluk, M.D.,Ph.D. Research Report CBMI-99-27, Center for Biomedical Informatics, University of Pittsburgh, September 1999 A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka Onisko, M.S., 1,2 Marek J. Druzdzel,

More information

Big Data Analytics for Healthcare

Big Data Analytics for Healthcare Big Data Analytics for Healthcare Jimeng Sun Chandan K. Reddy Healthcare Analytics Department IBM TJ Watson Research Center Department of Computer Science Wayne State University 1 Healthcare Analytics

More information

life science data mining

life science data mining life science data mining - '.)'-. < } ti» (>.:>,u» c ~'editors Stephen Wong Harvard Medical School, USA Chung-Sheng Li /BM Thomas J Watson Research Center World Scientific NEW JERSEY LONDON SINGAPORE.

More information

So, how do you pronounce. Jilles Vreeken. Okay, now we can talk. So, what kind of data? binary. * multi-relational

So, how do you pronounce. Jilles Vreeken. Okay, now we can talk. So, what kind of data? binary. * multi-relational Simply Mining Data Jilles Vreeken So, how do you pronounce Exploratory Data Analysis Jilles Vreeken Jilles Yill less Vreeken Fray can 17 August 2015 Okay, now we can talk. 17 August 2015 The goal So, what

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

In Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering, pages 842-846, Warsaw, Poland, December 2-4, 1999

In Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering, pages 842-846, Warsaw, Poland, December 2-4, 1999 In Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering, pages 842-846, Warsaw, Poland, December 2-4, 1999 A Bayesian Network Model for Diagnosis of Liver Disorders Agnieszka

More information

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4. Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

More information

Bayesian Networks. Mausam (Slides by UW-AI faculty)

Bayesian Networks. Mausam (Slides by UW-AI faculty) Bayesian Networks Mausam (Slides by UW-AI faculty) Bayes Nets In general, joint distribution P over set of variables (X 1 x... x X n ) requires exponential space for representation & inference BNs provide

More information

What is the purpose of this document? What is in the document? How do I send Feedback?

What is the purpose of this document? What is in the document? How do I send Feedback? This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Statistics

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Data Mining Techniques for Prognosis in Pancreatic Cancer

Data Mining Techniques for Prognosis in Pancreatic Cancer Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Unraveling versus Unraveling: A Memo on Competitive Equilibriums and Trade in Insurance Markets

Unraveling versus Unraveling: A Memo on Competitive Equilibriums and Trade in Insurance Markets Unraveling versus Unraveling: A Memo on Competitive Equilibriums and Trade in Insurance Markets Nathaniel Hendren January, 2014 Abstract Both Akerlof (1970) and Rothschild and Stiglitz (1976) show that

More information

Agenda. Interface Agents. Interface Agents

Agenda. Interface Agents. Interface Agents Agenda Marcelo G. Armentano Problem Overview Interface Agents Probabilistic approach Monitoring user actions Model of the application Model of user intentions Example Summary ISISTAN Research Institute

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D. Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

How To Understand The Theory Of Probability

How To Understand The Theory Of Probability Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

More information

Design and Analysis of the Causation and Prediction Challenge

Design and Analysis of the Causation and Prediction Challenge JMLR: Workshop and Conference Proceedings 3: 1-33 WCCI2008 workshop on causality Design and Analysis of the Causation and Prediction Challenge Isabelle Guyon Clopinet, California Constantin Aliferis New

More information

Prediction of DDoS Attack Scheme

Prediction of DDoS Attack Scheme Chapter 5 Prediction of DDoS Attack Scheme Distributed denial of service attack can be launched by malicious nodes participating in the attack, exploit the lack of entry point in a wireless network, and

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

Procedia Computer Science 00 (2012) 1 21. Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu.

Procedia Computer Science 00 (2012) 1 21. Trieu Minh Nhut Le, Jinli Cao, and Zhen He. trieule@sgu.edu.vn, j.cao@latrobe.edu.au, z.he@latrobe.edu. Procedia Computer Science 00 (2012) 1 21 Procedia Computer Science Top-k best probability queries and semantics ranking properties on probabilistic databases Trieu Minh Nhut Le, Jinli Cao, and Zhen He

More information

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline

More information

Gerry Hobbs, Department of Statistics, West Virginia University

Gerry Hobbs, Department of Statistics, West Virginia University Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

About the Author. The Role of Artificial Intelligence in Software Engineering. Brief History of AI. Introduction 2/27/2013

About the Author. The Role of Artificial Intelligence in Software Engineering. Brief History of AI. Introduction 2/27/2013 About the Author The Role of Artificial Intelligence in Software Engineering By: Mark Harman Presented by: Jacob Lear Mark Harman is a Professor of Software Engineering at University College London Director

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an

More information

D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

More information

Chapter 28. Bayesian Networks

Chapter 28. Bayesian Networks Chapter 28. Bayesian Networks The Quest for Artificial Intelligence, Nilsson, N. J., 2009. Lecture Notes on Artificial Intelligence, Spring 2012 Summarized by Kim, Byoung-Hee and Lim, Byoung-Kwon Biointelligence

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Labeling outerplanar graphs with maximum degree three

Labeling outerplanar graphs with maximum degree three Labeling outerplanar graphs with maximum degree three Xiangwen Li 1 and Sanming Zhou 2 1 Department of Mathematics Huazhong Normal University, Wuhan 430079, China 2 Department of Mathematics and Statistics

More information

Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

More information

itesla Project Innovative Tools for Electrical System Security within Large Areas

itesla Project Innovative Tools for Electrical System Security within Large Areas itesla Project Innovative Tools for Electrical System Security within Large Areas Samir ISSAD RTE France samir.issad@rte-france.com PSCC 2014 Panel Session 22/08/2014 Advanced data-driven modeling techniques

More information

How To Create A Text Classification System For Spam Filtering

How To Create A Text Classification System For Spam Filtering Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering PhD Thesis Khurum Nazir Junejo 2004-03-0018 Advisor: Dr. Asim Karim Department of Computer Science Syed Babar

More information

MACHINE LEARNING BASICS WITH R

MACHINE LEARNING BASICS WITH R MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically

More information

Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception

More information