Learning Hierarchical Bayesian Networks for Large-Scale Data Analysis

Kyu-Baek Hwang¹, Byoung-Hee Kim², and Byoung-Tak Zhang²

¹ School of Computing, Soongsil University, Seoul, Korea
kbhwang@ssu.ac.kr
² School of Computer Science and Engineering, Seoul National University, Seoul, Korea
bhkim@bi.snu.ac.kr, btzhang@cse.snu.ac.kr

Abstract. Bayesian network learning is a useful tool for exploratory data analysis. However, applying Bayesian networks to the analysis of large-scale data, consisting of thousands of attributes, is not straightforward because of the heavy computational burden in learning and visualization. In this paper, we propose a novel method for large-scale data analysis based on hierarchical compression of information and constrained structural learning, i.e., hierarchical Bayesian networks (HBNs). An HBN can compactly visualize global probabilistic structure through a small number of hidden variables that approximately represent a large number of observed variables. An efficient learning algorithm for HBNs, which incrementally maximizes a lower bound of the likelihood function, is also suggested. The effectiveness of our method is demonstrated by experiments on synthetic large-scale Bayesian networks and a real-life microarray dataset.

1 Introduction

Owing to their ability to capture conditional independencies among variables, Bayesian networks have been applied to various data mining tasks [9], [4]. However, applying Bayesian networks to extremely large domains (e.g., a database consisting of thousands of attributes) remains a challenging task. The general approach to structural learning of Bayesian networks, greedy search, encounters the following problems when the number of variables reaches several thousand. First, the running time of structural learning becomes formidable. Second, greedy search is likely to be trapped in local optima because of the enlarged search space.
Until now, several researchers have suggested methods for alleviating the above problems [5], [8], [6]. Even though these approaches have been shown to find reasonable solutions efficiently, they have two drawbacks. First, they are likely to spend much of their time learning local structure, which may be less important from the viewpoint of grasping the global structure. The second problem concerns visualization: it would be extremely hard to extract useful knowledge from a complex network structure consisting of thousands of vertices and edges.

I. King et al. (Eds.): ICONIP 2006, Part I, LNCS 4232. © Springer-Verlag Berlin Heidelberg 2006

In this paper, we propose a new method for large-scale data analysis using hierarchical Bayesian networks. It should be noted that introducing hierarchical structure in modeling is a generic technique; several researchers have introduced hierarchy into probabilistic graphical modeling [10], [7]. Our approach differs from theirs in the purpose of the hierarchical modeling: our goal is to make probabilistic graphical modeling feasible in extremely large domains. We also propose an efficient learning algorithm for hierarchical Bayesian networks having many hidden variables.

The paper is organized as follows. In Section 2, we define the hierarchical Bayesian network (HBN) and describe its properties. The learning algorithm for HBNs is described in Section 3. In Section 4, we demonstrate the effectiveness of our method through experiments on various large-scale datasets. Finally, we draw conclusions in Section 5.

2 Definition of the Hierarchical Bayesian Network for Large-Scale Data Analysis

Assume that our problem domain is described by n discrete variables, Y = {Y_1, Y_2, ..., Y_n}.¹ The hierarchical Bayesian network for this domain is a special Bayesian network consisting of Y and additional hidden variables. It assumes a layered hierarchical structure as follows. The bottom layer (observed layer) consists of the observed variables Y. The first hidden layer consists of ⌈n/2⌉ hidden variables, Z_1 = {Z_11, Z_12, ..., Z_1⌈n/2⌉}. The second hidden layer consists of ⌈⌈n/2⌉/2⌉ hidden variables, Z_2 = {Z_21, Z_22, ..., Z_2⌈⌈n/2⌉/2⌉}. Finally, the top layer (the ⌈log₂ n⌉-th hidden layer) consists of only one hidden variable, Z_⌈log₂n⌉ = {Z_⌈log₂n⌉1}. We denote all the hidden variables by Z = {Z_1, Z_2, ..., Z_⌈log₂n⌉}.²

Hierarchical Bayesian networks, consisting of the variables {Y, Z}, have the following structural constraints.

1. Any parent of a variable must be in the same layer or in the immediately upper layer.
2. At most one parent from the immediately upper layer is allowed for each variable.

Fig. 1 shows an example HBN consisting of eight observed and seven hidden variables. Under the above structural constraints, a hierarchical Bayesian network represents the joint probability distribution over {Y, Z} as follows.

¹ In this paper, we represent a random variable by a capital letter (e.g., X, Y, and Z) and a set of variables by a boldface capital letter (e.g., X, Y, and Z). The corresponding lowercase letters denote an instantiation of the variable (e.g., x, y, and z) or of all the members of the set of variables (e.g., x, y, and z), respectively.
² We assume that all hidden variables are also discrete.
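For concreteness, the two structural constraints can be checked mechanically on a candidate structure. The following is a minimal sketch (the parent-map representation, function name, and the particular edges are ours, chosen to resemble a fragment of Fig. 1), assuming each variable is tagged with its layer index, with 0 for the observed layer:

```python
def satisfies_hbn_constraints(layer_of, parents):
    """Check the two HBN structural constraints:
    1. every parent lies in the same layer or the immediately upper layer;
    2. each variable has at most one parent from the immediately upper layer.
    `layer_of` maps variable -> layer index (observed layer = 0);
    `parents` maps variable -> list of its parent variables."""
    for v, ps in parents.items():
        upper_parents = 0
        for p in ps:
            diff = layer_of[p] - layer_of[v]
            if diff not in (0, 1):      # constraint 1 violated
                return False
            if diff == 1:
                upper_parents += 1
        if upper_parents > 1:           # constraint 2 violated
            return False
    return True

# An illustrative fragment in the spirit of Fig. 1: Z11 is the upper-layer
# parent of Y1 and Y2, and Z12 additionally has a same-layer edge from Z11.
layer_of = {"Y1": 0, "Y2": 0, "Z11": 1, "Z12": 1, "Z21": 2}
parents = {"Y1": ["Z11"], "Y2": ["Z11"], "Z11": ["Z21"], "Z12": ["Z11", "Z21"]}
print(satisfies_hbn_constraints(layer_of, parents))   # True
```

Note that Z12's parent set mixes one same-layer parent (Z11) with one upper-layer parent (Z21), which the constraints permit; a variable with two upper-layer parents would be rejected.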

Fig. 1. An example HBN structure. The bottom (observed) layer consists of eight variables describing the problem domain. Each hidden layer corresponds to a compressed representation of the layer below it.

P(Y, Z) = P(Z_⌈log₂n⌉) · ∏_{i=1}^{⌈log₂n⌉−1} P(Z_i | Z_{i+1}) · P(Y | Z_1).   (1)

An HBN is specified by the tuple ⟨S_h, Θ_{S_h}⟩. Here, S_h denotes the structure of the HBN and Θ_{S_h} denotes the set of parameters of the local probability distributions given S_h. In addition, we denote the parameters for each layer by {Θ_{S_h,Y}, Θ_{S_h,Z_1}, ..., Θ_{S_h,Z_⌈log₂n⌉}} (= Θ_{S_h}).

The hierarchical Bayesian network can represent hierarchical compression of the information contained in Y = {Y_1, Y_2, ..., Y_n}. The number of hidden variables comprising the first hidden layer is roughly half of n. Under the structural constraints, each hidden variable in the first hidden layer, i.e., Z_1i (1 ≤ i ≤ ⌈n/2⌉), can have two child nodes in the observed layer, as depicted in Fig. 1.³ Here, each hidden variable corresponds to a compressed representation of its children if its number of possible values is less than the number of possible configurations of its children. In the hierarchical Bayesian network, edges between variables of the same layer are also allowed (e.g., see hidden layers 1 and 2 in Fig. 1). These edges encode the conditional (in)dependencies among the variables in the same layer. The conditional independencies among the variables in a hidden layer correspond to a rough representation of the conditional independencies among the observed variables, because each hidden variable is a compressed representation of a set of observed variables. When we deal with a problem domain consisting of thousands of observed variables, the approximate probabilistic dependencies visualized through hidden variables can be a reasonable solution for exploratory data analysis.

³ If n is odd, all the hidden variables except one have two child nodes each; the last one has only one child node.
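As an illustration of the layer definitions above, the number of hidden variables per layer follows the halving sequence ⌈n/2⌉, ⌈⌈n/2⌉/2⌉, ..., 1. A minimal sketch (the function name is ours):

```python
import math

def hbn_layer_sizes(n):
    """Number of hidden variables in each hidden layer of an HBN over
    n observed variables: repeatedly halve (rounding up) until a single
    top-level hidden variable remains."""
    sizes = []
    m = n
    while m > 1:
        m = math.ceil(m / 2)
        sizes.append(m)
    return sizes

# For the eight observed variables of Fig. 1:
print(hbn_layer_sizes(8))   # [4, 2, 1] -> seven hidden variables in total
```

For n = 8 this gives the seven hidden variables of Fig. 1 spread over ⌈log₂ 8⌉ = 3 hidden layers; for n = 5000, the seventh hidden layer contains 40 variables, matching the setting of the experiments in Section 4.1.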

3 Learning the Hierarchical Bayesian Network

Assume that we have a training dataset for Y consisting of M examples, D_Y = {y_1, y_2, ..., y_M}. We can describe our learning objective (the log-likelihood) as follows:

L(Θ_{S_h}, S_h) = Σ_{m=1}^{M} log P(y_m | Θ_{S_h}, S_h) = Σ_{m=1}^{M} log Σ_Z P(Z, y_m | Θ_{S_h}, S_h),   (2)

where Σ_Z denotes summation over all possible configurations of Z. The general approach to finding a maximum-likelihood solution with missing variables, the expectation-maximization (EM) algorithm, is not applicable here because the number of missing variables amounts to several thousands; such a large number of missing variables renders the solution space infeasible to search. Instead, we propose an efficient algorithm for maximizing a lower bound of Eqn. (2). The lower bound of the likelihood function is derived by Jensen's inequality as follows:

Σ_{m=1}^{M} log Σ_Z P(Z, y_m | Θ_{S_h}, S_h) ≥ Σ_{m=1}^{M} Σ_Z log P(Z, y_m | Θ_{S_h}, S_h).   (3)

Further, the term for each example y_m in the above equation can be decomposed as follows:

Σ_Z log P(Z, y_m | Θ_{S_h}, S_h) = Σ_Z log [P(y_m | Θ_{S_h,Y}, S_h) · P(Z | y_m, Θ_{S_h}\Θ_{S_h,Y}, S_h)]
  = C_0 log P(y_m | Θ_{S_h,Y}, S_h) + Σ_Z log P(Z | y_m, Θ_{S_h}\Θ_{S_h,Y}, S_h),   (4)

where C_0 is a constant not related to the choice of Θ_{S_h} and S_h. In Eqn. (4), the parameter sets Θ_{S_h,Y} and Θ_{S_h}\Θ_{S_h,Y} can be learned separately given S_h.⁴ Our algorithm starts by learning Θ_{S_h,Y} and the substructure of S_h related only to the parents of Y. After that, we fill in the missing values for the variables in the first hidden layer, producing a training dataset for Z_1.⁵ Now, Eqn. (4) can be decomposed further as follows:

C_0 log P(y_m | Θ_{S_h,Y}, S_h) + Σ_Z log P(Z | y_m, Θ_{S_h}\Θ_{S_h,Y}, S_h)
  = C_0 log P(y_m | Θ_{S_h,Y}, S_h) + C_1 log P(z_{1m} | Θ_{S_h,Z_1}, S_h)
    + Σ_{Z\Z_1} log P(Z\Z_1 | y_m, z_{1m}, Θ_{S_h}\{Θ_{S_h,Y}, Θ_{S_h,Z_1}}, S_h),   (5)

⁴ In this paper, the symbol \ denotes set difference.
⁵ Because the hidden variables are entirely unobserved, this procedure is liable to produce hidden constants when maximizing the likelihood function. We apply an encoding scheme to prevent this problem, which will be described later.

where C_1 is a constant not related to the optimization. Then, we can learn Θ_{S_h,Z_1} and the related substructure of S_h by maximizing Σ_{m=1}^{M} log P(z_{1m} | Θ_{S_h,Z_1}, S_h). In this way, we learn the hierarchical Bayesian network from bottom to top.

Table 1. The two-phase learning algorithm for hierarchical Bayesian networks. Here, hidden layer 0 means the observed layer, and the variable set Z_0 means the observed variable set Y.

Input: D_Y = {y_1, y_2, ..., y_M}, the training dataset.
Output: a hierarchical Bayesian network ⟨S_h, Θ_{S_h}⟩ maximizing the lower bound of Σ_{m=1}^{M} log P(y_m | Θ_{S_h}, S_h).

The First Phase
- For l = 0 to ⌈log₂ n⌉ − 1:
  - Estimate the mutual information I(Z_{li}; Z_{lj}) between all possible variable pairs in hidden layer l.
  - Sort the variable pairs in decreasing order of mutual information.
  - Select ⌈n/2^{l+1}⌉ variable pairs from the sorted list such that each variable is included in only one pair.
  - Set each variable in hidden layer (l + 1) as the parent of a selected variable pair.
  - Learn the parameter set Θ_{S_h,Z_l} by maximizing Σ_{m=1}^{M} log P(z_{lm} | Θ_{S_h,Z_l}, S_h).
  - Generate the dataset D_{Z_{l+1}} based on the current HBN.

The Second Phase
- For each hidden layer l, learn the Bayesian network structure inside the layer by maximizing Σ_{m=1}^{M} log P(z_{lm} | Θ_{S_h,Z_l}, S_h).

We propose a two-phase learning algorithm for hierarchical Bayesian networks based on the above decomposition. In the first phase, a hierarchy for information compression is learned. From the observed layer, we choose variable pairs sharing a common parent in the first hidden layer. Here, we should select variable pairs with high mutual information in order to minimize the information loss. After determining the parent of each variable pair, the missing values of the hidden parent variable are filled in as follows. First, we estimate the joint probability distribution of the variable pair, P̂(Y_j, Y_k) (1 ≤ j, k ≤ n, j ≠ k), from the given dataset D_Y. Then, we set the parent variable's value to 0 for the most probable configuration of {Y_j, Y_k} and to 1 for the second most probable one.⁶ By this encoding scheme, a parent variable represents the two most probable configurations of its child variables, minimizing the information loss. The parent variable's values for the other two configurations are treated as missing. Now, we can learn the parameters for the observed variables, Θ_{S_h,Y}, using the standard EM algorithm. After learning Θ_{S_h,Y}, we fill in the missing values by probabilistic inference, producing a complete dataset for the variables in the first hidden layer.⁶

⁶ Here, we assume that all variables are binary, although our method can be extended to more general cases.

Now, the same procedure can be applied to Z_1, learning the parameter set Θ_{S_h,Z_1} and generating a complete dataset for Z_2. This process is iterated, building the hierarchical structure and producing a complete dataset for all variables, {Y, Z}. After building the hierarchy, we learn the edges inside each layer when necessary (the second phase). Any structural learning algorithm for Bayesian networks can be employed, because a complete dataset is now available for the variables in each layer. Table 1 summarizes the two-phase algorithm for learning HBNs.

4 Experimental Evaluation

4.1 Results on Synthetic Datasets

To simulate diverse situations, we experimented with datasets generated from various large-scale Bayesian networks having different structural properties. They are categorized into scale-free [2] and modular structures. All variables were binary, and the local probability distributions were randomly generated. Here, we show the results on one scale-free and one modular Bayesian network, each consisting of 5000 nodes.⁷ They are shown in Fig. 2(a) and 2(b), respectively. Training datasets of 1000 examples were generated from them. The first phase of the HBN learning algorithm was applied to the training datasets, building the hierarchies. Then, the second phase was applied to the seventh hidden layer, consisting of 40 hidden variables. The learned Bayesian network structures inside the seventh hidden layer are shown in Fig. 2(c) and 2(d).

We examined the quality of information compression. Fig. 3(a) shows the mutual information between parent nodes in the upper layer and child nodes in the lower layer, averaged across each hidden layer.⁸ Here, we can observe that the amount of shared information between consecutive layers is more than 50%,⁹ although it decreases as the level of the hidden layer goes up. Interestingly, the hierarchical Bayesian network preserves more information in the case of the dataset from the modular Bayesian network. To investigate this further, we estimated the distribution of the mutual information (see Fig. 3(b)). Here, we can clearly observe that more information is shared between parent and child nodes in the case of the modular Bayesian network than in the case of the scale-free Bayesian network. Based on the above experimental results, we conclude that the hierarchical Bayesian network can efficiently represent the complicated information contained in a large number of variables. In addition, HBNs are more appropriate when the true probability distribution assumes a modular structure. We conjecture that this is because a module in the lower layer can be well represented by a hidden node in the upper layer in our HBN framework.

⁷ Results on other Bayesian networks were similar, although they are not shown here.
⁸ Here, mutual information values were scaled into [0, 1] by dividing by the minimum of the entropies of the variables involved.
⁹ That is, the scaled mutual information value is greater than 0.5.
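To make the per-layer compression step of the first phase (Table 1) concrete, the following sketch shows its ingredients for the binary case: empirical mutual information, the scaled variant of footnote 8, greedy pairing of the highest-MI variable pairs, and the 0/1 encoding of the two most probable child configurations. Function names and the dict-of-columns data layout are ours, not the paper's:

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def scaled_mi(xs, ys):
    """MI scaled into [0, 1] by the minimum marginal entropy (cf. footnote 8)."""
    def entropy(vs):
        n = len(vs)
        return -sum((c / n) * math.log2(c / n) for c in Counter(vs).values())
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0

def pair_by_mi(data):
    """Greedily pair the columns of `data` (name -> list of samples) in
    decreasing order of mutual information, each variable joining exactly
    one pair; with an odd number of variables, one is left over (cf.
    footnote 3)."""
    scored = sorted(((mutual_information(data[a], data[b]), a, b)
                     for a, b in combinations(data, 2)), reverse=True)
    used, pairs = set(), []
    for _, a, b in scored:
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs, [v for v in data if v not in used]

def encode_parent(xs, ys):
    """Fill the hidden parent's values: 0 for the most probable child
    configuration, 1 for the second most probable; the rest stay missing."""
    top2 = [cfg for cfg, _ in Counter(zip(xs, ys)).most_common(2)]
    code = {cfg: i for i, cfg in enumerate(top2)}
    return [code.get(cfg) for cfg in zip(xs, ys)]  # None marks missing

data = {"A": [0, 0, 0, 0, 1, 1, 1, 0], "B": [0, 0, 0, 0, 1, 1, 1, 0],
        "C": [0, 1, 0, 1, 0, 1, 0, 1], "D": [1, 0, 0, 1, 1, 0, 1, 0]}
pairs, leftover = pair_by_mi(data)
print(pairs)                                        # [('A', 'B'), ('C', 'D')]
print(encode_parent([0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1]))
# [0, 0, 0, 1, 1, None]
```

In the paper's algorithm, the missing (None) entries would subsequently be filled in by probabilistic inference after EM learning of the layer parameters; this sketch only covers the pairing and encoding steps.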

Fig. 2. Scale-free (a) and modular (b) Bayesian networks, consisting of 5000 nodes, which generated the training datasets. The Bayesian network structures inside the seventh hidden layer, learned from the training datasets generated from the scale-free (c) and modular (d) Bayesian networks. These network structures were drawn with the Pajek software [3].

4.2 Results on a Real-Life Microarray Dataset

A real-life microarray dataset on the budding yeast cell cycle [11] was analyzed with hierarchical Bayesian networks. The dataset consists of 6178 genes and 69 samples. We binarized each gene expression level based on the median expression of each slide sample. Among the 6178 genes, we excluded genes with low information entropy (< 0.8). Finally, we analyzed a binary dataset consisting of 6120 variables and 69 samples. The general tendency with respect to information compression was similar to the case of the synthetic datasets, although it is not shown here. In the seventh hidden layer, consisting of 48 variables, we learned a Bayesian network structure (see Fig. 4). This Bayesian network compactly visualizes the original network structure consisting of 6120 genes. From the network structure, we can easily find a set of hub nodes, e.g., H2, H4, H8, H28, and H30. The genes comprising these hub nodes and their biological roles were analyzed.

Fig. 3. Quality of information compression by hierarchical Bayesian network learning: mutual information between parent nodes in the upper layer and child nodes in the lower layer (a) and its distribution (b).

Fig. 4. The Bayesian network structure in the seventh hidden layer, consisting of 48 variables. This network structure approximately represents the original network structure consisting of 6120 yeast genes. Here, we can easily find some hub nodes, for example, H2, H4, H8, H28, and H30. The network structure was drawn with the Pajek software [3].

The function of genes can be described by Gene Ontology (GO) [1] annotations. GO maps each gene or gene product to directly related GO terms, which fall into three categories: biological process (BP), cellular component (CC), and molecular function (MF). We can conjecture the meaning of each hub using this annotation, focusing on the BP terms related to the cell cycle. For this task, we used the GO Term Finder tool, which looks for significantly shared GO terms that are directly or indirectly related to the given list of genes. The results are summarized in Table 2. The closely located hub nodes H4 and H8 (see Fig. 4) share the function of cellular

physiological process. The genes in H30 share a more specific function than H4 and H8, namely cell organization and biogenesis. The hub node H2 is related to organelle organization and biogenesis, which is more specific than that of H30. The genes in H28, the most crowded hub in the network structure, respond to stress or stimuli such as nitrogen starvation.

Table 2. Gene function annotation of hub nodes in the learned Bayesian network consisting of 48 variables. The significance of a GO term was evaluated by examining the proportion of the genes associated with the term, compared to the number of times that term is associated with other genes in the genome (p-values were calculated by a binomial distribution approximation).

Node name | GO term                               | Frequency | p-value
H2        | Organelle organization and biogenesis | 18.7%     |
H4        | Cellular physiological process        | 74.0%     |
H8        | Cellular physiological process        | 74.2%     |
H28       | Response to stimulus                  | 14.8%     |
H30       | Cell organization and biogenesis      | 29.3%     |

5 Conclusion

We proposed a new class of Bayesian networks for analyzing large-scale data consisting of thousands of variables. The hierarchical Bayesian network is based on hierarchical compression of information and constrained structural learning. Through experiments on datasets from synthetic Bayesian networks, we demonstrated the effectiveness of hierarchical Bayesian network learning with respect to information compression. Interestingly, the degree of information conservation was affected by the structural properties of the Bayesian networks that generated the datasets: the hierarchical Bayesian network preserved more information in the case of modular networks than in the case of scale-free networks. One explanation for this phenomenon is that our HBN method is more suitable for modular networks because a variable in the upper layer can well represent the set of variables contained in a module in the lower layer.
Our method was also applied to the analysis of real microarray data on the yeast cell cycle, consisting of 6120 genes. We were able to obtain a reasonable approximation, consisting of 48 variables, of the global gene expression network. A hub node in the learned Bayesian network consisted of genes with similar functions. Moreover, neighboring hub nodes also shared similar functions, confirming the effectiveness of our HBN method for real-life large-scale data analysis.

Acknowledgements. This work was supported by the Soongsil University Research Fund and by the Korea Ministry of Science and Technology under the NRL program.

References

1. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.: Gene Ontology: tool for the unification of biology. Nature Genetics 25(1) (2000)
2. Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439) (1999)
3. Batagelj, V., Mrvar, A.: Pajek: program for large network analysis. Connections 21(2) (1998)
4. Friedman, N.: Inferring cellular networks using probabilistic graphical models. Science 303(6) (2004)
5. Friedman, N., Nachman, I., Pe'er, D.: Learning Bayesian network structure from massive datasets: the sparse candidate algorithm. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI) (1999)
6. Goldenberg, A., Moore, A.: Tractable learning of large Bayes net structures from sparse data. Proceedings of the Twenty-First International Conference on Machine Learning (ICML) (2004)
7. Gyftodimos, E., Flach, P.: Hierarchical Bayesian networks: an approach to classification and learning for structured data. Lecture Notes in Artificial Intelligence 3025 (2004)
8. Hwang, K.-B., Lee, J.W., Chung, S.-W., Zhang, B.-T.: Construction of large-scale Bayesian networks by local to global search. Lecture Notes in Artificial Intelligence 2417 (2002)
9. Nikovski, D.: Constructing Bayesian networks for medical diagnosis from incomplete and partially correct statistics. IEEE Transactions on Knowledge and Data Engineering 12(4) (2000)
10. Park, S., Aggarwal, J.K.: Recognition of two-person interactions using a hierarchical Bayesian network. Proceedings of the First ACM SIGMM International Workshop on Video Surveillance (IWVS) (2003)
11. Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., Futcher, B.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9(12) (1998)


More information

Genetic Algorithm Evolution of Cellular Automata Rules for Complex Binary Sequence Prediction

Genetic Algorithm Evolution of Cellular Automata Rules for Complex Binary Sequence Prediction Brill Academic Publishers P.O. Box 9000, 2300 PA Leiden, The Netherlands Lecture Series on Computer and Computational Sciences Volume 1, 2005, pp. 1-6 Genetic Algorithm Evolution of Cellular Automata Rules

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

More information

A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity

A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity Dongwon Kang 1, In-Gwon Song 1, Seunghun Park 1, Doo-Hwan Bae 1, Hoon-Kyu Kim 2, and Nobok Lee 2 1 Department

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

A Network Flow Approach in Cloud Computing

A Network Flow Approach in Cloud Computing 1 A Network Flow Approach in Cloud Computing Soheil Feizi, Amy Zhang, Muriel Médard RLE at MIT Abstract In this paper, by using network flow principles, we propose algorithms to address various challenges

More information

Classification On The Clouds Using MapReduce

Classification On The Clouds Using MapReduce Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal simao.martins@tecnico.ulisboa.pt Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal claudia.antunes@tecnico.ulisboa.pt

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks

Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks Jangmin O 1,JaeWonLee 2, Sung-Bae Park 1, and Byoung-Tak Zhang 1 1 School of Computer Science and Engineering, Seoul National University

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

Learning diagnostic diagrams in transport-based data-collection systems

Learning diagnostic diagrams in transport-based data-collection systems University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers Faculty of Engineering and Information Sciences 2014 Learning diagnostic diagrams in transport-based data-collection

More information

Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm

Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Journal of Al-Nahrain University Vol.15 (2), June, 2012, pp.161-168 Science Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Manal F. Younis Computer Department, College

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

More information

Understanding the dynamics and function of cellular networks

Understanding the dynamics and function of cellular networks Understanding the dynamics and function of cellular networks Cells are complex systems functionally diverse elements diverse interactions that form networks signal transduction-, gene regulatory-, metabolic-

More information

The Theory of Concept Analysis and Customer Relationship Mining

The Theory of Concept Analysis and Customer Relationship Mining The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com

More information

Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

More information

DNA Hypernetworks for Information Storage and Retrieval

DNA Hypernetworks for Information Storage and Retrieval DNA Hypernetworks for Information Storage and Retrieval Byoung-Tak Zhang and Joo-Kyung Kim Biointelligence Laboratory, School of Computer Science and Engineering Seoul National University, Seoul 5-7, Korea

More information

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 2. Gene regulation tools and methods Regulatory sequences and motif discovery TF binding sites, microrna target prediction

More information

Hidden Markov Models

Hidden Markov Models 8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies

More information

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124

More information

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

More information

Customer Data Mining and Visualization by Generative Topographic Mapping Methods

Customer Data Mining and Visualization by Generative Topographic Mapping Methods Customer Data Mining and Visualization by Generative Topographic Mapping Methods Jinsan Yang and Byoung-Tak Zhang Artificial Intelligence Lab (SCAI) School of Computer Science and Engineering Seoul National

More information

Load Balancing Algorithm Based on Services

Load Balancing Algorithm Based on Services Journal of Information & Computational Science 10:11 (2013) 3305 3312 July 20, 2013 Available at http://www.joics.com Load Balancing Algorithm Based on Services Yufang Zhang a, Qinlei Wei a,, Ying Zhao

More information

Model-Based Cluster Analysis for Web Users Sessions

Model-Based Cluster Analysis for Web Users Sessions Model-Based Cluster Analysis for Web Users Sessions George Pallis, Lefteris Angelis, and Athena Vakali Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece gpallis@ccf.auth.gr

More information

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India

Manjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone

More information

International Journal of Emerging Technology & Research

International Journal of Emerging Technology & Research International Journal of Emerging Technology & Research An Implementation Scheme For Software Project Management With Event-Based Scheduler Using Ant Colony Optimization Roshni Jain 1, Monali Kankariya

More information

An Intelligent Matching System for the Products of Small Business/Manufactures with the Celebrities

An Intelligent Matching System for the Products of Small Business/Manufactures with the Celebrities An Intelligent Matching System for the Products of Small Business/Manufactures with the Celebrities Junho Jeong 1, Yunsik Son 2, Seokhoon Ko 1 and Seman Oh 1 1 Dept. of Computer Engineering, Dongguk University,

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

Application of Adaptive Probing for Fault Diagnosis in Computer Networks 1

Application of Adaptive Probing for Fault Diagnosis in Computer Networks 1 Application of Adaptive Probing for Fault Diagnosis in Computer Networks 1 Maitreya Natu Dept. of Computer and Information Sciences University of Delaware, Newark, DE, USA, 19716 Email: natu@cis.udel.edu

More information

A Robustness Simulation Method of Project Schedule based on the Monte Carlo Method

A Robustness Simulation Method of Project Schedule based on the Monte Carlo Method Send Orders for Reprints to reprints@benthamscience.ae 254 The Open Cybernetics & Systemics Journal, 2014, 8, 254-258 Open Access A Robustness Simulation Method of Project Schedule based on the Monte Carlo

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Chapter 28. Bayesian Networks

Chapter 28. Bayesian Networks Chapter 28. Bayesian Networks The Quest for Artificial Intelligence, Nilsson, N. J., 2009. Lecture Notes on Artificial Intelligence, Spring 2012 Summarized by Kim, Byoung-Hee and Lim, Byoung-Kwon Biointelligence

More information

Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

More information

Fast Multipole Method for particle interactions: an open source parallel library component

Fast Multipole Method for particle interactions: an open source parallel library component Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,

More information

Probabilistic Evidential Reasoning with Symbolic Argumentation for Space Situation Awareness

Probabilistic Evidential Reasoning with Symbolic Argumentation for Space Situation Awareness AIAA Infotech@Aerospace 2010 20-22 April 2010, Atlanta, Georgia AIAA 2010-3481 Probabilistic Evidential Reasoning with Symbolic Argumentation for Space Situation Awareness Glenn Takata 1 and Joe Gorman

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

A Statistical Text Mining Method for Patent Analysis

A Statistical Text Mining Method for Patent Analysis A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical

More information

Visualizing Graphical Probabilistic Models

Visualizing Graphical Probabilistic Models Visualizing Graphical Probabilistic Models Chih-Hung Chiang*, Patrick Shaughnessy, Gary Livingston, Georges Grinstein Department of Computer Science, University of Massachusetts Lowell, Lowell, MA01854

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

A mixture model for random graphs

A mixture model for random graphs A mixture model for random graphs J-J Daudin, F. Picard, S. Robin robin@inapg.inra.fr UMR INA-PG / ENGREF / INRA, Paris Mathématique et Informatique Appliquées Examples of networks. Social: Biological:

More information

Chapter 14 Managing Operational Risks with Bayesian Networks

Chapter 14 Managing Operational Risks with Bayesian Networks Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian

More information

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis

Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of

More information

ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS

ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS Hasni Neji and Ridha Bouallegue Innov COM Lab, Higher School of Communications of Tunis, Sup Com University of Carthage, Tunis, Tunisia. Email: hasni.neji63@laposte.net;

More information

New Ensemble Combination Scheme

New Ensemble Combination Scheme New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

More information

Towards Rule-based System for the Assembly of 3D Bricks

Towards Rule-based System for the Assembly of 3D Bricks Universal Journal of Communications and Network 3(4): 77-81, 2015 DOI: 10.13189/ujcn.2015.030401 http://www.hrpub.org Towards Rule-based System for the Assembly of 3D Bricks Sanguk Noh School of Computer

More information

Triangulation by Ear Clipping

Triangulation by Ear Clipping Triangulation by Ear Clipping David Eberly Geometric Tools, LLC http://www.geometrictools.com/ Copyright c 1998-2016. All Rights Reserved. Created: November 18, 2002 Last Modified: August 16, 2015 Contents

More information

NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS

NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS Orly Alter (a) *, Gene H. Golub (b), Patrick O. Brown (c)

More information

Comparison of Major Domination Schemes for Diploid Binary Genetic Algorithms in Dynamic Environments

Comparison of Major Domination Schemes for Diploid Binary Genetic Algorithms in Dynamic Environments Comparison of Maor Domination Schemes for Diploid Binary Genetic Algorithms in Dynamic Environments A. Sima UYAR and A. Emre HARMANCI Istanbul Technical University Computer Engineering Department Maslak

More information

Application of Graph-based Data Mining to Metabolic Pathways

Application of Graph-based Data Mining to Metabolic Pathways Application of Graph-based Data Mining to Metabolic Pathways Chang Hun You, Lawrence B. Holder, Diane J. Cook School of Electrical Engineering and Computer Science Washington State University Pullman,

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

An Efficient Knowledge Base Management Scheme for Context Aware Surveillance

An Efficient Knowledge Base Management Scheme for Context Aware Surveillance An Efficient Knowledge Base Management Scheme for Context Aware Surveillance Soomi Yang Department of Information Security, The University of Suwon, San 2-2, Wau-ri, Bongdam-eup, Hwangseong-si, Gyeonggi-do,

More information

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

More information

Multi-layer Structure of Data Center Based on Steiner Triple System

Multi-layer Structure of Data Center Based on Steiner Triple System Journal of Computational Information Systems 9: 11 (2013) 4371 4378 Available at http://www.jofcis.com Multi-layer Structure of Data Center Based on Steiner Triple System Jianfei ZHANG 1, Zhiyi FANG 1,

More information

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the

More information

Junghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea

Junghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea Proceedings of the 211 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds. A BINARY PARTITION-BASED MATCHING ALGORITHM FOR DATA DISTRIBUTION MANAGEMENT Junghyun

More information

Compression algorithm for Bayesian network modeling of binary systems

Compression algorithm for Bayesian network modeling of binary systems Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

A Computational Framework for Exploratory Data Analysis

A Computational Framework for Exploratory Data Analysis A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.

More information

Neovision2 Performance Evaluation Protocol

Neovision2 Performance Evaluation Protocol Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram rajmadhan@mail.usf.edu Dmitry Goldgof, Ph.D. goldgof@cse.usf.edu Rangachar Kasturi, Ph.D.

More information

Character Image Patterns as Big Data

Character Image Patterns as Big Data 22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,

More information

Random Forest Based Imbalanced Data Cleaning and Classification

Random Forest Based Imbalanced Data Cleaning and Classification Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem

More information

Data Mining & Data Stream Mining Open Source Tools

Data Mining & Data Stream Mining Open Source Tools Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.

More information