Learning Hierarchical Bayesian Networks for Large-Scale Data Analysis
Kyu-Baek Hwang^1, Byoung-Hee Kim^2, and Byoung-Tak Zhang^2

^1 School of Computing, Soongsil University, Seoul, Korea. kbhwang@ssu.ac.kr
^2 School of Computer Science and Engineering, Seoul National University, Seoul, Korea. bhkim@bi.snu.ac.kr, btzhang@cse.snu.ac.kr

Abstract. Bayesian network learning is a useful tool for exploratory data analysis. However, applying Bayesian networks to the analysis of large-scale data, consisting of thousands of attributes, is not straightforward because of the heavy computational burden in learning and visualization. In this paper, we propose a novel method for large-scale data analysis based on hierarchical compression of information and constrained structural learning, i.e., hierarchical Bayesian networks (HBNs). The HBN can compactly visualize global probabilistic structure through a small number of hidden variables that approximately represent a large number of observed variables. An efficient learning algorithm for HBNs, which incrementally maximizes a lower bound of the likelihood function, is also suggested. The effectiveness of our method is demonstrated by experiments on synthetic large-scale Bayesian networks and a real-life microarray dataset.

1 Introduction

Due to their ability to capture conditional independencies among variables, Bayesian networks have been applied to various data mining tasks [9], [4]. However, applying Bayesian networks to extremely large domains (e.g., a database consisting of thousands of attributes) remains a challenging task. The general approach to structural learning of Bayesian networks, greedy search, encounters the following problems when the number of variables grows into the thousands. First, the running time of structural learning becomes formidable. Moreover, greedy search is likely to be trapped in local optima because of the enlarged search space.
Several researchers have suggested methods for alleviating the above problems [5], [8], [6]. Even though these approaches have been shown to find reasonable solutions efficiently, they have two drawbacks. First, they are likely to spend much time learning local structure, which might be

[I. King et al. (Eds.): ICONIP 2006, Part I, LNCS 4232, (c) Springer-Verlag Berlin Heidelberg 2006]
less important from the viewpoint of grasping the global structure. The second problem concerns visualization: it would be extremely hard to extract useful knowledge from a complex network structure consisting of thousands of vertices and edges.

In this paper, we propose a new method for large-scale data analysis using hierarchical Bayesian networks. It should be noted that introducing hierarchical structure into modeling is a generic technique; several researchers have introduced hierarchy into probabilistic graphical modeling [10], [7]. Our approach differs from theirs in the purpose of the hierarchical modeling: our aim is to make probabilistic graphical modeling feasible in extremely large domains. We also propose an efficient learning algorithm for hierarchical Bayesian networks with many hidden variables.

The paper is organized as follows. In Section 2, we define the hierarchical Bayesian network (HBN) and describe its properties. The learning algorithm for HBNs is described in Section 3. In Section 4, we demonstrate the effectiveness of our method through experiments on various large-scale datasets. Finally, we draw conclusions in Section 5.

2 Definition of the Hierarchical Bayesian Network for Large-Scale Data Analysis

Assume that our problem domain is described by n discrete variables, Y = {Y_1, Y_2, ..., Y_n}.^1 The hierarchical Bayesian network for this domain is a special Bayesian network consisting of Y and additional hidden variables. It assumes a layered hierarchical structure as follows. The bottom layer (observed layer) consists of the observed variables Y. The first hidden layer consists of ⌈n/2⌉ hidden variables, Z_1 = {Z_11, Z_12, ..., Z_1⌈n/2⌉}.
The second hidden layer consists of ⌈⌈n/2⌉/2⌉ hidden variables, Z_2 = {Z_21, Z_22, ..., Z_2⌈⌈n/2⌉/2⌉}. Finally, the top layer (the ⌈log_2 n⌉-th hidden layer) consists of only one hidden variable, Z_⌈log_2 n⌉ = {Z_⌈log_2 n⌉1}. We denote the set of all hidden variables by Z = {Z_1, Z_2, ..., Z_⌈log_2 n⌉}.^2 Hierarchical Bayesian networks, consisting of the variables {Y, Z}, obey the following structural constraints.

1. The parents of a variable must be in the same layer or in the immediately upper layer.
2. At most one parent from the immediately upper layer is allowed for each variable.

Fig. 1 shows an example HBN consisting of eight observed and seven hidden variables. Under the above structural constraints, a hierarchical Bayesian network represents the joint probability distribution over {Y, Z} as follows.

^1 In this paper, we denote a random variable by a capital letter (e.g., X, Y, and Z) and a set of variables by a boldface capital letter (e.g., X, Y, and Z). The corresponding lowercase letters denote an instantiation of the variable (e.g., x, y, and z) or of all members of the set of variables (e.g., x, y, and z), respectively.
^2 We assume that all hidden variables are also discrete.
Fig. 1. An example HBN structure, with hidden layer 3 (Z_31), hidden layer 2 (Z_21, Z_22), hidden layer 1 (Z_11-Z_14), and the observed layer (Y_1-Y_8). The bottom (observed) layer consists of eight variables describing the problem domain. Each hidden layer corresponds to a compressed representation of the layer below.

P(Y, Z) = P(Z_⌈log_2 n⌉) · ∏_{i=1}^{⌈log_2 n⌉-1} P(Z_i | Z_{i+1}) · P(Y | Z_1).   (1)

An HBN is specified by the tuple ⟨S_h, Θ_{S_h}⟩. Here, S_h denotes the structure of the HBN and Θ_{S_h} denotes the set of parameters of the local probability distributions given S_h. In addition, we denote the parameters for each layer by {Θ_{S_h,Y}, Θ_{S_h,Z_1}, ..., Θ_{S_h,Z_⌈log_2 n⌉}} (= Θ_{S_h}).

The hierarchical Bayesian network can represent hierarchical compression of the information contained in Y = {Y_1, Y_2, ..., Y_n}. The number of hidden variables in the first hidden layer is half of n. Under the structural constraints, each hidden variable in the first hidden layer, Z_1i (1 ≤ i ≤ ⌈n/2⌉), can have two child nodes in the observed layer, as depicted in Fig. 1.^3 Each such hidden variable is a compressed representation of its children whenever its number of possible values is less than the number of possible configurations of its children.

In the hierarchical Bayesian network, edges between variables of the same layer are also allowed (e.g., see hidden layers 1 and 2 in Fig. 1). These edges encode the conditional (in)dependencies among the variables in that layer. The conditional independencies among the variables in a hidden layer correspond to a rough representation of the conditional independencies among the observed variables, because each hidden variable is a compressed representation of a set of observed variables.
When we deal with a problem domain consisting of thousands of observed variables, the approximate probabilistic dependencies visualized through the hidden variables can be a reasonable basis for exploratory data analysis.

^3 If n is odd, all hidden variables except one have two child nodes; the remaining one has a single child node.
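As a quick check of the layer sizes implied by the structural constraints above, the following sketch (the function name is ours) computes the number of variables per layer, halving with rounding up at each step. For the example HBN of Fig. 1 (n = 8) this gives layers of 8, 4, 2, and 1 variables; for n = 5000, as used in Section 4.1, the seventh hidden layer indeed contains 40 variables.

```python
import math

def hbn_layer_sizes(n):
    """Sizes of the HBN layers over n observed variables: the observed
    layer has n variables, each hidden layer has ceil(half) of the layer
    below, and the top layer has a single variable."""
    sizes = [n]
    while sizes[-1] > 1:
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes

# Fig. 1 example: n = 8 gives ceil(log2 8) = 3 hidden layers.
print(hbn_layer_sizes(8))        # [8, 4, 2, 1]

# The synthetic networks of Section 4.1: n = 5000 gives 13 hidden
# layers, and the seventh hidden layer has 40 variables.
sizes = hbn_layer_sizes(5000)
print(len(sizes) - 1)            # 13
print(sizes[7])                  # 40
```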
3 Learning the Hierarchical Bayesian Network

Assume that we have a training dataset for Y consisting of M examples, D_Y = {y_1, y_2, ..., y_M}. Our learning objective (the log-likelihood) is

L(Θ_{S_h}, S_h) = Σ_{m=1}^{M} log P(y_m | Θ_{S_h}, S_h) = Σ_{m=1}^{M} log Σ_Z P(Z, y_m | Θ_{S_h}, S_h),   (2)

where Σ_Z denotes summation over all possible configurations of Z. The general approach to finding a maximum-likelihood solution with missing variables, the expectation-maximization (EM) algorithm, is not applicable here because the number of missing variables amounts to several thousands; so many missing variables would make the search over the solution space infeasible. Instead, we propose an efficient algorithm for maximizing a lower bound of Eqn. (2). The lower bound of the likelihood function is derived by Jensen's inequality:

Σ_{m=1}^{M} log Σ_Z P(Z, y_m | Θ_{S_h}, S_h) ≥ Σ_{m=1}^{M} Σ_Z log P(Z, y_m | Θ_{S_h}, S_h).   (3)

Further, the term for each example y_m in the above equation can be decomposed as

Σ_Z log P(Z, y_m | Θ_{S_h}, S_h)
  = Σ_Z log [ P(y_m | Θ_{S_h,Y}, S_h) · P(Z | y_m, Θ_{S_h}\Θ_{S_h,Y}, S_h) ]
  = C_0 log P(y_m | Θ_{S_h,Y}, S_h) + Σ_Z log P(Z | y_m, Θ_{S_h}\Θ_{S_h,Y}, S_h),   (4)

where C_0 is a constant that does not depend on the choice of Θ_{S_h} and S_h. In Eqn. (4), the parameter sets Θ_{S_h,Y} and Θ_{S_h}\Θ_{S_h,Y} can be learned separately given S_h.^4 Our algorithm starts by learning Θ_{S_h,Y} and the substructure of S_h involving only the parents of Y. After that, we fill in the missing values for the variables in the first hidden layer, producing a training dataset for Z_1.^5 Now, Eqn. (4) can be decomposed further:

C_0 log P(y_m | Θ_{S_h,Y}, S_h) + Σ_Z log P(Z | y_m, Θ_{S_h}\Θ_{S_h,Y}, S_h)
  = C_0 log P(y_m | Θ_{S_h,Y}, S_h) + C_1 log P(z_1m | Θ_{S_h,Z_1}, S_h)
    + Σ_{Z\Z_1} log P(Z\Z_1 | y_m, z_1m, Θ_{S_h}\{Θ_{S_h,Y}, Θ_{S_h,Z_1}}, S_h),   (5)

^4 In this paper, the symbol \ denotes set difference.
^5 Because the hidden variables are entirely missing, this procedure is likely to produce degenerate hidden variables (constants) when maximizing the likelihood function. We apply an encoding scheme to prevent this problem, which will be described later.
Table 1. The two-phase learning algorithm for hierarchical Bayesian networks. Here, hidden layer 0 means the observed layer and the variable set Z_0 means the observed variable set Y.

Input: D_Y = {y_1, y_2, ..., y_M} - the training dataset.
Output: A hierarchical Bayesian network ⟨S_h, Θ_{S_h}⟩ maximizing the lower bound of Σ_{m=1}^{M} log P(y_m | Θ_{S_h}, S_h).

The First Phase
- For l = 0 to ⌈log_2 n⌉ - 1:
  - Estimate the mutual information I(Z_li; Z_lj) between all variable pairs in hidden layer l.
  - Sort the variable pairs in decreasing order of mutual information.
  - Select ⌈n/2^(l+1)⌉ variable pairs from the sorted list such that each variable is included in only one pair.
  - Set each variable in hidden layer (l + 1) as the parent of a selected variable pair.
  - Learn the parameter set Θ_{S_h,Z_l} by maximizing Σ_{m=1}^{M} log P(z_lm | Θ_{S_h,Z_l}, S_h).
  - Generate the dataset D_{Z_(l+1)} based on the current HBN.

The Second Phase
- Learn the Bayesian network structure inside hidden layer l by maximizing Σ_{m=1}^{M} log P(z_lm | Θ_{S_h,Z_l}, S_h).

where C_1 is a constant not related to the optimization. We can then learn Θ_{S_h,Z_1} and the related substructure of S_h by maximizing log P(z_1m | Θ_{S_h,Z_1}, S_h). In this way, we learn the hierarchical Bayesian network from bottom to top.

We propose a two-phase learning algorithm for hierarchical Bayesian networks based on the above decomposition. In the first phase, a hierarchy for information compression is learned. From the observed layer, we choose variable pairs sharing a common parent in the first hidden layer. Here, we select variable pairs with high mutual information in order to minimize the information loss. After determining the parent of each variable pair, the missing values of the hidden parent variable are filled in as follows. First, we estimate the joint probability distribution of the variable pair, P̂(Y_j, Y_k) (1 ≤ j, k ≤ n, j ≠ k), from the given dataset D_Y.
Then, we set the parent variable's value to 0 for the most probable configuration of {Y_j, Y_k} and to 1 for the second most probable one.^6 By this encoding scheme, a parent variable represents the two most probable configurations of its child variables, minimizing the information loss. The parent variable's value for the other two configurations is treated as missing. Now, we can learn the parameters for the observed variables, Θ_{S_h,Y}, using the standard EM algorithm. After learning Θ_{S_h,Y}, we fill in the missing values by probabilistic inference, producing a complete dataset for the variables in the first hidden layer.

^6 Here, we assume that all variables are binary, although our method can be extended to more general cases.

Now, the same procedure
can be applied to Z_1, learning the parameter set Θ_{S_h,Z_1} and generating a complete dataset for Z_2. This process is iterated, building the hierarchical structure and producing a complete dataset for all variables, {Y, Z}. After building the hierarchy, we learn the edges inside each layer when necessary (the second phase). Any structural learning algorithm for Bayesian networks can be employed, because a complete dataset is now available for the variables in each layer. Table 1 summarizes the two-phase algorithm for learning HBNs.

4 Experimental Evaluation

4.1 Results on Synthetic Datasets

To simulate diverse situations, we experimented with datasets generated from various large-scale Bayesian networks with different structural properties, categorized into scale-free [2] and modular structures. All variables were binary, and the local probability distributions were randomly generated. Here, we show the results on two Bayesian networks, one scale-free and one modular, each consisting of 5000 nodes.^7 They are shown in Fig. 2(a) and 2(b), respectively. Training datasets of 1000 examples were generated from them. The first phase of the HBN learning algorithm was applied to the training datasets, building hierarchies. Then, the second phase was applied to the seventh hidden layer, consisting of 40 hidden variables. The learned Bayesian network structures inside the seventh hidden layer are shown in Fig. 2(c) and 2(d).

We examined the quality of information compression. Fig. 3(a) shows the mutual information between parent nodes in the upper layer and child nodes in the lower layer, averaged across each hidden layer.^8 We observe that the amount of shared information between consecutive layers is more than 50%,^9 although it decreases as the level of the hidden layer goes up.
Interestingly, the hierarchical Bayesian network preserves more information in the case of the dataset from the modular Bayesian network. To investigate this further, we estimated the distribution of the mutual information (see Fig. 3(b)). Here, we can clearly observe that more information is shared between parent and child nodes in the case of the modular Bayesian network than in the case of the scale-free Bayesian network.

Based on the above experimental results, we conclude that the hierarchical Bayesian network can efficiently represent the complicated information contained in a large number of variables. In addition, HBNs are more appropriate when the true probability distribution assumes a modular structure. We conjecture that this is because a module in the lower layer can be well represented by a hidden node in the upper layer in our HBN framework.

^7 Results on other Bayesian networks were similar, although not shown here.
^8 Here, mutual information values were scaled into [0, 1] by dividing by the minimum of the entropies of the two variables.
^9 That is, the scaled mutual information value is greater than 0.5.
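The core steps of the first phase of Section 3 (mutual-information pairing and the two-most-probable-configurations encoding), together with the scaled mutual information of footnote 8, might be sketched as follows. This is a simplified illustration, not the authors' implementation: all function names are ours, and the parameter-learning/EM step is omitted.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(col):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(col)
    return -sum((c / n) * math.log2(c / n) for c in Counter(col).values())

def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def scaled_mi(x, y):
    # Footnote 8: scale MI into [0, 1] by dividing by the smaller
    # of the two marginal entropies.
    denom = min(entropy(x), entropy(y))
    return mutual_info(x, y) / denom if denom > 0 else 0.0

def greedy_pairing(data):
    """data: dict mapping variable name -> list of observed values.
    Select disjoint variable pairs in decreasing order of mutual
    information, as in the first phase of Table 1 (with odd counts,
    one variable is simply left unpaired)."""
    pairs = sorted(combinations(data, 2),
                   key=lambda p: mutual_info(data[p[0]], data[p[1]]),
                   reverse=True)
    used, selected = set(), []
    for a, b in pairs:
        if a not in used and b not in used:
            selected.append((a, b))
            used.update((a, b))
    return selected

def encode_parent(x, y):
    """Encoding scheme: the most probable configuration of (x, y) maps
    to parent value 0, the second most probable to 1, and the remaining
    configurations to missing (None)."""
    ranked = [cfg for cfg, _ in Counter(zip(x, y)).most_common()]
    code = {ranked[0]: 0}
    if len(ranked) > 1:
        code[ranked[1]] = 1
    return [code.get(cfg) for cfg in zip(x, y)]

# Tiny demo: A and B are identical, C and D perfectly anti-correlated,
# so the greedy pairing should group (A, B) and (C, D).
data = {
    "A": [0, 0, 1, 1, 0, 1],
    "B": [0, 0, 1, 1, 0, 1],
    "C": [0, 1, 0, 1, 0, 1],
    "D": [1, 0, 1, 0, 1, 0],
}
print(greedy_pairing(data))                  # [('A', 'B'), ('C', 'D')]
print(encode_parent(data["A"], data["B"]))   # [0, 0, 1, 1, 0, 1]
print(scaled_mi(data["A"], data["B"]))       # 1.0
```

In the demo, the two configurations (0, 0) and (1, 1) of the pair (A, B) cover all samples, so the binary parent loses no information, which is exactly the situation the encoding scheme is designed to exploit.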
Fig. 2. Scale-free (a) and modular (b) Bayesian networks, consisting of 5000 nodes, which generated the training datasets. (c), (d): the Bayesian network structures inside the seventh hidden layer, learned from the training datasets generated from the scale-free (c) and modular (d) Bayesian networks. These network structures were drawn with the Pajek software [3].

4.2 Results on a Real-Life Microarray Dataset

A real-life microarray dataset on the budding yeast cell cycle [11] was analyzed with hierarchical Bayesian networks. The dataset consists of 6178 genes and 69 samples. We binarized each gene expression level based on the median expression of each slide sample. Among the 6178 genes, we excluded genes with low information entropy (< 0.8). Finally, we analyzed a binary dataset consisting of 6120 variables and 69 samples.

The general tendency with respect to information compression was similar to that on the synthetic datasets, although not shown here. In the seventh hidden layer, consisting of 48 variables, we learned a Bayesian network structure (see Fig. 4). This Bayesian network compactly visualizes the original network structure over the 6120 genes. From the network structure, we can easily identify a set of hub nodes, e.g., H2, H4, H8, H28, and H30. The genes comprising these hub nodes and their biological roles were analyzed. The function of genes can be described by Gene Ontology (GO) [1] annotations. GO
maps each gene or gene product to directly related GO terms, which fall into three categories: biological process (BP), cellular component (CC), and molecular function (MF). We can conjecture the meaning of each hub using this annotation, focusing on the BP terms related to the cell cycle. For this task, we used GO Term Finder, which looks for significantly shared GO terms that are directly or indirectly related to a given list of genes. The results are summarized in Table 2.

Fig. 3. Quality of information compression by hierarchical Bayesian network learning: mutual information between parent nodes in the upper layer and child nodes in the lower layer, per hidden layer (a), and its distribution (b), for the scale-free and modular Bayesian networks.

Fig. 4. The Bayesian network structure in the seventh hidden layer, consisting of 48 variables. This network approximately represents the original network structure over the 6120 yeast genes. Some hub nodes are easily identified, for example H2, H4, H8, H28, and H30. The network structure was drawn with the Pajek software [3].

The closely located hub nodes H4 and H8 (see Fig. 4) share the function of cellular
physiological process. The genes in H30 share a more specific function than H4 and H8, namely cell organization and biogenesis. The hub node H2 is related to organelle organization and biogenesis, which is more specific still. The genes in H28, the most crowded hub in the network structure, respond to stress or stimuli such as nitrogen starvation.

Table 2. Gene function annotation of hub nodes in the learned Bayesian network consisting of 48 variables. The significance of a GO term was evaluated by examining the proportion of the genes associated with the term, compared to the number of times the term is associated with other genes in the genome (p-values were calculated by a binomial distribution approximation).

Node name | GO term                               | Frequency
H2        | Organelle organization and biogenesis | 18.7%
H4        | Cellular physiological process        | 74.0%
H8        | Cellular physiological process        | 74.2%
H28       | Response to stimulus                  | 14.8%
H30       | Cell organization and biogenesis      | 29.3%

5 Conclusion

We proposed a new class of Bayesian networks for analyzing large-scale data consisting of thousands of variables. The hierarchical Bayesian network is based on hierarchical compression of information and constrained structural learning. Through experiments on datasets from synthetic Bayesian networks, we demonstrated the effectiveness of hierarchical Bayesian network learning with respect to information compression. Interestingly, the degree of information conservation was affected by the structural properties of the Bayesian networks that generated the datasets: the hierarchical Bayesian network preserved more information for modular networks than for scale-free networks. One explanation for this phenomenon is that our HBN method is better suited to modular networks, because a variable in the upper layer can well represent a set of variables contained in a module in the lower layer.
Our method was also applied to the analysis of a real microarray dataset on the yeast cell cycle, consisting of 6120 genes. We obtained a reasonable approximation, consisting of 48 variables, of the global gene expression network. A hub node in the Bayesian network consisted of genes with similar functions. Moreover, neighboring hub nodes in the learned Bayesian network also shared similar functions, confirming the effectiveness of our HBN method for real-life large-scale data analysis.

Acknowledgements

This work was supported by the Soongsil University Research Fund and by the Korea Ministry of Science and Technology under the NRL program.
References

1. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.: Gene Ontology: tool for the unification of biology. Nature Genetics 25(1) (2000)
2. Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439) (1999)
3. Batagelj, V., Mrvar, A.: Pajek - program for large network analysis. Connections 21(2) (1998)
4. Friedman, N.: Inferring cellular networks using probabilistic graphical models. Science 303(6) (2004)
5. Friedman, N., Nachman, I., Pe'er, D.: Learning Bayesian network structure from massive datasets: the sparse candidate algorithm. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI) (1999)
6. Goldenberg, A., Moore, A.: Tractable learning of large Bayes net structures from sparse data. Proceedings of the Twenty-First International Conference on Machine Learning (ICML) (2004)
7. Gyftodimos, E., Flach, P.: Hierarchical Bayesian networks: an approach to classification and learning for structured data. Lecture Notes in Artificial Intelligence 3025 (2004)
8. Hwang, K.-B., Lee, J.W., Chung, S.-W., Zhang, B.-T.: Construction of large-scale Bayesian networks by local to global search. Lecture Notes in Artificial Intelligence 2417 (2002)
9. Nikovski, D.: Constructing Bayesian networks for medical diagnosis from incomplete and partially correct statistics. IEEE Transactions on Knowledge and Data Engineering 12(4) (2000)
10. Park, S., Aggarwal, J.K.: Recognition of two-person interactions using a hierarchical Bayesian network. Proceedings of the First ACM SIGMM International Workshop on Video Surveillance (IWVS) (2003)
11. Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., Futcher, B.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9(12) (1998)
More informationStock Trading by Modelling Price Trend with Dynamic Bayesian Networks
Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks Jangmin O 1,JaeWonLee 2, Sung-Bae Park 1, and Byoung-Tak Zhang 1 1 School of Computer Science and Engineering, Seoul National University
More informationAn Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
More informationA NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
More informationBig Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
More informationLearning diagnostic diagrams in transport-based data-collection systems
University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers Faculty of Engineering and Information Sciences 2014 Learning diagnostic diagrams in transport-based data-collection
More informationMemory Allocation Technique for Segregated Free List Based on Genetic Algorithm
Journal of Al-Nahrain University Vol.15 (2), June, 2012, pp.161-168 Science Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Manal F. Younis Computer Department, College
More informationSanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
More informationIntroduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
More informationUnderstanding the dynamics and function of cellular networks
Understanding the dynamics and function of cellular networks Cells are complex systems functionally diverse elements diverse interactions that form networks signal transduction-, gene regulatory-, metabolic-
More informationThe Theory of Concept Analysis and Customer Relationship Mining
The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com
More informationIntroducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
More informationDNA Hypernetworks for Information Storage and Retrieval
DNA Hypernetworks for Information Storage and Retrieval Byoung-Tak Zhang and Joo-Kyung Kim Biointelligence Laboratory, School of Computer Science and Engineering Seoul National University, Seoul 5-7, Korea
More information1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)
1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM) 2. Gene regulation tools and methods Regulatory sequences and motif discovery TF binding sites, microrna target prediction
More informationHidden Markov Models
8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies
More informationAnother Look at Sensitivity of Bayesian Networks to Imprecise Probabilities
Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124
More informationIdentifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
More informationCustomer Data Mining and Visualization by Generative Topographic Mapping Methods
Customer Data Mining and Visualization by Generative Topographic Mapping Methods Jinsan Yang and Byoung-Tak Zhang Artificial Intelligence Lab (SCAI) School of Computer Science and Engineering Seoul National
More informationLoad Balancing Algorithm Based on Services
Journal of Information & Computational Science 10:11 (2013) 3305 3312 July 20, 2013 Available at http://www.joics.com Load Balancing Algorithm Based on Services Yufang Zhang a, Qinlei Wei a,, Ying Zhao
More informationModel-Based Cluster Analysis for Web Users Sessions
Model-Based Cluster Analysis for Web Users Sessions George Pallis, Lefteris Angelis, and Athena Vakali Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece gpallis@ccf.auth.gr
More informationManjeet Kaur Bhullar, Kiranbir Kaur Department of CSE, GNDU, Amritsar, Punjab, India
Volume 5, Issue 6, June 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Multiple Pheromone
More informationInternational Journal of Emerging Technology & Research
International Journal of Emerging Technology & Research An Implementation Scheme For Software Project Management With Event-Based Scheduler Using Ant Colony Optimization Roshni Jain 1, Monali Kankariya
More informationAn Intelligent Matching System for the Products of Small Business/Manufactures with the Celebrities
An Intelligent Matching System for the Products of Small Business/Manufactures with the Celebrities Junho Jeong 1, Yunsik Son 2, Seokhoon Ko 1 and Seman Oh 1 1 Dept. of Computer Engineering, Dongguk University,
More informationData Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
More informationComparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationApplication of Adaptive Probing for Fault Diagnosis in Computer Networks 1
Application of Adaptive Probing for Fault Diagnosis in Computer Networks 1 Maitreya Natu Dept. of Computer and Information Sciences University of Delaware, Newark, DE, USA, 19716 Email: natu@cis.udel.edu
More informationA Robustness Simulation Method of Project Schedule based on the Monte Carlo Method
Send Orders for Reprints to reprints@benthamscience.ae 254 The Open Cybernetics & Systemics Journal, 2014, 8, 254-258 Open Access A Robustness Simulation Method of Project Schedule based on the Monte Carlo
More informationStatistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
More informationChapter 28. Bayesian Networks
Chapter 28. Bayesian Networks The Quest for Artificial Intelligence, Nilsson, N. J., 2009. Lecture Notes on Artificial Intelligence, Spring 2012 Summarized by Kim, Byoung-Hee and Lim, Byoung-Kwon Biointelligence
More informationPersonalization of Web Search With Protected Privacy
Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information
More informationFast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
More informationProbabilistic Evidential Reasoning with Symbolic Argumentation for Space Situation Awareness
AIAA Infotech@Aerospace 2010 20-22 April 2010, Atlanta, Georgia AIAA 2010-3481 Probabilistic Evidential Reasoning with Symbolic Argumentation for Space Situation Awareness Glenn Takata 1 and Joe Gorman
More informationData Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
More informationA Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical
More informationVisualizing Graphical Probabilistic Models
Visualizing Graphical Probabilistic Models Chih-Hung Chiang*, Patrick Shaughnessy, Gary Livingston, Georges Grinstein Department of Computer Science, University of Massachusetts Lowell, Lowell, MA01854
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationBOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
More informationA mixture model for random graphs
A mixture model for random graphs J-J Daudin, F. Picard, S. Robin robin@inapg.inra.fr UMR INA-PG / ENGREF / INRA, Paris Mathématique et Informatique Appliquées Examples of networks. Social: Biological:
More informationChapter 14 Managing Operational Risks with Bayesian Networks
Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian
More informationEchidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis
Echidna: Efficient Clustering of Hierarchical Data for Network Traffic Analysis Abdun Mahmood, Christopher Leckie, Parampalli Udaya Department of Computer Science and Software Engineering University of
More informationONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS
ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS Hasni Neji and Ridha Bouallegue Innov COM Lab, Higher School of Communications of Tunis, Sup Com University of Carthage, Tunis, Tunisia. Email: hasni.neji63@laposte.net;
More informationNew Ensemble Combination Scheme
New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,
More informationTowards Rule-based System for the Assembly of 3D Bricks
Universal Journal of Communications and Network 3(4): 77-81, 2015 DOI: 10.13189/ujcn.2015.030401 http://www.hrpub.org Towards Rule-based System for the Assembly of 3D Bricks Sanguk Noh School of Computer
More informationTriangulation by Ear Clipping
Triangulation by Ear Clipping David Eberly Geometric Tools, LLC http://www.geometrictools.com/ Copyright c 1998-2016. All Rights Reserved. Created: November 18, 2002 Last Modified: August 16, 2015 Contents
More informationNOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS
NOVEL GENOME-SCALE CORRELATION BETWEEN DNA REPLICATION AND RNA TRANSCRIPTION DURING THE CELL CYCLE IN YEAST IS PREDICTED BY DATA-DRIVEN MODELS Orly Alter (a) *, Gene H. Golub (b), Patrick O. Brown (c)
More informationComparison of Major Domination Schemes for Diploid Binary Genetic Algorithms in Dynamic Environments
Comparison of Maor Domination Schemes for Diploid Binary Genetic Algorithms in Dynamic Environments A. Sima UYAR and A. Emre HARMANCI Istanbul Technical University Computer Engineering Department Maslak
More informationApplication of Graph-based Data Mining to Metabolic Pathways
Application of Graph-based Data Mining to Metabolic Pathways Chang Hun You, Lawrence B. Holder, Diane J. Cook School of Electrical Engineering and Computer Science Washington State University Pullman,
More informationA Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
More informationAn Efficient Knowledge Base Management Scheme for Context Aware Surveillance
An Efficient Knowledge Base Management Scheme for Context Aware Surveillance Soomi Yang Department of Information Security, The University of Suwon, San 2-2, Wau-ri, Bongdam-eup, Hwangseong-si, Gyeonggi-do,
More informationIntegrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon
Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationBNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I
BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential
More informationMulti-layer Structure of Data Center Based on Steiner Triple System
Journal of Computational Information Systems 9: 11 (2013) 4371 4378 Available at http://www.jofcis.com Multi-layer Structure of Data Center Based on Steiner Triple System Jianfei ZHANG 1, Zhiyi FANG 1,
More informationChapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network
Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the
More informationJunghyun Ahn Changho Sung Tag Gon Kim. Korea Advanced Institute of Science and Technology (KAIST) 373-1 Kuseong-dong, Yuseong-gu Daejoen, Korea
Proceedings of the 211 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds. A BINARY PARTITION-BASED MATCHING ALGORITHM FOR DATA DISTRIBUTION MANAGEMENT Junghyun
More informationCompression algorithm for Bayesian network modeling of binary systems
Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationA Computational Framework for Exploratory Data Analysis
A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.
More informationNeovision2 Performance Evaluation Protocol
Neovision2 Performance Evaluation Protocol Version 3.0 4/16/2012 Public Release Prepared by Rajmadhan Ekambaram rajmadhan@mail.usf.edu Dmitry Goldgof, Ph.D. goldgof@cse.usf.edu Rangachar Kasturi, Ph.D.
More informationCharacter Image Patterns as Big Data
22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,
More informationRandom Forest Based Imbalanced Data Cleaning and Classification
Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem
More informationData Mining & Data Stream Mining Open Source Tools
Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.
More information