Spatial Data Preparation for Knowledge Discovery
Vania Bogorny, Paulo Martins Engel, Luis Otavio Alvares

Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS)
Caixa Postal Porto Alegre RS Brazil
{vbogorny,engel,alvares}@inf.ufrgs.br

Abstract. There is a well-known need to extract knowledge from spatial databases. Dozens of data mining and knowledge discovery algorithms have been reported in the literature to meet this need. However, these algorithms share some general drawbacks. Some consider only spatial data; others, only non-spatial data. Most are published as pseudo-code, are usually not implemented in toolkits, and require input in a specific representation. These input formats are often unusual, and whether they can be easily obtained from real databases is normally neither mentioned nor considered. As a result, data mining on real spatial databases is rare outside academic institutions. This research proposes a methodology to prepare spatial data for data mining, as well as an interoperable software architecture to implement the methodology.

1. Introduction

Spatial data are increasingly used in areas such as urban planning, transportation and telecommunications. These data are stored and manipulated by Spatial Database Management Systems (SDBMS), while Geographic Information Systems (GIS) provide a set of operations and functions for the analysis and visualization of spatial data. The large amount of data stored in spatial databases contains implicit, nontrivial and previously unknown knowledge that cannot be captured directly by GIS operations. Specific techniques are needed to find this kind of knowledge, which is the objective of the Knowledge Discovery in Databases (KDD) research area. Knowledge discovery is an interactive and iterative process that depends on many user decisions.
Fayyad et al. (1996) divide the KDD process for non-spatial databases into five main steps: selection, preprocessing, transformation, data mining and interpretation/evaluation. The selection, preprocessing and transformation steps constitute data preparation for data mining. Data mining is the step that applies discovery algorithms, which produce an enumeration of patterns over the data. Interpretation is the step in which the patterns discovered by the data mining algorithms are visualized and analyzed.

Data mining algorithms should be designed to access databases directly, but in practice a large, time-consuming effort goes into transforming the database into a format compatible with the data mining algorithm, usually a single table or a single file. This limitation creates a gap between databases and data mining algorithms. Classical data mining algorithms are implemented in many toolkits, but they were not developed to handle spatial data. Spatial data mining algorithms are implemented in few of the available toolkits, and in both approaches spatial data preparation is basically a manual step. This burden falls on the end user who guides the KDD process, usually an expert in the application domain but not in spatial databases.

Data preparation is an important and repetitive step, and the discovered patterns depend on the type and quality of the input dataset. This step is reported to consume 60 to 80 percent of the whole KDD process for non-spatial databases (Adriaans 1996), and this share tends to increase for spatial data.
Although some techniques to simplify data preparation have been proposed for non-spatial databases, almost no research has focused on reducing the time and work needed to prepare spatial databases, despite advances in spatial data mining algorithms.

Given the need to preprocess large spatial databases for data mining and knowledge discovery in practice, the first objective of this research is to develop a methodology for spatial data preparation for spatial and classical data mining algorithms (this paper considers only data preparation for classical data mining). A second objective is to provide an interoperable framework (software architecture) to support the methodology for Open GIS-based SDBMS. A third objective is to implement the software architecture as an open source data preparation toolkit for relational databases, in order to validate the architecture and the methodology.

1.1. Outline

The remainder of the paper is organized as follows. Section 2 describes related work. Section 3 presents the methodology adopted to develop this research. Section 4 presents a methodology to prepare spatial data for classical data mining, independent of SDBMS and data mining toolkit. Section 5 presents a software architecture for spatial data preparation for relational databases. Section 6 concludes the paper and outlines future work.

2. Related Work

There is no related work specifically on spatial data preparation for data mining. Most research in this area defines operations or query languages intended to preprocess spatial databases and extract knowledge in a single step. Han et al. (1997) proposed a geo-mining query language (GMQL) for spatial data mining. The drawback of this approach is that the language is not available in commercial or open source SDBMS, only in the GEOMINER software prototype developed by the authors, which is no longer available.
Ester et al. (1996) defined a set of basic operations and indexes that should be implemented by an SDBMS for KDD. Ester et al. (2000) later introduced neighborhood graphs and paths designed to compute spatial relations for data mining. These approaches share the same drawback: the operations are not implemented by most commercial and open source SDBMS. Lazarevic et al. (2000) proposed a software system for spatial data analysis and modeling with some data preparation steps to detect geographic data inconsistencies. In our view, this approach is useful for constructing cartographic databases but not for mining geographic databases, where digital maps are assumed to have been validated beforehand. Malerba et al. (2002) proposed an object-oriented data mining query language for classification and association rules, which is implemented in the INGENS software prototype (Malerba et al. 2003). This approach works only with object-oriented databases, whereas most SDBMS are relational or object-relational, and the language is presented for one specific algorithm. Appice et al. (2003) defined a spatial feature extractor named FEATEX, which selects features from a spatial database and creates an output for the SPADA algorithm. Most SDBMS do not implement FEATEX's functions, and its output format is tailored to one specific algorithm.

3. Research Methodology

This research follows the steps described below:

1. Study all spatial aspects to be considered in the knowledge discovery process.
2. Study the input data format required by spatial and classical data mining algorithms.
3. Adapt and transform spatial data for data mining.
4. Create artificial databases with implicit patterns.
5. Prepare and preprocess the artificial databases in different formats and run experiments with data mining algorithms until the implicit patterns are found.
6. Apply the data preparation steps used on the artificial datasets to real databases.
7. Study filters to reduce the amount of data and functions to optimize the data preparation process.
8. Define a methodology for data preparation, independent of SDBMS and data mining algorithms.
9. Create an interoperable software architecture and spatial data mining toolkit.
10. Apply the methodology and the software to real data in order to validate and refine the methodology.

4. Spatial Data Preparation

The basic difference between data mining in conventional databases and in spatial databases lies in the spatial relations between spatial features. Classical DM algorithms are unable to interpret the meaning of geographic coordinates (x, y): the coordinates are treated as two non-spatial variables (like age or gender), and nothing useful is obtained. To apply classical DM, spatial data have to be expressed in terms of spatial predicates rather than items (Shekhar and Chawla 2003). A spatial association rule, for example, has the form $P_1 \wedge P_2 \wedge \dots \wedge P_n \rightarrow Q_1 \wedge Q_2 \wedge \dots \wedge Q_m$, where at least one of the $P_i$ or $Q_j$ is a spatial predicate. For example, $\text{age}=\text{old} \wedge \text{antenna}=\text{close} \rightarrow \text{disease}=\text{C32}$ states that if age is old and an antenna is close, then the disease is cancer.

The spatial predicate is the materialization of a spatial relation. There are basically three kinds of spatial relations between two spatial features to consider: topological, distance and direction (orientation). Topological relations characterize the type of intersection between two geographic features, such as crosses, contains, inside, covers, coveredBy, equal, disjoint and overlaps. Distance relations are based on the Euclidean distance between two spatial features. Direction relations deal with the order in which spatial features are located in space, relative to each other or to a reference object.

4.1. Spatial Data Preparation for Classical Data Mining

Most data mining algorithms take a single table or a single file as input. For classical data mining, the single table represents the target feature on which discovery will be performed. Each row of the single table is an independent unit, i.e. a different instance of the target feature, and each column is an item characterizing this unit.

Figure 1 shows a sample of a spatial database used in case studies. On the left side is a map, which graphically represents the spatial data. On the right side is a set of database tables (district, cellular antenna, factory and patient) with spatial and non-spatial attributes. Suppose that Patient is the target feature type and the objective is to find possible relations between cancer and cellular antennas or between cancer and factories. Each row of the single table should be a different hospitalized patient. Factories and antennas are the relevant feature types, which, together with the non-spatial attributes, characterize the target feature. The spatial relations between the target feature type (patient) and the relevant feature types (cellular antennas and factories) should be materialized as columns in the single table. The materialization depends on the granularity level, which varies according to the objective of the KDD process.
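To illustrate how a classical algorithm can evaluate a rule such as age=old and antenna=close implying disease=C32 once spatial relations are stored as predicates, the following minimal Python sketch counts rows of a small single table. The table contents and the helper names (`matches`, `support`, `confidence`) are hypothetical and serve only the illustration; support and confidence are the standard association rule measures, which the text above does not define.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical single table: one row per patient (the target feature),
# one column per non-spatial attribute or materialized spatial predicate.
SINGLE_TABLE: List[Row] = [
    {"age": "old", "antenna": "close", "disease": "C32"},
    {"age": "old", "antenna": "close", "disease": "C32"},
    {"age": "old", "antenna": "far", "disease": "I63"},
    {"age": "young", "antenna": "close", "disease": "I63"},
]


def matches(row: Row, items: Row) -> bool:
    """Return True if the row satisfies every attribute=value pair."""
    return all(row.get(attr) == value for attr, value in items.items())


def support(table: List[Row], items: Row) -> float:
    """Fraction of rows that satisfy all the given items."""
    return sum(matches(row, items) for row in table) / len(table)


def confidence(table: List[Row], antecedent: Row, consequent: Row) -> float:
    """Fraction of rows matching the antecedent that also match the consequent."""
    n_antecedent = sum(matches(row, antecedent) for row in table)
    if n_antecedent == 0:
        return 0.0
    both = {**antecedent, **consequent}
    return sum(matches(row, both) for row in table) / n_antecedent


ANTECEDENT = {"age": "old", "antenna": "close"}
CONSEQUENT = {"disease": "C32"}
RULE_SUPPORT = support(SINGLE_TABLE, {**ANTECEDENT, **CONSEQUENT})
RULE_CONFIDENCE = confidence(SINGLE_TABLE, ANTECEDENT, CONSEQUENT)
```

Here the spatial predicate `antenna=close` is handled exactly like the non-spatial item `age=old`, which is the point of materializing spatial relations as columns.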
(a) District
Gid  Name            Area  Perimeter  Shape
1    Mario Quintana                   Polygon [(x1,y1),(x2,y2),..]
2    Protasio Alves                   Polygon [(x1,y1),(x2,y2),..]

(b) Cellular Antenna
Gid  Shape
1    Polygon [(x1,y1),(x2,y2),..]
2    Polygon [(x1,y1),(x2,y2),..]
3    Polygon [(x1,y1),(x2,y2),..]

(c) Factory
Gid  Type           Impact_Degree  Shape
1    Chemical       High           Point[(x1,y1)]
2    Textile        Low            Point[(x1,y1)]
3    Metallurgical  High           Point[(x1,y1)]

(d) Patient
Gid  Age  Gender  Disease  Date        Address    Shape
1    34   M       I63      03/04/1999  Ipiranga   Point[(x1,y1)]
2    60   M       F19.2    /11/1999    V. Aires   Point[(x1,y1)]
3    45   F       C32      01/12/1999  V. Jardim  Point[(x1,y1)]

Figure 1. Spatial and non-spatial data representation in spatial databases

In our example, where the objective is to investigate possible relations between cancer and cellular antennas, the feature type granularity level is considered, as shown in Table 1. In this case, only the type antenna is relevant, not which antenna. When the granularity level is the feature type, a dominant spatial relation is defined. In this example, if at least one cellular antenna is close, the relation CLOSE is dominant over FAR.

Table 1. Feature type granularity level
Patient  Age  Gender  Disease  Antenna
1        34   M       I63      CLOSE
2        60   M       F19.2    CLOSE
3        45   F       C32      FAR

On the other hand, if the objective is to investigate possible relations between cancer and factories, in order to discover which factory is polluting the environment and might be related to cancer, then the factory instances are considered, as shown in Table 2. Details about the input data format are given in (Bogorny and Alvares 2005a).

Table 2. Feature instance granularity level
Patient  Age  Gender  Disease  Antenna_1  Antenna_n  Factory_1  Factory_n
1        34   M       I63      CLOSE      CLOSE      CLOSE      FAR
2        60   M       F19.2    CLOSE      FAR        CLOSE      FAR
3        45   F       C32      FAR        FAR        FAR        FAR

4.2. A Spatial Data Preparation Methodology

We propose a spatial data preparation method for classical DM based on three main steps, as shown in Figure 2: selection, materialization and transformation.

The selection step is composed of two sub-steps, data definition and non-spatial filter, which respectively define and retrieve from the spatial database all information relevant to the KDD process. Besides reducing the amount of data, the selection step characterizes the spatial features through non-spatial attributes, retrieving only data that satisfy the specified conditions. Noise and irrelevant data that do not fulfill the requirements of the definition step are eliminated.

[Figure 2 diagram: Geographic Database; Geographic Data Preparation, made up of Selection (Data Definition, Non-Spatial Filter), Materialization (Spatial Optimizer, Spatial Join) and Transformation; Classical DM; Interpretation]
Figure 2. Spatial data preparation method for classical data mining

Materialization is composed of two sub-steps, spatial optimizer and spatial join. In this step all relevant spatial relations, defined in the selection step, are computed and converted into spatial predicates. Optimization functions and indexes that speed up the spatial neighborhood computation are defined. The optimization sub-step is not strictly necessary for spatial data preparation, but in real spatial databases it significantly reduces the time needed to compute spatial relations.

Transformation is the step in which all materialized relations are transposed into the single table format required by classical data mining toolkits. The resulting single table contains the target feature's non-spatial attributes and all materialized spatial relations with the relevant features.

5. Interoperable Software Architecture for Spatial Data Preparation

Spatial data can be stored in different databases and in different formats. In this section, we present an interoperable software architecture that implements the spatial data preparation steps for classical data mining and relational databases. The software architecture is based on the OGC (Open GIS Consortium) database schema and OGC spatial operations (Open GIS Consortium 1999), which makes it interoperable with any SDBMS built to these specifications.
The OGC is an organization which defines standards for Geographic Information Systems. Figure 3 shows the architecture design, which has three abstraction levels: data repository, data preparation and data mining. Data repository is the spatial database level, which can be any SDBMS implemented under OGC specifications. Data mining is the level which represents any classical data mining toolkit, and data preparation is the level which covers the gap between spatial databases and classical data mining toolkits.
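The data preparation level can be pictured as three small functions, one per step of Section 4.2. The sketch below is only an illustration under simplifying assumptions: features are in-memory points rather than rows of an OGC-compliant SDBMS, the coordinates and the distance threshold are invented, and the names `select`, `materialize`, `dominant` and `transform` are hypothetical, not part of the authors' toolkit. CLOSE is taken as dominant over FAR, as in the Table 1 example.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical data: patients (target feature) and antennas (relevant feature).
PATIENTS: List[Dict] = [
    {"gid": 1, "age": 34, "gender": "M", "disease": "I63", "shape": (0.0, 0.0)},
    {"gid": 2, "age": 60, "gender": "M", "disease": "F19.2", "shape": (5.0, 5.0)},
    {"gid": 3, "age": 45, "gender": "F", "disease": "C32", "shape": (20.0, 20.0)},
]
ANTENNAS: List[Point] = [(1.0, 1.0), (6.0, 5.0)]
THRESHOLD = 2.0  # assumed cut-off between CLOSE and FAR, in map units


def select(patients: List[Dict], min_age: int) -> List[Dict]:
    """Selection: keep only target rows that pass a non-spatial filter."""
    return [p for p in patients if p["age"] >= min_age]


def materialize(target: Point, relevant: List[Point], threshold: float) -> List[str]:
    """Materialization: one distance predicate per relevant feature instance."""
    return ["CLOSE" if math.dist(target, r) <= threshold else "FAR" for r in relevant]


def dominant(predicates: List[str]) -> str:
    """Feature type granularity: CLOSE wins if at least one instance is close."""
    return "CLOSE" if "CLOSE" in predicates else "FAR"


def transform(patients: List[Dict], antennas: List[Point], threshold: float,
              per_instance: bool = False) -> List[Dict]:
    """Transformation: build the single table, one row per target instance."""
    rows = []
    for patient in patients:
        predicates = materialize(patient["shape"], antennas, threshold)
        # Keep the non-spatial attributes; the geometry itself is dropped.
        row = {k: v for k, v in patient.items() if k != "shape"}
        if per_instance:
            for i, predicate in enumerate(predicates, start=1):
                row[f"antenna_{i}"] = predicate
        else:
            row["antenna"] = dominant(predicates)
        rows.append(row)
    return rows


TYPE_LEVEL = transform(PATIENTS, ANTENNAS, THRESHOLD)
INSTANCE_LEVEL = transform(PATIENTS, ANTENNAS, THRESHOLD, per_instance=True)
```

With `per_instance=True` the same call yields one column per antenna, which corresponds to the feature instance granularity of Table 2. In a real deployment the distance test would presumably be delegated to the spatial operations of the underlying SDBMS instead of being computed in application code.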
[Figure 3: architecture diagram with three levels. Data Mining: Weka, Intelligent Miner, ... other data mining tools. Data Preparation: Feature Selection, Materialization, Transformation. JDBC. Data Repository: PostGIS, Oracle, ... other OpenGIS-based SDBMS.]
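Because Weka appears at the data mining level, one plausible hand-off from the transformation step is an ARFF file, Weka's native text format. The choice of format is an assumption of this sketch and is not stated in the text; the function name `to_arff` and the sample rows are hypothetical, and all columns are treated as nominal for simplicity.

```python
from typing import Dict, List

# Hypothetical single table produced by the transformation step.
ROWS: List[Dict[str, str]] = [
    {"gender": "M", "disease": "I63", "antenna": "CLOSE"},
    {"gender": "M", "disease": "F19.2", "antenna": "CLOSE"},
    {"gender": "F", "disease": "C32", "antenna": "FAR"},
]


def to_arff(relation: str, rows: List[Dict[str, str]]) -> str:
    """Serialize a single table of nominal attributes as ARFF text."""
    columns = list(rows[0].keys())
    lines = ["@relation %s" % relation, ""]
    for column in columns:
        # A nominal attribute lists every value that occurs in the column.
        values = ",".join(sorted({row[column] for row in rows}))
        lines.append("@attribute %s {%s}" % (column, values))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[column] for column in columns))
    return "\n".join(lines) + "\n"


ARFF_TEXT = to_arff("patient", ROWS)
```

Values that contain spaces or commas would need quoting, which this sketch does not handle.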
6. Conclusion and Future Work

Although a large number of spatial data mining papers are available in the literature, knowledge discovery in real spatial databases is still an arduous task. Among the reasons are that only a few toolkits are available for spatial data mining and that their input must be in a restrictive format. This research presents a methodology that allows end users in any application domain to easily prepare large amounts of spatial data for classical data mining. A software architecture was also presented and implemented to provide a data preparation toolkit. Experiments were performed with the Weka data mining toolkit and real spatial databases stored in PostGIS; experiments with other databases and data mining toolkits will follow.

In future work, as a first task we will evaluate the materialization step. Dominance aspects of spatial relations will be studied in more detail; in our first experiments with the distance relation, we considered the predicate close dominant over far. The spatial predicates that represent the spatial relations relevant to the discovery process will be studied and detailed. The optimization step will also be evaluated. As a second task, we will apply and adjust the methodology to other real geographic databases, with new case studies, in order to evaluate how data preparation can improve the data mining results. For spatial association rules, for example, we will investigate how data preparation can reduce the number of insignificant rules. A third task is to extend the methodology and the software architecture to prepare spatial data for spatial data mining algorithms.

Acknowledgments

This research is partially supported by CAPES and CNPq.

References

Adriaans, P. and Zantinge, D. (1996) Data mining. Addison Wesley Longman, Harlow, England.

Appice, A., Ceci, M., Lanza, A., et al. (2003) Discovery of spatial association rules in geo-referenced census data: a relational mining approach. Intelligent Data Analysis (Software & Data) 7, 6.

Bogorny, V. and Alvares, L. O. (2005a) Geographic Data Representation for Knowledge Discovery. Technical Report UFRGS-TR-349, Federal University of Rio Grande do Sul, Porto Alegre, Brazil.

Bogorny, V., Engel, P. M. and Alvares, L. O. (2005b) A reuse-based spatial data preparation framework for data mining. To appear in: Proceedings of the 17th International Conference on Software Engineering and Knowledge Engineering (SEKE 2005) (July 14-16, 2005). Taiwan, China.

Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD 96) (August 2-4, 1996). AAAI Press, Portland, OR.

Ester, M., Frommelt, A., Kriegel, H.-P. and Sander, J. (2000) Spatial data mining: database primitives, algorithms and efficient DBMS support. Journal of Data Mining and Knowledge Discovery, 4, 2-3 (Jul. 2000).

Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996) Knowledge discovery and data mining: towards a unifying framework. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD 96) (August 2-4, 1996). AAAI Press, Portland, OR.

Han, J., Koperski, K. and Stefanovic, N. (1997) GeoMiner: a system prototype for spatial data mining. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (SIGMOD 97) (May 13-15, 1997). ACM Press, Tucson, AZ.

Lazarevic, A., Fiez, T. and Obradovic, Z. (2000) A software system for spatial data analysis and modeling. In Proceedings of the Hawaii International Conference on System Sciences (HICSS 00) (Jan. 2000). IEEE Computer Society Press, Hawaii.

Malerba, D., Appice, A. and Vacca, N. (2002) SDMOQL: an OQL-based data mining query language for map interpretation tasks. In Proceedings of the Workshop on Database Technologies for Data Mining (DTDM'02) (March 25-27, 2002). Springer, Prague, Czech Republic.

Malerba, D., Esposito, F., Lanza, A., Lisi, F. A. and Appice, A. (2003) Empowering a GIS with inductive learning capabilities: the case of INGENS. Journal of Computers, Environment, and Urban Systems, 27, 3 (May 2003).

Open GIS Consortium (1999) Open GIS simple features specification for SQL. In URL:

Sattler, K. and Schallehn, E. (2001) A data preparation framework based on a multi-database language. In Proceedings of the 5th International Database Engineering and Applications Symposium (IDEAS 01) (July 16-18, 2001). IEEE Computer Society, Grenoble, France.

Shekhar, S. and Chawla, S. (2003) Spatial databases: a tour. Prentice Hall, Upper Saddle River, NJ.

Witten, I. and Frank, E. (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA.
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
Introduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari [email protected]
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari [email protected] Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results
, pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department
Rule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
Scalable Cluster Analysis of Spatial Events
International Workshop on Visual Analytics (2012) K. Matkovic and G. Santucci (Editors) Scalable Cluster Analysis of Spatial Events I. Peca 1, G. Fuchs 1, K. Vrotsou 1,2, N. Andrienko 1 & G. Andrienko
Knowledge Discovery Process and Data Mining - Final remarks
Knowledge Discovery Process and Data Mining - Final remarks Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 14 SE Master Course 2008/2009
Fuzzy Spatial Data Warehouse: A Multidimensional Model
4 Fuzzy Spatial Data Warehouse: A Multidimensional Model Pérez David, Somodevilla María J. and Pineda Ivo H. Facultad de Ciencias de la Computación, BUAP, Mexico 1. Introduction A data warehouse is defined
ADVANCED DATA STRUCTURES FOR SURFACE STORAGE
1Department of Mathematics, Univerzitni Karel, Faculty 22, JANECKA1, 323 of 00, Applied Pilsen, Michal, Sciences, Czech KARA2 Republic University of West Bohemia, ADVANCED DATA STRUCTURES FOR SURFACE STORAGE
KNOWLEDGE GRID An Architecture for Distributed Knowledge Discovery
KNOWLEDGE GRID An Architecture for Distributed Knowledge Discovery Mario Cannataro 1 and Domenico Talia 2 1 ICAR-CNR 2 DEIS Via P. Bucci, Cubo 41-C University of Calabria 87036 Rende (CS) Via P. Bucci,
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;
MINING CLICKSTREAM-BASED DATA CUBES
MINING CLICKSTREAM-BASED DATA CUBES Ronnie Alves and Orlando Belo Departament of Informatics,School of Engineering, University of Minho Campus de Gualtar, 4710-057 Braga, Portugal Email: {alvesrco,obelo}@di.uminho.pt
IMPLEMENTING SPATIAL DATA WAREHOUSE HIERARCHIES IN OBJECT-RELATIONAL DBMSs
IMPLEMENTING SPATIAL DATA WAREHOUSE HIERARCHIES IN OBJECT-RELATIONAL DBMSs Elzbieta Malinowski and Esteban Zimányi Computer & Decision Engineering Department, Université Libre de Bruxelles 50 av.f.d.roosevelt,
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Data Mining and Soft Computing. Francisco Herrera
Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: [email protected] http://sci2s.ugr.es
Knowledge Discovery from Data Bases Proposal for a MAP-I UC
Knowledge Discovery from Data Bases Proposal for a MAP-I UC P. Brazdil 1, João Gama 1, P. Azevedo 2 1 Universidade do Porto; 2 Universidade do Minho; 1 Knowledge Discovery from Data Bases We are deluged
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D.
Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D. Most college courses in statistical analysis and data mining are focus on the mathematical techniques for analyzing data structures, rather
Knowledge Discovery in Data with FIT-Miner
Knowledge Discovery in Data with FIT-Miner Michal Šebek, Martin Hlosta and Jaroslav Zendulka Faculty of Information Technology, Brno University of Technology, Božetěchova 2, Brno {isebek,ihlosta,zendulka}@fit.vutbr.cz
SQL SUPPORTED SPATIAL ANALYSIS FOR WEB-GIS INTRODUCTION
SQL SUPPORTED SPATIAL ANALYSIS FOR WEB-GIS Jun Wang Jie Shan Geomatics Engineering School of Civil Engineering Purdue University 550 Stadium Mall Drive, West Lafayette, IN 47907 ABSTRACT Spatial analysis
A Comparative Study of clustering algorithms Using weka tools
A Comparative Study of clustering algorithms Using weka tools Bharat Chaudhari 1, Manan Parikh 2 1,2 MECSE, KITRC KALOL ABSTRACT Data clustering is a process of putting similar data into groups. A clustering
Data Mining and Business Intelligence CIT-6-DMB. http://blackboard.lsbu.ac.uk. Faculty of Business 2011/2012. Level 6
Data Mining and Business Intelligence CIT-6-DMB http://blackboard.lsbu.ac.uk Faculty of Business 2011/2012 Level 6 Table of Contents 1. Module Details... 3 2. Short Description... 3 3. Aims of the Module...
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS
USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).
A HOLISTIC FRAMEWORK FOR KNOWLEDGE MANAGEMENT
A HOLISTIC FRAMEWORK FOR KNOWLEDGE MANAGEMENT Dr. Shamsul Chowdhury, Roosevelt University, [email protected] ABSTRACT Knowledge management refers to the set of processes developed in an organization
A Web-based Interactive Data Visualization System for Outlier Subspace Analysis
A Web-based Interactive Data Visualization System for Outlier Subspace Analysis Dong Liu, Qigang Gao Computer Science Dalhousie University Halifax, NS, B3H 1W5 Canada [email protected] [email protected] Hai
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Mining Of Spatial Co-location Pattern from Spatial Datasets
Mining Of Spatial Co-location Pattern from Spatial Datasets G.Kiran Kumar P.Premchand T.Venu Gopal Associate Professor Professor Associate Professor Department of CSE Department of CSE Department of CSE
