Introduction. Introduction. Spatial Data Mining: Definition WHAT S THE DIFFERENCE?



Similar documents
CHAPTER-24 Mining Spatial Databases

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

The Art of Spatial Data Mining A Review of A New Algorithm for Discovery of Spatial Association Rules

Introduction to Data Mining

(b) How data mining is different from knowledge discovery in databases (KDD)? Explain.

DATA MINING - SELECTED TOPICS

Spatial Data Warehouse and Mining. Rajiv Gandhi

An Overview of Knowledge Discovery Database and Data mining Techniques

Fuzzy Spatial Data Warehouse: A Multidimensional Model

Environmental Remote Sensing GEOG 2021

Oracle8i Spatial: Experiences with Extensible Databases

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

A Spatial Decision Support System for Property Valuation

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Introduction. A. Bellaachia Page: 1

Protein Protein Interaction Networks

Classification and Prediction

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

DATA mining in general is the search for hidden patterns

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Spatial Data Mining Methods and Problems

Building Data Cubes and Mining Them. Jelena Jovanovic

International Journal of Advance Research in Computer Science and Management Studies

INTEGRATING GIS AND SPATIAL DATA MINING TECHNIQUE FOR TARGET MARKETING OF UNIVERSITY COURSES

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

Cluster Analysis: Advanced Concepts

Information Management course

DATA MINING TECHNIQUES AND APPLICATIONS

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Introduction to Data Mining

SPATIAL DATA CLASSIFICATION AND DATA MINING

A Survey on Association Rule Mining in Market Basket Analysis

Issues in Information Systems Volume 16, Issue IV, pp , 2015

Calculation of Minimum Distances. Minimum Distance to Means. Σi i = 1

Data Mining Fundamentals

A Survey of Classification Techniques in the Area of Big Data.

Mining Mobile Group Patterns: A Trajectory-Based Approach

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

Principles of Data Mining by Hand&Mannila&Smyth

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

Buffer Operations in GIS

Unsupervised Data Mining (Clustering)

Chapter 7. Cluster Analysis

Rule based Classification of BSE Stock Data with Data Mining

An Introduction to Cluster Analysis for Data Mining

Spatial Data Preparation for Knowledge Discovery

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April ISSN

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

Clustering Data Streams

Data Warehousing und Data Mining

Neural Networks Lesson 5 - Cluster Analysis

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Week 3 lecture slides

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Quality Assessment in Spatial Clustering of Data Mining

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

New Approach of Computing Data Cubes in Data Warehousing

International Journal of Advance Research in Computer Science and Management Studies

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

ICT Perspectives on Big Data: Well Sorted Materials

The STC for Event Analysis: Scalability Issues

Interactive Data Mining and Visualization

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Topics in basic DBMS course

Jordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010

Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals

Linköpings Universitet - ITN TNM DBSCAN. A Density-Based Spatial Clustering of Application with Noise

Cluster Analysis: Basic Concepts and Methods

How To Cluster

Keywords: Mobility Prediction, Location Prediction, Data Mining etc

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

College information system research based on data mining

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

Transcription:

Introduction Spatial Data Mining: Progress and Challenges Survey Paper Krzysztof Koperski, Junas Adhikary, and Jiawei Han (1996) Review by Brad Danielson CMPUT 695 01/11/2007 Authors objectives: Describe and critique existing spatial data mining methods Give readers a general perspective of the field s current state Make suggestions for future directions and growth potential of spatial data mining Introduction Spatial Data Mining: Definition My objectives: Summarize the paper s description of the state of spatial data mining in 1996. Examine the predictions for future directions made by these authors. Briefly examine the accuracy of these predictions by doing a topic search on spatial data mining research from 1997 to 2007. Spatial data mining, or knowledge discovery in spatial database, refers to the extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases. (Koperski and Han, 1995) Data mining, or knowledge discovery in databases, refers to the discovery of interesting, implicit, and previously unknown knowledge from large databases. (Frawley et al, 1992) WHAT S THE DIFFERENCE?

What is spatial data and why is mining it different that mining normal data? Spatial Data? Object Where What DATA = The (WHAT) dimension. an attribute of an object. Milk Milk In fridge (X 1,Y 1 ) On table (X 2,Y 2 ) Is cold Is warm SPATIAL DATA = (WHERE) & (WHAT) Attribute data referenced to a specific location. The Attributes of spatial objects are: - Highly dependant on location - Often influenced by neighboring objects SPATIAL DATA MINING: THE FRIDGE IS BROKEN! Object Where What Milk In fridge (X 1,Y 1 ) Is warm Yogurt In fridge (X 1,Y 1 ) Is warm Butter In fridge (X 1,Y 1 ) Is warm Object() : Location() -> Characteristic() (Milk) : (In Fridge) -> [should be] -> (Cold) Why do Spatial Data Mining? To understand spatial data To discover relationships between spatial and non spatial data To capture the general characteristics in a concise way To build spatial knowledge-bases Critical Challenge in Spatial Data Mining SD mining algorithms must efficiently overcome: The huge volume of spatial data The complexity of spatial data types/structures The complexity of spatial accessing/query methods Expensive spatial processing operations HUGE S-DB Object = Where is: Citizen{Brad} GO! www.jupiterimages.com Which highways cross Nat n Park boundaries? = spatial JOIN

Spatial Data Mining Methods Generalization-Based Knowledge Discovery 1. Generalization Based Knowledge Discovery 2. Clustering Methods 3. Aggregate Proximity Measuring 4. Spatial Association Rules Requires background knowledge of dataset, presented as concept hierarchies Developing these hierarchies conceptually similar to Hierarchical Clustering Hierarchies are either based on spatial attributes or non-spatial attributes. Queries about data can then be made at levels of generalization in the hierarchy. Concept Hierarchies Non-spatial attribute hierarchy: Spatial attribute hierarchy: Agricultural land use Agricultural Land Size Divisions Dominion Land Survey Level of generalization Township (36 mile sq.) Section (1 mile sq.) Quarter (.25 mile sq.) Queries can be made at different levels of Generalization Spatial vs Non-spatial Dominant Generalization Spatial objects (individual regions) are generalized (merged) until the desired zone size is reached. Non-spatial data falling into these zones is then generalized and reported on Discrete precipitation values generalized into 7 classes. Spatial data were classified based on their fit to these precipitation attribute classes. Goal: produce high-level descriptions of the data. Thematic maps are effective

Clustering Methods Goal: like Generalization, to reveal relationships between spatial and non-spatial attributes Techniques used are based on some clustering methods we examined in class: PAM (k medoids clustering) CLARA (k-medoids, where medoids are chosen from a sample of a large DB) CLARANS (mixture of PAM and CLARA, where new k-medoids are tested from new random samples) 2 spatial data mining variations of CLARANS: SD(CLARANS): spatial dominant approach NSD(CLARANS): non-spatial dominant approach SD(CLARANS) 1. The spatial components of objects in the dataset are collected and clustered using CLARANS 2. Non-spatial description of the objects are brought into the resulting clusters. 3. Result: each cluster (defined by spatial boundaries) is described by it s relative abundance of non-spatial attributes. NSD(CLARANS) 1. Non-spatial attribute generalization produces k generalized attribute groups 2. The spatial components are clustered using CLARANS to find k clusters. 3. If these spatial clusters overlap, they may be merged, and their attribute descriptions merged as well. 4. Result: each cluster is described by a single attribute description SD(CLARANS): Example Spatial Cluster: Downtown Edmonton Attributes: (Building types) 50% Commercial, 40% Residential, 10% Public Service

NSD(CLARANS): Example Clustering Methods: Improvements Attribute Cluster (Building types): Mostly Industrial Spatial Cluster: Region East of 50St & South of HW#16 & North of Whitemud & CLARANS is inefficient at calculating total distance between clusterings. Integrate with more efficient spatial access methods developed for Spatial Databases Use cluster focusing to increase efficiency in finding locally optimized clusters: Information about sub-clusters is summarized in tree structures Sub-clusters (or nodes in the sub-cluster tree) can be tested when selecting a new medoid (center) during the recursive process of optimizing clusters (mixture of Hierarchical Clustering and Partition Clustering) Aggregate Proximity Measuring Clustering methods are effective at finding where groups of data are in a spatial DB BUT it s often more interesting to know WHY the clusters are there. Aggregate Proximity Measuring Calculates the distance between a feature and the set of points in a cluster. This may reveal how outside features influence the cluster. Cluster centered at X 1,Y 1 Cluster centered at X 2,Y 2 45% of the objects in Clusters & are close to feature{river} Cluster centered at X 1,Y 1 Cluster centered at X 2,Y 2 45% of the objects in Clusters & are close to feature{river}

Aggregate Proximity Measuring How? Mining Spatial Association Rules CRH Algorithm (Circle, Rectangle, convex Hull) Input: any number of local map features AND one cluster Uses filters of various geometries to find the characteristics of a cluster wrt nearby features The C and R filters prune candidate features, and only promising features are sent to the H filter. The H filter builds more accurate buffers around the selected features. Aggregate proximity is calculated between the points in the cluster and the buffers around the features of interest. Output: list of features with smallest aggregate proximity to points in the cluster, and percentages of points located within a defined distance from features. Generalization and Clustering characterize spatial objects based on their non-spatial attributes. Spatial Association Rules are required to associate spatial objects with other spatial objects. Examples from paper: is_a(x,school) -> close_to(x, park)(80%) Reads: 80% of schools are close to parks. Suggested predicates for spatial association rules: topological relations: intersects, overlap, disjoint spatial orientations: left_of, west_of, etc. distance information: close_to, far_from, etc min support and min confidence are required to filter out infrequent and weak rules. Multi-level Mining of Spatial Association Rules Query: describe a set of objects using relations to other objects. is_a(x,school) -> adjacent_to(x, park) High Level mining: Use coarse spatial predicates such as g_close_to (generalized close to) Spatial objects satisfy g_close_to if the distance between their Minimum Bounding Rectangles is less than a threshold. Low Level mining: More detailed, accurate predicates are used to build rules with objects which pass through from High Level. Apriori rational: if a pattern is not large at High Level, it will not be large at Low Level This minimizes expensive spatial computations Predictions, circa 1996 The variety of yet unexplored topics and problems makes knowledge discovery in spatial databases an attractive and challenging research field. Identified Future Directions for spatial data mining: Data Mining in Spatial Object Oriented DB Alternative Clustering Techniques: Clustering overlapping objects, Fuzzy Clustering of Spatial Data Mining under uncertainty: Evidential reasoning, Fuzzy sets approaches Spatial Data Deviation and Evolution Rules: Rule application to data that changes over time Interleaved Generalization (spatial and non-spatial) Generalization of Temporal Spatial Data (data evolution) Parallel Data Mining (multi-processor systems) Spatial Data Mining Query Language Multidimensional Rule Visualization and Multiple Thematic Maps

Prediction Accuracy, 10 years later Conclusion Topic DM in Spatial Obj-Oriented DB Alternative / Fuzzy Spatial Clustering Mining under uncertainty Deviation / Evolution Rules Interleaved Generalization Generalization of Temporal Spatial Data Parallel Data Mining Spatial Data Mining Query Language Visualization Topics Currently Active Research Field Yes Yes Yes esp. Robo nav Yes mixed topics Vague Yes merged Yes: GeoMiner, GMQL, Spatial SQL Yes esp. GIS References 1996, 1991, 1994, 1996, 1997 (h&k), Data Mining / Knowledge Discovery of Spatial Data is a large, active research area. While it was a young field at the time this survey paper was written, it is quickly maturing in applications such as: Geographic Information Systems Medical Imaging Robotics Navigation References Krzysztof Koperski, Junas Adhikary, Jiawei Han. Spatial Data Mining: Progress and Challenges Survey Paper. Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996 K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. In Proc. 4th Int'l Symp. on Large Spatial Databases (SSD'95), pages 47--66, Portland, Maine, Aug. 1995. W. J. Frawley, G. Piatetsky-Shapiro, and C. J. Matheus. Knowledge discovery in databases: An overview. In G. Piatetsky-Shapiro and W. J. Frawley, editors, Knowledge Discovery in Databases, pages 1--27. Roddick, Hornsby, and Spiliopoulou. An Updated Bibliography of Temporal, Spatial, and Spatio-temporal Data Mining Research, TSDM2000, LNAI 2007, pp. 147 163, 2001.c Springer-Verlag Berlin Heidelberg 2001