Introduction to Spatial Data Mining

Similar documents
Introduction to Data Mining

Introduction. A. Bellaachia Page: 1

Information Management course

Classification and Prediction

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

SPATIAL DATA CLASSIFICATION AND DATA MINING

Introduction to Data Mining

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Professor Anita Wasilewska. Classification Lecture Notes

Geointelligence New Opportunities and Research Challenges in Spatial Mining and Business Intelligence

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Data Mining: Foundation, Techniques and Applications

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

not possible or was possible at a high cost for collecting the data.

Data Mining for Knowledge Management. Classification

A Review of Data Mining Techniques

OUTLIER ANALYSIS. Data Mining 1

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

An Overview of Knowledge Discovery Database and Data mining Techniques

Foundations of Artificial Intelligence. Introduction to Data Mining

Data Warehousing and Data Mining. A.A Datawarehousing & Datamining 1

Data Visualization Techniques and Practices Introduction to GIS Technology

Data Mining System, Functionalities and Applications: A Radical Review

Data Warehousing and Data Mining

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining Solutions for the Business Environment

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: X DATA MINING TECHNIQUES AND STOCK MARKET

EXPLORING SPATIAL PATTERNS IN YOUR DATA

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Data Mining Introduction

Data Mining: An Introduction

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1

Introduction to Data Mining

The Scientific Data Mining Process

Database Marketing, Business Intelligence and Knowledge Discovery

Introduction. Introduction. Spatial Data Mining: Definition WHAT S THE DIFFERENCE?

Spatial Data Analysis

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Social Media Mining. Data Mining Essentials

Data Warehousing and Data Mining

DATA MINING: AN OVERVIEW

DATA MINING TECHNIQUES AND APPLICATIONS

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

An Overview of Database management System, Data warehousing and Data Mining

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

Data Mining for Fun and Profit

CHAPTER-24 Mining Spatial Databases

Data Mining Applications in Higher Education

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

GIS: Geographic Information Systems A short introduction

Climate Change: A Local Focus on a Global Issue Newfoundland and Labrador Curriculum Links

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining Part 5. Prediction

Review on Financial Forecasting using Neural Network and Data Mining Technique

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

Location Analytics for Financial Services. An Esri White Paper October 2013

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

Customer Classification And Prediction Based On Data Mining Technique

2.1. Data Mining for Biomedical and DNA data analysis

Data, Measurements, Features

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS

Classification Techniques (1)

Essential Components of an Integrated Data Mining Tool for the Oil & Gas Industry, With an Example Application in the DJ Basin.

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Artificial Neural Network and Non-Linear Regression: A Comparative Study

An Introduction to Data Mining

Chapter 20: Data Analysis

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

Predictive Dynamix Inc

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining. Introduction to Modern Information Retrieval from Databases and the Web. Administrivia

Data Mining for Successful Healthcare Organizations

Data Mining for Business Analytics

Data Mining: Overview. What is Data Mining?

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

Knowledge Discovery from patents using KMX Text Analytics

INTRODUCTION TO DATA MINING SAS ENTERPRISE MINER

8. Machine Learning Applied Artificial Intelligence

Pentaho Data Mining Last Modified on January 22, 2007

Introduction to Artificial Intelligence G51IAI. An Introduction to Data Mining

Data mining and official statistics

Transcription:

Introduction to Spatial Data Mining 7.1 Pattern Discovery 7.2 Motivation 7.3 Classification Techniques 7.4 Association Rule Discovery Techniques 7.5 Clustering 7.6 Outlier Detection

Introduction: a classic example for spatial analysis Disease cluster Dr. John Snow Deaths of cholera epidemia London, September 1854 Infected water pump? A good representation is the key to solving a problem 2

Good representation because... Represents spatial relation of objects of the same type Represents spatial relation of objects to other objects Shows only relevant aspects and hides irrelevant It is not only important where a cluster is but also, what else is there (e.g. a water-pump)! 3

Other examples of Spatial Patterns Historic Examples (section 7.1.5, pp. 186) Fluoride and healthy gums near Colorado river Theory of Gondwanaland - continents fit like pieces of a jigsaw puzlle Modern Examples Cancer clusters to investigate environment health hazards Crime hotspots for planning police patrol routes Bald eagles nest on tall trees near open water Nile virus spreading from north east USA to south and west Unusual warming of Pacific ocean (El Nino) affects weather in USA 4

Goals of Spatial Data Mining Identifying spatial patterns Identifying spatial objects that are potential generators of patterns Identifying information relevant for explaining the spatial pattern (and hiding irrelevant information) Presenting the information in a way that is intuitive and supports further analysis 5

What is a Spatial Pattern? What is not a pattern? Random, haphazard, chance, stray, accidental, unexpected Without definite direction, trend, rule, method, design, aim, purpose Accidental - without design, outside regular course of things Casual - absence of pre-arrangement, relatively unimportant Fortuitous - What occurs without known cause What is a Pattern? A frequent arrangement, configuration, composition, regularity A rule, law, method, design, description A major direction, trend, prediction A significant surface irregularity or unevenness 6

What is Spatial Data Mining? Metaphors Mining nuggets of information embedded in large databases Nuggets = interesting, useful, unexpected spatial patterns Mining = looking for nuggets Needle in a haystack Defining Spatial Data Mining Search for spatial patterns Non-trivial search - as automated as possible reduce human effort Interesting, useful and unexpected spatial pattern 7

What is Spatial Data Mining? - 2 Non-trivial search for interesting and unexpected spatial pattern Non-trivial Search Large (e.g. exponential) search space of plausible hypothesis Ex. Asiatic cholera : causes: water, food, air, insects, ; water delivery mechanisms - numerous pumps, rivers, ponds, wells, pipes,... Interesting Useful in certain application domain Ex. Shutting off identified Water pump => saved human life Unexpected Pattern is not common knowledge May provide a new understanding of world Ex. Water pump - Cholera connection lead to the germ theory 8

What is NOT Spatial Data Mining? Simple Querying of Spatial Data Find neighbors of Canada given names and boundaries of all countries Find shortest path from Boston to Houston in a freeway map Search space is not large (not exponential) Testing a hypothesis via a primary data analysis Ex. Female chimpanzee territories are smaller than male territories Search space is not large! SDM: secondary data analysis to generate multiple plausible hypotheses Uninteresting or obvious patterns in spatial data Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, Given that the two cities are 10 miles apart. Common knowledge: Nearby places have similar rainfall Mining of non-spatial data Diaper sales and beer sales are correlated in evenings GPS product buyers are of 3 kinds: outdoors enthusiasts, farmers, technology enthusiasts 9

Why Learn about Spatial Data Mining? Two basic reasons for new work Consideration of use in certain application domains Provide fundamental new understanding Application domains Scale up secondary spatial (statistical) analysis to very large datasets Describe/explain locations of human settlements in last 5000 years Find cancer clusters to locate hazardous environments Prepare land-use maps from satellite imagery Predict habitat suitable for endangered species Find new spatial patterns Find groups of co-located geographic features 10

Why Learn about Spatial Data Mining? - 2 New understanding of geographic processes for Critical questions Ex. How is the health of planet Earth? Ex. Characterize effects of human activity on environment and ecology Ex. Predict effect of El Nino on weather, and economy Traditional approach: manually generate and test hypothesis But, spatial data is growing too fast to analyze manually Satellite imagery, GPS tracks, sensors on highways, Number of possible geographic hypothesis too large to explore manually Large number of geographic features and locations Number of interacting subsets of features grow exponentially Ex. Find tele connections between weather events across ocean and land areas SDM may reduce the set of plausible hypothesis Identify hypothesis supported by the data For further exploration using traditional statistical methods 11

Interactive Exploratory Analysis Choropleth maps showing distribution of variable(s) in space Parallel Coordinate Plot Combining spatial and non-spatial displays Variables selected and manipulated by the user Powerful for lowdimensional dependencies (3-4) Displays dynamically linked Scatter Plot 12

Data Mining: A KDD Process Selection: Obtain data from various sources. Preprocessing: Cleanse data. Transformation: Convert to common format. Transform to new format. Data Mining: Obtain desired results. Interpretation/Evaluation: Present results to user in meaningful manner 13

Data Mining: Confluence of Multiple Disciplines Database Systems Statistics Machine Learning Data Mining Visualization Information Theory Algorithms,, other Disciplines 14

Primary Data Mining Tasks Descriptive Modeling Finding a compact description for large dataset Clustering: group objects into groups based on their attributes Association rules: correlate what events are likely to occur together Sequential rules: correlate events ordered in time Trend detection: discovering the most significant changes Predictive Modeling Classification: assign objects into groups by recognizing patterns Regression: forecasting what may happen in the future by mapping a data item to a predicting real-value variable 15

What is Cluster Analysis? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups Intra-cluster distances are minimized Inter-cluster distances are maximized 16

Clustering Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Clustering Grouping a set of data objects into clusters based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity Example Land use: Identification of areas of similar land use in an earth observation database City-planning: Identifying groups of houses according to their house type, value, and geographical location 17

Association rule Association (correlation and causality) age(x, 20..29 ) ^ income(x, 20..29K ) buys(x, PC ) [support = 2%, confidence = 60%] Association rule mining Finding frequent patterns, associations, correlations among sets of items or objects in transaction databases, relational databases, and other information repositories Frequent pattern: pattern (set of items, sequence, etc.) that occurs frequently in a database Motivation: finding regularities in data What products were often purchased together? 18

Example: Association rule Transaction-id Items bought Itemset A1,A2={a 1,, a k } 10 20 30 40 a1,a2, a3 a1, a3 a1, a4 a2, a5, a6 Find all the rules A1 A2 with min confidence and support support, s, probability that a transaction contains A1 A2 confidence, c, conditional probability that a transaction Let min_support = 50%, having A1 also contains A2. min_conf = 50%: a1 a3 (50%, 66.7%) a3 a1 (50%, 100%) 19

Deviation Detection Outlier: a data object that does not comply with the general behavior of the data It can be considered as noise or exception but is quite useful in fraud detection, rare events analysis Trend and evolution analysis Trend and deviation: regression analysis Periodicity analysis Similarity-based analysis 20

Classification and Regression Classification: constructs a model (classifier) based on the training set and uses it in classifying new data Example: Climate Classification, Regression: models continuous-valued functions, i.e., predicts unknown or missing values Example: stock trends prediction, 21

Classification (1): Model Construction Classification Algorithms Training Data NAME RANK YEARS TENURED Mike Assistant Prof 3 no Mary Assistant Prof 7 yes Bill Professor 2 yes Jim Associate Prof 7 yes Dave Assistant Prof 6 no Anne Associate Prof 3 no Classifier (Model) IF rank = professor OR years > 6 THEN tenured = yes 22

Classification (2): Prediction Using the Model Classifier Testing Data Unseen Data (Jeff, Professor, 4) NAME RANK YEARS TENURED Tom Assistant Prof 2 no Merlisa Associate Prof 7 no George Professor 5 yes Joseph Assistant Prof 7 yes Tenured? 23

Classification Techniques Decision Tree Induction Bayesian Classification Neural Networks Genetic Algorithms Fuzzy Set and Logic 24

Regression Regression is similar to classification First, construct a model Second, use model to predict unknown value Methods Linear and multiple regression Non-linear regression Regression is different from classification Classification refers to predict categorical class label Regression models continuous-valued functions 25

Spatial Data Mining Spatial Patterns Hotspots, Clustering, trends, Spatial outliers Location prediction Associations, co-locations Primary Tasks Spatial Data Clustering Analysis Spatial Outlier Analysis Mining Spatial Association Rules Spatial Classification and Prediction Example: Unusual warming of Pacific ocean (El Nino) affects weather in USA 26

Spatial Data Mining Spatial data mining follows along the same functions in data mining, with the end objective to find patterns in geography, meteorology, etc. The main difference: spatial autocorrelation the neighbors of a spatial object may have an influence on it and therefore have to be considered as well Spatial attributes Topological adjacency or inclusion information Geometric position (longitude/latitude), area, perimeter, boundary polygon 27

Example What Kind of Houses Are Highly Valued? Associative Classification 28

Example: Location Prediction Question addressed Where will a phenomenon occur? Which spatial events are predictable? How can a spatial events be predicted from other spatial events? Equations, rules, other methods, Examples: Where will an endangered bird nest? Which areas are prone to fire given maps of vegetation, draught, etc.? What should be recommended to a traveler in a given location? 29

Example: Spatial Interactions Question addressed Which spatial events are related to each other? Which spatial phenomena depend on other phenomenon? Examples: Exercise: List two interaction patterns. 30

Example: Hot spots Question addressed Is a phenomenon spatially clustered? Which spatial entities or clusters are unusual? Which spatial entities share common characteristics? Examples: Cancer clusters [CDC] to launch investigations Crime hot spots to plan police patrols Defining unusual Comparison group: neighborhood entire population Significance: probability of being unusual is high 31