Spatial Data Mining for Customer Segmentation



Similar documents
Geointelligence New Opportunities and Research Challenges in Spatial Mining and Business Intelligence

Introduction to Spatial Data Mining

Oracle8i Spatial: Experiences with Extensible Databases

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Visualization Techniques and Practices Introduction to GIS Technology

Spatial Data Preparation for Knowledge Discovery

Spatial Data Preparation for Knowledge Discovery

Spatial Data Analysis

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

1 File Processing Systems

from Larson Text By Susan Miertschin

CHAPTER-24 Mining Spatial Databases

An Overview of Database management System, Data warehousing and Data Mining

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Weka-GDPM Integrating Classical Data Mining Toolkit to Geographic Information Systems

GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING

Fluency With Information Technology CSE100/IMT100

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Exploratory Data Analysis for Ecological Modelling and Decision Support

Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D.

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Basics on Geodatabases

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

USING SPATIAL DATA MINING TO DISCOVER THE HIDDEN RULES IN THE CRIME DATA

Model Deployment. Dr. Saed Sayad. University of Toronto

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

CSE 132A. Database Systems Principles

SPATIAL DATA CLASSIFICATION AND DATA MINING

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Information Management

Spatial Data Mining: From Theory to Practice with Free Software

life science data mining

Mining Model Trees from Spatial Data

Data Mining Solutions for the Business Environment

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Knowledge-Based Visualization to Support Spatial Data Mining

Explorable Visual Analytics (EVA) Interactive Exploration of LEHD. Saman Amraii - Amir Yahyavi Carnegie Mellon University

Oracle Database 10g: Building GIS Applications Using the Oracle Spatial Network Data Model. An Oracle Technical White Paper May 2005

1. Visual Paradigm for UML

Easily add Maps and Geo Analytics in MicroStrategy

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

OBIEE 11g Analytics Using EMC Greenplum Database

ADHAWK WORKS ADVERTISING ANALTICS ON A DASHBOARD

Oracle BI and Geo-Spatial Big Data

Oracle Advanced Analytics 12c & SQLDEV/Oracle Data Miner 4.0 New Features

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Mining Text Data: An Introduction

Distance Learning and Examining Systems

Utilizing spatial information systems for non-spatial-data analysis

Introduction to Data Mining

Introduction to the ArcGIS Data Model and Application Structure

Geodatabase Programming with SQL

HELSINKI UNIVERSITY OF TECHNOLOGY T Enterprise Systems Integration, Data warehousing and Data mining: an Introduction

Introduction to PostGIS

CUSTOMER Presentation of SAP Predictive Analytics

Introduction. Introduction. Spatial Data Mining: Definition WHAT S THE DIFFERENCE?

Practical applications of Predictive Modelling Overview of the process, the techniques and the applications

Business Intelligence and Process Modelling

Chapter 6 - Enhancing Business Intelligence Using Information Systems

Introduction. A. Bellaachia Page: 1

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC Politecnico di Milano)

Tutorial for proteome data analysis using the Perseus software platform

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Data Mining as Part of Knowledge Discovery in Databases (KDD)

High Performance Spatial Queries and Analytics for Spatial Big Data. Fusheng Wang. Department of Biomedical Informatics Emory University

Data W a Ware r house house and and OLAP II Week 6 1

Contents WEKA Microsoft SQL Database

Title. Introduction to Data Mining. Dr Arulsivanathan Naidoo Statistics South Africa. OECD Conference Cape Town 8-10 December 2010.

Demystified CONTENTS Acknowledgments xvii Introduction xix CHAPTER 1 Database Fundamentals CHAPTER 2 Exploring Relational Database Components

Building Cubes and Analyzing Data using Oracle OLAP 11g

EXPLORING SPATIAL PATTERNS IN YOUR DATA

The Scientific Data Mining Process

How graph databases started the multi-model revolution

DATA MINING - SELECTED TOPICS

The basic data mining algorithms introduced may be enhanced in a number of ways.

10. Creating and Maintaining Geographic Databases. Learning objectives. Keywords and concepts. Overview. Definitions

Introduction to Object-Oriented and Object-Relational Database Systems

Oracle Platform GIS & Location-Based Services. Fred Louis Solution Architect Ohio Valley

TOWARD A DISTRIBUTED DATA MINING SYSTEM FOR TOURISM INDUSTRY

Pentaho Data Mining Last Modified on January 22, 2007

SQL SUPPORTED SPATIAL ANALYSIS FOR WEB-GIS INTRODUCTION

Spatial data models (types) Not taught yet

Contents RELATIONAL DATABASES

MetroBoston DataCommon Training

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

ArcGIS 10.1 Geodatabase Administration. Gordon Sumerling & Christopher Brown

Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

A quick overview of geographic information systems (GIS) Uwe Deichmann, DECRG

How To Create A Data Visualization

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

Crime Mapping Methods. Assigning Spatial Locations to Events (Address Matching or Geocoding)

NNMi120 Network Node Manager i Software 9.x Essentials

Reporting. Understanding Advanced Reporting Features for Managers

Transcription:

Spatial Data Mining for Customer Segmentation Data Mining in Practice Seminar, Dortmund, 2003 Dr. Michael May Fraunhofer Institut Autonome Intelligente Systeme Spatial Data Mining, Michael May, Fraunhofer AIS 1

Introduction: a classic example for spatial analysis Disease cluster Dr. John Snow Deaths of cholera epidemia London, September 1854 Infected water pump? A good representation is the key to solving a problem Spatial Data Mining, Michael May, Fraunhofer AIS 2

Good representation because... Represents spatial relation of objects of the same type Represents spatial relation of objects to other objects Shows only relevant aspects and hides irrelevant It is not only important where a cluster is but also, what else is there (e.g. a water-pump)! Spatial Data Mining, Michael May, Fraunhofer AIS 3

Goals of Spatial Data Mining Identifying spatial patterns Identifying spatial objects that are potential generators of patterns Identifying information relevant for explaining the spatial pattern (and hiding irrelevant information) Presenting the information in a way that is intuitive and supports further analysis Spatial Data Mining, Michael May, Fraunhofer AIS 4

Approach to Spatial Knowledge Discovery Data Mining p n 0 (1 p0 ) ( p p ) 0 + Geographic Information Systems = SPIN! Spatial Data Mining, Michael May, Fraunhofer AIS 5

UK, Greater Manchester, Stockport Streets Buildings Rivers Hospitals Person p. Household No. of Cars Long-term illness Age Profession Ethnic group Unemployment Education Migrants Medical establishment Shopping areas... Spatial Data Mining, Michael May, Fraunhofer AIS 6

Representation of spatial data in Oracle Spatial A set of relations R 1,...,R n such that each relation R i has a geometry attribute G i or an identifier A i such that R i can be linked (joined) to a relation R k having a geometry attribute G k Geometry attributes G i consist of ordered sets of x,y-pairs defining points, lines, or polygons Different types of spatial objects are organized in different relations R i (geographic layers), e.g. streets, rivers, enumeration districts, buidlings, and each layer can have its own set of attributes A 1,..., A n and at most one geometry attribute G Spatial Data Mining, Michael May, Fraunhofer AIS 7

Stockport Database Schema River Geographical Layers 85 tables Water Building spatially interacts spatially interacts Shopping Region inside Street spatially interacts spatially interact ED Vegetation =zone_id spatially interact =zone_id =zone_id TAB01 TAB61 TAB95 Spatial Data Mining, Michael May, Fraunhofer AIS 8...... Attribute data 95 tables with census data, ~8000 attributes Spatial Hierarchy County District Wards Enumeration district

Spatial Predicates in Oracle Spatial Topological relation (Egenhofer 1991) A disjoint B, B disjoint A A meets B, B meets A A overlaps B, B overlaps A A equals B, B equals A A covers B, B covered by A A covered-by B, B covers A A contains B, B inside A A inside B, B contains A Distance relation: Minimum distance between 2 points Spatial Data Mining, Michael May, Fraunhofer AIS 9

Typical Data Mining representation spreadsheet data exactly 1 table atomic values Data Mining for spatial data: strong discrepancy between usual and adequate problem representation Spatial Data Mining, Michael May, Fraunhofer AIS 10

SPIN! The Elements p n 0 (1 p0 ) ( p p ) 0 Spatial Data Mining, Michael May, Fraunhofer AIS 11

1. Spatial Data Mining Platform Spatial Data Mining, Michael May, Fraunhofer AIS 12

Providing an integrated data mining platform Data access to heterogeneous and distributed data sources (Oracle RDBMS, flat file, spatial data) Organizing and documenting analysis tasks Launching analysis tasks Visualizing results Note: Same software basis as MiningMart! Spatial Data Mining, Michael May, Fraunhofer AIS 13

SPIN! Architecture: Enterprise Java Bean-based Java Swing based Client Client Visual Component Workspace Algorithm Component JDBC (Connections) RMI/IIOP (References) JBoss application server Data Algorithm Session Bean Client Entity Bean Workspace Entity Bean Persistent object Database Enterprise Java Bean Container Database Object-relational spatial database (Oracle9i) Spatial Data Mining, Michael May, Fraunhofer AIS 14

SPIN! User Interface Point & Click- Tool for defining analysis tasks Workspace Tree Property editor Spatial Data Mining, Michael May, Fraunhofer AIS 15

2. Visual Exploratory Analyis Spatial Data Mining, Michael May, Fraunhofer AIS 16

Interactive Exploratory Analysis Choropleth maps showing distribution of variable(s) in space Parallel Coordinate Plot Combining spatial and non-spatial displays Variables selected and manipulated by the user Powerful for lowdimensional dependencies (3-4) Displays dynamically linked Scatter Plot Spatial Data Mining, Michael May, Fraunhofer AIS 17

3. Searching for Explanatory Patterns Spatial Data Mining, Michael May, Fraunhofer AIS 18

Data Mining Tasks in SPIN! Looking for associations between subsets of spatial and non-spatial attributes Spatial Association Rules A phenomenon of interest (e.g. death rate) is given but it is not clear which of a large number of spatial and non-spatial attributes is relevant for explaining it Spatial Subgroup Discovery A quantitative variable of interest is given and we ask how much this variable changes when one of the relevant independent variables is changed Bayesian Local regression Spatial Data Mining, Michael May, Fraunhofer AIS 19

Subgroup Discovery Search Subgroup discovery is a multi-relational approachthat searches for probabilistically defined deviation patterns (Klösgen 1996, Wrobel 1997) Top-down search search from most general to most specific subgroups, exploiting partial ordering of subgroups (S 1 S 2 S 1 more general than S 2) Beam search expanding only the n best ones at each level of search Evaluating hypothesis according to quality function: T= target group C= concept p( T C) p( T )(1 p( T) p( T)) n N N n T = long-term illness=high C = unemployment=high Spatial Data Mining, Michael May, Fraunhofer AIS 20

Division of labour between Oracle RDBMS and Search Manager mining query Database Server Search Algorithm Database integration: efficiently organize mining queries sufficient statistics Mining query delivers statistics (aggregations) sufficient for evaluating many hypotheses Mining Server search in hypothesis space generation and evaluation of hypotheses (subgroup patterns) Spatial Data Mining, Michael May, Fraunhofer AIS 21

Data Mining visualization High long-term illness in districts crossed by M60 p(t C) vs. p(c) Subgroup Overview Spatial Venn Diagram Subgroup Linked Display Spatial Data Mining, Michael May, Fraunhofer AIS 22

Customer Analysis Rodgau, Germany Spatial Data Mining, Michael May, Fraunhofer AIS 23

System Demo: Customer Analysis using MiningMart and SPIN! Spatial Data Mining, Michael May, Fraunhofer AIS 24

Summary & Outlook SPIN! tightly integrates DataMining analysis and GIS-based visualization Main features: A spatial data mining platform New spatial data mining algortihms for subgroup discovery, association rules, Baysian MCMC New visualization methods Integration of Spatial Dataallows to get results that could not be achieved otherwise MiningMart can usefully applied for some pre-processing tasks Future tasks: Integrating spatial preprocessing in MiningMart Spatial Data Mining, Michael May, Fraunhofer AIS 25