Geointelligence New Opportunities and Research Challenges in Spatial Mining and Business Intelligence




Geointelligence: New Opportunities and Research Challenges in Spatial Mining and Business Intelligence
Stefan Wrobel, Christine Körner, Michael May, Hans Voss

Fraunhofer Society
- Joseph von Fraunhofer, German physicist and entrepreneur
- Fraunhofer mission:
  - do state-of-the-art research and use it in challenging customer projects
  - funding is 33% research grants, 33% customer projects, 33% institutional funding
- 57 institutes, 40 locations, 12,000 employees, 1 billion annual volume
- Best-known invention: MP3

Fraunhofer IAIS: Intelligent Analysis and Information Systems
- From sensor data to business intelligence, from media analysis to visual information systems: our research allows companies to do more with data
- New name, long-standing experience
  - Founded in 2006 as a merger of the Fraunhofer institutes AIS and IMK
- 230 people: scientists, project engineers, technical and administrative staff
- Located on Fraunhofer Campus Schloss Birlinghoven/Bonn
- Joint research groups and cooperation with Univ. Bonn

Fraunhofer IAIS: research and projects
Core research areas:
- Machine learning and adaptive systems
- Data Mining and Business Intelligence
- Automated media analysis
- Interactive access and exploration
- Autonomous systems

Outline
- Introduction to spatial data mining
  - Project example: Geomarketing
- The spatial data mining process
  - Project example: Outdoor media reach estimation
- The importance of data
  - Project example: Customer selection in the gas industry
- Spatial mining tools and visual analytics
  - CommonGIS and SPIN!
- Research challenge: track data
  - Project example: SPR
  - GeoPKDD


Why Spatial Data Mining now?
- Almost all data are (or can be) spatially referenced
- Almost all database and business intelligence systems handle spatial data
- New data sources push the topic
  - Satellite data (GPS, Galileo)
  - Toll collection data
  - Mobile phone data
  - RFID
  - GoogleEarth etc.
- Spatial Data Mining combines statistics, machine learning, databases, and visualization with spatial data

A classic example of spatial analysis
- Disease cluster: Dr. John Snow
- Deaths in the cholera epidemic, London, September 1854
- Infected water pump?

Goals of Spatial Data Mining
- Identifying spatial patterns
- Identifying spatial objects that are potential generators of patterns
- Identifying information relevant for explaining the spatial pattern (and hiding irrelevant information)
- Presenting the information in a way that is intuitive and supports further analysis

Spatial Data
- point objects
  - located at x, y, (z) coordinates
- area objects
  - suitable area description (circle, polygon, path boundary)
- fields
  - quantity assumed continuously defined in 2D or 3D (+ time!)
  - e.g. temperature

Example: without spatial attributes (figure not reproduced)

Example: with spatial attributes (figure not reproduced)

Handling spatial data
- treat as ordinary variables
  - no special algorithms needed
  - spatial properties ignored, e.g. discontiguous areas
- make spatial relationships explicit
  - e.g. infer topological relationship
  - expensive, but allows normal algorithms to be used
- specialized algorithms
  - Neighborhood methods, kriging, Gaussian processes, density-based clustering
- Use a proper combination of data, preprocessing, algorithms, and interaction software!


Project example: Outdoor Advertising Reach - Frequency Atlas
- Customer: Fachverband für Außenwerbung (FAW; Outdoor Advertising Association)
- Task: Performance value assessment of advertising media
- Traffic volume forecast, separate for private cars, public transport, pedestrians
- Spatial data mining, active learning procedures

Determining reach of a poster board
- Gesellschaft für Konsumforschung
- Frequency + Media factories = poster reach

The project in numbers
- Complete model for all German cities with more than 50,000 inhabitants (192 cities) = ca. 1,000,000 street segments!
- Complete model includes, for each segment:
  - car frequency
  - pedestrian frequency
  - public transport frequency
- The model is presently being extended to all cities with between 20,000 and 50,000 inhabitants

Basic Data: traffic measurements
- Manual traffic measurement at selected poster locations
  - 4 times 6 minutes on four days of the week at four times of day
- Additional empirical model of day totals
- Properties
  - Well-defined measurements
  - Distribution of measurements tries to avoid systematic bias
  - Extended measurement period, so concept drift cannot be excluded
- Total of 96,000 manual measurements

Secondary data
- Inputs: street network, sociodemographics + socioeconomics, points of interest (POI), frequency measurements, public transport network
- Data mining maps these inputs to frequency classes (0, 200, 400, 600, 800, 1000, 1250, 1500, 1750, 2000, ...)

Smoothing based on flow constraints
- Measurement errors lead to inconsistencies
- Need plausible assignment of frequencies
- Solution: Use Kirchhoff's law as constraint
  - Sum of inputs = sum of outputs
- Smoothing algorithm finds locally optimal solution using constraint relaxation
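The flow constraint can be illustrated with a minimal sketch. This is not the project's smoothing algorithm; it assumes a hypothetical network description in which every junction lists its incoming and outgoing street segments, and it repeatedly spreads each junction's imbalance evenly over the attached segments until inputs and outputs agree. The names `smooth_flows`, `freq` and `junctions` are made up for illustration.

```python
def smooth_flows(freq, junctions, sweeps=100, tol=1e-6):
    """Relax measured segment frequencies toward flow conservation.

    freq      -- dict: segment id -> measured frequency (assumed input format)
    junctions -- list of (incoming segment ids, outgoing segment ids)
    Returns a new dict; the measured values are left untouched.
    """
    freq = dict(freq)
    for _ in range(sweeps):
        worst = 0.0
        for inputs, outputs in junctions:
            # Kirchhoff-style constraint: sum of inputs should equal sum of outputs.
            imbalance = sum(freq[s] for s in inputs) - sum(freq[s] for s in outputs)
            worst = max(worst, abs(imbalance))
            # Spread the correction evenly over all segments at this junction.
            share = imbalance / (len(inputs) + len(outputs))
            for s in inputs:
                freq[s] -= share
            for s in outputs:
                freq[s] += share
        if worst < tol:
            break
    return freq


measured = {"a": 100.0, "b": 50.0, "c": 160.0}
smoothed = smooth_flows(measured, [(["a", "b"], ["c"])])
print(smoothed)
```

The sketch treats all measurements as equally reliable and does not prevent negative frequencies; a realistic version would weight corrections by measurement confidence.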

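Numerical prediction with model trees
- Example tree: ORTSTEIL = INNENSTADT (LR) ... (district = city center; the tree layout is not recoverable from the transcript)
- Split attributes and thresholds:
  - Fussgängerzone (pedestrian zone): Nein / Ja
  - Straßenkategorie (street category): Nebenstr. / Hauptstr.
  - Bahnhof (railway station): Nein / Ja
  - Distanz_zu_Bahnhof (distance to railway station): <= 150 / > 150
  - Anzahl_Restaurants (number of restaurants): <= 5 / > 5 and <= 15 / > 15
  - X-Koordinate: <= 52.385 / > 52.385
  - Y-Koordinate: <= 9.6 / > 9.6
- Leaves: linear models LM1, LM2, LM3, LM4, LM5, LM6
- LM1: FREQUENZ = 2277.3186 * X + 75.4087 * ANZAHL_EINKAUF - 142.4217 * MESSE - 21221.8497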
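A model tree routes an observation through attribute tests and then applies the linear model stored at the leaf it reaches. The sketch below shows that mechanism in a minimal form. Only the LM1 coefficients are taken from the slide; the single split, the branch that leads to LM1, and the second leaf (`LM_OTHER`, a constant) are hypothetical placeholders, because the real tree structure is not recoverable from the transcript.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf holding a linear model: intercept + sum(coefficient * attribute)."""
    intercept: float
    coefficients: Dict[str, float]

    def predict(self, row: Dict[str, float]) -> float:
        return self.intercept + sum(c * row[name] for name, c in self.coefficients.items())


@dataclass
class Split:
    """An inner node testing one numeric attribute against a threshold."""
    attribute: str
    threshold: float
    low: Union["Split", Leaf]   # followed when value <= threshold
    high: Union["Split", Leaf]  # followed when value > threshold


def predict(node: Union[Split, Leaf], row: Dict[str, float]) -> float:
    # Walk down the tree until a leaf is reached, then apply its linear model.
    while isinstance(node, Split):
        node = node.low if row[node.attribute] <= node.threshold else node.high
    return node.predict(row)


# LM1 as printed on the slide.
LM1 = Leaf(-21221.8497, {"X": 2277.3186, "ANZAHL_EINKAUF": 75.4087, "MESSE": -142.4217})
# Hypothetical stand-in for the remaining leaves (not from the source).
LM_OTHER = Leaf(500.0, {})
# Hypothetical one-split tree; the assignment of LM1 to this branch is an assumption.
TREE = Split("Distanz_zu_Bahnhof", 150.0, LM1, LM_OTHER)

print(predict(TREE, {"Distanz_zu_Bahnhof": 100.0, "X": 10.0, "ANZAHL_EINKAUF": 2.0, "MESSE": 0.0}))
```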

Final result: frequency atlas (cars, public transport, pedestrians)
- ~1 million street segments predicted based on 96,000 measurements
- Accuracy increased twofold


Project example: New customer acquisition for gas supplier
- Given
  - nationwide address data with consumer and group data
  - response data from original calling campaign
- To be determined
  - nationwide addresses with a high probability of customer interest in a sales representative visit
- Regional sample: regional address, interest in visit (e.g. yes)
- Nationwide data: nationwide address, consumer attributes, group attributes, interest in visit (unknown: ???)

Project example: New customer acquisition for gas supplier
1. Use addresses to transfer consumer and group attributes to the regional sample
2. Construct a model for interest in visits based on the enhanced regional sample
3. Apply the model to the nationwide address data
- Enhanced regional sample: regional address, interest in visit (e.g. yes), consumer attributes, group attributes
- Nationwide data: nationwide address, consumer attributes, group attributes, interest in visit (predicted, e.g. 0.8%)
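The three steps can be sketched as follows. The slides do not say which model was used, so this sketch substitutes the simplest possible one: the observed response rate per attribute profile. The data layout (dictionaries keyed by address, profiles as tuples) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def enrich(regional_responses, attributes):
    """Step 1: attach consumer/group attributes to the regional sample via the address."""
    return [
        (attributes[address], interested)
        for address, interested in regional_responses.items()
        if address in attributes
    ]


def fit_rates(samples):
    """Step 2: a deliberately simple model, the response rate per attribute profile."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [interested, total]
    for profile, interested in samples:
        counts[profile][0] += int(interested)
        counts[profile][1] += 1
    return {profile: yes / total for profile, (yes, total) in counts.items()}


def score(attributes, rates, default=0.0):
    """Step 3: apply the model to every nationwide address."""
    return {address: rates.get(profile, default) for address, profile in attributes.items()}


# Hypothetical toy data: address -> (consumer attribute, group attribute).
nationwide = {
    "A1": ("owner", "urban"), "A2": ("owner", "urban"), "A3": ("tenant", "rural"),
    "A4": ("owner", "urban"), "A5": ("tenant", "rural"),
}
responses = {"A1": True, "A2": False, "A3": False}  # regional calling campaign
model = fit_rates(enrich(responses, nationwide))
print(score(nationwide, model))
```

Profiles never seen in the regional sample fall back to `default`; a real model would generalise across attributes instead.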

Aggregation level of available consumer data
(the slide's arrows are labelled distribution and aggregation)
- 16 federal states
- 41 districts
- 441 counties
- ca. 8,300 zip codes
- ca. 13,900 cities
- ca. 12,300 statistical districts
- ca. 40,000 market clusters
- ca. 80,000 voting districts
- ca. 85,000 market cells
- ca. 1.5 million street segments
- ca. 20 million households (household data)


Interactive Exploratory Analysis: CommonGIS and SPIN!
- Choropleth maps showing distribution of variable(s) in space
- Combining spatial and non-spatial displays
- Variables selected and manipulated by the user
- Powerful for low-dimensional dependencies (3-4)
- Displays dynamically linked
- Displays shown: parallel coordinate plot, scatter plot

Representation of spatial data in the database (Oracle) [Klösgen & May 02]
- A set of relations R_1, ..., R_n, such that
  - any relation R_i possesses a geometry attribute G_i
  - or an identifier A_i which allows joining R_i with another relation R_k, which in turn possesses a geometry attribute
- Geometry attributes G_i consist of sets of x,y-pairs, which define points, lines or polygons
- Different kinds of spatial objects are stored in different relations R_i (geographic layers), e.g. streets, rivers, districts, buildings
- Every layer has a single geometry attribute and its own set of attributes A_1, ..., A_n
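The relational layout described here can be mirrored in a few lines of Python. This is only an in-memory illustration of the idea, not Oracle's storage model: a record either carries its own geometry or an identifier that joins it to a record in a geometry-bearing layer. The names `Record`, `Layer` and `geometry_of` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A geometry is a set of x,y-pairs describing a point, line or polygon.
Geometry = List[Tuple[float, float]]


@dataclass
class Record:
    attributes: Dict[str, object]
    geometry: Optional[Geometry] = None  # own geometry attribute G_i, if any
    join_id: Optional[str] = None        # identifier A_i pointing into another layer


# One layer (relation) per kind of spatial object, keyed by record identifier.
Layer = Dict[str, Record]


def geometry_of(record: Record, geometry_layer: Layer) -> Geometry:
    """Return the record's own geometry, or the geometry reached through the join."""
    if record.geometry is not None:
        return record.geometry
    if record.join_id is None:
        raise ValueError("record has neither a geometry nor a join identifier")
    joined = geometry_layer[record.join_id].geometry
    if joined is None:
        raise ValueError("joined record has no geometry")
    return joined


# Hypothetical example: a district layer with a polygon, and sales data joined to it.
districts: Layer = {"D1": Record({"name": "Centre"}, geometry=[(0, 0), (1, 0), (1, 1), (0, 1)])}
sales: Layer = {"S1": Record({"revenue": 120.0}, join_id="D1")}
print(geometry_of(sales["S1"], districts))
```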

Division of labor between Oracle RDBMS and search manager [Klösgen & May 02]
- Architecture: the search algorithm on the mining server sends a mining query to the database server, which returns sufficient statistics
- Database integration: efficiently organize mining queries
- Mining query delivers statistics (aggregations) sufficient for evaluating many hypotheses
- Search in hypothesis space: generation and evaluation of hypotheses (subgroup patterns)
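The point of sufficient statistics is that one aggregation pass over the data yields counts from which many subgroup hypotheses can be scored without reading the rows again. The sketch below imitates that split in plain Python: `sufficient_statistics` stands in for the database-side aggregation, and `wracc` for the search-side evaluation. Weighted relative accuracy is used as the quality function because it is a common choice in subgroup discovery; the slides do not state which function SPIN! uses, so treat that as an assumption.

```python
from collections import Counter


def sufficient_statistics(rows, attributes, target):
    """'Database side': one pass producing counts per (attribute, value, target)."""
    stats = Counter()
    for row in rows:
        for attribute in attributes:
            stats[(attribute, row[attribute], bool(row[target]))] += 1
    return stats


def wracc(stats, attribute, value, n_total, n_positive):
    """'Search side': score the subgroup attribute == value from the counts alone.

    Weighted relative accuracy = coverage * (target share in subgroup - overall share).
    """
    positives = stats[(attribute, value, True)]
    size = positives + stats[(attribute, value, False)]
    if size == 0:
        return 0.0
    return (size / n_total) * (positives / size - n_positive / n_total)


# Hypothetical toy data.
rows = [
    {"district": "north", "near_river": True, "high_rate": True},
    {"district": "north", "near_river": False, "high_rate": True},
    {"district": "south", "near_river": True, "high_rate": False},
    {"district": "south", "near_river": False, "high_rate": False},
]
stats = sufficient_statistics(rows, ["district", "near_river"], "high_rate")
n_pos = sum(r["high_rate"] for r in rows)
for attribute, value in [("district", "north"), ("district", "south"), ("near_river", True)]:
    print(attribute, value, wracc(stats, attribute, value, len(rows), n_pos))
```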


Mobility analysis based on GPS tracks
- introduction of new pricing model for poster sites based on GPS tracks
- registration of contact frequencies with poster sites
- contact extrapolation for target groups:
  - socio-demographic characteristics
  - residential areas

Time patterns
- Patterns / Questions
  - How long (days) does it take until x% of objects visit all locations?
  - How long does it take until x% of objects visit at least one location twice?
- Applications
  - determine mobility of a group of people
  - reach of poster networks
  - find popularity of locations (theatres, supermarkets, hospitals)
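The first question can be answered directly from a visit log. The following is a minimal sketch, assuming visits are available as (object, day, location) triples and that the population consists of the objects appearing in the log; both the input format and the function names are assumptions.

```python
import math


def completion_day(visits_of_object, locations):
    """First day on which one object has visited every location, or None."""
    first_visit = {}
    for day, location in visits_of_object:
        if location in locations and (location not in first_visit or day < first_visit[location]):
            first_visit[location] = day
    if len(first_visit) < len(locations):
        return None
    return max(first_visit.values())


def days_until_share_visited_all(visits, locations, share):
    """Smallest day by which at least `share` (0..1) of objects have visited all locations."""
    by_object = {}
    for obj, day, location in visits:
        by_object.setdefault(obj, []).append((day, location))
    done = sorted(
        day for day in (completion_day(v, locations) for v in by_object.values())
        if day is not None
    )
    needed = math.ceil(share * len(by_object))
    if needed == 0:
        return 0
    if len(done) < needed:
        return None  # the requested share is never reached in the observed period
    return done[needed - 1]


# Hypothetical toy log: (person, day, location).
log = [("p1", 1, "A"), ("p1", 3, "B"), ("p2", 2, "A"), ("p2", 2, "B"), ("p3", 1, "A")]
print(days_until_share_visited_all(log, {"A", "B"}, 0.5))
```

The second question (at least one location visited twice) follows the same pattern with a different per-object completion rule.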

Challenges of track data
- Goals:
  - investigate the relationship between spatio-temporal data and frequency measurements
  - improve prediction performance with active learning
- Data:
  - tracks of mobile phones and/or GPS devices
  - street map
  - possibly frequency measurements
- Tasks:
  - track-to-street mapping
  - prediction of traffic frequencies (regression)

Track-to-street Mapping
- Mapping of tracks from cell level to street level: many possibilities
- Suppose we have prior knowledge about traffic frequencies (highly frequented streets): some routes become more likely
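One simple way to let prior frequencies make some routes more likely is to weight each candidate route by the typical frequency of its street segments and normalise. The sketch below does exactly that, using the geometric mean so that longer routes are not favoured merely for having more segments. This weighting scheme is an illustrative assumption, not the probabilistic mapping developed in the project.

```python
import math


def route_probabilities(candidate_routes, prior_frequency, floor=1.0):
    """Turn prior segment frequencies into a probability per candidate route.

    candidate_routes -- dict: route name -> non-empty list of street segment ids
    prior_frequency  -- dict: segment id -> known or predicted traffic frequency
    floor            -- minimum frequency used for unknown or zero-frequency segments
    """
    weights = {}
    for name, segments in candidate_routes.items():
        # Geometric mean of the segment frequencies along the route.
        mean_log = sum(
            math.log(max(prior_frequency.get(segment, 0.0), floor)) for segment in segments
        ) / len(segments)
        weights[name] = math.exp(mean_log)
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical example: two routes through the same sequence of cells.
routes = {"main_road": ["s1", "s2"], "side_streets": ["s3", "s4"]}
priors = {"s1": 400.0, "s2": 400.0, "s3": 100.0, "s4": 100.0}
print(route_probabilities(routes, priors))
```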

Frequency Prediction with Track Data
- Steps in extrapolation:
  - count number of intersections of streets and tracks within a certain timeframe (e.g. one week)
  - extrapolate from sample to population
- Problems:
  - sample is not representative (biased), e.g. more young people than older people have mobile phones, and old and young people show different traffic behavior
  - mobile data is sensitive, possibly only opt-in customers
  - streets with 0-frequency
  - large gaps within tracks
  - censored data (people drop out of survey before end)
  - noise
- Figure axes: X-Coordinate, Y-Coordinate
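The two extrapolation steps, together with a basic correction for the sampling bias named above, can be sketched as follows. The correction used here is plain post-stratification: each tracked person is weighted by how many people of the same demographic group they represent. That choice, the input format and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict


def extrapolate(passages, group_of, population_by_group):
    """Estimate population-level street frequencies from a tracked sample.

    passages            -- list of (person, street): one entry per track/street
                           intersection within the chosen timeframe
    group_of            -- dict: person -> demographic group (e.g. age band)
    population_by_group -- dict: group -> number of people in the population
    """
    sample_by_group = Counter(group_of.values())
    # Each sampled person stands for population/sample people of the same group.
    weight = {group: population_by_group[group] / n for group, n in sample_by_group.items()}
    estimate = defaultdict(float)
    for person, street in passages:
        estimate[street] += weight[group_of[person]]
    return dict(estimate)


# Hypothetical toy data: young people are over-represented in the sample.
groups = {"p1": "young", "p2": "young", "p3": "old"}
population = {"young": 200, "old": 300}
week = [("p1", "s1"), ("p2", "s1"), ("p3", "s1"), ("p3", "s2")]
print(extrapolate(week, groups, population))
```

Streets that no sampled track crosses do not appear in the result at all, which is the 0-frequency problem listed above; gaps, censoring and noise are not addressed by this sketch.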

Probabilistic active track-to-street mapping [PhD thesis Körner]
Tasks:
1. track-to-street mapping
2. extrapolation of traffic frequencies
3. improvement of (online) sampling by active learning

Integration of Spatial Background Knowledge
- Aggregation of attributes within a buffer of a given location
- Spatially defined buffer: places within a radius of 200 m
  - e.g. 4 restaurants within 200 m of X
- Temporally defined buffer (driving zones): what places can be reached on foot / by car within the next 20 minutes
  - e.g. 2 hospitals to reach within 12 min
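For the spatially defined buffer, the aggregation amounts to counting points of interest per category within a given radius. The sketch below assumes planar coordinates in metres (for example a projected coordinate system) and a hypothetical `(category, x, y)` tuple format for the points of interest.

```python
import math
from collections import Counter


def count_within_buffer(location, pois, radius_m):
    """Count points of interest per category within radius_m of location.

    location -- (x, y) in metres, planar coordinates assumed
    pois     -- iterable of (category, x, y)
    """
    x0, y0 = location
    counts = Counter()
    for category, x, y in pois:
        # Straight-line distance; points exactly on the boundary are included.
        if math.hypot(x - x0, y - y0) <= radius_m:
            counts[category] += 1
    return counts


# Hypothetical toy data around a location X at the origin.
pois = [
    ("restaurant", 50.0, 0.0),
    ("restaurant", 0.0, 150.0),
    ("restaurant", 300.0, 0.0),
    ("hospital", 100.0, 100.0),
]
print(count_within_buffer((0.0, 0.0), pois, 200.0))
```

The resulting counts can be appended to a street segment's attribute vector. A temporally defined buffer would replace the distance test with a travel-time estimate from a routing model, which is outside the scope of this sketch.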

Research Questions
1. Can track data be used for frequency prediction? What problems arise?
2. How can track data and frequency measurements benefit from each other?
   - improvement of track-to-street mapping with frequency data
   - enhancement of frequency prediction using tracks
3. How to incorporate active learning to improve the data model?
   - How to select places for additional traffic measurements?
   - How to select persons for track monitoring?

Summary
- New data sources make spatial mining a very promising topic
- Spatial data mining is a process consisting of data, preprocessing, algorithms and visualization
  - Project examples: Geomarketing, outdoor media frequencies
- Selection of the right data is crucial
  - Project example: gas industry
- Spatial data mining is inherently visual
  - Tools such as CommonGIS
- Research challenge: track data
  - Project example: SPR GPS tracks, GeoPKDD
- And we are hiring!