Visually driven analysis of movement data by progressive clustering

Similar documents
Visually driven analysis of movement data by progressive clustering

The STC for Event Analysis: Scalability Issues

Space-Time Cube in Visual Analytics

Scalable Cluster Analysis of Spatial Events

Cluster Analysis: Advanced Concepts

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

Constructing Semantic Interpretation of Routine and Anomalous Mobility Behaviors from Big Data

Spatio-Temporal Clustering: a Survey

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach

CAB TRAVEL TIME PREDICTI - BASED ON HISTORICAL TRIP OBSERVATION

Forschungskolleg Data Analytics Methods and Techniques

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Public Transportation BigData Clustering

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Discovering Trajectory Outliers between Regions of Interest

Real-time Processing and Visualization of Massive Air-Traffic Data in Digital Landscapes

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Clustering Data Streams

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Linköpings Universitet - ITN TNM DBSCAN. A Density-Based Spatial Clustering of Application with Noise

How To Understand The History Of Navigation In French Marine Science

Advanced Methods for Pedestrian and Bicyclist Sensing

Content Delivery Network (CDN) and P2P Model

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Clustering Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Visual Analytics Tools for Analysis of Movement Data

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

A comparison of various clustering methods and algorithms in data mining

Clustering UE 141 Spring 2013

CONTENTS. List of Contributors Preface Acknowledgments. mobility data modeling and representation

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Mapping Linear Networks Based on Cellular Phone Tracking

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

Clustering & Visualization

Large data computing using Clustering algorithms based on Hadoop

Use of mobile phone data to estimate visitors mobility flows

The Role of Visualization in Effective Data Cleaning

Context-aware taxi demand hotspots prediction

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

OPTICS: Ordering Points To Identify the Clustering Structure

Proposed Application of Data Mining Techniques for Clustering Software Projects

Unsupervised Data Mining (Clustering)

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Identifying users profiles from mobile calls habits

Dr. Shih-Lung Shaw s Research on Space-Time GIS, Human Dynamics and Big Data

Visual Exploration of Sparse Traffic Trajectory Data

Geospatial Data Integration Open Service Bus Architecture

Comparison and Analysis of Various Clustering Methods in Data mining On Education data set Using the weak tool

Data Mining: Foundation, Techniques and Applications

Clustering methods for Big data analysis

Medial Axis Construction and Applications in 3D Wireless Sensor Networks

SPATIO-TEMPORAL TRAJECTORY ANALYSIS OF MOBILE OBJECTS FOLLOWING THE SAME ITINERARY

Large-Network Travel Time Distribution Estimation, with Application to Ambulance Fleet Management

Visualizing and Understanding Players Behavior in Video Games: Discovering Patterns and Supporting Aggregation and Comparison

MOBILITY DATA MODELING AND REPRESENTATION

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Monitoring of Complex Industrial Processes based on Self-Organizing Maps and Watershed Transformations

Online Temporal-Spatial Analysis for Detection of Critical Events in Cyber-Physical Systems

Big Data Analytics in Mobile Environments

Visual Analytics and Data Mining

Distances, Clustering, and Classification. Heatmaps

RESEARCH ARTICLE. Interpreting Map Usage Patterns using Geovisual Analytics and Spatio-Temporal Clustering

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

A Visual Analytics Framework for Spatiotemporal Analysis and Modelling

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

A Comparative Study of clustering algorithms Using weka tools

Clustering: Techniques & Applications. Nguyen Sinh Hoa, Nguyen Hung Son. 15 lutego 2006 Clustering 1

Information Management course

Environmental Remote Sensing GEOG 2021

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

PhoCA: An extensible service-oriented tool for Photo Clustering Analysis

Social Media Mining. Data Mining Essentials

Using Data Mining for Mobile Communication Clustering and Characterization

Spatial Data Analysis

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

SPATIAL DATA CLASSIFICATION AND DATA MINING

LiDAR Point Cloud Processing with

Mobile phone data for Mobility statistics

Available online at ScienceDirect. Procedia Computer Science 52 (2015 )

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases

«VISUALIZATION OF POTENTIAL CUSTOMERS»

Chemotaxis and Migration Tool 2.0

The Quality of Internet Service: AT&T s Global IP Network Performance Measurements

GE-INTERNATIONAL JOURNAL OF ENGINEERING RESEARCH VOLUME -3, ISSUE-6 (June 2015) IF ISSN: ( ) EMERGING CLUSTERING TECHNIQUES ON BIG DATA

Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

Clustering. Chapter Introduction to Clustering Techniques Points, Spaces, and Distances

Visual Analytics for Understanding Spatial Situations from Episodic Movement Data

Cluster Analysis: Basic Concepts and Algorithms

Procedure for Marine Traffic Simulation with AIS Data

A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA

Identifying erroneous data using outlier detection techniques

Geovisualization of Dynamics, Movement and Change: Key Issues and Developing Approaches in Visualization Research

Aggregation of Spatio-temporal and Event Log Databases for Stochastic Characterization of Process Activities

Location-Based Social Networks: Users

Transcription:

Visually driven analysis of movement data by progressive clustering Salvatore Rinzivillo, Dino Pedreschi, Mirco Nanni, Fosca Giannotti, Natalia Andrienko, Gennady Andrienko Information Visualization 2008 Eric Shook (eshook2@uiuc.edu)

Outline Introduction Density-based clustering method (OPTICS) Visualizing Cluster Trajectories Progressive Clustering Methods Case study Car trajectories in Milan Summary Conclusions

Movement data examples Andrienko, G. and Andrienko, N. Interactive spatio temporal cluster analysis of VAST challenge 2008 datasets. SIGKDD 2009.

Space-time cubes

Motivation Recent interest, because of availability of large amounts of movement or trajectory data GPS RFID Clustering is a good approach to explore and analyze large trajectory data sets Visual and interactive techniques play a key role in this analysis and exploration Combine clustering and visual techniques

Quick review: Clustering methods Partition-based K-means K-mediod Self-Organizing Map (SOM) Dendrogram (Tree-like representation) Top-down Bottom-up Density-based OPTICS

OPTICS algorithm Density-based clustering algorithm DBSCAN family Robust to noise and outliers desirable trait when analyzing trajectory data Covered by Dr. Yu in previous lectures Has core, reachable, and noise objects

High-level Definitions Core object Reachable object If the number of objects, N, within search radius (Eps) is greater than MinNbs from a given point then that point is a core point A point that is within search radius (Eps) distance to a core point, but not a core point Noise object A point that is not a core or reachable point

Example Eps: A distance threshold or maximum search radius If MinPts=3 then point is a core point If MinPts=5 then point is not a core point Core distance: Distance to its 3rd neighbor if MinPts=3 Reachable point: Point inside Eps radius

OPTICS algorithm pseudo-code OPTICS (SetOfTrajectories, Eps, MinNbs, Distance_function) for each t in SetOfTrajectories do if (not t.isprocessed()) then neighbors := Distance.neighbors(t, Eps) t.setprocessed(true); t.setreachabilitydistance(infinity); t.setcoredistance(neighbors, Eps, MinNbs); emit(t); if (t.getcoredistance() < Infinity) then // This is a core point Expand the cluster PriorityQueue.updateOrAddAll(Neighbors); while (not PriorityQueue.isEmpty()) t1 = PriorityQueue.pop(); t1_neighbors = Distance.neighbors(t1, Eps) t1.setprocessed(true) t1.setcoredistance(t1_neighbors, Eps, MinNbs) emit(t1); if (t1.getcoredistance() < Infinity) then // This is a core point PriorityQueue.updateOrAddAll(t1_neighbors); end while end if end if end for

OPTICS algorithm pseudo-code for each in SetOfTrajectories For eachtunprocessed point do if (not t.isprocessed()) then get neighboring points (closer than Eps) neighbors := Distance.neighbors(t, Eps) t.setprocessed(true); mark point as processed t.setreachabilitydistance(infinity); not currently a reachable point t.setcoredistance(neighbors, Eps, MinNbs); see if a core point (coredistance) emit(t);... end if end for

OPTICS algorithm pseudo-code If if (t.getcoredistance() a core point < Infinity) then PriorityQueue.updateOrAddAll(Neighbors); Add all neighbors to priority queue While(not priority queue has points while PriorityQueue.isEmpty()) Get a point t1 = PriorityQueue.pop(); Get the points neighbors t1_neighbors = Distance.neighbors(t1, Eps) t1.setprocessed(true) Mark point as processed t1.setcoredistance(t1_neighbors, Eps, MinNbs) See if point is a core point (coredistance) emit(t1); If point is a core point if (t1.getcoredistance() < Infinity) then PriorityQueue.updateOrAddAll(t1_neighbors); Add all point's neighbors to priority queue end while end if

Reachability plot in relation to clusters Clusters represented as valleys OPTICS algorithm generates a distance for each point that can be plotted in a reachability plot

Case study Car trajectory data

Data for analysis Using a subset of a larger dataset consisting of: GPS-tracked cars: 17,241 Trajectories: 6187 Date: Wednesday, 4/4/2007 Time: 6:00 am to 10:00 am Location: Milan, Italy Complete data set contains 176,000 trajectories

Trajectories data Spatio-temporal data x,y + time {id, (x1,y1,t1), (x2,y2,t2), } Most clustering methods experience difficulty handling poly-lines (trajectories) Convert trajectory? to be analyzed by clustering methods Two primary approaches for clustering trajectories Feature-based Direct geometric models *

Clustering trajectory data? OPTICS algorithm takes a distance function to compute a relative distance between two objects (or trajectories) We discuss several distance functions

Simple distance functions Common source Common destination Distance between start points of trajectory Distance between end points of trajectory Common source and destination Average distance between source and destinations

Distance functions k-points Defines the number of intermediate points to check when comparing two paths Time steps Defines the desired temporal distance between check points Generally these only work for perfect data Equal time intervals Minor positioning error

Distance functions Route similarity Tolerate incomplete trajectories Repeatedly search for the closest pair of positions in the two trajectories Compute mean distance between these positions Subtract penalty distance for unmatched positions Computationally expensive Route similarity + dynamics Account for relative times of arriving in corresponding positions Temporal distance to some reference time moment Andrienko G, Andrienko N, Wrobel S. Visual analytics tools for analysis of movement data. ACM SIGKDD Explorations 2007

Visualizing clustered trajectories

Visualizing a cluster of trajectories Single trajectory Cluster of trajectories (each trajectory at 30% opacity) Summarized representation of the cluster (thicker lines represent more shared trajectories)

Shared trajectories Summarized representation does not work for arbitrary movement data Hard to make sense of the data

Dynamic aggregator Compute aggregate attributes for all members in the aggregator Member count, minimum, maximum, average, median, etc. Responds to interactive filtering mechanisms Two types of aggregators Summation places Aggregate moves

Aggregator Summation places An area in space that is 'aware' of all trajectories that pass through it Record positions and times entering and exiting the area for each trajectory

Aggregator Aggregate moves A B Two areas (A & B) that are 'aware' of all trajectories that go directly from A to B without passing another area Record member count, statistical summaries, lengths of fragments, durations, and speeds A C B

Finding areas for aggregators Important to find interesting areas for aggregators that contain many trajectories Method for finding some interesting areas Extract start, end, stop, and turn points Parameter: Minimum duration of a stop Parameter: Minimum angle of turn Build circles around concentration of points Parameter: Minimum circle area Parameter: Maximum circle area

Automated method Automated method used to find summation places Aggregation of trajectories using summation places

Comparing methods Semantically defined significant places Automated method for finding interesting places

Improving the understanding of clusters of trajectories Further work is needed to automatically find interesting areas for these methods to work sufficiently well Therefore other methods are also used Interactive filtering Visualize aggregate attributes Progressive clustering

Interactive filtering

Interactive filters Temporal filter Spatial filter Select a 'window' of space Attribute filter Select a time interval Select by attributes such as length and speed Cluster filter Selection of clusters

Results of interactive filters Aggregate moves Only clusters with >=15 members

Visualize non trajectory attributes

Visualize aggregate attributes Yellow Start Blue End Places from automated method Grid cells

Progressive clustering

Progressive clustering example The data set contains the trajectories for people primarily driving to work. Goal: Find the major areas where people drive to work. Problem: Density of data set is not uniform causing problems for OPTICS density-based algorithm Solution: Progressive clustering is a practical approach to analyzing unevenly dense data

Progressive clustering steps Pick a large minimum number of neighbors (MinNbs) Play with parameter values until a few easy to distinguish clusters appear Examine, describe, and explain clusters Remove the objects in these clusters from analysis Continue analysis with a smaller number of minimum neighbors (MinNbs) Repeat until analysis is sufficiently complete

First clustering Route Destination Large clusters Destination function Eps=500m MinNbs=15 1781, 893, 354 trajectories

Second clustering Smaller clusters (large clusters removed) Destination function Eps=500m MinNbs=10 299 127 trajectories

Third clustering Smallest clusters (larger clusters removed) Destination function Eps=500m MinNbs=5 219 45 trajectories

Results of progressive clustering on destination locations Large clusters Eps=500m MinNbs=15 1781, 893, 354 trajectories Smaller clusters (large clusters removed) Eps=500m MinNbs=10 299 127 trajectories Smallest clusters (larger clusters removed) Eps=500m MinNbs=5 219 45 trajectories

Analyzing the largest (red) cluster Select largest cluster out of the previous analysis step and apply further analysis Cluster routes based on 'route similarity' Most routes are short and in the center 517 short routes (29% of cluster) Eps=1000m MinNbs=5

Analyzing the largest (red) cluster Without short routes Cluster routes based on 'route similarity' Very few clusters Cluster sizes range from 11 to 28 Few routes are distance routes Remaining 63% trajectories are noise * Other 2 large clusters generate similar results

Second example: source and destination Apply source + destination function to cluster trajectories in entire dataset 2% (127 trajectories) significantly different start/end 23% (1439 trajectories) starts close to ends 75% (4621 trajectories) noise (gray) Includes roundtrips Eps=800m MinNbs=8

Long distance routes Remove short and roundtrip routes Apply 'route similarity' to remaining routes Peripheral routes Inward routes Outward routes

Long distance routes 41 clusters 66-5 trajectories per cluster Inward routes are larger and more numerous than outward routes Authors note that different cluster functions bring similar results

Summary of findings No destinations are significantly more popular than other destinations Most trips are short local trips Most trajectories follow unique routes Many peripheral routes that do not enter city and follow belt around the city Inward trips are more frequent than outward trips, since the morning traffic

Progressive clustering + simple distance functions Progressive clustering with simple distance functions Only apply computationally intensive functions to small areas of interest User understands the steps created for each result Its possible to combine distance functions into a comprehensive function to address heterogeneous attributes Computational limitations may make this approach infeasible User will have a difficult time making sense of the clusters and the results of the analysis

Future work Re-clustering Automatically search a range of parameters 500<=Eps<=1000 and 5<=MinNbs<=10 Return the set of results for the user to review Automatic quality assessment Automate a process to evaluate a large number of possibilities Use quality measure such as average density or cohesion Applying to trajectory data is an open problem

Conclusions Trajectories are complex spatiotemporal constructs and are difficult to analyze Progressive clustering with visualization and interaction techniques are a suitable approach to explore and analyze the data Combine several functions iteratively helps users better understand the data Allows the user to control the amount of computation to apply to each step

Thank you Questions?