Analyzing Trajectory Data


Gerasimos Marketos
Free University of Bolzano, 20/3/2009
Information Systems Lab (InfoLab), University of Piraeus (UniPi), Greece
http://infolab.cs.unipi.gr

Outline
- Preliminaries
- Trajectory Data Warehousing
  - Our framework for trajectory analysis
    - Reconstructing trajectories
    - ETL processing
    - Addressing the distinct count problem
- Traffic Mining
  - Modeling traffic flow in a road network
  - Our framework for mining traffic patterns
  - Detecting traffic flow in a road network
G. Marketos: Analyzing Trajectory Data

Preliminaries

The big picture (the GeoPKDD project)
[Figure: GeoPKDD overview. Components shown: telecommunication provider, trajectory reconstruction, trajectory warehouse, privacy-aware data mining, ST patterns warehouse (e.g. a pattern with p(x)=0.02), privacy enforcement, GeoKnowledge, interpretation and visualization, public administration or business companies. Applications shown: bandwidth/power optimization, mobile cells planning, aggregative location-based services, traffic management, accessibility of services, mobility evolution, urban planning.]

A tour of GeoPKDD
- Mobility data management
  - Acquiring and storing trajectories in MODs
  - Location-aware querying
  - Trajectory indexing
- Mobility data mining and the geographic KDD process
  - Trajectory warehousing and OLAP
  - Mobility data mining and reasoning
  - Visual analytics for mobility data
- Privacy aspects on mobility data
  - Anonymity-preserving mobility mining

Key questions (for this talk)
- How to store and query trajectory data?
  Database technology is extended: Moving Object Databases.
- How to reconstruct a trajectory from raw logs?
  Positioning devices report only location points, not trajectories.
- How to analyze trajectory data?
  Data Warehousing technology is adapted to handle trajectory data.
- Are there any spatio-temporal patterns in my data?
  New Data Mining algorithms are needed to discover such patterns. Traffic Mining is a very interesting topic in this area.

Mobility Data
Typical structure and size:
N;Time;Lat;Lon;Height;Course;Speed;PDOP;State;NSat
8;22/03/07 08:51:52;50.777132;7.205580; 67.6;345.4;21.817;3.8;1808;4
9;22/03/07 08:51:56;50.777352;7.205435; 68.4;35.6;14.223;3.8;1808;4
10;22/03/07 08:51:59;50.777415;7.205543; 68.3;112.7;25.298;3.8;1808;4
11;22/03/07 08:52:03;50.777317;7.205877; 68.8;119.8;32.447;3.8;1808;4
12;22/03/07 08:52:06;50.777185;7.206202; 68.1;124.1;30.058;3.8;1808;4
13;22/03/07 08:52:09;50.777057;7.206522; 67.9;117.7;34.003;3.8;1808;4
14;22/03/07 08:52:12;50.776925;7.206858; 66.9;117.5;37.151;3.8;1808;4
15;22/03/07 08:52:15;50.776813;7.207263; 67.0;99.2;39.188;3.8;1808;4
16;22/03/07 08:52:18;50.776780;7.207745; 68.8;90.6;41.170;3.8;1808;4
17;22/03/07 08:52:21;50.776803;7.208262; 71.1;82.0;35.058;3.8;1808;4
18;22/03/07 08:52:24;50.776832;7.208682; 68.6;117.1;11.371;3.8;1808;4
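A record of the log above could be parsed along the following lines. This is only a sketch: the `Fix` class and `parse_fix` function are hypothetical names, the timestamp is assumed to be day/month/two-digit year, and the units of `Speed` are not stated in the source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fix:
    """One GPS record; only a subset of the logged fields is kept."""
    n: int
    time: datetime
    lat: float
    lon: float
    speed: float


def parse_fix(line: str) -> Fix:
    # Fields: N;Time;Lat;Lon;Height;Course;Speed;PDOP;State;NSat
    f = [s.strip() for s in line.split(";")]
    return Fix(
        n=int(f[0]),
        # Assumption: dd/mm/yy ordering (22/03/07 read as 22 March 2007).
        time=datetime.strptime(f[1], "%d/%m/%y %H:%M:%S"),
        lat=float(f[2]),
        lon=float(f[3]),
        speed=float(f[6]),
    )


sample = "8;22/03/07 08:51:52;50.777132;7.205580; 67.6;345.4;21.817;3.8;1808;4"
fix = parse_fix(sample)
print(fix)
```

Each parsed record corresponds to one (x, y, t) location point; grouping such points into trajectories is the reconstruction step discussed later in the talk.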

From location data to trajectories
- Location data producers: GSM, GPS, WiFi
- Location data (id, x, y, t) are generated
- Trajectory stream manager + trajectory reconstruction: trajectory data (obj-id, traj-id, (x, y, t)*) are reconstructed
  $T_i = (x^i_1, y^i_1, t^i_1), \ldots, (x^i_{n_i}, y^i_{n_i}, t^i_{n_i})$
- Reconstructed trajectories are stored in the Moving Object Database

Moving Objects Databases
- Traditional database technology has been extended into Moving Object Databases (MODs) that handle modeling, indexing and query processing issues for trajectories.
- Spatial and temporal dimensions are considered first-class citizens.
- Both past and current (as well as anticipated future) positions of moving objects are of interest.
- Several prototype MODs:
  - DOMINO (Wolfson et al.), EDBT 02
  - PLACE (Mokbel et al.), VLDB 04
  - SECONDO (Güting et al.), ICDE 05
  - HERMES (Pelekis et al.), EDBT 06, SIGMOD 08

Hermes MOD engine
- Built into the Oracle ORDBMS
- Data model
  - absolute vs. relative location coordinates
  - current location as a function of time over the starting location
  - linear and arc movement functions
- MOD management
  - insert / update / delete a moving object or a segment of its trajectory
  - functions over trajectories or sets of trajectories
- Indexing support
  - supported indices: R-tree (for stationary data)
  - development of a specialized index (TB-tree)

Hermes prototype architecture
[Figure: Hermes prototype architecture]

Hermes Moving Data Cartridge
[Figure: Hermes Moving Data Cartridge]

Hermes: trajectory data type
Primitive definitions:
- Unit_Function = <x_i: double, y_i: double, x_e: double, y_e: double, x_c: double, y_c: double, v: double, a: double, flag: TypeOfFunction>, where TypeOfFunction = {CONST, PLNML_1, ARC_<1..8>}
- Unit_Moving_Point = <p: Period<SEC>, m: Unit_Function>
- Moving_Point = {tab: set<Unit_Moving_Point> | constraints}
[Figure: a moving point over periods t1..t5, with start point (x_i, y_i, t_i), end point (x_e, y_e, t_e), intermediate position (x_t, y_t, t) and arc centre (x_c, y_c). Movement per period: t ∈ [t1, t2) linear; t ∈ [t2, t3) arc; t ∈ [t3, t4) const; t ∈ [t4, t5) linear.]
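The piecewise structure of a moving point can be illustrated with a small sketch. It is not the Hermes implementation: the class and function names are hypothetical, arc movement is omitted, and a `PLNML_1` unit is assumed to move at constant speed between its start and end points (the `v` and `a` attributes of `Unit_Function` are ignored here).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UnitMovingPoint:
    """One unit: a half-open period [t_start, t_end) plus a movement function."""
    t_start: float
    t_end: float
    xi: float  # start point
    yi: float
    xe: float  # end point
    ye: float
    flag: str  # "CONST" or "PLNML_1" (arc types are not handled in this sketch)


def position_at(units: List[UnitMovingPoint], t: float) -> Optional[Tuple[float, float]]:
    """Return the (x, y) position at time t, or None if t is outside every unit."""
    for u in units:
        if u.t_start <= t < u.t_end:
            if u.flag == "CONST":
                return (u.xi, u.yi)
            # Assumed semantics: linear interpolation in time over the period.
            frac = (t - u.t_start) / (u.t_end - u.t_start)
            return (u.xi + frac * (u.xe - u.xi), u.yi + frac * (u.ye - u.yi))
    return None


moving_point = [
    UnitMovingPoint(0, 10, 0.0, 0.0, 10.0, 20.0, "PLNML_1"),
    UnitMovingPoint(10, 20, 10.0, 20.0, 10.0, 20.0, "CONST"),
]
print(position_at(moving_point, 5))
```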

TB-Tree support in Hermes MOD engine
- TB-Tree index (Pfoser et al., 2000)
  - Maintains the trajectory concept
  - Each node consists of segments of a single trajectory
  - Nodes are linked together in a chain
  - Effective for trajectory-oriented queries
- Implemented in Hermes using Oracle's indexing extensibility
[Figure: TB-tree nodes holding trajectory segments t1..t12, chained per trajectory]

Trajectory Data Warehousing & OLAP

Trajectory data warehousing
- The analysis of trajectory data raises opportunities for discovering behavioral patterns that can be exploited in various applications.
- A Trajectory Data Warehouse (TDW) should:
  - extract aggregate information from the MOD
  - support a variety of dimensions (temporal, spatial, thematic, ...) and measures (about space, time and their derivatives)
- Storing measures associated with facts, concerning the set of trajectories crossing the cell: aggregate information in base cells.
- Challenges:
  - design of a trajectory-oriented data cube
  - high volume and complex nature of data; special query processing requirements
  - extensions of traditional aggregation techniques to produce summary information for OLAP analysis

Trajectory data warehousing
- A Trajectory Reconstruction process is applied on the raw time-stamped location data in order to generate trajectories, which are then stored into a MOD.
- An Extract-Transform-Load (ETL) procedure is activated that feeds the data cube(s) with aggregate information on trajectories.
- The final step of the process offers OLAP (and, eventually, DM) capabilities over the aggregated information contained in the trajectory cube model.
[Figure: processing flow. Location data producers record location data (id, x, y, t); the trajectory reconstruction module stores reconstructed trajectory data in the MOD; aggregates are loaded in the Trajectory Data Cube (ETL procedure); the trajectory data analyst performs analysis over aggregate data (OLAP).]

Related work
- Data Warehousing (DW) is widely investigated for conventional, non-spatial data.
- Some research work has been done in the area of Spatial Data Warehousing (Han et al., 1998).
- (Shekhar et al., 2001) extend the idea of cube dimensions so as to include spatial and non-spatial ones, and of cube measures so as to represent space regions and/or calculate numerical data.
- One step beyond Spatial DW is modeling a TDW (Pelekis et al., 2007).
- A simple TDW model is presented in (Orlando et al., 2007).
- The notion of multiple aggregation paths is defined in (Jensen et al., 2004).
- The integration of spatial and temporal dimensions is proposed in (Tao and Papadias, 2005).

Conventional DWs
- In conventional DWs, measures about entities (e.g. shopping baskets, customers, etc.) refer to a specific base cell formed by the dimensions.
- Distinct objects appear only once at the minimum abstraction level of the cube.
- A transaction cannot be found in more than one base cell, so counts can simply be summed when rolling up.
[Figure: data cube with dimensions Store (101, 201, 301), Time (1/1, 2/1, 3/1) and product (TV, VCR, PC); measure: count of transactions, aggregated by sum.]

Trajectory DW
- The challenge: in a TDW, measures regarding trajectories may affect more than one base cell.
- The same object might affect measures in several base cells due to its movement.
- A trajectory can be found in more than one base cell, so summing counts when rolling up may count it several times (the distinct counting problem).
[Figure: data cube with dimensions Location (R1, R2, R3), Time (12.00, 13.00, 14.00) and Age (Age ≤ 30, 30 < Age ≤ 50, 50 < Age); measure: count of trajectories, aggregated by sum.]

Sample schemas

Moving Object Database
- OBJECTS (object-id: identifier, description: text, gender: {M | F}, birth-date: date, profession: text, device-type: text)
- RAW_LOCATIONS (object-id: identifier, timestamp: datetime, eastings-x: numeric, northings-y: numeric, altitude-z: numeric)
- MOD_TRAJECTORIES (trajectory-id: identifier, object-id: identifier, trajectory: 3D geometry)

Trajectory Data Warehouse
- Dimensions: Spatial, Temporal, Object Profile
- Measures: count (trajectories), count (users), avg (distance traveled), avg (travel duration), avg (speed), avg (abs (acceler))
- SPACE_DIM (PK: PARTITION_ID; PARTITION_GEOMETRY, DISTRICT, CITY, STATE, COUNTRY)
- TIME_DIM (PK: INTERVAL_ID; INTERVAL_START, INTERVAL_END, HOUR, DAY, MONTH, QUARTER, YEAR, DAY_OF_WEEK, RUSH_HOUR)
- OBJECT_PROFILE_DIM (PK: OBJPROFILE_ID; GENDER, BIRTHYEAR, PROFESSION, MARITAL_STATUS, DEVICE_TYPE)
- FACT_TBL (PK, FK: INTERVAL_ID, PARTITION_ID, OBJPROFILE_ID; COUNT_TRAJECTORIES, COUNT_USERS, AVG_DISTANCE_TRAVELED, AVG_TRAVEL_DURATION, AVG_SPEED, AVG_ABS_ACCELER)

Our framework for trajectory analysis
We focus our research on three issues that are critical to trajectory data warehousing:
- the preprocessing phase that deals with the explicit reconstruction of the trajectories, which are then stored into a MOD
- alternative ETL processes that feed the trajectory data warehouse
- the solutions proposed for the challenging issue of measure aggregation

Reconstructing trajectories
- Collected raw data represent time-stamped geographical locations.
- Apart from storing raw data in the MOD, we are also interested in reconstructing trajectories.
- Raw points arrive in bulk sets.
- We need a filter that decides whether a new series of data is to be appended to an existing trajectory or not, based on:
  - tolerance distance
  - temporal gap
  - spatial gap
  - maximum speed
  - maximum noise duration

Reconstructing trajectories: parameters
- Tolerance distance: the tolerance of the transmitted time-stamped positions. In other words, it is the maximum distance between two consecutive time-stamped positions of the same object for the object to be considered stationary.
- Temporal gap between trajectories: the maximum allowed time interval between two consecutive time-stamped positions of the same trajectory for a single moving object.
- Spatial gap between trajectories: the maximum allowed distance in the 2D plane between two consecutive time-stamped positions of the same trajectory.
- Maximum speed: used to determine whether a reported time-stamped position must be considered noise and consequently discarded from the output trajectory.
- Maximum noise duration: the maximum duration of a noisy part of a trajectory. Any sequence of noisy time-stamped positions of the same object results in a new trajectory if its duration exceeds noise_max.

Reconstructing trajectories: algorithm
- As a first step (lines 1-6), the algorithm checks whether the object has been processed so far. If so, it retrieves its partial trajectory from the corresponding list; otherwise, it creates a new trajectory and adds it to the list.
- Then (lines 7-31), it compares the incoming point P with the tail of the partial trajectory (LastPoint) by applying the trajectory reconstruction parameters.

Algorithm Trajectory-Reconstruction (PartialTrajectories List, P Point, OId ObjectId)
 1. IF NOT PartialTrajectories.Contains(OId) THEN
 2.   CTrajectory = New Trajectory;
 3.   CTrajectory.AddPoint(P);
 4.   PartialTrajectories.Add(CTrajectory);
 5. ELSE
 6.   CTrajectory = PartialTrajectories(OId);
 7.   IF Distance(CTrajectory.LastPoint, P) <= D_TOL THEN
 8.     IF P.T - CTrajectory.LastPoint.T > gap_time THEN
 9.       Report CTrajectory.LastPoint;
10.       CTrajectory.Id = CTrajectory.Id + 1;
11.       CTrajectory.AddPoint(P);
12.     ENDIF
13.   ELSEIF Speed(CTrajectory.LastPoint, P) > V_max THEN
14.     IF P.T - CTrajectory.LastPoint.T > noise_max THEN
15.       Report CTrajectory.Noise;
16.     ELSE
17.       CTrajectory.AddNoise(P);
18.     ENDIF
19.   ELSEIF Distance(CTrajectory.LastPoint, P) > gap_space THEN
20.     Report CTrajectory.LastPoint;
21.     CTrajectory.Id = CTrajectory.Id + 1;
22.     CTrajectory.AddPoint(P);
23.   ELSE
24.     IF P.T - CTrajectory.LastPoint.T > gap_time THEN
25.       Report CTrajectory.LastPoint;
26.       CTrajectory.Id = CTrajectory.Id + 1;
27.       CTrajectory.AddPoint(P);
28.     ELSE
29.       CTrajectory.AddPoint(P);
30.     ENDIF
31.   ENDIF
32. ENDIF
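The filter can be sketched in Python as follows. This is one reading of the pseudocode, not the authors' implementation: the names `Params`, `Partial` and `reconstruct` are hypothetical, "Report" is taken to mean closing the current trajectory and emitting it, and "Report CTrajectory.Noise" is taken to mean that a noisy run longer than `noise_max` starts a new trajectory from the buffered noisy points.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Params:
    d_tol: float      # tolerance distance
    gap_time: float   # temporal gap between trajectories
    gap_space: float  # spatial gap between trajectories
    v_max: float      # maximum speed
    noise_max: float  # maximum noise duration


@dataclass
class Partial:
    traj_id: int = 0
    points: list = field(default_factory=list)  # accepted (x, y, t) points
    noise: list = field(default_factory=list)   # buffered noisy points


def reconstruct(partials, oid, p, prm, out):
    """Process one point p = (x, y, t) of object oid.

    Finished trajectories are appended to `out` as (oid, traj_id, points).
    """
    if oid not in partials:
        partials[oid] = Partial(points=[p])
        return
    c = partials[oid]
    last = c.points[-1]
    dt = p[2] - last[2]
    d = math.hypot(p[0] - last[0], p[1] - last[1])
    speed = d / dt if dt > 0 else math.inf

    def start_new(first_points):
        out.append((oid, c.traj_id, c.points))
        c.traj_id += 1
        c.points = first_points
        c.noise = []

    if d <= prm.d_tol:
        if dt > prm.gap_time:
            start_new([p])
        # otherwise the object is stationary and the point is skipped
    elif speed > prm.v_max:
        if dt > prm.noise_max:
            start_new(c.noise + [p])  # interpretation of "Report CTrajectory.Noise"
        else:
            c.noise.append(p)
    elif d > prm.gap_space or dt > prm.gap_time:
        start_new([p])
    else:
        c.points.append(p)
        c.noise = []


prm = Params(d_tol=1, gap_time=60, gap_space=100, v_max=50, noise_max=30)
partials, out = {}, []
for pt in [(0, 0, 0), (10, 0, 1), (5000, 0, 2), (20, 0, 3), (20.5, 0, 4), (30, 0, 200)]:
    reconstruct(partials, "a", pt, prm, out)
print(out)
```

In this run the point at x = 5000 is buffered as noise, the point at x = 20.5 is skipped as stationary, and the 197-second silence before the last point closes the first trajectory.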


ETL processing: loading
- Loading data into the dimension tables: straightforward.
- Loading data into the fact table: complex.
  - Fill in the measures with the appropriate numeric values.
  - In order to calculate the measures, we have to extract the portions of the trajectories that fit into the base cells of the cube.
- We propose two alternative solutions to this problem:
  - cell-oriented
  - trajectory-oriented

ETL processing: measures
For a base cell $bc$ and the trajectory portions $TP_i$ that lie inside it:
- COUNT_TRAJECTORIES: count all distinct trajectory ids that pass through $bc$
- COUNT_USERS: count all distinct object ids that pass through $bc$
- AVG_DISTANCE_TRAVELED:
  $AVG\_DISTANCE\_TRAVELED(bc) = \frac{SUM\_DISTANCE(bc)}{COUNT\_TRAJECTORIES(bc)}$, where $SUM\_DISTANCE(bc) = \sum_{TP_i \in bc} len(TP_i)$
- AVG_TRAVEL_DURATION:
  $AVG\_TRAVEL\_DURATION(bc) = \frac{SUM\_DURATION(bc)}{COUNT\_TRAJECTORIES(bc)}$, where $SUM\_DURATION(bc) = \sum_{TP_i \in bc} lifespan(TP_i)$
- AVG_SPEED:
  $AVG\_SPEED(bc) = \frac{SUM\_SPEED(bc)}{COUNT\_TRAJECTORIES(bc)}$, where $SUM\_SPEED(bc) = \sum_{TP_i \in bc} \frac{len(TP_i)}{lifespan(TP_i)}$
- AVG_ABS_ACCELER:
  $AVG\_ABS\_ACCELER(bc) = \frac{SUM\_ABS\_ACCELER(bc)}{COUNT\_TRAJECTORIES(bc)}$, where $SUM\_ABS\_ACCELER(bc) = \sum_{TP_i \in bc} \frac{|speed_{fin}(TP_i) - speed_{init}(TP_i)|}{lifespan(TP_i)}$
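As an illustration of these formulas, the sketch below computes the measures of one base cell from a list of trajectory portions. The dictionary keys and the function name `cell_measures` are hypothetical, and each portion is assumed to have a positive lifespan.

```python
def cell_measures(portions):
    """Compute the fact-table measures of one base cell.

    Each portion is a dict with keys: traj_id, obj_id, length, lifespan,
    speed_init, speed_fin (names chosen for this sketch only).
    """
    n_traj = len({p["traj_id"] for p in portions})
    if n_traj == 0:
        return None
    sum_distance = sum(p["length"] for p in portions)
    sum_duration = sum(p["lifespan"] for p in portions)
    sum_speed = sum(p["length"] / p["lifespan"] for p in portions)
    sum_abs_acceler = sum(
        abs(p["speed_fin"] - p["speed_init"]) / p["lifespan"] for p in portions
    )
    return {
        "COUNT_TRAJECTORIES": n_traj,
        "COUNT_USERS": len({p["obj_id"] for p in portions}),
        "AVG_DISTANCE_TRAVELED": sum_distance / n_traj,
        "AVG_TRAVEL_DURATION": sum_duration / n_traj,
        "AVG_SPEED": sum_speed / n_traj,
        "AVG_ABS_ACCELER": sum_abs_acceler / n_traj,
    }


portions = [
    {"traj_id": 1, "obj_id": "u1", "length": 100.0, "lifespan": 10.0,
     "speed_init": 8.0, "speed_fin": 12.0},
    {"traj_id": 2, "obj_id": "u2", "length": 300.0, "lifespan": 20.0,
     "speed_init": 16.0, "speed_fin": 14.0},
]
measures = cell_measures(portions)
print(measures)
```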

ETL processing: algorithms
Cell-oriented approach (COA)
- Search for the portions of trajectories that reside inside a spatiotemporal cell.
  - Perform a spatiotemporal range query that returns the portions of trajectories that satisfy the range constraints.
  - This is efficiently supported by the TB-tree.
- Decompose the trajectory portions with respect to the user profiles they belong to.
- Compute measures for this cell.
- Repeat for the next cells.
[Figure: example cell with COUNT_TRAJECTORIES = 2, COUNT_USERS = 2]

ETL processing: algorithms
Trajectory-oriented approach (TOA)
- Discover the spatiotemporal cells in which each trajectory resides.
  - In order to avoid checking all cells, use the trajectory MBR.
  - Identify the cells that overlap with the MBR and contain portions of the trajectory.
- Compute measures for each cell.
- Repeat for the next trajectories.
[Figure: example labelled COUNT_TRAJECTORIES = 12, COUNT_USERS = 12]
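The MBR pruning step of the trajectory-oriented approach can be sketched as below, assuming a uniform spatiotemporal grid (an assumption of this sketch; the slides do not fix the partitioning). The function names are hypothetical. The containment filter shown here only looks at sampled points, so it is an approximation: a segment may cross a cell without leaving a sample in it, and a full implementation would intersect segments with cell boundaries.

```python
import math


def cell_of(point, size):
    """Grid cell index (ix, iy, it) of an (x, y, t) point for cell sizes (sx, sy, st)."""
    return tuple(math.floor(c / s) for c, s in zip(point, size))


def mbr_candidate_cells(points, size):
    """All grid cells overlapping the minimum bounding box of the trajectory."""
    lo = cell_of(tuple(min(p[k] for p in points) for k in range(3)), size)
    hi = cell_of(tuple(max(p[k] for p in points) for k in range(3)), size)
    return {
        (i, j, k)
        for i in range(lo[0], hi[0] + 1)
        for j in range(lo[1], hi[1] + 1)
        for k in range(lo[2], hi[2] + 1)
    }


def cells_with_samples(points, size):
    """Approximate filter: candidate cells that contain at least one sampled point."""
    return {cell_of(p, size) for p in points}


trajectory = [(0.5, 0.5, 0), (2.5, 0.5, 5), (2.5, 2.5, 9)]
size = (1.0, 1.0, 10.0)
candidates = mbr_candidate_cells(trajectory, size)
occupied = cells_with_samples(trajectory, size)
print(len(candidates), sorted(occupied))
```

Only the candidate cells need to be examined for this trajectory, instead of every cell of the cube.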


The distinct count problem: definition
- During the ETL process, measures can be computed accurately by executing MOD queries.
- Once the fact table has been fed, aggregate-only information is stored inside the TDW (no trajectory / user ids).
- When rolling up, COUNT_USERS, COUNT_TRAJECTORIES and, hence, all other measures defined over COUNT_TRAJECTORIES are subject to the distinct count problem (Tao et al., 2004): if an object remains in the query region for several timestamps during the query interval, instead of counting this object once, it is counted multiple times in the result.

The distinct count problem: solution (1/3)
We store in the base cells $C_{(x,y),t,p}$ a tuple of auxiliary measures that help us correct the errors due to duplicates when rolling up:
- $C_{(x,y),t,p}.traj$: number of distinct trajectories of profile $p$ intersecting the cell
- $C_{(x,y),t,p}.cross\text{-}x$: number of distinct trajectories of profile $p$ crossing the spatial border between $C_{(x-1,y),t,p}$ and $C_{(x,y),t,p}$
- $C_{(x,y),t,p}.cross\text{-}y$: number of distinct trajectories of profile $p$ crossing the spatial border between $C_{(x,y-1),t,p}$ and $C_{(x,y),t,p}$
- $C_{(x,y),t,p}.cross\text{-}t$: number of distinct trajectories of profile $p$ crossing the temporal border between $C_{(x,y),t-1,p}$ and $C_{(x,y),t,p}$
[Figure: cell $C_{(x,y),t,p}$ in the X, Y, T space]

The distinct count problem: solution (2/3)
- Let $C_{(x',y'),t,p}$ be a cell consisting of the union of two adjacent cells, i.e. $C_{(x,y),t,p} \cup C_{(x+1,y),t,p}$.
- In order to compute the number of distinct trajectories:
  $C_{(x',y'),t,p}.traj = C_{(x,y),t,p}.traj + C_{(x+1,y),t,p}.traj - C_{(x+1,y),t,p}.cross\text{-}x$
- This is an application of the well-known Inclusion/Exclusion principle for sets: $|A \cup B| = |A| + |B| - |A \cap B|$
- BUT in some cases it holds that $C_{(x+1,y),t,p}.cross\text{-}x \neq |A \cap B|$. Example: fast and agile trajectories.
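The roll-up rule and its failure case can be reproduced with a toy model in which each trajectory is just the sequence of (x, y) cells it visits within one time interval. The function names and this representation are assumptions made for illustration.

```python
def base_measures(trajs, cell):
    """Return (traj, cross_x) for `cell`, given each trajectory as a cell sequence."""
    x, y = cell
    border = {(x - 1, y), (x, y)}
    traj = sum(1 for seq in trajs if cell in seq)
    cross_x = sum(
        1 for seq in trajs
        if any({a, b} == border for a, b in zip(seq, seq[1:]))
    )
    return traj, cross_x


def rollup_x(trajs, cell):
    """Estimated distinct count for the union of `cell` and its right neighbour."""
    x, y = cell
    left_traj, _ = base_measures(trajs, cell)
    right_traj, right_cross_x = base_measures(trajs, (x + 1, y))
    return left_traj + right_traj - right_cross_x


def true_distinct(trajs, cell):
    """Exact distinct count, available only when trajectory ids are known."""
    x, y = cell
    return sum(1 for seq in trajs if cell in seq or (x + 1, y) in seq)


direct = [[(0, 0), (1, 0)]]                   # crosses the shared border
detour = [[(0, 0), (0, 1), (1, 1), (1, 0)]]   # reaches the neighbour another way
print(rollup_x(direct, (0, 0)), true_distinct(direct, (0, 0)))
print(rollup_x(detour, (0, 0)), true_distinct(detour, (0, 0)))
```

For the direct trajectory the estimate matches the exact count. For the detour, the trajectory is present in both cells but never crosses their common border, so cross-x is 0 and the roll-up overcounts.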

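The distinct count problem: solution (3/3)
Compute the number of distinct trajectories for the union of $C_{x,y,t,p}$ and $C_{x+1,y,t,p}$, with $C_{x,y,t,p}.traj = 1$ and $C_{x+1,y,t,p}.traj = 1$ in both cases:
- (a) $C_{x+1,y,t,p}.cross\text{-}x = 1$: the result is $1 + 1 - 1 = 1$. Correct!
- (b) $C_{x+1,y,t,p}.cross\text{-}x = 0$: the result is $1 + 1 - 0 = 2$, although a single trajectory is involved. Not correct!
[Figure: two 2x2 grids of cells $C_{x,y,t,p}$, $C_{x+1,y,t,p}$, $C_{x,y+1,t,p}$, $C_{x+1,y+1,t,p}$, showing cases (a) and (b)]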

Conclusions (for this part)
- We explored solutions for the efficient and effective development of trajectory warehouses.
- Related work does not cover the complete flow of processes in a TDW.
- In particular, we discussed techniques for:
  - the solution of the trajectory reconstruction problem
  - efficient ETL support for trajectory data
  - handling measure aggregation issues, with special attention to the distinct count problem

Future work
- Extend the proposed framework by applying OLAP analysis and DM techniques over the aggregated data.
- Examine new measures for TDW, specifically suited for trajectories. Two examples:
  - a typical trajectory describing the trend of movement within a cell
  - the average direction of the trajectories within a cell

Traffic Mining

The traffic problem and challenges
- Traffic in cities is a well-known problem:
  - the majority of people live in large cities and use cars for their transportation
  - the number of cars moving in a city increases from year to year
- Recent technological advances allow us to collect huge amounts of traffic-related data: mobile phones, traffic cameras, sensors, GPS devices.
- How can we exploit this huge volume of data so as to gain insights into the traffic problem?
- The focus of our work is to detect how the different road segments of the network are related to each other.

Related work
- (Liu et al., 2006) proposed a distributed traffic stream mining system.
  - They focus on the description of the distributed traffic stream system, rather than on the discovery of traffic-related patterns.
- (Li et al., 2007) proposed a technique for the discovery of hot routes in a road network.
  - They introduced a density-based algorithm, called FlowScan.
  - The algorithm requires the trajectories of the objects that move within the network.

How can the road segments be related? - 1
[Figure: map showing Kifissias Ave.]

How can the road segments be related? - 2
[Figure: map showing Poseidonos Ave. and Syngrou Ave.]

How can the road segments be related? - 3
[Figure: map showing Kifissias Ave., Mesogeion Ave. and Vasilissis Sofias Ave.]

Network Traffic
- We consider a fixed network consisting of a set of non-overlapping regions.
- Regions = road intersections or landmarks of interest.
[Figure: example regions R1, R2, R3, R4]

Network Graph
The network is modeled as a directed graph G = (V, E):
- nodes V: regions
- edges E: direct connections between regions
[Figure: graph over regions R1..R16]

Capturing traffic through sensors
- Each edge $e = (v, v')$ is equipped with sensor technology that captures the movement from region $v$ to region $v'$.
- Definition: the traffic series of a sensor $s \in S$ during a time period $[t_s, t_e]$ consists of the number of cars that passed through this sensor during this period, recorded at $\Delta t$ intervals and ordered in time:
  $TS_s = \{(v_i, t_i)\},\ t_s \le t_i \le t_e,\ \Delta t = t_i - t_{i-1}$
  where $\Delta t$ is the transmission rate of the sensor.
[Figure: example traffic series for the sensor on edge $e$ from $v$ to $v'$, with values 150, 100, 80, 90 at times $t_s$, $t_s + \Delta t$, $t_s + 2\Delta t$, $t_e$]
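A traffic series of this form could be built from raw passage timestamps as sketched below. The function name is hypothetical, and the sketch assumes that each reading counts the cars seen during the interval that ends at its timestamp; the slides do not state how readings are aligned with the interval boundaries.

```python
def traffic_series(passages, t_s, t_e, dt):
    """Bin car passage times into a series [(count, t_i)] at dt intervals.

    Assumption: the reading stamped t_i counts passages in [t_i - dt, t_i).
    """
    n = int((t_e - t_s) // dt)
    counts = [0] * n
    for t in passages:
        if t_s <= t < t_s + n * dt:
            counts[int((t - t_s) // dt)] += 1
    return [(counts[i], t_s + (i + 1) * dt) for i in range(n)]


passages = [1, 2, 11, 25, 29, 29.5]
series = traffic_series(passages, t_s=0, t_e=30, dt=10)
print(series)
```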

Network Traffic
Traffic series of the network: $TS = \{TS_s,\ s \in S\}$
[Figure: graph over regions R1..R16 with a traffic series on each edge]

Similarity between edges: value distance
- Let $e_1$, $e_2$ be two network edges and let $TS_1 = \{(v_{1i}, t_i)\}$, $TS_2 = \{(v_{2i}, t_i)\}$ be their corresponding traffic series, $t_i \in [t_s, t_e]$.
- Definition: the value-based distance between $e_1$, $e_2$ during periods $p$ and $p'$ is given by the absolute distance of their corresponding traffic series $TS_1$, $TS_2$ at these periods:
  $\mathit{dis}_{value}(e_1^{p}, e_2^{p'}) = \mathit{dis}_{value}(TS_1^{p}, TS_2^{p'})$

Similarity between edges: shape distance
- Value-based distance is not appropriate for the shape similarity task, since it is sensitive to:
  - different baselines, e.g. one series fluctuating around 100 and the other fluctuating around 30
  - different scales, e.g. one series having a small amplitude between 90 and 100 and the other having a larger amplitude between 20 and 40
- To deal with this problem, different techniques can be applied for comparing time series, e.g.:
  - Dynamic Time Warping, which allows us to compare time series either on the same or on adjacent time periods
  - Correlation-Coefficient measure
  - Euclidean distance on normalized time series
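The effect of baseline and scale can be seen in a small sketch of the last option, Euclidean distance on z-normalized series. The function names are hypothetical, and the plain Euclidean distance on raw counts stands in for a value-based distance only for the sake of contrast.

```python
import math


def z_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # a flat series has no shape to compare
    return [(v - mean) / std for v in values]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shape_distance(a, b):
    """Euclidean distance between the z-normalized series."""
    return euclidean(z_normalize(a), z_normalize(b))


small_amplitude = [90, 95, 100, 95]   # fluctuates between 90 and 100
large_amplitude = [20, 30, 40, 30]    # fluctuates between 20 and 40
print(euclidean(small_amplitude, large_amplitude))
print(shape_distance(small_amplitude, large_amplitude))
```

The two series are far apart on raw counts, yet their normalized forms coincide, so their shape distance is (numerically) zero.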

Traffic relationships
In the conditions below, $\tau_{value}$ and $\tau_{shape}$ denote the value and shape distance thresholds.
- Traffic propagation: traffic from $e_{12}$ propagates to $e_{23}$.
  - This might indicate objects that continue moving on a highway.
  - Condition: $\mathit{dis}_{value}(TS^{p}_{e_{12}}, TS^{p'}_{e_{23}}) \le \tau_{value}$
- Traffic split / spread: traffic from $e_{12}$ splits into $e_{23}$ and $e_{26}$.
  - This might indicate objects that leave a highway and follow different directions to their destination.
  - Conditions: $\mathit{dis}_{shape}(TS^{p}_{e_{12}}, TS^{p'}_{e_{23}, e_{26}}) \le \tau_{shape}$ and $\mathit{dis}_{value}(TS^{p}_{e_{12}}, (TS^{p'}_{e_{23}} + TS^{p'}_{e_{26}})) \le \tau_{value}$
- Traffic merge: traffic to $e_{23}$ merges traffic from $e_{12}$ and $e_{62}$.
  - This might indicate objects that enter a highway from different directions.
  - Conditions: $\mathit{dis}_{shape}(TS^{p'}_{e_{23}}, TS^{p}_{e_{12}, e_{62}}) \le \tau_{shape}$ and $\mathit{dis}_{value}((TS^{p}_{e_{12}} + TS^{p}_{e_{62}}), TS^{p'}_{e_{23}}) \le \tau_{value}$
[Figure: each relationship drawn on a grid of regions R1..R8 with the edges involved]
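The three conditions can be sketched as predicates over count series that are already aligned to their periods p and p'. Everything concrete here is an assumption: the mean absolute difference as value distance, the z-normalized Euclidean distance as shape distance, the reading of the shape condition as "each branch resembles the trunk", and the threshold parameters `tau_value` and `tau_shape`.

```python
import math


def _z(values):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [0.0] * n if std == 0 else [(v - mean) / std for v in values]


def dis_value(a, b):
    """Assumed value distance: mean absolute difference of the counts."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dis_shape(a, b):
    """Assumed shape distance: Euclidean distance of z-normalized series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(_z(a), _z(b))))


def is_propagation(ts_in, ts_out, tau_value):
    return dis_value(ts_in, ts_out) <= tau_value


def is_split(ts_in, ts_outs, tau_shape, tau_value):
    """Each outgoing series has the shape of ts_in and their sum matches its values."""
    total = [sum(vals) for vals in zip(*ts_outs)]
    return (all(dis_shape(ts_in, o) <= tau_shape for o in ts_outs)
            and dis_value(ts_in, total) <= tau_value)


def is_merge(ts_ins, ts_out, tau_shape, tau_value):
    """Mirror image of a split: incoming series add up to the outgoing one."""
    return is_split(ts_out, ts_ins, tau_shape, tau_value)


e12 = [100, 150, 120, 90]
e23 = [60, 90, 72, 54]
e26 = [40, 60, 48, 36]
print(is_split(e12, [e23, e26], tau_shape=0.1, tau_value=5))
print(is_propagation(e12, e23, tau_value=5))
```

With these numbers the traffic of e12 divides 60/40 between e23 and e26, so the split test succeeds while the propagation test on e23 alone fails.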

Traffic relationships
- Traffic sink: traffic in $e_{12}$ sinks because there is no propagation or split relationship with its outgoing edges.
- Traffic source: traffic starts (source) from $e_{23}$ because there is no propagation or merge relationship with its incoming edges.
[Figure: sink at edge $e_{12}$ and source at edge $e_{23}$ (followed by $e_{34}$), drawn on a grid of regions R1..R8]

Conclusions (for this part)
- We consider the problem of mining traffic flow in a road network monitored through sensors placed at regions of interest (crossroads, etc.).
- We employ edge similarity measures based on time series comparison.
- Traffic relationships are introduced and formally defined.

Thank you! Questions?
Acknowledgement: Research partially supported by the EU under the GeoPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery) project, www.geopkdd.eu

References
Han, J., Stefanovic, N., and Koperski, K. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. Proc. PAKDD, 1998.
Jensen, C.S., Kligys, A., Pedersen, T.B., Dyreson, C.E., and Timko, I. Multidimensional data modeling for location-based services. The VLDB Journal, 13:1-21, 2004.
Li, X., Han, J., Lee, J.-G., and Gonzalez, H. Traffic density-based discovery of hot routes in road networks. Proc. 10th International Symposium on Spatial and Temporal Databases, 2007.
Liu, Y., Choudhary, A.N., Zhou, J., and Khokhar, A.A. A scalable distributed stream mining system for highway traffic data. Proc. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, 2006, pp. 309-321.
Orlando, S., Orsini, R., Raffaetà, A., Roncato, A., and Silvestri, C. Spatio-Temporal Aggregations in Trajectory Data Warehouses. Proc. DaWaK, 2007.
Pelekis, N., Frentzos, E., Giatrakos, N., and Theodoridis, Y. HERMES: aggregative LBS via a trajectory DB engine. SIGMOD Conference 2008: 1255-1258.
Pelekis, N., Raffaetà, A., Damiani, M.-L., Vangenot, C., Marketos, G., Frentzos, E., Ntoutsi, I., and Theodoridis, Y. Towards Trajectory Data Warehouses. Chapter in Mobility, Data Mining and Privacy: Geographic Knowledge Discovery. Springer-Verlag, 2008.
Pfoser, D., Jensen, C.S., and Theodoridis, Y. Novel Approaches to the Indexing of Moving Object Trajectories. Proc. VLDB, 2000.
Shekhar, S., Lu, C., Tan, X., Chawla, S., and Vatsavai, R. Map Cube: a Visualization Tool for Spatial Data Warehouses. Chapter in Geographic Data Mining and Knowledge Discovery. Taylor and Francis, 2001.
Tao, Y., and Papadias, D. Historical Spatio-Temporal Aggregation. ACM TODS, 23(1):61-102, 2005.
Tao, Y., Kollios, G., Considine, J., Li, F., and Papadias, D. Spatio-Temporal Aggregation Using Sketches. Proc. ICDE, 2004.

Selected literature on Mobility Data Modeling & MOD engines

Pelekis, N., Frentzos, E., Giatrakos, N., and Theodoridis, Y. HERMES: aggregative LBS via a trajectory DB engine. SIGMOD Conference 2008: 1255-1258.
Pelekis, N., Theodoridis, Y., Vosinakis, S., and Panayiotopoulos, T. Hermes - A Framework for Location-Based Data Management. EDBT 2006: 1130-1134.
Güting, R.H. et al. (2000) A Foundation for Representing and Querying Moving Objects. ACM Transactions on Database Systems, 25(1):1-42.
Karimi, H. and Liu, X. (2003) A Predictive Location Model for Location-Based Services. Proceedings of ACM-GIS.
Pelekis, N. et al. (2005) An Oracle Data Cartridge for Moving Objects. UNIPI-ISL-TR-2005-01, Univ. of Piraeus. Retrieved from http://infolab.cs.unipi.gr/
Schlieder, C. et al. (2001) Location Modeling for Intentional Behavior in Spatial Partonomies. Proceedings of Location Modeling for Ubiquitous Computing Workshop.
Sistla, P. et al. (1997) Modeling and Querying Moving Objects. Proceedings of IEEE ICDE Conference.
Wolfson, O. et al. (1998) Moving Objects Databases: Issues and Solutions. Proceedings of SSDBM Conference.

Selected literature on Mobility Data Warehousing

Han, J., Stefanovic, N., and Koperski, K. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. Proc. PAKDD, 1998.
Jensen, C.S., Kligys, A., Pedersen, T.B., Dyreson, C.E., and Timko, I. Multidimensional data modeling for location-based services. The VLDB Journal, 13(1):1-21, 2004.
Leonardi, L., Orlando, S., Raffaetà, A., Roncato, A., and Silvestri, C. Frequent Spatio-Temporal Patterns in Trajectory Data Warehouses. Proceedings of 24th Annual ACM Symposium on Applied Computing, 2009.
Marketos, G., Frentzos, E., Ntoutsi, I., Pelekis, N., Raffaetà, A., and Theodoridis, Y. Building Real World Trajectory Warehouses. Proc. MobiDE 08, Vancouver, Canada.
Orlando, S., Orsini, R., Raffaetà, A., Roncato, A., and Silvestri, C. Spatio-Temporal Aggregations in Trajectory Data Warehouses. Proc. DaWaK, 2007.
Pelekis, N., Raffaetà, A., Damiani, M.-L., Vangenot, C., Marketos, G., Frentzos, E., Ntoutsi, I., and Theodoridis, Y. Towards Trajectory Data Warehouses. Chapter in Mobility, Data Mining and Privacy: Geographic Knowledge Discovery. Springer-Verlag, 2008.
Pfoser, D., Jensen, C.S., and Theodoridis, Y. Novel Approaches to the Indexing of Moving Object Trajectories. Proc. VLDB, 2000.
Shekhar, S., Lu, C., Tan, X., Chawla, S., and Vatsavai, R. Map Cube: a Visualization Tool for Spatial Data Warehouses. Chapter in Geographic Data Mining and Knowledge Discovery. Taylor and Francis, 2001.
Tao, Y., Kollios, G., Considine, J., Li, F., and Papadias, D. Spatio-Temporal Aggregation Using Sketches. Proc. ICDE, 2004.

Selected literature on Mining Traffic Patterns

Horvitz, E., Apacible, J., Sarin, R., and Liao, L. Prediction, expectation, and surprise: Methods, designs, and study of a deployed traffic forecasting service. Twenty-First Conference on Uncertainty in Artificial Intelligence, 2005.
Li, X., Han, J., Lee, J.-G., and Gonzalez, H. Traffic density-based discovery of hot routes in road networks. Proc. 10th International Symposium on Spatial and Temporal Databases, 2007.
Liu, Y., Choudhary, A.N., Zhou, J., and Khokhar, A.A. A scalable distributed stream mining system for highway traffic data. Proc. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, 2006, pp. 309-321.
Nakata, T., and Takeuchi, J. Mining traffic data from probe-car system for travel time prediction. Proc. 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 817-822.
Ntoutsi, I., Mitsou, N., and Marketos, G. Traffic mining in a road-network: How does the traffic flow? IJBIDM 3(1): 82-98 (2008).
Shekhar, S., Lu, C.-T., Chawla, S., and Zhang, P. Data mining and visualization of twin-cities traffic data. Technical Report (TR 01-015), University of Minnesota, 2001.