Mobility Data Mining and Analytics
2nd Datasim Summer School, 14th July 2014
S. Rinzivillo, KDD Lab, ISTI-CNR, Pisa, Italy
BIG DATA availability
- What we buy
- Whom we interact with
- What we search for
- Where we go
Country-wide mobile phone data
World Cup 2014
"Football is a simple game: 22 men chase a ball for 90 minutes and at the end, the Germans always win." Gary Lineker (after Italy 1990 Final)
@bigdatatales http://bigdatatales.com
Urban Mobility Complexity: vehicles
Urban Mobility Complexity: phones
Crash Course on MDM (Mobility Data Mining)
How can we manage the complexity arising from huge amounts of data?
4-stage mobility data mining (from top to bottom)
- Semantics
- Derived models
- Basic trajectory patterns and models
- Raw trajectory data
Trajectory data
- The mobility of an object is described by a set of trips
- Each trip is a trajectory, i.e. a sequence of time-stamped locations
[Figure: a trajectory drawn in the X-Y plane and in space-time, through the points $(x_1, y_1, t_1), (x_2, y_2, t_2), \dots, (x_5, y_5, t_5)$]
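The definition above can be made concrete with a minimal sketch. It assumes planar coordinates (e.g. metres) and numeric timestamps; the function names `make_trajectory`, `length` and `duration` are illustrative and do not come from M-Atlas.

```python
from math import hypot

# A trajectory is modelled as a time-ordered list of (x, y, t) samples of one trip.
# Assumption: x and y are planar coordinates, t is a numeric timestamp.

def make_trajectory(points):
    """Return the samples sorted by timestamp."""
    return sorted(points, key=lambda p: p[2])

def length(traj):
    """Total distance travelled along the polyline joining the samples."""
    return sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:]))

def duration(traj):
    """Time elapsed between the first and the last sample."""
    return traj[-1][2] - traj[0][2] if traj else 0

# Samples may arrive out of order; sorting restores the time sequence.
trip = make_trajectory([(3, 4, 10), (0, 0, 0), (3, 8, 20)])
print(length(trip), duration(trip))
```

Keeping the trajectory as a plain sorted list makes the later steps (distances, stop detection, origin-destination counts) simple list operations.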
Basic mobility patterns and models
- T-Cluster: represents a group of similar trajectories
- T-Pattern: represents trajectory segments that visit the same sequence of regions with similar transition times
- T-Flock: represents trajectory segments that move together for a time interval
Basic mobility patterns & models: T-clustering
- Trajectories are grouped based on similarity
- Several possible notions of similarity
  - Start/end points
  - Shape of trajectory
  - Shape & time
  - Etc.
Nanni, Pedreschi. Time-focused clustering of trajectories of moving objects. J. of Intelligent Information Systems, 2006.
Rinzivillo, Pedreschi, Nanni, Giannotti, Andrienko, Andrienko. Visually-driven analysis of movement data by progressive clustering. J. of Information Visualization, 2008.
Density Based Clustering K-means Density-based
Average Euclidean Distance (Synchronized)
- Align points temporally
- Optionally assign penalties to non-matching points
Common Destination
- Select the last point $P_{last}$ of each trajectory
- $D(T, T') = \mathrm{Euclidean}(P_{last}, P'_{last})$
Common Origins
- Select the first point $P_{first}$ of each trajectory
- $D(T, T') = \mathrm{Euclidean}(P_{first}, P'_{first})$
Route Similarity
- Alignment of points, multiple matches
- Average Euclidean distance
- Penalties for non-matching initial points (no penalties for destinations)
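The three simplest distance functions above can be sketched as follows. This is an illustration under stated assumptions, not the M-Atlas implementation: trajectories are time-ordered lists of `(x, y, t)` tuples with integer timestamps, positions between samples are linearly interpolated, and the penalties for non-matching points are left out. All function names are hypothetical.

```python
from math import hypot

def euclid(p, q):
    """Euclidean distance between the (x, y) parts of two points."""
    return hypot(p[0] - q[0], p[1] - q[1])

def common_destination(t1, t2):
    """Distance between the last points of the two trajectories."""
    return euclid(t1[-1], t2[-1])

def common_origin(t1, t2):
    """Distance between the first points of the two trajectories."""
    return euclid(t1[0], t2[0])

def position_at(traj, t):
    """Linearly interpolated (x, y) at time t, clamped to the trajectory ends."""
    if t <= traj[0][2]:
        return traj[0][:2]
    if t >= traj[-1][2]:
        return traj[-1][:2]
    for a, b in zip(traj, traj[1:]):
        if a[2] <= t <= b[2]:
            if b[2] == a[2]:
                return a[:2]
            w = (t - a[2]) / (b[2] - a[2])
            return (a[0] + w * (b[0] - a[0]), a[1] + w * (b[1] - a[1]))

def synchronized_avg_distance(t1, t2, step=1):
    """Average distance between temporally aligned positions over the
    common time span; infinite when the trajectories never overlap in time."""
    start = max(t1[0][2], t2[0][2])
    end = min(t1[-1][2], t2[-1][2])
    if start > end:
        return float("inf")
    times = range(start, end + 1, step)
    return sum(euclid(position_at(t1, t), position_at(t2, t)) for t in times) / len(times)

a = [(0, 0, 0), (10, 0, 10)]
b = [(0, 3, 0), (10, 3, 10)]   # same route as a, shifted by 3 units
c = [(0, 0, 0), (10, 4, 10)]   # same origin as a, different destination
```

The origin and destination distances are cheap and coarse, while the synchronized distance is more selective, which matches the progressive use of distance functions described in the slides.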
Process Overview
- Dataset → (simple and very efficient distance measure) → clusters and noise
- Clusters → (more selective and particular distance functions, or more restrictive parameters) → subclusters and noise
- Result: knowledge
Basic mobility patterns & models: T-pattern
- A T-Pattern combines spatial information (areas) with temporal information (transition times), e.g. Area A → (Δt = 5 minutes) → Area B → (Δt = 35 minutes) → Area C
- Variations: absolute time, visit duration, distance traveled, speed, sensor/user-provided measures (temp., pressure, ratings, ...)
Giannotti, Nanni, Pedreschi, Pinelli. Trajectory pattern mining. Proc. ACM SIGKDD 2007
Basic mobility patterns & models: T-Flocks
- Group of objects that move together (close to each other) for a time interval
M. Wachowicz, R. Ong, C. Renso, M. Nanni: Finding moving flock patterns among pedestrians through collective coherence. International Journal of Geographical Information Science 25(11): 1849-1864 (2011)
Derived patterns and models
- Combination & refinement of basic patterns and models
  - Individual Mobility Profile: routines consistently followed by a single moving object
  - T-PTree: predictive tree built by combining T-Patterns
Derived patterns and models: mobility profiles
- User history: an ordered sequence of spatio-temporal points
- Trips construction: the user history is cut whenever a stop is detected (parameters: stop spatial threshold, stop temporal threshold)
- Grouping: a density-based clustering equipped with a spatio-temporal distance function (parameters: spatial tolerance, temporal tolerance)
- Pruning: groups with a small number of trips are pruned (parameter: support threshold)
- Profile extraction: the medoid of each group becomes one of the user's routines, and the whole set becomes the user's mobility profile
Trasarti, Pinelli, Nanni, Giannotti. Mining mobility user profiles for car pooling. ACM SIGKDD 2011
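The trips-construction step can be sketched as below. The stop rule is an assumption made for illustration: a stop is detected when two consecutive points are closer than a spatial threshold while being separated by at least a temporal threshold. The function name `split_into_trips` and the default threshold values are hypothetical and are not taken from the cited paper.

```python
from math import hypot

def split_into_trips(history, max_stop_dist=50.0, min_stop_time=1200):
    """Cut a time-ordered list of (x, y, t) points into trips.

    Assumed stop rule: the object moved at most max_stop_dist between two
    consecutive samples that are at least min_stop_time apart.
    """
    trips, current = [], []
    for point in history:
        if current:
            prev = current[-1]
            moved = hypot(point[0] - prev[0], point[1] - prev[1])
            waited = point[2] - prev[2]
            if moved <= max_stop_dist and waited >= min_stop_time:
                # Stop detected: close the current trip and start a new one.
                if len(current) > 1:
                    trips.append(current)
                current = []
        current.append(point)
    if len(current) > 1:
        trips.append(current)
    return trips

# Outbound trip, a stay of about one hour, then the return trip.
history = [(0, 0, 0), (500, 0, 60), (1000, 0, 120),
           (1000, 10, 4000), (500, 0, 4060), (0, 0, 4120)]
trips = split_into_trips(history)
```

The resulting trips are the input of the grouping step, where a density-based clustering with a spatio-temporal distance would be applied per user.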
Derived patterns and models: T-Prediction Tree
- Rule-based prediction model
- Each T-Pattern is used as a case
- Tree = combination / simplification of a set of T-Patterns
Monreale, Pinelli, Trasarti, Giannotti. Where Next: a predictor on Trajectory pattern mining. Proc. ACM SIGKDD 2009
Derived patterns and models: T-PTree
- Example: compare the actual trajectory against the T-PTree
- Spatial and temporal similarity are used to choose the best rule
[Figure: example with regions A, B, C, D, E]
Semantic Annotation
- Semantic trajectories: translate (x,y,t) trajectories into sequences of events with a semantics
- Semantic enrichment: tag and classify trajectories or patterns based on domain knowledge or mined information
Semantic trajectories
- First transform a (geometric) trajectory into a semantic representation, then apply data mining
- A semantic trajectory is represented as a sequence of stops (places where objects stay still) and moves (trajectory segments where objects change position)
- Example: Tr1 = < Hotel [21-08], Monument [9-13], Restaurant [14-16] >
Mobility Diaries
- Data-driven diaries
- Describe daily mobility routines by means of a set of semantic trajectories...
... Mobility Diaries
Mobility Diaries
Figure 1: (a) The top two eigenbehaviors for Subject 4 of the Reality Mining dataset; the lighter the color, the higher the probability (image taken from [5]). (b) Exemplary LDA topics extracted from the Reality Mining dataset (image taken from [6]).
Applying PCA or LDA to a set of these arrays allows one to extract some low-dimensional latent variables (eigenvectors and LDA topics, respectively) representing underlying patterns in the data, and offering conditional probability
Ferrari and Mamei. Classification & Prediction of Whereabouts Patterns from Reality Mining Data Sets. Pervasive & Mobile Computing Journal, Dec. 2011
M-Atlas system Download from: http://m-atlas.eu
M-Atlas input
- M-Atlas: an atlas for urban mobility behaviors. A framework to query, analyze and navigate the results on mobility data
M-Atlas platform
- A toolkit to extract, store and combine different kinds of models to build mobility knowledge discovery processes
M-Atlas System Centralized database which contains all the data, patterns and models. It is possible to extend the system with new algorithms and new data, pattern or model types.
Objects taxonomy in M-Atlas
In practice, the system adds new object-relational types to the database in order to represent the new types of data, patterns and models. The advantage of having an object-relational representation is threefold: (i) it allows the definition of complex data such as lists and trees, [...]
We distinguish between models and patterns: a pattern is a representation of a local property that holds over a sub-group of mobility data, e.g., a flock of trajectories; on the other hand, a model is a representation of a global property that holds over an entire dataset. Accordingly, a model is either a global aggregate (e.g., speed distribution in a trajectory dataset) or a collection of patterns (e.g., the clustering that partitions an entire dataset into separate clusters).
Fig. 8: The M-Atlas type hierarchy. Node labels: Object; Data (Spatial Object, Temporal Object, Moving Object); M-Model (T-Reachability, T-Clustering, T-ODMatrix, T-PTree); M-Pattern (T-Pattern, T-Flow, T-Cluster, T-Flock); edge labels: "set of", "aggregation of". M-Model, M-Pattern and Data are the basic types of data. Note the relationship between M-Models and M-Patterns: for example, a T-Clustering model is represented by a set of T-Cluster patterns, while a T-PTree model is an aggregation of T-Patterns.
DMQL: Model constructors
  CREATE DATA Travels BUILDING MOVING_POINTS
  FROM (SELECT userid, lon, lat, datetime FROM RawData ORDER BY userid, datetime)
  SET MOVING_POINT.MAX_SPACE_GAP = 0.2 AND
      MOVING_POINT.MAX_TIME_GAP = 1800

3.1.2 M-Pattern Types
A mobility pattern, M-Pattern in short, represents the common behavior of a (sub-)group of trajectories, obtained as a result of a data mining algorithm. The types of M-Patterns currently supported by M-Atlas are shown in Figure 9.
Fig. 9: M-Pattern types: (a) T-Cluster, (b) T-Pattern, (c) T-Flock, (d) T-Flow.
- T-Cluster. A T-Cluster (Figure 9(a)) is defined as a set $S = \{(t_1, l), (t_2, l), \dots\}$ of labelled trajectories, which share the same membership tag $l$. The trajectories of a T-Cluster are grouped on the basis of their similarity according to a specified similarity function, chosen from a repertoire of possible choices.
- T-Pattern. It is represented as $tp = (R, T, s)$, where $R = \langle r_0, \dots, r_k \rangle$ is a sequence of regions, $T = \langle t_1, \dots, t_k \rangle$ is a sequence of relative time intervals $t_j = [t_j^s, t_j^e]$ associated to each region, and $s$ is the support of $tp$, i.e., the number of trajectories that are compatible with $tp$ in space and time. Informally, a T-Pattern can be represented as $r_0 \xrightarrow{t_1} r_1 \cdots \xrightarrow{t_k} r_k$. Originally introduced in [17], a T-Pattern (Figure 9(b)) is a concise description of frequent behaviors, in terms of both space (i.e., the regions of space visited during movements) and time (i.e., the duration of movements).
- T-Flock. A T-Flock $f = (I, r, b)$ represents a spatio-temporal coincidence of a group of moving points, where $I = [t_{min}, t_{max}]$ is the time interval of the coincidence, $b$ is the base moving point and $r$ is the spatial buffer around $b$ which is used to determine the coincidence. This spatio-temporal coincidence defines a common behavior of the people which move [...]
- T-Flow. The T-Flow $tf = \langle R_1, R_2, w \rangle$ represents a flow of $w$ trajectories which move from region $R_1$ to region $R_2$ (Figure 9(d)).

3.1.3 M-Model Types
Mobility models, M-Models in short, are the global models extracted by a data mining algorithm, where the adjective global indicates the fact that each such model describes the entire input dataset. Figure 10 illustrates some of the available M-Models in M-Atlas; other M-Models are simply the entire collection of T-Patterns, T-Clusters and T-Flocks mined over a trajectory dataset.
Fig. 10: M-Model types: (a) Reachability plot, (b) T-PTree and (c) T-ODMatrix.
- Reachability plot: a histogram of distances between trajectories, obtained considering a specific distance function (Figure 10(a)). More precisely, it is a sequence of pairs $Rp = \langle (t_1, d_1), \dots, (t_n, d_n) \rangle$, where $t_j$ is a trajectory and $d_j$ is the distance between $t_j$ and $t_{j+1}$, where $t_{j+1}$ is the nearest neighbor of $t_j$ which does not occur in $\{t_1, \dots, t_j\}$. Using a threshold for distance, the reachability plot identifies a set of T-Clusters representing the partition of the whole dataset into labelled groups of similar trajectories.
- T-PTree. A T-Pattern Tree, T-PTree in short, is a compact representation of a set of T-Patterns (Figure 10(b)). It is a prefix tree $PT = \{root, N, E\}$, where $N$ is the set of nodes of the tree, $E$ is the set of edges and $root$ is the root of the tree. Each node $n_i = \{r, supp\}$ [...]

[...] of a data mining method with a specified parameter setting. M-Atlas [...] constructor for each method in its data mining library [...]. A mining constructor query is the following, which generates a set of clusters under specific parameters:
  CREATE MODEL ClusteringTable AS MINE T-CLUSTERING
  FROM (SELECT t.id, t.trajobj FROM TrajectoryTable t)
  SET T-CLUSTERING.FUNCTION = ROUTE_SIMILARITY AND
      T-CLUSTERING.EPS = 100 AND
      T-CLUSTERING.MIN_PTS = 20

3.2 Spatio-temporal query primitives
The querying primitives over data, models and patterns are summarized [...]
The user interface
- The process tree organizes the analyses done
  - Each node has a type: Trajectories, Map, Clustering, Flocks, etc.
  - Each node is described by the chain of DMQL queries executed from the root
- The map is loaded from OpenStreetMap and is composed of different layers
- Pre-built tools: each one performs a set of DMQL queries on the selected node and has its own set of parameters
- Contextual menu: each node type has different options and tools
- Additional panels for navigation or pattern selection
Mobility Data Mining process as a DMQL query
- CREATE MODEL MilanODMatrix AS MINE ODMATRIX
  FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t),
       (SELECT orig.id, orig.area FROM MunicipalityTable orig),
       (SELECT dest.id, dest.area FROM MunicipalityTable dest)
- CREATE RELATION CenterToNESuburbTrajectories USING ENTAIL
  FROM (SELECT t.id, t.trajectory
        FROM TrajectoryTable t, MilanODMatrix m
        WHERE m.origin = Milan AND m.destination IN (Monza, ..., Brugherio))
- CREATE MODEL ClusteringTable AS MINE T-CLUSTERING
  FROM (SELECT t.id, t.trajectory FROM CenterToNESuburbTrajectories t)
  SET T-CLUSTERING.FUNCTION = ROUTE_SIMILARITY AND
      T-CLUSTERING.EPS = 400 AND
      T-CLUSTERING.MIN_PTS = 5
- CREATE RELATION DistributionCluster USING CONTAINS
  FROM (SELECT t.id, t.trajectory, c.cid
        FROM ClusteringTable c, TrajectoryTable t
        WHERE c.tid = t.id),
       (SELECT * FROM Periods p)
  WHERE cid IN (0,2,3)
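To illustrate what the first query computes, here is a minimal Python sketch of an origin-destination matrix. It is not the M-Atlas implementation: regions are assumed to be axis-aligned bounding boxes rather than municipality polygons, and the names `region_of`, `od_matrix`, `center` and `suburb`, as well as all coordinates, are made up for the example.

```python
from collections import Counter

def region_of(point, regions):
    """Return the name of the first bounding box containing the point, or None."""
    x, y = point[0], point[1]
    for name, (xmin, ymin, xmax, ymax) in regions.items():
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return name
    return None

def od_matrix(trajectories, regions):
    """Count trajectories by (origin region, destination region).

    The origin is the region of the first point and the destination the
    region of the last point; trips starting or ending outside every
    region are ignored.
    """
    flows = Counter()
    for traj in trajectories:
        origin = region_of(traj[0], regions)
        dest = region_of(traj[-1], regions)
        if origin is not None and dest is not None:
            flows[(origin, dest)] += 1
    return flows

# Toy regions as (xmin, ymin, xmax, ymax); real data would use polygons.
regions = {"center": (0, 0, 10, 10), "suburb": (20, 0, 30, 10)}
trajectories = [
    [(1, 1, 0), (25, 5, 10)],
    [(2, 2, 0), (21, 1, 5)],
    [(25, 5, 0), (5, 5, 9)],
    [(50, 50, 0), (1, 1, 3)],   # starts outside every region: ignored
]
flows = od_matrix(trajectories, regions)
```

Selecting the trajectories behind one cell of the matrix, as the second query does for trips from the center to the north-east suburbs, amounts to filtering on the same `(origin, destination)` key.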
Mobility Atlas of a City Understanding urban human mobility
The (GeoP)KDD process Mobile phone data, GPS tracks End user Mobility Patterns Mobility manager Mobility Data Mining Mobility Data Raw data
Sensing the movement
Several data sources available
GSM data
- Mobile cellular networks handle information about the positioning of mobile terminals
  - CDR (Call Data Records): call logs (tower position, time, duration, ...)
  - Handover data: time of tower transition
- More sophisticated network measurements allow tracking of all active (calling) handsets
GPS tracks
- Onboard navigation devices send GPS tracks to central servers
  Ide;Time;Lat;Lon;Height;Course;Speed;PDOP;State;NSat
  8;22/03/07 08:51:52;50.777132;7.205580; 67.6;345.4;21.817;3.8;1808;4
  8;22/03/07 08:51:56;50.777352;7.205435; 68.4;35.6;14.223;3.8;1808;4
  8;22/03/07 08:51:59;50.777415;7.205543; 68.3;112.7;25.298;3.8;1808;4
  8;22/03/07 08:52:03;50.777317;7.205877; 68.8;119.8;32.447;3.8;1808;4
  8;22/03/07 08:52:06;50.777185;7.206202; 68.1;124.1;30.058;3.8;1808;4
  8;22/03/07 08:52:09;50.777057;7.206522; 67.9;117.7;34.003;3.8;1808;4
  8;22/03/07 08:52:12;50.776925;7.206858; 66.9;117.5;37.151;3.8;1808;4
  8;22/03/07 08:52:15;50.776813;7.207263; 67.0;99.2;39.188;3.8;1808;4
  8;22/03/07 08:52:18;50.776780;7.207745; 68.8;90.6;41.170;3.8;1808;4
  8;22/03/07 08:52:21;50.776803;7.208262; 71.1;82.0;35.058;3.8;1808;4
  8;22/03/07 08:52:24;50.776832;7.208682; 68.6;117.1;11.371;3.8;1808;4
- Sampling rate: 30 secs
- Spatial precision: 10 m
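A log in this format can be read with the Python standard library. The sketch below makes two assumptions that the slide does not state: the `Time` field is day/month/two-digit-year, and only a few of the columns are needed. The function name `read_track` is illustrative.

```python
import csv
from datetime import datetime
from io import StringIO

# First rows of the sample shown above.
SAMPLE = """Ide;Time;Lat;Lon;Height;Course;Speed;PDOP;State;NSat
8;22/03/07 08:51:52;50.777132;7.205580; 67.6;345.4;21.817;3.8;1808;4
8;22/03/07 08:51:56;50.777352;7.205435; 68.4;35.6;14.223;3.8;1808;4
8;22/03/07 08:51:59;50.777415;7.205543; 68.3;112.7;25.298;3.8;1808;4
"""

def read_track(text):
    """Parse a semicolon-separated GPS log into a list of dictionaries."""
    track = []
    for row in csv.DictReader(StringIO(text), delimiter=";"):
        track.append({
            "id": int(row["Ide"]),
            # Assumption: the date is written as day/month/two-digit year.
            "time": datetime.strptime(row["Time"], "%d/%m/%y %H:%M:%S"),
            "lat": float(row["Lat"]),
            "lon": float(row["Lon"]),
            "speed": float(row["Speed"]),
        })
    return track

track = read_track(SAMPLE)
# Seconds elapsed between consecutive samples of the track.
gaps = [(b["time"] - a["time"]).total_seconds() for a, b in zip(track, track[1:])]
```

Checking the gaps between consecutive samples is a quick way to verify the actual sampling rate of a dataset before choosing thresholds for trip construction.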
Road side sensors
- Measure the flow of a specific road arc
  - Laser-based sensors
  - Inductive loops
  - Traffic cameras
Other data sources
- Social web services
  - Flickr
  - Foursquare
  - Gowalla
  - Twitter
- Presence estimation
  - Hotel statistics
  - Airport departures and arrivals
  - Bus and public transportation
  - Park usage
- Weather conditions
Dimensions to explore
- Space
  - Administrative borders (e.g. city)
  - Distance travelled
- Individual
  - How much a person is travelling
  - Preferred locations
  - EigenMobility
- Time
  - Hour of day
  - Day of week
  - Weekdays/weekends
A small city: Pisa
First dimension: space
Travel length distribution
Travel length on the map
Exploring Origins and Destinations
[Figure: origin-destination matrix among Pisa, Firenze, Lucca, Livorno and Siena (with sums), with panels for 26 to 30 January: from everywhere to Firenze, from everywhere to Lucca, and from everywhere to everywhere over all times]
Exploring Origins and Destinations
Exploring the origins of trips: 0-5 km, 5-15 km, > 150 km
Exploring origins of trips > 150km 19 trips
Second dimension: time
When do people move to Pisa?
Let's focus on the city level: 0-5 km, 5-15 km
Trips segmented by similarity
Explore clusters: Florence
Explore clusters: A1
Explore clusters: A12
Explore Clusters: Valdera
Explore clusters: Versilia
Trip segmentation by time
Trips Segmented by Time: from 5 to 8
Discover traffic jams
Aggregate trips by common destinations
Industry: Saint Gobain
Residential Area: I Passi
Residential vs Industrial
Services: Montacchiello
Extracting travellers' profiles
- Analysis focused on the single individual
- Find his/her systematic mobility
- User trips → mobility profile (routines)
Services: Montacchiello (Profiles)
Impact of systematic mobility on access patterns
What-if scenarios
Service: Montacchiello (Car Pooling?)
- Traj Blue, DT: 06:46:53
- Traj Red, DT: 11:52:06
- Traj Green, DT: 06:51:41
- Blue can give a ride to Green
Application: Car pooling
Pro-active suggestions of ride-sharing opportunities, without the need for the user to explicitly specify the trips of interest.
Matching two routines:
Mobility profile share-ability:
Communities of users
Networks as a mining tool S. Rinzivillo, S. Mainardi, F. Pezzoni, M. Coscia, D. Pedreschi, F. Giannotti Discovering the Geographical Borders of Human Mobility KI - Künstliche Intelligenz, 2012.
Mobility coverages
Step 1: spatial regions
Step 2: evaluate flows among regions
Step 3: forget geography
Step 4: perform community detection
Step 5: map back to geography
Step 6: draw borders
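Steps 1 to 3 can be sketched as follows, assuming a regular square grid as the set of spatial regions and trips given as lists of `(x, y, t)` points. The names `cell_of` and `flow_network` are hypothetical. Step 4 is deliberately left to a community detection library, and the returned `node_id` mapping is what allows communities to be mapped back to geography in step 5.

```python
from collections import Counter

def cell_of(x, y, size):
    """Step 1: map a point to a square grid cell with the given side length."""
    return (int(x // size), int(y // size))

def flow_network(trajectories, size):
    """Steps 2 and 3: count trips between cells, then forget geography
    by renaming each cell to an integer node id."""
    flows = Counter()
    for traj in trajectories:
        origin = cell_of(traj[0][0], traj[0][1], size)
        dest = cell_of(traj[-1][0], traj[-1][1], size)
        if origin != dest:              # trips inside one cell add no edge
            flows[(origin, dest)] += 1
    cells = sorted({cell for edge in flows for cell in edge})
    node_id = {cell: i for i, cell in enumerate(cells)}
    edges = {(node_id[o], node_id[d]): w for (o, d), w in flows.items()}
    return edges, node_id

trajectories = [
    [(10, 10, 0), (1500, 20, 5)],
    [(20, 30, 0), (1800, 900, 7)],
    [(1500, 20, 0), (10, 10, 4)],
    [(5, 5, 0), (6, 6, 1)],             # stays inside one cell
]
edges, node_id = flow_network(trajectories, size=1000)
```

The weighted edge list is the usual input format for community detection tools; the grid side length plays the role of the spatial granularity discussed later in the deck.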
Final result
Final result: compare with municipality borders
Borders in different time periods
- Only weekday movements: similar to the global clustering, with a strong influence of systematic movements
- Only weekend movements: strong fragmentation, since the influence of systematic movements (home-work) is missing
Borders at regional scale
Final results
Fig. 7: The resulting clusters obtained with different spatial granularities: 500 m, 1,000 m, 2,000 m, 5,000 m, 10,000 m, 20,000 m.
[...] topology analysis of the networks performed in Section IV, that identified the most promising cell sizes at values smaller [...]
Comparison with the new provinces
Explore borders by time
- Use temporal projections to extract mobility networks
- Identified three main periods
  - Weekdays
  - Weekends
  - Whole week
- With GPS data extending over 4 weeks, we extracted 12 distinct networks, named week0, weekday0, weekend0, week1, and so on
Coscia, M., Rinzivillo, S., Giannotti, F. and Pedreschi, D., Optimal Spatial Resolution for the Analysis of Human Mobility. In ASONAM, 2012.
Degree distribution by time
[Plot: degree d versus p(d), both on logarithmic axes; series: Weekdays1-4, Weekend1-4, Week1-4]
Network properties (by day)
[Plots: number of nodes and edges per day; number of connected components per day; x-axis: days in May]
Borders quality
Semantic Enrichment
NetMob 2013 MP4-A Project: Mobility Planning For Africa Mirco Nanni, Roberto Trasarti, Barbara Furletti, Lorenzo Gabrielli Peter Van Der Mede, Joost De Bruijn, Erik De Romph, Gerard Bruil
The Challenge
- Incompleteness issue
  - Call Detail Records describe the location of users only during activity (calls, messages)
  - Most individual mobility might be invisible
- Lack of semantics
  - No information about activities and purpose
- Spatial uncertainty issue
  - Location is described in terms of cells having dynamic and sometimes large extent
The approach (summary)
- Analyze raw GSM data to infer the systematic mobility of individuals
- Build origin-destination matrices
  - Describe (expected) flows between areas
- Build a transportation model
  - Assign the O/D matrix to the OSM road network through the OmniTRANS system
Systematic mobility
- A single trace of an individual can be poorly informative about his/her movements
[Figure: one daily trace over time through the locations H, A, B, W, C]
Systematic mobility
- Yet, several daily traces of the same individual might allow us to identify regular places
[Figure: several daily traces through the locations H, A, B, W, C, in which H and W recur]
Systematic mobility
- Yet, several daily traces of the same individual might allow us to identify regular places and trips
[Figure: the same daily traces, with the recurring trips between H and W highlighted]
Systematic mobility
- The whole individual mobility is then summarized by its systematic movements
[Figure: morning routine and afternoon routine between H and W]
- They will be used as the typical daily schedule of the individual
Systematic O/D matrix
- Combine the ten 2-week datasets into one
- For each user, extract the significant locations L1 and L2
- Aggregate (individual) systematic movements into (collective) systematic flows
- Examples: outgoing traffic, incoming traffic
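The aggregation above can be sketched under a loud assumption: L1 and L2 are taken to be the most frequent and the second most frequent location observed for each user, and each user contributes one L1 → L2 movement and one L2 → L1 movement. The slides do not spell out these definitions, so the function names and the rule itself are illustrative only.

```python
from collections import Counter

def top_two_locations(observations):
    """Return the two most frequently observed locations of one user,
    or None when fewer than two distinct locations were seen."""
    ranked = Counter(observations).most_common(2)
    if len(ranked) < 2:
        return None
    return ranked[0][0], ranked[1][0]

def systematic_flows(cells_by_user):
    """Aggregate individual systematic movements into collective flows."""
    flows = Counter()
    for cells in cells_by_user.values():
        pair = top_two_locations(cells)
        if pair is not None:
            l1, l2 = pair
            flows[(l1, l2)] += 1   # assumed outbound routine
            flows[(l2, l1)] += 1   # assumed return routine
    return flows

# Each user is described by the list of cells where activity was observed.
cells_by_user = {
    "u1": ["a", "a", "a", "b", "b", "c"],
    "u2": ["a", "a", "b"],
    "u3": ["c"],                   # a single location: no systematic trip
}
flows = systematic_flows(cells_by_user)
```

Summing the flows that end in a given area gives its incoming systematic traffic, and summing those that start there gives the outgoing traffic.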
Mobile phone socio-meters
Analyze individual call habits to recognize profiles:
- Residents
- Commuters
- Visitors/Tourists
Call Habit Profiles
- Week: working days & weekend
- Time slots: 0:00-7:59, 8:00-18:59, 19:00-23:59
- Result: the user's call habit profile
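A call habit profile of this shape can be sketched as a counter indexed by week, day type and time slot. Using the ISO week number and counting calls (rather than, say, days with at least one call) are assumptions made for the example; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def time_slot(ts):
    """Map a timestamp to one of the three daily slots of the slide."""
    if ts.hour < 8:
        return "0:00-7:59"
    if ts.hour < 19:
        return "8:00-18:59"
    return "19:00-23:59"

def day_type(ts):
    """Distinguish working days from weekend days."""
    return "weekend" if ts.weekday() >= 5 else "weekday"

def call_habit_profile(call_times):
    """Count the calls of one user per (ISO week, day type, time slot)."""
    profile = Counter()
    for ts in call_times:
        week = ts.isocalendar()[1]
        profile[(week, day_type(ts), time_slot(ts))] += 1
    return profile

calls = [
    datetime(2014, 7, 14, 9, 30),    # Monday morning
    datetime(2014, 7, 14, 21, 5),    # Monday evening
    datetime(2014, 7, 19, 11, 0),    # Saturday
]
profile = call_habit_profile(calls)
```

Profiles built this way for a given area can then be compared: the slides distinguish residents, commuters and visitors by which cells of the matrix are populated.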
Resident profile
Resident profile Commuter profile
Resident profile Commuter profile Visitor profile Night visitors Daylight visitors
User profile quantification Resident profile Commuter profile Visitor profile
Investigating semantic regularity of human mobility lifestyle
Vinicius Monteiro de Lira, Federal University of Pernambuco, Brazil, vcml@cin.ufpe.br
Valeria Cesario Times, Federal University of Pernambuco, Brazil, vct@cin.ufpe.br
Patricia Cabral Tedesco, Federal University of Pernambuco, Brazil, pcart@cin.ufpe.br
Salvatore Rinzivillo, ISTI-CNR, Pisa, Italy, salvatore.rinzivillo@isti.cnr.it
Chiara Renso, ISTI-CNR, Pisa, Italy, chiara.renso@isti.cnr.it
18th International Database Engineering & Applications Symposium, IDEAS '14, Porto, Portugal
INTRODUCTION
- The appearance and wide distribution of position-enabled personal devices boosted the study of the mobility behavior of individuals based on crowdsourced data
- When these positioning data are enriched with semantic information (i.e. the place visited), we have semantic trajectories
- The semantics helps in understanding human dynamics
About Regularity
- We study the tendency of mobile individuals to be regular or irregular when choosing the places and the times to perform some activities: semantic (or activity-based) regularity
- Spatial and temporal entropy are defined as measures of the semantic regularity of users, computed from crowdsensed data
- Values range from 0 to 1, where 1 means highest regularity and 0 means lowest regularity or no regularity
Why study semantic regularity?
- Regularity profiles can characterize one specific aspect of the user lifestyle
- We give a quantitative measure of the regularity habits of the people under observation
- This can be useful in:
  - Recommendation systems
  - Carpooling
  - Advertisement
METHODOLOGY
The semantic regularity behavior is measured according to two dimensions:
- Spatial: how much a user tends to visit the same places to perform a given activity
- Temporal: the regularity of the user in performing an activity in a preferred temporal interval
METHODOLOGY
Three phases:
(i) Data collection of users' visits to Points of Interest (POIs);
(ii) Estimation of the regularity measures;
(iii) Extraction of the semantic regularity profiles.
Semantic regularity - Example
Visits dataset (place, activity, time):
- University, Work/Study, 14.00-18.00
- Gym, Leisure, 18.30-20
- Restaurant, Eating, 12.45-13.30
- University, Work/Study, 14.30-18.30
- Gym, Leisure, 19.00-20.30
- Restaurant, Eating, 13.00-14.00
We associate a category of place to an activity with a static mapping.
Visits and frequency distributions
The Visits dataset provides the mobility information to associate a user with a POI (poi_id) she visited:
< VisitID; UserID; poi_id; poi_cat; timestamp >
Formally, for a user $u$ and a POI $p$ of category $C$, we define the spatial relative frequency distribution SRFD of $u$ as:
$SRFD(u, C, p) = P(u \text{ in } p \mid C) = \frac{\#\text{visits to } p}{\#\text{visits to } C}$
Formally, for a user $u$, a time interval $t$ and a category $C$, we define the temporal relative frequency distribution TRFD of $u$ as:
$TRFD(u, C, t) = P(u \text{ in } t \mid C) = \frac{\#\text{visits in } t}{\#\text{visits to } C}$
The Entropy measures
Given a user $u$ and a place category $C$, the Spatial Entropy (SH) is:
$SH(u, C) = -\sum_{p \in C} SRFD(u, C, p) \log SRFD(u, C, p)$
And, analogously, the Temporal Entropy (TH) over the set of time intervals $I$ is:
$TH(u, C) = -\sum_{t \in I} TRFD(u, C, t) \log TRFD(u, C, t)$
The Spatial Maximum Entropy (SMH) for each category:
$SMH(C) = \log |C|$
The Temporal Maximum Entropy (TMH) for each category:
$TMH(C) = \log |I|$
Semantic regularity
- Given a user $u$ and a category $C$, the Semantic Spatial Regularity for $C$ is denoted $SSR(u, C)$
- Given a user $u$, a set of intervals $I$ and a category $C$, the Semantic Temporal Regularity for $C$ is denoted $STR(u, C)$
- A semantic regularity profile for a user $u$ consists of a set of tuples $\langle C_i, SSR(u, C_i), STR(u, C_i) \rangle$ for all categories of places (activities) $C_i$ in $C_1, C_2, \dots, C_n$
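The entropy measures can be sketched in a few lines. The regularity formula itself is not preserved in this transcript, so the normalisation used here, one minus the ratio between the entropy and its maximum, is an assumption chosen to be consistent with the stated range (1 for the highest regularity, 0 for none). Visits are assumed to be `(user, poi_id, poi_cat, timestamp)` tuples, and all function names are hypothetical.

```python
from collections import Counter
from math import log

def entropy(counts):
    """Shannon entropy (natural logarithm) of the relative frequency
    distribution obtained by normalising the given visit counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def regularity(counts, n_options):
    """Assumed normalisation: 1 - H / H_max, with H_max = log(n_options)."""
    if n_options <= 1:
        return 1.0
    return 1.0 - entropy(counts) / log(n_options)

def spatial_regularity(visits, user, category, pois_in_category):
    """Sketch of SSR(u, C) under the normalisation assumed above."""
    counts = Counter(poi for u, poi, cat, _ in visits
                     if u == user and cat == category)
    return regularity(list(counts.values()), len(pois_in_category))

gyms = ["g1", "g2", "g3", "g4"]
visits = [
    ("u1", "g1", "Gym", 1), ("u1", "g1", "Gym", 2), ("u1", "g1", "Gym", 3),
    ("u2", "g1", "Gym", 1), ("u2", "g2", "Gym", 2),
    ("u2", "g3", "Gym", 3), ("u2", "g4", "Gym", 4),
]
ssr_u1 = spatial_regularity(visits, "u1", "Gym", gyms)   # always the same gym
ssr_u2 = spatial_regularity(visits, "u2", "Gym", gyms)   # a different gym each time
```

The temporal counterpart would follow the same pattern, counting visits per time interval instead of per POI and using the number of intervals as the number of options.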
Example
Example of Semantic Spatial Regularity and Semantic Temporal Regularity for gyms: we compute the Spatial Entropy (SH) and the Temporal Entropy (TH). Based on these, we can see that the spatial regularity SSR for the gym is high, while the temporal regularity STR is low.
EXPERIMENTS
We tested our methodology using a dataset of check-ins generated from a Location-based Social Network (LBSN) called Brightkite.
- The dataset has a total of 968,784 check-ins performed by 2,806 users around the world between March 22nd, 2008 and October 18th, 2010
- Each check-in contains the user identification, the geographic coordinates and the time instant
- The Foursquare API is used to semantically annotate the places where users performed the check-ins
- 13 main categories of POIs are mapped to the most common activities
EXPERIMENTS - Restaurants
Restaurant category: most people tend to change both the place where they go to eat and the time when they go. Most of the users are irregular in space and time.
[Figure: regularity plot for the Restaurant category, annotated with "Most REGULAR" and "Most IRREGULAR"]
EXPERIMENTS - University
University category: we clearly notice a very regular spatial behavior. Most of the users are distributed close to the value 1 (more regular) on the spatial dimension.
[Figure: regularity plot for the University category, annotated with "Most REGULAR" and "Most IRREGULAR"]
EXPERIMENTS
[Figure: regularity plot divided into the quadrants TL, TR, BL and BR, ranging from high regularity to high irregularity]
MAPMOLTY tool
- MAPMOLTY computes a number of measures, called loyalty indicators, to summarize the loyalty level of each POI from different categories
- The application is built upon a map to ease navigation and visualization in the area of interest
Vinicius de Lira, Chiara Renso, Salvatore Rinzivillo, Valeria Cesario Times and Patricia Tedesco. MAPMOLTY: a web tool for discovering place loyalty based on mobile crowdsource data. Demo paper at ICWE 2014
Collect Movements from the Crowd q Investigate approaches to mine urban mobility patterns and anomalies by analyzing socially created trajectories: - Extract mobility from geo-enabled social media - Enrich with contextual/semantic information to extract more insights about the nature of the movements.
Twitter Data
- Microblogging platform
- Users may send short messages (up to 140 characters) about what is around them
- Georeferenced tweets
- 600k tweets (300k geotagged)
- 33k users
- 8 weeks (May-June 2012)
How to build Tweet-trajectories
- Aggregate consecutive tweets according to a spatio-temporal threshold
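One possible reading of this aggregation is sketched below. The interpretation of the thresholds is an assumption: consecutive tweets of the same user are chained while the time gap stays within the temporal threshold, and a tweet closer than the spatial threshold to the previous point is treated as the same place. The default values mirror the 75 min / 100 m setting quoted later in the deck; the function names are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (Earth radius taken as 6,371 km)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def tweet_trajectories(tweets, max_gap_s=75 * 60, min_move_m=100):
    """Chain the time-ordered (lat, lon, t) tweets of one user into trajectories."""
    trajectories, current = [], []
    for tw in tweets:
        if current:
            prev = current[-1]
            if tw[2] - prev[2] > max_gap_s:
                # Gap too long: close the current trajectory.
                if len(current) > 1:
                    trajectories.append(current)
                current = []
            elif haversine_m(prev[0], prev[1], tw[0], tw[1]) < min_move_m:
                continue  # same place as the previous point: no movement
        current.append(tw)
    if len(current) > 1:
        trajectories.append(current)
    return trajectories

tweets = [
    (41.3851, 2.1734, 0),
    (41.3852, 2.1734, 600),     # about 11 m away: same place
    (41.4036, 2.1744, 1800),    # about 2 km away: movement
    (41.3809, 2.1228, 7800),    # 100 minutes later: new trajectory
]
result = tweet_trajectories(tweets)
```

Trajectories with a single point are dropped, since they describe presence rather than movement.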
Sampling rate distribution of tweets
Trajectory Extraction
Origin Destination Analysis
Origin Destination Analysis: relevant fluxes
- From Airport
- From Sagrada Familia
Semantic Enrichment
Foursquare
- User-contributed timestamped positions
- 9 top-level categories
  - Nightlife Spot
  - Travel & Transport
  - Outdoors & Recreation
  - Shop & Service
  - College & University
  - Food
  - Arts & Entertainment
  - News
  - Residence
  - Professional & Other Places
Semantic trajectory mining: MWC2012 Semantic Trajectories
- Dataset: 9,689 trajectories built (75 min / 100 m) from geo-located tweets of Barcelona during the week of the Mobile World Congress 2012 (MWC2012), the week before and the week after
- Trajectories are semantically enriched by classifying them as performed by tourists or locals and by associating the most likely Foursquare venue
Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories
- Start of trajectory: Foursquare place Burger King, L'Hospitalet; Foursquare top category Food
- End of trajectory: Foursquare place 22@, Glories; Foursquare top category Professional & Other Places
Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories
Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories
[Figure panels: week before, MWC2012 week, week after]
Join Semantics with Spatial Flow
[Bar chart: trips entering Sants-Montjuïc by category; y-axis: # of trips; series: Week 0, Week 1, Week 2; categories: Food, Arts & Entertainment, Outdoors & Recreation, Professional & Other Places, Travel & Transport, Shop & Service, Nightlife Spot]
Join Semantics with Spatial Flow
[Bar chart: trips exiting from Sants-Montjuïc by category; y-axis: # of trips; series: Week 0, Week 1, Week 2; categories: Food, Arts & Entertainment, Outdoors & Recreation, Professional & Other Places, Travel & Transport, Shop & Service, Nightlife Spot, College & University]
Where have you been today? Annotating trajectories with DayTag
S. Rinzivillo, F. Siqueira, L. Gabrielli, C. Renso, V. Bogorny
SSTD 2013, Demo Paper, Monaco
Sensing People Behavior: Surveys
- Cons
  - Low spatial precision
  - Low temporal accuracy
  - Limited in time (usually one or two days)
  - Underestimation of short stops (e.g. ATM)
- Pros
  - Semantically rich
  - User view of movement
  - Motivation of the movement
Sensing People Behavior: GPS
- Cons
  - No semantic information
  - Difficult for the user to reconstruct movement motivations
- Pros
  - High spatial precision
  - High temporal accuracy
  - Unlimited tracking time
  - Precise reconstruction of movement dynamics (acceleration, route, speed)
  - Low-cost technology
DayTag
DayTag: Anatomy
DayTag: Timeline
DayTag: Spatial Reference
DayTag: Semantic Information
Change traffic with your TAGs
An initiative by: In collaboration with:
tagmyday.isti.cnr.it
Join us: Move, Tag, Send
tagmyday.isti.cnr.it
Personal Data Store
tagmyday.isti.cnr.it
Activity Distribution
Incoming flow to Calci from Pisa
tagmyday.isti.cnr.it
Atlas of Urban Mobility
Pisa Incoming Traffic
Trip distribution per day
[Plot: number of trips for Pisa, S. Giuliano and Cascina; x-axis from 0 to 23]
From DATA to KNOWLEDGE
- Data: demographic data, transport data, movement data, geographic data
- Models: T-Clustering, T-Patterns
- Validation
- Forecasts
Deployment of a model
- Data integration and semantic enrichment
- Service, based on a continuously sensed indicator:
  CREATE MODEL MilanODMatrix AS MINE ODMATRIX
  FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t),
       (SELECT orig.id, orig.area FROM MunicipalityTable orig),
       (SELECT dest.id, dest.area FROM MunicipalityTable dest)
- Dashboard, based on a periodically sensed indicator
- Validation
Privacy by Design in Data Mining
7 Billion October 2011
The dark side: Privacy Risks
- Big data of human activity contain personal sensitive information
- The opportunities of discovering knowledge by analytical and data mining tools increase hand in hand with the risks of privacy violation
- An important question: may data publishing and mining violate individual privacy?
De-identified User Trajectory
- Human data may reveal many facets of private life
- Privacy protection is increasingly difficult and cannot simply be accomplished by de-identification
- The color darkness of each region is proportional to the number of different visits
- By discovering who lives in that home and works in that company, we can identify the user
How can we guarantee privacy protection in Data Mining? Privacy by Design Paradigm
Privacy by Design Paradigm ü Design frameworks to counter the threats of undesirable and unlawful effects of privacy violation without obstructing the knowledge discovery opportunities of data mining technologies ü Natural trade-off between privacy quantification and data utility ü Our idea: Privacy by Design in Data Mining, i.e. the philosophy and approach of embedding privacy into the design, operation and management of information processing technologies and systems
Privacy by Design in Data Mining ü The framework is designed with assumptions about: q the sensitive data that are the subject of the analysis q the attack model, i.e., the knowledge and purpose of a malicious party that wants to discover the sensitive data q the target analytical questions that are to be answered with the data ü Design a privacy-preserving framework able to: q transform the data into an anonymous version with a quantifiable privacy guarantee q guarantee that the analytical questions can be answered correctly, within a quantifiable approximation that specifies the data utility
Our Frameworks q Privacy by Design for Data Publishing q Trajectory Anonymization by spatial generalization q Trajectory Anonymization by semantic generalization q Privacy by Design for Data Mining Outsourcing q Privacy-Preserving Mining of Association Rules from Outsourced Transaction Databases q Privacy by Design for GSM User Profiles q Privacy by Design in Distributed Movement Data
Privacy by Design for Movement Data Publication. A. Monreale, G. Andrienko, N. Andrienko, F. Giannotti, D. Pedreschi, S. Rinzivillo, S. Wrobel. Movement Data Anonymity through Generalization. Transactions on Data Privacy, 2010
Privacy-Preserving Framework q Anonymization of movement data while preserving clustering q Trajectory Linking Attack: the attacker q knows some points of a given trajectory q and wants to infer the whole trajectory q Countermeasure: a method based on q spatial generalization of trajectories q k-anonymization of trajectories
Trajectory Generalization q Given a trajectory dataset: 1. Partition the territory into Voronoi cells 2. Transform the trajectories into sequences of cells
Partition of the territory. Characteristic points extraction: starts (1), ends (2), points of significant turns (3), points of significant stops and representative points from long straight segments (4). Spatial clusters: group the extracted points with the desired spatial extent (MaxRadius), which defines the degree of generalization. Voronoi tessellation: partition the territory into Voronoi cells using the centroids of the spatial clusters as generating points.
Generation of trajectories. Divide the trajectories into segments that link Voronoi cells. For each trajectory: the area a_1 containing its first point p_1 is found; the following points are checked; if a point p_i is not contained in a_1, the area a_2 containing it is found; and so on. Generalized trajectory: from a sequence of areas to a sequence of centroids of areas.
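The generation step can be sketched as follows. This is a minimal illustration, assuming the cell centroids are already available: it relies on the fact that a point lies in the Voronoi cell of its nearest generating point, so no explicit cell geometry is needed. The function names and sample data are hypothetical, and any extra handling the published method applies (for example when consecutive samples fall in non-adjacent cells) is left out.

```python
from math import hypot

def nearest_cell(point, centroids):
    """Index of the Voronoi cell containing the point.

    A point belongs to the cell of its closest generating point, so a
    nearest-centroid search is enough.
    """
    x, y = point
    return min(
        range(len(centroids)),
        key=lambda i: hypot(x - centroids[i][0], y - centroids[i][1]),
    )

def generalize(trajectory, centroids):
    """Replace a trajectory of (x, y, t) samples by the centroids of the
    cells it visits, recording a cell again only when the cell changes."""
    cells = []
    for x, y, _t in trajectory:
        cell = nearest_cell((x, y), centroids)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return [centroids[c] for c in cells]

# Hypothetical centroids and trajectory, for illustration only.
centroids = [(0, 0), (10, 0), (10, 10)]
trajectory = [(1, 1, 0), (2, 0, 1), (8, 1, 2), (9, 2, 3), (10, 9, 4)]
generalized = generalize(trajectory, centroids)
```

A linear scan over the centroids is fine for a sketch; with many cells a spatial index such as a k-d tree would be the usual replacement.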
Generalization vs k-anonymity q Generalization may not be sufficient to ensure k-anonymity: q For each generalized trajectory, do at least k-1 other people share the same trajectory? q Two transformation strategies q KAM-CUT q publishing only the k-frequent prefixes of the generalized trajectories q KAM-REC q recovering portions of trajectories that occur at least k times q minimizing the noise
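The idea of publishing only k-frequent prefixes can be illustrated with a short sketch. This is a simplified reading of the slide, not the KAM-CUT algorithm as published: it counts how many generalized trajectories share each prefix and truncates every trajectory to its longest prefix supported by at least k trajectories. The function names and sample data are assumptions, and the sketch makes no claim about the formal privacy guarantee of the original method.

```python
from collections import Counter

def prefix_support(trajectories):
    """Count, for every prefix, how many trajectories start with it."""
    counts = Counter()
    for traj in trajectories:
        for end in range(1, len(traj) + 1):
            counts[tuple(traj[:end])] += 1
    return counts

def cut_to_k_frequent_prefixes(trajectories, k):
    """Truncate each trajectory to its longest prefix shared by at least k
    trajectories; drop trajectories whose first cell is already too rare."""
    counts = prefix_support(trajectories)
    published = []
    for traj in trajectories:
        keep = 0
        # Prefix support can only shrink as the prefix grows, so stop at
        # the first prefix that falls below k.
        while keep < len(traj) and counts[tuple(traj[:keep + 1])] >= k:
            keep += 1
        if keep > 0:
            published.append(tuple(traj[:keep]))
    return published

# Hypothetical generalized trajectories (sequences of cell labels).
trajectories = [
    ("A", "B", "C"),
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("E", "F"),
]
published = cut_to_k_frequent_prefixes(trajectories, k=2)
```

With k=2, the third trajectory loses its rare last cell and the fourth is suppressed entirely, which shows the utility cost that KAM-REC tries to reduce by recovering frequent portions.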
Dataset q Trajectory data from the city of Milan q GPS traces from about 17,000 vehicles
Clustering on Anonymized Trajectories
Probability of re-identification
Conclusion q Opportunities and challenges in gaining deep insight into human mobility q Mobility models as a dual piece of knowledge q Enabler for new services q Decision support for planning and design q Creation and extraction of complex models supported by an integrated platform: M-Atlas q Management of complex analytical processes q Deployment of services
Conclusion q Privacy is an ever-growing concern in our society q Privacy concerns often lead to skepticism q Effects on the use of technologies q Effects on the opportunities for data understanding q Providing methodologies for risk evaluation and data control
Key publications
q F Giannotti, M Nanni, F Pinelli, D Pedreschi. Trajectory pattern mining. ACM SIGKDD 2007
q F Giannotti, D Pedreschi. Mobility, data mining and privacy: Geographic knowledge discovery. Springer, 2008
q A Monreale, F Pinelli, R Trasarti, F Giannotti. WhereNext: a location predictor on trajectory pattern mining. ACM SIGKDD 2009
q S Rinzivillo, D Pedreschi, M Nanni, F Giannotti, N Andrienko, G Andrienko. Visually driven analysis of movement data by progressive clustering. Information Visualization 7 (3-4), 225-239, 2008
q D Wang, D Pedreschi, C Song, F Giannotti, AL Barabasi. Human mobility, social ties, and link prediction. ACM SIGKDD 2011
q F Giannotti, M Nanni, D Pedreschi, F Pinelli, C Renso, S Rinzivillo, R Trasarti. Unveiling the complexity of human mobility by querying and mining massive trajectory data. The VLDB Journal 20 (5), 2011
q R Trasarti, F Pinelli, M Nanni, F Giannotti. Mining mobility user profiles for car pooling. ACM SIGKDD 2011
Key publications
q M Coscia, G Rossetti, F Giannotti, D Pedreschi. Demon: a local-first discovery method for overlapping communities. ACM SIGKDD 2012
q S Rinzivillo, S Mainardi, F Pezzoni, M Coscia, D Pedreschi, F Giannotti. Discovering the geographical borders of human mobility. KI-Künstliche Intelligenz 26 (3), 2012
q D Pennacchioli, M Coscia, S Rinzivillo, D Pedreschi, F Giannotti. Explaining the Product Range Effect in Purchase Data. IEEE BigData 2013
q B Furletti, L Gabrielli, C Renso, S Rinzivillo. Analysis of GSM Calls Data for Understanding User Mobility Behavior. IEEE BigData 2013
q L Milli, A Monreale, G Rossetti, D Pedreschi, F Giannotti, F Sebastiani. Quantification trees. IEEE ICDM 2013
Vision papers q F Giannotti, D Pedreschi, A Pentland, P Lukowicz, D Kossmann, J Crowley, D Helbing. A planetary nervous system for social mining and collective awareness. The European Physical Journal Special Topics 214 (1), 49-75, 2012 q J van den Hoven, D Helbing, D Pedreschi, J Domingo-Ferrer, F Giannotti. FuturICT The road towards ethical ICT. The European Physical Journal Special Topics 214 (1), 153-181, 2012 q M Batty, KW Axhausen, F Giannotti, A Pozdnoukhov, A Bazzani, M Wachowicz. Smart cities of the future. The European Physical Journal Special Topics 214 (1), 481-518, 2012