Mobility Data Mining and Analytics

1 Sponsored by: Mobility Data Mining and Analytics. 2nd Datasim Summer School, 14th July 2014. S. Rinzivillo, KDD Lab, ISTI-CNR, Pisa, Italy

2 BIG DATA availability What we buy Whom we interact with What we search for Where we go

3

4

5

6 Country-wide mobile phone data

7

8 Social Network Analysis. April-May 2011. July 13, 2014

9 World Cup 2014: "Football is a simple game: 22 men chase a ball for 90 minutes and at the end, the Germans always win" -- Gary Lineker (after Italy

10 BIG DATA availability What we buy Whom we interact with What we search for Where we go

11 Urban Mobility Complexity: vehicles

12 Urban Mobility Complexity: phones

13 Crash Course on MDM How can we manage the complexity coming from huge amount of data?

14 4-stage mobility data mining (layers, from top to bottom)
- semantics
- derived models
- basic trajectory patterns and models
- raw trajectory data

15 Trajectory data
- Mobility of an object is described by a set of trips
- Each trip is a trajectory, i.e. a sequence of time-stamped locations (x_1, y_1, t_1), (x_2, y_2, t_2), ..., (x_5, y_5, t_5)
(Figure: the same trajectory drawn in the X-Y plane and in X-Y-Time space.)
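The definition above can be sketched as a small data structure. This is only an illustrative sketch: the names `Point`, `Trajectory`, `object_id` and `duration` are hypothetical and do not come from the slides or from M-Atlas.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    t: float  # timestamp, e.g. seconds since some reference instant


@dataclass
class Trajectory:
    object_id: str        # hypothetical identifier of the moving object
    points: List[Point]   # the time-stamped locations of one trip

    def __post_init__(self):
        # A trajectory is a sequence ordered by time, so sort on construction.
        self.points = sorted(self.points, key=lambda p: p.t)

    def duration(self) -> float:
        # Time elapsed between the first and the last sample.
        if not self.points:
            return 0.0
        return self.points[-1].t - self.points[0].t


# Samples may arrive out of order; the constructor restores temporal order.
trip = Trajectory("u1", [Point(2.0, 1.0, 20.0), Point(0.0, 0.0, 0.0), Point(1.0, 0.5, 10.0)])
```

The mobility of one object would then be a list of such `Trajectory` values, one per trip.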

16 Basic mobility patterns and models
- T-Cluster: represents a group of similar trajectories
- T-Pattern: represents trajectory segments that visit the same sequence of regions with similar transition times
- T-Flock: represents trajectory segments that move together for a time interval

17 Basic mobility patterns & models: T-clustering
- Trajectories are grouped based on similarity
- Several possible notions of similarity
  - Start/End points
  - Shape of trajectory
  - Shape & time
  - Etc.
Nanni, Pedreschi. Time-focused clustering of trajectories of moving objects. J. of Intelligent Information Systems.
Rinzivillo, Pedreschi, Nanni, Giannotti, Andrienko, Andrienko. Visually-driven analysis of movement data by progressive clustering. J. of Information Visualization, 2008

18 Density Based Clustering K-means Density-based

19 Average Euclidean Distance, Synchronized
- Align points temporally
- Optionally assign penalties to non-matching points

20 Common Destination
- Select the last point P_last for each trajectory
- D(T, T') = Euclidean(P_last, P'_last)

21 Common Origins
- Select the first point P_first for each trajectory
- D(T, T') = Euclidean(P_first, P'_first)
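The two endpoint-based distances above are simple enough to sketch directly. In this sketch a trajectory is assumed to be a list of (x, y) tuples in planar coordinates; the function names are illustrative and are not the names used by M-Atlas.

```python
import math


def euclidean(p, q):
    # Planar Euclidean distance between two (x, y) points.
    return math.hypot(p[0] - q[0], p[1] - q[1])


def common_destination_distance(t1, t2):
    # D(T, T') = Euclidean(P_last, P'_last): only the last points matter.
    return euclidean(t1[-1], t2[-1])


def common_origin_distance(t1, t2):
    # D(T, T') = Euclidean(P_first, P'_first): only the first points matter.
    return euclidean(t1[0], t2[0])


a = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
b = [(5.0, 5.0), (2.0, 2.0), (0.0, 0.0)]
dest = common_destination_distance(a, b)   # distance between (3, 4) and (0, 0)
orig = common_origin_distance(a, b)        # distance between (0, 0) and (5, 5)
```

Because both functions look at a single point per trajectory, they are cheap to evaluate, which is why such measures are natural candidates for a first, coarse clustering pass.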

22 Route Similarity
- Alignment of points, multiple matches
- Average Euclidean distance
- Penalties for non-matching initial points (no penalties for destinations)

23 Process Overview
- Dataset -> clustering with a simple and very efficient distance measure -> Clusters + Noise
- Clusters -> clustering with more selective and particular distance functions (or more restrictive parameters) -> Subclusters + Noise
- Subclusters -> Knowledge

24 Basic mobility patterns & models: T-pattern
- T-Pattern: spatial information (the regions visited) plus temporal information (the transition times), e.g. Area A --(Δt = 5 minutes)--> Area B --(Δt = 35 minutes)--> Area C
- Variations: absolute time, visit duration, distance traveled, speed, sensor/user provided measures (temp., pressure, ratings, ...)
Giannotti, Nanni, Pedreschi, Pinelli. Trajectory pattern mining. Proc. ACM SIGKDD 2007

25 Basic mobility patterns & models: T-Flocks
- Group of objects that move together (close to each other) for a time interval
M. Wachowicz, R. Ong, C. Renso, M. Nanni: Finding moving flock patterns among pedestrians through collective coherence. International Journal of Geographical Information Science 25(11) (2011)

26 Derived patterns and models
- Combination & refinement of basic patterns and models
  - Individual Mobility Profile: routines consistently followed by a single moving object
  - T-PTree: predictive tree built by combining T-Patterns

27 Derived patterns and models: mobility profiles
- User history: an ordered sequence of spatio-temporal points.
- Trips construction: cut the user history when a stop is detected. Parameters: stops spatial threshold, stops temporal threshold.
- Grouping: perform a density-based clustering equipped with a spatio-temporal distance function. Parameters: spatial tolerance, temporal tolerance, spatio-temporal distance.
- Pruning: groups with a small number of trips are pruned. Parameter: support threshold.
- Profile extraction: the medoid of each group becomes one of the user's routines, and the whole set of routines becomes the user's mobility profile.
Trasarti, Pinelli, Nanni, Giannotti. Mining mobility user profiles for car pooling. ACM SIGKDD 2011
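The trips-construction step can be illustrated with a minimal sketch. It assumes a simplified stop rule that is not taken from the cited paper: a stop is detected between two consecutive samples when the object has moved at most `space_thr` while at least `time_thr` has elapsed. The function name and the sample data are hypothetical.

```python
import math


def split_into_trips(history, space_thr, time_thr):
    """Cut a user history into trips.

    history: list of (x, y, t) tuples sorted by t.
    A stop (assumed rule) occurs between consecutive samples that are at most
    space_thr apart in space and at least time_thr apart in time.
    """
    trips, current = [], []
    for point in history:
        if current:
            px, py, pt = current[-1]
            x, y, t = point
            stopped = math.hypot(x - px, y - py) <= space_thr and (t - pt) >= time_thr
            if stopped:
                trips.append(current)   # close the trip that ended at the stop
                current = []
        current.append(point)
    if current:
        trips.append(current)
    # A trip with a single sample carries no movement, so it is dropped.
    return [trip for trip in trips if len(trip) > 1]


# Two drives separated by a long pause near x = 200 (units: metres, seconds).
history = [(0, 0, 0), (100, 0, 60), (200, 0, 120),
           (201, 0, 3000), (300, 0, 3060), (400, 0, 3120)]
trips = split_into_trips(history, space_thr=50, time_thr=1200)
```

The grouping, pruning and medoid-extraction steps would then operate on the resulting list of trips.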

28 Derived patterns and models: T-Prediction Tree
- Rule-based prediction model
- Each T-Pattern is used as a case
- Tree = combination / simplification of a set of T-Patterns
Monreale, Pinelli, Trasarti, Giannotti. Where Next: a predictor on Trajectory pattern mining. Proc. ACM SIGKDD 2009

29 Derived patterns and models: T-PTree
- Example: compare the actual trajectory against the T-PTree
- Spatial and temporal similarity are used to choose the best rule
(Figure labels: A, B, C, D, E.)

30 Semantic Annotation Semantic trajectories: translate (x,y,t) trajectories to sequences of events with a semantics Semantic enrichment: tag and classify trajectories or patterns based on domain knowledge or mined information

31 Semantic trajectories
- First transform a (geometric) trajectory into a semantic representation, then apply data mining.
- A semantic trajectory is represented as a sequence of stops (places where objects stay still) & moves (trajectory segments where objects change position)
- Example: Tr1 = < Hotel [21-08], Monument [9-13], Restaurant [14-16] >

32 Mobility Diaries
- Data-driven diaries
- Describe daily mobility routines by means of a set of semantic trajectories...

33 ... Mobility Diaries

34 Mobility Diaries
Figure 1: (a) The top two eigenbehaviors for Subject 4 of the Reality Mining dataset; the lighter the color, the higher the probability (image taken from [5]). (b) Exemplary LDA-topics extracted from the Reality Mining dataset (image taken from [6]).
Applying PCA or LDA to a set of these arrays allows extracting some low-dimensional latent variables (eigenvectors and LDA-topics respectively) representing underlying patterns in the data, and offering conditional probability ...
Ferrari and Mamei. Classification & Prediction of Whereabouts patterns from Reality Mining Data Sets. Pervasive & Mobile Computing Journal, Dec.

35 M-Atlas system Download from:

36 M-Atlas input q M-Atlas: An atlas for urban mobility behaviors. A framework to query, analyze and navigate the results on mobility data

37 M-Atlas platform q A tool kit to extract, store, combine different kinds of models to build mobility knowledge discovery processes.

38 M-Atlas System Centralized database which contains all the data, patterns and models. It is possible to extend the system with new algorithms and new data, pattern or model types.

39 Objects taxonomy in M-Atlas
Practically, the system adds new object-relational types to the database in order to represent the new types of data, patterns and models. The advantage of having an object-relational representation is threefold: (i) it allows the definition of complex data such as lists and trees, ...
We distinguish between models and patterns: a pattern is a representation of a local property that holds over a sub-group of mobility data, e.g., a flock of trajectories; on the other hand, a model is a representation of a global property that holds over an entire dataset: accordingly, a model is either a global aggregate (e.g., speed distribution in a trajectory dataset) or a collection of patterns (e.g., the clustering that partitions an entire dataset into separate clusters).
Fig. 8 The M-Atlas type hierarchy. M-Model, M-Pattern and Data are the basic types of data. We can notice the relationship between M-Models and M-Patterns. For example, a T-Clustering model is represented by a set of T-Cluster patterns, while a T-PTree model is an aggregation of T-Patterns.
Types shown in the figure: Spatial Object, Temporal Object, Moving Object, Data Object; M-Model (T-Reachability, T-Clustering, T-ODMatrix, T-PTree); M-Pattern (T-Pattern, T-Flow, T-Cluster, T-Flock).

40 DMQL: the types
Data constructor (the value of the last parameter is truncated in the transcript):
    CREATE DATA Travels BUILDING MOVING_POINTS
    FROM (SELECT userid, lon, lat, datetime FROM RawData ORDER BY userid, datetime)
    SET MOVING_POINT.MAX_SPACE_GAP = 0.2 AND MOVING_POINT.MAX_TIME_GAP ...
M-Pattern types. A mobility pattern, M-Pattern in short, represents the common behavior of a (sub-)group of trajectories, obtained as a result of a data mining algorithm. The types of M-Patterns currently supported by M-Atlas are shown in Figure 9.
Fig. 9 M-Pattern types: (a) T-Cluster, (b) T-Pattern, (c) T-Flock, (d) T-Flow
- T-Cluster. A T-Cluster (Figure 9(a)) is defined as a set S = {(t_1, l), (t_2, l), ...} of labelled trajectories, which share the same membership tag l. The trajectories of a T-Cluster are grouped on the basis of their similarity according to a specified similarity function, chosen from a repertoire of possible choices.
- T-Pattern. It is represented as tp = (R, T, s), where R = < r_0, ..., r_k > is a sequence of regions, T = < t_1, ..., t_k > is a sequence of relative time intervals t_j = [t_j^s, t_j^e] associated to each region, and s is the support of tp, i.e., the number of trajectories that are compatible with tp in space and time. Informally, a T-Pattern can be represented as r_0 --t_1--> r_1 ... --t_k--> r_k. Originally introduced in [17], a T-Pattern (Figure 9(b)) is a concise description of frequent behaviors, in terms of both space (i.e., the regions of space visited during movements) and time (i.e., the duration of movements).
- T-Flock. A T-Flock f = (I, r, b) represents a spatio-temporal coincidence of a group of moving points, where I = [t_min, t_max] is the time interval of the coincidence, b is the base moving point and r is the spatial buffer around b which is used to determine the coincidence. This spatio-temporal coincidence defines a common behavior of the people which move ...
- T-Flow. A T-Flow tf = < R_1, R_2, w > represents a flow of w trajectories which move from region R_1 to region R_2 (Figure 9(d)).
M-Model types. Mobility models, M-Models in short, are the global models extracted by a data mining algorithm, where the adjective global indicates the fact that each such model describes the entire input dataset. Figure 10 illustrates some of the available M-Models in M-Atlas; other M-Models are simply the entire collection of T-Patterns, T-Clusters and T-Flocks mined over a trajectory dataset.
Fig. 10 M-Model types: (a) Reachability plot, (b) T-PTree and (c) T-ODMatrix.
- Reachability plot: a histogram of distances between trajectories, obtained considering a specific distance function (Figure 10(a)). More precisely, it is a sequence of pairs Rp = < (t_1, d_1), ..., (t_n, d_n) > where t_j is a trajectory and d_j is the distance between t_j and t_{j+1}, where t_{j+1} is the nearest neighbor of t_j which does not occur in {t_1, ..., t_j}. Using a threshold for distance, the reachability plot identifies a set of T-Clusters representing the partition of the whole dataset into labelled groups of similar trajectories.
- T-PTree. A T-Pattern Tree, T-PTree in short, is a compact representation of a set of T-Patterns (Figure 10(b)). It is a prefix tree PT = {root, N, E}, where N is the set of nodes of the tree, E is the set of edges and root is the root of the tree. Each node n_i = {r, supp} ...
Model constructors. A model is built by the execution of a data mining method with a specified parameter setting; M-Atlas has a constructor for each method in its data mining library. A mining constructor query is the following, which generates a set of clusters under specific parameters:
    CREATE MODEL ClusteringTable AS MINE T-CLUSTERING
    FROM (Select t.id, t.trajobj from TrajectoryTable t)
    SET T-CLUSTERING.FUNCTION = ROUTE_SIMILARITY AND
        T-CLUSTERING.EPS = 100 AND
        T-CLUSTERING.MIN_PTS = 20
3.2 Spatio-temporal query primitives. The querying primitives over data, models and patterns are summ...

41 The user interface
- The process tree, which organizes the analyses done. Each node has a type: Trajectories, Map, Clustering, Flocks, etc. Each node is described by the chain of DMQL queries executed from the root.
- The map, loaded from OpenStreetMap and composed of different layers.
- Pre-built tools. Each one performs a set of DMQL queries on the selected node. Each tool has a set of parameters.
- Contextual menu: each node type has different options and tools.
- Additional panels for navigation or pattern selection.

42 Mobility Data Mining process as a DMQL query
- Build the origin-destination matrix:
    CREATE MODEL MilanODMatrix AS MINE ODMATRIX
    FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t),
         (SELECT orig.id, orig.area FROM MunicipalityTable orig),
         (SELECT dest.id, dest.area FROM MunicipalityTable dest)
- Select the trajectories from the center to the north-east suburbs:
    CREATE RELATION CenterToNESuburbTrajectories USING ENTAIL
    FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t, MilanODMatrix m
          WHERE m.origin = Milan AND m.destination IN (Monza,..., Brugherio))
- Cluster the selected trajectories:
    CREATE MODEL ClusteringTable AS MINE T-CLUSTERING
    FROM (Select t.id, t.trajectory from CenterToNESuburbTrajectories t)
    SET T-CLUSTERING.FUNCTION = ROUTE_SIMILARITY AND
        T-CLUSTERING.EPS = 400 AND
        T-CLUSTERING.MIN_PTS = 5
- Distribute the chosen clusters over time periods:
    CREATE RELATION DistributionCluster USING CONTAINS
    FROM (SELECT t.id, t.trajectory, c.cid FROM ClusteringTable c, TrajectoryTable t WHERE c.tid=t.id),
         (SELECT * FROM Periods p)
    WHERE cid IN (0,2,3)

43 Mobility Atlas of a City Understanding urban human mobility

44 The (GeoP)KDD process Mobile phone data, GPS tracks End user Mobility Patterns Mobility manager Mobility Data Mining Mobility Data Raw data

45 Sensing the movement: several data sources available

46 GSM data
- Mobile cellular networks handle information about the positioning of mobile terminals
  - CDR (Call Data Records): call logs (tower position, time, duration, ...)
  - Handover data: time of tower transition
- More sophisticated network measurements allow tracking of all active (calling) handsets

47 GPS tracks
- Onboard navigation devices send GPS tracks to central servers
    Ide;Time;Lat;Lon;Height;Course;Speed;PDOP;State;NSat
    8;22/03/07 08:51:52; ; ; 67.6;345.4;21.817;3.8;1808;4
    8;22/03/07 08:51:56; ; ; 68.4;35.6;14.223;3.8;1808;4
    8;22/03/07 08:51:59; ; ; 68.3;112.7;25.298;3.8;1808;4
    8;22/03/07 08:52:03; ; ; 68.8;119.8;32.447;3.8;1808;4
    8;22/03/07 08:52:06; ; ; 68.1;124.1;30.058;3.8;1808;4
    8;22/03/07 08:52:09; ; ; 67.9;117.7;34.003;3.8;1808;4
    8;22/03/07 08:52:12; ; ; 66.9;117.5;37.151;3.8;1808;4
    8;22/03/07 08:52:15; ; ; 67.0;99.2;39.188;3.8;1808;4
    8;22/03/07 08:52:18; ; ; 68.8;90.6;41.170;3.8;1808;4
    8;22/03/07 08:52:21; ; ; 71.1;82.0;35.058;3.8;1808;4
    8;22/03/07 08:52:24; ; ; 68.6;117.1;11.371;3.8;1808;4
- Sampling rate: 30 secs
- Spatial precision: 10 m

48 Road side sensors
- Measure the flow of a specific road arc
  - Laser-based sensors
  - Inductive loops
  - Traffic cameras

49 Other data sources
- Social web services
  - Flickr
  - Foursquare
  - Gowalla
  - Twitter
- Presence estimation
  - Hotel statistics
  - Airport departures and arrivals
- Bus and public transportation
- Park usage
- Weather conditions

50 Dimensions to explore
- Space
  - Administrative borders (e.g.: city)
  - Distance travelled: how much a person is travelling
- Individual
  - Preferred locations
  - EigenMobility
- Time
  - Hour of day
  - Day of week
  - Weekdays/weekends

51 A small city: Pisa

52 First dimension: space. Travel length distribution

53 Travel length on the map

54 Exploring Origin and Destinations (origin-destination matrix among Pisa, Firenze, Lucca, Livorno and Siena, with row sums; selectable by origin, by destination and by day, 26 to 30 January, or over all times)

55 Exploring Origins and Destinations

56 Exploring the origins of trips (distance bands: 0-5 km, 5-15 km, ..., > 150 km)

57 Exploring origins of trips > 150km 19 trips

58 Second dimension: time. When do people move to Pisa?

59 Let's focus at city level (distance bands: 0-5 km, 5-15 km)

60 Trips segmented by similarity

61 Explore clusters: Florence

62 Explore clusters: A1

63 Explore clusters: A12

64 Explore Clusters: Valdera

65 Explore clusters: Versilia

66 Trip segmentation by time

67 Trips Segmented by Time: from 5 to 8

68 Discover traffic jams

69 Aggregate trips by common destinations

70 Industry: Saint Gobain

71 Industry: Saint Gobain

72 Residential Area: I Passi

73 Residential Area: I Passi

74 Residential vs Industrial

75 Services: Montacchiello

76 Services: Montacchiello

77 Extracting travellers profiles - Analysis focused on the single individual - Find his/her systematic mobility User trips Mobility profile Routines

78 Services: Montacchiello (Profiles)

79 Impact of systematic mobility on access patterns

80 What-if scenarios

81 Service: Montacchiello (Car Pooling?)
- Traj Blue, DT: 06:46:53
- Traj Red, DT: 11:52:06
- Traj Green, DT: 06:51:41
- Blue can give a ride to Green

82 Application: Car pooling Pro-active suggestions of sharing rides opportunities without the need for the user to explicitly specify the trips of interest. Matching two routines: Mobility profile share-ability:

83 Communities of users

84 Networks as a mining tool S. Rinzivillo, S. Mainardi, F. Pezzoni, M. Coscia, D. Pedreschi, F. Giannotti Discovering the Geographical Borders of Human Mobility KI - Künstliche Intelligenz, 2012.

85 Mobility coverages

86 Step 1: spatial regions

87 Step 2: evaluate flows among regions

88 Step 3: forget geography

89 Step 4: perform community detection

90 Step 4: perform community detection

91 Step 5: map back to geography

92 Step 6: draw borders

93 Final result

94 Final result: compare with municipality borders

95 Borders in different time periods
- Only weekday movements: similar to global clustering, strong influence of systematic movements
- Only weekend movements: strong fragmentation, the influence of systematic movements (home-work) is missing

96 Borders at regional scale

97 Final results
Fig. 7: The resulting clusters obtained with different spatial granularities (cell sizes of 500 m, 1,000 m, 2,000 m, 5,000 m, 10,000 m and 20,000 m).
(Excerpt: "... topology analysis of the networks performed in Section IV, that identified the most promising cell sizes at values smaller ...")

98 Comparison with the new provinces

99 Explore borders by time
- Use temporal projections to extract mobility networks
- Identified three main periods
  - Week days
  - Week ends
  - Whole week
- Having GPS data extending over 4 weeks, we extracted 12 distinct networks, named week0, weekday0, weekend0, week1, and so on
Coscia, M., Rinzivillo, S., Giannotti, F. and Pedreschi, D., Optimal Spatial Resolution for the Analysis of Human Mobility. In ASONAM, 2012.

100 Degree distribution by time (plot of p(d) against degree d, one series per network: Weekdays1-4, Weekend1-4 and the weekly networks)

101 Network properties (by day) (plots of the number of nodes and edges and of the number of connected components, per day, from May 2nd to May 29th)

102 Borders quality

103 Semantic Enrichment

104 NetMob 2013 MP4-A Project: Mobility Planning For Africa Mirco Nanni, Roberto Trasarti, Barbara Furletti, Lorenzo Gabrielli Peter Van Der Mede, Joost De Bruijn, Erik De Romph, Gerard Bruil

105 The Challenge
- Incompleteness issue
  - Call Detail Records describe the location of users only during activity (calls, messages)
  - Most individual mobility might be invisible
- Lack of semantics
  - No information about activities and purpose
- Spatial uncertainty issue
  - Location described in terms of cells having dynamic and sometimes large extent

106 The approach (summary)
- Analyze raw GSM data to
  - infer systematic mobility of individuals
- Build origin-destination matrices
  - Describe (expected) flows between areas
- Build a transportation model
  - Assign the O/D matrix to the OSM road network through the OmniTRANS system

107 Systematic mobility
- A single trace of an individual can be poorly informative about his/her movements
(Figure: one daily trace over time through locations H, A, B, W, C.)

108 Systematic mobility
- Yet, several daily traces of the same individual might allow identifying regular places
(Figure: several daily traces over locations H, A, B, W, C.)

109 Systematic mobility
- Yet, several daily traces of the same individual might allow identifying regular places
(Figure: the same traces, with H and W recurring across days.)

110 Systematic mobility
- Yet, several daily traces of the same individual might allow identifying regular places and trips
(Figure: the same traces, with the recurring trips between H and W.)

111 Systematic mobility
- The whole individual mobility is then summarized by its systematic movements: the morning routine (H to W) and the afternoon routine (W to H)
- They will be used as the typical daily schedule of the individual

112 Systematic O/D matrix
- Combine the ten 2-week datasets into one
- For each user, extract the significant movements between locations L1 and L2
- Aggregate (individual) systematic movements into (collective) systematic flows
- Examples: outgoing traffic, incoming traffic
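The aggregation step can be sketched as a simple count. The sketch assumes that each user's systematic movements have already been extracted as (origin area, destination area) pairs and that each distinct pair contributes one unit of flow per user; both the input format and this counting rule are assumptions made for illustration, not details given in the slides.

```python
from collections import Counter


def systematic_od_matrix(user_movements):
    """Aggregate individual systematic movements into collective flows.

    user_movements: dict mapping a user id to a list of
    (origin_area, destination_area) pairs.
    Returns a Counter keyed by (origin_area, destination_area).
    """
    flows = Counter()
    for movements in user_movements.values():
        # set() ensures a user's routine is counted once (assumed rule).
        for origin, destination in set(movements):
            flows[(origin, destination)] += 1
    return flows


user_movements = {
    "u1": [("A", "B"), ("B", "A")],      # morning and afternoon routine
    "u2": [("A", "B")],
    "u3": [("C", "B"), ("C", "B")],      # duplicate pair, counted once
}
od = systematic_od_matrix(user_movements)
```

Outgoing traffic of an area is then the sum over one row of this matrix, and incoming traffic the sum over one column.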

113

114

115

116 Mobile phone socio-meters
Analyze individual call habits to recognize profiles
- Residents
- Commuters
- Visitors/Tourists

117 Call Habit Profiles
- Week: working days & weekend
- Time slots: 0:00-7:59, 8:00-18:59, 19:00-23:59
- Result: the user's call habit profile

118 Resident profile

119 Resident profile Commuter profile

120 Resident profile Commuter profile Visitor profile Night visitors Daylight visitors

121 User profile quantification Resident profile Commuter profile Visitor profile

122 Sponsored by: Investigating semantic regularity of human mobility lifestyle
Vinicius Monteiro de Lira, Federal University of Pernambuco, Brazil
Valeria Cesario Times, Federal University of Pernambuco, Brazil
Patricia Cabral Tedesco, Federal University of Pernambuco, Brazil
Salvatore Rinzivillo, ISTI-CNR, Pisa, Italy
Chiara Renso, ISTI-CNR, Pisa, Italy
18th International Database Engineering & Applications Symposium, IDEAS '14, Porto, Portugal

123 INTRODUCTION
- The appearance and wide distribution of position-enabled personal devices boosted the study of the mobility behavior of individuals based on crowdsourced data.
- When these positioning data are enriched with semantic information (i.e. the place visited) we have semantic trajectories.
- The semantics helps in understanding human dynamics.

124 About Regularity
- We study the tendency of mobile individuals to be regular or irregular when choosing the places and the time to perform some activities: semantic (or activity-based) regularity
- Definition of spatial and temporal entropy as a measure of the semantic regularity of users, computed from crowdsensed data
- Values range from 0 to 1, where 1 means highest regularity and 0 means lowest regularity or no regularity

125 Why study semantic regularity?
- Regularity profiles can characterize one specific aspect of the user lifestyle
- We give a quantitative measure of the regularity habits of the people under observation
- This can be useful in: recommendation systems, carpooling, advertisement

126 METHODOLOGY
The semantic regularity behavior is measured according to two dimensions:
- Spatial: how much a user tends to visit the same places to perform a given activity.
- Temporal: the regularity of the user in performing an activity in a preferred temporal interval.

127 METHODOLOGY
Three phases:
- (i) Data collection of users' visits to Points of Interest (POIs)
- (ii) Estimation of the regularity measures
- (iii) Extraction of the semantic regularity profiles

128 Semantic regularity - Example
We associate a category of place to an activity with a static mapping (Visits dataset):
- University: Work/Study
- Gym: Leisure
- Restaurant: Eating

129 Visits and frequency distributions
The Visits dataset provides the mobility information to associate a user u with each POI she visited: < VisitID; UserID; poi_id; poi_cat; timestamp >
Formally, for a POI p of category C we define the spatial relative frequency distribution (SRFD) of u as:
$$\mathrm{SRFD}(u, C, p) = P(u \text{ in } p \mid C) = \frac{\#\text{visits to } p}{\#\text{visits to } C}$$
Analogously, for a time interval t and a category C we define the temporal relative frequency distribution (TRFD) of u as:
$$\mathrm{TRFD}(u, C, t) = P(u \text{ in } t \mid C) = \frac{\#\text{visits in } t}{\#\text{visits to } C}$$

130 The Entropy measures
Given a user u and a place category C, the Spatial Entropy (SH) is:
$$\mathrm{SH}(u, C) = -\sum_{p \in C} \mathrm{SRFD}(u, C, p) \log \mathrm{SRFD}(u, C, p)$$
And, analogously, given the set I of time intervals, the Temporal Entropy (TH) is:
$$\mathrm{TH}(u, C) = -\sum_{t \in I} \mathrm{TRFD}(u, C, t) \log \mathrm{TRFD}(u, C, t)$$
The Spatial Maximum Entropy (SMH) for each category, where |C| is the number of POIs in C:
$$\mathrm{SMH}(C) = \log |C|$$
The Temporal Maximum Entropy (TMH) for each category, where |I| is the number of time intervals:
$$\mathrm{TMH}(C) = \log |I|$$

131 Semantic regularity
- Given a user u and a category C, SSR(u, C) denotes the Semantic Spatial Regularity for C.
- Given a user u, a set of intervals I and a category C, STR(u, C) denotes the Semantic Temporal Regularity for C.
- A semantic regularity profile for a user u consists of a set of tuples < Ci, SSR(u,Ci), STR(u,Ci) > for all categories of places (activities) Ci in C1, C2, ..., Cn.
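The regularity formulas themselves did not survive in this transcript. The sketch below therefore rests on an assumption: that regularity is one minus the entropy normalized by its maximum, which matches the stated range (0 to 1, with 1 meaning highest regularity) and the entropy and maximum-entropy definitions of the previous slide. All function names are hypothetical.

```python
import math
from collections import Counter


def relative_frequencies(items):
    # Relative frequency of each distinct item (a POI id or a time interval).
    counts = Counter(items)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def entropy(probabilities):
    # Shannon entropy with natural logarithm; zero-probability terms are skipped.
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def regularity(observed_items, n_possible):
    """Assumed form: 1 - H / H_max, with H_max = log(n_possible).

    Spatial case:  observed_items are the POIs of category C visited by the
                   user, n_possible is the number of POIs in C.
    Temporal case: observed_items are the time intervals of the visits,
                   n_possible is the number of intervals in I.
    """
    if n_possible <= 1:
        return 1.0  # a single possible outcome leaves no room for irregularity
    return 1.0 - entropy(relative_frequencies(observed_items)) / math.log(n_possible)


# A user who always goes to the same gym, but at evenly spread times of day.
gym_pois = ["gym_a", "gym_a", "gym_a", "gym_a"]
gym_slots = ["morning", "afternoon", "evening", "night"]
ssr = regularity(gym_pois, n_possible=4)    # spatially regular
str_ = regularity(gym_slots, n_possible=4)  # temporally irregular
```

Under this assumed definition the example user is fully regular in space and fully irregular in time, the same qualitative situation as the gym example on the next slide.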

132 Example
Example of Semantic Spatial Regularity and Semantic Temporal Regularity for Gyms: we compute the Spatial Entropy (SH) and the Temporal Entropy (TH). Based on this we can see that the spatial regularity measure for the gym (SSR) is high, while the temporal regularity (STR) is low.

133 EXPERIMENTS
- We tested our methodology using a dataset of check-ins generated from a Location-based Social Network (LBSN), called Brightkite.
- The dataset has a total of check-ins performed by 2806 users around the world between March 22nd, 2008 and October 18th,
- Check-ins: user identification, the geographic coordinates and the time instant
- Foursquare API to annotate semantically the places where users performed the check-ins.
- 13 main categories of POIs mapped to most common activities

134 EXPERIMENTS - Restaurants
- Restaurant category (plot ranging from most REGULAR to most IRREGULAR)
- Most people tend to change the place when they go eating and also the time when they go.
- Most of the users are irregular in space and time

135 EXPERIMENTS - University
- University category (plot ranging from most REGULAR to most IRREGULAR)
- We clearly notice a very regular spatial behavior
- Most of the users are distributed close to the value 1 (more regular) on the spatial dimension

136 EXPERIMENTS
(Figure: quadrants TL, TR, BL, BR, ranging from high regularity to high irregularity.)

137 MAPMOLTY tool
- MAPMOLTY computes a number of measures, called loyalty indicators, to summarize the loyalty level of each POI from different categories.
- The application is built upon the map to ease navigation and visualization in the area of interest.
Vinicius de Lira, Chiara Renso, Salvatore Rinzivillo, Valeria Cesario Times and Patricia Tedesco. MAPMOLTY: a web tool for discovering place loyalty based on mobile crowdsource data. Demo paper at ICWE 2014

138 Collect Movements from the Crowd q Investigate approaches to mine urban mobility patterns and anomalies by analyzing socially created trajectories: - Extract mobility from geo-enabled social media - Enrich with contextual/semantic information to extract more insights about the nature of the movements.

139 Twitter Data
- Microblogging platform
- Users may send short messages (up to 140 characters) on what is around them
- Georeference of tweets
- 600k tweets (300k geotagged)
- 33k users
- 8 weeks (May-June 2012)

140 How to build Tweet-trajectories
- Aggregate consecutive tweets according to a spatio-temporal threshold
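One possible reading of this aggregation rule is sketched below. It assumes that a user's tweets are chained into the same trajectory while the time gap stays within a temporal threshold, and that tweets closer than a spatial threshold to the previously kept point are skipped as carrying no movement. The default values reuse the 75 min and 100 m figures that appear later in the deck; the exact rule, the planar coordinates in metres and the function name are assumptions.

```python
import math


def build_trajectories(tweets, max_gap_s=75 * 60, min_dist_m=100.0):
    """Chain one user's geotagged tweets into trajectories.

    tweets: list of (x_m, y_m, t_s) tuples sorted by time, with planar
    coordinates in metres (an assumption; real data would need projection).
    """
    trajectories, current = [], []
    for x, y, t in tweets:
        # Temporal threshold: the gap is measured from the last kept point.
        if current and t - current[-1][2] > max_gap_s:
            trajectories.append(current)
            current = []
        # Spatial threshold: skip tweets that show no real displacement.
        if current and math.hypot(x - current[-1][0], y - current[-1][1]) < min_dist_m:
            continue
        current.append((x, y, t))
    if current:
        trajectories.append(current)
    # A trajectory needs at least two points to describe a movement.
    return [tr for tr in trajectories if len(tr) >= 2]


tweets = [(0, 0, 0), (10, 0, 600), (500, 0, 1800), (1500, 0, 3600),
          (1500, 0, 20000), (3000, 0, 21000)]
trajectories = build_trajectories(tweets)
```

In the sample data the second tweet is dropped for being only 10 m away, and the long silence before the fifth tweet starts a new trajectory.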

141 Sampling rate distribution of tweets

142 Trajectory Extraction

143 Trajectory Extraction

144 Origin Destination Analysis

145 Origin Destination Analysis: relevant fluxes From Airport From Sagrada Familia

146 Semantic Enrichment

147 Foursquare
- User contributed timestamped position
- 9 top-level categories
  - Nightlife Spot
  - Travel & Transport
  - Outdoors & Recreation
  - Shop & Service
  - College & University
  - Food
  - Arts & Entertainment
  - News
  - Residence
  - Professional & Other Places

148 Semantic trajectory mining: MWC2012 semantic trajectories
- Dataset: 9689 trajectories built (75 min / 100 m) from geo-located tweets of Barcelona during the week of the Mobile World Congress 2012 (MWC2012), the week before and the week after, and semantically enriched by classifying them as performed by tourists or locals and by associating the most likely Foursquare venue.

149 Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories
- Start of trajectory: Foursquare place: Burger King, L'Hospitalet; Foursquare top category: Food
- End of trajectory: Foursquare place: 22@, Glories; Foursquare top category: Professional & Other Places

150 Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories

151 Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories (panels: week before, MWC2012 week, week after)

152 Join Semantics with Spatial Flow
Trips by category entering Sants Montjuic (chart: # of trips per week for Food, Arts & Entertainment, Outdoors & Recreation, Professional & Other Places, Travel & Transport, Shop & Service, Nightlife Spot)

153 Join Semantics with Spatial Flow
Trips exiting from Sants Montjuic by category (chart: # of trips per week for Food, Arts & Entertainment, Outdoors & Recreation, Professional & Other Places, Travel & Transport, Shop & Service, Nightlife Spot, College & University)

154 Sponsored by: Where have you been today? Annotating trajectories with DayTag S. Rinzivillo, F. Siqueira, L. Gabrielli, C. Renso, V. Bogorny SSTD 2013, Demo Paper, Monaco

155 Sensing People Behavior: Surveys
- Cons
  - Low spatial precision
  - Low temporal accuracy
  - Limited in time (usually one or two days)
  - Underestimation of short stops (e.g. ATM)
- Pros
  - Semantically rich
  - User-view of movement
  - Motivation of the movement

156 Sensing People Behavior: GPS
- Cons
  - No semantic information
  - Difficult for the user to reconstruct movement motivations
- Pros
  - High spatial precision
  - High temporal accuracy
  - Unlimited time of track
  - Precise reconstruction of movement dynamics (acceleration, route, speed)
  - Low cost technology

157 DayTag

158 DayTag: Anatomy

159 DayTag: Timeline

160 DayTag: Spatial Reference

161 DayTag: Semantic Information

162 Change traffic with your TAGs. An initiative of: In collaboration with: tagmyday.isti.cnr.it

163 Join us: Move, Tag, Send

164 Personal Data Store

165 Activity Distribution: incoming flow to Calci from Pisa

166 Atlas of Urban Mobility

167 Atlas of Urban Mobility

168 Pisa Incoming Traffic

169 Pisa Incoming Traffic

170 Trip distribution per day Pisa S. Giuliano Cascina

171 From DATA to KNOWLEDGE
- Data: demographic data, transport data, movement data, geographic data
- Models: T-Clustering, T-Patterns
- Validation
- Forecasts

172 Deployment of a model
Components: Data Integration and Semantic Enrichment Service; Continuously Sensed indicator; Periodically Sensed indicator; Dashboard; Validation.
Example of a deployed model:
    CREATE MODEL MilanODMatrix AS MINE ODMATRIX
    FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t),
         (SELECT orig.id, orig.area FROM MunicipalityTable orig),
         (SELECT dest.id, dest.area FROM MunicipalityTable dest)

173 Privacy by Design in Data Mining

174 7 Billion October 2011

175

176 The dark side: Privacy Risks 176 ü Big data of human activity contain personal sensitive information ü Opportunities of discovering knowledge by analytical and data mining tools increase hand in hand with the risks of privacy violation ü An important question: May data publishing and mining violate individual privacy?

177 De-identified User Trajectory 177 ü ü Human data may reveal many facets of the private life Privacy protection is increasingly difficult and it cannot simply be accomplished by de-identification ü ü Color darkness of each region is proportional to the number of different visits Discovering persons living in that home and working in that company we can identify the user

178 How can we guarantee privacy protection in Data Mining? The Privacy by Design Paradigm

179 Privacy by Design Paradigm
- Design frameworks that counter the undesirable and unlawful effects of privacy violation without obstructing the knowledge discovery opportunities of data mining technologies
- There is a natural trade-off between privacy quantification and data utility
- Our idea: Privacy by Design in Data Mining, the philosophy and approach of embedding privacy into the design, operation and management of information processing technologies and systems

180 Privacy by Design in Data Mining
- The framework is designed with assumptions about:
  - the sensitive data that are the subject of the analysis
  - the attack model, i.e., the knowledge and purpose of a malicious party that wants to discover the sensitive data
  - the target analytical questions that are to be answered with the data
- Design a privacy-preserving framework able to:
  - transform the data into an anonymous version with a quantifiable privacy guarantee
  - guarantee that the analytical questions can be answered correctly, within a quantifiable approximation that specifies the data utility

181 Our Frameworks
- Privacy by Design for Data Publishing
  - Trajectory anonymization by spatial generalization
  - Trajectory anonymization by semantic generalization
- Privacy by Design for Data Mining Outsourcing
  - Privacy-preserving mining of association rules from outsourced transaction databases
- Privacy by Design for GSM User Profiles
- Privacy by Design in Distributed Movement Data

182 Privacy by Design for Movement Data Publication A. Monreale, G. Andrienko, N. Andrienko, F. Giannotti, D. Pedreschi, S. Rinzivillo, S. Wrobel. Movement Data Anonymity through Generalization. Journal of Transactions on Data Privacy

183 Privacy-Preserving Framework
- Anonymization of movement data while preserving clustering
- Trajectory linking attack: the attacker
  - knows some points of a given trajectory
  - and wants to infer the whole trajectory
- Countermeasure: a method based on
  - spatial generalization of trajectories
  - k-anonymization of trajectories

184 Trajectory Generalization
Given a trajectory dataset:
1. Partition the territory into Voronoi cells
2. Transform the trajectories into sequences of cells

185 Partition of the territory
- Characteristic points extraction: (1) starts, (2) ends, (3) points of significant turns, (4) points of significant stops and representative points from long straight segments
- Spatial clusters: group the extracted points with the desired spatial extent (MaxRadius), which defines the degree of generalization
- Voronoi tessellation: partition the territory into Voronoi cells using the centroids of the spatial clusters as generating points

186 Generation of trajectories
- Divide the trajectories into segments that link Voronoi cells
- For each trajectory:
  - the area a_1 containing its first point p_1 is found
  - the following points are checked in order
  - if a point p_i is not contained in a_1, the area a_2 containing it is found, and so on
- Generalized trajectory: the sequence of areas is replaced by the sequence of the areas' centroids
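The generalization step can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it relies on the fact that a point lies in the Voronoi cell of its nearest generating point, so no explicit tessellation is built. The function names and the `(x, y)` point format are assumptions, and timestamps are ignored.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point, i.e. the Voronoi cell containing it."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def generalize(trajectory, centroids):
    """Map a trajectory of (x, y) points to the sequence of visited cell centroids.

    Consecutive points falling in the same cell collapse into one entry,
    so the output only records transitions between cells.
    """
    cells = []
    for point in trajectory:
        cell = nearest_centroid(point, centroids)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return [centroids[cell] for cell in cells]


centroids = [(0, 0), (10, 0), (10, 10)]
trajectory = [(1, 1), (2, 0), (8, 1), (9, 2), (10, 9)]
generalized = generalize(trajectory, centroids)
```

A linear scan over the centroids is enough for a sketch; with many cells a spatial index would be the usual replacement.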

187 Generalization vs k-anonymity
- Generalization alone may not be sufficient to ensure k-anonymity: for each generalized trajectory, do at least k-1 other people share the same trajectory?
- Two transformation strategies:
  - KAM-CUT: publishing only the k-frequent prefixes of the generalized trajectories
  - KAM-REC: recovering portions of trajectories that occur at least k times, while minimizing the noise
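The idea of keeping only k-frequent prefixes can be illustrated with a simplified sketch. It is not the published KAM-CUT algorithm and makes no claim about its formal guarantees: it truncates each generalized trajectory to its longest prefix that is shared by at least k trajectories, and drops trajectories with no such prefix. All names are hypothetical.

```python
from collections import Counter


def prefix_support(trajectories):
    """Count, for every prefix, how many trajectories start with it."""
    support = Counter()
    for traj in trajectories:
        for n in range(1, len(traj) + 1):
            support[tuple(traj[:n])] += 1
    return support


def publish_k_frequent_prefixes(trajectories, k):
    """Truncate each trajectory to its longest prefix with support >= k.

    Support can only shrink as a prefix grows, so shortening from the end
    finds the longest qualifying prefix. Trajectories whose first cell is
    already rarer than k are not published at all.
    """
    support = prefix_support(trajectories)
    published = []
    for traj in trajectories:
        n = len(traj)
        while n > 0 and support[tuple(traj[:n])] < k:
            n -= 1
        if n > 0:
            published.append(list(traj[:n]))
    return published


trajectories = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "F"],
]
published = publish_k_frequent_prefixes(trajectories, k=2)
```

In this example the rare continuation to "D" is cut and the isolated trajectory through "E" and "F" is suppressed, so every published sequence was followed as a prefix by at least k of the original trajectories.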

188 Dataset
- Trajectory data from the city of Milan
- GPS traces from about 17,000 vehicles

189 Clustering on Anonymized Trajectories

190 Probability of re-identification

191 Conclusion
- Opportunities and challenges in gaining deep insight into human mobility
- Mobility models as a dual piece of knowledge:
  - Enabler for new services
  - Decision support for planning and design
- Creation and extraction of complex models supported by an integrated platform, M-Atlas:
  - Management of complex analytical processes
  - Deployment of services

192 Conclusion
- Privacy is an ever-growing concern in our society
- Privacy concerns often lead to skepticism:
  - Effects on the use of technologies
  - Effects on the opportunities for data understanding
- Providing methodologies for risk evaluation and data control

193 Key publications
- F Giannotti, M Nanni, F Pinelli, D Pedreschi. Trajectory pattern mining. ACM SIGKDD 2007
- F Giannotti, D Pedreschi. Mobility, data mining and privacy: Geographic knowledge discovery. Springer, 2008
- A Monreale, F Pinelli, R Trasarti, F Giannotti. WhereNext: a location predictor on trajectory pattern mining. ACM SIGKDD 2009
- S Rinzivillo, D Pedreschi, M Nanni, F Giannotti, N Andrienko, G Andrienko. Visually driven analysis of movement data by progressive clustering. Information Visualization 7 (3-4), 2008
- D Wang, D Pedreschi, C Song, F Giannotti, AL Barabasi. Human mobility, social ties, and link prediction. ACM SIGKDD 2011
- F Giannotti, M Nanni, D Pedreschi, F Pinelli, C Renso, S Rinzivillo, R Trasarti. Unveiling the complexity of human mobility by querying and mining massive trajectory data. The VLDB Journal 20 (5), 2011
- R Trasarti, F Pinelli, M Nanni, F Giannotti. Mining mobility user profiles for car pooling. ACM SIGKDD 2011

194 Key publications
- M Coscia, G Rossetti, F Giannotti, D Pedreschi. Demon: a local-first discovery method for overlapping communities. ACM SIGKDD 2012
- S Rinzivillo, S Mainardi, F Pezzoni, M Coscia, D Pedreschi, F Giannotti. Discovering the geographical borders of human mobility. KI-Künstliche Intelligenz 26 (3), 2012
- D Pennacchioli, M Coscia, S Rinzivillo, D Pedreschi, F Giannotti. Explaining the Product Range Effect in Purchase Data. IEEE BigData 2013
- B Furletti, L Gabrielli, C Renso, S Rinzivillo. Analysis of GSM Calls Data for Understanding User Mobility Behavior. IEEE BigData 2013
- L Milli, A Monreale, G Rossetti, D Pedreschi, F Giannotti, F Sebastiani. Quantification trees. IEEE ICDM 2013

195 Vision papers
- F Giannotti, D Pedreschi, A Pentland, P Lukowicz, D Kossmann, J Crowley, D Helbing. A planetary nervous system for social mining and collective awareness. The European Physical Journal Special Topics 214 (1), 49-75, 2012
- J van den Hoven, D Helbing, D Pedreschi, J Domingo-Ferrer, F Giannotti. FuturICT: The road towards ethical ICT. The European Physical Journal Special Topics 214 (1), 2012
- M Batty, KW Axhausen, F Giannotti, A Pozdnoukhov, A Bazzani, M Wachowicz. Smart cities of the future. The European Physical Journal Special Topics 214 (1), 2012

Mobile phone data for Mobility statistics

Mobile phone data for Mobility statistics International Conference on Big Data for Official Statistics Organised by UNSD and NBS China Beijing, China, 28-30 October 2014 Mobile phone data for Mobility statistics Emanuele Baldacci Italian National

More information

Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach

Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach Barbara Furletti, Lorenzo Gabrielli, Giuseppe Garofalo,

More information

Big Data & Privacy. It s Time for a New Deal on Personal Data Dino Pedreschi. KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr.

Big Data & Privacy. It s Time for a New Deal on Personal Data Dino Pedreschi. KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr. Big Data & Privacy It s Time for a New Deal on Personal Data Dino Pedreschi KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr.it Taiwan-Italy Workshop Roma 27 Feb 2015 SIAMO TUTTI POLLICINI DIGITALI

More information

Identifying users profiles from mobile calls habits

Identifying users profiles from mobile calls habits Identifying users profiles from mobile calls habits Barbara Furletti KDDLAB - ISTI CNR Pisa, Italy [email protected] Lorenzo Gabrielli KDDLAB- ISTI CNR Pisa, Italy [email protected]

More information

MOBILITY DATA MODELING AND REPRESENTATION

MOBILITY DATA MODELING AND REPRESENTATION PART I MOBILITY DATA MODELING AND REPRESENTATION 1 Trajectories and Their Representations Stefano Spaccapietra, Christine Parent, and Laura Spinsanti 1.1 Introduction For a long time, applications have

More information

Discovering Trajectory Outliers between Regions of Interest

Discovering Trajectory Outliers between Regions of Interest Discovering Trajectory Outliers between Regions of Interest Vitor Cunha Fontes 1, Lucas Andre de Alencar 1, Chiara Renso 2, Vania Bogorny 1 1 Dep. de Informática e Estatística Universidade Federal de Santa

More information

Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey 1 Recommendations in Mobile Environments Professor Hui Xiong Rutgers Business School Rutgers University ADMA-2014 Rutgers, the State University of New Jersey Big Data 3 Big Data Application Requirements

More information

Location-Based Social Networks: Users

Location-Based Social Networks: Users Chapter 8 Location-Based Social Networks: Users Yu Zheng Abstract In this chapter, we introduce and define the meaning of location-based social network (LBSN) and discuss the research philosophy behind

More information

Big Data Analytics in Mobile Environments

Big Data Analytics in Mobile Environments 1 Big Data Analytics in Mobile Environments 熊 辉 教 授 罗 格 斯 - 新 泽 西 州 立 大 学 2012-10-2 Rutgers, the State University of New Jersey Why big data: historical view? Productivity versus Complexity (interrelatedness,

More information

Advanced Methods for Pedestrian and Bicyclist Sensing

Advanced Methods for Pedestrian and Bicyclist Sensing Advanced Methods for Pedestrian and Bicyclist Sensing Yinhai Wang PacTrans STAR Lab University of Washington Email: [email protected] Tel: 1-206-616-2696 For Exchange with University of Nevada Reno Sept. 25,

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Processes of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region

Processes of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region Processes of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region (DAStU, Politecnico di Milano) New «urban questions» and challenges

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Juntao Lai *1, Tao Cheng 1, Guy Lansley 2 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental &Geomatic Engineering,

More information

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Big Data Mining Services and Knowledge Discovery Applications on Clouds Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy [email protected] Data Availability or Data Deluge? Some decades

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Mapping Linear Networks Based on Cellular Phone Tracking

Mapping Linear Networks Based on Cellular Phone Tracking Ronen RYBOWSKI, Aaron BELLER and Yerach DOYTSHER, Israel Key words: Cellular Phones, Cellular Network, Linear Networks, Mapping. ABSTRACT The paper investigates the ability of accurately mapping linear

More information

Estimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR:

Estimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR: Estimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR: Developing Real-time Census from Crowds of Greater Dhaka Ayumi Arai 1 and Ryosuke Shibasaki 1,2 1 Department

More information

Traffic mining in a road-network: How does the

Traffic mining in a road-network: How does the 82 Int. J. Business Intelligence and Data Mining, Vol. 3, No. 1, 2008 Traffic mining in a road-network: How does the traffic flow? Irene Ntoutsi Department of Informatics, University of Piraeus, Greece

More information

CHAPTER-24 Mining Spatial Databases

CHAPTER-24 Mining Spatial Databases CHAPTER-24 Mining Spatial Databases 24.1 Introduction 24.2 Spatial Data Cube Construction and Spatial OLAP 24.3 Spatial Association Analysis 24.4 Spatial Clustering Methods 24.5 Spatial Classification

More information

IBM Social Media Analytics

IBM Social Media Analytics IBM Social Media Analytics Analyze social media data to better understand your customers and markets Highlights Understand consumer sentiment and optimize marketing campaigns. Improve the customer experience

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

BIG DATA FOR MODELLING 2.0

BIG DATA FOR MODELLING 2.0 BIG DATA FOR MODELLING 2.0 ENHANCING MODELS WITH MASSIVE REAL MOBILITY DATA DATA INTEGRATION www.ptvgroup.com Lorenzo Meschini - CEO, PTV SISTeMA COST TU1004 final Conference www.ptvgroup.com Paris, 11

More information

The STC for Event Analysis: Scalability Issues

The STC for Event Analysis: Scalability Issues The STC for Event Analysis: Scalability Issues Georg Fuchs Gennady Andrienko http://geoanalytics.net Events Something [significant] happened somewhere, sometime Analysis goal and domain dependent, e.g.

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

PH.D. THESIS (SSD) INF/01. Mastering the Spatio-Temporal Knowledge Discovery Process

PH.D. THESIS (SSD) INF/01. Mastering the Spatio-Temporal Knowledge Discovery Process University of Pisa Department of Computer Science. PH.D. THESIS (SSD) INF/01 Mastering the Spatio-Temporal Knowledge Discovery Process Ph.D. Candidate: Roberto Trasarti Supervisors: Prof. Dino Pedreschi

More information

DIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE

DIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE DIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE INTRODUCTION RESEARCH IN PRACTICE PAPER SERIES, FALL 2011. BUSINESS INTELLIGENCE AND PREDICTIVE ANALYTICS

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

Use of System Dynamics for modelling customers flows from residential areas to selling centers

Use of System Dynamics for modelling customers flows from residential areas to selling centers Use of System Dynamics for modelling customers flows from residential areas to selling centers ENRICO BRIANO (*) CLAUDIA CABALLINI (*)(**) ROBERTO REVETRIA (*)(**) MAURIZIO SCHENONE (**) ALESSANDRO TESTA

More information

IBM Social Media Analytics

IBM Social Media Analytics IBM Analyze social media data to improve business outcomes Highlights Grow your business by understanding consumer sentiment and optimizing marketing campaigns. Make better decisions and strategies across

More information

Mining Mobile Group Patterns: A Trajectory-Based Approach

Mining Mobile Group Patterns: A Trajectory-Based Approach Mining Mobile Group Patterns: A Trajectory-Based Approach San-Yih Hwang, Ying-Han Liu, Jeng-Kuen Chiu, and Ee-Peng Lim Department of Information Management National Sun Yat-Sen University, Kaohsiung, Taiwan

More information

Behavior Analysis in Crowded Environments. XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011

Behavior Analysis in Crowded Environments. XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011 Behavior Analysis in Crowded Environments XiaogangWang Department of Electronic Engineering The Chinese University of Hong Kong June 25, 2011 Behavior Analysis in Sparse Scenes Zelnik-Manor & Irani CVPR

More information

Visualizing e-government Portal and Its Performance in WEBVS

Visualizing e-government Portal and Its Performance in WEBVS Visualizing e-government Portal and Its Performance in WEBVS Ho Si Meng, Simon Fong Department of Computer and Information Science University of Macau, Macau SAR [email protected] Abstract An e-government

More information

IDENTIFICATION OF KEY LOCATIONS BASED ON ONLINE SOCIAL NETWORK ACTIVITY

IDENTIFICATION OF KEY LOCATIONS BASED ON ONLINE SOCIAL NETWORK ACTIVITY H. Efstathiades, D. Antoniades, G. Pallis, M. D. Dikaiakos IDENTIFICATION OF KEY LOCATIONS BASED ON ONLINE SOCIAL NETWORK ACTIVITY 1 Motivation Key Locations information is of high importance for various

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will

More information

Fleet management system as actuator for public transport priority

Fleet management system as actuator for public transport priority 10th ITS European Congress, Helsinki, Finland 16 19 June 2014 TP 0226 Fleet management system as actuator for public transport priority Niels van den Bosch 1, Anders Boye Torp Madsen 2 1. IMTECH Traffic

More information

Craig McWilliams Craig Burrell. Bringing Smarter, Safer Transport to NZ

Craig McWilliams Craig Burrell. Bringing Smarter, Safer Transport to NZ Craig McWilliams Craig Burrell Bringing Smarter, Safer Transport to NZ World Class Transport. Smarter, Stronger, Safer. Bringing Smarter Safer Transport to NZ Craig Burrell Infrastructure Advisory Director

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

NetView 360 Product Description

NetView 360 Product Description NetView 360 Product Description Heterogeneous network (HetNet) planning is a specialized process that should not be thought of as adaptation of the traditional macro cell planning process. The new approach

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - 1DL360 Fall 2013" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/per1ht13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Complex Event Processing (CEP) Why and How. Richard Hallgren BUGS 2013-05-30

Complex Event Processing (CEP) Why and How. Richard Hallgren BUGS 2013-05-30 Complex Event Processing (CEP) Why and How Richard Hallgren BUGS 2013-05-30 Objectives Understand why and how CEP is important for modern business processes Concepts within a CEP solution Overview of StreamInsight

More information

Smart Transport for Sustainable City

Smart Transport for Sustainable City Smart Transport for Sustainable City Dipartimento di Ingegneria dell Informazione University of Pisa, Italy E-mail: [email protected] Alessio Bechini, Beatrice Lazzerini Projects SMARTY (SMArt

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

arxiv:1408.1519v2 [cs.si] 8 Aug 2014

arxiv:1408.1519v2 [cs.si] 8 Aug 2014 Group colocation behavior in technological social networks arxiv:48.59v2 [cs.si] 8 Aug 24 Chloë Brown, Neal Lathia, Anastasios Noulas, Cecilia Mascolo, and Vincent Blondel 2 Computer Laboratory, University

More information

How To Make Sense Of Data With Altilia

How To Make Sense Of Data With Altilia HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

More information

GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory

GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory Yu Zheng, Xing Xie and Wei-Ying Ma Microsoft Research Asia, 4F Sigma Building, NO. 49 Zhichun Road, Beijing 100190,

More information

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential

More information

How To Create A Retail Analytics Platform With Tapway

How To Create A Retail Analytics Platform With Tapway How to revolutionize brickand-mortar retail industry with big data analytics? April 20, 2015 Agenda Tapway Introduction & Why We Do What We Do Technology Overview In-store Analytics for Retail Chains Shopper

More information

Grid Density Clustering Algorithm

Grid Density Clustering Algorithm Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2

More information

Use of Mobile Positioning Data for Tourism Statistics

Use of Mobile Positioning Data for Tourism Statistics Peter Laimer Johanna Ostertag-Sydler Directorate Spatial Statistics Workshop 14 th May 2014 Prague, Czech Republic Use of Mobile Positioning Data for Tourism Statistics Austrian views www.statistik.at

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

In comparison, much less modeling has been done in Homeowners

In comparison, much less modeling has been done in Homeowners Predictive Modeling for Homeowners David Cummings VP & Chief Actuary ISO Innovative Analytics 1 Opportunities in Predictive Modeling Lessons from Personal Auto Major innovations in historically static

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Big Data Collection and Utilization for Operational Support of Smarter Social Infrastructure

Big Data Collection and Utilization for Operational Support of Smarter Social Infrastructure Hitachi Review Vol. 63 (2014), No. 1 18 Big Data Collection and Utilization for Operational Support of Smarter Social Infrastructure Kazuaki Iwamura Hideki Tonooka Yoshihiro Mizuno Yuichi Mashita OVERVIEW:

More information

Scalable Cluster Analysis of Spatial Events

Scalable Cluster Analysis of Spatial Events International Workshop on Visual Analytics (2012) K. Matkovic and G. Santucci (Editors) Scalable Cluster Analysis of Spatial Events I. Peca 1, G. Fuchs 1, K. Vrotsou 1,2, N. Andrienko 1 & G. Andrienko

More information

Deep Insights Smart Decisions Motionlogic

Deep Insights Smart Decisions Motionlogic Deep Insights Smart Decisions Motionlogic About Motionlogic Big Data business of Deutsche Telekom 100% subsidiary Analytics of people movement behavior and demographic indicators Using anonymized network

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

2013 Student Competition

2013 Student Competition ITS Heartland Chapter 2013 Student Competition Shu Yang ([email protected]) Saber Abdoli ([email protected]) Tiffany M. Rando ([email protected]) Smart Transportation Lab Department of Civil Engineering Parks

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

How To Cluster

How To Cluster Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main

More information

1.5.3 Project 3: Traffic Monitoring

1.5.3 Project 3: Traffic Monitoring 1.5.3 Project 3: Traffic Monitoring This project aims to provide helpful information about traffic in a given geographic area based on the history of traffic patterns, current weather, and time of the

More information

Spatio-Temporal Clustering: a Survey

Spatio-Temporal Clustering: a Survey Spatio-Temporal Clustering: a Survey Slava Kisilevich, Florian Mansmann, Mirco Nanni, Salvatore Rinzivillo Abstract Spatio-temporal clustering is a process of grouping objects based on their spatial and

More information

PhoCA: An extensible service-oriented tool for Photo Clustering Analysis

PhoCA: An extensible service-oriented tool for Photo Clustering Analysis paper:5 PhoCA: An extensible service-oriented tool for Photo Clustering Analysis Yuri A. Lacerda 1,2, Johny M. da Silva 2, Leandro B. Marinho 1, Cláudio de S. Baptista 1 1 Laboratório de Sistemas de Informação

More information

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

More information

DEMOCRATIZING BIG DATA: THE ETHICAL CHALLENGES OF SOCIAL MINING. Dino PEDRESCHI (KDDLab, Dipartimento di Informatica, Università di Pisa)

DEMOCRATIZING BIG DATA: THE ETHICAL CHALLENGES OF SOCIAL MINING. Dino PEDRESCHI (KDDLab, Dipartimento di Informatica, Università di Pisa) DEMOCRATIZING BIG DATA: THE ETHICAL CHALLENGES OF SOCIAL MINING Dino PEDRESCHI (KDDLab, Dipartimento di Informatica, Università di Pisa) Siamo tutti pollicini digitali Plenty of digital breadcrumbs behind

More information

A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities

A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities A Capability Model for Business Analytics: Part 2 Assessing Analytic Capabilities The first article of this series presented the capability model for business analytics that is illustrated in Figure One.

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

Use of a Web-Based GIS for Real-Time Traffic Information Fusion and Presentation over the Internet

Dimitris Kotzinos 1, Poulicos Prastacos 2. 1 Department of Computer Science, University of Crete

3. Dataset size reduction. 4. BGP-4 patterns. Detection of inter-domain routing problems using BGP-4 protocol patterns P.A.

Newsletter Inter-domain QoS, Issue 8, March 2004. Online monthly journal of INTERMON consortia. Dynamic information concerning research, standardisation and practical issues of inter-domain QoS.

A framework for Itinerary Personalization in Cultural Tourism of Smart Cities

Gianpaolo D'Amico, Simone Ercoli, and Alberto Del Bimbo. University of Florence, Media Integration and Communication Center

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

IOSR Journal of Engineering (IOSRJEN), ISSN (e): 2250-3021, ISSN (p): 2278-8719, Vol. 04, Issue 03 (March 2014), V5, PP 41-45, www.iosrjen.org

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

Table of contents: Executive Summary; Introduction: Changing Nature of IT; Why Traditional Approaches Are Failing

An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis]

Stephan Spiegel and Sahin Albayrak. DAI-Lab, Technische Universität Berlin, Ernst-Reuter-Platz 7,
