Mobility Data Mining and Analytics
- Quentin Baldwin
- 10 years ago
Transcription
1 Sponsored by: Mobility Data Mining and Analytics. 2nd Datasim Summer School, 14th July 2014. S. Rinzivillo, KDD Lab, ISTI CNR, Pisa, Italy
2 BIG DATA availability What we buy Whom we interact with What we search for Where we go
3
4
5
6 Country-wide mobile phone data
7
8 Social Network Analysis. April-May 2011 July 13, 2014
9 World Cup 2014 "Football is a simple game: 22 men chase a ball for 90 minutes and at the end, the Germans always win" -- Gary Lineker (after Italy
10 BIG DATA availability What we buy Whom we interact with What we search for Where we go
11 Urban Mobility Complexity: vehicles
12 Urban Mobility Complexity: phones
13 Crash Course on MDM How can we manage the complexity coming from huge amount of data?
14 4-stage mobility data mining: semantics; derived models; basic trajectory patterns and models; raw trajectory data
15 Trajectory data
- Mobility of an object is described by a set of trips
- Each trip is a trajectory, i.e. a sequence of time-stamped locations
[Figure: a trajectory drawn as the points (x1,y1,t1), (x2,y2,t2), (x3,y3,t3), (x4,y4,t4), (x5,y5,t5), in the X-Y plane and along the Time axis]
16 Basic mobility patterns and models
- T-Cluster: represents a group of similar trajectories
- T-Pattern: represents trajectory segments that visit the same sequence of regions with similar transition times
- T-Flock: represents trajectory segments that move together for a time interval
17 Basic mobility patterns & models: T-clustering
- Trajectories are grouped based on similarity
- Several possible notions of similarity
  - Start/End points
  - Shape of trajectory
  - Shape & time
  - Etc.
Nanni, Pedreschi. Time-focused clustering of trajectories of moving objects. J. of Intelligent Information Systems
Rinzivillo, Pedreschi, Nanni, Giannotti, Andrienko, Andrienko. Visually-driven analysis of movement data by progressive clustering. J. of Information Visualization, 2008
18 Density Based Clustering K-means Density-based
19 Average Euclidean Distance, Synchronized
- Align points temporally
- Possibly assign penalties to non-matching points
20 Common Destination
- Select last point P_last for each trajectory
- D(T, T') = Euclidean(P_last, P'_last)
21 Common Origins
- Select first point P_first for each trajectory
- D(T, T') = Euclidean(P_first, P'_first)
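The distances on slides 19 to 21 can be sketched in a few lines of Python. This is a minimal illustration, not the M-Atlas implementation: it assumes a trajectory is a time-sorted list of at least two (x, y, t) tuples in planar coordinates, and the names `position_at`, `steps` and `penalty` are hypothetical. The synchronized distance samples both trajectories at shared instants of their common time span and returns the penalty when the spans do not overlap.

```python
import math

def euclidean(p, q):
    """Planar distance between two points; only x and y are used."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def common_destination(a, b):
    """D(T, T') = Euclidean(P_last, P'_last)."""
    return euclidean(a[-1], b[-1])

def common_origin(a, b):
    """D(T, T') = Euclidean(P_first, P'_first)."""
    return euclidean(a[0], b[0])

def position_at(traj, t):
    """Linearly interpolated (x, y) at time t; t must lie inside the trajectory."""
    for p, q in zip(traj, traj[1:]):
        if p[2] <= t <= q[2]:
            if q[2] == p[2]:
                return (p[0], p[1])
            w = (t - p[2]) / (q[2] - p[2])
            return (p[0] + w * (q[0] - p[0]), p[1] + w * (q[1] - p[1]))
    raise ValueError("time outside trajectory")

def synchronized_distance(a, b, steps=10, penalty=float("inf")):
    """Average Euclidean distance at `steps` shared instants (steps >= 2)."""
    start = max(a[0][2], b[0][2])
    end = min(a[-1][2], b[-1][2])
    if start > end:
        return penalty  # no common time span: nothing can be aligned
    times = [start + (end - start) * i / (steps - 1) for i in range(steps)]
    total = sum(euclidean(position_at(a, t), position_at(b, t)) for t in times)
    return total / steps

a = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0)]
b = [(0.0, 3.0, 0.0), (10.0, 3.0, 10.0)]
print(common_origin(a, b), common_destination(a, b), synchronized_distance(a, b))
```

The two end-point distances are cheap, which is why they suit a first, coarse clustering pass; the synchronized distance is more selective because it compares positions all along the shared time span.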
22 Route Similarity
- Alignment of points, multiple matches
- Average Euclidean Distance
- Penalties for non-matching initial points (no penalties for destinations)
23 Process Overview
[Figure: Dataset -> simple and very efficient distance measure -> Clusters + Noise -> more selective and particular distance functions (or more restrictive parameters) -> Subclusters + Noise -> Knowledge]
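A sketch of the clustering step in this process, under stated assumptions: DBSCAN is used here as a simple stand-in for the density-based method (the slides do not say which algorithm is used), trajectories are lists of (x, y) points, and `eps` and `min_pts` simply mirror the T-CLUSTERING.EPS and T-CLUSTERING.MIN_PTS parameter names that appear in the DMQL queries of this deck. The distance function is a parameter, so a coarse measure can be used first and a more selective one later on a single cluster.

```python
import math

NOISE = -1

def dbscan(items, distance, eps, min_pts):
    """Plain DBSCAN over arbitrary items with a user-supplied distance."""
    labels = [None] * len(items)

    def neighbours(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(items)) if distance(items[i], items[j]) <= eps]

    cluster = 0
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border item
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core item: keep expanding
        cluster += 1
    return labels

def common_destination(a, b):
    """Cheap first-pass distance: gap between the last points."""
    return math.hypot(a[-1][0] - b[-1][0], a[-1][1] - b[-1][1])

trajectories = [
    [(0, 0), (10, 10)], [(5, 0), (10, 11)], [(0, 5), (11, 10)],
    [(0, 0), (50, 50)], [(9, 9), (51, 50)],
    [(0, 0), (200, 0)],
]
labels = dbscan(trajectories, common_destination, eps=2.0, min_pts=2)
print(labels)
```

To refine progressively, one would select the trajectories of one cluster and call `dbscan` again on that subset with a more selective distance or stricter parameters.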
24 Basic mobility patterns & models: T-pattern
- T-Pattern
[Figure: Area A -> (Δt = 5 minutes) -> Area B -> (Δt = 35 minutes) -> Area C; the areas carry the spatial information, the transition times the temporal information]
- Variations: Absolute time, visit duration, distance traveled, speed, sensor/user provided measures (temp., pressure, ratings, ...)
Giannotti, Nanni, Pedreschi, Pinelli. Trajectory pattern mining. Proc. ACM SIGKDD 2007
25 Basic mobility patterns & models: T-Flocks
- Group of objects that move together (close to each other) for a time interval
M. Wachowicz, R. Ong, C. Renso, M. Nanni: Finding moving flock patterns among pedestrians through collective coherence. International Journal of Geographical Information Science 25(11) (2011)
26 Derived patterns and models
- Combination & refinement of basic patterns and models
  - Individual Mobility Profile: routines consistently followed by a single moving object
  - T-PTree: predictive tree built by combining T-Patterns
27 Derived patterns and models: mobility profiles
- User history: an ordered sequence of spatio-temporal points.
- Trips construction: cutting the user history when a stop is detected. Parameters: stops spatial threshold, stops temporal threshold.
- Grouping: performing a density-based clustering equipped with a spatio-temporal distance function. Parameters: spatial tolerance, temporal tolerance, spatio-temporal distance.
- Pruning: groups with a small number of trips are pruned. Parameter: support threshold.
- Profile extraction: the medoid of each group becomes one of the user's routines, and the whole set becomes the user's mobility profile.
Trasarti, Pinelli, Nanni, Giannotti. Mining mobility user profiles for car pooling. ACM SIGKDD 2011
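The trips construction step can be illustrated as follows. This is a sketch under assumptions, not the method of the cited paper: a stop is taken to be a pair of consecutive points that are close in space (within `max_stop_dist`) but far apart in time (at least `min_stop_time`), which is a common situation for vehicles whose device is switched off while parked. Both parameter names are hypothetical stand-ins for the spatial and temporal thresholds named on the slide.

```python
import math

def split_trips(history, max_stop_dist, min_stop_time):
    """Cut a time-sorted list of (x, y, t) points into trips at detected stops."""
    trips, current = [], []
    for point in history:
        if current:
            prev = current[-1]
            moved = math.hypot(point[0] - prev[0], point[1] - prev[1])
            waited = point[2] - prev[2]
            if moved <= max_stop_dist and waited >= min_stop_time:
                # The object barely moved for a long time: close the trip here.
                trips.append(current)
                current = []
        current.append(point)
    if current:
        trips.append(current)
    return trips

history = [
    (0, 0, 0), (100, 0, 60), (200, 0, 120),   # first trip
    (205, 0, 3000), (300, 0, 3060),           # second trip, after a 48-minute stop
]
trips = split_trips(history, max_stop_dist=50, min_stop_time=1200)
print(len(trips), [len(t) for t in trips])
```

The resulting trips would then be grouped with a density-based clustering, small groups pruned, and the medoid of each surviving group kept as a routine.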
28 Derived patterns and models: T-Prediction Tree
- Rule-based prediction model
- Each T-Pattern is used as a case
- Tree = combination / simplification of a set of T-Patterns
Monreale, Pinelli, Trasarti, Giannotti. Where Next: a predictor on Trajectory pattern mining. Proc. ACM SIGKDD 2009
29 Derived patterns and models: T-PTree
- Example: compare actual trajectory against the T-PTree
- Spatial and temporal similarity used to choose best rule
[Figure: example with regions A, E, D, B, C]
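A much simplified sketch of the idea behind a prediction tree: patterns are merged into a prefix tree, and the regions most recently visited by a trajectory are matched against paths from the root to pick the best-supported next region. This is an assumption-laden illustration, not the algorithm of the cited paper: transition times are ignored, matching is exact rather than similarity-based, and the names `build_ptree` and `predict_next` are hypothetical.

```python
def build_ptree(patterns):
    """patterns: iterable of (sequence_of_regions, support) pairs."""
    root = {"support": 0, "children": {}}
    for regions, support in patterns:
        node = root
        for region in regions:
            child = node["children"].setdefault(
                region, {"support": 0, "children": {}}
            )
            # Keep the largest support seen for this prefix.
            child["support"] = max(child["support"], support)
            node = child
    return root

def predict_next(tree, visited):
    """Match the longest suffix of `visited` to a root path; return the best child."""
    for start in range(len(visited)):
        node = tree
        for region in visited[start:]:
            node = node["children"].get(region)
            if node is None:
                break
        else:
            if node["children"]:
                best = max(node["children"].items(), key=lambda kv: kv[1]["support"])
                return best[0]
    return None

tree = build_ptree([
    (["A", "B", "C"], 10),
    (["A", "B", "D"], 4),
    (["E", "D"], 7),
])
print(predict_next(tree, ["X", "A", "B"]))
```

A fuller version would store the time interval on each edge and score candidate rules by spatial and temporal similarity, as the slide describes.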
30 Semantic Annotation Semantic trajectories: translate (x,y,t) trajectories to sequences of events with a semantics Semantic enrichment: tag and classify trajectories or patterns based on domain knowledge or mined information
31 Semantic trajectories
- First transform a (geometric) trajectory into a semantic representation, then apply data mining.
- Semantic trajectory represented as a sequence of stops (places where objects stay still) & moves (trajectory segments where objects change position)
Tr1 = < Hotel [21-08], Monument [9-13], Restaurant [14-16] >
32 Mobility Diaries
- Data-driven diaries
- Describe daily mobility routines by means of a set of semantic trajectories...
33 ... Mobility Diaries
34 Mobility Diaries
Figure 1: (a) The top two eigenbehaviors for Subject 4 of the Reality Mining dataset; the lighter the color, the higher the probability (image taken from [5]). (b) Exemplary LDA-topics extracted from the Reality Mining dataset (image taken from [6]).
Applying PCA or LDA to a set of these arrays allows extracting some low-dimensional latent variables (eigenvectors and LDA-topics respectively) representing underlying patterns in the data, and offering conditional probability ...
Classification & Prediction of Whereabouts patterns from Reality Mining Data Sets. Ferrari and Mamei. Pervasive & Mobile Computing Journal, Dec
35 M-Atlas system Download from:
36 M-Atlas input
- M-Atlas: an atlas for urban mobility behaviors. A framework to query, analyze and navigate the results on mobility data
37 M-Atlas platform
- A tool kit to extract, store and combine different kinds of models to build mobility knowledge discovery processes.
38 M-Atlas System Centralized database which contains all the data, patterns and models. It is possible to extend the system with new algorithms and new data, pattern or model types.
39 Practically the system adds new object-relational types to the database in order to represent the new types of data, patterns and models. The advantage of having an object-relational representation is threefold: (i) it allows the definition of complex data such as lists and trees, ...
We distinguish between models and patterns: a pattern is a representation of a local property that holds over a sub-group of mobility data, e.g., a flock of trajectories; on the other hand, a model is a representation of a global property that holds over an entire dataset: accordingly, a model is either a global aggregate (e.g., speed distribution in a trajectory dataset) or a collection of patterns (e.g., the clustering that partitions an entire dataset into separate clusters).
Objects taxonomy in M-Atlas
[Figure: type hierarchy over Spatial Object, Temporal Object, Moving Object, Data and Object; M-Model types: T-Reachability, T-Clustering, T-ODMatrix, T-PTree; M-Pattern types: T-Pattern, T-Flow, T-Cluster, T-Flock; edge labels: "set of", "aggregation of"]
Fig. 8 The M-Atlas type hierarchy. M-Model, M-Pattern and Data are the basic types of data. We can notice the relationship between M-Models and M-Patterns. For example, T-Clustering model is represented by a set of T-Cluster patterns, while T-PTree model is an aggregation of T-Patterns.
40 DMQL: Model constructors
CREATE DATA Travels BUILDING MOVING_POINTS
FROM (SELECT userid, lon, lat, datetime FROM RawData ORDER BY userid, datetime)
SET MOVING_POINT.MAX_SPACE_GAP = 0.2 AND
    MOVING_POINT.MAX_TIME_GAP = ...
... of a data mining method with a specified parameter setting. M-Atlas ... constructor for each method in its data mining library, presented in sec... mining constructor query is the following, which generates a ... clusters under specific parameters:
CREATE MODEL ClusteringTable AS MINE T-CLUSTERING
FROM (Select t.id, t.trajobj from TrajectoryTable t)
SET T-CLUSTERING.FUNCTION = ROUTE_SIMILARITY AND
    T-CLUSTERING.EPS = 100 AND
    T-CLUSTERING.MIN_PTS = 20
M-Pattern Types
A mobility pattern, M-Pattern in short, represents the common behavior of a (sub-)group of trajectories, obtained as a result of a data mining algorithm. The types of M-Patterns currently supported by M-Atlas are shown in Figure 9.
Fig. 9 M-Pattern types: (a) T-Cluster, (b) T-Pattern, (c) T-Flock, (d) T-Flow
T-Cluster. A T-Cluster (Figure 9(a)) is defined as a set S = {( 1,l), ( 2,l),...} of labelled trajectories, which share the same membership tag l. The trajectories of a T-Cluster are grouped on the basis of their similarity according to a specified similarity function, chosen from a repertoire of possible choices.
T-Pattern. It is represented as tp = (R, T, s) where R = <r_0,...,r_k> is a sequence of regions, T = <t_1,...,t_k> is a sequence of relative time intervals t_j = [t^s_j, t^e_j] associated to each region and s is the support of tp, i.e., the number of trajectories that are compatible with tp in space and time. Informally, a T-Pattern can be represented as r_0 -t_1-> r_1 ... -t_k-> r_k. Originally introduced in [17], a T-Pattern (Figure 9(b)) is a concise description of frequent behaviors, in terms of both space (i.e., the regions of space visited during movements) and time (i.e., the duration of movements).
T-Flock. A T-Flock f = (I, r, b) represents a spatio-temporal coincidence of a group of moving points, where I = [t_min, t_max] is the time interval of the coincidence, b is the base moving point and r is the spatial buffer around b which is used to determine the coincidence. This spatio-temporal coincidence defines a common behavior of the people which move ...
T-Flow. The T-Flow tf = <R_1, R_2, w> represents a flow of w trajectories which move from region R_1 to region R_2 (Figure 9(d)).
M-Model Types
Mobility models, M-Models in short, are the global models extracted by a data mining algorithm, where the adjective global indicates the fact that each such model describes the entire input dataset. Figure 10 illustrates some of the available M-Models in M-Atlas; other M-Models are simply the entire collection of T-Patterns, T-Clusters and T-Flocks mined over a trajectory dataset.
Fig. 10 M-Models types: (a) Reachability plot, (b) T-PTree and (c) T-ODMatrix.
Reachability plot: is a histogram of distances between trajectories, obtained considering a specific distance function (Figure 10(a)). More precisely, it is a sequence of pairs Rp = <(t_1,d_1) ... (t_n,d_n)> where t_j is a trajectory and d_j is the distance between t_j and t_{j+1}, where t_{j+1} is the nearest neighbor of t_j which does not occur in {t_1,...,t_j}. Using a threshold for distance, the reachability plot identifies a set of T-Clusters representing the partition of the whole dataset into labelled groups of similar trajectories.
T-PTree. A T-Pattern Tree, T-PTree in short, is a compact representation of a set of T-Patterns (Figure 10(b)). It is a prefix tree PT = {root, N, E}, where N is the set of nodes of the tree, E is the set of edges and root is the root of the tree. Each node n_i = {r, supp} ...
3.2 Spatio-temporal query primitives
The querying primitives over data, models and patterns are summ...
41 The user Interface
- The process tree, which organizes the analyses done. Each node has a type: Trajectories, Map, Clustering, Flocks, etc. Each node is described by the chain of DMQL queries executed from the root.
- The map, loaded from Open Street Map and composed of different layers.
- Pre-built tools. Each one performs a set of DMQL queries on the selected node. Each tool has a set of parameters.
- Contextual menu: each node type has different options and tools. Each tool has a set of parameters.
- Additional panels for navigation or pattern selection.
42 Mobility Data Mining process as a DMQL query
CREATE MODEL MilanODMatrix AS MINE ODMATRIX
FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t),
     (SELECT orig.id, orig.area FROM MunicipalityTable orig),
     (SELECT dest.id, dest.area FROM MunicipalityTable dest)
CREATE RELATION CenterToNESuburbTrajectories USING ENTAIL
FROM (SELECT t.id, t.trajectory
      FROM TrajectoryTable t, MilanODMatrix m
      WHERE m.origin = Milan AND m.destination IN (Monza,..., Brugherio))
CREATE MODEL ClusteringTable AS MINE T-CLUSTERING
FROM (Select t.id, t.trajectory from CenterToNESuburbTrajectories t)
SET T-CLUSTERING.FUNCTION = ROUTE_SIMILARITY AND
    T-CLUSTERING.EPS = 400 AND
    T-CLUSTERING.MIN_PTS = 5
CREATE RELATION DistributionCluster USING CONTAINS
FROM (SELECT t.id, t.trajectory, c.cid
      FROM ClusteringTable c, TrajectoryTable t
      WHERE c.tid = t.id),
     (SELECT * FROM Periods p)
WHERE cid IN (0,2,3)
43 Mobility Atlas of a City Understanding urban human mobility
44 The (GeoP)KDD process Mobile phone data, GPS tracks End user Mobility Patterns Mobility manager Mobility Data Mining Mobility Data Raw data
45 Sensing the movement: several data sources available
46 GSM data
- Mobile Cellular Networks handle information about the positioning of mobile terminals
  - CDR (Call Data Records): call logs (tower position, time, duration, ...)
  - Handover data: time of tower transition
- More sophisticated Network Measurement allows tracking of all active (calling) handsets
47 GPS tracks
- Onboard navigation devices send GPS tracks to central servers
Ide;Time;Lat;Lon;Height;Course;Speed;PDOP;State;NSat
8;22/03/07 08:51:52; ; ; 67.6;345.4;21.817;3.8;1808;4
8;22/03/07 08:51:56; ; ; 68.4;35.6;14.223;3.8;1808;4
8;22/03/07 08:51:59; ; ; 68.3;112.7;25.298;3.8;1808;4
8;22/03/07 08:52:03; ; ; 68.8;119.8;32.447;3.8;1808;4
8;22/03/07 08:52:06; ; ; 68.1;124.1;30.058;3.8;1808;4
8;22/03/07 08:52:09; ; ; 67.9;117.7;34.003;3.8;1808;4
8;22/03/07 08:52:12; ; ; 66.9;117.5;37.151;3.8;1808;4
8;22/03/07 08:52:15; ; ; 67.0;99.2;39.188;3.8;1808;4
8;22/03/07 08:52:18; ; ; 68.8;90.6;41.170;3.8;1808;4
8;22/03/07 08:52:21; ; ; 71.1;82.0;35.058;3.8;1808;4
8;22/03/07 08:52:24; ; ; 68.6;117.1;11.371;3.8;1808;4
- Sampling rate 30 secs
- Spatial precision 10 m
48 Road side sensors
- Measure the flow of a specific road arc
  - Laser-based sensors
  - Inductive loops
  - Traffic cameras
49 Other data sources
- Social web services
  - Flickr
  - Foursquare
  - Gowalla
  - Twitter
- Presence estimation
- Hotel statistics
- Airport departures and arrivals
- Bus and public transportation
- Park usage
- Weather conditions
50 Dimensions to explore
- Space
  - Administrative borders (e.g.: city)
  - Distance travelled: how much a person is travelling
- Individual
  - Preferred locations
  - EigenMobility
- Time
  - Hour of day
  - Day of week
  - Weekdays/weekends
51 A small city: Pisa
52 First dimension: space. Travel length distribution
53 Travel length on the map
54 Exploring Origin and Destinations
[Figure: origin-destination matrix among Pisa, Firenze, Lucca, Livorno and Siena, with totals; map panels "From everywhere to Firenze, 26 January", "From everywhere to Lucca, 26 January", "From everywhere to everywhere, all times" and "From everywhere to Lucca, all times"; daily views for 26 Jan to 30 Jan]
55 Exploring Origins and Destinations
56 Exploring the origins of trips: 0-5 km, 5-15 km, > 150 km
57 Exploring origins of trips > 150 km: 19 trips
58 Second dimension: time. When do people move to Pisa?
59 Let's focus on the city level: 0-5 km, 5-15 km
60 Trips segmented by similarity
61 Explore clusters: Florence
62 Explore clusters: A1
63 Explore clusters: A12
64 Explore Clusters: Valdera
65 Explore clusters: Versilia
66 Trip segmentation by time
67 Trips Segmented by Time: from 5 to 8
68 Discover traffic jams
69 Aggregate trips by common destinations
70 Industry: Saint Gobain
71 Industry: Saint Gobain
72 Residential Area: I Passi
73 Residential Area: I Passi
74 Residential vs Industrial
75 Services: Montacchiello
76 Services: Montacchiello
77 Extracting travellers' profiles
- Analysis focused on the single individual
- Find his/her systematic mobility
[Figure: user trips -> mobility profile -> routines]
78 Services: Montacchiello (Profiles)
79 Impact of systematic mobility on access patterns
80 What-if scenarios
81 Service: Montacchiello (Car Pooling?)
- Traj Blue, DT: 06:46:53
- Traj Red, DT: 11:52:06
- Traj Green, DT: 06:51:41
- Blue can give a ride to Green
82 Application: Car pooling
Pro-active suggestions of ride-sharing opportunities, without the need for the user to explicitly specify the trips of interest.
Matching two routines:
Mobility profile share-ability:
83 Communities of users
84 Networks as a mining tool S. Rinzivillo, S. Mainardi, F. Pezzoni, M. Coscia, D. Pedreschi, F. Giannotti Discovering the Geographical Borders of Human Mobility KI - Künstliche Intelligenz, 2012.
85 Mobility coverages
86 Step 1: spatial regions
87 Step 2: evaluate flows among regions
88 Step 3: forget geography
89 Step 4: perform community detection
90 Step 4: perform community detection
91 Step 5: map back to geography
92 Step 6: draw borders
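Steps 1 to 3 above can be sketched as follows; the sketch stops before community detection, which the slides do not specify and which would be delegated to an algorithm of choice. The assumptions are loud ones: regions are square grid cells of side `cell_size` in planar coordinates, each trip contributes one unit of flow from the cell of its first point to the cell of its last point, and trips inside a single cell are ignored. Function names are hypothetical.

```python
from collections import Counter

def cell_of(x, y, cell_size):
    """Step 1: map a point to the square grid cell that contains it."""
    return (int(x // cell_size), int(y // cell_size))

def flow_network(trips, cell_size):
    """Steps 2-3: count trips between cells; the result is a weighted edge list
    in which cells are just node identifiers (geography is forgotten)."""
    flows = Counter()
    for trip in trips:
        origin = cell_of(trip[0][0], trip[0][1], cell_size)
        destination = cell_of(trip[-1][0], trip[-1][1], cell_size)
        if origin != destination:
            flows[(origin, destination)] += 1
    return flows

trips = [
    [(100, 100), (2600, 300)],
    [(400, 900), (2900, 100)],
    [(2700, 200), (300, 300)],
    [(100, 100), (200, 200)],   # stays inside one cell: ignored
]
network = flow_network(trips, cell_size=1000)
for (origin, destination), weight in sorted(network.items()):
    print(origin, "->", destination, weight)
```

After community detection assigns a label to each node, steps 5 and 6 amount to colouring each cell by its community and drawing a border wherever adjacent cells carry different labels.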
93 Final result
94 Final result: compare with municipality borders
95 Borders in different time periods
- Only weekdays movements: similar to global clustering, strong influence of systematic movements
- Only weekend movements: strong fragmentation, the influence of systematic movements (home-work) is missing
96 Borders at regional scale
97 Final results
Fig. 7: The resulting clusters obtained with different spatial granularities: 500m, 1000m, 2000m, 5,000m, 10,000m, 20,000m
... topology analysis of the networks performed in Section IV, that identified the most promising cell sizes at values smaller 0.58
98 Comparison with the new provinces
99 Explore borders by time
- Use temporal projections to extract mobility networks
- Identified three main periods
  - Week days
  - Week ends
  - Whole week
- Having GPS data extending over 4 weeks, we extracted 12 distinct networks, named week0, weekday0, weekend0, week1, and so on
Coscia, M., Rinzivillo, S., Giannotti, F. and Pedreschi, D., Optimal Spatial Resolution for the Analysis of Human Mobility. In ASONAM, 2012.
100 Degree distribution by time
[Plot: p(d) against degree d, one series per network: Weekdays1 to Weekdays4, Weekend1 to Weekend4, Week1 to Week3, ...]
101 Network properties (by day)
[Plots: number of nodes, number of edges and number of connected components per day, over the days of May]
102 Borders quality
103 Semantic Enrichment
104 NetMob 2013 MP4-A Project: Mobility Planning For Africa Mirco Nanni, Roberto Trasarti, Barbara Furletti, Lorenzo Gabrielli Peter Van Der Mede, Joost De Bruijn, Erik De Romph, Gerard Bruil
105 The Challenge
- Incompleteness issue
  - Call Detail Records describe the location of users only during activity (calls, messages)
  - Most individual mobility might be invisible
- Lack of semantics
  - No information about activities and purpose
- Spatial uncertainty issue
  - Location described in terms of cells having dynamic and sometimes large extent
106 The approach (summary)
- Analyze raw GSM data to infer systematic mobility of individuals
- Build origin-destination matrices
  - Describe (expected) flows between areas
- Build a transportation model
  - Assigns O/D matrix to OSM road network through OmniTRANS system
107 Systematic mobility
- A single trace of an individual can be poorly informative about his/her movements
[Figure: one daily trace through the locations H, B, W, A, C along a time axis]
108 Systematic mobility
- Yet, several daily traces of the same individual might allow identifying regular places
[Figure: several daily traces over the locations H, A, B, W, C]
109 Systematic mobility
- Yet, several daily traces of the same individual might allow identifying regular places
[Figure: the same daily traces, with the recurring locations H and W highlighted]
110 Systematic mobility
- Yet, several daily traces of the same individual might allow identifying regular places and trips
[Figure: the same daily traces, with the recurring locations H and W highlighted]
111 Systematic mobility
- The whole individual mobility is then summarized by its systematic movements
[Figure: H and W linked by a morning routine and an afternoon routine]
- They will be used as typical daily schedule of the individual
112 Systematic O/D matrix
- Combine the ten 2-weeks datasets into one
- For each user, extract significant L1 L2
- Aggregate (individual) systematic movements into (collective) systematic flows
- Examples: outgoing traffic, incoming traffic
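A sketch of how individual systematic movements could be aggregated into collective flows. The slide does not define L1 and L2, so this example assumes they are a user's most frequent and second most frequent locations, and that each user contributes one movement L1 -> L2 and one movement back. The `min_visits` filter is a hypothetical way to decide whether the second location is "significant".

```python
from collections import Counter, defaultdict

def systematic_od(observations, min_visits=2):
    """observations: iterable of (user, location) pairs, one per call record.
    Returns a Counter mapping (origin, destination) to the number of users."""
    by_user = defaultdict(Counter)
    for user, location in observations:
        by_user[user][location] += 1

    flows = Counter()
    for counts in by_user.values():
        top = counts.most_common(2)
        if len(top) < 2 or top[1][1] < min_visits:
            continue  # no significant second location for this user
        l1, l2 = top[0][0], top[1][0]
        flows[(l1, l2)] += 1   # assumed outgoing systematic movement
        flows[(l2, l1)] += 1   # assumed return movement
    return flows

observations = (
    [("a", "H")] * 5 + [("a", "W")] * 3 + [("a", "X")]
    + [("b", "H")] * 4 + [("b", "W")] * 2
    + [("c", "Z")] * 3 + [("c", "Y")]
)
od = systematic_od(observations)
print(od[("H", "W")], od[("W", "H")])
```

Summing a row of the resulting matrix gives the outgoing traffic of an area, and summing a column gives its incoming traffic.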
113
114
115
116 Mobile phone socio-meters
Analyze individual call habits to recognize profiles
- Resident
- Commuters
- Visitors/Tourists
117 Call Habit Profiles
- Week: working days & weekend
- Time slots: 0:00-7:59, 8:00-18:59, 19:00-23:59
- User's call habit profile
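A sketch of how a call habit profile could be assembled from call timestamps. The three time slots and the weekday/weekend split come from the slide; everything else is an assumption, in particular the choice of ISO week numbers to index weeks and of call counts (rather than a presence flag) as cell values.

```python
from collections import Counter
from datetime import datetime

# Time slots from the slide, as [start_hour, end_hour) pairs.
SLOTS = ((0, 8), (8, 19), (19, 24))

def slot_of(hour):
    """Index of the time slot containing the given hour of day."""
    for index, (start, end) in enumerate(SLOTS):
        if start <= hour < end:
            return index
    raise ValueError(hour)

def call_habit_profile(call_times):
    """Count calls per (ISO week, weekday/weekend, time slot) cell."""
    profile = Counter()
    for ts in call_times:
        week = ts.isocalendar()[1]
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        profile[(week, day_type, slot_of(ts.hour))] += 1
    return profile

calls = [
    datetime(2014, 7, 14, 9, 30),   # Monday, daytime slot
    datetime(2014, 7, 14, 20, 5),   # Monday, evening slot
    datetime(2014, 7, 19, 7, 59),   # Saturday, night slot
    datetime(2014, 7, 21, 9, 0),    # Monday of the following week
]
profile = call_habit_profile(calls)
for key in sorted(profile):
    print(key, profile[key])
```

Profiles of this shape can then be compared with the resident, commuter and visitor archetypes, for instance by checking whether activity is present in evening and weekend cells.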
118 Resident profile
119 Resident profile Commuter profile
120 Resident profile Commuter profile Visitor profile Night visitors Daylight visitors
121 User profile quantification Resident profile Commuter profile Visitor profile
122 Sponsored by: Investigating semantic regularity of human mobility lifestyle. Vinicius Monteiro de Lira (Federal University of Pernambuco, Brazil), Valeria Cesario Times (Federal University of Pernambuco, Brazil), Patricia Cabral Tedesco (Federal University of Pernambuco, Brazil), Salvatore Rinzivillo (ISTI-CNR, Pisa, Italy), Chiara Renso (ISTI-CNR, Pisa, Italy). 18th International Database Engineering & Applications Symposium, IDEAS '14, Porto, Portugal
123 INTRODUCTION
The appearance and wide distribution of position-enabled personal devices boosted the study of the mobility behavior of individuals based on crowdsourced data. When these positioning data are enriched with semantic information (i.e. the place visited) we have semantic trajectories. The semantics helps in understanding human dynamics.
124 About Regularity
We study the tendency of mobile individuals to be regular or irregular when choosing the places and the time to perform some activities: semantic (or activity-based) regularity.
Definition of spatial and temporal entropy as a measure of the semantic regularity of users, computed from crowdsensed data.
Values range from 0 to 1, where 1 means highest regularity and 0 means lowest regularity or no regularity.
125 Why studying semantic regularity?
- Regularity profiles can characterize one specific aspect of the user lifestyle
- We give a quantitative measure of the regularity habits of the people under observation
- This can be useful in: recommendation systems, carpooling, advertisement
126 METHODOLOGY
The semantic regularity behavior is measured according to two dimensions:
- Spatial: how much a user tends to visit the same places to perform a given activity.
- Temporal: the regularity of the user to perform an activity in a preferred temporal interval.
127 METHODOLOGY
Three phases:
(i) Data collection of users' visits to Points of Interest (POIs);
(ii) Estimation of the regularity measures;
(iii) Extraction of the semantic regularity profiles.
128 Semantic regularity - Example
Visits dataset. We associate a category of place to an activity with a static mapping:
- University -> Work/Study
- Gym -> Leisure
- Restaurant -> Eating
129 Visits and frequency distributions
The Visits dataset provides the mobility information to associate a user u to a POI poi_id she visited.
< VisitID; UserID; poi_id; poi_cat; timestamp >
Formally, for a POI p of category C we define the spatial relative frequency distribution SRFD of u as:
$SRFD(u,C,p) = P(u \text{ in } p \mid C) = \frac{\#\text{visits to } p}{\#\text{visits to } C}$
Formally, for a time interval t and a category C we define the temporal relative frequency distribution TRFD of u as:
$TRFD(u,C,t) = P(u \text{ in } t \mid C) = \frac{\#\text{visits to } C \text{ in } t}{\#\text{visits to } C}$
130 The Entropy measures
Given a user u and a place category C, his Spatial Entropy (SH):
$SH(u,C) = -\sum_{p \in C} SRFD(u,C,p) \log SRFD(u,C,p)$
And, analogously, the Temporal Entropy (TH):
$TH(u,C) = -\sum_{t \in I} TRFD(u,C,t) \log TRFD(u,C,t)$
The Spatial Maximum Entropy (SMH) for each category:
$SMH(C) = \log |C|$
The Temporal Maximum Entropy (TMH) for each category:
$TMH(C) = \log |I|$
131 Semantic regularity
Given a user u and a category C, the Semantic Spatial Regularity for C is:
Given a user u, a set of intervals I and a category C, the Semantic Temporal Regularity for C is:
A semantic regularity profile for a user u consists of a set of tuples < Ci, SSR(u,Ci), STR(u,Ci) > for all categories of places (activities) Ci in C1, C2, ..., Cn.
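The two regularity formulas did not survive in this transcript. The sketch below therefore rests on an assumption: regularity is taken to be one minus the entropy normalised by its maximum, which is consistent with the stated range (1 for highest regularity, 0 for none) but is not confirmed by the source. The number of possible outcomes, that is the number of POIs in a category or the number of time intervals, is passed in explicitly; all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(observations):
    """Shannon entropy of the relative frequency distribution of the observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def regularity(observations, n_outcomes):
    """Assumed definition: 1 - H / H_max, with H_max = log(n_outcomes)."""
    if n_outcomes <= 1:
        return 1.0  # a single possible outcome leaves no room for irregularity
    return 1.0 - entropy(observations) / math.log(n_outcomes)

def regularity_profile(visits, user, pois_per_category, n_intervals):
    """visits: (user, poi, category, interval) tuples.
    Returns {category: (spatial regularity, temporal regularity)}."""
    profile = {}
    for category, n_pois in pois_per_category.items():
        mine = [(p, t) for (u, p, c, t) in visits if u == user and c == category]
        if mine:
            spatial = regularity([p for p, _ in mine], n_pois)
            temporal = regularity([t for _, t in mine], n_intervals)
            profile[category] = (spatial, temporal)
    return profile

visits = [
    ("u1", "gym_a", "Gym", 0), ("u1", "gym_a", "Gym", 1),
    ("u1", "gym_a", "Gym", 0), ("u1", "gym_a", "Gym", 1),
]
profile = regularity_profile(visits, "u1", {"Gym": 4}, n_intervals=2)
print(profile)
```

In this toy input the user always goes to the same gym but splits visits evenly between two time intervals, so the spatial value is at its maximum and the temporal value at its minimum.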
132 Example
Example of Semantic Spatial Regularity and Semantic Temporal Regularity for Gyms: we compute the Spatial Entropy (SH) and the Temporal Entropy (TH). Based on this we can see that the spatial regularity SSR for the gym is high, while the temporal regularity STR is low.
133 EXPERIMENTS
- We tested our methodology using a dataset of check-ins generated from a Location-based Social Network (LBSN) called Brightkite.
- The dataset has a total of check-ins performed by 2806 users around the world between March 22nd, 2008 and October 18th,
- Check-ins: user identification, the geographic coordinates and the time instant
- Foursquare API to annotate semantically the places where users performed the check-ins.
- 13 main categories of POIs mapped to most common activities
134 EXPERIMENTS - restaurants
Restaurant category. Most people tend to change the place when they go eating and also the time when they go. Most of the users are irregular in space and time.
[Plot: users by spatial and temporal regularity, from most IRREGULAR to most REGULAR]
135 EXPERIMENTS - University
University category. We clearly notice a very regular spatial behavior. Most of the users are distributed close to the value 1 (more regular) on the spatial dimension.
[Plot: users by spatial and temporal regularity, from most IRREGULAR to most REGULAR]
136 EXPERIMENTS
[Plot: quadrants TL, TR, BL, BR, from high irregularity to high regularity]
137 MAPMOLTY tool
MAPMOLTY computes a number of measures, called loyalty indicators, to summarize the loyalty level of each POI from different categories. The application is built upon the map to ease navigation and visualization of the area of interest.
Vinicius de Lira, Chiara Renso, Salvatore Rinzivillo, Valeria Cesario Times and Patricia Tedesco. MAPMOLTY: a web tool for discovering place loyalty based on mobile crowdsource data. Demo paper at ICWE 2014
138 Collect Movements from the Crowd
- Investigate approaches to mine urban mobility patterns and anomalies by analyzing socially created trajectories:
  - Extract mobility from geo-enabled social media
  - Enrich with contextual/semantic information to extract more insights about the nature of the movements.
139 Twitter Data
- Microblogging platform
- Users may send short messages (up to 140 characters) on what is around them
- Georeference of tweets
- 600k tweets (300k geotagged)
- 33k users
- 8 weeks (May-June 2012)
140 How to build Tweet-trajectories
- Aggregate consecutive tweets according to a spatio-temporal threshold
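One possible reading of this aggregation rule, as a sketch: consecutive geotagged tweets of a user stay in the same trajectory while the time gap is within a temporal threshold, and a tweet is kept as a new point only if it lies at least a spatial threshold away from the previous kept point. The default values of 75 minutes and 100 metres are borrowed from the dataset description later in the deck; how the authors actually combine the two thresholds is not stated, so this logic is an assumption.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def build_trajectories(tweets, max_gap_s=75 * 60, min_move_m=100.0):
    """tweets: time-sorted (timestamp_s, lat, lon) tuples of a single user.
    Returns the trajectories that contain at least two points."""
    trajectories, current = [], []
    for t, lat, lon in tweets:
        # The gap is measured from the last point that was kept.
        if current and t - current[-1][0] > max_gap_s:
            trajectories.append(current)
            current = []
        if current and haversine_m(current[-1][1], current[-1][2], lat, lon) < min_move_m:
            continue  # too close to the previous point: not a movement
        current.append((t, lat, lon))
    if current:
        trajectories.append(current)
    return [tr for tr in trajectories if len(tr) >= 2]

tweets = [
    (0, 41.38, 2.17), (600, 41.39, 2.17), (1200, 41.3901, 2.17),
    (20000, 41.40, 2.17), (20600, 41.41, 2.17),
]
result = build_trajectories(tweets)
print(len(result), [len(tr) for tr in result])
```

The third tweet is dropped because it is only about ten metres from the second, and the long silence before the fourth tweet starts a new trajectory.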
141 Sampling rate distribution of tweets
142 Trajectory Extraction
143 Trajectory Extraction
144 Origin Destination Analysis
145 Origin Destination Analysis: relevant flows from Airport and from Sagrada Familia
146 Semantic Enrichment
147 Foursquare
- User contributed timestamped position
- 9 Top-level categories
  - Nightlife Spot
  - Travel & Transport
  - Outdoor & Recreation
  - Shop & Service
  - College & University
  - Food
  - Art & Entertainment
  - News
  - Residence
  - Professional & Other Places
148 Semantic trajectory mining: MWC2012 Semantic Trajectories
- Dataset: 9689 trajectories built (75 min./100 m) from geo-located tweets of Barcelona during the week of the Mobile World Congress 2012 (MWC2012), the week before and the week after, semantically enriched by classifying them as performed by tourists or locals and by associating the most likely Foursquare venue.
149 Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories
- Start of trajectory: Foursquare place: Burger King, L'Hospitalet; Foursquare top category: Food
- End of trajectory: Foursquare place: 22@, Glories; Foursquare top category: Professional & Other Places
150 Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories
151 Semantic Trajectory Mining
Semantic Origin/Destination matrix built considering the top Foursquare category of origin and destination of trajectories
[Figure panels: Week before, MWC2012 Week, Week after]
152 Join Semantics with Spatial Flow
Trips by category entering Sants Montjuic
[Chart: # of trips per week (Week 0, Week 1, ...) for the categories Food, Arts & Entertainment, Outdoors & Recreation, Professional & Other Places, Travel & Transport, Shop & Service, Nightlife Spot]
153 Join Semantics with Spatial Flow
Trips exiting from Sants Montjuic by category
[Chart: # of trips per week (Week 0, Week 1, ...) for the categories Food, Arts & Entertainment, Outdoors & Recreation, Professional & Other Places, Travel & Transport, Shop & Service, Nightlife Spot, College & University]
154 Sponsored by: Where have you been today? Annotating trajectories with DayTag S. Rinzivillo, F. Siqueira, L. Gabrielli, C. Renso, V. Bogorny SSTD 2013, Demo Paper, Monaco
155 Sensing People Behavior: Surveys
- Cons
  - Low spatial precision
  - Low temporal accuracy
  - Limited in time (usually one or two days)
  - Underestimation of short stops (e.g. ATM)
- Pro
  - Semantically rich
  - User-view of movement
  - Motivation of the movement
156 Sensing People Behavior: GPS
- Cons
  - No semantic information
  - Difficult for user to reconstruct movement motivations
- Pro
  - High spatial precision
  - High temporal accuracy
  - Unlimited time of track
  - Precise reconstruction of movement dynamic (acceleration, route, speed)
  - Low cost technology
157 DayTag
158 DayTag: Anatomy
159 DayTag: Timeline
160 DayTag: Spatial Reference
161 DayTag: Semantic Information
162 Change the traffic with your TAGs. An initiative by: In collaboration with: tagmyday.isf.cnr.it
163 Join us Move Tag Send tagmyday.isf.cnr.it
164 Personal Data Store tagmyday.isf.cnr.it
165 Activity Distribution. Incoming flow to Calci from Pisa tagmyday.isf.cnr.it
166 Atlas of Urban Mobility
167 Atlas of Urban Mobility
168 Pisa Incoming Traffic
169 Pisa Incoming Traffic
170 Trip distribution per day Pisa S. Giuliano Cascina
171 From DATA to KNOWLEDGE: Demographic data, Transport data, Movement data, Geographic data; Data; T-Clustering; T-Patterns; Models; Validation; Forecasts
172 Deployment of a model
[Figure: Data Integration and Semantic Enrichment Service; Continuously Sensed indicator; Periodically Sensed indicator; Dashboard; Validation]
CREATE MODEL MilanODMatrix AS MINE ODMATRIX
FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t),
     (SELECT orig.id, orig.area FROM MunicipalityTable orig),
     (SELECT dest.id, dest.area FROM MunicipalityTable dest)
173 Privacy by Design in Data Mining
174 7 Billion October 2011
175
176 The dark side: Privacy Risks 176 ü Big data of human activity contain personal sensitive information ü Opportunities of discovering knowledge by analytical and data mining tools increase hand in hand with the risks of privacy violation ü An important question: May data publishing and mining violate individual privacy?
177 De-identified User Trajectory 177 ü ü Human data may reveal many facets of the private life Privacy protection is increasingly difficult and it cannot simply be accomplished by de-identification ü ü Color darkness of each region is proportional to the number of different visits Discovering persons living in that home and working in that company we can identify the user
178 How can we guarantee privacy protection in Data Mining? Privacy by Design Paradigm
179 Privacy by Design Paradigm
- Design frameworks to counter the threats of undesirable and unlawful effects of privacy violation without obstructing the knowledge discovery opportunities of data mining technologies
- Natural trade-off between privacy quantification and data utility
- Our idea: Privacy by Design in Data Mining
  - Philosophy and approach of embedding privacy into the design, operation and management of information processing technologies and systems
180 Privacy by Design in Data Mining
- The framework is designed with assumptions about:
  - The sensitive data that are the subject of the analysis
  - The attack model, i.e., the knowledge and purpose of a malicious party that wants to discover the sensitive data
  - The target analytical questions that are to be answered with the data
- Design a privacy-preserving framework able to:
  - transform the data into an anonymous version with a quantifiable privacy guarantee
  - guarantee that the analytical questions can be answered correctly, within a quantifiable approximation that specifies the data utility
181 Our Frameworks
- Privacy by Design for Data Publishing
  - Trajectory Anonymization by spatial generalization
  - Trajectory Anonymization by semantic generalization
- Privacy by Design for Data Mining Outsourcing
  - Privacy-Preserving Mining of Association Rules from Outsourced Transaction Databases
- Privacy by Design for GSM User Profiles
- Privacy by Design in Distributed Movement Data
182 Privacy by Design for Movement Data Publication A. Monreale, G. Andrienko, N. Andrienko, F. Giannotti, D. Pedreschi, S. Rinzivillo, S. Wrobel. Movement Data Anonymity through Generalization. Journal of Transactions on Data Privacy
183 Privacy-Preserving Framework
- Anonymization of movement data while preserving clustering
- Trajectory Linking Attack: the attacker
  - knows some points of a given trajectory
  - and wants to infer the whole trajectory
- Countermeasure: method based on
  - spatial generalization of trajectories
  - k-anonymization of trajectories
184 Trajectory Generalization
- Given a trajectory dataset:
  1. Partition the territory into Voronoi cells
  2. Transform trajectories into sequences of cells
185 Partition of the territory
- Characteristic points extraction: starts (1), ends (2), points of significant turns (3), points of significant stops, and representative points from long straight segments (4)
- Spatial clusters: group the extracted points with the desired spatial extent (MaxRadius), which defines the degree of generalization
- Voronoi tessellation: partition the territory into Voronoi cells using the centroids of the spatial clusters as generating points
186 Generation of trajectories
- Divide the trajectories into segments that link Voronoi cells
- For each trajectory:
  - the area a_1 containing its first point p_1 is found
  - the following points are checked
  - if a point p_i is not contained in a_1, the area a_2 containing it is found, and so on
- Generalized trajectory: from a sequence of areas to a sequence of centroids of areas
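The generalization step above can be sketched in a few lines of Python. The sketch relies on the fact that, for a Voronoi tessellation under Euclidean distance, a point lies in the cell of its nearest generating point, so no explicit cell geometry is needed. The function names are hypothetical, timestamps are dropped for brevity, and the centroids are assumed to be given already (the clustering that produces them is not shown).

```python
import math


def nearest_cell(point, centroids):
    """Index of the Voronoi cell containing `point`, i.e. its nearest centroid."""
    x, y = point
    return min(
        range(len(centroids)),
        key=lambda i: math.hypot(x - centroids[i][0], y - centroids[i][1]),
    )


def generalize(trajectory, centroids):
    """Replace a trajectory of (x, y) points by the centroids of the cells it visits.

    Consecutive points that fall in the same cell collapse into one entry,
    so the result changes only when the trajectory crosses into a new cell.
    """
    cells = []
    for point in trajectory:
        cell = nearest_cell(point, centroids)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return [centroids[c] for c in cells]


if __name__ == "__main__":
    centroids = [(0, 0), (10, 0), (10, 10)]
    trajectory = [(1, 1), (2, 0), (8, 1), (9, 2), (10, 9)]
    print(generalize(trajectory, centroids))
```

A linear scan over the centroids is enough for a small example; with many cells a spatial index would be the usual replacement.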
187 Generalization vs k-anonymity
- Generalization may not be sufficient to ensure k-anonymity:
  - For each generalized trajectory, do at least k-1 other people have the same trajectory?
- Two transformation strategies:
  - KAM-CUT
    - publishing only the k-frequent prefixes of the generalized trajectories
  - KAM-REC
    - recovering portions of trajectories that are frequent at least k times
    - minimizing the noise
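To make the k-anonymity question concrete, the Python sketch below first checks whether every generalized trajectory is shared by at least k trajectories, and then keeps only prefixes supported by at least k trajectories. The second function is a simplified reading of the "k-frequent prefixes" idea on this slide, not the published KAM-CUT algorithm; all function names are hypothetical, and trajectories are assumed to be lists of cell identifiers.

```python
from collections import Counter


def is_k_anonymous(trajectories, k):
    """True if every distinct trajectory occurs at least k times in the dataset."""
    counts = Counter(tuple(t) for t in trajectories)
    return all(c >= k for c in counts.values())


def prefix_support(trajectories):
    """Number of trajectories that start with each prefix."""
    support = Counter()
    for t in trajectories:
        for n in range(1, len(t) + 1):
            support[tuple(t[:n])] += 1
    return support


def cut_to_frequent_prefixes(trajectories, k):
    """Truncate each trajectory to its longest prefix supported by >= k trajectories.

    Support can only shrink as a prefix grows, so shrinking from the full
    length finds the longest frequent prefix. Trajectories with no frequent
    prefix are dropped entirely.
    """
    support = prefix_support(trajectories)
    published = []
    for t in trajectories:
        n = len(t)
        while n > 0 and support[tuple(t[:n])] < k:
            n -= 1
        if n > 0:
            published.append(list(t[:n]))
    return published


if __name__ == "__main__":
    data = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["E", "F"]]
    print(is_k_anonymous(data, 2))
    print(cut_to_frequent_prefixes(data, 2))
```

In this simplified version every published prefix is supported by at least k original trajectories, but rare suffixes and rare whole trajectories are lost, which is the kind of information loss the second strategy on the slide tries to reduce.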
188 Dataset
- Trajectory data in the city of Milan
- GPS traces from about 17,000 vehicles
189 Clustering on Anonymized Trajectories
190 Probability of re-identification
191 Conclusion
- Opportunities and challenges in gaining deep insight into human mobility
- Mobility models as a dual piece of knowledge
  - Enabler for new services
  - Decision support for planning and design
- Creation and extraction of complex models supported by an integrated platform: M-Atlas
  - Management of complex analytical processes
  - Deployment of services
192 Conclusion
- Privacy is an ever-growing concern in our society
- Privacy concerns often lead to skepticism
  - Effects on the use of technologies
  - Effects on the opportunities for data understanding
- Providing methodologies for risk evaluation and data control
193 Key publications
- F Giannotti, M Nanni, F Pinelli, D Pedreschi. Trajectory pattern mining. ACM SIGKDD 2007
- F Giannotti, D Pedreschi. Mobility, data mining and privacy: Geographic knowledge discovery. Springer, 2008
- A Monreale, F Pinelli, R Trasarti, F Giannotti. WhereNext: a location predictor on trajectory pattern mining. ACM SIGKDD 2009
- S Rinzivillo, D Pedreschi, M Nanni, F Giannotti, N Andrienko, G Andrienko. Visually driven analysis of movement data by progressive clustering. Information Visualization 7 (3-4), 2008
- D Wang, D Pedreschi, C Song, F Giannotti, AL Barabasi. Human mobility, social ties, and link prediction. ACM SIGKDD 2011
- F Giannotti, M Nanni, D Pedreschi, F Pinelli, C Renso, S Rinzivillo, R Trasarti. Unveiling the complexity of human mobility by querying and mining massive trajectory data. The VLDB Journal 20 (5), 2011
- R Trasarti, F Pinelli, M Nanni, F Giannotti. Mining mobility user profiles for car pooling. ACM SIGKDD 2011
194 Key publications
- M Coscia, G Rossetti, F Giannotti, D Pedreschi. Demon: a local-first discovery method for overlapping communities. ACM SIGKDD 2012
- S Rinzivillo, S Mainardi, F Pezzoni, M Coscia, D Pedreschi, F Giannotti. Discovering the geographical borders of human mobility. KI-Künstliche Intelligenz 26 (3), 2012
- D Pennacchioli, M Coscia, S Rinzivillo, D Pedreschi, F Giannotti. Explaining the Product Range Effect in Purchase Data. IEEE BigData 2013
- B Furletti, L Gabrielli, C Renso, S Rinzivillo. Analysis of GSM Calls Data for Understanding User Mobility Behavior. IEEE BigData 2013
- L Milli, A Monreale, G Rossetti, D Pedreschi, F Giannotti, F Sebastiani. Quantification trees. IEEE ICDM 2013
195 Vision papers
- F Giannotti, D Pedreschi, A Pentland, P Lukowicz, D Kossmann, J Crowley, D Helbing. A planetary nervous system for social mining and collective awareness. The European Physical Journal Special Topics 214 (1), 49-75, 2012
- J van den Hoven, D Helbing, D Pedreschi, J Domingo-Ferrer, F Giannotti. FuturICT: The road towards ethical ICT. The European Physical Journal Special Topics 214 (1), 2012
- M Batty, KW Axhausen, F Giannotti, A Pozdnoukhov, A Bazzani, M Wachowicz. Smart cities of the future. The European Physical Journal Special Topics 214 (1), 2012
