Case Study: Transportation
Daniela Rus, Professor, Director of CSAIL, Massachusetts Institute of Technology

Future Urban Mobility
Data-Driven Transportation
* Improve level of service
* Enhance sustainability
* Modeling, predicting, controlling
* Personalization
* Optimization
Data-Driven Transportation Goals
Algorithmic and policy solutions for improving traffic:
* Visualizing traffic
* Analysis of travel patterns
* Congestion-aware routing
* Urban planning
* Road pricing
[Figure: congestion, special events, hot spots, weather (FM's Live Singapore Project)]

Case Study in Singapore
* Singapore: 30 x 16 miles, 5M people, high GDP per capita, lots of taxis
2/27/14

Singapore Taxi Data
* 16k unique taxis
* Fields: ID, GPS location, speed, status, timestamp
* Reporting interval: 0.5-1 min
* One month: 512M data points, 33 G
* Processing: in-car data device, data cleaning, map matching

Traffic Visualization: Volume
Traffic Visualization Example

Analysis of Traffic Data: How Representative Taxi Data Is for Traffic
Sejoon Lim, Xinghao Pan, Javed Aslam, and Daniela Rus
sjlim@mit.edu, pxinghao@dso.org.sg, javed.aslam@smart.mit.edu, rus@csail.mit.edu
Tackling the Challenges of Big Data 2014, Massachusetts Institute of Technology

I. Problem
MOTIVATION:
* Taxi GPS data are becoming available from many sources.
* Much research uses taxi data as a proxy for understanding total traffic.
GOAL:
* How well does the taxi data represent traffic in Singapore?
* If it is representative, how much taxi data do we need to infer a good model for traffic?
* If it is not representative, how is it deficient?

II. Approach
Static sensors are stationary and collect information about traffic as it passes by; they include loop detectors, traffic cameras, and ERP (Electronic Road Pricing) sensors. Dynamic sensors are attached to the vehicles themselves and collect information about individual vehicles as they move.
Comparison of two types of sensors: given any set of static sensor measurements, infer those measurements from the dynamic-sensor taxi data and compare the results in relative terms, e.g., by comparing the corresponding traffic distributions.
Compensation of bias: if the taxi data is representative, these distributions should match; if not, one can quantify the mismatch and identify specific areas of match and mismatch.

III. Results
Comparison:
* Fig. 1 shows the comparison between loop detector data and taxi data.
* The representativeness of taxi data depends on location, time of day, etc.
Compensation:
* Fig. 2 shows one month of data for a specific loop detector location. After compensating for the location-specific bias and the predictable change over time, the traffic distributions of the loop detectors and the taxis are very similar.
[Fig. 1. Taxi data on Aug 1 (Sun) and Aug 2 (Mon)]
[Fig. 2. Inductive loop detector data and taxi data]
[Fig. 3. Traffic distribution difference between loop detector data and taxi data]
[Fig. 4. Loop detector data and taxi data after adjustment by linear regression]

IV. Conclusions
* As we suspected, we identified some regions where taxi data is very representative and other regions where it is not.
* By applying linear regression, we could compensate for most of the errors in the taxi data.

Acknowledgements
The authors gratefully acknowledge the support of the Singapore-MIT Alliance for Research and Technology (SMART) program's Future Urban Mobility (FUM).
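The comparison step above says to infer static-sensor measurements from taxi data and compare the two "in relative terms, e.g., by comparing the corresponding traffic distributions." The sketch below is one way to do that, assuming 15-minute counts at a single location: it normalises both count series into distributions over time bins and scores their mismatch with total variation distance. The metric and the function names are our own choices for illustration; the poster does not say which measure was used.

```python
import numpy as np

def to_distribution(counts):
    """Normalise per-interval counts (e.g. vehicles per 15 minutes) into a
    probability distribution over the time bins."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return counts / total

def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return float(0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum())

def distribution_mismatch(loop_counts, taxi_counts):
    """Compare the shape (not the volume) of loop-detector and taxi traffic at one location."""
    return total_variation(to_distribution(loop_counts), to_distribution(taxi_counts))

# Hypothetical counts for four 15-minute bins at one loop detector.
loop = [400, 600, 800, 200]
taxi_same_shape = [40, 60, 80, 20]   # ~10x fewer vehicles, identical profile
taxi_skewed = [10, 60, 80, 50]       # over-represents the last bin
print(distribution_mismatch(loop, taxi_same_shape))  # 0.0
print(distribution_mismatch(loop, taxi_skewed))
```

Because only the shape is compared, a taxi sample that is a constant fraction of total traffic scores 0.0 even though it is about ten times smaller; a location-specific scale bias of that kind is what the regression-based compensation is meant to correct.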
Traffic Visualization: Speed/Congestion
[Figure: road speeds colour-coded slow / medium / fast]

Traffic Visualization: Congestion Sweep
[Figure: series of congestion maps, colour-coded slow / medium / fast]

Case Study: Transportation in Singapore
Source: Live Singapore, Future Urban Mobility, Singapore-MIT Alliance for Research and Technology
Case Study: Transportation
THANK YOU
Travel Pattern Analysis
* How does traffic flow? (workdays / non-workdays; rush hour / non-rush hour)
* Where does traffic originate? What is its destination?
* What routes are taken?
* How does this affect planning, resource allocation, etc.?

Data-Driven Traffic Modeling
* GPS sensors on vehicles: measure traffic speed or delay; easy to deploy
* Inductive loop detectors: measure traffic flow; costly to install
[Figure: CarTel sensor node (Hull, Madden, Balakrishnan 2006)]
Full Study
Do Taxis Measure General Traffic?
* Loop detector count per 15 minutes: ground truth of traffic flow
* Taxi count per 15 minutes: sampled version of general traffic flow
[Figure: Taxis vs. Loop Distribution, Sun Aug 01 and Mon Aug 02: biased sample]
Conclusion: the taxi sample is biased, but is it consistently biased, and thus correctable?
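Both signals on this slide are counts per 15 minutes. A minimal sketch of producing the taxi side, assuming GPS reports have already been map-matched to one road segment and arrive as hypothetical (taxi_id, unix_timestamp) pairs. Because taxis report every 0.5-1 min, the sketch counts distinct taxis per bin rather than raw reports; that de-duplication is an assumption, not something the slide specifies.

```python
from collections import defaultdict

BIN_SECONDS = 15 * 60  # the 15-minute aggregation window used on the slide

def taxi_counts_per_bin(records, bin_seconds=BIN_SECONDS):
    """Count distinct taxis per time bin.

    records: iterable of (taxi_id, unix_timestamp) pairs already map-matched
    to a single road segment (a hypothetical input format).
    Returns {bin_start_timestamp: number_of_distinct_taxis}, ordered by time.
    """
    taxis_in_bin = defaultdict(set)
    for taxi_id, ts in records:
        bin_start = int(ts) - int(ts) % bin_seconds
        taxis_in_bin[bin_start].add(taxi_id)  # a taxi reporting twice counts once
    return {b: len(ids) for b, ids in sorted(taxis_in_bin.items())}

records = [("T1", 0), ("T1", 45), ("T2", 300), ("T3", 900), ("T1", 1790)]
print(taxi_counts_per_bin(records))  # {0: 2, 900: 2}
```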
[Figure: One Month, Loop 11]

Traffic Flow Estimation
Goal: predict the general traffic flow using taxi data and the available set of loop detector data.
Approach: learn linear regression coefficients
  trafficflow = m * taxiflow + b
Different time models for linear regression:
* Day category
* Hour category
Find the time model with the most predictive power.

Correction Factors: Regression
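A sketch of the per-category regression, assuming one category function that combines workday/non-workday with hour of day (one of the time models the slide names); weekends are treated as the only non-workdays, which ignores public holidays. The counts in the usage lines are made up so the fitted coefficients are easy to check.

```python
from datetime import datetime
import numpy as np

def time_category(ts):
    """Workday/non-workday x hour-of-day category.
    ASSUMPTION: weekends are the only non-workdays (public holidays ignored)."""
    return ("workday" if ts.weekday() < 5 else "non-workday", ts.hour)

def fit_by_category(timestamps, taxi_flow, loop_flow):
    """Least-squares fit of loop_flow = m * taxi_flow + b for each time category.
    Returns {category: (m, b)}."""
    groups = {}
    for ts, x, y in zip(timestamps, taxi_flow, loop_flow):
        groups.setdefault(time_category(ts), []).append((x, y))
    coeffs = {}
    for cat, pairs in groups.items():
        x, y = np.array(pairs, dtype=float).T
        if x.max() == x.min():
            continue  # a line cannot be fitted without spread in taxi_flow
        m, b = np.polyfit(x, y, 1)
        coeffs[cat] = (float(m), float(b))
    return coeffs

def predict(coeffs, ts, taxi_flow):
    """Estimated general traffic flow from a taxi count observed at time ts."""
    m, b = coeffs[time_category(ts)]
    return m * taxi_flow + b

# Hypothetical 15-minute counts, 8-9am on Sun 1 Aug and Mon 2 Aug 2010.
times = [datetime(2010, 8, d, 8, mm) for d in (1, 2) for mm in (0, 15, 30, 45)]
taxi = [10, 20, 30, 40, 10, 20, 30, 40]
loop = [205, 405, 605, 805, 150, 270, 390, 510]  # Sun: 20x + 5, Mon: 12x + 30
coeffs = fit_by_category(times, taxi, loop)
print(coeffs)
print(predict(coeffs, datetime(2010, 8, 9, 8, 30), 25))  # a later Monday
```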
Cross-Validation, Model Selection (regression RMSE)
Time model                     | Training error | Prediction error (test)
Day-of-week, 15 minutes        | 0.0006         | 0.0020
Workday/non-workday, 1 hour    | 0.0011         | 0.0013
Selected model: workday/non-workday, 1 hour

[Figure: before regression]
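The slide compares time models by training error and prediction (test) error. A sketch of k-fold cross-validation for that comparison, assuming each sample carries a hypothetical string label for its time category; categories too sparse to fit in a training fold fall back to one global line, which is our own design choice. The synthetic data below is illustrative only and does not reproduce the slide's RMSE values.

```python
import numpy as np

def _fit_predict(cat_tr, x_tr, y_tr, cat_te, x_te):
    """Per-category line y = m*x + b; categories unseen (or with no spread in x)
    in the training fold fall back to one global line."""
    global_fit = np.polyfit(x_tr, y_tr, 1)
    fits = {}
    for c in np.unique(cat_tr):
        mask = cat_tr == c
        if x_tr[mask].max() > x_tr[mask].min():
            fits[c] = np.polyfit(x_tr[mask], y_tr[mask], 1)
    return np.array([np.polyval(fits.get(c, global_fit), x) for c, x in zip(cat_te, x_te)])

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def cross_validate(categories, x, y, k=5, seed=0):
    """k-fold cross-validation of one time model; returns the (train RMSE,
    test RMSE) pair averaged over the folds."""
    categories = np.asarray(categories)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(x))
    train_err, test_err = [], []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        train_err.append(rmse(y[tr], _fit_predict(categories[tr], x[tr], y[tr], categories[tr], x[tr])))
        test_err.append(rmse(y[fold], _fit_predict(categories[tr], x[tr], y[tr], categories[fold], x[fold])))
    return float(np.mean(train_err)), float(np.mean(test_err))

# Synthetic data: the true taxi-to-loop relation depends only on workday/non-workday.
rng = np.random.default_rng(1)
n = 400
workday = rng.random(n) < 5 / 7
slot = rng.integers(0, 96, n)  # 15-minute slot of the day
taxi = rng.uniform(5, 50, n)
loop = np.where(workday, 12 * taxi + 30, 20 * taxi + 5) + rng.normal(0, 20, n)
models = {
    "workday/non-workday, 1 hour": [f"{'wd' if w else 'nwd'}-{s // 4}" for w, s in zip(workday, slot)],
    "workday/non-workday, 15 minutes": [f"{'wd' if w else 'nwd'}-{s}" for w, s in zip(workday, slot)],
}
scores = {name: cross_validate(labels, taxi, loop) for name, labels in models.items()}
for name, (train, test) in scores.items():
    print(f"{name}: train RMSE {train:.1f}, test RMSE {test:.1f}")
best = min(scores, key=lambda name: scores[name][1])  # select on prediction error
print("selected:", best)
```

A finer time model has more parameters and can fit the training folds more closely, but with few samples per category its test error tends to grow; selecting on test RMSE is why the slide's table favours workday/non-workday, 1 hour over day-of-week, 15 minutes.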
[Figure: after regression]

How Many Taxis Are Needed?
* What fraction of roads are covered at least 30 times within 15 minutes / 1 hour?

Taxi Traffic Conclusions
* Taxi traffic is a biased sample of overall traffic.
* The bias is consistent and easily correctable, even at the granularity of individual intersections, using just workday/non-workday + hour of sample.
* We can infer overall traffic from taxi traffic, but why does this work?
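One reading of "covered at least 30 times within 15 minutes / 1 hour" is: for each time window, the fraction of roads with at least 30 probe reports, averaged over windows. The sketch below implements that reading; the input format, the averaging, and skipping windows with no reports at all are assumptions rather than details given on the slide.

```python
from collections import Counter

def coverage_fraction(observations, all_roads, window_seconds=15 * 60, min_hits=30):
    """Mean, over time windows, of the fraction of roads that received at
    least `min_hits` probe reports inside the window.

    observations: iterable of (road_id, unix_timestamp) map-matched reports.
    all_roads: every road id in the network (the denominator).
    ASSUMPTION: windows with no reports at all are skipped.
    """
    roads = set(all_roads)
    if not roads:
        raise ValueError("all_roads must not be empty")
    hits = Counter()
    windows = set()
    for road, ts in observations:
        w = int(ts) // window_seconds
        hits[(road, w)] += 1
        windows.add(w)
    if not windows:
        return 0.0
    per_window = [sum(hits[(r, w)] >= min_hits for r in roads) / len(roads) for w in windows]
    return sum(per_window) / len(per_window)

# Hypothetical reports: road A gets 30 per 15 minutes, road B gets 29.
obs = ([("A", t) for t in range(30)] + [("B", t) for t in range(29)]
       + [("A", 900 + t) for t in range(30)] + [("B", 900 + t) for t in range(29)])
roads = ["A", "B", "C", "D"]
print(coverage_fraction(obs, roads))                       # 0.25
print(coverage_fraction(obs, roads, window_seconds=3600))  # 0.5
```

Widening the window from 15 minutes to 1 hour lets road B accumulate enough reports, which is the trade-off the slide's question is probing.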
Why Are Taxis Good Probes?
* Stochastic taxi movement due to randomness of passenger destinations
* Empirically show that taxi distributions converge regardless of initial locations

Traffic Analysis Setup
What are origins/destinations? GPS coordinates, street addresses, regions.

Analysis of Traffic Data: Understanding a Nation's Travel Patterns
Xinghao Pan, Sejoon Lim, Javed Aslam, and Daniela Rus
pxinghao@dso.org.sg, sjlim@mit.edu, javed.aslam@smart.mit.edu, rus@csail.mit.edu

I. Problem
* Metropolitan traffic is not random, but appears to follow a set of regular travel patterns.
* Travel patterns are of great value; they can be exploited for various purposes, including congestion prediction, route selection, and urban planning.
* Previously we had only anecdotal evidence of such patterns. Now we are able to uncover travel patterns from collected traffic data.
Goals:
* To analyse Singapore's traffic data and quantify the population's travel patterns, for prediction and planning purposes.
* To study the effects of various factors, such as time-of-day / day-of-week, urban land use, etc.

II. Approach
* Study the origins, destinations and trajectories of taxi trips in a one-month period (August 2010).
* Study common trajectories and routes by clustering taxi trips
  - using dual-HDP [1], a non-parametric Bayesian model.
* Divide the country into regions based on origins and destinations
  - K-means clustering automatically generates a Voronoi partition.
* Analyse the probabilities of a taxi trip originating or terminating at each region during different times of the day.

III. Results
* Inter-regional travel is largely facilitated by highways and trunk roads.
* Intra-regional travel forms a significant portion of overall traffic.
* Travel to and from the city (CBD, Orchard) forms the bulk of taxi traffic.
* Trips begin in residential areas and end in the city during the morning rush hour, and vice versa during the evening rush hours.
* The airport is by far the favourite destination in the early morning hours.
[Fig. 1. Twenty-seven regions of origins and destinations in Singapore]
[Fig. 2. Examples of common inter- (left) and intra-regional routes (right)]
[Fig. 3. Randomness of origins and destinations varying over time as taxis travel into the city in the day, and out of the city at night]
[Fig. 4. Probabilities of trips starting or ending in selected regions]

IV. Conclusions
* By examining historical traffic data, we have found that travel patterns do exist in trajectories, origins and destinations.
* The ability to objectively quantify such patterns will be useful for prediction and planning purposes.

V. Future Directions
* An incremental, online version of dual-HDP for analysing large volumes of traffic data.
* Comparison of results with other modes of public transportation.
* Study of other factors that may influence travel patterns.

Acknowledgements
The authors gratefully acknowledge the support of the Singapore-MIT Alliance for Research and Technology (SMART) program's Future Urban Mobility (FM).

References
[1] Xiaogang Wang, Keng Teck Ma, Gee-Wah Ng, and W. E. L. Grimson. Trajectory analysis and semantic region modeling using a nonparametric Bayesian model. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8, June 2008.

Markov Chains
Problem 3 [30 pts]: Markov Sources and Compression. When I take my two children (ages 10 and 12) out to dinner, they often get to choose the restaurant. They have three favorite restaurants: Bertucci's (Italian), Margaritas (Mexican), and Sato (Chinese/Japanese). In the past two years, I have noticed that they are much more likely to pick a restaurant that they have eaten at recently than to choose a different restaurant, and I have also noticed that they have clear favorites among these restaurants. The following Markov Chain roughly describes their restaurant choice behavior, where B, M, and S represent Bertucci's, Margaritas, and Sato, respectively:
[Figure: Markov chain over states B, M, S; edge probabilities .7, .1, .3, .2, .3, .1, .6, .5, .2]
i. Calculate the stationary distribution associated with this Markov Chain. What fraction of the time (on average) do we eat at Bertucci's, Margaritas, and Sato? (Yes, we did this in class, but I would like you to go through the exercise yourself.)
   Answer: Pr[B] = 1/2, Pr[M] = 1/3, Pr[S] = 1/6
ii. Calculate the entropy of this stationary distribution, and calculate the entropy of the Markov Chain itself. (Again, yes, we did this in class, but I would like you to go through the exercise yourself.)
iii. Generate a long i.i.d. sequence of restaurant choices according to the stationary distribution, and generate a long sequence of restaurant choices according to the Markov Chain (represent the restaurants using the characters B, M, and S). Compress these two sequences using the Lempel-Ziv compressor you developed above. What are your compression rates? How do they compare to the entropy rates you calculated above? Explain.
iv. Compress the two sequences above using a real compressor such as gzip, bzip2, etc. What are your compression rates, and how do they compare to the entropy rates and the compression rates you obtained above? Explain.
v. Go to the Project Gutenberg website and obtain a plain text copy of Alice in Wonderland by Lewis Carroll. Strip off the lengthy header text, retaining just the story itself. Compress this text using the Lempel-Ziv compressor you developed above. What is your compression rate? Now compress this text with a real compressor (gzip, bzip2, etc.); what is your compression rate? Compare and contrast this to the results you obtained in parts iii and iv above. Explain.
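For parts i and ii of the problem above, the sketch below computes the stationary distribution and both entropies. The transition matrix is our reading of the diagram's edge labels (B to B 0.7, B to M 0.2, B to S 0.1; M to B 0.3, M to M 0.6, M to S 0.1; S to B 0.3, S to M 0.2, S to S 0.5); it is an assumption, chosen because it reproduces the stated answer Pr[B] = 1/2, Pr[M] = 1/3, Pr[S] = 1/6.

```python
import numpy as np

STATES = ["B", "M", "S"]  # Bertucci's, Margaritas, Sato
# P[i, j] = Pr[next choice is j | last choice was i]; every row sums to 1.
# ASSUMPTION: our reading of the diagram's edge labels.
P = np.array([
    [0.7, 0.2, 0.1],  # from B
    [0.3, 0.6, 0.1],  # from M
    [0.3, 0.2, 0.5],  # from S
])

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, scaled to sum to 1 (pi = pi P)."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()

def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

pi = stationary_distribution(P)
h_stationary = entropy_bits(pi)                                        # i.i.d. source
h_rate = float(sum(pi[i] * entropy_bits(P[i]) for i in range(len(P))))  # Markov source
print(dict(zip(STATES, (round(float(x), 4) for x in pi))))  # {'B': 0.5, 'M': 0.3333, 'S': 0.1667}
print(round(h_stationary, 3), round(h_rate, 3))             # 1.459 1.258
```

The entropy rate of the chain (about 1.258 bits per choice) is below the entropy of its stationary distribution (about 1.459 bits), so a compressor that adapts to context can, in the limit, approach the lower rate on the Markov sequence but not on the i.i.d. one in part iii.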
Rapidly Mixing Taxi Distributions
* Upper bound on the divergence between distributions of taxis starting from different regions.
* Distributions converge quickly.
* Taxi probes provide good coverage regardless of the initial distribution.

RMMC Summary
* At a coarse granularity, taxis behave like a rapidly mixing Markov chain (RMMC).
* Consequence: taxis are good for sampling.
* Future work:
  - granularity vs. fleet size vs. mixing time vs. coverage probability
  - sensing and monitoring applications: road conditions, air quality, noise pollution, potholes
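The slide bounds the divergence between taxi distributions that start in different regions. A minimal sketch of that quantity for a coarse region-level chain: the maximum, over pairs of starting regions, of the total variation distance between their t-step distributions, plus the first t at which it drops below a tolerance. The 3-region transition matrix and the tolerance are hypothetical; the slides do not give the estimated taxi chain.

```python
import numpy as np

def max_pairwise_tv(P, t):
    """d_bar(t): the largest total variation distance between the t-step
    distributions of two walkers started in different states."""
    Pt = np.linalg.matrix_power(P, t)
    n = Pt.shape[0]
    return float(max(0.5 * np.abs(Pt[x] - Pt[y]).sum() for x in range(n) for y in range(n)))

def mixing_time(P, eps=0.01, t_max=1000):
    """Smallest t with d_bar(t) <= eps, or None if not reached within t_max steps."""
    for t in range(1, t_max + 1):
        if max_pairwise_tv(P, t) <= eps:
            return t
    return None

# HYPOTHETICAL coarse chain over three regions; one step = one taxi trip.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.3, 0.3, 0.4],
])
for t in (1, 2, 3, 4):
    print(t, round(max_pairwise_tv(P, t), 5))
print("mixing time (eps = 0.01):", mixing_time(P))
```

This pairwise quantity never increases with t and is within a factor of two of the distance to the stationary distribution, so a fast drop means the fleet's spatial distribution quickly forgets where it started.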
Application: Hotspots
Detect regions of high traffic volume and potential congestion.
[Figure: hotspot maps for 5-7am and 8-10am; labelled areas: Airport, Central Business District]
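The slides do not describe how hotspots are detected. One simple approach, sketched here under stated assumptions: bin the taxi reports from one time window (such as 5-7am) into a fixed latitude/longitude grid and return the busiest cells. The 0.01-degree cell size and the top-k rule are hypothetical parameters.

```python
import math
from collections import Counter

def hotspots(points, cell_deg=0.01, top_k=3):
    """Return the top_k busiest grid cells as ((row, col), count) pairs.

    points: iterable of (lat, lon) taxi reports from one time window.
    cell_deg: HYPOTHETICAL cell size in degrees (0.01 deg is roughly 1.1 km).
    """
    counts = Counter((math.floor(lat / cell_deg), math.floor(lon / cell_deg))
                     for lat, lon in points)
    return counts.most_common(top_k)

def cell_centre(cell, cell_deg=0.01):
    """Latitude/longitude of the centre of a grid cell."""
    row, col = cell
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)

# Hypothetical reports from one time window.
window = [(1.3521, 103.8198), (1.3525, 103.8191), (1.3529, 103.8195), (1.3644, 103.9915)]
for cell, n in hotspots(window, top_k=1):
    print(cell_centre(cell), n)
```

Comparing the top cells across windows (5-7am, 8-10am, 5-7pm, 3-5am) gives the kind of time-of-day contrast the hotspot maps show; a threshold on counts could replace the fixed top-k.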
[Figure: hotspot maps for 5-7pm and 3-5am; labelled areas: Central Business District, Fort Canning]

Application: Origin-Destination Analysis
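The travel-pattern poster divides the country into regions with K-means, which "automatically generates a Voronoi partition", and then analyses where trips originate and terminate. A sketch of both steps: plain Lloyd's k-means with a farthest-point initialisation (our choice, to avoid poor starting centroids), nearest-centroid assignment, and a row-normalised origin-destination matrix. This is not the authors' code; with real data k would be the number of regions (27 in the poster's Fig. 1).

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid for each point, i.e. its Voronoi cell."""
    pts = np.asarray(points, dtype=float)
    d2 = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns k centroids."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = [pts[rng.integers(len(pts))]]
    while len(centroids) < k:
        d2 = ((pts[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(pts[d2.argmax()])  # farthest point from existing centroids
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = assign(pts, centroids)
        new = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def od_matrix(origins, destinations, centroids):
    """Row-normalised origin-destination matrix: entry [i, j] estimates
    Pr[trip ends in region j | trip starts in region i]."""
    k = len(centroids)
    counts = np.zeros((k, k))
    np.add.at(counts, (assign(origins, centroids), assign(destinations, centroids)), 1)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Hypothetical trip endpoints (x, y in km); real use would pool all origins
# and destinations of the month's trips.
origins = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
destinations = [(10, 10), (10, 11), (11, 10), (11, 11), (0, 0)]
centroids = kmeans(origins + destinations, k=2)
print(np.round(od_matrix(origins, destinations, centroids), 2))
```

Splitting trips by time of day before building the matrix would give the per-period origin and destination probabilities that the poster's Fig. 4 summarises.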
Fundamental Traffic Flow Diagram
Road performance: speed vs. flow vs. density
  flow (veh/hr) = speed (km/hr) * density (veh/km)
* Can be inferred from taxi and loop data
* Given earlier results, diagrams can be inferred from taxi data alone
* Performance can be inferred in real time

[Figure: three scatter panels for Road 03254: Taxi Speed vs Density, Loop Count vs Density, and Taxi Speed vs Loop Count; points grouped by two-hour time-of-day bins (00-02 through 22-24) and ALL]
[Figure: loopcount and taxispeed, Church Street]

[Excerpt from M. Papageorgiou et al., p. 718, with Fig. 1. Examples of the fundamental relations between flow, density, and speed.]
"... the different properties of the road (width of the lanes, grade), flow composition (percentage of trucks, fraction of commuters, experienced drivers, etc.), external conditions (weather and ambient conditions), traffic regulations, etc.
Traffic flow observations, however, show that many data are not on the fundamental diagram. While some of these points can be explained by stochastic fluctuations (e.g., vehicles have different sizes, drivers have different desired speeds and following distances), a number of researchers have suggested that these differences are structural and stem from the dynamic properties of traffic flow. That is, they reflect so-called transient states (i.e., changes from congestion to free flow (acceleration phase) or from free flow to congestion (deceleration phase)) of traffic flow. Several authors have studied the nonlinear or even chaotic-like behavior of the traffic system (cf. Bovy and Hoogendoorn, 2000; Pozybill, 1998). Among these behaviors are hysteresis and metastable or unstable behavior of traffic flow. The latter implies that in heavy traffic a critical disturbance may be amplified and develop into a traffic jam (spontaneous phase-transitions). In illustration, empirical experiments performed in Forbes et al. (1958) and Edie and Foote (1958, 1960) have shown that a disturbance at the foot of an upgrade propagates from one vehicle to the next, while being amplified until at some point a vehicle came to a complete stop. This instability effect implies that once the density crosses some critical value, traffic flow becomes rapidly more congested without any obvious reasons. More empirical evidence of this instability and start-stop wave formation can be found in, among others, (Verweij, 1985; Ferrari, 1989; Leutzbach, 1991). In Kerner and Rehborn (1997) and Kerner (1999) it is empirically shown that local jams can persist for several hours, while maintaining their form and characteristic properties. In other words, the stable complex structure of a traffic jam can and does exist on motorways.¹ These findings show that traffic flow has some chaotic-like properties, implying that ..."
¹ Apart from the formation of stop-and-go waves and localized structures, a hysteric phase-transition from free-traffic to synchronized flow that mostly appears near on-ramps is described in Kerner and ...
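The slide's relation, flow (veh/hr) = speed (km/hr) * density (veh/km), is enough to turn speed and density estimates into flow. To get a full diagram one also needs a speed-density model; the sketch below fits Greenshields' linear model, v = v_f (1 - k / k_jam), by least squares and derives the capacity v_f k_jam / 4. Greenshields' model is our swapped-in example, not something the slides specify, and the data in the usage lines are synthetic.

```python
import numpy as np

def flow(speed_kmh, density_veh_per_km):
    """flow (veh/hr) = speed (km/hr) * density (veh/km), elementwise."""
    return np.asarray(speed_kmh, dtype=float) * np.asarray(density_veh_per_km, dtype=float)

def fit_greenshields(speed_kmh, density_veh_per_km):
    """Least-squares fit of v = v_f * (1 - k / k_jam); returns (v_f, k_jam)."""
    k = np.asarray(density_veh_per_km, dtype=float)
    v = np.asarray(speed_kmh, dtype=float)
    slope, intercept = np.polyfit(k, v, 1)  # v = intercept + slope * k
    if slope >= 0:
        raise ValueError("speed should fall as density rises")
    return intercept, -intercept / slope    # v_f, k_jam

def capacity(v_free, k_jam):
    """Maximum flow of the Greenshields parabola, reached at k = k_jam / 2."""
    return v_free * k_jam / 4.0

# Synthetic observations generated from v_f = 60 km/h, k_jam = 120 veh/km.
k = np.linspace(5, 100, 20)
v = 60 * (1 - k / 120)
v_f, k_jam = fit_greenshields(v, k)
print(round(v_f, 2), round(k_jam, 2), round(capacity(v_f, k_jam), 1))  # 60.0 120.0 1800.0
```

Real taxi and loop observations scatter around any such curve, for the transient and hysteresis effects the excerpt describes, so a fitted model is best read as a summary of typical road performance rather than an exact law.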
Application: Congestion-Aware Routing

Wrapping Up
* Roving sensor network of taxi probes
* Predict general traffic flow using taxi flow
* Intuition: rapidly mixing taxi distributions
* Identify the number of probe vehicles required to predict the general traffic flow
* Demonstrate use of dynamic traffic probes at city scale
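The slides name congestion-aware routing as an application but do not give an algorithm. A minimal sketch, assuming a hypothetical graph format and per-road speed estimates (for example from the taxi-probe flow model): Dijkstra's algorithm with edge cost length / current speed, so a congested road is avoided when a faster detour exists. The default speed for roads without an estimate is a hypothetical parameter.

```python
import heapq

def fastest_route(graph, speeds_kmh, source, target, default_kmh=30.0):
    """Dijkstra's algorithm with edge cost = road length / current speed.

    graph: {node: [(neighbour, road_id, length_km), ...]}  (hypothetical format)
    speeds_kmh: {road_id: estimated current speed in km/h}
    default_kmh: HYPOTHETICAL speed assumed for roads with no estimate.
    Returns (travel_time_hours, [source, ..., target]), or (inf, []) if unreachable.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == target:
            break
        for v, road, length_km in graph.get(u, []):
            speed = speeds_kmh.get(road, default_kmh)
            if speed <= 0:
                continue  # a stopped road is treated as impassable
            cand = t + length_km / speed
            if cand < best.get(v, float("inf")):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (cand, v))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]

# Hypothetical network: a long fast road A-B-D and a short road A-C-D.
graph = {"A": [("B", "ab", 10.0), ("C", "ac", 5.0)],
         "B": [("D", "bd", 10.0)],
         "C": [("D", "cd", 5.0)]}
congested = {"ab": 60, "bd": 60, "ac": 10, "cd": 10}
free_flow = {"ab": 60, "bd": 60, "ac": 60, "cd": 60}
print(fastest_route(graph, congested, "A", "D"))  # detours via B
print(fastest_route(graph, free_flow, "A", "D"))  # takes the short road via C
```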