Big Data for Transportation
Dr. Laurie A. Schintler, The School of Public Policy, George Mason University
Dr. Shanjiang Zhu, Department of Civil, Environmental & Infrastructure Engineering, George Mason University
Overview of Presentation
- Big Data Fundamentals
- Transportation Case Studies
- Issues and Challenges
FUNDAMENTALS
What is Big Data?
VOLUME
Putting a Petabyte into Perspective
Sources: marketingpilgrim.com, ciobiz.etnews.com
VARIETY
Sources
- Use of digital services: mobile phones, Web services
- Online information: social media, newspapers, e-commerce transactions
- Physical sensors: satellite imagery, nightlights data
- Crowdsourcing: Volunteered Geographic Information
Data Characteristics
- Structured versus unstructured data
- Varying levels of spatial and temporal variation
- Repurposed data
VELOCITY
Trustworthiness of the Data
- Noise
- Incompleteness
- Bias
- Anomalies/outliers
Source: www.hexanika.com
BIG DATA LANDSCAPE
Opportunities/Benefits
- Human/societal behavior
- Network-based
- Proliferation of unstructured data (text, video, photos)
- Digitization of documents and content
- Open data (e.g., Data.gov)
- Geospatial and spatio-temporal data
- Sentiment/preferences
- Improvements in computational capabilities
Issues/Challenges
- Privacy, inequality, bias, computational complexity, data management, etc.
GROWING IMPORTANCE IN TRANSPORTATION
[Chart: TRB Papers Associated with "Big Data", 2011-2014; vertical axis 0 to 200; data labels as extracted: 172, 29, 0, 9]
GROWING IMPORTANCE IN TRANSPORTATION
- Innovations in Travel Modeling, Baltimore, 04/27-04/30
  - Innovations in Data Workshop
  - Dynamic Models and Dynamic Data
  - Old Data with a New Twist
  - GPS and Emerging Data in Travel Forecasting
- International Transport Forum 2014 Summit, Leipzig
  - Big Data in Transport: Applications, Implications, Limitations
- Exploratory Advanced Research
  - Video feature extraction automation
  - Real-time data for connected highway-vehicle systems
  - FY 2014 Solicitation, Data for safety analysis and freight
TRANSPORTATION EXAMPLES
- Travel Demand Modeling/Analysis
- Public Transportation
- Logistics
- Traffic Management/Operations
TRAVEL DEMAND MODELING/ANALYSIS
Traditional Sources of Data
- Household surveys, travel diaries, loop detectors, Census data
- Costly, time consuming, limited geographic/temporal coverage
Source: www.aimsun.com
New Sources of Data
- Mobile phone traces
- Location-Sharing Services (LSS) data
- Taxi movements
- Banknotes
- Credit card transactions
Mobile Phone Data
Cell Phone Triangulation
Source: c2.staticflickr.com
Human Mobility Patterns
- Anonymous cell phone records and mobile phone traces: human mobility and the resulting contacts between people have a high degree of predictability (Song et al., 2010)
- Regular patterns across the population, despite the different socio-economic characteristics of individuals and travel contexts
- Scaling in distances travelled, time spent at locations, and popularity of places
Source: Gonzalez M, Hidalgo C, Barabasi AL (2008) Understanding individual human mobility patterns. Nature 453: 779-782
Location Services Data
- Foursquare, Gowalla, Brightkite
- Data: "check-ins", social networks
- Foursquare check-ins (October 2010 - February 2011). From: Cheng et al. (2011)
Where are People Going?
[Figure panels: Atlanta, Chicago]
Social and Spatial Interaction
Two-mode network: one set of vertices represents space, the other set represents individuals.
- Roger: Location A
- Roger: Location B
- Kingsley: Location A
- Laurie: Location C
- Rajendra: Location B
- Rajendra: Location D
How social ties influence travel behavior and how location choices affect social ties
[Figure: two-mode network linking Roger, Kingsley, Laurie, and Rajendra to locations A, B, C, and D]
1-Mode Networks
From the 2-mode network, we extract the following two 1-mode weighted networks:
- Social-Space (SL_w): the set of vertices, V, are individuals, and each edge, e_sl, in the set of all edges, E, is weighted by the number of shared locations
- Spatial-Social (LS_w): the set of vertices, V, are spatial units, and each edge, e_ij, in the set of all edges, E, is weighted by the number of individuals common to locations i and j
From these networks, we can extract the following subgraphs:
- Social-Space: only social ties
- Spatial-Social: only spatially contiguous geographic units
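The projection from the 2-mode network to the two weighted 1-mode networks can be sketched in a few lines of Python. This is only an illustration, assuming the person-location pairs from the Roger/Kingsley/Laurie/Rajendra example; the function name `project` and the dictionary layout are hypothetical choices, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

# Person-location pairs from the two-mode example slide.
visits = [
    ("Roger", "A"), ("Roger", "B"), ("Kingsley", "A"),
    ("Laurie", "C"), ("Rajendra", "B"), ("Rajendra", "D"),
]

def project(pairs):
    """Project (left, right) pairs onto the left vertex set.

    Returns {(u, v): weight}, where weight counts the right-side
    vertices that u and v have in common.
    """
    members = defaultdict(set)  # right vertex -> set of left vertices
    for left, right in pairs:
        members[right].add(left)
    weights = defaultdict(int)
    for group in members.values():
        for u, v in combinations(sorted(group), 2):
            weights[(u, v)] += 1
    return dict(weights)

# Social-Space (SL_w): individuals linked by number of shared locations.
sl_w = project(visits)
# Spatial-Social (LS_w): locations linked by number of shared individuals.
ls_w = project([(loc, person) for person, loc in visits])

print(sl_w)
print(ls_w)
```

In this toy data, Roger is tied to Kingsley through Location A and to Rajendra through Location B, while Laurie is isolated because nobody else visits Location C.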
Social and Spatial Interaction
The full dataset contains 4,491,143 "check-ins" made by Brightkite users over the period April 2008-October 2010. The undirected social graph comprises 58,228 individuals and 214,078 social ties (Cho et al. 2011).
We extracted "check-ins" for only those locations in the contiguous United States and assigned the "check-ins" to counties. There were almost 3 million "check-ins", 21,001 unique users, and 2,706 counties with at least one "check-in."
Source: Schintler, L.A., Kulkarni, R., Haynes, K. and R. Stough (2014). Sensing "Socio-Spatio" Interaction and Accessibility from Location-Sharing Services Data. eds. Margerida, A. et al. Accessibility and Spatial Interaction.
Accuracy and Sensitivity of the Data
(Chen and Schintler, forthcoming)
PUBLIC TRANSPORTATION MANAGEMENT
Dublin Public Bus System (in collaboration with IBM)
- Bus timetables, inductive-loop detectors, closed-circuit television cameras, and GPS updates (each bus every 20 seconds)
- Real-time digital map of the city
Melbourne, Australia's Yarra Trams
- Largest tram system in the world
- Tracking 91,000 pieces of equipment
- Benefits: real-time route operations, prediction of problems, quick response to problems (e.g., special events, inclement weather)
Sources: en.prodigynetwork.com, "How Big Data is Transforming Public Transportation Management"; www.huffingtonpost.com, "How Melbourne is Fast-Tracking Smart Transit Using Big Data"
More Examples
Abidjan, Ivory Coast, Transit Network
- IBM, Data for Development
- 2.5 billion call records
- Mapping movements of individuals
- Transit route planning and optimization
MIT Big Data Challenge
- Partnership with City of Boston
- 2.3 million taxi rides, local events, weather conditions
- Taxi demand management
Sources: MIT Technologyreview.com, "African Bus Routes Redrawn Using Cell-Phone Data", April 30, 2013; Senseable3.mit.edu/bigDataChallenge
LOGISTICS
Current Use of Big Data
Source: Big Data in Logistics: A DHL Perspective on How to Move Beyond the Hype, December 2013
TRAFFIC MANAGEMENT/OPERATIONS
Integrated Corridor Management
"Transportation corridors often contain unused capacity in the form of parallel routes, the non-peak direction on freeways and arterials, single-occupant vehicles and transit services that could be leveraged to help reduce congestion." (USDOT)
Integrated Corridor Management
Xerox Merge Smart Parking, LA DOT
Integrated Corridor Management
The BMW i3 and i8 feature a multimodal navigation system
Active Traffic Management
- Shoulder lanes for peak hours
- Shoulder lanes for emergencies
- Variable speed limit
- HOV/HOT
I-66 ATM
Data Challenges
- Static -> Dynamic
- Reactive -> Proactive
- Single location -> Network wide
- Road only -> Multimodal
- Single source -> Data fusion
- One-time estimation -> Prediction based on rolling horizon
Corridor Travel Time Prediction
- Loop detectors
- Bluetooth
- GPS
- Cellular data
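One simple way to combine travel-time estimates from several sources is inverse-variance weighting. The slides do not say which fusion method is used, so the technique, the function name `fuse`, and every number below are assumptions for illustration only.

```python
def fuse(estimates):
    """Inverse-variance weighted mean of (travel_time_min, variance) pairs.

    Returns the fused travel time and the variance of the fused estimate.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    mean = sum(w * t for w, (t, _) in zip(weights, estimates)) / total
    return mean, 1.0 / total

# Hypothetical corridor travel times (minutes) and variances by source.
readings = {
    "loop_detectors": (12.0, 4.0),
    "bluetooth": (14.0, 1.0),
    "gps": (13.0, 2.0),
    "cellular": (16.0, 8.0),
}

fused_time, fused_var = fuse(list(readings.values()))
print(round(fused_time, 2), round(fused_var, 3))
```

Sources with smaller variance pull the fused estimate toward their reading, and the fused variance is never larger than that of the most precise single source.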
Dynamic OD Patterns
Source: INRIX
Optimization on a Rolling Basis
[Diagram: at each time step T1, T2, ..., Tn, Traffic Data, Demand Data, and Control Data feed Estimation Models that produce Predicted Traffic Patterns; Optimization then yields Control Strategies; a Bayesian Update links successive time steps]
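The rolling loop of estimating, updating, and re-optimizing at each time step can be sketched as follows. The slide does not specify the estimation model or the optimizer, so this sketch assumes a scalar Gaussian update (equivalent to a one-dimensional Kalman filter with random-walk dynamics) and a toy ramp-metering choice; `CAPACITY`, `RATES`, the noise variances, and all flows are hypothetical.

```python
CAPACITY = 4000.0               # assumed mainline capacity, veh/h (hypothetical)
RATES = (300.0, 600.0, 900.0)   # candidate ramp-metering rates, veh/h (hypothetical)

def bayes_update(prior_mean, prior_var, obs, obs_var):
    """Gaussian conjugate update of a scalar flow estimate."""
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var

def choose_control(predicted_flow):
    """Toy optimization: largest metering rate that stays within capacity."""
    feasible = [r for r in RATES if predicted_flow + r <= CAPACITY]
    return max(feasible) if feasible else min(RATES)

def rolling_horizon(observations, mean, var,
                    obs_var=200.0 ** 2, drift_var=100.0 ** 2):
    """Re-estimate flow and re-select the control at every time step."""
    plan = []
    for obs in observations:      # time steps T1, T2, ..., Tn
        var += drift_var          # uncertainty grows between steps
        mean, var = bayes_update(mean, var, obs, obs_var)
        plan.append((mean, choose_control(mean)))
    return plan

plan = rolling_horizon([3000.0, 3300.0, 3600.0], mean=2800.0, var=300.0 ** 2)
for step, (flow, rate) in enumerate(plan, start=1):
    print(f"T{step}: estimated flow {flow:.0f} veh/h, metering rate {rate:.0f} veh/h")
```

As the estimated mainline flow rises over the three steps, the selected metering rate tightens, which is the proactive behavior the rolling scheme is meant to provide.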
ISSUES AND CHALLENGES
Privacy
- International Telecommunications Union: the right of individuals to control or influence what information related to them may be disclosed
- Tradeoffs, e.g., national security vs. efficiency vs. user satisfaction
- Solutions?
  - U.S. Census 72-year rule
  - Data philanthropy (UN Global Pulse): corporations take the initiative to anonymize their data sets and provide the data to social innovators to mine for insights, patterns, and trends in real time and near-real time
  - Regulation/free market
Data Management/Data Silos
- Proprietary data
- Repurposing of data
- Volume of files, combination of software packages
- Team research: files/data dispersed
- Organization of unstructured data files; combining structured and unstructured data
- Varying spatial and temporal scales
- Data reduction
- Metadata
- Sharing
Statistical Issues and Bias
- Not random: bias
- Missing data (and noise!)
- Observations not independent
- Issues related to temporal/spatial data
- Alternative methods
- Inequities
BIG DATA IS MULTI- and INTERDISCIPLINARY
Human Capital Needs
- Transportation Managers/Planners
- Computer Scientists
- Geographers
- Semantics Experts, Linguists
- Sociologists
- Network Scientists
- Statisticians
- Other Topical Areas of Expertise: Health/Medicine, Political Science, Economics
- Public Policy Analysts/Professionals
Questions?