John A. Volpe National Transportation Systems Center
Connected Vehicle and Big Data: Current Practices, Emerging Trends and Potential Implications
March 27, 2014
Purpose and Organization
Purpose: set the stage for the discussion and stimulate thinking by the panel and audience by providing some initial perspective on:
- What's Happening Now?
  - Current Examples of Transportation Data and Information Services
- What's Coming?
  - Enabling Data Capture and Management Technologies
  - Big Data Analysis Examples
- How Do We Enable It?
  - Issues to be Addressed to Capitalize on Connected Vehicle and Traveler Data
A Changing Transportation Data Landscape - Yesterday
Traditional agency data approaches often featured:
- Agency-owned, infrastructure-based, purpose-built devices
- Point rather than probe
- Current/representative conditions
- Limited outsourcing of data collection services (e.g., vehicle counts, aerial photos, household travel surveys)
- Specific, short-term collection (samples)
- Support limited number of specific studies/activities
- Infrastructure-oriented (rather than trip or traveler-oriented)
- Stored and accessible mostly locally by a limited number of users
Image Source: FHWA
A Changing Transportation Data Landscape - Today
Recent and emerging agency approaches feature greater:
- Diversity of sources, coverage, types, and quantity of data
- Probe data collection, including agency fleet vehicles (filling gaps on rural and arterial roads), e.g., AVL, GPS Maintenance Fleet (e.g., Michigan DOT Integrated Mobile Observations)
- Purchasing of information and data services (not just data), e.g., INRIX, AirSage, IBM
- Purchasing of continuously collected data to support real-time monitoring
- Emphasis on prediction and decision support
- Accessibility/sharing of data among more users, e.g., via the Internet
- Data mining and visualization capabilities
Image Source: Michigan DOT
A Changing Transportation Data Landscape - Tomorrow
- Much more data: Big Data
  - Connected Vehicles
  - Connected Travelers
  - Internet-of-Things (pervasive, uniquely identified, Internet-accessible devices)
- Increased use and need for advanced data capture, management and analysis tools
  - Crowdsourcing: collecting input from a large number of people
  - Cloud computing: storing data on multiple servers that can be accessed via the Internet
  - Federated database systems: transparent mapping of multiple autonomous database systems into a single federated database
  - Data science/Big Data analytics: extracting knowledge using a range of techniques from many fields, including probability models, machine learning, and visualization
Big Data Characteristics
- Volume, velocity, variety, veracity, value
- Larger and more diverse data sets
  - N = all
- Crowdsourced or "electronic breadcrumbs" data capture
  - Incidental, automatically or system-generated electronic records
- Repurposing of data + much larger samples = less emphasis on experimental design associated with initial data capture
- Virtual storage and remote access
- Data science in addition to traditional statistics
  - E.g., pattern recognition and machine learning
Example: Netflix Viewing Recommendations
- Netflix provides users with viewing recommendations using an automated recommendation engine
- The engine uses an algorithm that maps Netflix.com customer behavior (rating, viewing and searching for content) to Netflix's own classification scheme for its content (over 76,000 categories) to generate recommendations
Big Data Characteristics
- No sampling: all data
- Large number of sources (users)
- Pattern recognition
- Automated model development
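The idea of mapping customer behavior onto a content classification scheme can be illustrated with a minimal sketch. This is not Netflix's algorithm; the catalog, the category tags and the `recommend` function are all hypothetical, and the scoring rule (count how often a user has already watched each category) is a deliberately simple stand-in for pattern recognition over viewing behavior.

```python
from collections import Counter

# Hypothetical catalog: title -> set of category tags.
# A stand-in for a provider's own content classification scheme.
CATALOG = {
    "Road Movie A": {"drama", "travel"},
    "Road Movie B": {"comedy", "travel"},
    "Space Saga": {"sci-fi", "adventure"},
    "Courtroom Story": {"drama", "legal"},
}


def recommend(viewed, catalog, top_n=2):
    """Rank unseen titles by overlap with categories the user already watched."""
    # Build a profile: how often each category appears in the viewing history.
    profile = Counter(tag for title in viewed for tag in catalog[title])
    # Score every unseen title by summing the profile counts of its categories.
    scores = {
        title: sum(profile[tag] for tag in tags)
        for title, tags in catalog.items()
        if title not in viewed
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    # Drop titles that share no category with the viewing history.
    return [t for t in ranked if scores[t] > 0][:top_n]


if __name__ == "__main__":
    print(recommend(["Road Movie A"], CATALOG))
```

Because every user's full history feeds the profile, there is no sampling step, which is the "all data" characteristic noted on the slide.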
Example: Traveler Information Provider
Products/services
- Smartphone navigation app for travelers
- Traffic monitoring services to cities/regions
Data
- Active (e.g., incident reporting) and passive (GPS trace) crowdsourcing of data to generate route guidance and other traveler information
Big Data Characteristics
- Massive data streams (e.g., large number of GPS trace feeds)
- Crowdsourced and diverse data
- Predictions using fused data (historical plus real-time GPS trace data)
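One simple way to picture "fused" historical and real-time GPS trace data is a weighted blend for a single road segment. The sketch below is an assumption for illustration, not any provider's method: `fused_speed`, its parameters and the rule that trust in live data grows with the number of probe reports are all hypothetical.

```python
from statistics import mean


def fused_speed(historical_mph, realtime_probe_mph, full_weight_at=10):
    """Blend a historical average speed with current probe speeds for one segment.

    historical_mph: typical speed for this segment and time of day.
    realtime_probe_mph: list of speeds reported by probe vehicles just now.
    full_weight_at: hypothetical probe count at which live data is fully trusted.
    """
    if not realtime_probe_mph:
        # No live probes on the segment: fall back to the historical value.
        return historical_mph
    # Weight on live data rises with the number of reports, capped at 1.0.
    w = min(len(realtime_probe_mph) / full_weight_at, 1.0)
    return w * mean(realtime_probe_mph) + (1 - w) * historical_mph


if __name__ == "__main__":
    # Five probes reporting 30 mph against a 60 mph historical average.
    print(fused_speed(60.0, [30.0] * 5))
```

With few probes the estimate stays close to history, which limits the effect of a single slow vehicle; with many probes the live reading dominates.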
Potential Applications to Connected Vehicle (CV) and Connected Traveler (CT) Data
- Enhanced System Monitoring and Management
  - Prediction of impending traffic flow or transit schedule adherence breakdowns
  - E.g., pattern of vehicle braking, lane changing and acceleration activity preceding a flow breakdown
- Traveler-centric Transportation System Management
  - Using rich profiles of individual travelers' behavior to support highly personalized and targeted traveler information, travel demand management and other strategies
  - E.g., automated rideshare/carpool matching or customizing incentives for transit use
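The braking example above can be sketched as a rule that watches for an unusual share of vehicles braking hard on a segment. This is a minimal illustration under stated assumptions: the function names, the 3.0 m/s^2 deceleration threshold, the 1-second sampling interval and the 50% share threshold are all hypothetical, and a real system would more likely learn such patterns from data than hard-code them.

```python
def hard_braking_share(speeds_by_vehicle, decel_threshold=3.0, dt=1.0):
    """Fraction of vehicles with at least one hard deceleration.

    speeds_by_vehicle: dict of vehicle id -> list of speeds in m/s,
        sampled every dt seconds (hypothetical CV message format).
    decel_threshold: deceleration in m/s^2 treated as hard braking (assumed).
    """
    if not speeds_by_vehicle:
        return 0.0
    flagged = 0
    for speeds in speeds_by_vehicle.values():
        # Deceleration between consecutive samples (positive = slowing down).
        decels = [(a - b) / dt for a, b in zip(speeds, speeds[1:])]
        if any(d >= decel_threshold for d in decels):
            flagged += 1
    return flagged / len(speeds_by_vehicle)


def breakdown_warning(speeds_by_vehicle, share_threshold=0.5):
    """Raise a warning when enough vehicles on the segment are braking hard."""
    return hard_braking_share(speeds_by_vehicle) >= share_threshold


if __name__ == "__main__":
    segment = {
        "v1": [30, 30, 25, 20],  # two hard decelerations
        "v2": [30, 29, 28, 27],  # gentle slowing only
        "v3": [28, 24, 24, 24],  # one hard deceleration
    }
    print(hard_braking_share(segment), breakdown_warning(segment))
```

Lane-change and acceleration signals could be added as further features in the same way; the sketch covers braking only.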
Issues Impacting CV/CT Data Approaches
- Policy issues associated with CV and CT data
  - Who owns and has access to what data?
- Roles, responsibilities and structures in the CV/CT Data Environment
  - How much to privatize (third-party broker)
    - Which data?
    - Which parts of the process (capture, manage, analyze)?
  - How to enable use of the data by many entities, including private?
- What are the valuable, specific Big Data use cases for CV?
Panel Discussion Questions
- Over the next 5 to 10 years, how will your organization or industry take advantage of the increasing availability of connected vehicle and connected traveler data (e.g., from handheld devices)?
- What can public agencies do to enable the use of connected vehicle and traveler data by the public and private sectors?
- What specific tools and techniques is your organization or industry using now, or planning to use, to help extract value from connected vehicle and traveler data?