Using Big Data and Efficient Methods to Capture Stochasticity for Calibration of Macroscopic Traffic Simulation Models



1 CIVIL AND ENVIRONMENTAL ENGINEERING Using Big Data and Efficient Methods to Capture Stochasticity for Calibration of Macroscopic Traffic Simulation Models Sandeep Mudigonda 1, Kaan Ozbay 2 1 Department of Civil and Environmental Engineering, Rutgers University, New Jersey 2 Center for Urban Science + Progress (CUSP); Department of Civil & Urban Engineering, New York University, New York

2 Simulation & Calibration

O_obs = observed field data
Inputs I_s: travel demand, geometry, operational rules
Parameters C_s: user- and traffic-related parameters

O_obs = f(I_s, C_s) + ε

where f(I_s, C_s) is the functional form of the internal models in the simulation system, O_sim = f(I_s, C_s) is the simulation output given the input data and calibration parameters, and ε is the margin of error between the simulation output and the observed data.

Traffic simulation model calibration:

min_{C_s} { ε = U(O_obs, O_sim(I_s, C_s)) }

That is, the simulated outputs O_sim(I_s, C_s), given inputs and parameters, are compared against the observed field data O_obs, and the error ε is minimized over the parameter set C_s.
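The calibration formulation above can be illustrated with a minimal sketch. The "simulator" here is a hypothetical one-equation Greenshields flow model standing in for f(I_s, C_s), the error measure U is RMSE, and the minimization over C_s is a brute-force grid search; all names and values are illustrative, not the paper's implementation.

```python
import math

# Hypothetical toy "simulator": Greenshields flow q = k * v_f * (1 - k/k_jam).
# (v_f, k_jam) play the role of the calibration parameters C_s;
# the densities play the role of the inputs I_s.
def simulate_flows(densities, v_free, k_jam):
    return [k * v_free * (1.0 - k / k_jam) for k in densities]

# Error measure U(O_obs, O_sim): root-mean-square error.
def calibration_error(observed, simulated):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed))

# min over C_s of U(O_obs, O_sim(I_s, C_s)) -- here a brute-force grid search.
def calibrate(densities, observed_flows, v_grid, kjam_grid):
    best = (float("inf"), None)
    for v in v_grid:
        for kj in kjam_grid:
            err = calibration_error(observed_flows, simulate_flows(densities, v, kj))
            if err < best[0]:
                best = (err, (v, kj))
    return best

densities = [20, 40, 60, 80, 100]                  # veh/mi
observed = simulate_flows(densities, 65.0, 180.0)  # synthetic "field" data
err, params = calibrate(densities, observed,
                        v_grid=[55.0, 60.0, 65.0, 70.0],
                        kjam_grid=[160.0, 180.0, 200.0])
```

Because the synthetic "observed" data were generated with parameters that lie on the search grid, the search recovers them exactly; with real field data the minimum error would remain strictly positive.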

3 Data Needs for Calibration

Model inputs: driver characteristics data; vehicle composition data; travel demand data; pedestrian/bicycle data
Model parameters: link (capacity, speed limit, ...); path (route choice, tolls, ...); infrastructure (signal timings, VMS, work zones, ...); weather; driver behavior data; activity data
Observed outputs: flows and speeds; queue data; trajectories; accidents (?); emissions; other

4 CURRENT PRACTICE: Data Used in Previous Calibration Studies

Study | Data
Hourdakis et al. (2003) | 5-min. data; 21 detector stations; 12-mile freeway section; PM peak; 3 days
Jha et al. (2004) | Detector data; 15 days; AM and PM peaks; large urban network
Toledo et al. (2004) | 68 detector stations; 3 freeways; 5 weekdays
Qin and Mahmassani (2004) | 7 detector stations; 3 freeways; AM peak; 5 weekdays
Kim et al. (2005) | Travel time data for 1 hr.; AM peak; 1.1 km freeway section
Balakrishna et al. (2007) | 15-min. data; 33 detector stations
Zhang et al. (2008) | 5-min. detector counts; PM peak; 7 days
Mudigonda et al. (2009) | ETC data for AM and PM peaks
Lee and Ozbay (2009) | 5-min. detector counts; AM & PM peaks; 16 days

The data used for simulation calibration spans 3-16 days and is limited either to a few specific conditions or to a diluted sample of different conditions.

5 Distribution of traffic data: is there a typical day?

[Figure: distribution of traffic data and illustration of demand clusters — there is no single typical day.]

Big Data's large spatial and temporal extent can help calibrate and validate traffic simulation models. Sources include RFIDs, GPS-equipped devices, traffic sensors and cameras.

6 METHODOLOGY: Incorporating Stochasticity in the Macroscopic Model

A macroscopic model is adopted to simulate traffic flow under different conditions. Stochastic version of the first-order model:

∂ρ(x, t, ω)/∂t + ∂(ρv)(x, t, ω)/∂x = 0,  x ∈ D, ω ∈ Ω
q = f(ρ, ω): stochastic flow-density (q-ρ) relation for the t-th time period
ρ(x, 0) = ρ_I(x, ω): stochastic initial condition
ρ(x_i, t) = ρ_B(t, Z): stochastic demand (boundary condition)
B(ρ; x, t, ω) = 0: boundary constraints

Hence the simulation parameters and outputs are random fields,

ρ_t^x = f(t, x) on (Ω, A, P),  Θ_t^x = g(t, x) on (Ω′, A′, P′),

indexed by t (time of day, season, weather, etc.) and x (distance, changing geometry or pavement condition in different parts of the network).
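The deterministic kernel of the first-order model can be sketched as follows. The slides do not specify the numerical scheme, so this is a minimal, assumed discretization: a Godunov / cell-transmission update with a Greenshields fundamental diagram, where the free-flow speed v_f and jam density k_jam would be the stochastic parameters evaluated once per collocation point. All numbers are illustrative.

```python
import numpy as np

# Deterministic first-order (LWR) kernel: Godunov / cell-transmission scheme
# with a Greenshields fundamental diagram q(k) = k * v_f * (1 - k/k_jam).
# One call = one deterministic solve, e.g. at one collocation point.
def lwr_godunov(k0, v_f, k_jam, dx, dt, steps, k_in):
    k = np.asarray(k0, dtype=float).copy()
    k_c = k_jam / 2.0                      # critical density (Greenshields)
    q_max = k_c * v_f * (1.0 - k_c / k_jam)

    def q(k):                              # fundamental diagram
        return k * v_f * (1.0 - k / k_jam)

    def demand(k):                         # sending capacity of upstream cell
        return np.where(k <= k_c, q(k), q_max)

    def supply(k):                         # receiving capacity of downstream cell
        return np.where(k <= k_c, q_max, q(k))

    assert v_f * dt / dx <= 1.0, "CFL condition violated"
    for _ in range(steps):
        kl = np.concatenate(([k_in], k))   # upstream boundary: demand density
        kr = np.concatenate((k, [k[-1]]))  # downstream boundary: free outflow
        flux = np.minimum(demand(kl), supply(kr))   # Godunov interface fluxes
        k += (dt / dx) * (flux[:-1] - flux[1:])     # conservative update
    return k

# Shock example: a congested region downstream of a free-flow region.
k0 = np.where(np.arange(50) < 25, 30.0, 120.0)
k_final = lwr_godunov(k0, v_f=60.0, k_jam=180.0, dx=0.1, dt=0.001, steps=200, k_in=30.0)
```

The demand/supply (min) flux keeps the update conservative and, under the CFL condition checked above, keeps densities within physical bounds.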

7 METHODOLOGY: Solving Stochastic Traffic Simulation Models

Computational complexity is an important factor in the choice of numerical solution method. The simplest and most common method is Monte Carlo-type independent sampling of n simulation runs over the various traffic conditions.

Number of replications for a level of precision γ:

n(γ) ≈ ( t_{n-1, 1-α/2} · S(n) / (γ · Ξ(n)) )²

The convergence rate of MC-type methods is slow: O(1/√n). Depending on the size of the network and the number of stochastic dimensions, this approach can become prohibitive in terms of computational time. Moreover, not all points in the stochastic space of simulation outputs may have corresponding observed data.
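The replication-count formula above can be sketched in a few lines. Two assumptions are made here, since the slide does not define its symbols: the Student-t quantile t_{n-1,1-α/2} is replaced by the normal quantile z_{1-α/2} (a close approximation for pilot samples of 30 or more runs), and S(n) and Ξ(n) are taken to be the sample standard deviation and sample mean of the pilot runs, so that γ is a relative precision.

```python
import math
from statistics import NormalDist

# n(gamma) ~ ( z_{1-alpha/2} * S(n) / (gamma * Xi(n)) )^2
# Assumption: normal quantile in place of the Student-t quantile, and
# S(n)/Xi(n) = sample std / sample mean of a set of pilot runs.
def replications_needed(sample_mean, sample_std, gamma, alpha=0.05):
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{1-alpha/2}
    return math.ceil((z * sample_std / (gamma * sample_mean)) ** 2)

# Pilot runs give mean flow 1800 veh/h with std 270 veh/h; 5% relative precision:
n = replications_needed(1800.0, 270.0, gamma=0.05)
```

Note the quadratic blow-up: halving γ quadruples the required number of runs, which is exactly the O(1/√n) convergence the slide refers to.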

8 METHODOLOGY: Stochastic Collocation

The stochasticity is treated as another dimension, and the stochastic solution space Ω is approximated on Γ using a set of prescribed support nodes {Θ_j}_{j=1}^Q with associated basis functions (stochastic collocation).

The multi-dimensional stochastic solution is approximated by an interpolation function built from the deterministic solutions evaluated at each of the prescribed nodes (collocation points):

ρ(x, t, ξ) ≈ ρ̂(x, t, ξ) = Σ_{j=1}^Q ρ(x, t, Θ_j) α_j(ξ),  with statistics such as E[ρ] = ∫_Γ ρ̂(x, t, ξ) p(ξ) dξ

where p(ξ) is the pdf/weight of ξ and α_j is the j-th interpolation basis function.

For higher dimensions of stochasticity, computationally efficient schemes are required to reduce the number of collocation points.
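A one-dimensional instance of this idea can be sketched as follows. The "expensive model" is a placeholder smooth function of a single stochastic input ξ on [-1, 1] standing in for one deterministic PDE solve; the collocation nodes are Clenshaw-Curtis (Chebyshev-extrema) points, and α_j are Lagrange basis functions. Everything here is an illustrative assumption, not the paper's code.

```python
import numpy as np

def expensive_model(xi):
    # Stand-in for one deterministic simulation run at stochastic input xi.
    return np.exp(0.5 * xi) * np.sin(2.0 + xi)

def clenshaw_curtis_nodes(q):
    j = np.arange(q)
    return np.cos(np.pi * j / (q - 1))     # Chebyshev extrema on [-1, 1]

def lagrange_interpolant(nodes, values):
    # Returns u_hat(xi) = sum_j values[j] * alpha_j(xi), with alpha_j the
    # Lagrange basis functions anchored at the collocation nodes.
    def u_hat(xi):
        xi = np.atleast_1d(xi).astype(float)
        out = np.zeros_like(xi)
        for j, (nj, vj) in enumerate(zip(nodes, values)):
            basis = np.ones_like(xi)
            for m, nm in enumerate(nodes):
                if m != j:
                    basis *= (xi - nm) / (nj - nm)
            out += vj * basis
        return out
    return u_hat

# Q = 9 collocation points, i.e. only 9 deterministic model evaluations.
nodes = clenshaw_curtis_nodes(9)
u_hat = lagrange_interpolant(nodes, expensive_model(nodes))

# The output distribution/statistics are then sampled from the cheap interpolant.
rng = np.random.default_rng(0)
xi_samples = rng.uniform(-1.0, 1.0, 100_000)
mean_sc = u_hat(xi_samples).mean()
mean_mc = expensive_model(xi_samples).mean()   # reference: direct sampling
```

The point of the method is visible in the last four lines: the 100,000 "samples" cost only interpolant evaluations, while the expensive model was run just Q = 9 times.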

9 Smolyak Algorithm

Originally developed for multi-dimensional integration.

1-D interpolant: U^i(f) = Σ_{j=1}^{m_i} f(Θ_j^i) α_j^i, where m_i = number of nodes at level i.

In N dimensions, the full tensor-product interpolant is approximated by the sparse-grid interpolant. Interpolation error with Q nodes:

O(Q^{-2} (log Q)^{3(N-1)}) for a piecewise-linear basis
O(Q^{-k} (log Q)^{(k+2)(N-1)}) for a polynomial basis of order k

The rate can be controlled by the polynomial order k: the convergence order of the Smolyak scheme exceeds that of MC and LHS sampling.
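The saving from the sparse grid is easy to demonstrate by counting points. The sketch below builds a Smolyak grid as the union of tensor grids over admissible multi-indices, using nested Clenshaw-Curtis levels (m_1 = 1, m_i = 2^(i-1) + 1), and compares its size to the full tensor grid at the same 1-D resolution; the admissibility convention (|i| ≤ level + N) is one common choice, assumed here.

```python
import itertools
import numpy as np

# Nested Clenshaw-Curtis node sets: m_1 = 1, m_i = 2^(i-1) + 1.
def cc_nodes(level):
    m = 1 if level == 1 else 2 ** (level - 1) + 1
    if m == 1:
        return np.array([0.0])
    j = np.arange(m)
    return np.cos(np.pi * j / (m - 1))

def smolyak_points(dim, level):
    # Union of tensor grids over multi-indices i (i_j >= 1) with
    # sum(i) <= level + dim; nesting makes many points coincide.
    points = set()
    for idx in itertools.product(range(1, level + 2), repeat=dim):
        if sum(idx) <= level + dim:
            for pt in itertools.product(*(cc_nodes(l) for l in idx)):
                points.add(tuple(round(c, 12) for c in pt))
    return points

dim, level = 4, 3
sparse = len(smolyak_points(dim, level))
full = len(cc_nodes(level + 1)) ** dim   # full tensor grid at the same 1-D resolution
```

For four stochastic dimensions at level 3, the full tensor grid needs 9^4 = 6561 deterministic runs, while the sparse grid needs only a small fraction of that; the gap widens rapidly with dimension.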

10 METHODOLOGY

[Figure: sparse grid of collocation points in the two-dimensional parameter space spanned by the distribution of parameter 1 (free-flow speed, etc.) and the distribution of parameter 2 (jam density, etc.). The deterministic simulation is performed at each collocation point, with multiple replications for variance reduction due to stochastic demands.]

11 METHODOLOGY: Parameter Optimization

From each realization of the parameter set, using the demand distribution as an input, the simulation output distribution (e.g., the flow or density distribution) is generated. This distribution is compared with the observed output distribution, and the error is estimated using a test statistic (such as that of the KS test). This error serves as the objective function and is minimized as part of the multi-objective parameter optimization using the simultaneous perturbation stochastic approximation (SPSA) algorithm:

min_{Θ_t^k} Σ_{i=1}^N { w_1 U_1(q_i^Ob, q_i^S(Θ_t^k)) + w_2 U_2(ρ_i^Ob, ρ_i^S(Θ_t^k)) }

where
q_i^Ob, q_i^S — observed and simulated flows at location i
ρ_i^Ob, ρ_i^S — observed and simulated densities at location i
Θ_t^k — parameter set for time period t and iteration k
w_1, w_2 — weights for the error measures
U_1, U_2 — functions representing the error in flow and density

Each weight w signifies the variance of the corresponding output measure in the data.
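SPSA itself is compact enough to sketch. The objective below is a toy quadratic standing in for the weighted flow/density error sum (in the paper, each evaluation would require a full stochastic-collocation run plus the KS-based error); gain-sequence constants follow common SPSA practice and are assumptions, not values from the study.

```python
import numpy as np

# Minimal SPSA sketch: two loss evaluations per iteration estimate the
# full gradient, regardless of the parameter dimension.
def spsa_minimize(loss, theta0, iters=200, a=0.1, c=0.1, alpha=0.602, gamma=0.101, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(1, iters + 1):
        ak = a / k ** alpha                                # step-size gain sequence
        ck = c / k ** gamma                                # perturbation gain sequence
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli +/-1 perturbation
        # Simultaneous perturbation gradient estimate:
        g_hat = (loss(theta + ck * delta) - loss(theta - ck * delta)) / (2.0 * ck * delta)
        theta -= ak * g_hat
    return theta

# Toy objective: drive (v_f, k_jam) toward an assumed target (65, 180).
target = np.array([65.0, 180.0])
loss = lambda th: float(0.5 * np.sum((np.asarray(th) - target) ** 2))
theta = spsa_minimize(loss, theta0=[50.0, 150.0], iters=500, a=0.5)
```

The appeal for calibration is the cost structure: two simulation runs per iteration no matter how many parameters are being tuned, whereas finite-difference gradients would need two runs per parameter.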

12 METHODOLOGY: Flowchart of Calibration Using Stochastic Collocation

1. Input demand / parameter distribution → collocation points {Θ_j^t}_{j=1}^Q.
2. For j = 1, ..., Q: solve the deterministic 1st-order PDE and record the output ρ(x, t; Θ_j^t) at each collocation point. These runs are parallelizable, and any existing simulation or legacy code can alternatively be used.
3. Build the output distribution ρ*(x, t, Z).
4. If Error > Allowed Error: SPSA optimization produces a new parameter set Θ_t^k; return to step 2. Otherwise: END.

13 RESULTS: Study Section

Section of the NJ Turnpike at Interchange 7, with a single on-ramp and off-ramp and stochastic demand.

Big Data: ETC data — vehicle-by-vehicle entry and exit time, lane, transaction type, vehicle type, and number of axles. Available in NJ for 150 miles of the NJ Turnpike and 170 miles of the Garden State Parkway.

The variation in demand at this section is captured using ETC data at 5-minute resolution between January 1, 2011 and August 31, 2011.

14 Study Data

The demand is divided into clusters using the k-means algorithm. For each cluster, the distribution of demand during each 5-minute time period is generated. The simulation is performed for the:
- weekday AM peak (7-9 AM)
- weekday off-peak (10 AM-12 PM)
- weekend peak period (10 AM-12 PM)
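The clustering step can be sketched as below. Each row is a hypothetical daily demand profile (here 24 synthetic 5-minute counts per day rather than the study's full 8 months of ETC data); a deterministic farthest-point initialization replaces the usual random seeding so the sketch is reproducible, which is an implementation choice of this example, not of the paper.

```python
import numpy as np

# Plain k-means on daily demand profiles (rows of X).
def kmeans(X, k, iters=20):
    # Farthest-point initialisation keeps this sketch deterministic.
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(iters):
        # Assign each day to the nearest cluster centre (Euclidean distance).
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Move each centre to the mean of its assigned days.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(42)
weekday = 1200 + 50 * rng.standard_normal((40, 24))   # synthetic high-demand days
weekend = 700 + 50 * rng.standard_normal((20, 24))    # synthetic low-demand days
X = np.vstack([weekday, weekend])
labels, centers = kmeans(X, k=2)
```

On this well-separated synthetic data the two clusters recover the weekday/weekend split, mirroring the "no typical day" observation: each cluster then gets its own per-interval demand distribution.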

15 Implementation of the Proposed Approach on the Study Section

With the demand distribution as an input, the simulation output flow distribution is generated for each realization of the parameter set. The Clenshaw-Curtis grid is the appropriate sparse grid to discretize the stochastic demand. Sparse-grid interpolation is performed using the output of the simulation at each collocation node. The distribution of simulated flows is obtained by repeated evaluation of the Smolyak interpolation function. This distribution is compared with the sensor-data flow distribution; the error, estimated using the KS test statistic at 90% significance, is minimized using the SPSA algorithm.
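The KS-based error measure in the last step can be sketched as follows: the two-sample Kolmogorov-Smirnov statistic is the maximum gap between the empirical CDFs of the simulated and observed flow distributions, and the 90%-significance threshold uses the asymptotic critical value c(α)·sqrt((n+m)/(n·m)) with c(0.10) ≈ 1.224. The flow samples below are synthetic illustrations.

```python
import numpy as np

# Two-sample KS statistic: max |ECDF_sim - ECDF_obs| over all sample points.
def ks_statistic(sim, obs):
    grid = np.sort(np.concatenate([sim, obs]))
    cdf_sim = np.searchsorted(np.sort(sim), grid, side="right") / len(sim)
    cdf_obs = np.searchsorted(np.sort(obs), grid, side="right") / len(obs)
    return float(np.max(np.abs(cdf_sim - cdf_obs)))

# Asymptotic two-sample critical value; c(0.10) ~ 1.224 for 90% significance.
def ks_critical(n, m, c_alpha=1.224):
    return c_alpha * np.sqrt((n + m) / (n * m))

rng = np.random.default_rng(7)
obs = rng.normal(1800.0, 200.0, 500)        # synthetic observed flows
sim_good = rng.normal(1800.0, 200.0, 500)   # well-calibrated simulator output
sim_bad = rng.normal(1500.0, 200.0, 500)    # badly calibrated simulator output
```

Because the KS statistic compares whole distributions rather than means, it penalizes a simulator that matches average flow but gets the spread wrong — which is precisely why it suits calibration against a demand distribution.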

16 Results: To achieve the flow distribution for the AM peak, the SC approach required 2,433 evaluations, whereas MC-type sampling required 240,000 runs.

17 Results: To achieve the flow distribution for the off-peak period, the SC approach required 441 evaluations, whereas MC-type sampling required 5,420 runs.

18 Results: To achieve the flow distribution for the weekend peak, the SC approach required 441 evaluations, whereas MC-type sampling required 8,000 runs.

19 Results To illustrate the drawback of using limited data, we compare the distribution of flow for high and low weekend demands with the case where only three weekend days of flow and demand are used to calibrate the weekend model.

20 CONCLUSIONS

Calibrating for various traffic conditions requires large datasets; Big Data such as RFID-based ETC data is useful here. However, Big Data poses a computational problem when calibrating for all conditions: traditional MC-type sampling needs heavy computational resources. We propose a methodology to capture stochasticity using stochastic collocation, defining each stochastic factor as a dimension. Computationally efficient sparse grids are used to sample the stochastic space and build an interpolant from the deterministic output at each support node of the grid.

21 CONCLUSIONS & FUTURE WORK

Using 5-min., 8-month demand data, we calibrate AM-peak, off-peak and weekend-peak macroscopic traffic models. The distribution of flows is obtained from the interpolant and used together with the observed distribution to build a KS test statistic for calibration using SPSA. The proposed methodology works with any type of simulation model, is more efficient than MC-type methods, and can be parallelized to increase speed.

Future work:
- Use stochastic parameters for the jam density and wave speed of the traffic flow fundamental diagram on a larger freeway section.
- Apply the methodology to a larger network with higher dimensions of stochasticity.