Identifying Risk Groups in Flanders: Time Series Approach




Report RA-MOW-2011-031
D. Karlis, E. Hermans
Research line: Risk Assessment (Onderzoekslijn Risicobepaling)
Diepenbeek, 2013
Steunpunt Mobiliteit & Openbare Werken, Spoor Verkeersveiligheid

Document description

Report number: RA-MOW-2011-031
Title: Identifying Risk Groups in Flanders: Time Series Approach
Author(s): D. Karlis, E. Hermans
Supervisor: Prof. dr. Geert Wets
Research line: Risk Assessment
Partner: Universiteit Hasselt
Number of pages: 38
Steunpunt project number: 6.1
Project content: This project produces road safety forecasts for Flanders.
Published by: Steunpunt Mobiliteit & Openbare Werken, June 2012.

Steunpunt Mobiliteit & Openbare Werken, Wetenschapspark 5, B 3590 Diepenbeek
T 011 26 91 12, F 011 26 91 99
E info@steunpuntmowverkeersveiligheid.be
I www.steunpuntmowverkeersveiligheid.be

Summary (translated from the Dutch Samenvatting)

Title: Identifying risk groups in Flanders: a time series approach

Annual accident and exposure data for Flanders from the period 1991-2007 are used to build state space models and to make road safety forecasts for the period 2008-2015. We use the so-called Latent Risk Time series model, which is suited to modelling accident data, in order to gain insight into the road safety situation that can be expected in the coming years. In this model two components, exposure on the one hand and road fatalities (or another category of road casualties) on the other, are considered simultaneously. In addition, the analysis focuses on smaller subgroups, defined by the age of the road casualty, the road type and the type of road user (or transport mode). The forecasts point to an expected decrease in the number of road fatalities, although not at the same pace for the different subgroups.

English summary

Annual accident and exposure data from Flanders covering the period 1991-2007 are used to build state space models and to make road safety predictions for the period 2008-2015. We use the Latent Risk Time series model, developed specifically for accident data, to forecast the road safety situation that can be expected in the coming years. In this model two components, the exposure measurement and the fatalities measurement (or another category of road casualties), are fitted simultaneously. Moreover, the analysis also focuses on smaller subgroups, defined by the age of the road traffic victim, the road type and the road user type (or transport mode). The forecasts clearly show that the number of fatalities is expected to decrease, although not at the same rate for the different subgroups.

Contents

1. INTRODUCTION
2. STATE SPACE MODELS
3. THE LATENT RISK MODEL
4. RESULTS FROM AGGREGATE MODELS
5. RESULTS FROM DISAGGREGATE MODELS
   5.1 By Road user type
   5.2 By Age category
   5.3 By Road type
6. SOME STATISTICAL CONSIDERATIONS
7. CONCLUSIONS
8. REFERENCES
9. APPENDIX

1. INTRODUCTION

Road traffic crashes are one of the world's largest public health and injury prevention problems. The problem is all the more important because the victims are overwhelmingly healthy prior to their crashes. A report published by the World Health Organization (WHO, 2009) estimated that approximately 1.3 million people die each year on the world's roads, that between 20 and 50 million sustain non-fatal injuries, and that traffic accidents were the leading cause of death among children of 10-19 years of age. Most societies are aware of this issue, and reducing the number of road fatalities features on every political agenda. Traffic safety is also high on the academic agenda, and a lot of research is undertaken to examine and improve it. In recent years there has been a downward trend in the number of fatalities in most countries in Western Europe, North America and Oceania (see Elvik, 2010; see also Lassarre, 2001), reflecting awareness of the problem as well as the measures taken to reduce it. However, apart from fatalities, non-fatal accidents are also of great public concern, as they too produce significant losses and thereby contribute to the economic costs. In Flanders, almost 40,000 casualties were registered in 2009 (FOD Economie, 2011).

The purpose of the current report is to forecast the (disaggregate) level of road safety in Flanders up to 2015. The data used in the analyses are yearly data up to 2007 covering different subgroups: age categories, road user types and road types. A close look at such subgroups is of primary importance in safety research, as it can reveal vulnerable subgroups for which particular measures are most urgently needed.
Our forecasts are based on a state space model developed by Bijleveld et al. (2008), which is suitable for road safety data because it captures the basic ideas of road safety research. The model assumes that road casualties (of a particular severity level, e.g. fatalities) are the result of the road risk and the exposure of individuals to that risk. While exposure can be approximated using real data, the risk is a latent factor that is not directly observable. We therefore use the Latent Risk Time series model, which treats the risk as latent and models the exposure and the casualties at the same time. The model is applied to fatalities, casualties, serious injuries and slight injuries using data from the entire population, with the total number of kilometres travelled (in millions) as exposure. We then focus on subgroups, namely road type, road user and age. As exposure for the subgroups we use relevant detailed data where available and proxies where not. In the disaggregate analyses we primarily focus on fatalities.

The report proceeds as follows. Section 2 briefly introduces state space models, while section 3 describes the Latent Risk model used in the report. Section 4 contains the main results for the entire population. In section 5 we provide some disaggregate analysis focusing on particular subgroups of the population. Working with subgroups offers interesting challenges since, for example, most measures affect only a part of the population. Section 6 deals with some statistical considerations about the model fitting. Concluding remarks can be found in section 7. Additional results have been put in the appendix.

2. STATE SPACE MODELS

Road safety data are typically observed at successive time points and hence form a time series. The frequency of the observations depends on the way they are collected and can vary widely. In this report we consider annual data [1], which implies that seasonality has been cancelled out (and consequently, seasonality issues will not be described). A powerful class of time series models are the dynamic models, i.e. models whose parameters may change over time. There are two main classes of univariate dynamic models: ARIMA models, studied by Box and Jenkins, and unobserved component models, also called structural models by Harvey and Sheppard (1993). In a structural model each component or equation is intended to represent a specific feature or relationship in the system under study. The state space methods described in this section belong to the latter group of models.

A typical time series may be decomposed into a trend, a seasonal and an irregular part. An important characteristic is that the components are stochastic; models without a stochastic component are called static. Moreover, explanatory variables can be added and intervention analysis carried out. The principal structural time series models are therefore nothing more than regression models in which the explanatory variables are functions of time and the parameters are time-varying. The key to handling structural time series models is the state space form, with the state of the system representing the various unobserved components. State space time series analysis began with the path-breaking paper of Kalman (1960), and early developments in the subject took place in the field of engineering. Once a model is in state space form, the Kalman filter may be applied, and this in turn leads to estimation, analysis and forecasting.
The state space model in its simple form can be expressed as

$$
y_t = Z_t a_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t),
$$

$$
a_{t+1} = T_t a_t + R_t \eta_t, \qquad \eta_t \sim N(0, Q_t),
$$

with initial value $a_1 \sim N(\mu_1, P_1)$, where the matrices $Z_t$, $T_t$, $R_t$, $H_t$ and $Q_t$ are assumed known (however, this assumption can be relaxed). Note that $H_t$ and $Q_t$ are the covariance matrices associated with the errors of the two equations, while $Z_t$, $T_t$ and $R_t$ are matrices used to define a multitude of models; they may contain coefficients to be estimated as well.

[1] Disaggregate models, providing detailed insights, require more detailed data. At subgroup level, exposure data are difficult to find and often do not exist on, e.g., a monthly basis.

The key idea of state space models is that a certain parameter $a_t$ relates to the parameter at the previous time point, inducing a dynamic linear model. The first equation is called the observation (or measurement) equation and the second equation is called the state equation. The state space formulation for time series models is quite general and encompasses most of the classical time series models, such as MA and ARIMA models. Also, since the state equation(s) can capture the behaviour of the underlying (and unobservable) variables in a very flexible way, the formulation offers great flexibility with real data.

The advantages of state space modelling can be summarized (see, e.g., Durbin and Koopman, 2001) as follows:

- They are based on a structural analysis of the problem at hand. The different components that may comprise a time series model can themselves be modelled separately.
- They offer greater generality. In fact, several other models can be seen as special cases of state space models.
- They satisfy the Markovian property, and hence the necessary calculations can be carried out recursively.
- Forecasting with state space models is relatively easy. State space models in fact apply some smoothing to the data, and hence forecasts are also smooth. In addition, diagnostic checking is simple, as the Kalman filter provides a framework for it.
- State space models are adaptive, and the benefits of this are usually realised by implementing them in real time, since only minor calculations are needed.
- Finally, they offer great flexibility, as they can be adapted to particular circumstances, allowing for refined modelling in several problems.

At the same time, some disadvantages should be mentioned. The models are usually more complicated and less interpretable than standard time series models, especially for researchers not trained in them, which makes their acceptance in some problems difficult.
In addition, some extra computational effort is needed compared with much simpler models. Finally, note that while for certain models state space modelling is well established and easy to use, there are models for which it is not so easy, for example discrete valued time series models. The model developed by Zeger (1988) is in fact a state space model for discrete time series. However, assuming a Poisson distribution leads to a rather complicated recursion for the state equation and makes estimation difficult.

State space models are currently popular for accident prediction, mainly due to their generality and flexibility (see e.g. Gould et al., 2004; Hermans et al., 2006a, 2006b; Bijleveld, 2008). Several software packages (R, EVIEWS and MATLAB, to name just a few) are available for fitting such models (see the special issue of the Journal of Statistical Software, Commandeur et al., 2011). State space models provide a convenient and powerful framework for analyzing time series data. More details can be found in several textbooks devoted to these models, see e.g. Durbin and Koopman (2001) and Commandeur and Koopman (2007).
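To make the recursive character of the Kalman filter concrete, the following is a minimal sketch in Python (with NumPy) of the prediction recursion for the state space form given in this section, with time-invariant system matrices. It is an illustration under stated assumptions and not the software used for the report: the function name `kalman_filter`, the choice of a local linear trend model (state = level and slope), the variance values and the toy series are all hypothetical, and a large initial covariance stands in for a proper diffuse initialisation.

```python
import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    """One-step-ahead Kalman recursion for
         y_t     = Z a_t + eps_t,    eps_t ~ N(0, H)
         a_{t+1} = T a_t + R eta_t,  eta_t ~ N(0, Q)
    Returns the predicted states a_1, ..., a_{n+1} and their covariances."""
    a, P = np.asarray(a1, float), np.asarray(P1, float)
    pred_states, pred_covs = [a], [P]
    for obs in y:
        v = np.atleast_1d(obs) - Z @ a            # one-step prediction error
        F = Z @ P @ Z.T + H                       # variance of the prediction error
        K = T @ P @ Z.T @ np.linalg.inv(F)        # Kalman gain
        a = T @ a + K @ v                         # predicted state for the next period
        P = T @ P @ (T - K @ Z).T + R @ Q @ R.T   # its covariance
        pred_states.append(a)
        pred_covs.append(P)
    return np.array(pred_states), np.array(pred_covs)

# Illustrative local linear trend model: state = [level, slope] (assumed example)
Z = np.array([[1.0, 0.0]])
T = np.array([[1.0, 1.0],
              [0.0, 1.0]])
R = np.eye(2)
H = np.array([[0.01]])        # hypothetical observation variance
Q = 0.01 * np.eye(2)          # hypothetical state variances

y = np.arange(10.0)           # toy series 0, 1, ..., 9
states, covs = kalman_filter(y, Z, H, T, R, Q,
                             a1=np.zeros(2), P1=1e6 * np.eye(2))
next_level, slope = states[-1]   # prediction for the first out-of-sample period
print(next_level, slope)
```

On this noiseless toy series the filter learns the level and slope after two observations, so the prediction for the next period continues the straight line. In practice the variances in `H` and `Q` are unknown and are estimated, typically by maximising the likelihood built from the prediction errors `v` and their variances `F`.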

3. THE LATENT RISK MODEL

The Latent Risk Time series model (LRT) was introduced by Bijleveld et al. (2008). The LRT model is a particular case of state space models. It has been developed in order to capture the idea of risk in road safety, an unobservable quantity which plays a very important role in accident analysis. Road safety is usually affected by two factors: the risk and the exposure of individuals to that risk. This approach was first developed by Oppe (1989, 1991). The decomposition implies that, in order to analyze issues related to road safety, one must be able to measure both quantities. While exposure can be measured using several different indicators, measuring the risk is not easy. The cornerstone assumption is that traffic safety is the product of the respective developments of exposure and risk (Bijleveld, 2008): typically, exposure can be measured by traffic volume, while the number of fatalities (or casualties in general) is the product of exposure and (fatal) risk, which is unobservable. The stochastic model also adds errors to these relationships, i.e. traffic volume is a proxy of exposure and not a full observation of it, while the product of exposure and risk does not fully determine the fatalities.

Typically one works with logarithms, for three reasons. Firstly, road safety quantities are positive numbers, and modelling their logarithms keeps fitted and forecasted values positive. Secondly, taking logarithms turns the product of exposure and risk into a linear relationship on the logarithmic scale, which is a more realistic assumption. Thirdly, this makes the models easier to fit to real data. The LRT model developed in Bijleveld (2008) contains two measurement equations: one for traffic volume and one for fatalities. In fact, the model simultaneously fits two dependent variables (traffic volume and fatalities).
To each of these measurement equations correspond two state equations. For traffic volume the measurement equation is

$$
\log TV_t = \log E_t + \varepsilon_t^{e} \qquad (3.1)
$$

while the state equations are

$$
\log E_{t+1} = \log E_t + \nu_t^{e} + \xi_t^{e}, \qquad \nu_{t+1}^{e} = \nu_t^{e} + \zeta_t^{e}. \qquad (3.2)
$$

For the fatalities, the measurement equation is

$$
\log F_t = \log E_t + \log R_t + \varepsilon_t^{f} \qquad (3.3)
$$

while the state equations are

$$
\log R_{t+1} = \log R_t + \nu_t^{r} + \xi_t^{r}, \qquad \nu_{t+1}^{r} = \nu_t^{r} + \zeta_t^{r}, \qquad (3.4)
$$

where $TV_t$ is the traffic volume at time $t$, $E_t$ is the exposure variable at time $t$, $F_t$ is the number of fatalities at time $t$, and $R_t$ is the risk at time $t$, which is not observed, i.e. it is latent. The terms $\nu_t^{e}$ and $\nu_t^{r}$ are the slopes of the exposure and risk trends, and $\varepsilon$, $\xi$ and $\zeta$ denote error terms.

Several extensions of this basic model can be considered by allowing additional explanatory variables to be present, including dummy variables, usually with respect to interventions. Also, note that in the models above we assume normal distributions for the errors, which keeps the models inside the normal family. Estimation is not straightforward due to the recursive way in which the model is defined, and Kalman filters are of special importance for such models. The LRT makes it possible to consider all the important aspects of road safety together. Risk is latent and is quantified via this model. The errors are considered to be normally distributed, which implies that the two dependent variables are normally distributed on the logarithmic scale. For details about estimation, prediction and other statistical properties we refer the interested reader to Bijleveld et al. (2008).
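As a hedged illustration of how the LRT structure fits the state space form of section 2, the sketch below writes the model with a four-dimensional state on the log scale and produces point forecasts by iterating the state equation without noise. It assumes that both log exposure and log risk follow a trend with a level and a slope component, as suggested by the two state equations per measurement equation. The state ordering, the function name `lrt_point_forecast` and the numerical values are hypothetical and are not estimates from the report.

```python
import numpy as np

# State vector on the log scale (ordering chosen here for illustration):
# [log exposure, exposure slope, log risk, risk slope]
TREND = np.array([[1.0, 1.0],    # level_{t+1} = level_t + slope_t
                  [0.0, 1.0]])   # slope_{t+1} = slope_t
T = np.block([[TREND, np.zeros((2, 2))],
              [np.zeros((2, 2)), TREND]])
Z = np.array([[1.0, 0.0, 0.0, 0.0],    # log traffic volume = log exposure
              [1.0, 0.0, 1.0, 0.0]])   # log fatalities = log exposure + log risk

def lrt_point_forecast(last_state, horizon):
    """Iterate the noise-free state equation and back-transform with exp.
    Returns an array of shape (horizon, 2): [traffic volume, fatalities]."""
    state = np.asarray(last_state, float)
    forecasts = []
    for _ in range(horizon):
        state = T @ state
        forecasts.append(np.exp(Z @ state))
    return np.array(forecasts)

# Hypothetical final state: exposure 1000 growing about 1% a year,
# risk 0.5 falling about 5% a year (illustrative numbers only)
last_state = [np.log(1000.0), 0.01, np.log(0.5), -0.05]
fc = lrt_point_forecast(last_state, horizon=3)
print(fc)
```

Because the back-transformation uses `exp` on a forecast that is normally distributed on the log scale, the values returned here correspond to medians rather than means on the original scale. The forecast intervals shown in the report additionally require the state covariance from the Kalman filter, which this sketch leaves out.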

4. RESULTS FROM AGGREGATE MODELS

In this section we present the results of the aggregate forecasts for Flanders. Annual observed data from 1991 to 2007 were used. The road safety indicators considered are the number of fatalities, the total number of casualties, the number of severely injured persons and the number of slightly injured persons. The official Flemish casualty data were obtained from the FOD Economie. As exposure we used the total number of kilometres travelled in that period, in millions (Federaal Planbureau). The LRT model described in section 3 was fitted, thereby jointly modelling exposure on the one hand and a road safety outcome indicator (e.g. fatalities) on the other. Note that we have used the same model for casualties and injuries, since the idea of risk is the same for these measurements of traffic safety.

Figure 1 presents the real data and the forecasted values. The vertical dotted line marks the year in which forecasting starts (2008 in this study). To its left the observed values are shown, while to its right we see the forecasted values (dots) based on the model, together with a 95% forecasting interval that presents the uncertainty around the forecast. The fitted model implies a linear forecast, but this applies to the logarithmic scale as described in section 3, so we can see some curvature in the predictions.

Figure 1. Forecasted values for the years 2008-2015 together with the observed values for the number of fatalities in Flanders (yearly data available for 1991-2007).
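The curvature of the forecasts follows directly from back-transforming a linear trend. As a simplified illustration (the symbols $\hat F$, $\nu$ and $h$ are introduced here for this purpose, and error terms are ignored), suppose that from the last observed year $T$ onwards the logarithm of the forecast changes by a constant slope $\nu$ per year:

$$
\log \hat F_{T+h} = \log \hat F_T + h\,\nu
\quad\Longrightarrow\quad
\hat F_{T+h} = \hat F_T \, e^{h\nu}.
$$

For $\nu < 0$ this is a constant proportional decrease per year rather than a constant absolute decrease, so the forecast path flattens out on the original scale. The fatality forecasts in Table 1 behave in this way: going from 508 in 2008 to 360 in 2015 corresponds to $e^{\nu} = (360/508)^{1/7} \approx 0.952$, i.e. a decrease of roughly 5% per year.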

Figure 1 shows a downward trend in the number of fatalities, resulting in a forecasted number of 360 fatalities by 2015. As expected, a longer forecasting horizon implies a larger uncertainty. Note that the interpretation of the figure is the same for all figures presented in this section.

Figure 2 deals with the number of casualties (i.e. the sum of fatalities, severe injuries and slight injuries). The uncertainty is much larger now. There is again a downward trend, yet it is weaker than the one for fatalities. An explanation for this is that the slight injuries, which make up the largest part of the casualties, are not expected to decrease much (see the right panel in Figure 3).

Figure 2. Forecasted values for the years 2008-2015 together with the observed values for the total number of casualties in Flanders (yearly data available for 1991-2007).

Figure 3 presents the forecasts for the severe injuries (left) and the slight injuries (right). Severe injuries are forecasted to decrease to 3690 by 2015. For slight injuries the variability is very large, and the overall trend in 1991-2007 is not decreasing but rather stable (keeping in mind the large fluctuations that were present). Thus the forecasted values are quite close to the 2007 level and are not expected to decrease much. As already mentioned, this has an effect on the overall number of casualties, as slight injuries are the largest contributor to this number.

Figure 3. Forecasted values for the years 2008-2015 together with the observed values for the number of severely injured persons (left panel) and slightly injured persons (right panel) in Flanders (yearly data available for 1991-2007).

Table 1 summarizes the forecasts for all four measurements of traffic safety. Regarding casualties we present two forecasts: one using the aggregated data (column (1)) and one in which each component is forecasted separately and the components are then summed to obtain a forecast for casualties (column (5)). The differences are rather small: the largest relative difference is about 1.1%, in 2015, which implies a rather good correspondence between both forecasts. The forecasts in column (1) are the ones shown in Figure 2, as they allow for a better estimation of the standard errors. To conclude, Table 1 clearly shows that all road safety outcomes are expected to decrease in the following years.

| Year | Casualties (1) | Fatalities (2) | Severe injuries (3) | Slight injuries (4) | Casualties from separate components (5) = (2)+(3)+(4) | Difference (5)-(1) | Relative difference |
|------|----------------|----------------|---------------------|---------------------|--------------------------------------------------------|--------------------|---------------------|
| 2008 | 42439 | 508 | 4443 | 37499 | 42450 | 11 | 0.02% |
| 2009 | 42159 | 483 | 4326 | 37417 | 42226 | 68 | 0.16% |
| 2010 | 41880 | 460 | 4213 | 37334 | 42008 | 127 | 0.30% |
| 2011 | 41604 | 438 | 4103 | 37252 | 41793 | 189 | 0.45% |
| 2012 | 41329 | 417 | 3996 | 37169 | 41582 | 254 | 0.61% |
| 2013 | 41056 | 397 | 3891 | 37087 | 41376 | 320 | 0.78% |
| 2014 | 40784 | 378 | 3789 | 37006 | 41173 | 389 | 0.95% |
| 2015 | 40515 | 360 | 3690 | 36924 | 40974 | 459 | 1.13% |

Table 1. Forecasts for the different road safety outcomes. Column (5) predicts the casualties as the sum of the forecasted number of fatalities, severe and slight injuries. The difference from the direct forecast is negligible.
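The comparison between the direct casualty forecast and the sum of the separately forecasted components can be reproduced with a few lines of arithmetic. The following Python sketch uses the rounded values printed in Table 1; the dictionary and function names are chosen here for illustration only.

```python
# Rounded forecasts from Table 1:
# year -> (casualties, fatalities, severe injuries, slight injuries)
table1 = {
    2008: (42439, 508, 4443, 37499),
    2009: (42159, 483, 4326, 37417),
    2010: (41880, 460, 4213, 37334),
    2011: (41604, 438, 4103, 37252),
    2012: (41329, 417, 3996, 37169),
    2013: (41056, 397, 3891, 37087),
    2014: (40784, 378, 3789, 37006),
    2015: (40515, 360, 3690, 36924),
}

def compare(row):
    """Return (sum of components, difference, relative difference in %)."""
    direct, fatalities, severe, slight = row
    summed = fatalities + severe + slight
    diff = summed - direct
    return summed, diff, 100.0 * diff / direct

for year, row in table1.items():
    summed, diff, rel = compare(row)
    print(f"{year}: sum={summed}  diff={diff}  rel={rel:.2f}%")
```

Recomputing from the rounded table entries can differ from the printed columns by one unit in some years (for example 2009), presumably because the published differences were computed from unrounded forecasts.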

As far as exposure is concerned, all four models provide forecasts for the total kilometres travelled (in millions). They are presented in Figure 4. In Appendix A2 we also depict the uncertainty around the four forecasts, which clearly shows that the forecasts mostly agree; the small differences observed are due to the model and to the uncertainty of the different variables used in the LRT model. Similar analyses are reported for other variables in Appendices A3 and A4. The values of the forecasts for all the models can be read from Table 2.

Figure 4. Forecasted values of exposure (in millions of total kilometres travelled) for the years 2008-2015 together with the observed values in Flanders (yearly data available for 1991-2007). Each of the four predictions corresponds to the traffic safety variable used in the LRT model.

Forecast of exposure (millions of kilometres travelled) based on the model for:

| Year | Casualties | Fatalities | Severe injuries | Slight injuries |
|------|------------|------------|-----------------|-----------------|
| 2008 | 57131 | 57347 | 57094 | 57301 |
| 2009 | 57759 | 58163 | 57606 | 58037 |
| 2010 | 58393 | 58991 | 58122 | 58782 |
| 2011 | 59034 | 59830 | 58644 | 59537 |
| 2012 | 59683 | 60682 | 59169 | 60301 |
| 2013 | 60338 | 61545 | 59700 | 61076 |
| 2014 | 61001 | 62421 | 60235 | 61860 |
| 2015 | 61671 | 63310 | 60775 | 62654 |

Table 2. Forecasts for the exposure variable, i.e. the total kilometres travelled in millions. We obtained four forecasts, one from each model depending on the traffic safety variable used.

Summarizing so far, the fatalities are forecasted to fall to 360 by 2015. The severe injuries are also expected to decrease, but for slight injuries the decrease is expected to be very small. Since we use data up to 2007 [2] to predict the period from 2008 up to 2015, the data for 2008-2010 (which became available in the meantime) can be used to comment on the prediction accuracy of the developed model. The comparison is shown in Table 3.

| Year | Slight injuries, observed | Slight injuries, forecasted | Severe injuries, observed | Severe injuries, forecasted | Fatalities, observed | Fatalities, forecasted |
|------|---------------------------|-----------------------------|---------------------------|-----------------------------|----------------------|------------------------|
| 2008 | 36655 | 37499 | 4418 | 4443 | 495 | 508 |
| 2009 | 34927 | 37417 | 4269 | 4326 | 479 | 483 |
| 2010 | 34134 | 37334 | 3879 | 4213 | 436 | 460 |

Table 3. Forecasts based on the developed model and the real observed values for slight injuries, severe injuries and fatalities (2008-2010).

One can see that while for 2008 the forecasted values are close to the real figures, the forecasts become less accurate as time passes. Recall that in almost all cases the observed values are within the 95% forecast intervals, i.e. taking into account the uncertainty, the model does not fail to forecast. However, as time passes, the observed values move closer to the lower limit of the forecast intervals. Slight injuries are forecasted worst by the model, while fatalities are forecasted more reasonably.

The target of at most 250 fatalities and at most 2000 severely injured persons by the year 2015 does not seem to be supported by the model. For the number of fatalities, 250 is still inside the 95% forecast interval but very close to its boundary, while for the number of severe injuries the value of 2000 lies outside the interval. Hence, the model indicates that the targets will be hard to meet by 2015. Appendix A1 shows a small comparison with other, simpler models. The findings are similar.

[2] 2007 was the most recent year for which detailed data were available at the start of the analyses.
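The statement that accuracy deteriorates with the horizon can be quantified from Table 3 by computing the percentage forecast error for each outcome and year. The Python sketch below does this with the values printed in the table; the variable and function names are illustrative.

```python
# Observed and forecasted values from Table 3: outcome -> {year: (observed, forecasted)}
table3 = {
    "slight injuries": {2008: (36655, 37499), 2009: (34927, 37417), 2010: (34134, 37334)},
    "severe injuries": {2008: (4418, 4443), 2009: (4269, 4326), 2010: (3879, 4213)},
    "fatalities":      {2008: (495, 508), 2009: (479, 483), 2010: (436, 460)},
}

def percentage_error(observed, forecasted):
    """Forecast error relative to the observed value, in percent."""
    return 100.0 * (forecasted - observed) / observed

errors = {
    outcome: {year: percentage_error(*pair) for year, pair in by_year.items()}
    for outcome, by_year in table3.items()
}

for outcome, by_year in errors.items():
    line = ", ".join(f"{year}: {err:+.1f}%" for year, err in by_year.items())
    print(f"{outcome:16s} {line}")
```

Every error is positive, i.e. the model over-predicts in all three years, and for each outcome the error is largest in 2010. This matches the observation in the text that the observed values drift towards the lower limit of the forecast intervals.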

5. RESULTS FROM DISAGGREGATE MODELS

Disaggregate models are tools for assessing different policy options, setting goals for safety programmes and predicting future safety developments at the disaggregated level. This makes their development of particular importance for better understanding the problem, but also for policy and decision making purposes. While disaggregate models can suffer from lack of data, in our case quite accurate and detailed data exist for certain subcategories, and thus we present such an analysis in this section. We primarily focus on the fatalities, as for this variable the data are more accurate and detailed. However, Appendix A5 presents the results of the disaggregated analyses using the (larger) number of casualties.

Note that there are two issues that tend to limit the scope for disaggregation. The first is that the numbers (of e.g. fatalities) in each group are typically much smaller than the overall number, which leads to increased variability. Consequently, it is more difficult to identify trends and hence the uncertainty on predictions is larger. This limits the level of disaggregation that can be used. The second issue relates to the availability of exposure measures, which may be available for the whole population but not for each group separately.

In this section, we present the results of applying the LRT model to the following subgroups:

- Age classes split in 4 categories (ages 0-18, 19-45, 46-64, 65+).
- Type of road user (cars, trucks, small vans and motorcycles).
- Type of road (motorways and non-motorways).

We have fitted a separate LRT model to each subgroup. Details follow when describing each subgroup.

5.1 By Road user type

We worked with 4 categories of road user, namely cars, small vans, motorcycles and trucks.
Data were available for other categories, such as buses, but the numbers of fatalities were too small to build any meaningful model. Recall that in the disaggregate analyses we primarily focus on the fatalities, as we aim to identify the subgroups with a large share in the forecasted number of fatalities or with a high (or even increased) fatal risk in the future. As exposure variable for the 4 categories we used data on the number of kilometres travelled by each mode (Federaal Planbureau). Road user types for which no (good) exposure data were available (such as pedestrians) were not considered for analysis.

Results from the model with respect to fatalities are reported in Figure 5. One can notice the wide confidence intervals, implying that the uncertainty around the forecasts is large, perhaps large enough to call the forecasts themselves into question. The overall trend is decreasing. There is a clear downward trend for cars, motorcycles and trucks, and a smaller one for small vans. However, the large uncertainty prevents clear conclusions from being drawn for all the road user types, and thus any result should be interpreted with care. Note also that the available data cover a shorter period than the aggregate data, namely only 1997 to 2007.

Figure 5. Forecasted fatalities and corresponding 95% intervals for different road user types. The available data cover the period 1997-2007.

Table 4 contains the forecasted values. The last column but one is the sum of the values for the 4 user types. This sum is smaller than the number of fatalities forecasted in section 4, both because data are missing for some accidents (an inhomogeneous class named "other", which is not used in the forecasting) and because some road users were excluded due to the non-availability of reliable exposure data.

Year              Car      Small van   Motorcycle   Truck    Total     Change from 2007
2007 (observed)   253      27          57           12       349
2008              220.47   18.00       57.50        6.45     302.42    -13.35%
2009              200.77   16.85       55.47        5.84     278.93    -20.08%
2010              182.82   15.77       53.52        5.29     257.40    -26.25%
2011              166.48   14.76       51.64        4.79     237.67    -31.90%
2012              151.60   13.82       49.83        4.33     219.58    -37.08%
2013              138.05   12.93       48.08        3.92     202.98    -41.84%
2014              125.71   12.11       46.39        3.55     187.76    -46.20%
2015              114.48   11.33       44.76        3.22     173.79    -50.20%

Proportion of each user type in the total
2007              72.49%   7.74%       16.33%       3.44%
2015              65.87%   6.52%       25.76%       1.85%

Table 4. Forecasts for the number of fatalities for different types of road users. Forecasts are derived from the LRT model covering the period 1997-2007.

The last column presents the percentage decrease from 2007. The models forecast a large decrease, reaching about 50% by 2015. At the bottom of the table we have also calculated the share of each of the four road user types in the total. Interestingly, while car fatalities will decrease, a large increase in the share of motorcycle fatalities is expected (from 16.3% in 2007 up to 25.8% in 2015). Also note that the overall decrease in motorcycle fatalities is the smallest.

Finally, forecasts for the traffic volumes can be read from Table 5 (in millions of vehicle kilometres). The general trend is increasing for all modes. It is interesting, however, to note that after a small decrease, the model forecasts an increase of up to 4.8% for 2015 (compared to 2007). The corresponding graphs can be found in Appendix A3.
Year    Car      Small van   Motorcycle   Truck    Total    Change since 2007
2007    43616    5977        671          5729     55993
2008    43028    6072        664          5592     55356    -1.14%
2009    43147    6303        676          5670     55796    -0.35%
2010    43266    6544        689          5749     56248    0.46%
2011    43385    6793        701          5828     56707    1.28%
2012    43505    7052        714          5909     57180    2.12%
2013    43625    7321        727          5991     57664    2.98%
2014    43745    7600        740          6074     58159    3.87%
2015    43866    7890        753          6158     58667    4.78%

Table 5. Forecasts for the traffic volumes (in millions of vehicle kilometres) for 4 categories of road user type.
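Tables 4 and 5 can be combined into a rough fatality rate per distance travelled. The sketch below is an illustration only: the definition "fatalities per billion vehicle kilometres" and the function name are assumptions made here, not the latent risk estimated by the LRT model, and the 2015 figures are the point forecasts from the tables, without their uncertainty.

```python
# Values copied from Table 4 (fatalities) and Table 5 (millions of vehicle km).
FATALITIES = {
    2007: {"car": 253, "small van": 27, "motorcycle": 57, "truck": 12},
    2015: {"car": 114.48, "small van": 11.33, "motorcycle": 44.76, "truck": 3.22},
}
VOLUME_MILLION_KM = {
    2007: {"car": 43616, "small van": 5977, "motorcycle": 671, "truck": 5729},
    2015: {"car": 43866, "small van": 7890, "motorcycle": 753, "truck": 6158},
}


def risk_per_billion_km(year, mode):
    """Fatalities per billion vehicle kilometres (illustrative definition)."""
    # Volumes are in millions of km, so multiply by 1000 to get "per billion".
    return FATALITIES[year][mode] / VOLUME_MILLION_KM[year][mode] * 1000.0


if __name__ == "__main__":
    for mode in FATALITIES[2007]:
        r07 = risk_per_billion_km(2007, mode)
        r15 = risk_per_billion_km(2015, mode)
        print(f"{mode:<10} 2007: {r07:7.2f}   2015 (forecast): {r15:7.2f}")
```

Because the 2015 inputs are point forecasts with wide intervals, the resulting rates should be read with the same caution as the tables themselves.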

5.2 By Age category

For age categories, one issue is how to split the age span into smaller categories. We need to choose a set of age ranges that on the one hand constitute relatively homogeneous groups but on the other hand also result in adequate fatality numbers per group. An implicit assumption is that demographic changes do not play an essential role over the forecast horizon. Such an assumption, while reasonable for prediction up to 2015, is perhaps not valid for a longer horizon, so any interpretation is conditional on it. Note that since population data are used as an exposure proxy (data are obtained from SVR Vlaanderen), the model forecasts the population as well, based on the past data.

In this report we make use of 4 categories based on the following reasoning:

- Ages 0-18 represent young road users.
- Ages 19-45 represent the active population.
- Ages 46-65 represent maturity.
- Ages 65+ refer to retired persons with perhaps increasing limitations to road usage.

This rather broad categorization also helps to avoid very small numbers of fatalities per category, which may create large problems during the estimation. One problem with this kind of analysis is the lack of detailed exposure data per age category over time.

Figure 6 depicts the forecasts up to 2015. For all age categories we see a decrease in the expected number of fatalities. Note the increased variability in the last age category, due to the relatively small numbers but also to large fluctuations in the data.

Figure 6 Forecasted values for the years 2008-2015 together with the observed values for fatalities for different age categories (yearly data available for 1991-2007)

Figure 7. Forecasted fatalities for all age groups.

Figure 7 presents the observed and predicted evolution in the number of fatalities for all age categories in one plot, to allow for comparison. Clearly, the age group 19-45 is the one with the most fatalities but also the one with a larger decrease. This leads to an estimated downward trend. An interesting comparison also appears between the age groups 46-65 and 65+. It seems that ages 65+ are forecasted to have more fatalities than ages 46-65 in the future (of course note the uncertainty around the forecasts). The trend for this age group is larger than that of the age group 46-65. The forecasted numbers can also be read in Table 6. The summed values are smaller than the fatalities forecasted when the entire population is considered, as some cases are missing (no age was recorded).

        Age
Year    0-18     19-45     46-65    65+       Total     Change since 2007
2007    34       283       110      95        522.00
2008    23.95    235.97    95.48    103.52    458.92    -12.08%
2009    20.70    218.12    91.08    100.78    430.68    -17.49%
2010    17.89    201.63    86.88    98.10     404.50    -22.51%
2011    15.47    186.38    82.87    95.50     380.22    -27.16%
2012    13.37    172.28    79.04    92.96     357.65    -31.48%
2013    11.55    159.25    75.39    90.50     336.69    -35.50%
2014    9.99     147.21    71.92    88.09     317.21    -39.23%
2015    8.63     136.08    68.6     85.76     299.07    -42.71%

Proportion of each age group in the total
2007    6.51%    54.21%    21.07%   18.20%
2015    2.89%    45.50%    22.94%   28.68%

Table 6 Forecasts for the number of fatalities in the period 2008-2015 for different age categories. The total is less than the total number of forecasted fatalities because for some records the age category is missing; these were not considered in the estimation.

From Table 6 we can deduce the expected decrease in the number of fatalities (already shown in the aggregate results). It is worth mentioning that the share of fatalities in the age group 65+ is expected to increase from 18.2% in 2007 to 28.7% in 2015, partially due to the increasing age of the population as captured by the LRT model. Moreover, this proportion is expected to decrease for the most vulnerable age categories (0-45).

5.3 By Road type

The third categorization used refers to the road type. Two subgroups were distinguished: motorways and non-motorways. The latter class consists of local roads and regional roads (these two types are considered together because the distinction between them differs for the periods before and after 2003). Data on the traffic volume for each category were available for the period 1991-2007 (Federaal Planbureau). Fatalities are used as the road safety indicator. For both road types there is a decreasing trend, which is larger for non-motorways than for motorways, as can be seen in Figures 8 and 9.
However, the variability in the number of fatalities on motorways is much larger because the number of fatalities is smaller, which leads to rather uncertain forecasts. Table 7 presents the forecasts together with the forecasts for the traffic volumes.

Figure 8. Forecasted values for the years 2008-2015 together with the observed values for the number of fatalities on motorways (yearly data available for 1991-2007)

Figure 9. Forecasted values for the years 2008-2015 together with the observed values for the number of fatalities on non-motorways (yearly data available for 1991-2007)

The increase in traffic volume on motorways is forecasted to be much larger than that on non-motorways, as can be seen in Table 7.

                  Traffic volume                                       Fatalities
Year              Motorways  Non-motorways  Total   Change since 2007  Motorways  Non-motorways  Total  Change since 2007
2007 (observed)   22045      34583          56628                      75         452            527
2008              21719      34346          56065   -0.99%             53         396            449    -14.80%
2009              22107      34442          56549   -0.14%             49         370            419    -20.49%
2010              22501      34537          57038   0.72%              45         345            390    -26.00%
2011              22903      34633          57536   1.60%              41         322            363    -31.12%
2012              23312      34729          58041   2.50%              38         301            339    -35.67%
2013              23728      34826          58554   3.40%              35         281            316    -40.04%
2014              24151      34922          59073   4.32%              32         262            294    -44.21%
2015              24582      35019          59601   5.25%              29         245            274    -48.01%

Table 7. Forecasts for the traffic volume and the number of fatalities in the period 2008-2015 for different road types

Finally, note that we forecast a decrease in the proportion of fatalities on motorways from 14.2% in 2007 down to 10.5% in 2015, while the share of traffic on motorways is expected to increase from 38.9% to 41.2%.
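The motorway shares quoted above can be recomputed from the entries of Table 7. The following is a minimal sketch with hypothetical names (`TABLE_7`, `motorway_share`); because the table reports rounded forecasts, a recomputed share may differ from the quoted one in the last digit.

```python
# Observed 2007 values and 2015 forecasts, copied from Table 7.
TABLE_7 = {
    2007: {"vol_motorway": 22045, "vol_other": 34583,
           "fat_motorway": 75, "fat_other": 452},
    2015: {"vol_motorway": 24582, "vol_other": 35019,
           "fat_motorway": 29, "fat_other": 245},
}


def motorway_share(row, prefix):
    """Motorway share (in %) of traffic volume ("vol") or fatalities ("fat")."""
    motorway = row[f"{prefix}_motorway"]
    other = row[f"{prefix}_other"]
    return 100.0 * motorway / (motorway + other)


if __name__ == "__main__":
    for year, row in TABLE_7.items():
        fat = motorway_share(row, "fat")
        vol = motorway_share(row, "vol")
        print(f"{year}: fatalities {fat:.2f}%  traffic {vol:.2f}%")
```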

6. SOME STATISTICAL CONSIDERATIONS

In the previous sections we presented the results from fitting the LRT model in several different cases. In this section, we comment on some statistical issues with respect to the forecasting.

The models were fitted using R and the library dlm (see Petris, 2009, 2011; Petris and Petrone, 2011). The library maximizes the likelihood of the model numerically. Several initial values were used and the convergence message of the routine was checked to ensure convergence. Due to the complicated nature of the model and the small sample size, a few problems occurred. Given the large number of initial values used, we are confident that we found the global maximum.

Since the logarithm of the dependent variable was fitted, the derived forecasting intervals needed to be transformed back to the original scale. We applied a single transformation to the derived intervals, which should have very few implications.

For all available series we fitted the LRT model using an exposure variable and the observed data. We also ran some other models, such as the local linear trend model, which does not use any exposure measure. In almost all cases, as expected, the LRT model was superior, so we present only the results from this model. This also enables comparison, as the same model was applied for all the presented forecasts.

We ran some goodness-of-fit tests based on the residuals. The sample size limits this assessment: for some series only 11 data points were available, which is too few to obtain a clear picture. We therefore do not discuss goodness of fit here, since the sample size is too small to lead to a solid conclusion.

For some series, especially in the disaggregate cases (section 5), the observed values were small.
Although the fitted LRT model assumes normality, the discreteness of the data and the small values mean that the normality assumption might not have been the most appropriate one. On the other hand, predictions are based merely on the expected mean, which provides reasonable estimates. The main problem may lie in the variability and the symmetry assumed for the forecasting intervals, so the range of the forecasting intervals in such cases should be interpreted with caution.

For all models fitted, the number of available observations was rather small. Working with yearly data we had at most 16 observations available to fit the model. Given the implied latent structure of the model, the derived forecasts have rather large standard errors and hence wide confidence intervals.
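The back-transformation of a log-scale forecasting interval discussed in this section can be illustrated with a minimal sketch. It assumes a normal forecast on the log scale with a given mean and standard deviation; the function name and the numbers are hypothetical and are not taken from the report or from the dlm library. Exponentiating the endpoints gives an interval that is asymmetric around the point value on the original scale.

```python
import math


def back_transform(log_mean, log_sd, z=1.96):
    """Map a normal forecast interval on the log scale to the original scale.

    Returns (lower, point, upper), where point = exp(log_mean).
    """
    lower = math.exp(log_mean - z * log_sd)
    point = math.exp(log_mean)
    upper = math.exp(log_mean + z * log_sd)
    return lower, point, upper


if __name__ == "__main__":
    # Hypothetical log-scale forecast (not a value from the report).
    lower, point, upper = back_transform(math.log(300.0), 0.25)
    print(f"lower={lower:.1f} point={point:.1f} upper={upper:.1f}")
    # The upper arm is longer than the lower arm on the original scale.
    print(f"upper arm={upper - point:.1f} lower arm={point - lower:.1f}")
```

Note that exponentiating the log-scale mean yields the median of the implied log-normal distribution rather than its mean, which is one reason such back-transformed intervals deserve careful reading.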

7. CONCLUSIONS

The Latent Risk Time series model was applied to annual traffic safety data from Flanders covering the period 1991-2007 (1997-2007 for some series). The aim was to forecast for the period 2008-2015 and in particular to focus on subgroups defined by age, road user type and road type. The results show a clear downward trend in the number of fatalities to be expected up to 2015. However, because the available time series are short, the forecasts come with rather broad forecasting intervals, and they should be used and interpreted with care. It is important to point out that our discussion is based on the point estimates and not on the intervals themselves, as these are too wide to allow for meaningful comment. Nevertheless, intervals are needed as the exact number can never be forecasted; there is always some level of uncertainty present.

Forecasting was based on the assumption of a similar situation in the future, without additional interventions (business-as-usual). However, this is a very strong assumption that is hard to test and that many people may not believe, especially in today's economic environment. For example, our model forecasts exposure based on historical data; taking the financial crisis into account would probably have some impact on the forecasts (but this requires detailed investigation). Hence, the findings should be used cautiously in this respect.

Also note that we have used annual data. This does not allow studying the temporal fluctuations of the phenomenon in detail and allows for only broad forecasts. More detailed (e.g. monthly) data could yield more detailed forecasts but would require additional data efforts and possibly more complicated models. The same holds for exogenous variables, none of which have been considered in the forecasting models.
However, we believe that exogenous variables, while they could provide interesting insight, could also cause problems in prediction: their own values would have to be forecasted well to allow for realistic forecasts, which may increase the variability of the forecast and lead to instability of the model fitting procedure.

Based on the disaggregate analysis performed for various subgroups (age categories, road user types and road types), we forecast as a general trend a decrease in the number of fatalities in each subgroup. However, within each grouping we expect a change in the proportion of fatalities, namely:

- The share of motorcycle fatalities will increase within the next years.
- The share of fatalities in the age group 65+ will increase within the next years.
- The share of fatalities on motorways is expected to decrease, despite the increase of traffic, while the share of non-motorway fatalities is expected to increase.

Also, the model itself assumes linear relationships (in our case the linearity applies to the log of the fatalities and exposure). While this assumption sounds plausible from literature and theory, it is hard to check. So the model usually extends a linear trend on the logarithmic scale into the future.

Finally, since exposure data were not available for some series in the disaggregate analysis, proxies were used, or particular categories (such as pedestrians) could not be considered. For such series more refined forecasts are possible if more accurate exposure data become available.
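To make the log-linear extrapolation concrete, the following sketch takes the car fatality forecasts from Table 4 and computes the year-on-year ratios. Under a linear trend on the log scale these ratios should be roughly constant. The helper names are hypothetical, and the calculation is an illustrative check on the published numbers, not part of the authors' fitting procedure.

```python
import math

# Car fatality forecasts for 2008-2015, copied from Table 4.
CAR_FORECASTS = [220.47, 200.77, 182.82, 166.48, 151.60, 138.05, 125.71, 114.48]


def yearly_ratios(values):
    """Ratio of each forecast to the forecast of the previous year."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


def log_slope(values):
    """Average yearly change of log(values) between first and last entry."""
    return (math.log(values[-1]) - math.log(values[0])) / (len(values) - 1)


if __name__ == "__main__":
    ratios = yearly_ratios(CAR_FORECASTS)
    slope = log_slope(CAR_FORECASTS)
    print("year-on-year ratios:", [round(r, 4) for r in ratios])
    print("implied yearly change (%):", round(100 * (math.exp(slope) - 1), 2))
```

A near-constant ratio corresponds to a fixed percentage decrease per year, which is what a straight line on the logarithmic scale implies once it is transformed back.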

8. REFERENCES

1. Bijleveld, F., Commandeur, J., Gould, P. and Koopman, S.J. (2008). Model based measurement of latent risk in time series with applications. Journal of the Royal Statistical Society, Series A, 171, 265-277.
2. Bijleveld, F. (2008). Time series analysis in road safety research using state space methods. PhD Thesis, SWOV.
3. Chatfield, C. (2003). The Analysis of Time-series: An Introduction, 6th edition. CRC Press.
4. Commandeur, J.J.F. and Koopman, S.J. (2007). An introduction to state space time series analysis. Oxford: Oxford University Press.
5. Commandeur, J.J.F., Koopman, S.J. and Ooms, M. (2011). Statistical Software for State Space Methods. Journal of Statistical Software, Vol. 41, Issue 1.
6. COST 329 (2004). Models for traffic and safety development and interventions (Final report of the action EUR 20913). Brussels, Belgium: Directorate General for Transport, European Commission.
7. Durbin, J. and Koopman, S.J. (2001). Time series analysis by state space methods. Oxford: Oxford University Press.
8. Elvik, R. (2010). The stability of long-term trends in the number of traffic fatalities in a sample of highly motorized countries. Accident Analysis and Prevention, 42, 245-260.
9. FOD Economie (2011). Verkeersongevallen- en slachtofferdata in Vlaanderen.
10. Federaal Planbureau - Transportdatabanken, http://www.plan.be/databases/Databases.php?lang=nl&tm=27&is=60, accessed September 2011.
11. SVR Vlaanderen cijfers Demografie. http://www4dar.vlaanderen.be/sites/svr/Cijfers/Pages/Excel.aspx, accessed September 2011.
12. Gould, P.G., Bijleveld, F.D. and Commandeur, J.J.F. (2004). Forecasting road crashes: a comparison of state space models. Paper presented at the 24th International Symposium on Forecasting, 4-7 July 2004, Sydney, Australia.
13. Harvey, A.C. and Shephard, N. (1993). Structural Time Series Models. Handbook of Statistics, 11, 261-302.
14. Harvey, A.C., Koopman, D. and Shephard, N. (2004). State space and unobserved component models: theory and applications. Oxford University Press.
15. Hermans, E., Wets, G. and Van den Bossche, F. (2006a). Describing the Evolution in the Number of Highway Deaths by Decomposition in Exposure, Accident Risk and Fatality Risk. Transportation Research Record, 1950, 1-8.
16. Hermans, E., Wets, G. and Van den Bossche, F. (2006b). The Frequency and Severity of Belgian Road Traffic Accidents studied by State Space Methods. Journal of Transportation and Statistics, 9, 63-76.
17. Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35-45.
18. Lassarre, S. (2001). Analysis of progress in road safety in ten European countries. Accident Analysis and Prevention, 33, 743-751.
19. OECD (1997). Road safety principles and models: review of descriptive, predictive, risk and accident consequence models (OECD/GD(97)153). Paris, France: OCDE-OECD.
20. Oppe, S. (1989). Macroscopic models for traffic and traffic safety. Accident Analysis and Prevention, 21(3), 225-232.
21. Oppe, S. (1991). The development of traffic and traffic safety in six developed countries. Accident Analysis and Prevention, 23(5), 401-412.
22. Petris, G. (2009). Dynamic Linear Models with R. Springer.
23. Petris, G. and Petrone, S. (2011). State Space Models in R. Journal of Statistical Software, 41.
24. Petris, G. (2011). An R package for Dynamic Linear Models. Journal of Statistical Software, 36.
25. WHO (2009). World report on road traffic injury prevention. World Health Organization.
26. Zeger, S.L. (1988). A regression model for time series of counts. Biometrika, 75, 621-629.

9. APPENDIX

A1: Comparison with other models

In order to get an idea of the value of the LRT model, we also made predictions with other models, mainly the Local Linear Trend (LLT) model (see e.g. Petris and Petrone, 2011). It models the fatalities only (no exposure is used) based on a simple trend model. The aggregate results of the two models can be seen in Figure A1. There are only minor differences between the forecasts provided by the two models. The LLT forecasts lie inside the 95% forecast interval (the shaded area) of the LRT model, which implies that the differences are not significant. In general, it is hard to select one model over the other, especially given the small sample size and the different nature of the data. Some small simulation comparisons revealed that the LRT forecasts are better, which is why only the results from this model are presented in the main text. However, Figure A1 is informative about the dynamics of the two models: the LLT model follows the linear trend of the data, while the LRT model also takes information from the exposure into account and adjusts the trend accordingly. It is also important to note that the similarity of the two models implies that the additional latent variable (risk) might not be needed, as the variability can be explained by a simple linear trend model.

Figure A1. Prediction for the number of fatalities based on the LRT model (black dots + 95% forecasting interval) and the LLT model that ignores the exposure (red triangles).
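For reference, the two model classes compared in this appendix can be written in their usual textbook form. This is a sketch under stated assumptions: the symbols (F_t for fatalities, E_t for exposure, superscripts E and R for the exposure and risk components) are chosen here for illustration, both models are assumed to be applied to logarithms, and the formulation follows the standard state space literature listed in the references rather than being copied from the model section of this report.

```latex
% Local Linear Trend (LLT) model: fatalities only, no exposure
\begin{aligned}
\log F_t   &= \mu_t + \varepsilon_t,      & \varepsilon_t &\sim N(0, \sigma^2_{\varepsilon}), \\
\mu_{t+1}  &= \mu_t + \nu_t + \xi_t,      & \xi_t         &\sim N(0, \sigma^2_{\xi}), \\
\nu_{t+1}  &= \nu_t + \zeta_t,            & \zeta_t       &\sim N(0, \sigma^2_{\zeta}).
\end{aligned}

% Latent Risk Time series (LRT) model: exposure and risk as latent components
\begin{aligned}
\log E_t &= \mu^{E}_t + \varepsilon^{E}_t, \\
\log F_t &= \mu^{E}_t + \mu^{R}_t + \varepsilon^{F}_t, \\
\mu^{k}_{t+1} &= \mu^{k}_t + \nu^{k}_t + \xi^{k}_t, \qquad
\nu^{k}_{t+1} = \nu^{k}_t + \zeta^{k}_t, \qquad k \in \{E, R\}.
\end{aligned}
```

In this form the LRT model reduces to an LLT-type trend for the log fatalities when the exposure component adds little information, which is consistent with the similarity of the two sets of forecasts noted above.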

A2: Forecasts for Traffic Volume

Figure A2 Forecast for the total kilometres travelled based on the different aggregate models. The forecasts are very close together but the width of the interval is different, implying a different uncertainty. (Note that the uncertainty in the fatalities is much larger since we have smaller numbers.)

A3: Forecasts for Traffic Volume per road user type

Figure A3 Forecasts for the total kilometres travelled based on different road user types.

A4: Forecasts for Traffic Volume per road type

Figure A4 Forecasts for the total kilometres travelled based on different road types.

A5: Disaggregated models for the number of casualties

In a similar manner as in section 5, we have fitted models for disaggregated data based on the total number of casualties. Section 5 dealt with the number of fatalities. The number of casualties is typically much larger and provides a more general picture. The number of casualties is obtained by summing the number of fatalities, severely injured and slightly injured persons. The exposure variables used for the models are the same as in section 5 for all types of analysis.

By road user

The forecasts are interesting since they show some differences from Figure 5. Namely, while the fatalities for cars are expected to decrease (see Figure 5), the casualties are forecasted to increase. The same is true for small vans and motorcycles. For trucks, there is a decreasing forecast for both fatalities and casualties.

Figure A.5.1 Forecasted casualties for the years 2008-2015 and corresponding 95% intervals for different road user types. The available data cover the period 1997-2007. (Panels: Cars, Small Vans, Motorcycles, Trucks; horizontal axis: year; vertical axis: Casualties.)

By age group

Again, we can see some differences between the forecasts for casualties and those for fatalities. In particular, casualties are expected to increase for age groups 46-65 and 65+, while fatalities (see Figure 6) were forecasted to decrease.

Figure A.5.2 Forecasted casualties for the years 2008-2015 and corresponding 95% intervals for different age categories (yearly data available for 1991-2007). (Panels: Ages 0-18, Ages 19-45, Ages 46-65, Ages 65+; horizontal axis: year; vertical axis: Casualties.)

By road type

For motorways the number of casualties is expected to remain almost at the same level, while fatalities were expected to decrease. For non-motorways, things are very similar, expecting a decrease in both fatalities and casualties.

Figure A.5.3 Forecasted casualties for the years 2008-2015 and corresponding 95% interval for the number of casualties on motorways (yearly data available for 1991-2007). (Horizontal axis: year; vertical axis: Casualties.)

Figure A.5.4 Forecasted casualties for the years 2008-2015 and corresponding 95% interval for the number of casualties on non-motorways (yearly data available for 1991-2007). (Horizontal axis: year; vertical axis: Casualties.)