Proceedngs of the World Congress on Engneerng 2012 Vol I Decson Tree Model for Count Data Yap Bee Wah, Norashkn Nasaruddn, Wong Shaw Voon and Mohamad Alas Lazm Abstract The Posson Regresson and Negatve Bnomal Regresson models are the conventonal statstcal models for count data. Ths paper presents usng decson tree to model motorcycle accdent occurrences and compared ts classfcaton performance wth Posson Regresson and Negatve Bnomal Regresson model. The frequency of motorcycle accdents that nvolve death or serous njury based were converted nto a categorcal dependent varable (zero, low and hgh frequency) and the factors consdered are collson types, road geometry, tme, weather condton, road surface condton and type of days. Based on classfcaton accuracy, results show that the decson tree model usng CART (Classfcaton and Regresson Tree) slghtly performs better (78.1%) than Posson Regresson (76.3%) wth Negatve Bnomal Regresson (77.6%) models. The CART decson rules revealed that the most sgnfcant factor contrbutng to hgh frequency of motorcycle accdents that result n death or serous njury s when the accdents happen on a straght road, juncton or bend. Index Terms count data, decson tree, motorcycle accdents, Posson regresson, Negatve Bnomal regresson I. INTRODUCTION oad accdent occurrences are one of the major ssues n R the news. In 2002, t s reported that more than 1.2 mllon people ded due to road traffc accdents and t s ranked as the eleventh top causes of death n the world [1]. Furthermore, n 2004 t s reported that traffc accdents s the top three leadng causes of deaths for people aged between 5 and 44 years old [2]. Accordng to MIROS (Malaysan Insttute of Road Safety Research) the hghest number of regstered vehcles n Malaysa s motorcycles. For the year 2009, the total number of regstered motorcycles n Malaysa was estmated at 8,940,230, where 113,962 nvolved n accdents and about 4070 deaths (nclude motorcyclsts and pllon rders) [3]. It s much more dangerous to rde a motorcycle than to drve an automoble n terms of njury or death when an accdent occurs. Therefore, t s no surprse that the percentage of fataltes that nvolved motorcyclsts and the pllon rders are hgh. MIROS also reported that n terms of fataltes by mode of Manuscrpt receved Mar 21, 2012; revsed Apr 12, 2012. Ths work was supported n part by the Malaysan Fundamental Research Grant, FRGS/2/2010/SG/UITM/02/34 B. W. Yap. Faculty of Computer and Mathematcal Scences, UTM Shah Alam, 40450 Shah Alam, Selangor, Malaysa. (correspondng author: phone: +603-5543-5461; fax: +603-5543-5501; e-mal: beewah@ tmsk.utm.edu.my). N. Nasaruddn. Faculty of Computer and Mathematcal Scences, UTM Shah Alam, 40450 Shah Alam, Selangor, Malaysa. (e-mal: see_keen86@yahoo.com). S. V. Wong. Department of Mechancal and Manufacturng Engneerng, Unversty Putra Malaysa, 43400 UPM, Serdang, Malaysa. (e-mal: wongsv@eng.upm.edu.my). M. A. Lazm. Faculty of Computer and Mathematcal Scences, UTM Shah Alam, 40450 Shah Alam, Selangor, Malaysa. (e-mal: dralas@tmsk.utm.edu.my). transport, motorcycle fataltes ranked the hghest as shown n Table 1, the motorcycle fataltes (58.21%), (59.72%) and (60.30%) for 2002, 2008 and 2009 respectvely [4]. TABLE I FATALITIES BY MODE OF TRANSPORT % change Road User over 2002 % 2008 % 2009 % Prevous 2002 year Pedestran 650 11.03 598 9.16 589 8.73-1.51-9.38 Motorcycle 3429 58.21 3898 59.72 4067 60.30 4.34 18.61 Bcycle 261 4.43 203 3.11 224 3.32 10.34-14.18 Car 1023 17.37 1335 20.45 1405 20.83 5.24 37.34 Van 156 2.65 96 1.47 91 1.35-5.21-41.67 Bus 45 0.76 48 0.75 31 0.46-35.42-31.11 Lorry 197 3.34 195 2.99 213 3.16 9.23 8.12 4Wheel 74 1.26 106 1.62 78 1.16-26.42 5.41 Other 56 0.95 48 0.74 47 0.7-2.08-16.07 Total 5891 100 6527 100 6745 100 3.34 14.50 (Source: Road facts, Retreved from: http://www.mros.gov.my/web/guest/road) The general defnton of the term road accdent n Malaysa s An occurrence on the publc or prvate roads due to the neglgence or omsson by any party concerned (on the aspect of road users conduct, mantenance of vehcle and road condton) or due to envronmental factor (excludng natural dsaster) resultng n a collson (ncludng out of control cases and collsons or vctms n a vehcle aganst object nsde or outsde the vehcle e.g. bus passenger) whch nvolved at least a movng vehcle whereby damage or njury s caused to any person, property, vehcle, structure or anmal and s recorded by the polce. Accordng to the types of road accdents, the descrpton of the term fatal road accdent s road accdents n whch one or more person were klled wthn 30 days from the date of event, serous njury road accdent s A road accdent n whch at least a person sustaned serous njury but none klled, mnor njury road accdent s A road accdent n whch one or more person were njured but not klled or serously njured and non-njury road accdent s A road accdent n whch no person was klled or njured [5], [6]. Most studes nvolvng cross-sectonal count data use the conventonal Posson Regresson and Negatve Bnomal Regresson models. Ths paper focus on usng decson tree to model motorcycle accdent frequency and compared t wth the conventonal Posson Regresson and Negatve Bnomal Regresson model. Ths paper s organzed as follows. Secton 2 provdes a revew on prevous studes on traffc accdents. The methodology and technques are explaned n Secton 3 whle the descrpton of the data s gven n Secton 4. The results for decson trees, Posson and Negatve Bnomal Regresson model are then presented and compared n Secton 5. Fnally the concluson and some recommendatons for future work are gven n Secton 6. ISSN: 2078-0958 (Prnt); ISSN: 2078-0966 (Onlne)
Proceedngs of the World Congress on Engneerng 2012 Vol I II. LITERATURE REVIEW Prevously, numerous studes have addressed varous aspects of traffc accdents, ncludng accdents nvolvng motorcycles. These global or natonal studes focus manly on predctng the crtcal factors that nfluence the occurrence of accdents. However, the results of each study vary dependng on the model and the varables consdered n the study. Snce road accdents data are count data, most of the prevous studes focus on studyng the effect of some varables that nfluence accdents occurrence. For nstance, [7] studed on the effect of the daytme runnng headlght nterventon (RHL) to mprove the conspcuty-related motorcycle accdents n Malaysa. The varables that are taken nto account to nvestgate the effectveness of the daytme runnng headlght (RHL) and the regulatons ntroduced to reduce the conspcuty-related motorcycle accdents are the nfluence of tme trends, changes n recordng and analyss systems, the effect of fastng durng Muslm holy month of Ramadhan and the balk kampung (returnng to hometown) culture durng the annual festval celebratons. Ther study have found conspcuty-related motorcycle accdents decreased by 29% when the daytme runnng headlght (RHL) was ntroduced. Reference [8] studed the level of njury of motorcycle rders when an accdent occurred at T-junctons n the UK for dfferent collson types whch are rarely conducted. The collson types consdered n ths study are head-on, sdeswpe, rearend, approach-turn and angle. The predctor varables ncluded n ths study are human (gender and age of rder), weather (fne, bad or other), road (juncton control, lght condton, speed lmt), month (sprng, summer, autumn, wnter), tme (mdnght, early mornngs, rush hours, nonrush hours and evenng), day of week and vehcle (engne sze, crash partners). The results of ther study ndcate that the level of njury of motorcycle rders depends on dfferent collson types and are assocated wth all ndependent varables n dfferent ways. For example, njures tend to be greatest at the uncontrolled juncton due to sdeswpe crashes and rear-end B collsons type. However, for those who were nvolved n approach-turn A at sgnalsed junctons, the level of njures wll be much hgher. In addton, md-nght or early mornng s most sgnfcantly assocated wth more severe njures; except for rear-end collson type A / B whle crossng wthout lghts causes serous njures to the motorcyclst n head-on collsons compared to motorcyclst nvolved n other types of collsons. The contrbutng factors of average hourly traffc flow, number of lanes, crashes nvolvng one or more than two vehcles, crash types, weekend/weekdays and nght or day on accdent frequences were nvestgated by [9]. They found that the ncdence rates are more lkely to ncrease as the traffc flow (vehcles per hour) ncreases. For lght traffc travel, the number of crashes s hgher on three-lane than on two-lanes and hgher at weekends. In heavy traffc travel, the number of crashes s hgher at weekdays. The number of deaths caused by accdents s hgher at nght even though the occurrences of accdence at nght are lower as compared wth day. Almost all of the data on road traffc accdents are count data. Conventonal models (Posson or negatve bnomal regresson model) have long been used to analyze accdent frequency [9-12]. However, for modelng ths type of data, Posson and negatve bnomal model does not take nto account the fact when there are many observed zero n accdent data. For data wth many observed zeros [13] and [14] used extended conventonal models usng Zero Inflated Posson or Zero Inflated Negatve Bnomal models. It s found that, conventonal models usng Zero-Inflated model are much better n dealng wth accdent data when there are many observed zeroes. However, the models dscussed above are not broadly used for accdent data n Malaysa. When we have a combned tme seres and cross sectonal data (also known as panel data), the approprate count model are Fxed Effects Posson, Fxed Effects Negatve Bnomal, Random Effects Posson, Random Effects Negatve Bnomal and Dynamc Panel model [15]. The Fxed Effects Negatve Bnomal model was used on a panel count data of 25 countres from 1970 to 1999 and results showed that the mplementaton of road safety regulaton, mprovement n the qualty of poltcal nsttutons and medcal care as well as technology developments have contrbuted to reduce motorcycle deaths [16]. Reference [17] presented an analyss on road accdent occurrence usng panel data analyss approach. The Fxed Effect Posson and Negatve Bnomal model were used to analyze the accdent data on 14 states n Malaysa for the perod of 1996 to 2007. Among the factors consdered n ths study are the monthly regstered vehcle n the state, the amount of ranfall, the number of rany day, tme trend and the monthly effect of seasonalty. Ther results ndcate that the road accdent occurrence are postvely assocated wth the ncrease n the number of regstered vehcle, the amount of ran and tme whle accordng to seasonalty, the accdent occurrence s hgher n the month of October, November and December. Recently, data mnng technques has been used to model motorcycle accdents and other data related to accdents. Data mnng appears as a useful tool to analyze and nterpret a large amount of data and maxmum nformaton can be gathered. Data mnng technques were appled to study the relatonshp between road characterstcs and accdent severty n Ethopa usng decson tree, naïve Bayes and K- nearest neghbor classfers. The results of ther studes ndcate that the performances of the three models are almost equal [18]. Reference [19] conducted logstc regresson, CART and multvarate adaptve regresson splnes (MARS) to analyze accdent data. Ther study ndcated that CART and MARS are attractve models snce these two models can dsplay the results n graphcal manner and able to determne the group of people who are of hgh rsk to be nvolved n motorcycle accdents. The two models were then compared and the results showed that the MARS model s the best n provdng nformaton of potental rsky areas accordng to age and number of years drvng experence [19]. Reference [20] also used adaptve regresson trees to analyze the real data obtaned from Adds Ababa cty traffc offce. Ther study focused on predctng the njury severty levels based on drver s age and gender, age and type of vehcle, road, lght and weather condton as well as type and cause of accdent. Ther results show that the decson tree model accurately classfes the njury severty levels wth 87.47% predctve accuracy whch s ISSN: 2078-0958 (Prnt); ISSN: 2078-0966 (Onlne)
Proceedngs of the World Congress on Engneerng 2012 Vol I reasonably hgh. Smlarly, Classfcaton and Regresson Tree (CART) one of the most wdely appled data mnng technques and negatve bnomal model were used to analyze the accdent data n Tawan for the year 2001-2002. CART was found to perform better n analyzng the freeway accdent frequences. CART s dfferent from the Posson and Negatve Bnomal regresson model snce t does not requre any predefned assumpton about the data [21]. III. METHODOLOGY Ths secton explans n detal the modelng technques used. The target varable s frequences of motorcycle accdent. For regresson count model, the target varable s of count data type whle for decson tree model, the target wll be coded nto three categores: 0=zero frequency, 1=low frequency (1-19) and 2=hgh frequency (20 and above). The sample data set was randomly splt to create the tranng and testng samples. The number of observatons for tranng sample s 896 observatons (70% of the total observatons) whle testng sample s 384 observatons (30% of the total observatons). SPSS 16.0 were used to buld a Posson and Negatve Bnomal regresson model whle SPSS Clementne 12.0 (now known as IBM SPSS Modeler) was used to buld the decson tree model. A. Posson Regresson The benchmark model for count data s the Posson Regresson model. Prevously, researchers analyzed count data by usng ordnary lnear regresson. Posson regresson has the advantage of beng precsely talored for dscrete dependent varable whch s hghly-postvely skewed. The Posson regresson model s approprate for target varable that have only non-negatve nteger values such as motorcycle accdent frequences. Besdes, the data y s assumed to be ndependent and follows a Posson dstrbuton. An unusual property of the Posson dstrbuton s that the mean and varance are equal: E ( y) var( y) Let the dependent varable (y) be motorcycle accdent frequences whch s drawn from a Posson dstrbuton wth condtonal mean of, gven vector X for case. Thus the densty functon of y can be expressed as; f ( Y X y ), for y 0,1,2.. (1) y! Where exp( ' X ). In order to develop a Posson regresson model, s expressed as a functon of some explanatory varables through a log lnk functon n the followng form; ' ln X or exp( ' X ) (2) Gven the ndependent observatons assumpton, wth densty functon (1), the regresson parameters β s estmated usng the maxmum lkelhood method. B. Negatve Bnomal Regresson In certan stuatons, overdsperson may arse and Posson regresson model s then not approprate. Overdsperson arses when the observed varance of Y s greater than the mean. The most common parametrc model for overdsperson s Negatve Bnomal whch ntroduced a dsperson parameter to accommodate for unobserved heterogenety n count data. Ths model s a generalzaton of Posson Regresson whch assumes that the condtonal mean of Y s not only determned by X but also a heterogenety component of unrelated to X. The formulaton can be expressed as: ' ' ˆ exp( X ) exp( X ) exp( ) Where exp( ) ~ Gamma (, ). As a result, the densty functon of Y can be derved as ( Y ) f ( Y X ).( ).( ) ( Y 1) ( ) 1 The Negatve Bnomal dstrbuton s derved as a gamma mxture of Posson random varables wth condtonal mean and varance of E ( y ) exp( ' x x ) and 2 Var ( y x ). Note that when 0, the model becomes the Posson regresson model. Thus, Negatve Bnomal model has greater flexblty n modelng the relatonshp between the expected value and varance ofy. The smaller the α, the closer the Negatve Bnomal approaches the Posson model [22-23]. C. Decson Tree A decson tree model conssts of a set of rules for dvdng a large collecton of observatons nto smaller homogeneous group wth respect to a partcular target varable. The target varable s usually categorcal and the decson tree model s used ether to calculate the probablty that a gven record belongs to each of the target category, or to classfy the record by assgnng t to the most lkely category. Decson tree can also be used for contnuous target varable although multple lnear regresson models are more sutable for such varable. Gven a target varable and a set of explanatory varables, decson algorthms automatcally determne whch varables are most mportant, and subsequently sort the observatons nto the correct output category [24]. The common decson tree algorthms n data mnng software are CHAID (Ch-Square Automatc Interacton Detector), CART (Classfcaton and Regresson tree) and C5. The splttng crtera for CART, C5 and CHAID are gn, entropy and ch-square test respectvely. These algorthms wll produce the tree-lke structure dagram and the decson rules whereby mportant nformaton can be extracted [25]. Fg. 1 llustrates the constructon of the decson tree model. The data partton node n SPSS Clementne 12 (now known as IBM SPSS Modeler) was not used because the data was ntally parttoned usng SPSS 16. Three algorthms were used to obtan the decson trees whch are Y ISSN: 2078-0958 (Prnt); ISSN: 2078-0966 (Onlne)
Proceedngs of the World Congress on Engneerng 2012 Vol I CART, CHAID and C5.0. From these generated models, the best decson tree wll be selected usng the analyss and evaluaton node. Fg. 1. Process Flow Dagram Frstly, the decson trees (CART, CHAID and C5.0) nodes were connected to the Source node whch contans the Tranng sample. The decson tree models for Tranng sample were then assessed and compared usng the Analyss and Evaluaton node. Subsequently, the CART, CHAID and C5.0 model nodes were connected to the Source node whch contans the Testng (or valdaton) sample. The performance of the decson tree model for testng sample were then assessed and compared usng the Analyss and Evaluaton node. The best decson tree s then compared wth the conventonal models (Posson and Negatve Bnomal Regresson model). IV. THE DATA The motorcycle accdent data were obtaned from MIROS (Malaysan Insttute of Road Safety Research) whle MIROS obtaned the accdent data from the polce department for the generaton and dssemnaton of road safety data and nformaton. The scope of ths study only covers motorcycle accdents occurrences n Malaysa and the data provded contaned the motorcycle accdents data that nvolved deaths or serous njury for the year 2008 and 2009. The data obtaned from MIROS were messy and some varables have outlers and mssng values. Hence, data cleanng, whch nvolves checkng completeness of data records, mssng values, removng for errors were frst performed. At ths stage, the outlers, redundancy and cases wth mssng values are removed. Next, the accdent data was reorganzed based on the sx ndependent varables and the target varable n ths study whch s the frequences of motorcycle accdents. The fnal data set for modelng conssts of 1280 observatons. The varables ncluded n ths study are presented n Table 2. ISSN: 2078-0958 (Prnt); ISSN: 2078-0966 (Onlne) Varable Name Motorcycle Accdent Collson type(ct) Road geometry(rg) TABLE II DESCRIPTION OF VARIABLES Role Varable Type Descrpton Target Count/Categorcal Frequences of motorcycle accdents / Category of Frequences of motorcycle accdents 1: Zero frequency (0) 2: Low frequency (1-19) 3: Hgh frequency (20 and above) Input Categorcal Collson Type 1: Head on 2: Rear 3: Rght Angle Sde 4: Angular 5: Sdeswpe 6: Httng 7: Out of control 8: Forced/Overturned Input Categorcal Road geometry where motorcycle accdent occur: 1: Straght 2: Bend 3: Roundabout 4: Juncton 5: Interchanges Tme (T) Input Categorcal Tme motorcycle accdent occur: 1: Mdnght/early mornng (0000-0559) 2: Rush hours (0600-0859; 1600-1759) 3: Non-rush hours (0900-1559) 4: Evenng (1800-2359) Weather Input Categorcal Weather condton when motorcycle accdent occur: 1: Clear 2: Not Clear (wndy, foggy and ran) Road surface Input Categorcal Road surface condton condton(rsd) where motorcycle accdent occur: 1: Dry 2: Not Dry (flood, wet, oly, sandy and reconstructon work) Days (WDAYS) Input Categorcal Days where motorcycle accdent occur: 1: Weekdays 2: Weekends V. RESULTS In ths secton, the results of statstcal modelng usng Posson, Negatve Bnomal Regresson and decson trees model are presented. A. Posson and Negatve Bnomal Regresson Results The range of motorcycle accdent frequences for the tranng sample s from 0 to 418, wth mean 10.28 and standard devaton of 39.081. The devance for Poson regresson model s 5.414, far from the value of 1, ndcatng overdsperson problem..the devance value for Negatve Bnomal regresson s 0.779 whch s much closer to 1. Besdes that, the lkelhood rato ch-square s 915.915 wth p<0.05 ndcatng that the negatve bnomal regresson model s sgnfcant. Thus, at least one ndependent varable s a sgnfcant predctor of frequency of accdents. Based on Wald ch-square tests, all the sx varables (collson type, road surface condton, tme, weather, road geometry and
Proceedngs of the World Congress on Engneerng 2012 Vol I type of days) are sgnfcant snce all or at least one category for each varable s sgnfcant (p<0.05). In Table 3, the value of AIC and SIC for Negatve Bnomal regresson model s lower than Posson regresson model. Hence, the Negatve Bnomal model demonstrates a better ft than the Posson model. TABLE III ESTIMATED COEFFICIENTS OF POISSON AND NEGATIVE BINOMIAL REGRESSION MODELS Varable Posson Regresson Model Negatve Bnomal Regresson Model Intercept -9.417* -7.508* Collson type = head on 3.396* 3.360* Collson type = rear 3.310* 2.950* Collson type = rght angle 3.088* 2.946* sde Collson type = angular 3.886* 3.742* Collson type = sdeswpe 2.902* 2.457* Collson type = httng 2.114* 1.851* Collson type = out of control 3.013* 3.364* Road geometry = straght 5.322* 5.319* Road geometry = bend 3.877* 3.945* Road geometry = roundabout 0.868* 0.740* Road geometry = juncton 4.509* 4.077* Tme = mdnght/early -1.267* -1.207* mornng Tme = rush hours -0.295* -0.223 Tme = non rush hours -0.008-0.337* Weather 2.784* 1.824* Road surface condton=dry 2.577* 1.511* Weekdays 0.859* 0.869* Log Lkelhood -3071.72-1512.13 No. of Parameters 17 17 AIC 6179.45 3062.26 BIC 6265.81 3153.42 * Sgnfcant at α = 0.05 Hence, the estmated Negatve Bnomal regresson model s wrtten as follow: Log µ = - 7.508 + 3.360CT1 + 2.950CT2 + 2.946CT3 + 3.742CT4 + 2.457CT5 + 1.851CT6 + 3.364CT7+ 5.319RG1 + 3.945RG2 + 0.740RG3 + 4.077RG4-1.207T1-0.223T2-0.337T3 + 1.824CW + 1.511RSD + 0.869WDAYS B. Decson Tree Results There are three types of decson tree algorthms used n ths study, the CART, CHAID and C5.0. All three decson trees found that the sx varables (collson types, road geometry, tme, weather, road surface condton and type of days) are sgnfcant. Results show that the most mportant varables s road geometry. The three decson trees were then compared based on the accuracy rate. Table 4 summarzes the accuracy rate for the three decson tree models appled on the tranng and testng sample. TABLE IV SUMMARY OF DECISION TREE RESULTS Model Accuracy Rate (%) CART Tranng 83.37 Testng 78.12 CHAID Tranng 80.25 Testng 74.74 C5.0 Tranng 82.59 Testng 78.65 The accuracy rate n Table 4 shows that CART has the best accuracy rate for the tranng sample compared to other models whle C5.0 has the best accuracy rate for testng sample. However, the rules for C5.0 are complcated and the termnal nodes have very small number of cases. Overfttng refers to the stuaton n whch the nducton algorthm generates a classfer whch perfectly fts the tranng data but has the lost capablty of generalzng to nstances not presented durng tranng [26]. Despte the slght overfttng problem, the CART model s stll acceptable. The decson tree model s too large to be dsplayed. Hence, only the nterpretatons of the CART rules for hgh frequency of motorcycle accdents that nvolve death or serous njury are presented n Table 5. TABLE V CART RULES FOR HIGH FREQUENCY OF ACCIDENTS 1. The motorcycle accdents happened on straght road, when the weather s clear, the road surface condton s dry and the accdent nvolve head on/rear/rght angle sde/angular/sdeswpe/out of control collson. 2. The motorcycle accdents happened at bend/juncton road, when the weather s clear, the road surface condton s dry and the accdent nvolve head on/angular collson. 3. The motorcycle accdents happened at bend/juncton road, when the weather s clear, the road surface condton s dry, the accdent nvolve rear/rght angle sde/sdeswpe/out of control collson and the accdents occurred n the weekdays C. Model Comparson The comparsons of Posson, Negatve Bnomal Regresson and CART model are summarzed n Table 6. TABLE VI. SUMMARY OF POISSON, NEGATIVE BINOMIAL AND CART RESULTS Model Accuracy Rate (%) Posson Tranng 73.60 Testng 76.30 Negatve Bnomal Tranng 75.28 Testng 77.60 CART Tranng 83.37 Testng 78.12 Based on the accuracy rate for Posson and Negatve Bnomal Regresson model, the predcton accuracy for testng sample are hgher than the tranng sample. Ths s known as underfttng. Undefttng may occur when we mstakenly exclude mportant varables [27]. From the results presented n Table 6, CART was chosen as the best predctve model n predctng the category of occurrences of motorcycle accdents snce t gave the best accuracy rate for tranng and testng sample even though there s slght overfttng. VI. CONCLUSION AND RECOMMENDATIONS Results of ths study show that decson tree can be used to model motorcycle accdent frequences. The classfcaton performance of decson tree model s qute comparable wth conventonal statstcal models and the rules are easy to nterpret. The results of ths study also provde valuable nformaton on how the collson types, road geometry, tme, weather condtons, road surface condtons and type of days are related wth motorcycle accdent occurrences. The most mportant varable n predctng the occurrence of ISSN: 2078-0958 (Prnt); ISSN: 2078-0966 (Onlne)
Proceedngs of the World Congress on Engneerng 2012 Vol I motorcycle accdent s road geometry wth straght road contrbutng to the hghest number of accdents that nvolve deaths or serous njury. Ths could be due to the possblty that when the road s straght, the rders would tend to rde at hgh speed and when an accdent occurs, the stuaton wll be worse or fatal. Further exploraton of our results found that accdents that nvolved deaths or serous njury occur more often when the weather s clear and road surface condton s dry. Other than that, the motorcycle accdent that nvolved deaths and serous njury s more lkely to happen durng evenng and durng weekdays. More data needs to be collected to confrm these fndngs. Some research lmtatons arse n ths study snce polce report data was analyzed. Besdes, even though there are a lot of ndependent varables avalable n the database, the database contans a lot of mssng values, unrelable data and some mportant varables cannot be further nvestgated for nstances speed of motorcycle, helmet and alcohol use as well as cause of an accdent. In future studes, more explanatory varables that mght be avalable from other sources n Malaysa should be consdered. ACKNOWLEDGMENT The authors would lke to express ther grattude to Malaysan Insttute of Road Safety Research (MIROS) for ther contrbutons by provdng the data. We thank the Malaysa Mnstry of Hgher Educaton (MOHE) for the fundng of ths research. We also hghly apprecate the comments from revewers. REFERENCES [1] World Report on Road Traffc Injury Preventon. World Health Organzaton. (2004). Avalable: http://whqlbdoc.who.nt/publcatons/2004/9241562609.pdf (Accessed on 12 Aprl 2011) [2] Global Status Report on Road Safety: Tme for Acton. World Health Organzaton. (2009). Avalable: http://www.un.org/ar/roadsafety/pdf/roadsafetyreport.pdf (Accessed on 12 Aprl 2011) [3] Status Paper on Road Safety (2009) Malaysa. Avalable: http://www.unescap.org/ttdw/common/meetngs/tis/egm- Roadsafety 2010/CountryStatus2009/13.Malaysa.pdf (Accessed on 12 Aprl 2011) [4] Fataltes by Mode of Transport. Avalable: http://www.mros.gov.my/web/guest/road (Accessed on 12 Aprl 2011) [5] Royal Malaysan Polce, Traffc Dvson, Bukt Aman, Kuala Lumpur (2009). Statstcal Report Road Accdent Malaysa. [6] Royal Malaysan Polce, Traffc Dvson, Bukt Aman Kuala Lumpur (2010). Statstcal Report Road Accdent Malaysa. [7] Radn, U. R. S., Mackay, M. G. and Hlls, B. L., Modellng of conspcuty-related motorcycle accdents n Seremban and Shah Alam, Malaysa, Accdent Analyss and Preventon, vol. 28, no. 3, pp. 325-332, 1996. [8] Pa, C. W. and Saleh, W., Modellng motorcyclst njury severty by varous crash types at T-junctons n the UK, Safety Scence, vol. 46, no. 8, pp. 1234-1247, 2008. [9] Martn, J. L., Relatonshp between crash rate and hourly traffc flow on nterurban motorways, Accdent Analyss and Preventon, vol. 34, no. 5, pp. 619-629, 2002. [10] Abdel-Aty, M. A. and Radwan, A. E., Modelng traffc accdent occurrence and nvolvement, Accdent Analyss and Preventon, vol. 32, no. 5, pp. 633-642, 2000. [11] Teoh, E. R. and Campbell, M., Role of motorcycle type n fatal motorcycle crashes, Journal of Safety Research, vol. 41, no. 6, pp. 507-512, 2010. [12] Indrastut, A. K. and Sulsto, H., Motorcycle accdent model for severty level and collson type, Internatonal Journal of Academc Research, vol. 2, no. 5, pp. 210-215, 2010. [13] Lee, A. H., Stevenson, M. R., Wang, K. and Yau, K. K. W., Modelng young drver motor vehcle crashes: Data wth extra zeros, Accdent Analyss and Preventon, vol. 34, no. 4, pp. 515-521, 2002. [14] Lee, J. and Mannerng, F, Impact of roadsde features on the frequency and severty of run-off-roadway accdents: An emprcal analyss, Accdent Analyss and Preventon, vol. 34, no. 2, pp. 149-161, 2002. [15] Wan Yaacob, W. F., Lazm, M. A. and Yap, B. W., A Practcal Approach n Modellng Count Data, Proceedngs of the Regonal Conference on Statstcal Scences, 2010 (RCSS'10), pp. 176-183. [16] Law, T. H., Noland, R. B. and Evans, A. W., Factors assocated wth the relatonshp between motorcycle deaths and economc growth, Accdent Analyss and Preventon, vol. 41, no. 2, pp. 234-240, 2009. [17] Wan Yaacob, W. F., Lazm, M. A. and Yap, B. W., Applyng fxed effects panel count model to examne road accdent occurrence, Journal of Appled Scences, vol. 11, no. 7, pp. 1185-1191, 2011. [18] Beshah, T. and Hll, S., Mnng road traffc accdent data to mprove safety: Role of road-related factors on accdent severty n Ethopa, 2010. [19] Kuhnert, P. M., Do, K. A. and McClure, R., Combnng nonparametrc models wth logstc regresson: An applcaton to motor vehcle njury data, Computatonal Statstcs and Data Analyss, vol. 34, no. 3, pp. 371-386, 2000. [20] Tesema, T. B., Abraham, A. and Grosan, C., Rule mnng and classfcaton of road traffc accdents usng adaptve regresson trees, Internatonal Journal of Smulaton, vol. 10, no. 6, pp. 80-94, 2005. [21] Chang, L. Y. and Chen, W. C., Data mnng of tree-based models to analyze freeway accdent frequency, Journal of Safety Research, vol. 36, no. 4, pp. 365-375, 2005. [22] Cameron, A. C. and Trved, P. K., Regresson analyss of count data. Cambrdge Unversty Press, 1998. [23] Greene, W., Functonal forms for the negatve bnomal model for count data. Economcs Letters 99, 2008, pp. 585-590. [24] Olson, D. and Yong, S., Introducton to Busness Data Mnng. McGraw Hll Internatonal Edton, 2006. [25] Yap, B. W., Ismal, N.H. and Fong, S., Predctng Car Purchase Intent Usng Data Mnng Approach, IEEE Eghth Internatonal Conference on Fuzzy Systems and Knowledge Dscovery (FSKD) Proceedngs, 2011, pp. 2052-2057. [26] Rokach, L. and Mamon, O. Z., Data mnng wth decson trees: theory and applcatons. World Scentfc Publshng Co. Pte. Ltd, 2008, pp. 49. [27] Yan, Xn. and Xao, Gang. Su., Lnear regresson analyss: theory and computng. World Scentfc Publshng Co. Pte. Ltd, 2009, pp. 157. ISSN: 2078-0958 (Prnt); ISSN: 2078-0966 (Onlne)