Time-series regression models to study the short-term effects of environmental factors on health *

Aurelio Tobías and Marc Saez ψ
Departament d'Economia, Universitat de Girona
Girona, March 24

Abstract

Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that the potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression, especially those allowing for serial correlation, which makes them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range of Generalised Linear Models as well as Generalised Additive Models. In addition, critical points in using statistical software for GAM have recently been stressed, and reanalyses of time series data on air pollution and health have been performed in order to update results already published. The methods are illustrated through an example on the relationship between asthma emergency admissions and photochemical air pollutants in Madrid for the period 1995-1998.

Keywords: Time-series, Poisson, GLM, GAM, autocorrelation, overdispersion, air pollution.
JEL classification: C5, C53, Q5, Q54.

* We want to thank José Ramón Banegas, Iñaki Galán, Julio Díaz, María Antonia Barceló and Ricardo Ocaña for their comments and advice. We also acknowledge the following institutions for providing data: Red Palinológica de la Consejería de Sanidad de la Comunidad de Madrid, Departamento de Control de Contaminación Atmosférica del Ayuntamiento de Madrid, Subdirección General de Calidad Ambiental del Ministerio de Medio Ambiente, Programa Regional de Prevención y Control del Asma and Hospital Gregorio Marañón.
This study was funded by the Comisión Asesora del Programa Regional de Prevención y Control del Asma de la Comunidad de Madrid, and Aurelio Tobías held a postgraduate fellowship from the Universidad Autónoma de Madrid.
Address: Department of Statistics and Econometrics, Universidad Carlos III de Madrid, 28903-Getafe. E-mail: atobias@est-econ.uc3m.es
ψ Address: Departament d'Economia, Universitat de Girona, Campus de Montilivi, 17071 Girona. E-mail: marc.saez@udg.es
1. Introduction

In time series regression, dependent and independent variables are measured over time, and the aim is to model the possible relationship between them through regression methods. Examples of epidemiological time series studies are the studies of the relationship between mortality and air pollution (Katsouyanni et al. 1996, Ballester et al. 1999, Samet et al. 2000, Katsouyanni et al. 2002a), hospital admissions and air pollution (Katsouyanni et al. 1996, Touloumi et al. 2003), mortality from sudden infant death syndrome and environmental temperature (Campbell 1994) and atmospheric pressure (Campbell et al. 2001), or infectious gastrointestinal illness (Schwartz et al. 1997) and mortality (Braga et al. 2001) related to drinking water. Various methods have been used in these analyses, from linear (Hatzakis et al. 1986) to log-linear (Mackenbach et al. 1992) and Poisson regression models (Schwartz et al. 1996), and more recently generalised additive models (Schwartz 1994, Kelsall et al. 1997).

Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures. Typically, a single population is assessed with reference to the change over time in the rate of a health outcome and the corresponding changes in the exposure factors during the same period. Covariates that vary between subjects but not over time, for example sex, cannot confound the associations and therefore are not considered. Furthermore, covariates that may also vary within subjects, say smoking habit, but whose daily variation is unlikely to move in step with the exposure, can be excluded as confounders.

The problem is that the potential for confounding in time series regression is very high. It is important that seasonality and trends are properly accounted for. Many variables simply increase or decrease over time, and so will be correlated over time (Yule 1926). In addition, many other epidemiological variables are seasonal, and this variation would be present even if the factors were not causally related.
The fact that both the outcome and the predictor variable are seasonal is not enough to ascribe causality. For example, sudden infant deaths are higher in winter than in summer, but this does not imply that temperature is a causal factor; there are many other factors that might affect the result, such as reduced daylight or the presence of viruses. However, if an unexpectedly cold winter is associated with an increase in sudden infant deaths, or very cold days are consistently followed after a short time by rises in the daily sudden infant death rate, then causality may possibly be inferred (Campbell 1994).

This paper reviews the statistical models which have commonly been used in time series regression, especially those allowing for serial correlation, which makes them potentially useful for selected epidemiological purposes. The methods are illustrated with an example on the relationship between asthma emergency room admissions and photochemical air pollutants in Madrid (Spain) (Galán et al. 2003).
2. Regression model for counts

In the analysis of epidemiological time series data consisting of counts, the underlying mechanism is assumed to be a Poisson process with a homogeneous risk $\lambda_t$, i.e. the expected number of counts on day $t$, across the underlying population. The probability of $y_t$ occurrences on a given day is defined by

$$\mathrm{prob}(y_t \mid \lambda_t) = \frac{e^{-\lambda_t}\,\lambda_t^{y_t}}{y_t!} \tag{1}$$

The Poisson regression model assumes

$$E(y_t \mid x_t) = \exp\Big(\beta_0 + \sum_{i=1}^{n} \beta_i x_{it}\Big) \tag{2}$$

where $x_t$ is the column vector of independent variables on day $t$ with regression coefficients $\beta$, and $y_t$ is the dependent variable on day $t$. Equation (2) can also be formulated as a Generalised Linear Model (GLM) (McCullagh and Nelder 1989), with $\mu_t = E(y_t \mid x_t)$:

Link function
$$\log(\mu_t) = \beta_0 + \sum_{i=1}^{n} \beta_i x_{it} \tag{3}$$

Variance function
$$V(y_t) = \mu_t \tag{4}$$

The usefulness of Poisson regression in epidemiology is that it provides an estimate of the relative risk (RR) as $RR_i = \exp(\beta_i)$, where $\beta_i$ is the regression coefficient associated with a unit increment in a pollutant.
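As a minimal sketch of equation (1) and of the relative risk $\exp(\beta_i)$, the following Python fragment evaluates both quantities. The function names and the numerical values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
import math

def poisson_pmf(y, lam):
    """Probability of y events on a day whose expected count is lam (equation 1)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def relative_risk(beta, increment=1.0):
    """Relative risk under a log link for a given increment in exposure."""
    return math.exp(beta * increment)

# Hypothetical values, for illustration only.
expected_daily_count = 3.3
coefficient = 0.0003

print(poisson_pmf(0, expected_daily_count))        # chance of a day with no events
print(relative_risk(coefficient, increment=10.0))  # RR for a 10-unit rise in exposure
```

Because the link is logarithmic, the relative risk for an increment of several units is obtained by scaling the coefficient before exponentiating, not by multiplying the unit relative risk by the increment.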
3. Misspecification in time series regression

3.1. Autocorrelation

A basic assumption of any regression analysis is that observations must be independent and identically distributed, that is, $x_t$ and/or $y_t$ are not influenced by previous values, say for example $x_{t-1}$ and $y_{t-1}$, respectively. With time series data this assumption usually fails. When the dependent variable, $y_t$, is observed over time, usually all the independent variables, $x_t$, have a temporal structure. As a consequence, the observations of the response have a temporal dependence, probably due to misspecification, for instance omitted variables.

Figure 1 presents an example where a positively correlated influence causes positively autocorrelated residuals. The possible relationship between $x_t$ and $y_t$ is masked by a clear seasonal pattern in $y_t$. When this relationship is isolated there remains an autocorrelated structure for the residuals $e_t$. In fact, when confounding factors are correctly accounted for, the serial correlation of the residuals often disappears; they appear serially correlated because of the association with a time-dependent predictor variable, and so conditional on this variable the residuals are independent. This is particularly likely for mortality data, where, except in epidemics, the individual deaths are unrelated. If the model were correct, the residual autocorrelation should be minimal, since one death does not cause another. Thus residual autocorrelation may imply confounding of air pollution associations due to unmeasured or mismodelled variables. If the inclusion of known or potential confounders fails to remove the serial correlation of the residuals, then the estimation method does not provide valid estimates of the standard errors of the parameters (Campbell 1998).

For example, suppose the relationship between daily mortality and air pollutants is analysed without including the effects of trend, weather and unusual events. These variables are autocorrelated themselves and consequently the residuals will be dependent.
In the same way, the relationship between daily mortality and temperature presents the typical V-shape (Saez et al. 1995). Low environmental temperature implies high mortality, and very high temperature is also related to high mortality. Increasing temperature up to a certain point, however, reduces mortality. If the regression does not account for this fact, positive residuals will be followed by other positive residuals, and the same occurs with negative residuals.

Thus, in time series regression one can often use conventional regression methods followed by a check for the serial correlation of the residuals, and need only proceed further if there is clear evidence of a lack of independence.
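The check for serial correlation of the residuals described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the residual series is artificial, and the limit of 1.96 divided by the square root of n is the usual large-sample approximation for an independent series.

```python
import math

def autocorrelation(residuals, max_lag):
    """Sample autocorrelation of a residual series at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    if denom == 0.0:
        raise ValueError("residuals are constant; autocorrelation is undefined")
    return [
        sum(centred[t] * centred[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

def approximate_band(n):
    """Approximate 95% limits (+/-) for the autocorrelation of an independent series."""
    return 1.96 / math.sqrt(n)

# Artificial residuals with an obvious alternating pattern.
residuals = [1.0, -1.0] * 50
acf = autocorrelation(residuals, max_lag=3)
band = approximate_band(len(residuals))
flagged_lags = [k + 1 for k, r in enumerate(acf) if abs(r) > band]
print(acf, flagged_lags)
```

Lags whose autocorrelation falls outside the band are candidates for further modelling, either by adding omitted confounders or by one of the autoregressive formulations discussed in Section 4.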
[Figure 1: Inadequately removed trend causing positively autocorrelated errors. Panels: $y$ against $x$ with the fitted line $y = \beta_0 + \beta_1 x$ under a positively correlated influence; positively autocorrelated residuals $e$ over time.]

3.2. Overdispersion

A basic assumption underlying the use of log-linear regression for Poisson distributed data is that the variance of the residual distribution is completely determined by the mean. In practice, this assumption often fails. This is known as overdispersion.
In this case (4) could be replaced by

$$V(y_t) = \phi\,\mu_t \tag{5}$$

where $\phi$ is a scalar capturing the overdispersion (McCullagh and Nelder 1989).

4. Time series regression models for counts

4.1. Marginal and conditional models

A number of authors have distinguished marginal and conditional models (Fitzmaurice 1998). For a marginal model, $E(y_t) = f(x_t, x_{t-1}, \ldots, x_{t-\tau})$, where the $x$'s are external time-varying covariates. This is in contrast to a conditional model, in which $E(y_t) = f(x_t, x_{t-1}, \ldots, x_{t-\tau}, y_{t-1}, \ldots, y_{t-\upsilon})$ for lag orders $\tau$ and $\upsilon$, so that the past values of the dependent variable are included as new predictor variables. It has been argued that marginal models are rather artificial and give unlikely correlation structures. However, they are very useful for modelling mean rates in populations. On the other hand, conditional models are useful for modelling changes in individuals but are poor at determining relationships between the $y$ and $x$ variables, because the parameters are not readily interpretable (Staneck et al. 1989).

4.2. Transitional models

Brumback et al. (2000) unify the marginal and conditional extensions of the GLM for non-Gaussian time series under the heading of Transitional Regression Models (TRM). These are non-linear regression models that can be written in terms of conditional means and variances given past observations. The term transitional is used rather than conditional to emphasise that the outcomes are ordered in time and that the conditioning is on past outcomes only, and also to allude to the transition probabilities of Markov models. Rather than specifying the entire probability distributions of the transitions between outcomes, the TRM parameterises the transitional means and variances.

The simplest way to deal with these problems is to include lagged values of the outcome as covariates in the model, an approach that could be called transitional GLM (TGLM) (Brumback et al. 2000)

$$\log(\mu_t) = \beta_0 + \sum_{i=1}^{n} \beta_i x_{it} + \sum_{j=1}^{k} \theta_j f_j(x_{it}, y_{t-j}) \tag{6}$$

where $f_j$ are (known) functions of both covariates and past responses, and $\theta_j$ denote unknown parameters.
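To make equation (6) concrete, the sketch below builds the rows of a design matrix in which past responses enter as extra covariates. The choice $f_j = \log(y_{t-j} + 1)$ is an assumption made here for illustration, since the text only requires the $f_j$ to be known functions; the function name and the data are likewise hypothetical.

```python
import math

def transitional_rows(y, x, n_lags=1):
    """Pair each response y[t] with [intercept, x[t], f_1, ..., f_k].

    Assumed form of the known functions: f_j = log(y[t - j] + 1).
    The first n_lags days are dropped because their past values are unavailable.
    """
    rows = []
    for t in range(n_lags, len(y)):
        lagged = [math.log(y[t - j] + 1.0) for j in range(1, n_lags + 1)]
        rows.append((y[t], [1.0, x[t]] + lagged))
    return rows

# Hypothetical daily counts and exposure levels.
counts = [2, 0, 5, 3]
exposure = [10.0, 11.0, 12.0, 13.0]
for response, covariates in transitional_rows(counts, exposure, n_lags=1):
    print(response, covariates)
```

The resulting rows can be passed to any Poisson regression routine; the coefficients on the lagged columns play the role of the $\theta_j$.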
A slightly more sophisticated approach includes the standardised residuals of earlier observations as covariates, the GLM with time series errors, GLM with TSE (Schwartz et al. 1996)

$$\log(\mu_t) = \beta_0 + \sum_{i=1}^{n} \beta_i x_{it} + \sum_{j=1}^{k} \theta_j \frac{e_{t-j}}{\upsilon_{t-j}} \tag{7}$$

where $e_t = y_t - \upsilon_t$ and $\upsilon_t = \exp\big(\beta_0 + \sum_{i=1}^{n} \beta_i x_{it}\big)$. In addition, $e_t$ could be scaled by $\phi$ in order to allow for possible overdispersion.

Models can be compared by using the Akaike Information Criterion (AIC) (Akaike 1973)

$$AIC = D + 2\,df \tag{8}$$

where $D$ denotes the deviance and $df$ the degrees of freedom of the model.

4.3. Generalised Additive Models

The Generalised Additive Model (GAM) extends the GLM by fitting non-parametric functions ($g_i$ below) to estimate the relationships between the response and the predictors (Hastie and Tibshirani 1989)

$$\log(\mu_t) = \beta_0 + \sum_{i=1}^{n} g_i(x_{it}) \tag{9}$$

Since these functions are unknown infinite-dimensional parameters, we could consider estimating them by using natural cubic smoothing splines (Wahba 1990, Green and Silverman 1994). The amount of smoothing in the splines, technically the approximate degrees of freedom, could be decided by means of the AIC. A spline with $k$ degrees of freedom for a particular explanatory variable would be similar to introducing $k$ dummy variables for the covariate in the model, each one corresponding to a time period of $n/k$, where $n$ is the total number of days (Kelsall et al. 1997).

GAM models could also be formulated as transitional models (TGAM)

$$\log(\mu_t) = \beta_0 + \sum_{i=1}^{n} g_i(x_{it}) + \sum_{j=1}^{k} \theta_j f_j(x_{it}, y_{t-j}) \tag{10}$$

or as a GAM with TSE

$$\log(\mu_t) = \beta_0 + \sum_{i=1}^{n} g_i(x_{it}) + \sum_{j=1}^{k} \theta_j \frac{e_{t-j}}{\upsilon_{t-j}} \tag{11}$$

4.4. Exact GAM

While GAM has been the preferred method to model the relationship between health outcome time series and exposures, mainly air pollutants and meteorological variables, recent reports have questioned the adequacy of its use for time series epidemiological studies. Dominici et al. (2002) have reported that in the standard case of studies looking for the short-term health effects of air pollution, where a) regression coefficients are very small and b) adjustment is made for at least two confounding factors using non-parametric smoothing functions, GAM models estimated using the gam function in S-Plus (Insightful Corporation, Seattle, WA, USA) may provide biased estimates of the regression coefficients and their standard errors. This is because the original default parameters were inadequate to guarantee the convergence of the backfitting algorithm.

Although the defaults have recently been revised (Dominici et al. 2002, Katsouyanni et al. 2002b), a remaining and important problem is that the S-Plus function gam calculates the standard errors of the linear terms by effectively assuming that the smooth component of the model is linear, resulting in an underestimation of uncertainty (Chambers and Hastie 1992; Ramsay et al. 2003). Briefly, an explicit version of the asymptotically exact covariance matrix of the linear terms is $V(\hat\beta) = H W^{-1} H^{\top}$ (Hastie and Tibshirani 1990), where $H = \{X^{\top} W (I - S) X\}^{-1} X^{\top} W (I - S)$; $X$ is the design matrix; $W$ is the diagonal matrix of the final IRLS weights, with $W^{-1} = \mathrm{Cov}(z)$; $z$ is the working response from the final iteration of the IRLS algorithm (McCullagh and Nelder 1989); and $S$ is the operator matrix that fits the additive model involving the smooth terms in the model. Because calculation of the operator matrix $S$ can be computationally expensive, the current version of the S-Plus function gam approximates $V(\hat\beta) = (X_{aug}^{\top} W X_{aug})^{-1}$, where $X_{aug}$ is the design matrix of the model augmented by the predictors used in the smooth component (Hastie and Tibshirani 1990, Chambers and Hastie 1992).
That is to say, the asymptotic variance is approximated by effectively assuming that the smooth component of the model is linear. In time series studies the assumption of linearity is inadequate, resulting in underestimation of the standard error of the linear term (Ramsay et al. 2003). The degree of underestimation will tend to increase with the number of degrees of freedom used in the smoothing splines, because a larger number of non-linear terms is ignored in the calculations. Here, Dominici et al. (2003) re-define $H$ as $H = \{X^{\top}(WX - WSX)\}^{-1}(WX - WSX)^{\top}$ and also provide exact details of the calculation of an estimate of the asymptotic variance.
5. Example

5.1. Data

Daily asthma emergency room admissions to the Emergency Ward of the Gregorio Marañón University Hospital were studied for the period 1995-1998. The pollutants used were NO2, measured as the daily (24-hour) average, and O3, measured as the average of maximum 8-hourly values. Pollution data were obtained from the automated network of the Madrid City Comprehensive Air-Pollution Monitoring, Forecasting and Information System. We used mean temperature and mean relative humidity as registered at the Barajas meteorological observatory, situated 8 kilometres north-east of the city. Information was also obtained on reported cases of acute respiratory infection attended at the Gregorio Marañón Hospital Emergency Ward. Additional details have been reported elsewhere (Galán et al. 2003).

A total of 4,827 asthma emergency room admissions were registered during the period 1995-1998, with a daily mean of 3.3 and a range of 0-26 emergencies. A total of 5% of all attacks involved children aged 0-14 years, 25% of whom were under the age of five years. The temporal distribution of daily asthma emergency room admissions showed a seasonal pattern, with two epidemic peaks occurring in the second fortnight of May 1996 and May 1998. NO2 was evenly distributed throughout the year and O3 showed a strong seasonal component that peaked during the summer months (Figure 2). In general, pollution levels remained below the standards proposed by the European Community. NO2 and O3 were slightly negatively correlated (r=-9).

[Figure 2: Distribution of asthma emergency room visits and photochemical pollution levels (O3, maximum 8 h; NO2, mean 24 h) in Madrid, for the study period 1995-1998]
5.2. Parametric modelling

For Poisson regression models we followed a standardised protocol (Katsouyanni et al. 1996) which has been widely applied in other multicentre studies (Ballester et al. 1999). To control for unobserved covariates with a systematic behaviour in time, we introduced linear and quadratic trends and dummy variables for each year to control for long-wavelength trends, sinusoidal terms to control for seasonality, and dummy variables for week days and public holidays to control for weekly variation. Covariates considered were temperature and humidity, and daily reported cases of acute respiratory infection. The variables included in the model were chosen individually, on the basis of their respective levels of significance, and jointly on the basis of those that minimised the AIC. Once the best-fitted core model had been selected with the support of Pearson residuals, we tested for overdispersion using the overdispersion parameter, and for residual autocorrelation using the simple (ACF) and partial autocorrelation function (PACF) plots. Finally, four models were considered to assess the relationship between asthma emergency room admissions and photochemical air pollutants: GLM, GLM corrected by overdispersion, TGLM, and GLM with TSE, where the pollutants were included on a linear basis, with assessment of lags up to the fourth order.

5.3. Non-parametric modelling

Following Kelsall et al. (1997), a long-wavelength trend and seasonality were fitted by means of a cubic smoothing spline with at least as many degrees of freedom (df) as the number of months of the study period, together with dummy variables for week days to control for weekly variation. As covariates, daily mean temperature, relative humidity and daily cases of acute respiratory infection were fitted using cubic smoothing splines, together with dummy variables for each day of the week and public holidays.
The choice of the number of df for each non-parametric smoothing function was made on the basis of minimisation of the AIC and of observed residual autocorrelation using the ACF and PACF plots, as well as using cross-validation of predicted values. Analyses were performed using the S-Plus statistical software. Models considered were: standard GAM Poisson using restrictive convergence parameters (convergence precision ε= -, maximum number of iterations M=, convergence precision of the backfitting algorithm ε bf = -, maximum number of iterations M bf = of the backfitting algorithm), as suggested by NMMAPS (Dominici et al. 2002) and APHEA2 researchers (Katsouyanni et al. 2002b), as well as the exact GAM proposed by Dominici et al. (2003).

5.4. Results

Table 1 shows the best-fitted core parametric model using standard GLM Poisson. The model included a linear trend, dummy variables for each year, sinusoidal terms up to the sixth order, dummy variables for each day of the week and for public holidays (work and school), linear and quadratic terms for temperature and humidity, and a linear term for acute respiratory infections. The best-fitted non-parametric core model using GAM (Table 2) included a cubic smoothing spline with 72 degrees of freedom to control for trend and seasonality, dummy variables for days of the week and holidays, and cubic smoothing splines with 4 degrees of freedom for temperature and 2 degrees of freedom for relative humidity and acute respiratory infections (Figure 3).

Variable               β (se) p-value
Intercept              -.484457 (.3593) -.54 4
Linear trend (t)       .327 (.46) 37.
Sin(πt/365)            .4784 (.572) 85 <.
Cos(πt/365)            .429834 (.5869) 7.33 <.
Sin(2πt/365)           -.39592 (.3973) -9.97 <.
Cos(2πt/365)           -.79457 (.385) -2.63.8
Sin(3πt/365)           .385377 (.3783) 2.52 <.
Cos(3πt/365)           -.453 (54) -.6.8
Sin(4πt/365)           -.799 (65) -4.42 <.
Cos(4πt/365)           .32782 (338).42.57
Sin(5πt/365)           .6746 (3997) 8.779
Cos(5πt/365)           .52777 (37) 28 3
Sin(6πt/365)           .94375 (3528) 4. <.
Cos(6πt/365)           -.389 (949) -6. <.
Year *
  1996                 -78 (.55234) -.36.74
  1997                 -.955284 (98587) -3.
  1998                 -62756 (.44683) -2.83.5
Day of week **
  Tuesday              -.9398 (.53522) -2.4.4
  Wednesday            -.97 (.53249) -.7.87
  Thursday             -.988 (.533) -.69.9
  Friday               -.89772 (.54398) -3.49 <.
  Saturday             -.769 (.5522) -3.
  Sunday               -.9282 (.5969) -.56.9
Public holidays        .85528 (.73457).6 44
School holidays        .94939 (.5638).69.92
Temperature            -934 (.3279) -2 7
Temperature^2          .642 (.466) 3.52 <.
Humidity               .3888 (.882) 3.62 <.
Humidity^2             -8 (.66) -3.3.
Respiratory inf.       .383 (43) 5.3 <.
φ                      .44
Deviance               27.6
Residual df            43.
AIC                    23.6

* Reference year was 1995
** Reference day of week was Monday

Table 1: Core model regression coefficients (β × 10^-4) and their standard errors (se) obtained by a standard Poisson GLM for asthma emergency room admissions
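The seasonal and weekly terms of the parametric core model can be generated as in the sketch below. It assumes the angular frequencies kπ/365 written in the Sin and Cos rows of Table 1, and a day-of-week coding from 0 (Monday, the reference day) to 6; the function names are hypothetical and the code only builds covariates, it does not fit the model.

```python
import math

def sinusoidal_terms(t, order=6, base=math.pi / 365.0):
    """Sine and cosine pairs at frequencies base, 2*base, ..., order*base for day t."""
    terms = []
    for k in range(1, order + 1):
        terms.append(math.sin(k * base * t))
        terms.append(math.cos(k * base * t))
    return terms

def weekday_dummies(day_index, reference=0):
    """Six indicator variables for the day of the week, omitting the reference day."""
    return [1.0 if day_index % 7 == d else 0.0 for d in range(7) if d != reference]

# Covariates for the first three days of a hypothetical series starting on a Monday.
for t in range(3):
    print(t, sinusoidal_terms(t)[:2], weekday_dummies(t))
```

Because every term is a fixed function of the calendar day, the fitted seasonal curve necessarily repeats from one cycle to the next, which is the rigidity contrasted with the non-parametric smooth in the results.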
Variable (df)                β (se) p-value
Intercept                    .748498 (.973) 65 <.
s(trend) (72)                .56 (.37)
Day of week **
  Tuesday ()                 -.7272 (8243) -2.55.
  Wednesday ()               -.6499 (.654) -..37
  Thursday ()                -.5296 (.677) -.45.653
  Friday ()                  -.8646 (.9294) -2..44
  Saturday ()                -23 (.7733) -.32.87
  Sunday ()                  .53 (.694).76.447
Public holidays ()           .84349 (.75463) 62
School holidays ()           -.574 (.4755) -23 6
s(temperature) (4)           437 (.334)
s(humidity) (2)              .98 (69)
s(respiratory inf.) (2)      .488 (74)
φ                            .5
Deviance                     73
Residual df                  372.6
AIC                          888.

* Convergence parameters: precision ε= -, maximum iterations M=, precision of the backfitting algorithm ε bf= -, maximum iterations M bf= of the backfitting algorithm
** Reference day of week was Monday

Table 2: Core model regression coefficients (β × 10^-4) and their standard errors (se) obtained by a standard Poisson GAM* for asthma emergency room admissions

[Figure 3: Non-linear functions for covariates (trend, temperature, humidity and acute respiratory infections) in the core model obtained by standard Poisson GAM. Panels: s(trend) against trend; s(temperature) against temperature; s(humidity) against humidity; s(respiratory infections) against respiratory infections.]
Figure 4 compares the estimated seasonal pattern using the parametric model and the non-parametric smooth. The parametric model has the same behaviour each year: there was a single peak of emergency admissions in each spring and a shoulder in the summer of each year. The nature of the sinusoidal functions forces the peak to occur either every year or not at all. The non-parametric model allows the spring-to-summer difference to change from year to year, which it clearly did in this case. It also shows a high peak capturing the asthma epidemic excess in the second fortnight of May 1996.

The parametric core model showed overdispersion (φ=.4) as well as residual autocorrelation of almost first order (Figure 5). The non-parametric core model reduced the overdispersion (φ=.5) and did not show residual autocorrelation (Figure 5).

[Figure 4: Fitted daily asthma emergency room admissions using parametric modelling, based on a linear term and sinusoidal terms up to sixth order (top), versus a non-parametric smooth (bottom)]
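The text does not state how the overdispersion parameter φ was estimated. A common choice, assumed in the sketch below, is the Pearson chi-square statistic divided by the residual degrees of freedom, with standard errors then inflated by the square root of φ under the variance function of equation (5). Function names and data are hypothetical.

```python
import math

def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    residual_df = len(observed) - n_params
    return chi_square / residual_df

def inflate_standard_error(se, phi):
    """Standard error rescaled for a variance of phi times the mean."""
    return se * math.sqrt(phi)

# Hypothetical counts and fitted means from a one-parameter model.
observed = [2, 4, 6, 8]
fitted = [4.0, 4.0, 4.0, 4.0]
phi = pearson_dispersion(observed, fitted, n_params=1)
print(phi, inflate_standard_error(0.1, phi))
```

Values of φ close to one are consistent with the Poisson variance assumption, whereas values clearly above one indicate overdispersion and standard errors that are too small if left uncorrected.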
[Figure 5: Autocorrelation (ACF) and partial autocorrelation (PACF) functions for the core model residuals obtained by standard Poisson GLM and GAM]

After the core models were fitted, both photochemical pollutants were included on a linear basis under different models: standard Poisson GLM, GLM corrected by overdispersion, TGLM and GLM with TSE allowing for first-order autocorrelation and also for overdispersion, standard Poisson GAM, and exact GAM. For all of these, the lag showing the strongest association with asthma emergency room admissions was the lag of 3 days for NO2, and the lag of 1 day for O3. Furthermore, statistically significant associations were observed in the structure of fourth-order lags for NO2, and in the current-day lag and the second- and fourth-order lags for O3.

Table 3 sets out the results by means of multi-pollutant models jointly including the best lags of NO2 and O3. Regression coefficients did not differ substantially between the parametric models GLM, TGLM and GLM with TSE, and were highly statistically significant (p<.), but standard errors increased considerably when overdispersion was allowed for. Allowing for both first-order autocorrelation and overdispersion, by using TGLM or GLM with TSE, improved the goodness of fit in terms of deviance and AIC, and also reduced the residual autocorrelation (Figure 6). Both models provided similar estimates.

Turning to the non-parametric method, GAM models again showed neither residual autocorrelation (Figure 6) nor overdispersion (φ=.9) after including both air pollutants in the model. Regression coefficients for NO2 and O3 were still statistically significant (p= and p=.3, respectively), but their magnitude was reduced, as were their standard errors.
In terms of deviance and AIC, the GAM model provided lower values than the previous models based on GLM. When standard errors were corrected using an exact GAM procedure, estimates for both pollutants were now marginally significant (p=.58 and p=.9).
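The loss of significance under the exact GAM follows directly from the larger standard errors, since the coefficient itself is unchanged. The sketch below illustrates this with a two-sided Wald test based on the normal approximation; the coefficient and standard errors are hypothetical and are not the values of Table 3.

```python
import math

def wald_p_value(beta, se):
    """Two-sided p-value for the hypothesis beta = 0, using the normal approximation."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same hypothetical coefficient, evaluated with an approximate and a corrected standard error.
coefficient = 2.0
for label, se in (("approximate", 0.8), ("corrected", 1.2)):
    print(label, wald_p_value(coefficient, se))
```

With the coefficient held fixed, any increase in the standard error lowers the Wald statistic and raises the p-value, which is the pattern described for the exact GAM.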
[Figure 6: Autocorrelation (ACF) and partial autocorrelation (PACF) functions for the final model residuals obtained by standard Poisson GLM, TGLM with AR(1), GLM with TSE with AR(1), standard Poisson GAM and exact GAM]
                                 NO2 (lag 3)                O3 (lag 1)
Model                            β      (se)    p-value     β      (se)    p-value   φ     Dev.   Res.df   AIC
GLM
  Standard Poisson               3.329  (.858)  <.          4.47   (.5)    <.        .4    24.9   426      25.8
  Corrected by overdispersion    3.329  (.4)    <.          4.47   (.36)   <.
TGLM *
  AR(1)                          3.423  (.9)    <.          484    (.355)                  22.4   425      284.4
GLM with TSE *
  AR(1)                          3.47   (.9)    <.          4.32   (.354)                  22.6   425      284.6
GAM **
  Standard Poisson               2.628  (.86)               2.869  (.96)   .3        .9    76.5   367      887.5
  Exact                          2.628  (.392)  .58         2.869  (.7)    .9

Dev.: deviance; Res.df: residual degrees of freedom
* Also corrected by overdispersion
** Convergence parameters: precision ε= -, maximum iterations M=, precision of the backfitting algorithm ε bf= -, maximum iterations M bf= of the backfitting algorithm.

Table 3: Comparison of regression coefficients (β × 10^-4) and their standard errors (se) for photochemical air pollutants, NO2 and O3, obtained using different regression models

6. Discussion

We have presented the statistical models commonly used to evaluate the short-term effects of environmental factors, mainly air pollution, on health. As we showed, when using time series regression for counts it is important to account properly for both autocorrelation and overdispersion. Consequently, seasonality is an important issue when dealing with time series regression. Methods for seasonal adjustment can be based on a parametric approach using a combination of trend and sinusoidal terms, or on a non-parametric smoothing technique. The parametric modelling is a more rigid approach that forces the same seasonal pattern to repeat each year. The non-parametric smoothing technique, using GAM, allowed more flexibility in the control of seasonality, as well as of other potential confounders, as was shown in Figure 3.

The standard Poisson GLM did not control adequately for autocorrelation or overdispersion, and underestimated the standard errors of the estimates. The other parametric models, which allow for overdispersion and autocorrelation, TGLM and GLM with TSE, did not differ substantially from each other and were in agreement with results previously reported.
Although residual autocorrelation was low, what remained was probably due to inflexible control of seasonality. The GAM applied here showed no residual autocorrelation and reduced overdispersion, and generally led to lower regression coefficients for the association of asthma emergency room visits with higher concentrations of NO2 and O3.
Standard errors were also reduced using GAM in comparison with the models which control for seasonality using a parametric method. This has usually been justified by the fact that the residual autocorrelation was removed by using a non-parametric smoother of time. But when the exact GAM method was used, standard errors increased considerably, being closer to those provided by the parametric autoregressive models, TGLM and GLM with TSE.

Alternative models, which we do not discuss further, have also been applied in the analysis of epidemiological time series. Probably the most common choice has been the Box-Jenkins methodology, through transfer function modelling (Box and Jenkins 1976). This methodology has traditionally been used for forecasting applications in economics. These models are very useful to describe changes over time, but the advantage of regression methods in epidemiology over the Box-Jenkins methodology is that regression methods are more flexible. Box-Jenkins methods can only be applied to data with an underlying normal structure. Box-Jenkins models are built with the aim of prediction and use transformations of the dependent variable which make the regression parameters non-interpretable in an epidemiological sense. Moreover, the use of regression methods enables the researcher to address more specific hypotheses common in epidemiology, such as dose-response curves, threshold models, interactions, cumulative effects, or even effect modification. Also, interpretation of the results from a regression model for counts, in terms of relative risks, is more familiar and straightforward for the epidemiologist. However, Box-Jenkins models have also been applied in air pollution (Diaz et al. 1999) and temperature studies (Saez et al. 1995). It has also been shown that their results did not differ from those of regression methods when the health outcome is non-normally distributed, like hospital or emergency room admissions (Tobías et al. 2001).
Independently of the statistical model used, the interpretation of a time series differs depending on whether the outcome is mortality or something like hospital admission, which can occur more than once. The fundamental difficulty is that the analysis can only examine short-term effects. Let us imagine a data set in which deaths or hospital admissions were evenly spread throughout the week, and also suppose that, through a clerical error, deaths which occurred before midday on Saturday were included in Friday's total. Then Friday would have 50% more deaths than the average, and Saturday 50% less. In any time series regression model, the risk for Friday would appear as 1.5, and is likely to be highly significant. However, the overall death rate is unaffected. In air pollution studies, it may be that air pollution hastens deaths or hospital admissions in susceptible individuals by one day. This is known as harvesting (Zeger et al. 1999, Schwartz 2). So, although the risk is high, the effect in terms of person-years lost in the community is likely to be very low. Thus it is important to appreciate that a significant risk is not necessarily an important one from a public health point of view.

To examine long-term effects one has to compare communities which are standardised for the main risk factors such as age, sex and race, but have different levels of pollution (Kunzli et al. 2). Of course, historical levels of pollution also need to be considered, because pollution is likely to have effects which may take years to become evident.
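The arithmetic of the clerical-error illustration given earlier in this section, in which Saturday deaths before midday are counted on Friday, can be checked with a short sketch. It assumes, as that illustration does, that events are evenly spread, so that the half of Saturday's events occurring before midday is moved to Friday; the function name and the daily mean of 10 are hypothetical.

```python
def misclassified_week(daily_mean, fraction_moved):
    """Apparent Friday and Saturday ratios, and the weekly total, after moving events."""
    friday = daily_mean * (1.0 + fraction_moved)
    saturday = daily_mean * (1.0 - fraction_moved)
    weekly_total = 5 * daily_mean + friday + saturday
    return friday / daily_mean, saturday / daily_mean, weekly_total

friday_ratio, saturday_ratio, weekly_total = misclassified_week(10.0, 0.5)
print(friday_ratio, saturday_ratio, weekly_total)
```

The Friday ratio rises and the Saturday ratio falls by the same amount, while the weekly total stays at seven times the daily mean, which is why an apparently large daily risk need not correspond to any change in the overall rate.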
Another difficulty is that the effect may take several days to build up. If deaths occur in the early evening, they may be attributed to the following day. Thus one should examine lagged effects of the pollutant. This means that the risk of a particular pollutant should be attributable to a particular day. It can be difficult to compare cities if the lag structure of the models is different (Samet et al. 2000, Katsouyanni et al. 2002).

A further problem is in separating out the effects of different pollutants. Most are very highly correlated, and it is very difficult to disentangle which are the important ones. Statistical solutions are usually somewhat of a compromise. However, this is a highly political area, because different pollutants have different sources, such as cars, lorries or industry, and blaming one pollutant at the expense of the others requires very strong evidence from the data, and this is usually lacking.

We have shown that different models lead to different estimates. Care is needed in their interpretation, and careful reporting is needed so that it is clear how variables have been modelled. In this context, GAM presents the best model fit in terms of absence of autocorrelation and reduction of overdispersion, leading to more efficient estimates. Moreover, GAM can be useful to suggest functional forms for the parametric modelling, or for checking an existing parametric model for bias. Thus, we venture to suggest the use of GAM methods in the modelling of epidemiological time series.

References

Akaike, H. (1973), Information theory and an extension of the maximum likelihood principle, in Petrov, B. N. and Csaki, F. (Eds.), Second International Symposium on Information Theory, Budapest, Akadémiai Kiadó.

Ballester, F., Saez, M., Alonso, M. E., Taracido, M., Ordonez, J. M., Aguinaga, I. et al. (1999), The EMECAM project: the Spanish multicenter study on the relationship between air pollution and mortality. The background, participants, objectives and methodology, Revista Española de Salud Pública, 73, 165-175.

Braga, A. L., Zanobetti, A. and Schwartz, J. (2001), The time course of weather-related deaths, Epidemiology, 12, 662-667.
Box, G. E. P. and Jenkins, G. M. (1976), Time Series Analysis, San Francisco, Holden-Day.
Brumback, B. A., Ryan, L. M., Schwartz, J. D., Neas, L. M., Stark, P. C. and Burge, H. A. (2000), Transitional regression models, with application to environmental time series, Journal of the American Statistical Association, 95, 6-27.
Campbell, M. J. (1994), Time series regression for counts: an investigation into the relationship between Sudden Infant Death Syndrome and environmental temperature, Journal of the Royal Statistical Society, Series A, 57, 9-28.
Campbell, M. J. (1998), Time series regression, in Armitage, P. and Colton, T. (Eds.), Encyclopaedia of Biostatistics, New York, Wiley (pp. 4936-4938).
Campbell, M. J., Julious, S. A., Peterson, C. K. and Tobias, A. (2001), Atmospheric pressure and sudden infant death syndrome in Cook County, Chicago, Paediatric and Perinatal Epidemiology, 5, 287-289.
Chambers, J. and Hastie, T. (1992), Statistical Models in S, London, Chapman and Hall.
Diaz, J., Garcia, R., Ribera, P., Alberdi, J. C., Hernández, E., Pajares, M. S. and Otero, A. (1999), Modelling of air pollution and its relationship with mortality and morbidity in Madrid, Spain, International Archives of Occupational and Environmental Health, 72, 366-376.
Dominici, F., McDermott, A., Zeger, S. L. and Samet, J. M. (2002), On Generalized Additive Models in time series studies of air pollution and health, American Journal of Epidemiology, 56, -.
Dominici, F., McDermott, A. and Hastie, T. (2003), Issues in semi-parametric regression with applications in time series regression models for air pollution and mortality, Available at http://www.ihapss.jhsph.edu/
Fitzmaurice, G. M. (1998), Regression models for discrete longitudinal data, in Everitt, B. S. and Dunn, G. (Eds.), Statistical Analysis of Medical Data, London, Arnold.
Galán, I., Tobías, A., Banegas, J. R. and Aranguez, E. (2003), Short-term effects of air pollution on daily asthma emergency room admissions in Madrid, Spain, European Respiratory Journal, 22, 82-88.
Green, P. J. and Silverman, B. W. (1994), Nonparametric Regression and Generalized Linear Models, London, Chapman and Hall.
Hastie, T. J. and Tibshirani, R. J. (1990), Generalized Additive Models, London, Chapman and Hall.
Hatzakis, A., Katsouyanni, K., Kalandidi, A., Day, N. and Trichopoulos, D. (1986), Short-term effects of air pollution on mortality in Athens, International Journal of Epidemiology, 5, 73-8.
Katsouyanni, K., Schwartz, J., Spix, C., Touloumi, G., Zmirou, D. and Zanobetti, A. (1996), Short term effects of air pollution on health: a European approach using epidemiologic time series data: the APHEA protocol, Journal of Epidemiology and Community Health, 5 (Suppl.), S2-S8.
Katsouyanni, K., Touloumi, G., Samoli, E., Gryparis, A., Le Tertre, A., Monopolis, Y. and Rossi, G. (2002), Confounding and effect modification in the short-term effects of ambient particles on total mortality: results from 29 European cities within the APHEA 2 project, Epidemiology, 2, 52-53.
Katsouyanni, K., Touloumi, G., Samoli, E., Gryparis, A., Monopolis, Y. and Le Tertre, A. (2002b), Different convergence parameters applied to the S-Plus gam function, Epidemiology, 3, 742.
Kelsall, J. E., Samet, J. M., Zeger, S. L. and Xu, J. (1997), Air pollution and mortality in Philadelphia, 1974-1988, American Journal of Epidemiology, 46, 75-762.
Kunzli, N., Medina, S., Kaiser, R., Quenel, P., Horak, F. Jr. and Studnicka, M. (2001), Assessment of deaths attributable to air pollution: should we use risk estimates based on time series or on cohort studies?, American Journal of Epidemiology, 53, 5-55.
McCullagh, P. and Nelder, J. (1989), Generalised Linear Models, London, Chapman and Hall.
Mackenbach, J. P., Kunst, A. E. and Looman, C. W. N. (1992), Seasonal variation in mortality in the Netherlands, Journal of Epidemiology and Community Health, 46, 26-265.
Ramsay, T., Burnett, R. and Krewski, D. (2003), The effect of concurvity in generalized additive models linking mortality and ambient air pollution, Epidemiology, 4, 8-23.
Saez, M., Sunyer, J., Castellsagué, J. and Antó, J. M. (1995), Relation between temperature and mortality, International Journal of Epidemiology, 24, 576-582.
Samet, J. M., Zeger, S. L., Dominici, F., Curriero, F., Coursac, I. and Dockery, D. W. (2000), The National Morbidity, Mortality, and Air Pollution Study. Part II: Morbidity and mortality from air pollution in the United States, Research Report of the Health Effects Institute, 94 (Pt 2), 5-7.
Schwartz, J. (1994), Non-parametric smoothing in the analysis of air pollution and respiratory illness, Canadian Journal of Statistics, 4, 47-487.
Schwartz, J., Spix, C., Touloumi, G., Bacharova, L., Barumamdzadeh, T. and Le Tertre, A. (1996), Methodological issues in studies of air pollution and daily counts of deaths or hospital admissions, Journal of Epidemiology and Community Health, 5 (Suppl.), S3-S.
Schwartz, J., Levin, R. and Hodge, K. (1997), Drinking water turbidity and paediatric hospital use for gastrointestinal illness in Philadelphia, Epidemiology, 8, 65-62.
Schwartz, J. (2000), Harvesting and long term exposure effects in the relation between air pollution and mortality, American Journal of Epidemiology, 5, 44-448.
Stanek, E. J., Shetterley, S. S., Allen, L. H., Pelto, G. H. and Chavez, A. (1989), A cautionary note on the use of autoregressive models in analysis of longitudinal data, Statistics in Medicine, 8, 523-528.
Tobías, A., Díaz, J., Sáez, M. and Alberdi, J. C. (2001), Use of Poisson regression and Box-Jenkins models to evaluate the short-term effects of environmental noise levels on daily emergency admissions in Madrid, Spain, European Journal of Epidemiology, 7, 765-77.
Touloumi, G., Atkinson, R., Le Tertre, A., Samoli, E., Schwartz, J., Schindler, C., Vonk, J. M., Rossi, G., Saez, M., Rabszenko, D. and Katsouyanni, K. (2003), Analysis of health outcome time series data in epidemiological studies, Environmetrics (in press).
Wahba, G. (1990), Spline Functions for Observational Data, Philadelphia, CBMS-NSF Regional Conference Series, SIAM.
Yule, G. U. (1926), Why do we sometimes get nonsense-correlations between time series? A study in sampling and the nature of time series, Journal of the Royal Statistical Society, 89, 87-227.
Zeger, S. L., Dominici, F. and Samet, J. (1999), Harvesting-resistant estimates of air pollution effects on mortality, Epidemiology,, 7-75.