International Journal of Basic & Applied Sciences IJBAS-IJENS Vol:1 No:0 11
150-7474- IJBAS-IJENS @ June 01 IJENS

Association Rules of Data Mining Application for Respiratory Illness by Air Pollution Database

Carolyn Payus, Norela Sulaiman, Mazrura Shahani and Azuraliza Abu Bakar

Carolyn Payus is with the Environmental Science Program, School of Science & Technology, Universiti Malaysia Sabah (UMS), Kota Kinabalu, MALAYSIA (e-mail: cpayus@gmail.com). Norela Sulaiman is from the School of Environmental and Natural Resources Science, Universiti Kebangsaan Malaysia (UKM), 400 Bangi, Selangor, MALAYSIA (e-mail: noraganun@gmail.com). Azuraliza Abu Bakar is with the School of Computer Science, Universiti Kebangsaan Malaysia (UKM), 400 Bangi, Selangor, MALAYSIA (e-mail: aab@ftsm.ukm.my). Mazrura Shahani is from the Faculty of Health Science, Universiti Kebangsaan Malaysia (UKM), 400 Bangi, Selangor, MALAYSIA (e-mail: mazrura@fskb.ukm.my).

Abstract

Exposure to air pollution has been related to various adverse health effects. This study aims to assess the impact of air pollution on the number of hospitalizations for respiratory illness, with Kuala Lumpur as the case study. Kuala Lumpur, the capital city of Malaysia, is an urban and industrialized city in a tropical climate that often experiences the highest recorded levels of severe respiratory illness due to air pollution. Air pollution triggers oxidative stress and inflammation, and it is plausible that high levels of air pollutants cause the high number of hospitalizations. In this study, an intelligent data mining approach called association rules has been used, based on its capability to search for interesting relationships among attributes in a larger database and its ability to handle the uncertain databases that often occur in real-world problems. Association rule mining is the discovery of association relationships, frequent patterns or correlations among sets of items or elements in databases. In air pollution and healthcare databases, association rules are useful because they offer the possibility to conduct intelligent diagnosis, extract invaluable information and build important knowledge bases quickly and automatically, in order to develop effective strategies to minimize health exposure to air pollution. A total of 10 data were obtained from the Department of Environment Malaysia and the Malaysian Ministry of Health. Six attributes were used as input and one attribute as output for the association rule mining. The data went through a pre-processing stage to meet the requirements of the modeling process. In conclusion, association rule mining has given a promising result with more than 0% accuracy, and the rules obtained have contributed knowledge about respiratory illness.

Index Terms: Association rule, air pollution, respiratory illness, data mining

I. INTRODUCTION

Clean air is considered to be a basic requirement of human health and well-being; however, air pollution continues to pose a significant threat to health worldwide (WHO 2011). The effects of air pollution on health range from minor eye irritations to upper respiratory symptoms, chronic respiratory diseases, cardiovascular diseases and lung cancer, which may result in hospital admission and even death (Zheng 2011; Sousa et al. 00; Ragas et al. 2011). The impacts of air pollution on human health can be assessed in terms of a reduction in average life expectancy, additional premature deaths, absence from the workplace or school, hospital admissions, increased use of medication and days of restricted activity (EEA 2007; Rosa et al. 00; Peng and Dominici 00).

In developed countries in Europe and the United States (US), legislation and guidelines regarding the concentrations of air pollutants in ambient air have been established based on epidemiological, toxicological and clinical evidence (WHO 00). However, in developing countries and recently newly industrialized countries such as Malaysia, studies on this matter have been ignored and started later than in Europe and the US. Nevertheless, legislation regarding air pollution standards in Malaysia has remained the same since 17 (Olmo et al. 2011), thus allowing levels that have been proved to have serious effects on human health, especially on children and the elderly exposed to them.
Assessment of air pollution behavior and its impacts on health will help decision makers to better understand its effects, as well as the benefits that could be achieved through the application of control measures. The causes of respiratory illness and air pollution depend on various factors including pollutant emissions, atmospheric chemical processes, topography, meteorological conditions and solar radiation (Seinfeld and Pandis 1). The complex mechanism of air pollution formation and respiratory effects makes it even more difficult to control. In order to understand it, it is necessary to apply an intelligent approach that can describe the complex relationship between air pollution concentrations and the many variables that cause or hinder the respiratory effects. This complexity makes applying conventional statistical analysis to air quality and respiratory illness an inefficient task, as it is mostly based on basic linear principles (Braak 1). Although statistical methods may provide reasonable results, they are essentially incapable of capturing the complexity and non-linearity of the relationships between pollution and its adverse impacts (Chakraborty et al. 1). Therefore, they are expected to underperform when used to model the relationship between air pollution and health effects, which is extremely non-linear.

In the past few years, the collection of air quality and clinical data has generated an urgent need for new techniques and tools that can intelligently and automatically transform the processed data into useful information and knowledge (Fayyad et al. 1). Data mining, which is also known as knowledge discovery in databases, is a process of nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. In general, data mining is an essential process in knowledge discovery where intelligent methods are applied in order to extract important data patterns. Data mining can be used as an intelligent diagnostic tool in healthcare. In clinical data, it is possible to extract knowledge about the elevated concentrations of air pollution that caused the respiratory illness from the patient measurements. In addition, in research data the extracted knowledge could be information about the level of concentration to which patients were exposed and that caused the respiratory sickness. Consequently, data mining has become an important research domain in environmental science and also in healthcare. In this paper, we apply one of the association rule mining algorithms, namely the Apriori algorithm, to extract knowledge from a clinical database of respiratory patients for air pollution impact analysis.

II. METHODOLOGY

Data mining is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses, or other information repositories. Mining association rules is one of the techniques involved in the process mentioned above and is the one used in this paper. Association rules are the discovery of association relationships or correlations among a set of items. Association rule mining searches for interesting relationships among attributes in the database. Association rules are similar to classification rules except that they can predict any attribute and not just the class, which allows them to predict combinations of attributes. Different association rules express different regularities that underlie the dataset, and they will generally predict different things.
Because so many interesting association rules can be derived from even a tiny dataset, interest is restricted to those that apply to a reasonably large number of instances and have a reasonably high accuracy on the instances to which they apply. The coverage of an association rule is the number of instances for which it predicts correctly (Zhou 00). This is often called its support. Its accuracy, often called confidence, is the number of instances it predicts correctly expressed as a proportion of all instances to which it applies. For example, in this research, the rule Air Pollutant (T, LESS) ==> Respiratory Illness (T, LESS) means that if the air pollutant, t, is less than T, then the respiratory illness is less. According to Malhotra and Venugopal (2011), the accuracy is the proportion of the days on which the air pollutant is less than the mean air pollution that also have respiratory illness less than the mean respiratory illness, expressed as a percentage or fraction. It is usual to specify minimum support (coverage) and confidence (accuracy) values and to seek only those rules whose support and confidence are at least equal to these specified minima. Rules that satisfy both the minimum support threshold and the minimum confidence threshold are called strong. Generally, support and confidence values are expressed between 0% and 100% rather than 0 to 1.0.

There are two methods for mining the form of association rules known as Boolean association rules (Harms and Deogun 2004). One is a basic algorithm for finding frequent item sets, and the other is the frequent pattern growth method, which adopts a divide-and-conquer strategy. The Apriori algorithm (Witten and Frank 00) for mining frequent item sets for Boolean association rules is used in the present study. The algorithm employs an iterative approach known as the level-wise approach, where k-item sets are used to explore (k+1)-item sets. In environmental studies, particularly on air pollution, association rules are useful to summarize pollutant levels into groups (categories) and to build models for patient prediction (Wang 2005).
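The level-wise search and the support and confidence measures described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study (which relied on Weka); the records, the item labels and the function names `support`, `frequent_itemsets` and `rules` are hypothetical, and support is expressed here as a fraction of instances rather than a count.

```python
from itertools import combinations

# Toy discretized records (hypothetical values, not the study's data).
records = [
    {"PM10=HIGH", "CO=NORMAL", "TEMP=HIGH", "PATIENTS=MODERATE"},
    {"PM10=HIGH", "CO=NORMAL", "TEMP=HIGH", "PATIENTS=MODERATE"},
    {"PM10=LOW", "CO=NORMAL", "TEMP=NORMAL", "PATIENTS=NORMAL"},
    {"PM10=HIGH", "CO=HIGH", "TEMP=HIGH", "PATIENTS=HIGH"},
    {"PM10=LOW", "CO=NORMAL", "TEMP=NORMAL", "PATIENTS=NORMAL"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)


def frequent_itemsets(rows, min_support):
    """Level-wise search: frequent k-itemsets seed the (k+1)-candidates."""
    items = sorted({item for row in rows for item in row})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), rows) >= min_support]
    found = {}
    while level:
        for itemset in level:
            found[itemset] = support(itemset, rows)
        size = len(level[0]) + 1
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == size}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in found
                        for sub in combinations(c, size - 1))
                 and support(c, rows) >= min_support]
    return found


def rules(found, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for strong rules."""
    for itemset, sup in found.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / found[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, sup, confidence


found = frequent_itemsets(records, min_support=0.4)
strong = list(rules(found, min_confidence=0.6))
```

In this toy data the rule {PM10=HIGH, TEMP=HIGH} ==> {PATIENTS=MODERATE} is supported by two of the five records, and two of the three records matching the antecedent also match the consequent, so it passes both thresholds.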
This study involves five major phases, namely (i) data selection; (ii) pre-processing; (iii) data mining; (iv) testing and evaluation; and (v) knowledge discovery, as shown in the framework of this study in Figure 1. Based on the framework, the stage of pre-processing and data preparation is done in two steps: cleaning and integration of the collected data, followed by data selection and transformation. Pre-processing is done so that the rules generated at the end of the study will be certain and reliable as a knowledge base. In this stage, several phases were carried out: data integration, data cleaning, attribute selection and data reduction.

Data cleaning is required when there are incomplete attributes or missing values in the data. It involves filling the missing values, smoothing noisy data, identifying outliers and correcting data inconsistencies. Data integration combines data from multiple sources to form a coherent data store. Metadata, correlation analysis, data conflict detection and resolution of semantic heterogeneity contribute towards smooth data integration. Data transformation converts the data into forms appropriate for data mining, depending on the mining technique. In the case of developing a knowledge-based model, data are required to be discretized. This is because the rough classification algorithm only accepts categorical attributes. Discretization involves reducing the number of distinct values for a given continuous attribute by dividing the range of the attribute into intervals. Interval labels can then be used to replace actual data values.

Fig. 1. Methodology framework
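As a rough illustration of the data cleaning step described above, the following Python sketch fills missing values and flags outliers. The paper does not state which fill or outlier rule was used, so the median fill and the 1.5 x IQR rule are assumptions made for illustration only, as are the readings and the function names.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Replace None entries with the median of the observed values (assumed rule)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical PM10 readings with two gaps and one suspicious spike.
pm10 = [41.0, None, 47.0, 52.0, 44.0, None, 49.0, 180.0]
cleaned = fill_missing(pm10)
flagged = iqr_outliers(cleaned)
```

Flagged indices would still need a domain decision: a spike in a pollutant series may be a sensor fault or a genuine episode, so removal is not automatic.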
A time series dataset was obtained from the Malaysian Ministry of Health and the Department of Environment Malaysia. It consists of 1000 lines with 7 attributes: PM10, CO, SO2, NO2, O3, temperature and number of hospital admissions (respiratory illness patients). The first six attributes were used as input or predictor attributes, while the last attribute, patients, was the target knowledge (output). Table I shows the notation for each attribute and the classifiers. Based on these figures, the air quality and clinical data obtained are continuous and derived from the study area, Kuala Lumpur, for 1 January 00 till 1 December 00.

TABLE I
ATTRIBUTES ON AIR QUALITY PARAMETER AND PATIENTS

No  Attribute  Measurement Unit  Data Notation
1   SITE       not relevant      Station
2   DATE       month/day/year    Date
3   O3                           Ozone
4   PM10                         Particulate Matter
5   CO                           Carbon Monoxide
6   SO2                          Sulphur Dioxide
7   NO2                          Nitrogen Dioxide
8   TEMP       Celsius           Temperature
9   PATIENTS   -                 Number of Respiratory Illness Patients

Data Scale: SITE = Kuala Lumpur; DATE = 1 Jan 00 to 1 Dec 00; remaining ranges as extracted: 0.15 1 4 0.1 1.41 1 0.05 4.. 7 -

Table II shows the first ten rows of the datasets that were collected. All attributes in the dataset contain highly distinct values that need to be handled. The nature of association rules is that the data to be modeled are in discrete form. Therefore, discretization is required to transform the data into ranges of categories.

TABLE II
FIRST TEN ROWS OF THE RAW DATASETS

Date  O3  PM10  CO  SO2  NO2  Temp  Patient
[cell values not legible in the source]

Discretizing the data is sufficient, especially for the large number of datasets involved, which have many incomplete attributes or missing values. In this study, discretization was done by first performing several statistical analyses to investigate the distribution of values in each attribute. The equal frequency binning method was then used to discretize the data (Han and Kamber 2001). Table III depicts the results of data discretization on each attribute.
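The equal frequency binning step mentioned above can be sketched in a few lines of Python. This is a minimal rank-based illustration, not the procedure actually run in the study; the temperature values, the choice of three bins and the function name `equal_frequency_bins` are assumptions, with the labels LOW, NORMAL and HIGH borrowed from the rule listing in Appendix 1.

```python
def equal_frequency_bins(values, labels):
    """Assign each value a label so that the bins hold (nearly) equal counts."""
    # Rank the positions by value, then cut the ranking into len(labels) slices.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n, k = len(values), len(labels)
    result = [None] * n
    for rank, idx in enumerate(order):
        result[idx] = labels[rank * k // n]
    return result


# Hypothetical temperature readings in Celsius.
temps = [27.0, 30.1, 28.4, 25.9, 29.3, 26.5]
labels = ["LOW", "NORMAL", "HIGH"]
binned = equal_frequency_bins(temps, labels)
```

Because the assignment is purely rank-based, tied values can fall into different bins; a production version would usually derive cut points from quantiles and apply them consistently.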
TABLE III
FIRST TEN ROWS OF THE DISCRETIZED DATASETS AFTER BINNING

O3  PM10  CO  SO2  NO2  Temp  Patient
[cell layout not recoverable from the source; the extracted cells contain the labels Low and Moderate]

III. APPLICATION & RESULT

The association rule model was run with a minimum support of 0.1 and a minimum confidence of 0.1. The number of association rules generated was 4 rules with the highest confidence value of 0.. In addition, 4 major items were identified, with lengths from 1 to [illegible], for the patient forecasting model, as shown in Table IV.

TABLE IV
TOTAL OF L-ITEMSET

L-itemset Size  No. Rules L-itemset (Patients Dataset)
[values as extracted: 1 1 1 5 4-5 -]

The rule summary has four ranges of values for the collected confidence level. The four ranges of the confidence level are: to 0.40, representing a weak rule; 0.41 to 0.0, a moderate rule; 0.1 to 0.0, general or common rules; and 0.1 to 1.00, a strong rule. In this research the association rule model generated 17 rules for NORMAL hospitalized patients, 1 rule for HIGH hospitalized patients and 4 rules for MODERATE hospitalized patients. Figure 2 shows the output of the association rules obtained from the Weka 3.7 platform, with minimum support = 0.1 and minimum confidence = 0.1.

After the final screening process, summarized in Appendix 1, the generated association rules indicate that PM10, CO and temperature are strongly associated with the number of patient hospitalizations. The generated association rules also show that there is some association between HIGH patients and CO, but with a weak confidence value of 0.15. Temperature and air pollutants such as CO and PM10 are generally highly correlated in many places (Holgate et al. 1), and they may interact significantly to affect health outcomes (Choi et al. 17; Roberts 2004). For example, Katsouyanni et al. (1) reported that air pollution and ambient temperature had a synergistic effect on excess mortality during the 17 heat wave in Athens. They found a statistically significant modification by temperature of the association between exposure to SO2, CO and total excess mortality, although the main effect of this pollutant was not statistically significant. Roberts (2004) found that temperature modified the association between PM10 and mortality. Our findings also show that PM10, CO and temperature are significantly associated with patient hospitalizations. These results support the hypothesis that air pollutants along with temperature might contribute to health outcomes.
Exposure to air pollutants such as PM10, SO2 and CO may directly affect the airways through inhalation, including the upper airways, bronchioles and alveoli. The exposure could modulate the autonomic nervous system and might further influence the cardiovascular system (Gordon 00; Jeffery 1). Some studies have shown that PM10 is associated with decreased heart rate variation (Creason et al. 2001; Gold et al. 2000). Ambient temperature changes also impose physiological and psychological stresses on our body systems (Gordon 00), which could aggravate pre-existing diseases. Therefore, air pollutants and temperature may interact to synergistically affect human morbidity, and thus mortality.

=== Run information ===
Scheme: weka.associations.apriori -N 1000 -T 0 -C 0.1 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -A -c -1
Relation: PatientsData_Clean-weka.filters.unsupervised.attribute.numerictonominal-rfirst-last-weka.filters.unsupervised.attribute.remove-r-7
Instances:
Attributes: 7
    SO2
    NO2
    O3
    PM10
    CO
    TEMP
    PATIENTS
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.1 (7 instances)
Minimum metric <confidence>: 0.1
Number of cycles performed: 1
Generated sets of large itemsets:
Size of set of large itemsets L(1): 1
Size of set of large itemsets L(2): 1
Size of set of large itemsets L(3): 5

Fig. 2. Run information of the association rules

The major strength of this study is that, to our knowledge, it is the first study to use an intelligent diagnosis technique, association rules, to extract invaluable information and association patterns from the database. However, this study also has one important limitation. It was carried out in a single city with a tropical climate and used one year of data, 00, which, although drawn from an hourly database, is not extensive. Caution is needed when interpreting any such time-series study within a single location. Therefore, we suggest that future work involve multiple locations for comparison, so that the findings will be more generalizable, valid and consistent with other places.
However, this study is still a pilot study and a pioneer in introducing association rules for air pollution and respiratory databases, and the variation would not be that significant or severe for the study (Kim et al. 2005).

IV. CONCLUSION

This paper has made a promising and valuable contribution, especially to air pollution management. It is the first attempt to use association rules to understand air pollution formation and its effect on respiratory illness, and thus to help solve an environmental issue. The knowledge model obtained can be used as a decision support system to gain sets of knowledge that are useful for preventing elevated exposure to the hazardous air pollutants that have the greatest impact on respiratory illness. From the association rule knowledge base, we can also learn which combinations or associations of air pollutants contribute to higher health risk, so that more action plans can be made to resolve the problem. Association rule data mining produces knowledge that is understandable and can be interpreted easily. This is an advantage compared to other learning algorithms in conventional analysis. This study identified several important attribute combinations that have a strong influence on the respiratory illness of patients, such as PM10, CO and temperature.

REFERENCES

[1] Choi, K., Inoue, S. and Shinozaki, R. 17. Air pollution, temperature, and regional differences in lung cancer mortality in Japan. Archives of Environmental Health, Vol. 5, pp. 10-1.
[2] Creason, J., Neas, L., Walsh, D. and Sheldon, L. 2001. Particulate matter and heart rate variability among elderly retirees. Journal of Exposure Analysis and Environmental Epidemiology, Vol. 11, pp. 11-1.
[3] EEA. 2007. Europe's Environment: The 4th Assessment. European Environment Agency, Copenhagen.
[4] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. and Uthurusamy, R. 1. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.
[5] Gold, D.R., Schwartz, J., Lovett, E., Larson, A. and Nearing, B. 2000. Ambient pollution and heart rate variability. Circulation, Vol. 101, pp. 17-17.
[6] Gordon, C.J. 00. Role of environmental stress in the physiological response to chemical toxicants. Environmental Research, Vol., pp. 1-7.
[7] Harms, S.K. and Deogun, J.S. 2004. Sequential association rule mining with time lags. Journal of Intelligent Information Systems, Vol., pp. 7-.
[8] Holgate, S.T., Samet, J.M., Koren, H.S. and Maynard, R.L. 1. Air Pollution and Health. Academic Press, London.
[9] Jeffery, P. 1. Effects of cigarette and air pollutants on the lower respiratory tract. Air Pollution and Health. Academic Press, Sydney.
[10] Katsouyanni, K., Pantazopoulou, A., Touloumi, G. and Asimakopoulos, D. 1. Evidence for interaction between air pollution and high temperature in the causation of excess mortality. Archives of Environmental Health, Vol. 4, pp. 5-4.
[11] Kim, D., Sass-Kortsak, A., Purdham, J.T. and Brook, J.R. 2005. Associations between personal exposures and fixed-site ambient measurements of fine particulate matter, nitrogen dioxide and carbon monoxide in Toronto, Canada. Journal of Exposure Analysis and Environmental Epidemiology, Vol. 1, pp. 1-1.
[12] Olmo, N.R.S., Saldiva, P.H.N., Braga, A.L.F., Lin, C.A., Santos, U.P. and Pereira, L.A.A. 2011. A review of low level air pollution and adverse effects on human health: implications for epidemiological studies and public policy. Clinica, Vol., pp. 1 0.
[13] Peng, R.D. and Dominici, F. 00. Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health. Springer, New York, pp..
[14] Ragas, A.M., Oldenkamp, R., Preeker, N.L., Wernicke, J. and Schlink, U. 2011. Cumulative risk assessment of chemical exposures in urban environments. Environment International, Vol. 7, pp. 7 1.
[15] Roberts, S. 2004. Interaction between particulate air pollution and temperature in air pollution mortality time series studies. Environmental Research, Vol., pp. -7.
[16] Rosa, A.M., Ignotti, E., Hacon, S.S. and Castro, H.A. 00. Analysis of hospitalizations for respiratory diseases in Tangará da Serra, Brazil. International Journal of Environmental Health, Vol. 4, pp. 575 5.
[17] Sousa, S.I., Alvim-Ferraz, M.C.M., Martins, F.G. and Pereira, M.C. 2011. Spirometric tests to assess the prevalence of childhood asthma at Portuguese rural areas: Influence of exposure to high ozone levels. Environment International, Vol. 7, pp. 474 47.
[18] Wang, K.S. 2005. Mining customer value from association rules to direct marketing. Data Mining and Knowledge Discovery, Vol. 11, pp. 57-7.
[19] Witten, I.H. and Frank, E. 00. Data Mining: Practical Machine Learning Tools and Techniques, ISBN 7-1-1-0050-, Morgan Kaufmann Publishers.
[20] WHO. 00. Air Quality Guidelines - Global update 2005. World Health Organization, Copenhagen, Denmark. Regional Office for Europe.
[21] WHO. 2011. World Health Statistics. World Health Organization, France Regional Office for Europe.
[22] Zheng, M. 2011. Hong Kong: Particulate air pollution and health impacts. Encyclopedia of Environmental Health, pp. 5-1.
[23] Zhou, H. 00. Time related association rules mining with attributes accumulation mechanism and its application to traffic prediction. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.1, pp. 47-47.
Appendix 1: Apriori association rules of the patient prediction model

ATTRIBUTE 1  ATTRIBUTE 2  ATTRIBUTE 3  PATIENT  CONF

[Row alignment not recoverable from the source; cell values are listed in extraction order.]

O3=HIGH CO=NORMAL O3=NORMAL O3=HIGH O3=HIGH O3=NORMAL O3=NORMAL O3=NORMAL O3=NORMAL O3=HIGH PM10=HIGH O3=HIGH O3=NORMAL CO=NORMAL
TEMP=HIGH TEMP=HIGH TEMP=HIGH TEMP=HIGH TEMP=LOW CO=NORMAL CO=NORMAL CO=NORMAL CO=NORMAL CO=NORMAL TEMP=NORMAL CO=NORMAL PM10=HIGH
TEMP=HIGH CO=NORMAL TEMP=NORMAL TEMP=NORMAL
MODERATE MODERATE MODERATE MODERATE MODERATE MODERATE MODERATE MODERATE MODERATE MODERATE
0. 0. 0. 0. 0. 0.7 0.7 0.5 0.57 0.55
NORMAL 0.55
MODERATE 0.55
NORMAL 0.54
MODERATE 0.54
NORMAL 0.54
MODERATE MODERATE 0.5 0.4
CO=NORMAL NORMAL 0.4