Usng an Adaptve Fuzzy Logc System to Optmse Knowledge Dscovery n Proteomcs James Malone, Ken McGarry and Chrs Bowerman School of Computng and Technology Sunderland Unversty St. Peter s Way, Sunderland, SR6 0DD, Unted Kngdom e-mal: {james.malone, ken.mcgarry, chrs.bowerman}@sunderland.ac.uk Abstract: The growth of bomedcal databases has seen a demand for data mnng technques to effcently and effectvely analyse the data contaned wthn. One mportant consderaton s the need to nclude expert s opnons wthn the knowledge dscovery process. However, ths can be dffcult to accomplsh when such heurstcs are presented n loosely defned, fuzzy terms. We present the use of an Adaptve Nero-Fuzzy Inference System (ANFIS) as an approach to optmzng such fuzzy opnons wth the long term am of ncorporatng such optmzed rules nto goal and data drven data mnng of proteomc s data. Keywords: ANFIS, fuzzy opnons, data mnng, proteomcs data, 2-DE gel analyss 1. Introducton In the last decade, proteomcs has grown rapdly as an area of research, largely drven by technologcal mprovements, and s often descrbed as the next step to dramatcally advance drug dscovery [1]. Wthn proteomcs, two-dmensonal electrophoress (2-DE) s unrvalled as a technque to analyse proten expresson [2, 3] and s a key component of current proteomcs research [4]. 2-DE s capable of separatng thousands of protens accordng to ther electrcal charge and molecular weght. Such experments can create large amounts of hgh-dmensonal, spato-temporal expermental data. Usng stanng technques, these protens appear as spots on these gels (as shown n Fgure 1). The gel mages are dgtsed by mage processng and analyss software, such as the popular Progeness product whch produced the mages n Fgure 1. Analyss of the data generated from such 2-DE gel experments has been dentfed as the most mportant task n proteomcs [5]. However, dscoverng nformaton from ths data s a complex and dffcult task because of the large number of spots, varatons n shape locaton and sze [6]. Ths problem s compounded when consderng that experments are often conducted over a seres of gels, hence, the am s detectng trends across a large number of mages. At present, the analyss of 2-DE gel data s normally conducted manually whch s both tme consumng and requres consderable expertse. Further development of database analyss algorthms would be of huge beneft to proteomcs [7] snce current methods are unable to handle and analyse such results meanngfully [8]. Snce current analyss s conducted by experts usng heurstc knowledge gathered over years of experence wthn the area, an mportant consderaton would be the ncorporaton of such expert doman knowledge wthn any automated analyss method devsed. One ssue relatng to ths expert knowledge s that expert s opnons vary greatly from nsttuton to nsttuton. Followng knowledge acquston conducted on 20 experts, results presented varyng opnons on metrcs whch are used to dentfy nterestng proten characterstcs. Ths presents problems when attemptng to extract optmum values for such metrcs for ncorporaton nto goal drven data mnng for trend analyss purposes. 80 RASC2004
(a) Fgure 1. (a) 2-DE Gel Image, proten spots appearng as black dots on the mage (b) a secton of the gel showng several proten spots, descrbng the 3 dmensonal aspect of the gel (b) We nvestgate the use of an adaptve fuzzy logc controller, specfcally Adaptve Nero-Fuzzy Inference System (ANFIS) [9] to optmse the fuzzy f-then type rules obtaned from human experts. The long term am of such an nvestgaton s to ncorporate these optmsed rules nto goal drven data mnng to dentfy nterestng protens from 2-DE gel data. The remander of the paper s structured as follows. Secton 2 descrbes the ANFIS model wth a break down of each layer, secton 3 descrbes the Proteomc s data used and nput parameters to the model and presents results of the tranng as well as an example of some of the optmsed fuzzy rules extracted followng ths process. Fnally, conclusons are drawn n secton 4. 2. An adaptve nero-fuzzy nference system ANFIS s a fuzzy nference system mplemented wthn the archtecture and learnng procedure of adaptve networks [9, 10, 11]. An adaptve network s a superset of all knds of feed-forward neural networks wth supervsed learnng capablty. ANFIS can be used to optmse membershp functons to generate stpulated nput output pars and has the advantage of beng able to subsequently construct fuzzy f then type rules representng these optmsed membershp functons. There are varous successful examples of ANFIS used n bomedcal applcatons [12, 13, 14]. 2.1 ANFIS Archtecture The ANFIS structure used n ths paper conssted of a 5-layer Sugeno type archtecture, a typcal example of whch s seen n Fgure 2. In ths example, two nputs are used (x,y) and one output (f) (whch s a lmtaton of Sugeno-type systems,.e. that there s only a sngle output, obtaned usng weghted average defuzzfcaton (lnear or constant output membershp functons)). Fgure 2. A 5-laye r ANFIS structure [9] 81 RASC2004
In the frst layer, all nodes are adaptve, s the degree of the membershp of the nput to the fuzzy membershp functon (MF) represented by node; O 1 = µ A (x) = 1, 2 (1) O 1 = µ B-2 (y) = 3, 4 (2) where O l s the output of the node n a layer l. In the second layer the nodes are fxed (.e. not adaptve). Nodes n ths layer are labelled Π and multply the sgnal before outputtng. The outputs are gven by; O 2 = w = µ A (x) µ B (y) = 1, 2 (3) Each node output n ths layer represents the frng strength of the rule. In the thrd layer, every node s also fxed and are labelled wth an N and perform a normalsaton of the frng strength from the prevous layer. The output of each node s gven by; O 3 = w = w + w1 w2 = 1, 2 (4) In the fourth layer, all nodes are adaptve. The output of a node s the product of the normalsed frng strength and a frst order polynomal and s gven by; O 4 = w f = w ( p x + q y + r ) = 1, 2 (5) where {p x, q y, r } s the modfable parameter set, referred to as consequent parameters snce they deal wth the then part of the fuzzy rule. Fnally, layer 5 s a sngle node labelled wth Σ whch ndcates that the functon s that of computng the overall output as the summaton of all ncomng sgnals; O 5 = f = wf = wf w = 1, 2 (6) Further detals of the ANFIS model can be obtaned from Jang [9] and Jang and Gulley [10]. 3. Optmsaton of Proteomcs Heurstcs Followng twenty knowledge acquston sessons wth experts from wthn the feld of proteomcs, heurstc knowledge was gleaned and represented n the form of fuzzy rules. Such rules have the advantage of beng able to represent ntutve terms a form close to natural language and have no precse thresholds, reducng brttleness. These rules are to be used to provde a goal drven element to the fnal data mnng process by allowng the ncorporaton of expert s opnons to dentfy what contrbutes nterestngness from wthn the proteomcs data sets [15]. The structure used n ths paper for expermentaton conssts of 6 nputs, descrbed n Table 1. These nputs form the bass of the experts opnons and are used n varous rules to descrbe whether or not a partcular proten spot s nterestng or not and hence worthy of further laboratory analyss. 82 RASC2004
Input Table 1. Inputs used n ANFIS structure Descrpton No. Membershp Functons Absence/Presence The absence and presence of proten spots from gel to gel 2 X/Y Movement Movement of a proten spot n x and y dmensons from gel to gel 2 Volume Change Increase or decrease n the volume of a proten spot from gel to gel 2 Buddng The jonng or separaton of proten spots from gel to gel 2 Shape Change Morphologcal changes n shape, such as heght from gel to gel 2 Percentage Varaton Percentage change n terms of spot abundance from gel to gel 3 3.1 Results The data set was splt nto two; the tranng data set whch was used to tran the ANFIS model whlst a testng data set was used to verfy accuracy of the traned ANFIS model. The tranng data set conssted of 75% of the total data set wth the test data set consstng of the remanng 25%. Fgure 3 shows the tranng of the ANFIS model, wth the fnal convergence value approachng 0. The dagram shows that the error rate decreases rapdly over the frst 25 epochs before quckly levellng out. Table 2 shows the classfcaton accuracy of the ANFIS, wth the accuracy for nterestng and nonnterestng protens of 96% and 94% respectvely. Fgure 3. Network tranng error graph Table 2. Test Results on ANFIS Class Classfcaton Accuracy Total No. Optmsed Fuzzy Rules Interestng Protens 96% 14 Non-nterestng Protens 94% 82 83 RASC2004
Followng the tranng and testng, the fuzzy expert rules were extracted and a sample of whch s shown n Table 3. The membershp functons of each nput are descrbed n fuzzy terms, such as hgh and low for all but %_Change whch also has a medum membershp. Table 3. Optmsed fuzzy rules If (Absence/Presence s hgh) and (X/Y_Movement s low) and (Volume_Change s hgh) and (Buddng s low) and (Shape_Change s hgh) and (%_Change s low) then (output s hgh) If (Absence/Presence s hgh) and (X/Y_Movement s hgh) and (Volume_Change s hgh) and (Buddng s hgh) and (Shape_Change s low) and (%_Change s low) then (output s hgh) If (Absence/Presence s hgh) and (X/Y_Movement s hgh) and (Volume_Change s hgh) and (Buddng s hgh) and (Shape_Change s hgh) and (%_Change s low) then (output s hgh) If (Absence/Presence s hgh) and (X/Y_Movement s hgh) and (Volume_Change s hgh) and (Buddng s low) and (Shape_Change s low) and (%_Change s low) then (output s hgh) If (Absence/Presence s hgh) and (X/Y_Movement s low) and (Volume_Change s hgh) and (Buddng s low) and (Shape_Change s low) and (%_Change s hgh) then (output s hgh) 3.2 Knowledge Dscovery from Fuzzy Rules Ths optmsaton process revealed whch attrbutes appear to have most nfluence accordng to expert s opnons when classfyng protens as nterestng or not. An ana lyss of the rules reveals that the Absence/Presence nput parameter features as hgh n most of rules that result n output as hgh (.e. hgh level of nterestngness). Ths was also true of the Volume_Change nput whch also featured as hgh n most of the optmsed fuzzy rules. It can also be noted from Table 2 that there were almost sx tmes as many rules produced for non nterestng protens as there were for nterestng. Ths seems to confrm the belef that only partcular combnatons of the nput parameters result n nterestng protens accordng to the expert s belefs. Although many of the expert s opnons n ther ntal form seemed to have varyng bases towards partcular attrbutes and at varyng levels, the optmsaton process has revealed that there s an underlyng trend and hence agreement as to whch attrbutes lead to nterestng protens. Ths can be shown n the hgh classfcaton accuracy of the ANFIS model usng these optmsed membershp functons that were derved from the ntal fuzzy expert s opnons. 4. Conclusons In ths paper we have demonstrated the use of ANFIS to optmse expert s opnons. The ANFIS model offers the advantage of enablng use of ntally approxmate data n an effectve manner whlst, followng tranng, allowng fuzzy rules to be extracted whch represent the optmsed fuzzy membershp functons. Such nformaton s of partcular use when consderng goal-drven data mnng technques. The ncluson of expert s opnons nto the data analyss process of proteomcs data s seen as crtcal to extractng useful and nterestng knowledge. Optmsng the membershp functons enables the use of prevously messy expert s opnons to drve the data mnng and hence mprove the accuracy and value of any extracted results. Further work wll concentrate on ncorporatng such optmsed expert s opnons wth data drven technques to automatcally analyse post-expermental proteomcs data. 5. Acknowledgements The authors acknowledge the support of EPSRC, Nonlnear Dynamcs Ltd and Mark Elshaw. References [1] P.A. Whttaker (2003). What s the relevance of bo nformatcs to pharmacology? In Trends n Pharmacologcal Scences, vol. 24(8), pages 434-439, 2003. 84 RASC2004
[2] R.E. Jenkns and S.R. Pennngton (2001). Novel approaches to proten expresson analyss. In Proteomcs: From proten sequence to functon, pages 207-224 BIOS Scentfc Publshers, Oxford, 2001. [3] S.R. Pennngton, S.R. Wlkns, D.F. Hochstrasser and M.J. Dunn (1997). Proteome analyss: from proten charactersaton to bologcal functon. In Trends n Cell Bology, vol. 17(4), pages 168-173, 1997. [4] S. Beranova-Gorgann (2003). Proteome analyss by two-dmensonal gel electrophoress and mass spectrometry: strengths and lmtatons. In TrAC Trends n Analytcal Chemstry, vol. 22(5), pages 273-281, 2003. [5] D.C. Lebler (2002). Introducton to Proteomcs: Tools for the New Bology, Human Press, 2002. [6] A. Thompson and T. Brotherton (1998). Informaton extracton from 2D electrophoress mages. In Proceedngs of the 20 th Annual Internatonal Conference of the IEEE Engneerng n Medcne and Bology Socety, vol. 20(2), pages 1060-1063, 1998. [7] M. Mann and O.N. Jensen (2003). Proteomc analyss of post-translatonal modfcatons. In Nature Botechnology, vol. 21(3), pages 255-261, 2003. [8] M. Vhnen (2001) Bonformatcs n proteomcs. In Bomolecular Engneerng, vol.18, pages 241-248, 2001. [9] J.-S.R. Jang (1993). ANFIS: Adaptve-network-based fuzzy nference systems. In IEEE Transactons on Systems, Man, and Cybernetcs, vol. 23(3), pages 665-685, 1993. [10] J.-S.R. Jang and N. Gulley (1995). The Fuzzy Logc Toolbox for use wth MATLAB, The Mathworks Inc, 1995. [11] J.-S.R. Jang, C.T. Sun and E. Mzutan (1996). Neuro-Fuzzy and Soft Computng:AComputatonalApproach to Learnng and Machne Intellgence, 1996. [12] S.Y. Belal, A.F.G. Taktak, A.J. Nevll, S.A. Spencer, D. Roden and S. Bevan (2002). Automatc detecton of dstorted plethysmogram pulses n neonates and paedatrc patents usng an adaptve-network-based fuzzy nference system. In Artfcal Intellgence n Medcne, Vol. 24(2), pages 149 165, 2002. [13] I. Güler and E.D. Übeyl (2004) Applcaton of adaptve neuro-fuzzy nference system for detecton of electrocardographc changes n patents wth partal eplepsy usng feature extracton. In Expert Systems wth Applcatons, Vol. 27(3), pages 323-330, 2004. [14] H.F. Kwok, D.A. Lnkens, M. Mahfouf and G.H. Mlls (2003) Rule-base dervaton for ntensve care ventlator control usng ANFIS. In Artfcal Intellgence n Medcne, Vol. 29(3), pages 185-201, 2003. [15] Malone, J., McGarry, K. and Bowerman, C. (2004). Performng trend analyss on spatotemporal proteomcs data usng dfferental rato data mnng. In Proceedngs of the 6th EPSRC Conference on Postgraduate Research n Electroncs, Photoncs, Communcatons and Software (Prep 2004), pages 103-105, 2004. 85 RASC2004