ACTA CHROMATOGRAPHICA, NO. 15, 2005

BASELINE REDUCTION IN TWO DIMENSIONAL GEL ELECTROPHORESIS IMAGES

K. Kaczmarek 1,*, B. Walczak 1, S. de Jong 2, and B. G. M. Vandeginste 2

1 Institute of Chemistry, Silesian University, 9 Szkolna Street, 40-006 Katowice, Poland
2 Unilever R&D Vlaardingen, Olivier van Noortlaan 120, 3133 AT Vlaardingen, The Netherlands

INTRODUCTION

Proteins are the controllers of all cell functions and, as such, are closely connected with many diseases and metabolic processes. Although proteins are coded by genes, it has been shown that there is a poor correlation between protein and mRNA abundance [1]. For this reason a full knowledge of the genome is not sufficient for determination of the protein composition of cells [2]. This limitation of genomics caused a big shift of interest from genomics to proteomics.

The main analytical tool used for protein separation in proteomics is two-dimensional gel electrophoresis (2DGE) [3,4]. This technique separates proteins according to their masses and charges. These independent attributes enable separation of thousands of proteins in a single analysis. After separation all proteins can be identified by mass spectrometry. In studies performed to identify specific proteins related to a given metabolic process or disease it is, however, much more efficient to detect and identify only those proteins that differentiate groups of samples [5].

In comparative studies of biological material a large number of analyses must be conducted to suppress natural differences and to emphasize differences related to the process being studied. This results in a huge amount of data to be analyzed and generates a need for a rapid, efficient and fully automated method for matching and comparing gel images. The images may differ significantly and also contain noise of different characteristics and a varying baseline (Fig. 1). This necessitates careful preprocessing, i.e. noise and background removal.
Signal noise hampers spot detection with methods based on signal derivatives, because they magnify the noise, causing identification of false peaks (spots) and incorrect determination of the borders of spots. In turn, varying signal background interferes with methods based on histogram segmentation and thresholding [6]. The background component is added to the real signal and overstates the intensities of spots. Another serious problem caused by varying background is difficulty with matching images. In area-based matching [7,8] images are matched to maximize the similarity measure between them [9]. Varying background present in warped images makes it difficult to properly calculate their similarity.

Fig. 1 Components of a 2DGE image presented on its profile (from top to bottom): profile of the 2DGE image, noise, baseline, and real signal.

Filtering in the wavelet domain has been found to be the method best suited for dealing with noise present in 2DGE images [10].

Many numerical methods have been developed for estimation of varying background present in one-dimensional signals [11-25]. Among these techniques are methods based on digital filters [11-15]. Such filters usually introduce artefacts and simultaneously distort the real signal. Other approaches rely on automated peak rejection [16]. These algorithms fit some functions to find regions of the signal that consist only of the baseline, without peaks of the real signal. The functions being fitted may have different forms, e.g. polynomials [17,18] or splines [19]. The main disadvantage of peak-rejection approaches is the difficulty of identifying peak-free regions. Threshold-based rejection of peaks gives good results when the baseline is relatively smooth [20], but fails for signals with a significantly varying baseline.
Because of the difficulties caused by automatic peak rejection, other approaches have been designed to fit a baseline without detecting the peaks. In Ref. [17] the baseline is fitted with a low-order polynomial, which prevents it from fitting the real signal peaks. For signals with many positive peaks, however, e.g. electrophoretograms, the baseline estimated in this way has values which are too high. Subtraction of a background with values which are too high introduces significant distortions to the analyzed signal, i.e. the values for the spots (peaks) are too low. Other approaches rely on statistical methods, such as maximum entropy [21,22]. There are also approaches based on baseline removal in the wavelet domain [23-25].

In this paper we focus on the method proposed by Eilers [26] for background elimination in two-dimensional signals, based on asymmetric least squares splines regression, and evaluate its potential as an automated approach.

THEORY

The new baseline-correction procedure proposed by Eilers [26] may be regarded as a method similar to the peak-rejection approaches; there is, however, no need to detect peaks. This procedure is based on the Whittaker smoother [27,28], which minimizes the following cost function:

$$Q = \sum_i v_i (y_i - f_i)^2 + \lambda \sum_i (\Delta^d f_i)^2 \qquad (1)$$

where y is the analyzed signal, f is a smooth approximation of y (the baseline), i indexes the consecutive values of the signal, d is the order of differences, and v are weights. Weights v should have high values in parts of the signal where the signal analyzed is allowed to affect estimation of the baseline. In all other regions of the signal, values of v are zero. The positive parameter λ is the regularization parameter and controls the significance of the penalty term $\lambda \sum_i (\Delta^d f_i)^2$, i.e. the higher the value of λ, the smoother the estimated baseline.

Because of the asymmetry problem in baseline estimation, the weights should be chosen in a way that will enable rejection of the peaks. To achieve this, the weights are assigned as:

$$v_i = \begin{cases} p & \text{if } y_i > f_i \\ 1 - p & \text{if } y_i \le f_i \end{cases} \qquad \text{where } 0 < p < 1. \qquad (2)$$
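For fixed weights, eq. (1) is quadratic in f, so its minimizer can be written in closed form. The short derivation below is a sketch added for illustration; the matrix symbols $V = \mathrm{diag}(v)$ and $D_d$ (the matrix that forms d-th order differences, so that $D_d f = \Delta^d f$) are notation assumed here and do not appear in the original text.

```latex
Q = (y - f)^{\mathsf T} V (y - f) + \lambda \lVert D_d f \rVert^2
\\[4pt]
\frac{\partial Q}{\partial f} = -2 V (y - f) + 2 \lambda D_d^{\mathsf T} D_d f = 0
\quad\Longrightarrow\quad
\left( V + \lambda D_d^{\mathsf T} D_d \right) f = V y
```

Each baseline estimate therefore requires only the solution of one linear system; the dependence of the weights on f through eq. (2) is what makes the overall problem nonlinear.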
The positive deviations from the estimated baseline (peaks) have low weights while the negative deviations (baseline) obtain high weights. There is, however, a problem of simultaneous determination of the weights (v) and the baseline (f). Without the weights it is impossible to calculate the baseline, and without the baseline it is impossible to determine the weights. This problem is solved iteratively: in the first iteration all weights get the same value, i.e. unity. Using these weights, the first estimate of the baseline is calculated. Iterating between calculation of the baseline and setting of the weights gives a good estimate of the baseline in a few iterations. The use of p close to zero and a large λ enables the estimate to follow the baseline closely: its shape is not too flat and, at the same time, it does not follow the peaks of the real signal.

The penalty term in eq. (1) can also be formulated differently. In Ref. [29] Eilers proposed a splines-based approach to smoothing instrumental signals. The multidimensional extension of the spline-based approach was presented by Eilers in Ref. [30]. The two-dimensional signal is described by a data matrix Y containing i × j intensity values. To estimate the background, let $B$ (j × l) be a B-spline basis along the columns of matrix Y and $\breve{B}$ (i × k) be a B-spline basis along the rows of matrix Y. The spline basis along the columns of a signal, constructed from five basis functions, is presented in Fig. 2.

Fig. 2 Spline basis along columns of signal constructed from five basis functions

As a compromise between speed of calculation and memory requirements on the one hand and accuracy on the other, ten basis functions, i.e. k = l = 10, were used in our study. Then, estimation of a smooth surface F can be presented in the equation:
$$f_{i,j} = \sum_{k,l} \breve{b}_{ik}\, b_{lj}\, a_{kl} \qquad (3)$$

where $a_{kl}$ is the (k,l)th element of matrix A containing regression coefficients. The matrix of regression coefficients can be calculated by minimizing the cost function:

$$Q = \sum_{i,j} v_{i,j} (y_{i,j} - f_{i,j})^2 + P \qquad (4)$$

where P is a penalty term defined as:

$$P = \lambda \sum_l \lVert \Delta^{d} a_{\bullet l} \rVert^2 + \breve{\lambda} \sum_k \lVert \breve{\Delta}^{\breve{d}} a_{k \bullet} \rVert^2 \qquad (5)$$

The first part of eq. (5) is a difference of order d calculated for each column of A ($a_{\bullet l}$) and the second part is the difference of order $\breve{d}$ calculated for each row of A ($a_{k \bullet}$). From eq. (5) it is apparent that the penalty may have different values for the vertical and horizontal directions, because there are two different regularization parameters ($\lambda$, $\breve{\lambda}$). As the backgrounds in 2DGE images do not have different spatial structure for the horizontal and vertical directions, however, one value for both regularization parameters will be used ($\lambda = \breve{\lambda}$).

DATA

Real 2DGE images have been used for visual inspection of baseline estimation accuracy. These images contain results from separations conducted for human and animal (e.g. mouse) tissues and have been taken from public databases [31,32].

For Monte Carlo evaluation of performance, simulated gel images have been used. Images have been simulated as squares of different side length (512 to 1024 pixels) and with different numbers of spots (500 to 3000 spots) placed at random coordinates. The spots have been simulated as two-dimensional Gaussian functions. Random white noise has been added to each simulated image, resulting in images with signal-to-noise ratios varying from 30 to 50. A varying background has also been added. The background has been simulated as a smooth surface in the following way:

1. five or nine points were selected, four lying in the corners of the image, one in the middle of the image, and, for nine points, four halfway between the centre of the image and its edges;
2. random intensity values (not exceeding 25% of the highest intensity in the image) were assigned to these points;
3. a smoothing function was used to interpolate the background values between the points.

The procedure described resulted in smooth backgrounds similar to those present in real 2DGE images (Fig. 3).

Fig. 3 Comparison of simulated background and typical real 2DGE image

RESULTS AND DISCUSSION

Typical results from baseline estimation using the 2D extension of the spline-based approach are presented in Fig. 4. Baseline estimation was performed on a typical real 2DGE image [31], which is presented in Fig. 4a. The image with removed background is presented in Fig. 4b.

Details of the background estimation process are better visible on a single profile of a 2DGE image. For this reason Fig. 5 shows the baseline estimation process for a single profile of a simulated 2DGE image. For this profile, the process of baseline estimation converged in twelve iterations; a few of these iterations (1st, 2nd, 3rd, and final) are presented. This profile has negative peaks, so the weights assigned to the data points must be chosen accordingly (eq. 2). In this case the points lying above the estimated baseline must obtain weights with high values, so p must be close to unity. In the example above the following parameters were used: λ = 100, p = 0.999, and d = 3.
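The iteration described for a single profile can be sketched in a few lines of Python. This is a minimal illustration of eqs (1) and (2) on a one-dimensional signal, not the authors' Matlab implementation. The function name `asymmetric_baseline`, the dense difference matrix, the synthetic profile, and the value λ = 10^6 (chosen for this 300-point example, not taken from the paper) are all assumptions made for the sketch.

```python
import numpy as np


def asymmetric_baseline(y, lam=1e6, p=0.999, d=3, max_iter=50):
    """Estimate a baseline by alternating eq. (1) (fixed weights) and eq. (2)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # d-th order difference matrix, shape (n - d, n); dense for simplicity.
    D = np.diff(np.eye(n), n=d, axis=0)
    penalty = lam * (D.T @ D)
    v = np.ones(n)  # first iteration: all weights equal to unity
    for _ in range(max_iter):
        # For fixed weights the minimizer of eq. (1) solves a linear system.
        f = np.linalg.solve(np.diag(v) + penalty, v * y)
        v_new = np.where(y > f, p, 1.0 - p)  # eq. (2)
        if np.array_equal(v_new, v):  # weights stopped changing
            break
        v = v_new
    return f


# Synthetic profile with negative peaks on a smooth baseline (illustrative only).
x = np.linspace(0.0, 1.0, 300)
true_baseline = 10.0 - 4.0 * (x - 0.5) ** 2
peaks = sum(h * np.exp(-(x - c) ** 2 / (2 * 0.02 ** 2))
            for h, c in [(4, 0.25), (5, 0.5), (3, 0.75)])
profile = true_baseline - peaks

estimate = asymmetric_baseline(profile)  # p close to unity for negative peaks
print("largest deviation from the true baseline:",
      np.max(np.abs(estimate - true_baseline)))
```

For a profile with positive peaks the same function would be called with p close to zero. For long signals a sparse solver would be a more economical choice than the dense matrices used here.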
Fig. 4 Results from baseline removal: (a) real 2DGE image [31] and (b) its counterpart with removed background

Fig. 5 Illustration of the baseline estimation process in consecutive iterations: (a) first, (b) second, (c) third, and (d) final iteration (λ = 500, p = 0.999, d = 3). The estimated baseline is depicted as the solid line.

Optimization of Input Parameters

Because each of the input parameters used for calculation of the cost function in eq. (4) (p, λ, d) affects the results obtained differently, they must be chosen carefully to ensure proper estimation of the baseline. For this reason an automated method for baseline correction should either enable automated determination of the input parameters or be insensitive to changes of the input parameters over a wide range of their values.

Examples of results obtained for different values of λ and d are presented in Fig. 6. It is apparent that the values of both parameters determine the accuracy of baseline estimation. The higher the value of λ (smoothing parameter), the flatter the estimated baseline becomes. In turn, d (order of the differences) decides how well baselines of different shape, i.e. polynomials of different degree, will be estimated. For 2DGE images second-order differences seem to be a reasonable choice, because they yield a good estimate of the baseline. The value of p (weights) also affects the estimates obtained; a value close to unity, e.g. 0.999, however, ensures correct results for signals containing negative peaks.

Fig. 6 Baseline estimates obtained for different values of λ and d: (a) λ = 1000, (b) λ = 10

Methods are available for automated estimation of the regularization parameter (λ) for signals without an asymmetry problem. The methods commonly used for determination of λ include generalized cross-validation [33], the L-curve criterion [34], the quasi-optimality criterion [35], and the discrepancy principle [36]. None of these methods can be used for automatic determination of the λ value for 2DGE images, because such images have significant asymmetry.

In the absence of a method for automatic determination of the regularization parameter, it is possible to conduct a Monte Carlo study on simulated data. Such a study enables determination of a value of λ which yields proper results for simulated data and also for real data with similar characteristics. For simulated images it is possible to determine the quality of baseline estimation by comparing the real (known) background with that estimated. As a measure of quality the MSE (mean square error) may be used:

$$\mathrm{MSE} = \frac{\sum_{n,m} (b_{n,m} - \hat{b}_{n,m})^2}{n \cdot m} \qquad (6)$$

where n and m denote the vertical and horizontal size of the image and $b_{n,m}$ and $\hat{b}_{n,m}$ are values of a single pixel from the real background and the estimated background, respectively. Results from the comparison of real and estimated backgrounds are presented in Fig. 7.

Fig. 7 (a) Mean values of MSE for 100 different images, obtained for different values of the regularization parameter, and (b) their standard deviation

The results were obtained for 100 images. Fifty were simulated using five points and fifty were simulated using nine points for background generation. Comparison of the real and estimated baselines indicates that optimum values of the regularization parameter λ are in the range $10^1$ to $10^3$. This study enabled determination of the λ value for simulated signals. This
value will also yield correct results for real images with background characteristics similar to those present in simulated data. It was also shown that the described method is very robust and resistant to changes in signals.

Baseline Correction for Noisy Signals

For real, noisy signals even points belonging to the baseline may lie under the estimated baseline in a given iteration (Fig. 8a). As such, they will not be taken into account in the calculation of the baseline in the next iteration, causing overestimation of the baseline, i.e. the background will obtain values that are too high. For this reason the noise level present in the analyzed signal must be estimated. We propose two methods for estimation of the baseline for noisy 2DGE images.

1. The standard deviation of the noise can be easily estimated on the basis of wavelet coefficients [37]. After estimation of the noise it is possible to calculate the background using not only the points lying above the estimated baseline; points lying under the baseline are also used if their distance to the baseline is smaller than the estimated noise. The baseline presented in Fig. 8a is estimated using only points lying above the baseline, whereas the baseline presented in Fig. 8b is estimated using also the points lying under the baseline within the range of the estimated noise.
2. The second way of dealing with noise present in analyzed signals is heavy smoothing without attempting to preserve narrow peaks. Heavy smoothing may be achieved by use of median filtering with a broad window, e.g. equal to 5% of the signal length. Such smoothing does not change the background but very effectively removes the noise and, of course, most of the narrow peaks. This enables proper estimation of the baseline even for noisy signals (Fig. 8c).

Baselines estimated using these two different approaches are compared in Fig. 9. It is apparent that the baseline estimated without any consideration of the noise present in the signal is significantly higher than the real background. Both approaches ensure proper estimation of the baseline without the influence of noise present in the image.
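The two ways of handling noise can be sketched as small helper functions. This is an illustration under stated assumptions rather than the procedure actually used in the study. The noise level is taken from the finest-scale Haar wavelet coefficients with the median-absolute-deviation rule commonly associated with wavelet de-noising (the text cites Ref. [37] but does not give the formula). The weight rule follows the convention of the examples with negative peaks and p close to unity. The function names and the edge padding of the median filter are hypothetical choices.

```python
import numpy as np


def estimate_noise_sigma(y):
    """Robust noise estimate from finest-scale Haar wavelet coefficients."""
    y = np.asarray(y, dtype=float)
    m = y.size - y.size % 2  # use an even number of samples
    detail = (y[1:m:2] - y[0:m:2]) / np.sqrt(2.0)
    return np.median(np.abs(detail)) / 0.6745


def noise_tolerant_weights(y, f, p, sigma):
    """Weights of eq. (2), except that points lying less than sigma
    below the current baseline f are still treated as baseline points."""
    return np.where(np.asarray(y) > np.asarray(f) - sigma, p, 1.0 - p)


def median_smooth(y, fraction=0.05):
    """Median filter with a window of about `fraction` of the signal length."""
    y = np.asarray(y, dtype=float)
    k = max(3, int(round(fraction * y.size)) | 1)  # odd window length
    padded = np.pad(y, k // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, k)
    return np.median(windows, axis=1)


# Illustrative use on a noisy, slowly varying signal.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2048)
noisy = 5.0 + 2.0 * t + rng.normal(0.0, 0.5, t.size)
print("estimated noise level:", estimate_noise_sigma(noisy))
print("length after median smoothing:", median_smooth(noisy).size)
```

In the first approach `noise_tolerant_weights` would replace the plain weight update inside the iteration; in the second, the baseline would be estimated from the output of `median_smooth` instead of the raw signal.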
Median smoothing seems to be less complicated and faster than the wavelet approach. Usually, however, image processing also involves noise reduction, and the wavelet approach enables simultaneous, efficient noise removal [10] with estimation of the noise present in the image.

Fig. 8 Examples of baselines estimated for (a) a noisy signal, and for the same signal with (b) wavelet estimation of noise (λ = $10^4$, p = 0.999, d = 2) and (c) using a heavily de-noised signal (λ = $10^4$, p = 0.999, d = 2)

Fig. 9 Comparison of background estimates obtained by use of different methods for dealing with signal noise

For typical gel images, the method of background estimation presented yields reasonable results for a wide range of parameter values. The results obtained for simulated 2DGE images are presented in Figs. 10 and 11. The accuracy of the method is largely unaffected even by large changes in λ and d.

Fig. 10 Results from baseline removal: (a) simulated 2DGE image and (b, c) its counterparts with removed background using different values of the smoothing parameter (λ = 100 and λ = 10000) and d = 2

Fig. 11 Results from baseline removal: (a) simulated 2DGE image and (b, c) its counterparts with removed background using different values of the smoothing parameter (λ = 100 and λ = 10000) and d = 4

For the reasons already mentioned, the real images may be used solely for visual inspection of the result obtained. The results obtained for typical real 2DGE images are presented in Figs. 4, 12, and 13.
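The simulated images behind these tests follow the recipe given in the DATA section, and the quality measure is the mean square error between the known and the estimated background. The sketch below shows one possible generator and the MSE calculation. Several details are assumptions, because the text does not specify them: spot heights and widths, the Gaussian radial-basis interpolation used as the "smoothing function", the definition of the signal-to-noise ratio as highest spot intensity divided by noise standard deviation, and the function names `simulate_gel` and `mse`.

```python
import numpy as np


def simulate_gel(size=512, n_spots=500, n_anchors=5, snr=40.0, seed=0):
    """Simulate one square gel image; returns (image, background, anchors, values)."""
    rng = np.random.default_rng(seed)
    rows, cols = np.mgrid[0:size, 0:size].astype(float)

    # Spots: two-dimensional Gaussians at random coordinates
    # (height and width ranges are illustrative assumptions).
    spots = np.zeros((size, size))
    for _ in range(n_spots):
        r0, c0 = rng.uniform(0, size - 1, 2)
        height, width = rng.uniform(0.2, 1.0), rng.uniform(2.0, 5.0)
        spots += height * np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2)
                                 / (2 * width ** 2))

    # Anchor points: four corners and the centre, plus (for nine points)
    # four points halfway between the centre and the edges.
    last = size - 1
    c, q1, q3 = last // 2, last // 4, (3 * last) // 4
    anchors = [(0, 0), (0, last), (last, 0), (last, last), (c, c)]
    if n_anchors == 9:
        anchors += [(q1, c), (q3, c), (c, q1), (c, q3)]
    anchors = np.array(anchors, dtype=float)

    # Random anchor intensities, at most 25% of the highest spot intensity.
    values = rng.uniform(0.0, 0.25 * spots.max(), len(anchors))

    # Smooth interpolation between the anchors with Gaussian radial basis
    # functions (an assumed stand-in for the unspecified smoothing function).
    scale = 0.3 * last

    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2 * scale ** 2))

    weights = np.linalg.solve(kernel(anchors, anchors), values)
    pixels = np.column_stack([rows.ravel(), cols.ravel()])
    background = (kernel(pixels, anchors) @ weights).reshape(size, size)

    # White noise; SNR is taken here as highest spot intensity / noise std.
    noise = rng.normal(0.0, spots.max() / snr, (size, size))
    return spots + background + noise, background, anchors, values


def mse(true_background, estimated_background):
    """Mean square error between the known and the estimated background."""
    diff = np.asarray(true_background) - np.asarray(estimated_background)
    return float(np.mean(diff ** 2))


# Small example; the study itself used sides of 512 to 1024 pixels.
image, background, anchors, values = simulate_gel(size=128, n_spots=50, n_anchors=9)
print(image.shape, mse(background, background))
```

In a Monte Carlo run, `mse` would be evaluated between `background` and the baseline estimated from `image` for each value of λ, and the results averaged over the simulated images.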
Fig. 12 2DGE image [31] with varying background (a) and the same image with removed background (b)

Fig. 13 2DGE image [32] with varying background (a) and the same image with removed background (b)

CONCLUSIONS

In the work discussed in this paper the suitability of baseline reduction methods based on splines regression was investigated. The tests were conducted on synthetic images in a Monte Carlo study and, because the backgrounds generated in the synthetic images have characteristics similar to those of backgrounds present in real images, the results should also
be valid for real 2DGE images. The evaluation demonstrated that the method presented deals very effectively with the background present in typical 2DGE images. The method is very robust and gives good results for a wide range of values of the regularization parameter for typical images. This means that it is possible to deal very effectively and automatically with the background present in typical gel electrophoresis images. Occasionally, however, the need for fine tuning of the input parameters may arise. For typical images the error of estimation is very low, irrespective of the values of the smoothing parameter and the size of the images. The proposed techniques of heavy smoothing and wavelet estimation of the noise enable baseline estimation without the bias introduced by the noise always present in instrumental signals.

ACKNOWLEDGEMENT

The authors are very grateful to Professor Eilers for the Matlab code of the 2D splines approach. K. Kaczmarek thanks Unilever R&D (Vlaardingen, Holland) for financial support of his PhD study.

REFERENCES

[1] S.P. Gygi, Y. Rochon, B.R. Franza, and R. Aebersold, Mol. Cell. Biol., 19, 1720 (1999)
[2] H.F. Hebestreit, Curr. Opin. Pharmacol., 1, 513 (2001)
[3] P.H. O'Farrell, J. Biol. Chem., 250, 4007 (1975)
[4] S.J. Fey and P.M. Larsen, Curr. Opin. Chem. Biol., 5, 26 (2001)
[5] W.P. Blackstock and M.P. Weir, Trends Biotechnol., 17, 121 (1999)
[6] Y. Peng-Yeng and C. Ling-Hwei, Signal Process., 60, 305 (1997)
[7] S. Veeser, M.J. Dunn, and G.Z. Yang, Proteomics, 1, 856 (2001)
[8] Z. Smilansky, Electrophoresis, 22, 1616 (2001)
[9] C. Heipke, Overview of Image Matching Techniques. OEEPE Workshop Applications of Digital Photogrammetric Workstations, Proceedings, Lausanne, Switzerland, 1996, pp. 173-191
[10] K. Kaczmarek, B. Walczak, S. de Jong, and B.G.M. Vandeginste, Proteomics, 4, 2377 (2004)
[11] A.F. Ruckstuhl, M.P. Jacobson, R.W. Field, and J.A. Dodd, J. Quant. Spectrosc. Radiat. Transfer, 68, 179 (2001)
[12] M.A. Kneen and H.J. Annegarn, Nucl. Instrum. Methods Phys. Res. B, 109/110, 209 (1996)
[13] J.A. Maxwell, J.L. Campbell, and W.J. Teesdale, Nucl. Instrum. Methods Phys. Res. B, 43, 218 (1989)
[14] Y. Sun, K.L. Chan, and S.M. Krishnan, Comput. Biol. Med., 32, 465 (2002)
[15] E.H. van Veen and M.T.C. de Loos-Vollebregt, Spectrochim. Acta B, 53, 639 (1998)
[16] A. Rouh, M.A. Delsuc, G. Bertrand, and J.Y. Lallemand, J. Magn. Reson. Ser. A, 102, 357 (1993)
[17] H. Abbink Spaink, T.T. Lub, R.P. Otjes, and H.C. Smith, Anal. Chim. Acta, 183, 141 (1986)
[18] G. Dellalunga, R. Pogni, and R. Basosi, J. Magn. Reson. Ser. A, 108, 65 (1994)
[19] G. Della Lunga and R. Basosi, J. Magn. Reson. Ser. A, 112, 102 (1995)
[20] W. Dietrich, C.H. Rüdel, and M. Neumann, J. Magn. Reson., 91, 1 (1991)
[21] J. Padayachee, V. Prozesky, W. von der Linden, M.S. Nkwinika, and V. Dose, Nucl. Instrum. Methods Phys. Res. B, 150, 129 (1999)
[22] A.J. Phillips and P.A. Hamilton, Anal. Chem., 68, 4020 (1996)
[23] X.-G. Ma and Z.-X. Zhang, Anal. Chim. Acta, 485, 233 (2003)
[24] H-W. Tan and S.D. Brown, J. Chemom., 16, 228 (2002)
[25] C. Perrin, B. Walczak, and D.L. Massart, Anal. Chem., 73, 4903 (2001)
[26] P.H.C. Eilers, Anal. Chem., 76, 404 (2004)
[27] P.H.C. Eilers, Anal. Chem., 75, 3631 (2003)
[28] R.J. Verrall, Ins.: Mathematics Econ., 13, 7 (1993)
[29] P.H.C. Eilers and B.D. Marx, Statist. Sci., 11, 89 (1996)
[30] P.H.C. Eilers, I.D. Currie, and M. Durbán, Comput. Stat. Data Anal., in press
[31] GelBank at http://gelabank.anl.gov
[32] World-2DPAGE - 2-D PAGE databases and services at http://www.expasy.ch/ch2d/2d-index.html
[33] G.H. Golub, M. Heath, and G. Wahba, Technometrics, 21, 215 (1979)
[34] P.C. Hansen, SIAM Rev., 34, 561 (1992)
[35] S. Morigi and F. Sgallari, Appl. Math. Comput., 121, 55 (2001)
[36] H.W. Engl and H. Gfrerer, Appl. Numer. Math., 4, 395 (1988)
[37] D.L. Donoho, De-noising via Soft Thresholding, IEEE Trans. Inform. Theory, 41, 613 (1995)